Making a GET request
To understand how the applications we are going to develop next work, we first need to understand how the internet works.
A question you may get in an interview is the following:
What happens when you type a URL into your browser and press Enter?
An extremely in-depth answer is available here.
However, many of those steps are not needed in our case, as for now we are only concerned with the network side of things.
A more suitable example is here: http://edusagar.com/articles/view/70/What-happens-when-you-type-a-URL-in-browser.
But really, we are focusing on just a couple of those steps. Let's have a go at explaining them while programming in Python.
Install the required library
The required library is called requests, so include it in your requirements.txt file. As always, I recommend finding the current version of the library and using that in your requirements.txt file. At the time of writing, the latest version was 2.7.2, and my requirements.txt file looked like this:
requests==2.7.2
Import the library for use in your application
Before using a library in Python, we need to import it. First, create the Python file which will run your application. I tend to call this app.py or run.py.
Then, among the first lines of that file, write the code that tells Python this file is going to use the requests library. Note: the import does not have to be at the very top, but that is the most common place for it.
import requests
__author__ = "Your Name"
That first line now tells the Python interpreter that it needs to load the contents of the requests library for use when executing any code from this file.
Get the content of a page
All of the internet communication we are going to be doing in this course uses a specific protocol to transfer data: the HyperText Transfer Protocol (HTTP). You'll recognise it from the http:// that often appears at the start of URLs.
This protocol defines how the transfer happens and, to some extent, what data is transferred. The name itself tells us what type of data it was originally designed for: hypertext, which is just another name for text that contains links to other pieces of text. The protocol dates back to the early days of the World Wide Web, when pages were just bits of text with links to other pages.
Today, we use HTTP to transfer the pages themselves, images, videos, and everything in between.
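To make "text that has links to other pieces of text" concrete, here is a small sketch using Python's built-in html.parser module to pull the links out of a snippet of HTML. The class name LinkCollector and the sample page are made up for illustration; they are not part of the course code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag: the links that make text 'hypertext'."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


# A hypothetical fragment of a page with two links to other pages.
page = '<p>Read the <a href="/courses">courses</a> or go <a href="/">home</a>.</p>'

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['/courses', '/']
```

The browser does something similar (and much more) when it turns those links into clickable text.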
The pages we will be writing are all going to be text. The browser (e.g. Google Chrome, Safari, or others) interprets that text (mostly HTML and CSS code) to show us a rendered version of the page. The browser gets the content of the page by requesting it from a server. A server is just a computer running a program designed to answer these requests.
Thus, when a browser connects to the server, the server gives it the content of the page that it has asked for, and then the browser renders it and shows you a version that isn't all just plain text.
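The browser/server relationship can be sketched entirely on your own machine. The example below is a minimal sketch that uses only Python's standard library (http.server and http.client, not the requests library this course uses): a tiny server program that answers every request with the same HTML text, and a client playing the browser's role that asks for the root page. The names PAGE, PageHandler and fetch_page are hypothetical, and the server only listens on 127.0.0.1.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# The "page" this toy server hands out: just HTML text.
PAGE = b"<html><body><h1>Hello!</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """A tiny server program: answers every GET request with PAGE."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        # Keep the console quiet instead of logging each request.
        pass


def fetch_page():
    # Port 0 lets the operating system pick any free port on this machine.
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        # Play the browser's role: connect, request the root page, read the reply.
        conn = http.client.HTTPConnection(host, port)
        conn.request("GET", "/")
        response = conn.getresponse()
        text = response.read().decode("utf-8")
        conn.close()
        return text
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_page())
```

Without a browser to render it, all the client receives is the raw HTML text.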
When we make a request using Python, we do not have a browser, so all we get back is the text that makes up the page: the HTML code, plus any CSS written inside it. Let's make our first request!
import requests
__author__ = "Your Name"
requests.get("http://google.com")
We've made a program that will ask one of the servers hosting http://google.com for the contents of the page!
However, we are not storing the response anywhere, so there's not much we can do with it. Let's store the response in a variable, and then print the content of the page:
import requests
__author__ = "Your Name"
r = requests.get("http://google.com")
print(r.content)
If you run this program, you'll see an extremely long line printed to your console. That's the Google page!
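The response stored in r carries more than the page content. The sketch below assumes requests is installed; so that it runs without internet access, it points requests.get at a throwaway local server (built with the standard library, with the hypothetical names PageHandler and inspect_response) instead of Google. It prints the status code, the Content-Type header, and the types of r.content (raw bytes) and r.text (the same page decoded into a string).

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

PAGE = b"<html><body><p>Hello from a local server</p></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Stand-in for Google: returns the same small HTML page for every GET."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def inspect_response():
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        r = requests.get("http://127.0.0.1:%d/" % server.server_address[1])
        print(r.status_code)                  # 200 means "OK"
        print(r.headers["Content-Type"])      # what kind of text came back
        print(type(r.content), type(r.text))  # raw bytes vs decoded string
        return r
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    inspect_response()
```

Against a real site, requests.get("http://google.com") returns the same kind of object. For readable output, print(r.text) is usually nicer than print(r.content), which prints a b'...' bytes literal.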
A GET request
What we have done is call requests.get("<page>"). This replicates what a browser does when asking Google for a page, and it's called a GET request.
A GET request just retrieves something from a server. In this case, the page content.
HTTP supports several other types of request: POST, PUT, DELETE, and a few more. Some are self-explanatory, whereas others are not:
| HTTP "verb" | Meaning |
|---|---|
| GET | Retrieve something from the server |
| POST | Create a new element on the server, using the data provided in the request |
| PUT | Update an existing element on the server, using the data provided in the request |
| DELETE | Remove an element from the server |
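requests has a matching function for each verb (requests.get, requests.post, requests.put, requests.delete), but those send the request immediately. To look at what each verb produces without touching the network, this sketch builds requests with requests.Request(...).prepare() and prints them instead of sending them. The URL http://example.com/items and the form fields are placeholders, and build is a hypothetical helper.

```python
import requests

# Placeholder endpoint for illustration only; nothing is sent over the network.
URL = "http://example.com/items"


def build(method, data=None):
    """Prepare (but do not send) a request using the given HTTP verb."""
    return requests.Request(method, URL, data=data).prepare()


get_req = build("GET")                                       # retrieve
post_req = build("POST", data={"name": "notebook"})          # create
put_req = build("PUT", data={"name": "notebook", "price": "3"})  # update
delete_req = build("DELETE")                                 # remove

for req in (get_req, post_req, put_req, delete_req):
    # Each prepared request carries its verb, its target URL and an optional body.
    print(req.method, req.url, req.body)
```

GET and DELETE carry no body here, while POST and PUT send the provided data form-encoded in the body.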
The URL
The GET request retrieves something from the server. But when we access http://google.com/ we are not retrieving anything specific. We aren't telling the server what we want. Or are we?
It turns out we are, and the key is the last character of the URL: /.
The forward slash character by itself means "the root". In web applications, the root tends to be the home page. So when we access pages like these, we are accessing the root:
http://google.com/
http://schoolofcode.me/
http://facebook.com/
If nothing follows the domain name, the forward slash is assumed, so you will often not see it (for example, http://google.com is the same as http://google.com/).
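Python's standard library can split a URL into its parts, which makes the assumed slash visible. This sketch uses urllib.parse.urlsplit; requested_path is a hypothetical helper, and its fallback to "/" mirrors the HTTP rule that a client sends "/" when the URL's path is empty.

```python
from urllib.parse import urlsplit


def requested_path(url):
    """Return the path a client asks the server for, defaulting to the root "/"."""
    path = urlsplit(url).path
    # With nothing after the domain name the path is empty, so the root is assumed.
    return path or "/"


for url in ("http://google.com", "http://google.com/", "http://schoolofcode.me/courses"):
    print(url, "->", requested_path(url))
```

The first two URLs both ask for "/", while the third asks for "/courses".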
Accessing other parts of the page
If we wanted to access School of Code's courses, we could go into the courses "folder":
http://schoolofcode.me/courses
I read this as a folder of courses because it makes sense to then access a specific course, living inside that folder:
http://schoolofcode.me/courses/complete-python-web
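The "folder" reading can be checked with the same standard-library tools. This sketch splits the course URL's path into segments and uses urllib.parse.urljoin to resolve a course name relative to the courses "folder". Keep in mind that the folder view is a way of reading URLs; a server is free to map paths to whatever it likes.

```python
from urllib.parse import urljoin, urlsplit

course_url = "http://schoolofcode.me/courses/complete-python-web"

# Split the path into its "folder" segments, ignoring the leading slash.
segments = urlsplit(course_url).path.strip("/").split("/")
print(segments)  # ['courses', 'complete-python-web']

# With a trailing slash, "courses/" acts like a folder and the name goes inside it.
inside_folder = urljoin("http://schoolofcode.me/courses/", "complete-python-web")
print(inside_folder)  # http://schoolofcode.me/courses/complete-python-web

# Without the trailing slash, "courses" is treated like a file and gets replaced.
replaced = urljoin("http://schoolofcode.me/courses", "complete-python-web")
print(replaced)  # http://schoolofcode.me/complete-python-web
```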
What does HTTP look like?
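When the browser makes the request to view the courses page, what it actually sends is text like this (simplified): the verb, the path and the HTTP version on the first line, with the domain name in a Host header:
GET /courses HTTP/1.1
Host: schoolofcode.me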
That is sent to the server, which knows that the browser is expecting something back: the content of the page.
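To see that an HTTP request really is just text, this standard-library sketch starts a toy local server and then writes a GET request by hand over a TCP socket: the request line, a Host header, a Connection: close header so the server hangs up once it has answered, and a blank line, each ending in \r\n as HTTP requires. CoursesHandler, its /courses page and send_raw_get are hypothetical stand-ins for the real School of Code server, and the Host header names the local server rather than schoolofcode.me.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class CoursesHandler(BaseHTTPRequestHandler):
    """Toy server: answers GET /courses with a short page, anything else with 404."""

    def do_GET(self):
        if self.path == "/courses":
            body = b"<h1>Courses</h1>"
            self.send_response(200)
        else:
            body = b"<h1>Not found</h1>"
            self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def send_raw_get(path):
    server = HTTPServer(("127.0.0.1", 0), CoursesHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    # The full text of the request: request line, headers, then an empty line.
    request_text = (
        "GET %s HTTP/1.1\r\n"
        "Host: %s:%d\r\n"
        "Connection: close\r\n"
        "\r\n" % (path, host, port)
    )
    try:
        with socket.create_connection((host, port)) as sock:
            sock.sendall(request_text.encode("ascii"))
            chunks = []
            # Read until the server closes the connection.
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        return b"".join(chunks).decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(send_raw_get("/courses"))
```

The reply is text as well: a status line, response headers, a blank line, and then the page itself.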
Is it really this simple?
Well... no. Not really. There's a lot more going on. HTTP isn't magical: traffic still has to travel from one computer to another, and that takes a lot of other pieces. Feel free to look these up: Physical Network Layer, Ethernet, TCP, IP, DNS. They will help you understand a bit more of what is happening behind the scenes, although they are not required for this course.