
Building a basic HTTP Server from scratch in Python

Category: Development (Python) | Posted by: 店小二05 | Date: 2017 | Author: 红领巾

At its core, the modern web is just text going back and forth between clients and servers. As developers, we use web frameworks to help us build the strings we send to clients. Web frameworks shield us from this underlying "textual reality" by parsing the incoming HTTP request (which is just a string), calling the corresponding function, and building a string response (mostly by using templates). Clients parse those strings and do whatever they want with them.

This blog post shows how to build a bare-bones HTTP server from scratch. It is based on an exercise I gave to my MSc students. The only prerequisite is a basic understanding of Python 3. If you want to implement this as we go along, you can grab the starting application from this link. The final source code can be found in this gist.

HTTP is just text

HTTP is the protocol that browsers use to retrieve information from servers and push information to them. In essence, HTTP is just text that follows a certain pattern: the first line specifies which resource you want, the headers follow, and a blank line separates the headers from the body of the message (if any). Here's how you would retrieve the about page from a website:

GET /about.html HTTP/1.0
User-Agent: Mozilla/5.0

And here's how you could send some form data to a web server using the POST method:

POST /form.php HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 21

name=John&surname=Doe
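The pattern described here (start line, headers, blank line, optional body) can be sketched in a few lines of Python. The helper name `split_http_message` is hypothetical and not part of the original exercise. As a simplification, it accepts both `\r\n` and bare `\n` line endings and treats the first blank line as the end of the headers.

```python
def split_http_message(raw):
    """Split a raw HTTP message into (start_line, headers, body)."""
    # Normalise line endings so both CRLF and bare LF messages work
    # (a simplification: this also rewrites line endings inside the body)
    text = raw.replace('\r\n', '\n')
    # The first blank line separates the head from the body (if any)
    head, _, body = text.partition('\n\n')
    lines = head.split('\n')
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        if not line.strip():
            continue
        # Each header line has the form "Name: value"
        name, _, value = line.partition(':')
        headers[name.strip()] = value.strip()
    return start_line, headers, body


message = (
    'POST /form.php HTTP/1.0\r\n'
    'Content-Type: application/x-www-form-urlencoded\r\n'
    'Content-Length: 21\r\n'
    '\r\n'
    'name=John&surname=Doe'
)
start_line, headers, body = split_http_message(message)
print(start_line)                 # POST /form.php HTTP/1.0
print(headers['Content-Length'])  # 21
print(body)                       # name=John&surname=Doe
```

Note that the Content-Length value matches the length of the body, which is how the receiver knows where the message ends.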

You could simply copy-paste the text above and use something that allows you to send text over a network. Let's use telnet for that:

$> telnet google.com 80
Trying 84.91.171.170...
Connected to google.com.
Escape character is '^]'.
(1)
GET /about/ HTTP/1.0

(2)
HTTP/1.0 200 OK
Vary: Accept-Encoding
Content-Type: text/html
Date: Thu, 09 Feb 2017 16:41:37 GMT
Expires: Thu, 09 Feb 2017 16:41:37 GMT
Cache-Control: private, max-age=0
Last-Modified: Thu, 08 Dec 2016 01:00:57 GMT
X-Content-Type-Options: nosniff
Server: sffe
X-XSS-Protection: 1; mode=block
Accept-Ranges: none

<!DOCTYPE html>
<html class="google mmfb" lang="en">
<head>
...
</html>
Connection closed by foreign host.

And here's how to send form data to http://httpbin.org/:

$> telnet httpbin.org 80
Trying 54.175.219.8...
Connected to httpbin.org.
Escape character is '^]'.
(1)
POST /post HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 21

name=John&surname=Doe
(2)
HTTP/1.1 200 OK
Server: nginx
Date: Thu, 09 Feb 2017 16:38:26 GMT
Content-Type: application/json
Content-Length: 328
Connection: close
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

{
  "form": {
    "name": "John",
    "surname": "Doe"
  },
  "headers": {
    "Content-Length": "21",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org"
  },
  "url": "http://httpbin.org/post"
}
Connection closed by foreign host.

You can see the HTTP requests in (1), followed by the HTTP server responses in (2). Requests and responses follow the same pattern, and it's text everywhere! For more information about HTTP, see the excellent High Performance Browser Networking book.
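The telnet sessions can also be reproduced from Python. The sketch below is only an illustration: `send_raw_request` is a hypothetical helper, and it assumes the server closes the connection once the response is complete, as the HTTP/1.0 exchanges above do.

```python
import socket


def send_raw_request(host, port, request_text, timeout=5.0):
    """Send request_text to host:port and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request_text.encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty bytes: the server closed the connection
                break
            chunks.append(data)
    return b''.join(chunks).decode(errors='replace')


# The same POST request that was typed into telnet; note the blank line
# between the headers and the body
POST_REQUEST = (
    'POST /post HTTP/1.0\r\n'
    'Content-Type: application/x-www-form-urlencoded\r\n'
    'Content-Length: 21\r\n'
    '\r\n'
    'name=John&surname=Doe'
)
```

Calling `print(send_raw_request('httpbin.org', 80, POST_REQUEST))` should then behave much like the second telnet session, provided the host is reachable from your machine.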

Sending HTTP responses using sockets

If you are planning to implement network applications from scratch, you'll probably need to work with network sockets. A socket is an abstraction provided by your operating system that allows you to send and receive bytes through a network. Here's our base implementation of an HTTP server:

"""
Implements a simple HTTP/1.0 Server
"""
import socket

# Define socket host and port
SERVER_HOST = '0.0.0.0'
SERVER_PORT = 8000

# Create socket
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server_socket.bind((SERVER_HOST, SERVER_PORT))
server_socket.listen(1)
print('Listening on port %s ...' % SERVER_PORT)

while True:
    # Wait for client connections
    client_connection, client_address = server_socket.accept()

    # Get the client request
    request = client_connection.recv(1024).decode()
    print(request)

    # Send HTTP response
    response = 'HTTP/1.0 200 OK\n\nHello World'
    client_connection.sendall(response.encode())
    client_connection.close()

# Close socket
server_socket.close()

We start by defining the socket host and port. Then we create server_socket with the AF_INET address family (IPv4) and the SOCK_STREAM type (TCP, basically). The SO_REUSEADDR option lets us restart the server on the same port without waiting for old connections to time out, bind attaches the socket to the given (host, port) pair, and listen makes it start accepting connections. Check the Python docs on sockets for more info.

The loop itself is self-explanatory: wait for a client connection, read the request string, send an HTTP-formatted string with Hello World in the response body, and close the client connection. We do this forever (or until someone presses Ctrl+C). Open your browser at http://localhost:8000/ and you should see your server's response:


[Screenshot: the browser showing the server's Hello World response]

As an exercise, change the Hello World to <h1>Hello World</h1> and see what happens. And did you see the print(request) in the server's source code? Here's what it outputs:

Listening on port 8000 ...
GET / HTTP/1.1
Host: localhost:8000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:51.0) Gecko/20100101 Firefox/51.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: pt-PT,pt;q=0.8,en;q=0.5,en-US;q=0.3
Accept-Encoding: gzip, deflate
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Cache-Control: max-age=0
Process finished with exit code 0

Yes, it's your browser requesting the root page ("/") of your server.
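The server above separates the status line from the body with bare `\n` characters, which browsers tolerate, while the HTTP specification uses `\r\n` line endings. A slightly more careful way to build the response is sketched below. The helper `build_response` and its default content type are assumptions for illustration, not part of the original exercise.

```python
def build_response(status_code, reason, body, content_type='text/html; charset=utf-8'):
    """Return a complete HTTP/1.0 response as bytes."""
    body_bytes = body.encode('utf-8')
    head = (
        f'HTTP/1.0 {status_code} {reason}\r\n'
        f'Content-Type: {content_type}\r\n'
        # Content-Length counts bytes, not characters
        f'Content-Length: {len(body_bytes)}\r\n'
        '\r\n'
    )
    return head.encode('ascii') + body_bytes


response = build_response(200, 'OK', '<h1>Hello World</h1>')
print(response.decode())
```

Inside the server loop, this would replace the hand-written string: `client_connection.sendall(build_response(200, 'OK', 'Hello World'))`.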

Index.html

By default, when a browser requests the root of a server (using an HTTP request such as GET / HTTP/1.0), we should return the index.html page. Let's change the code inside the while loop to always return the contents of htdocs/index.html:

while True:
    # Wait for client connections
    (...)

    # Get the client request
    (...)

    # Get the content of htdocs/index.html
    fin = open('htdocs/index.html')
    content = fin.read()
    fin.close()

    # Send HTTP response
    response = 'HTTP/1.0 200 OK\n\n' + content
    client_connection.sendall(response.encode())
    (...)

Basically, we read the contents of the file and add it to the response string as the message body, instead of the previous Hello World. The index.html file is just a text file (inside the htdocs directory) with HTML content:

<html>
<head>
<title>Hello World</title>
</head>
<body>
<h1>Hello World!</h1>
<p>Welcome to the index.html web page..</p>
<p>Here's a link to <a href="ipsum.html">Ipsum</a></p>
</body>
</html>

Here's how it should look in the browser:


[Screenshot: the index.html page rendered in the browser]

You can click on the link as many times as you want, but your server will always return the contents of index.html. It is programmed to behave that way!

Return other pages

So far our server returns only the index.html page, but we should allow it to return other pages. Technically, this means that we must parse the first line of the HTTP request (which is something like GET /ipsum.html HTTP/1.0), open the requested file and return its contents. Here are the changes:

while True:
    # Wait for client connections
    (...)

    # Get the client request
    (...)

    # Parse HTTP headers
    headers = request.split('\n')
    filename = headers[0].split()[1]

    # Get the content of the file
    if filename == '/':
        filename = '/index.html'

    fin = open('htdocs' + filename)
    content = fin.read()
    fin.close()

    # Send HTTP response
    response = 'HTTP/1.0 200 OK\n\n' + content
    client_connection.sendall(response.encode())
    (...)

We are basically extracting the filename from the request string, opening the file (always assuming it is inside the htdocs folder) and returning its content. Note that we still return index.html when the client asks for the root resource ('/').
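The assumption that every requested file lives inside htdocs deserves a check, because a request target such as /../httpserver.py would otherwise point outside that folder. Here is a minimal sketch of one way to guard against that. The helper `resolve_path` is hypothetical and not part of the original exercise.

```python
import os


def resolve_path(docroot, request_target):
    """Map a request target such as '/ipsum.html' to a file path inside docroot.

    Returns None when the target would escape docroot (e.g. '/../secret.txt').
    """
    # Drop any query string and map the root resource to index.html
    path = request_target.split('?', 1)[0]
    if path == '/':
        path = '/index.html'
    root = os.path.realpath(docroot)
    # Strip the leading '/' so that join() keeps the path relative to root
    candidate = os.path.realpath(os.path.join(root, path.lstrip('/')))
    # commonpath() equals root only if candidate still lives under root
    if os.path.commonpath([root, candidate]) != root:
        return None
    return candidate


print(resolve_path('htdocs', '/ipsum.html'))
print(resolve_path('htdocs', '/../httpserver.py'))  # None
```

In the server loop, a `None` result could be answered with an error response instead of opening the file.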

Here's the content of htdocs/ipsum.html:

<html>
<head>
<title>Ipsum</title>
</head>
<body>
<h1>Ipsum!</h1>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Pellentesque tincidunt libero diam, nec imperdiet libero
sodales quis. Nulla in pulvinar sem. Vivamus placerat
ullamcorper sagittis. Proin varius, erat sed egestas semper,
enim lectus viverra diam, id placerat est augue et turpis.
</p>
</body>
</html>

Try it in your own code, and see if you can open the index.html and ipsum.html files.

404 - Not found

This is what happens if we try to request a file that does not exist, such as http://localhost:8000/hello.html:

GET /hello.html HTTP/1.1
Traceback (most recent call last):
  File "httpserver.py", line 36, in <module>
    fin = open('htdocs' + filename)
FileNotFoundError: [Errno 2] No such file or directory: 'htdocs/hello.html'

We just need to catch the exception and return a 404 response:

while True:
    # Wait for client connections
    (...)

    # Get the client request
    (...)

    # Parse HTTP headers
    headers = request.split('\n')
    filename = headers[0].split()[1]

    # Get the content of the file
    if filename == '/':
        filename = '/index.html'

    try:
        fin = open('htdocs' + filename)
        content = fin.read()
        fin.close()

        response = 'HTTP/1.0 200 OK\n\n' + content
    except FileNotFoundError:
        response = 'HTTP/1.0 404 NOT FOUND\n\nFile Not Found'

    # Send HTTP response
    client_connection.sendall(response.encode())
    client_connection.close()

You can change the body of the HTTP 404 response to show a personalized error message.
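One way to personalize the error message is to keep it in a file next to the other pages. The sketch below assumes a hypothetical convention in which an optional 404.html inside the document root serves as the error page. Both that file name and the helper `build_file_response` are illustrative assumptions, not part of the original exercise.

```python
import os


def build_file_response(docroot, filename):
    """Return the response string for filename, or a 404 response if it is missing."""
    try:
        with open(docroot + filename) as fin:
            return 'HTTP/1.0 200 OK\n\n' + fin.read()
    except FileNotFoundError:
        # Hypothetical convention: use docroot/404.html as the error page if present
        error_page = os.path.join(docroot, '404.html')
        if os.path.isfile(error_page):
            with open(error_page) as fin:
                body = fin.read()
        else:
            body = 'File Not Found'
        return 'HTTP/1.0 404 NOT FOUND\n\n' + body
```

In the loop, the try/except block would then collapse to `response = build_file_response('htdocs', filename)`.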



The entire source code for this example can be found in this gist.
