3 minute read

a.k.a. HTTP, a protocol which allows us to make requests for data and resources, such as HTML documents.

It is a client-server structure protocol, this means that a request for data is initiated by the element that will receive it (the client), usually a Web browser.

A complete web page results from the union of different sub-documents , such as: a document that specifies the layout style of the web page (CSS), the text, images, videos, scripts, etc.

How does HTTP work?

The easiest way to explain how HTTP works is by describing how a web page is shown to an user:

  1. An user types example.com in the browser's address bar.
  2. The browser sends that request, the HTTP request, to the web server managing the example.com domain. Typically the client request says something like "Send me this file", but it can also be "Do you have this file?"
  3. The web server receives the HTTP request, finds the file (in our example, the home page of example.com, the index.html file), and sends a header first. This header lets the client know, through a status code, the search result.
  4. If the file has been found and the client has requested to receive it (and not only to know if it exists) the server sends, after the header, the message body (the requested content: in our example, the file index.html).
  5. The browser receives the file and opens it as a web page.

HTTP message structure

HTTP messages are in plain text which makes it more readable and easy to debug. This has the disadvantage of making them longer.

The messages have the following structure:

  1. Start line (ends with carriage return and a line break) with:
    • For requests: the action required by the server (request method) followed by the URL of the resource and the HTTP version that the client supports.
    • For responses: The version of the HTTP used followed by the response code (which indicates what happened with the request followed by the URL of the resource) and the phrase associated with said return.
  2. Message headers that end with a blank line. They are metadata. These headers give the protocol great flexibility
  3. Message body. It's optional. Its presence depends on the previous line of the message and the type of resource that the URL refers to. It typically has the data that is exchanged between client and server. For example, a request could contain certain data that you want to send to the server for processing. For a response it could include the data that the client has requested.

Architecture of HTTP-based systems

Requests are sent by an entity: the user agent (or a proxy at request).

Most of the time the user agent (client) is a Web browser, but it could be any other program, such as an automated script that explores the Web to acquire data.

Each request is sent to a server which manages and answers it.

Between each request and response there are several intermediaries, usually called proxies, which perform different functions such as gateways or caches.

http flow

Client or user-agent

The user agent is any tool that acts on behalf of the user. This function is performed in most cases by a web browser. There are exceptions

The client is always the one that initiates a communication (request).

Server

On the other end of the communication channel you can find the server, which “serves” the data that the client has requested.

Conceptually, a server is a single entity, although it can be made up of several elements which share the load of requests. It can also be programs which manage other devices (such as cache, databases, email servers, …) or programs that generate part or all of the document that has been requested.

Proxy

Between the client and the server there are different devices that handle HTTP messages.

Given the layered architecture of the Web most of these devices only manage these messages at the lower protocol levels (transport layer, network layer or physical layer), being transparent to the HTTP application communications layer.

Those devices that do operate by processing the application layer are known as proxies. These can be transparent, or not (modifying the requests that pass through them), and perform several functions:

  • caching (the cache can be public or private, like a browser's cache)
  • filtering (like an anti-virus, parental control, etc.)
  • request load balancing (allowing multiple servers to respond to the total load of requests they receive)
  • authentication (to control access to resources and data)
  • event log (to have a history of the events that occur)