I’ve come across a basic question in several of my interviews for a new job:

Describe the steps involved in requesting a web page.

Here’s my attempt at capturing those steps, without delving too deep into any particular one.

Bear in mind this deals with HTTP/1.1, and we’re not too far from HTTP/2.0 (homepage) - though the process shouldn’t be too different.

Timeline, taken from @igrigorik at https://docs.google.com/presentation/d/1MtDBNTH1g7CZzhwlJ1raEJagA8qM3uoV7ta6i66bO2M/present#slide=id.gc03305a_0106

User Interaction

User types URL / clicks bookmark, which is parsed by browser to determine the domain.

DNS Lookup

The first step is figuring out the IP address, moving sequentially toward more expensive lookups as each previous one fails. This request is generally smaller than a single packet.

  1. Browser Cache - Browsers generally cache DNS records for some time, so a request doesn’t actually need to be sent
  2. OS Cache - browser makes a system call to the OS, which has its own cache
  3. Router Cache - router often has its own DNS cache
  4. ISP DNS Cache - request then queries the cache at the ISP, the final cache checked
  5. Recursive Search - ISP’s DNS server begins recursive search
    • Root nameserver, to determine authoritative nameserver for .COM domain
    • .COM top-level nameserver, to determine who is responsible for the domain
    • Domain nameserver to obtain address

"DNS in the real world". Licensed under Public domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:DNS_in_the_real_world.svg#mediaviewer/File:DNS_in_the_real_world.svg

Generally, the DNS server will have the .com nameservers in cache, especially for common domains.

This search ends once the query reaches the authoritative nameserver, or returns an error. If a packet is lost, the request fails or is retried.

These are forward lookups, which resolve the IP based on the domain name. There may be multiple IP addresses associated with a domain (load balancing, geo IP, etc.), but it is far more common that multiple domains are hosted at a single IP (e.g. cloud hosting).
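The forward lookup above can be sketched with Python’s `socket.getaddrinfo`, which delegates to the OS resolver and so benefits from the cache hierarchy described above. `localhost` is used here only so the example runs offline; any domain works:

```python
import socket

# Resolve a domain name to its IP addresses. getaddrinfo() asks the
# OS resolver, which walks the cache hierarchy described above before
# falling back to a full recursive lookup.
def resolve(domain):
    infos = socket.getaddrinfo(domain, 80, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address string.
    return sorted({info[4][0] for info in infos})

print(resolve("localhost"))
```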

Many DNS services use Anycast to achieve high availability and low latency for DNS lookups. Anycast is a routing technique where a single IP maps to multiple physical servers.

Browser makes connection

HTTP is built on top of TCP. Before any data can flow, the browser (client) and server must complete the three-way handshake; this is followed by data transmission and, eventually, connection termination.

The TCP three-way handshake allows client and server to agree on packet sequence numbers (chosen randomly for security reasons), and a number of other variables. It involves the following steps:

  • SYN - Client picks a random sequence number x and sends a SYN packet, which may also include additional TCP flags and options.
  • SYN ACK - Server acknowledges x + 1, picks its own random sequence number y, appends its own set of flags and options, and dispatches the response.
  • ACK - Client acknowledges y + 1 and completes the handshake by dispatching the final ACK packet.
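The handshake itself is performed by the kernel, not application code: a single `connect()` call triggers all three steps. A minimal, self-contained sketch (the server is a local listening socket just so the example runs without a network):

```python
import socket

# A listening socket lets the kernel complete the server side of the
# handshake for us.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
server.listen(1)
host, port = server.getsockname()

# connect() performs the full SYN / SYN-ACK / ACK exchange in the
# kernel before returning: once it succeeds, both sides have agreed
# on initial sequence numbers and data can flow in either direction.
client = socket.create_connection((host, port), timeout=5)
conn, addr = server.accept()
print("handshake complete with", addr)

client.close()
conn.close()
server.close()
```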

This process is more complicated when HTTPS is involved.

I recommend reading about it in full rather than watering it down here. Here is a picture if you are lazy, taken from the above link:

TLS handshake

Browser Sends Request

Assuming the page is not served from the browser cache (determined by expiry date header, often set to the past), a GET request is issued to the server, which looks like this:

GET / HTTP/1.1
Host: www.mbates.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.44 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: _ga= [...]

Several headers are sent with this request, including:

  • User-Agent - browser identifies itself
  • Accept / Accept-Encoding - types of responses browser will accept
  • Connection - request to keep connection open
  • Cookie - cookies (explained below) for the requested domain

Cookies are key-value pairs which track the state of a web site in between page requests. These may include user information or settings, authentication tokens, etc. They are saved as text files on the client, and are sent to the server on every request.

In some cases, e.g. in sending a form, additional content will be sent as part of the payload.
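On the wire, the request above is just text: a request line, one header per line, and a terminating blank line. A minimal sketch of assembling those bytes (host and headers chosen for illustration):

```python
# Assemble the raw bytes of a GET request as they appear on the wire.
def build_get(host, path="/", headers=None):
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # The header section is terminated by a blank line (CRLF CRLF).
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

raw = build_get("www.mbates.com", headers={"Connection": "keep-alive"})
print(raw.decode())
```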

Server Response

The server may process the request and return a response, or take one of a few other courses of action, described below. In this scenario, let’s imagine it sends a 301 Redirect.

A server may send a redirect rather than responding directly, for any number of reasons. These often revolve around wanting to serve content from a single canonical URL, for purposes of SEO and cache-friendliness.

Browser Handles non-2xx responses

The server may respond with an error, authentication required, etc. (4xx and 5xx responses), or with a 3xx. Commonly, it will return a 301 Redirect. This often occurs when you request a domain with the www hostname (e.g. http://mbates.com vs. http://www.mbates.com).

The browser follows the redirect and issues a new GET, with essentially the same headers.

In HTTP/1.0 and prior, TCP connections would close after each request and response. However, opening these connections takes time, memory, and computation. Persistent connections are the default in HTTP/1.1, and you may see the relevant header Keep-Alive.

Alternatively, a header Connection: close will close the connection, e.g. if the client should not send any more requests through that connection.
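The browser’s redirect-following loop can be sketched abstractly. Here `fetch` is a hypothetical stand-in for an actual HTTP request, stubbed out so the example is self-contained:

```python
# Follow 3xx responses until a final status is reached (or a cap is hit).
def follow_redirects(fetch, url, max_hops=5):
    for _ in range(max_hops):
        status, headers = fetch(url)
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            url = headers["Location"]  # reissue the GET at the new URL
            continue
        return status, url
    raise RuntimeError("too many redirects")

# Stub responder: the apex domain 301s to www, which returns 200.
routes = {
    "http://mbates.com": (301, {"Location": "http://www.mbates.com"}),
    "http://www.mbates.com": (200, {}),
}
status, final = follow_redirects(lambda u: routes[u], "http://mbates.com")
print(status, final)  # 200 http://www.mbates.com
```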

Server Handles Request

There is a wide range in what may happen on the server. There is some processing that must occur regardless of the complexity of the page, but dynamic pages require further work.

The server may host multiple domains at a single IP, and delegate the request to the appropriate domain’s handler. This is why the Host header is required as of HTTP/1.1.

The server software (e.g. Apache, IIS) then receives the request, decodes it, and executes a request handler. This handler (PHP, Ruby, etc.) reads the request, its parameters, and its cookies, potentially updates data on the server, and generates the HTML response.

There is a lot that could happen here. This is arguably the primary domain of backend engineering.

Server sends response

Here is an example response for http://www.mbates.com:

HTTP/1.1 200 OK
Server: GitHub.com
Content-Type: text/html; charset=utf-8
Last-Modified: Tue, 09 Sep 2014 02:04:48 GMT
Expires: Wed, 10 Sep 2014 03:12:00 GMT
Cache-Control: max-age=600
Content-Encoding: gzip
Content-Length: 4243
Accept-Ranges: bytes
Date: Wed, 10 Sep 2014 03:02:10 GMT
Via: 1.1 varnish
Age: 10
Connection: keep-alive
X-Served-By: cache-sjc3121-SJC
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1410318130.250401,VS0,VE0
Vary: Accept-Encoding

...[html response]...

This response includes its own headers, covering caching, privacy, etc., including:

  • Content-Type (text/html) tells the browser to render the page as HTML
  • Content-Encoding (gzip - frequently used) tells the browser the page is gzipped

The server may send a Transfer-Encoding: chunked header, which signifies that it will stream the page in length-prefixed chunks rather than declaring a Content-Length up front.
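The Content-Encoding handling can be sketched with Python’s `gzip` module: the server compresses the body, and the browser decompresses it before handing the bytes to the HTML parser.

```python
import gzip

html = b"<html><body>Hello</body></html>"
compressed = gzip.compress(html)   # what the server puts on the wire

# The browser inspects Content-Encoding before handing the body to
# the HTML parser.
headers = {"Content-Type": "text/html", "Content-Encoding": "gzip"}
if headers.get("Content-Encoding") == "gzip":
    body = gzip.decompress(compressed)
else:
    body = compressed

print(body == html)  # the parser sees the original markup
```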

Browser handles response

The browser handles the file based on the Content-Type header. For example, if the file is an image or data, it’s displayed or downloaded.

Assuming it is HTML, the browser begins parsing the HTML.

Additional Requests

Embedded Objects

Resources referenced by the HTML document, like CSS, scripts, and images, are subsequently retrieved through a similar mechanism. When referenced in the head, these requests may block page rendering. Scripts may be blocking (e.g., preventing the download / parsing of CSS following the script tag).
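As the browser parses the HTML, it queues a request for each such reference it encounters. A minimal sketch of that discovery step using Python’s `html.parser`:

```python
from html.parser import HTMLParser

# Collect the URLs of embedded resources the browser would fetch next.
class ResourceCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and "src" in attrs:
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet":
            self.resources.append(attrs.get("href"))

p = ResourceCollector()
p.feed('<link rel="stylesheet" href="/main.css">'
       '<script src="/app.js"></script><img src="/logo.png">')
print(p.resources)  # ['/main.css', '/app.js', '/logo.png']
```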

Notably, these files are often static and can generally be cached by the browser. They often make up the bulk of the site, and are frequently served by a CDN.

Additionally, these static resources may include an Etag (“entity tag”) header, which behaves like a version number and allows for web cache validation.
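Cache validation with ETags can be sketched as follows: the server derives a tag from the resource’s content, the browser echoes it back in an If-None-Match header, and the server answers 304 Not Modified when the content is unchanged. The hashing scheme here is illustrative; servers are free to generate ETags however they like.

```python
import hashlib

# Server side: derive an ETag from the resource's content.
resource = b"body { color: black }"
server_etag = '"%s"' % hashlib.sha256(resource).hexdigest()

# Client side: revalidate a cached copy by echoing the tag back in
# an If-None-Match header; the server compares and answers 304 or 200.
def revalidate(if_none_match):
    if if_none_match == server_etag:
        return 304, b""       # Not Modified: browser reuses its cache
    return 200, resource      # changed (or no tag): full body again

print(revalidate(server_etag)[0])  # 304
```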

AJAX requests

Some content is retrieved asynchronously and does not block page rendering. These requests are generally initiated via JavaScript.

Content may be downloaded asynchronously to decrease the size of the initial payload.

Browser Rendering

Rendering begins before the whole document is downloaded, but the interesting work happens once the CSS arrives. This subject is key to the role of the front-end engineer in building a performant, responsive site.

There are four major flows involved in rendering a page. A good resource is What Every Frontend Developer Should Know About Webpage Rendering.

  1. Recalculate Style
    • Selector matching
    • Matching CSS classes to elements
    • Minimal concerns here, but avoid triggering wide cascades (e.g. programmatically adding a class to the body, which affects all of its children)
  2. Layout
    • Generate the render tree - things we need to paint (subset of DOM, with pseudo elements)
    • Determine geometry, element layout
    • Avoid layout thrashing
  3. Paint (expensive!)
    • Applying pixels to laid out elements
    • Avoid triggering this stage when possible; changes to e.g. colors force a repaint
  4. Composite
    • When elements are separated into distinct “compositor layers”, smush them back together
    • Elements can be promoted using will-change or transform layer hacks, described below
Avoiding Performance Bottlenecks

Only change properties which do not trigger layout or paint (relevant article, and another), and put the element in its own compositor layer:

Property   CSS Declaration
position   transform: translate(*x*px, *y*px)
scale      transform: scale(*n*)
rotation   transform: rotate(*n*deg)
skew       transform: skew(*x*deg, *y*deg)
matrix     transform: matrix(*a*, *b*, *c*, *d*, *tx*, *ty*)
opacity    opacity: *0*..*1*

Recently, Chrome and Firefox added support for will-change, which allows the browser to make optimizations, e.g. moving elements to their own compositor layer. Sara Soueidan has a good writeup. Prior to this (and still), you’d see hacks involving -webkit-backface-visibility: hidden or -webkit-transform: translateZ(0).

Lots of steps! And this is only one (potentially simple) scenario.

There were several aspects of an HTTPS request which were glossed over.

We didn’t consider what happens with a POST (two requests) rather than a GET (one request).

Additional Resources

What really happens when you navigate to a URL