Requesting a URL
A common interview question for front-end engineers is to list all the steps involved in going from requesting a URL to rendering the page the user sees. I’ve come across a version of it in several of my interviews for a new job:
Describe the steps involved in requesting a web page.
Here’s my attempt at capturing those steps, without delving too deep into any particular one.
User types URL / clicks bookmark, which is parsed by browser to determine the domain.
The first step is resolving the domain to an IP address, moving sequentially toward more expensive lookups when the previous one fails. A DNS query is generally small enough to fit in a single packet.
- Browser Cache - Browsers generally cache DNS records for some time, so a request doesn’t actually need to be sent
- OS Cache - browser makes a system call to the OS, which has its own cache
- Router Cache - router often has its own DNS cache
- ISP DNS Cache - request then queries the cache at the ISP, the final cache checked
- Recursive Search - ISP’s DNS server begins recursive search
- Root nameserver, to determine authoritative nameserver for .COM domain
- .COM top-level nameserver, to determine who is responsible for the domain
- Domain nameserver to obtain address
Generally, the DNS server will have the .com nameservers in cache, especially for common domains.
This search ends once the query reaches an authoritative nameserver or returns an error. If a packet is lost along the way, the lookup fails or is retried.
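The caches above are invisible from application code: a program simply asks the OS resolver, which consults them before any recursive search happens. A minimal sketch in Python (the `resolve` helper is my own name for illustration):

```python
import socket

def resolve(hostname):
    """Forward DNS lookup: hostname -> sorted list of IP addresses.

    This defers to the OS resolver, which checks its own cache (and
    whatever upstream caches apply) before any recursive search.
    """
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address string is the first element of sockaddr.
    return sorted({info[4][0] for info in infos})

print(resolve("localhost"))  # loopback addresses, e.g. 127.0.0.1 and/or ::1
```

Resolving a real domain may return several addresses, reflecting the load balancing and geo-IP setups mentioned below.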
These are forward lookups, which look up the IP based on the DNS. There may be multiple IP addresses associated with a domain (load balancing, geo IP, etc.), but it is far more common that multiple domains are hosted at a single IP (e.g. cloud hosting).
Many DNS services use Anycast to achieve high availability and low latency for DNS lookups. Anycast is a routing technique in which a single IP address maps to multiple physical servers.
Browser makes connection
HTTP is built on top of TCP, which operates in three phases: the client (browser) and server first complete the three-way handshake, followed by data transmission and, eventually, connection termination.
The TCP three-way handshake allows client and server to agree on packet sequence numbers (chosen randomly for security reasons), and a number of other variables. It involves the following steps:
- SYN - Client picks a random sequence number x and sends a SYN packet, which may also include additional TCP flags and options.
- SYN ACK - Server acknowledges with x + 1, picks its own random sequence number y, appends its own set of flags and options, and dispatches the response.
- ACK - Client acknowledges with y + 1 and completes the handshake by dispatching the final ACK packet.
This process is more complicated when HTTPS is involved, since a TLS handshake follows the TCP handshake before any application data is exchanged. It’s a topic worth reading about in full rather than watering down here.
Browser Sends Request
Assuming the page is not served from the browser cache (governed by caching headers such as Expires, sometimes deliberately set to a date in the past), a GET request is issued to the server, which looks like this:
```
GET / HTTP/1.1
Host: www.mbates.com
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.44 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: _ga= [...]
```
Several headers are sent in this payload, including:
- User-Agent - browser identifies itself
- Accept / Accept-Encoding - types of responses browser will accept
- Connection - request to keep connection open
- Cookie - cookies (explained below) for the requested domain
Cookies are key-value pairs which track the state of a web site in between page requests. These may include user information or settings, authentication tokens, etc. They are saved as text files on the client, and are sent to the server on every request.
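The Cookie request header is just a single string of key-value pairs. A sketch of parsing one with Python’s stdlib (the header value is invented for illustration):

```python
from http.cookies import SimpleCookie

# A Cookie header as the server would receive it.
header = "session=abc123; theme=dark; _ga=GA1.2.123"

jar = SimpleCookie()
jar.load(header)

# Flatten the parsed morsels into a plain name -> value mapping.
cookies = {name: morsel.value for name, morsel in jar.items()}
print(cookies)
```

Server-side frameworks do essentially this on every request before handing the values to application code.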
In some cases, e.g. in sending a form, additional content will be sent as part of the payload.
The server may process the request and return a response, or take one of several other courses of action, described below. In this scenario, let’s imagine it sends a 301 Redirect.
A server may send a redirect rather than responding directly for any number of reasons, often revolving around serving content from a single canonical URL for SEO and cache-friendliness.
Browser Handles non-2xx responses
The server may respond with an error, a request for authentication, etc. (4xx and 5xx responses), or with a 3xx. Commonly, it will return a 301 Redirect. This often occurs when the requested hostname differs from the canonical one (e.g. mbates.com vs. www.mbates.com).
The browser follows the redirect and issues a new GET, with essentially the same headers.
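Detecting a redirect amounts to checking the status code and reading the Location header. A minimal sketch of that logic (the `redirect_target` helper and the sample response are my own, for illustration):

```python
def redirect_target(raw_response: bytes):
    """Given a raw HTTP response, return the Location header's value
    when the status is a redirect (3xx), else None."""
    head = raw_response.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    # Status line looks like: "HTTP/1.1 301 Moved Permanently"
    status_code = int(lines[0].split()[1])
    if not 300 <= status_code < 400:
        return None
    for line in lines[1:]:
        name, _, value = line.partition(":")
        if name.strip().lower() == "location":
            return value.strip()
    return None

raw = (b"HTTP/1.1 301 Moved Permanently\r\n"
       b"Location: http://www.example.com/\r\n"
       b"\r\n")
print(redirect_target(raw))  # http://www.example.com/
```

A real browser also caches 301s (they are permanent) and guards against redirect loops.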
In HTTP/1.0 and prior, TCP connections would close after each request and response. However, opening these connections takes time, memory, and computation. Persistent connections are the default in HTTP/1.1, and you may see the relevant Connection: keep-alive header. Alternatively, a Connection: close header will close the connection, e.g. if the client should not send any more requests through it.
Server Handles Request
There is a wide range in what may happen on the server. There is some processing that must occur regardless of the complexity of the page, but dynamic pages require further work.
The server may host multiple domains at a single IP, and delegate the request to the appropriate virtual host. This is why the Host header is required as of HTTP/1.1.
The server software (e.g. Apache, IIS) then receives the request, decodes it, and executes a request handler. This handler (PHP, Ruby, etc.) reads the request, its parameters, and its cookies, potentially updates data on the server, and generates the HTML response.
There is a lot that could happen here. This is arguably the primary domain of backend engineering.
Server sends response
Here is an example response for http://www.mbates.com:
```
HTTP/1.1 200 OK
Server: GitHub.com
Content-Type: text/html; charset=utf-8
Last-Modified: Tue, 09 Sep 2014 02:04:48 GMT
Expires: Wed, 10 Sep 2014 03:12:00 GMT
Cache-Control: max-age=600
Content-Encoding: gzip
Content-Length: 4243
Accept-Ranges: bytes
Date: Wed, 10 Sep 2014 03:02:10 GMT
Via: 1.1 varnish
Age: 10
Connection: keep-alive
X-Served-By: cache-sjc3121-SJC
X-Cache: HIT
X-Cache-Hits: 1
X-Timer: S1410318130.250401,VS0,VE0
Vary: Accept-Encoding

...[html response]...
```
This response includes its own headers, regarding caching, privacy, etc. but also including:
- Content-Type (text/html) tells the browser to render the page as HTML
- Content-Encoding (gzip, frequently used) tells the browser the page is gzipped
The server may send a Transfer-Encoding: chunked header, which signifies that it will send the page as a series of chunks rather than as a single body with a known length.
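The chunked format itself is simple: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk terminates the stream. A sketch of encoding a body this way (the `chunk_encode` helper is my own name):

```python
def chunk_encode(body: bytes, chunk_size: int = 8) -> bytes:
    """Encode a body using HTTP/1.1 chunked transfer encoding:
    hex size line, chunk bytes, CRLF, then a terminating 0 chunk."""
    out = bytearray()
    for i in range(0, len(body), chunk_size):
        chunk = body[i:i + chunk_size]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # zero-length chunk ends the stream
    return bytes(out)

print(chunk_encode(b"Hello, chunked world!"))
```

This lets a server start sending a dynamically generated page before it knows the final Content-Length.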
Browser handles response
The browser handles the file based on the Content-Type header. For example, if the file is an image or a data file, it’s displayed or downloaded.
Assuming it is HTML, the browser begins parsing the HTML.
Resources referenced from the HTML document (CSS, scripts, images) are subsequently retrieved through a similar mechanism. When placed in the head, these requests may block page rendering; scripts in particular may block the downloading and parsing of resources that follow them (e.g. CSS after a script tag).
A notable difference is that these files are often static and can generally be cached by the browser. They often make up the bulk of the site and are frequently served by a CDN.
Additionally, these static resources may include an Etag (“entity tag”) header, which behaves like a version number and allows for web cache validation.
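ETag validation works like this: the client echoes the tag back in an If-None-Match header, and the server replies 304 Not Modified (with no body) when the cached copy is still current. A minimal server-side sketch, assuming a content-hash ETag scheme (one common choice; the helper names are mine):

```python
import hashlib

def make_etag(body: bytes) -> str:
    # One common scheme: a hash of the content, quoted per the spec.
    return '"%s"' % hashlib.md5(body).hexdigest()

def respond(body, if_none_match):
    """Return (status, payload): 304 with an empty body when the
    client's cached copy is still valid, 200 with the body otherwise."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""   # Not Modified: client reuses its cache
    return 200, body

page = b"<html>hello</html>"
first = respond(page, None)                   # first visit: full response
revalidated = respond(page, make_etag(page))  # later visit: cache hit
print(first[0], revalidated[0])  # 200 304
```

The win is bandwidth: the 304 carries headers only, while the browser serves the body from its local cache.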
Content may be downloaded asynchronously to decrease the size of the initial payload.
Rendering begins before the whole document is downloaded, but the interesting rendering work happens once the CSS has arrived. This subject is key to the role of the front-end engineer in building a performant and responsive site.
There are four major flows involved in rendering a page. A good resource is What Every Frontend Developer Should Know About Webpage Rendering.
- Recalculate Style
  - Selector matching: matching CSS rules to elements
  - Minimal concern on its own, but beware of cascades (e.g. programmatically adding a class to the body, affecting all of its children)
- Layout
  - Generate the render tree: the things we need to paint (a subset of the DOM, plus pseudo-elements)
  - Determine geometry and element position
  - Avoid layout thrashing
- Paint (expensive!)
  - Applying pixels to the laid-out elements
  - Avoid triggering this stage when possible; e.g. changing colors or layout properties forces a repaint
- Composite
  - When elements are separated into distinct “compositor layers”, combine them back into a single image
  - Elements can be promoted to their own layer using will-change or transform layer hacks, described below
Avoiding Performance Bottlenecks
Recently, Chrome and Firefox added support for will-change, which allows the browser to make optimizations ahead of time, e.g. moving elements to their own compositor layer. Sara Soueidan has a good writeup. Prior to this (and still), you’d see hacks involving -webkit-backface-visibility: hidden or transform: translateZ(0).
Lots of steps! And this is only one (potentially simple) scenario.
There were several aspects of an HTTPS request which were glossed over.
We didn’t consider what happens with a POST (two requests) rather than a GET (one request).