What every web developer needs to know about HTTP Cookies!!!
With so much information about HTTP cookies (or simply cookies) scattered across the web, this article is an attempt to bring it all together into one cohesive tutorial. It should be enough for most web developers to gain an intermediate-to-advanced understanding of cookies.
It is assumed that you are familiar with the basics of HTTP and web development in general.
In essence, a cookie is a small piece of data with the following properties:
- It is sent by the web server to the user’s web browser.
- The data in a cookie is plain text, not binary data.
- It is stored by the browser on the user’s computer (persistent cookies on disk; session cookies may live only in memory).
- A website can only read its own cookies. It cannot read the cookies of another website/domain. The browser enforces this.
- Cookies are not shared among different browsers: one browser cannot read a cookie stored by another browser, even for the same domain.
- The size of a cookie is limited to about 4KB. This limit is enforced by browsers (RFC 6265 asks them to support at least 4096 bytes per cookie) rather than by the HTTP protocol itself.
- The number of cookies per domain is limited too. Browsers impose this restriction to bound disk usage. Historically the limit was about 20–25 cookies per domain; modern browsers allow more.
There are three main reasons why we need cookies:
- Authentication (session management)
- User tracking
- Personalization (theme, language selection, etc.)
The Web is built on top of HTTP, which in turn is built on TCP. Even though TCP is a stateful (connection-oriented) protocol, HTTP is a stateless protocol. In networking, it is perfectly fine to build stateless protocols on top of stateful protocols or vice versa. Stateless protocols do not maintain any information about previous communication. Because HTTP is stateless, an HTTP server (a.k.a. web server) doesn’t maintain any information about the previous request. Thus a web server cannot tell whether two requests come from the same browser/machine or from different ones.
Once people realized the power and simplicity of the Web, which was originally meant to serve documents linked via hyperlinks, it evolved into a platform. People began building complex e-commerce websites. A mechanism was needed to remember user identity and data. This is the same underlying problem of maintaining state: how to make the server understand that two HTTP requests come from the same user/browser.
For example, on an e-commerce website, a user can choose an item on the home page, add it to the cart and navigate to another page to choose another item. But the moment the user navigates to the next page, any information about that user or their selection vanishes. HTTP simply cannot retain it due to its stateless nature.
Thus the clever mechanism of cookies was invented. Whenever a user visits a website, the web server sends a cookie along with the HTML document. The browser then sends this cookie back in every subsequent request to the web server, creating a sort of session between the user and the website.
Of course, there were other solutions, such as generating a token on the first visit, injecting it into the page and passing that token back and forth on every request manually, using a hidden form field or putting the token inside the URL as part of the path or query string. Compared to cookies, these solutions look very cumbersome, manual and error-prone. Cookies are much more elegant, secure and reliable.
This is what happens when you enter your credentials on a site such as Facebook and hit the Log In button:
- The browser sends an HTTP request to the web server behind www.facebook.com. It is typically a POST request containing the password and email/phone.
- When the request arrives at the web server, server-side code verifies the username and password. If verification succeeds, the server sends a new HTML page along with a cookie containing some sort of sessionID (basically a GUID or any identifier unique to the server).
- This cookie is sent as part of the HTTP response using the Set-Cookie header.
- The browser, upon receiving the response, stores this cookie on disk for persistent storage.
- Now, if the user navigates to any other page on facebook.com or opens a new tab/window in the same browser, the browser automatically sends this cookie as part of the request.
- The Facebook server reads this cookie and determines its validity. The server typically maintains a map of all the cookies it has issued so far in some sort of in-memory data structure. If the sessionID is the key of this map, then its value is the userID or some other information that identifies the user for whom the cookie was issued.
- Once the user is identified, the web application/server serves a dynamically created web page with contents tailored for that user. This page has information specific to that user, e.g. in the case of Facebook, they will see their name, profile picture, friends list, unique activity feed, etc.
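The server-side half of the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sessions` dictionary, the `log_in` and `identify` helpers and the cookie name `sessionID` are hypothetical, and a real application would also verify credentials, expire sessions and usually keep them in a shared store rather than in process memory.

```python
import secrets

# Hypothetical in-memory session store: sessionID -> userID.
sessions = {}


def log_in(user_id):
    """Create a session for an already verified user.

    Returns the value the server would place in a Set-Cookie header.
    """
    session_id = secrets.token_hex(16)  # random, hard-to-guess identifier
    sessions[session_id] = user_id
    return f"sessionID={session_id}"


def identify(cookie_header):
    """Return the userID for an incoming Cookie header value, or None."""
    for pair in cookie_header.split(";"):
        name, _, value = pair.strip().partition("=")
        if name == "sessionID":
            return sessions.get(value)
    return None


cookie = log_in("user-42")               # sent to the browser after login
print(identify(cookie))                  # browser echoes it back: user-42
print(identify("sessionID=forged"))      # unknown session: None
```

Because the cookie carries only a random key, the user data itself never leaves the server; the browser merely proves which session it belongs to.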
To set a cookie, the server must use the Set-Cookie header. For example, this header sets a cookie named username with the value Harshal:
Set-Cookie: username=Harshal
To send multiple cookies, repeat the Set-Cookie header, once per cookie. In a typical HTTP response, these headers sit alongside the other response headers, ahead of the body.
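As a rough illustration of a response that carries cookies, the sketch below builds one with Python's standard `http.cookies` module. The `build_response` helper, the `theme` cookie and the body are made up for the example; only the `username=Harshal` cookie comes from the text.

```python
from http.cookies import SimpleCookie

cookies = SimpleCookie()
cookies["username"] = "Harshal"
cookies["theme"] = "dark"  # hypothetical second cookie


def build_response(body, cookies):
    """Assemble a minimal HTTP/1.1 response with one Set-Cookie line per cookie."""
    lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/html; charset=utf-8",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    for morsel in cookies.values():
        lines.append(morsel.output())  # e.g. "Set-Cookie: username=Harshal"
    # A blank line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


print(build_response("<p>Hello</p>", cookies))
```

Each cookie gets its own `Set-Cookie` line; combining several cookies into a single `Set-Cookie` header is not how browsers expect them.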
There are two types of cookies:
- Session Cookies
- Permanent Cookies
By default, a cookie lives only as long as the browser session. When the browser is closed, the cookie is deleted. Such a cookie is called a Session Cookie. You can also create a Permanent Cookie by specifying an expiry date in HTTP date format:
Set-Cookie: userid=1234; Expires=Mon, 30 Jan 2017 00:00:00 GMT
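The Expires value is an HTTP date in GMT, and formatting it by hand is easy to get wrong. The sketch below uses Python's standard library to produce one; the `permanent_cookie` helper and its parameters are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def permanent_cookie(name, value, lifetime, now=None):
    """Return a Set-Cookie line that expires `lifetime` after `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    # usegmt=True yields the "GMT" suffix and requires a UTC-aware datetime.
    expires = format_datetime(now + lifetime, usegmt=True)
    return f"Set-Cookie: {name}={value}; Expires={expires}"


# A fixed "now" keeps the example reproducible.
issued = datetime(2017, 1, 23, tzinfo=timezone.utc)
print(permanent_cookie("userid", "1234", timedelta(days=7), now=issued))
# Set-Cookie: userid=1234; Expires=Mon, 30 Jan 2017 00:00:00 GMT
```

Leaving out Expires altogether gives a session cookie again.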
You can further restrict the scope of a cookie. Like Expires, there are Domain and Path directives. By default, browsers set the domain of the cookie to the host of the current document, i.e. the domain name you see in the browser’s address bar. More on this later.
Path signifies the path of the URL. The default value for the Path option is the directory part of the URL path that sent the Set-Cookie header, i.e. everything up to, but not including, the last /. So if the browser receives a Set-Cookie header from http://example.com/test/page, the default path is /test, and the cookie is sent to the server for /test and for anything beneath it, such as /test/page. Other paths within the website will not receive the cookie. You can manually set the Path directive like this:
Set-Cookie: id=123; Path=/custom-path
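How a browser decides whether a request path falls under a cookie's Path can be sketched as follows. This follows my reading of the default-path and path-match rules in RFC 6265 (section 5.1.4); the function names are mine, and real browsers handle more edge cases.

```python
def default_path(request_path):
    """Path a cookie gets when Set-Cookie carries no Path directive."""
    if not request_path.startswith("/") or request_path.count("/") <= 1:
        return "/"
    # Everything up to, but not including, the right-most "/".
    return request_path[: request_path.rfind("/")]


def path_matches(request_path, cookie_path):
    """True if a cookie scoped to cookie_path is sent for request_path."""
    if request_path == cookie_path:
        return True
    if request_path.startswith(cookie_path):
        # The prefix must end at a "/" boundary, so "/custom-path"
        # does not match "/custom-pathology".
        return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"
    return False


print(default_path("/test/page"))                             # /test
print(path_matches("/custom-path/settings", "/custom-path"))  # True
print(path_matches("/custom-pathology", "/custom-path"))      # False
print(path_matches("/other", "/custom-path"))                 # False
```

Note that Path is a scoping convenience and not a security boundary: pages on the same host can usually still reach each other's cookies by other means.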
There are certain restrictions on cookies. These restrictions help provide security and reliability:
- Size: Each cookie can have a maximum size of about 4KB.
- Number: For each domain, the number of cookies is limited. This restriction is imposed by browsers, not by the HTTP protocol.
- Domain: A web server can set cookies only for the domain that points to that web server. It cannot set a cookie for any other domain. You can use the Domain directive for this. (Note: the rules do change when we talk about sub-domains, but more on that later.)
As discussed earlier, cookies have a Domain directive, which indicates one or more domains to which the cookie should be sent. By default, the domain is set to the host name of the page setting the cookie.
Imagine the website https://google.com sets the following header, with no Domain directive:
Set-Cookie: id=1234
The browser will then send the cookie with every subsequent request to https://google.com. Since the default value of the domain is used, the browser will not send this cookie to any sub-domain of google.com, for example mail.google.com or www.google.com.
However, if https://google.com sends the following header:
Set-Cookie: id=1234; Domain=google.com
Since the server has explicitly specified a domain value, the browser sends the cookie to google.com and to any of its sub-domains (*.google.com). As explained by Nicholas Zakas, the browser performs a tail comparison of this value and the host name to which a request is sent (meaning it starts the comparison from the end of the string) and sends the corresponding Cookie header when there’s a match. We can also conclude that:
A parent domain can set cookies that are sent to its sub-domains, and a sub-domain can set cookies for its parent domain, in both cases by specifying the parent domain in the Domain directive.
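The tail comparison described above can be sketched like this. It is a simplified reading of the domain-match rule in RFC 6265 (section 5.1.3); the `domain_matches` function and its `host_only` flag are hypothetical names, and the sketch ignores IP addresses and public-suffix checks that real browsers perform.

```python
def domain_matches(request_host, cookie_domain, host_only):
    """Decide whether a cookie is sent to request_host.

    host_only is True when Set-Cookie had no Domain directive, in which
    case only the exact host that set the cookie receives it.
    """
    host = request_host.lower()
    domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    if host == domain:
        return True
    if host_only:
        return False
    # Tail comparison: the host must end with ".<domain>", so
    # "notgoogle.com" does not match "google.com".
    return host.endswith("." + domain)


print(domain_matches("mail.google.com", "google.com", host_only=False))  # True
print(domain_matches("mail.google.com", "google.com", host_only=True))   # False
print(domain_matches("notgoogle.com", "google.com", host_only=False))    # False
```

Requiring the dot before the domain is what keeps an unrelated host with a similar ending from receiving the cookie.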
Parent domain and sub-domain cookie relationship is very well explained in this Stack Overflow question:
You should also note that two unrelated domains can never share a cookie via plain HTTP. If you wish to do that, you would need some external mechanism outside of cookies to help you.