8.7 Tracking
Tracking refers to the capability of a web server or other online system to create a record of the websites a user visits over time. Tracking can also extend this history to include every web page the user visits at each website and which links on each page the user selects. This data collection technique raises substantial privacy concerns, particularly when it involves sharing information with third parties that can consolidate tracking information on a single user from multiple sources.
Tracking is a complex and ever-changing technology. This section provides an overview of common tracking technologies, as well as common countermeasures.
Cookies
A cookie is a short block of text that a web server sends to a web browser when the browser accesses the server’s web page. The browser stores the cookie on the user’s device. The information stored in a cookie includes, at a minimum, the name of the cookie, a unique identification number, and its domain (URL). Typically, a website generates a unique ID number for each visitor and stores the ID number on each user’s machine using a cookie file. When a browser requests a page from the server that sent it a cookie, the browser sends a copy of that cookie back to the server. A website can retrieve only the information that it has placed on the browser machine; it cannot retrieve information from cookies set by other websites.
The method of cookie retrieval is as follows:
A user types the URL of a website or clicks a link to a website.
The browser sends an HTTP request for the page. If it holds any cookies from that website, the browser sends those cookies along with the request, in the Cookie header field.
The web server receives the cookies and can use any information stored in those cookies.
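The retrieval steps above can be sketched with Python’s standard http.cookies module. This is a simplified model under stated assumptions: the browser’s cookie store is a plain dictionary keyed by exact host name (real browsers also apply domain, path, and expiry rules), and the cookie name visitor_id and the hosts news.example and shop.example are hypothetical.

```python
from http.cookies import SimpleCookie
import uuid

def server_handle(cookie_header):
    """Hypothetical server: reuse the visitor's ID if the browser sent one,
    otherwise generate a new unique ID and ask the browser to store it."""
    incoming = SimpleCookie(cookie_header or "")
    if "visitor_id" in incoming:
        return incoming["visitor_id"].value, []  # returning visitor, no new cookie
    new_id = uuid.uuid4().hex
    outgoing = SimpleCookie()
    outgoing["visitor_id"] = new_id
    outgoing["visitor_id"]["path"] = "/"
    # OutputString() yields e.g. "visitor_id=<id>; Path=/"
    return new_id, [("Set-Cookie", outgoing["visitor_id"].OutputString())]

def browser_visit(jar, host):
    """Simplified browser: attach any cookie stored for this host (step 2),
    let the server read it (step 3), and store any cookie the server sets."""
    cookie_header = jar.get(host)  # None on the first visit
    visitor_id, headers = server_handle(cookie_header)
    for name, value in headers:
        if name == "Set-Cookie":
            jar[host] = value.split(";", 1)[0]  # keep only "name=value"
    return visitor_id

jar = {}
first = browser_visit(jar, "news.example")
second = browser_visit(jar, "news.example")
print(first == second)                               # True: same ID sent back
print(browser_visit(jar, "shop.example") == first)   # False: other site never sees it
```

Because the store is keyed by host, shop.example receives no cookie and issues its own ID, which mirrors the rule that a website can retrieve only the cookies it placed.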
Cookies are a convenience for both the user and the web service. Here are some examples:
Saved logon: For example, if a user subscribes to a news site that is protected by a paywall, the user must log on to access the site. When the logon succeeds, the site creates a cookie. Subsequently, the user can return to the news site without having to log on again, because the browser presents the cookie and the site’s record for that cookie shows that the user has already logged on.
Aggregate visitor information: Cookies enable sites to determine how many visitors arrive, how many are new versus repeat visitors, and how often a visitor has visited.
User preferences: A site can store user preferences so that the site can have a customized appearance for each visitor.
Shopping carts: A cookie contains an ID and lets the site keep track of you as you add different things to your cart. Each item you add to your shopping cart is stored in the site’s database along with your ID value. When you check out, the site knows what is in your cart by retrieving all of your selections from the database. It would be impossible to implement a convenient shopping mechanism without cookies or something like them.
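The aggregate visitor information item in the list above amounts to counting the cookie IDs a site records. The log below is invented sample data, and treating one cookie ID as one visitor is an assumption: a visitor who clears cookies or switches browsers is counted as a new visitor.

```python
from collections import Counter

# Hypothetical access log: the visitor_id cookie value seen on each request,
# in arrival order. In practice this would come from the server's logs.
visit_log = ["a1", "b2", "a1", "c3", "a1", "b2"]

visits_per_visitor = Counter(visit_log)
unique_visitors = len(visits_per_visitor)
repeat_visitors = sum(1 for n in visits_per_visitor.values() if n > 1)
one_time_visitors = unique_visitors - repeat_visitors

print(len(visit_log), unique_visitors, repeat_visitors, one_time_visitors)  # 6 3 2 1
print(visits_per_visitor.most_common(1))  # [('a1', 3)]
```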
Note that most of the information associated with a cookie is stored on the web server. In most cases, all that the user’s cookie needs to contain is the unique ID. As a result, a user who removes a cookie does not necessarily remove personal data about that user from the server.
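The shopping-cart arrangement, in which the cookie holds only an ID and the cart lives in the site’s database, can be sketched as follows. CartStore and the cookie name cart_id are hypothetical, and a dictionary stands in for the server’s database.

```python
import uuid

class CartStore:
    """Hypothetical server-side store: the browser's cookie holds only an ID;
    the cart contents live on the server (a dict stands in for a database)."""

    def __init__(self):
        self._carts = {}

    def new_visitor(self):
        visitor_id = uuid.uuid4().hex
        self._carts[visitor_id] = []
        return visitor_id  # this ID is all the browser's cookie stores

    def add_item(self, visitor_id, item):
        self._carts.setdefault(visitor_id, []).append(item)

    def checkout(self, visitor_id):
        # Retrieve every selection recorded under this ID.
        return list(self._carts.get(visitor_id, []))

store = CartStore()
browser_cookie = {"cart_id": store.new_visitor()}
store.add_item(browser_cookie["cart_id"], "shirt")
store.add_item(browser_cookie["cart_id"], "shoes")
print(store.checkout(browser_cookie["cart_id"]))  # ['shirt', 'shoes']

# The user deletes the cookie, but the server still holds the data.
old_id = browser_cookie.pop("cart_id")
print(store.checkout(old_id))  # ['shirt', 'shoes']
```

The last two lines illustrate the point above: deleting the cookie only removes the browser’s copy of the ID, not what the server has recorded under it.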
Cookies can be characterized along three dimensions: identity, duration, and party (see Table 8.5). If a user visits a website without logging on to the website—such as a news site that does not have a logon requirement or a retail site for the purpose of browsing without shopping—the web server does not know the identity of the user. In this case, the only identifying information associated with the cookie is a unique ID assigned by the server; this is an unidentified cookie. If the user does log on to the site, then typically the web server will associate the user ID with the cookie, either by storing the user ID in the cookie or by maintaining state information at the website that associates the user ID with the cookie ID; this is an identified cookie.
TABLE 8.5 Characteristics of Cookies
Characteristic | Types
Identity | Unidentified cookie: Does not contain user ID; Identified cookie: Contains user ID entered at logon to website
Duration | Session cookie: Deleted when web session terminates; Persistent cookie: Deleted when time specified in cookie expires
Party | First party: Contains URL of the website the user is visiting; Third party: Contains URL of a third party
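The identity distinction described above can be sketched using the second approach the text mentions, in which the website keeps state that links a cookie ID to a user ID. The names user_for_cookie, issue_cookie_id, log_on, and classify, and the user ID "alice", are hypothetical.

```python
import uuid

# Hypothetical server-side state: which logged-on user (if any)
# is associated with each cookie ID.
user_for_cookie = {}

def issue_cookie_id():
    return uuid.uuid4().hex  # unidentified: only a random, server-assigned ID

def log_on(cookie_id, user_id):
    user_for_cookie[cookie_id] = user_id  # the cookie is now identified

def classify(cookie_id):
    return "identified" if cookie_id in user_for_cookie else "unidentified"

cid = issue_cookie_id()
print(classify(cid))  # unidentified
log_on(cid, "alice")
print(classify(cid))  # identified
```

Note that the cookie value itself never changes here; what makes it “identified” is the server’s record linking it to a user.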
With respect to duration, cookies are either session or persistent cookies. A session cookie has no expiration date and remains on the user system only for the duration of the browsing session; when the user closes the browser, the browser typically deletes the cookie. Session cookies are useful for temporarily maintaining information about a user, such as for a shopping cart or a chat session.
A persistent cookie includes an expiration date. This means that, for the cookie’s entire life span (which can be as long or as short as its creators want), its information will be transmitted to the server every time the user visits the website that it belongs to or every time the user views a resource belonging to that website from another website (e.g., an advertisement). A persistent identified cookie allows a user to revisit a website that requires a logon without having to go through the logon procedure again. A persistent unidentified cookie can also be useful to the web server, in that it allows the website to track the activity of a single user over multiple visits, even though that user is not identified. The site could use such anonymous information for a number of purposes. One purpose could be to improve the interface so that more frequently visited pages on the site are easier to find. Another possible use is price manipulation: If the same user visits a site multiple times and looks at the same item, this could indicate interest in the item but resistance to the price, and the site may lower the price for that user.
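The session/persistent distinction comes down to whether the server includes a lifetime when it sets the cookie. The sketch below uses Python’s standard http.cookies module; the cookie names, values, and one-year lifetime are hypothetical. A cookie with no Expires or Max-Age attribute is a session cookie, while Max-Age (in seconds) makes it persistent.

```python
from http.cookies import SimpleCookie

ONE_YEAR = 365 * 24 * 60 * 60  # seconds; an arbitrary illustrative lifetime

cookies = SimpleCookie()
# Session cookie: no Expires or Max-Age attribute, so the browser
# discards it when the browsing session ends.
cookies["cart"] = "c9f2"
# Persistent cookie: Max-Age tells the browser how long to keep it.
cookies["visitor_id"] = "7d41"
cookies["visitor_id"]["max-age"] = ONE_YEAR

for name in ("cart", "visitor_id"):
    print("Set-Cookie: " + cookies[name].OutputString())
# Set-Cookie: cart=c9f2
# Set-Cookie: visitor_id=7d41; Max-Age=31536000
```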
Cookies can also be first party or third party. A first-party cookie is set and read by the web server hosting the website the user is visiting. In this case, the domain portion of the cookie matches the domain that is shown in the web browser’s address bar.
A third-party cookie, however, belongs to a domain different from the one shown in the address bar. This sort of cookie typically appears when web pages feature content from external websites, such as banner advertisements. This type of cookie opens up the potential for tracking a user’s browsing history and is often used by advertisers in an effort to serve relevant advertisements to each user. A third-party cookie is placed on a user’s computer to track the user’s activity on different websites, creating a detailed profile of the user’s behavior. However, a third party can track the user only on pages that include its content, such as its advertisements; it cannot observe the user’s activity on websites that do not embed that content.
There are a number of mechanisms by which web servers can install third-party cookies on web browser machines, including embedding content that causes the browser to connect to the third-party website (for example, to fetch an advertisement image) and installing a Java plugin.
Third-party cookies enable advertisers, analytics companies, and others to track user activity across multiple sites. For example, suppose a user visits nypost.com to get the news. This site will contain a number of advertisement images, and each advertiser can install a third-party cookie. If the user subsequently visits another site, such as an online clothing site, that carries ads from the same advertiser, the advertiser can retrieve its cookie and now knows that the user is interested in news and in clothing, and possibly which types of clothing. Over time, the advertiser can build up a profile of the user, even if it does not know the identity of the user, and tailor ads for that user. Further, with sufficient information, the advertiser may be able to identify the user. This is where online tracking raises a privacy issue.
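The nypost.com scenario can be simulated to show how a single third-party cookie ties visits on unrelated sites together. AdServer, load_page, ads.example, and clothing-shop.example are hypothetical, and the sketch assumes the browser reveals which site embeds the ad, as the HTTP Referer header commonly does.

```python
import uuid
from collections import defaultdict

class AdServer:
    """Hypothetical third-party ad server whose ads appear on many sites."""

    def __init__(self):
        self.profiles = defaultdict(list)  # cookie ID -> sites the user visited

    def serve_ad(self, cookie_value, embedding_site):
        # Reuse the browser's cookie if present; otherwise assign a new ID.
        visitor_id = cookie_value or uuid.uuid4().hex
        self.profiles[visitor_id].append(embedding_site)
        return visitor_id  # value the browser stores for the ad domain

def load_page(browser_jar, site, ad_server, ad_domain="ads.example"):
    # The page on `site` embeds an ad, so the browser requests it from
    # ad_domain, attaching ad_domain's cookie (assumed: the request also
    # reveals which site the user is on).
    cookie = browser_jar.get(ad_domain)
    browser_jar[ad_domain] = ad_server.serve_ad(cookie, site)

ads = AdServer()
jar = {}
for site in ["nypost.com", "clothing-shop.example", "nypost.com"]:
    load_page(jar, site, ads)

print(len(ads.profiles))               # 1: one profile spans both sites
print(ads.profiles[jar["ads.example"]])
# ['nypost.com', 'clothing-shop.example', 'nypost.com']
```

The ad server never learns the user’s name here, yet it accumulates a cross-site history keyed by its own cookie ID, which is the profile-building described above.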
Browsers offer users a number of countermeasures, including blocking ads and blocking third-party cookies. Some of these techniques may prevent certain sites from working properly. In addition, third-party trackers continually try to devise new ways to overcome the countermeasures.
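Blocking third-party cookies can be sketched as a filter the browser applies before attaching cookies to a request. This is a simplification under stated assumptions: a cookie counts as first party when the address-bar host equals the cookie’s domain or is a subdomain of it, whereas real browsers compare registrable domains using the Public Suffix List. The function names and ads.example are hypothetical.

```python
def domain_matches(host, domain):
    """True if host is the cookie domain or one of its subdomains."""
    host, domain = host.lower(), domain.lstrip(".").lower()
    return host == domain or host.endswith("." + domain)

def is_third_party(cookie_domain, page_host):
    # Third party: the cookie's domain differs from the address-bar site.
    return not domain_matches(page_host, cookie_domain)

def cookies_to_send(jar, request_host, page_host, block_third_party=True):
    """Cookies attached to a request for request_host made while page_host
    is shown in the address bar. jar holds (cookie_domain, "name=value")."""
    sent = []
    for cookie_domain, pair in jar:
        if not domain_matches(request_host, cookie_domain):
            continue  # cookie belongs to a different site
        if block_third_party and is_third_party(cookie_domain, page_host):
            continue  # countermeasure: withhold third-party cookies
        sent.append(pair)
    return sent

jar = [("nypost.com", "session=abc"), ("ads.example", "uid=123")]
# With www.nypost.com in the address bar, the page also loads an ad from ads.example.
print(cookies_to_send(jar, "www.nypost.com", "www.nypost.com"))  # ['session=abc']
print(cookies_to_send(jar, "ads.example", "www.nypost.com"))     # []
print(cookies_to_send(jar, "ads.example", "www.nypost.com", block_third_party=False))  # ['uid=123']
```

With blocking on, the advertiser’s cookie never accompanies the ad request, so it cannot link this visit to others; the same rule can break embedded features that legitimately depend on third-party cookies.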
Other Tracking Technologies
A flash cookie is a small file stored on a computer by a website that uses Adobe’s Flash Player technology to store information about your online browsing activities. Because flash cookies can also store your settings and preferences, they can be used in place of HTTP cookies for tracking and advertising. Flash cookies are stored in a different location than HTTP cookies; thus users may not know which files to delete in order to eliminate them. In addition, they are stored so that different browsers and standalone Flash widgets installed on a given computer access the same persistent flash cookies. Flash cookies are not controlled by the browser: erasing HTTP cookies, clearing history, erasing the cache, or choosing a “delete private data” option within the browser does not affect them. As countermeasures to flash cookies, recent versions of Flash Player honor the privacy mode setting in modern browsers. In addition, some anti-malware software is able to detect and erase flash cookies.
Device fingerprinting can track devices over time, based on the browser’s configuration and settings. The fingerprint is built from information that can be gathered passively from web browsers, such as their version, user agent, screen resolution, language, installed plugins, and installed fonts. Because the combination of these characteristics is often unique to a single browser, device fingerprinting can identify your device without using cookies, and deleting cookies won’t help. A countermeasure to fingerprinting is to make your device’s fingerprint less distinctive, so that it matches the fingerprints of many other devices. This approach is taken in a recent version of Safari on macOS, which aims to make all the Macs in the world look alike to trackers.
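A fingerprint can be sketched as a hash over passively observable attributes. The attribute names and values below are hypothetical, the 16-character truncation is arbitrary, and real fingerprinting scripts gather many more signals. The standardized function is an idealized version of the countermeasure described above, not how any particular browser implements it.

```python
import hashlib
import json

def fingerprint(attributes):
    """Hash a dict of browser attributes into a short identifier.
    Canonical JSON (sorted keys) makes the hash independent of key order."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

browser_a = {  # hypothetical values throughout
    "user_agent": "ExampleBrowser/1.0",
    "screen": "2560x1440",
    "language": "en-US",
    "plugins": ["pdf-viewer"],
    "fonts": ["Arial", "Helvetica", "Menlo"],
}
browser_b = dict(browser_a, fonts=["Arial", "Helvetica"])  # one font fewer

fp_a = fingerprint(browser_a)
print(fp_a == fingerprint(dict(browser_a)))  # True: same configuration, same ID, no cookie needed
print(fp_a == fingerprint(browser_b))        # False: a small difference separates the two

# Idealized countermeasure: report the same simplified values everywhere,
# so differing devices collapse onto one fingerprint.
def standardized(attributes):
    return dict(attributes, plugins=[], fonts=["Arial", "Helvetica"])

print(fingerprint(standardized(browser_a)) == fingerprint(standardized(browser_b)))  # True
```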
Do Not Track
Many browsers allow the user to select a “do not track” option. This feature enables users to tell every website, and its advertisers and content providers, that they do not want their browsing behavior tracked. When the “do not track” option is selected, the browser adds the DNT header field, with the value 1, to each HTTP request. Honoring this setting is voluntary; individual websites are not required to respect it. Websites that do honor this setting should automatically stop tracking the user’s behavior without any further action from the user.
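A site that chooses to honor the setting would check the request headers before tracking. The function name tracking_allowed is hypothetical, and what the site actually stops doing when the header is present is its own policy decision. Because HTTP header names are case-insensitive, the sketch normalizes them before looking up DNT.

```python
def tracking_allowed(headers):
    """Return False if the request carries 'DNT: 1' (user opted out).
    HTTP header names are case-insensitive, so normalize them first."""
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"

print(tracking_allowed({"User-Agent": "ExampleBrowser/1.0"}))  # True: no preference sent
print(tracking_allowed({"DNT": "1"}))                          # False: user opted out
print(tracking_allowed({"dnt": "0"}))                          # True: user does not object
```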
Many websites simply ignore the “do not track” field. Websites that do respond to the request react in different ways. Some simply disable targeted advertising, showing you generic advertisements instead of ones targeted to your interests, but still use the collected data for other purposes. Some may disable tracking by other websites but still track how you use their websites for their own purposes. Some may disable all tracking. There’s little agreement on how websites should react to “do not track.”