Other Cross-Site Risks
The preceding sections illustrated the risks associated with AdWords, AdSense, and DoubleClick, but cross-site information-disclosure risks do not end with advertising. The web functions thanks to hyperlinks and embedded content. Through these vectors, Google and other large companies can gather tremendous amounts of user information and help link clusters of information to individual users, companies, and other organizations. The following examples share one common characteristic: Each relies upon third-party webmasters to embed tracking (or trackable) content into their web sites. Most webmasters would not add such content arbitrarily; instead, they are enticed by at least a nominal incentive for cooperation.
Google Analytics
Google Analytics is a free tool for webmasters that provides a powerful and intuitive interface for analyzing web log data (see Figure 7-9).63 Google Analytics is part of a class of applications that provide statistical and graphical analyses of web visitor activity based on web server log data and (optionally) on data gained via cookies placed on users’ computers, web bugs, and JavaScript code. Such tools display site visitor reports (for example, geographic locations of visitors, most active visitors, and browsers used), page view reports (for example, entry/exit pages, most popular time of day, and number of requests for each page), server reports (for example, amount of bandwidth consumed and which files were requested), and referer reports (for example, search queries and referring URLs). Other popular web analytics software includes Webalizer (www.mrunix.net/webalizer/) and WebTrends (www.webtrends.com/).
Figure 7-9 Google Analytics gives Google the capability to track users as they visit any Google Analytics member site.
Google Analytics is easy to install. Webmasters need only paste code similar to the following into web pages they want the service to track.
<script src="https://www.google-analytics.com/urchin.js" type="text/javascript">
</script>
<script type="text/javascript">
_uacct = "UA-994065-1";
urchinTracker();
</script>
This code is straightforward JavaScript. It serves as a hook in each web page to contact Google whenever a page is loaded and download a JavaScript file called urchin.js. The _uacct variable stores a unique tracking code assigned to the webmaster. The script then launches the urchinTracker() function in the newly downloaded urchin.js file. Unfortunately, the code within urchin.js is far more complex and apparently obfuscated.64, 65 The following is a short snippet:66
function urchinTracker(page) {
 if (_udl.protocol=="file:") return;
 if (_uff && (!page || page=="")) return;
 var a,b,c,xx,v,z,k,x="",s="",f=0;
 var nx=" expires="+_uNx()+";";
 var dc=_ubd.cookie;
 _udh=_uDomain();
 if (!_uVG()) return;
 _uu=Math.round(Math.random()*2147483647);
 _udt=new Date();
 _ust=Math.round(_udt.getTime()/1000);
 a=dc.indexOf("__utma="+_udh);
 b=dc.indexOf("__utmb="+_udh);
 c=dc.indexOf("__utmc="+_udh);
 if (_udn && _udn!="") { _udo=" domain="+_udn+";"; }
 if (_utimeout && _utimeout!="") {
  x=new Date(_udt.getTime()+(_utimeout*1000));
  x=" expires="+x.toGMTString()+";";
 }
Although many webmasters sing the praises of Google Analytics, the tool also poses a significant privacy concern for web surfers. Each time a user visits a web page that contains the request to download urchin.js, the user’s web browser contacts a Google server, downloads the script, and then executes it, leaving behind all the typical web-browsing footprints described in Chapter 3. The urchin.js script presumably discloses additional information, but Google does not provide specific details. The primary risk of Google Analytics is that it gives Google the capability to track users, including through the use of cookies, as they browse from web site to web site.67 There is no official count of the number of participating web sites, but several years ago, an analyst estimated the number to be about 237,000.68 The number now is presumably far greater. Some of the most popular sites on the web employ Google Analytics, such as Slashdot.org, which downloads Google’s newer ga.js script, an equally difficult script to interpret. You can see a snippet in Figure 7-10.
Figure 7-10 Screenshot of the Google Analytics script ga.js downloaded from Google when users visit Slashdot.org
The risk of being tracked across 250,000 or more web sites is concerning enough, but the true risk of Google Analytics is that the user data can be combined with web sites participating in Google’s AdSense and AdWords programs, enabling the company to track users across a broad swath of the most popular portions of the web. Users see only a brief flicker in their browser’s status bar as their browser contacts Google’s servers. The potential of “free” web-analytics software is not lost on Google’s competitors; both Yahoo! and Microsoft recently released free web-analytics tools.69, 70
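The cross-site linkage described above can be sketched with a toy log. The following is a minimal illustration, not a description of Google's actual processing: it assumes a hypothetical log in which each request to a third-party server records a cookie identifier, the embedding page (as a Referer header would report it), and the service contacted. All field names and sample values are invented for illustration.

```javascript
// Hypothetical third-party log entries. The field names and values are
// assumptions made for illustration, not a real log format.
const sampleLog = [
  { cookieId: "abc123", referer: "http://news.example/story.html", service: "analytics" },
  { cookieId: "abc123", referer: "http://shop.example/cart", service: "ads" },
  { cookieId: "xyz789", referer: "http://news.example/story.html", service: "analytics" },
  { cookieId: "abc123", referer: "http://blog.example/post/7", service: "video" },
];

// Group requests by cookie identifier and collect the distinct sites
// (hostnames of the embedding pages) seen for each identifier.
function sitesByCookie(entries) {
  const profiles = new Map();
  for (const { cookieId, referer } of entries) {
    const host = new URL(referer).hostname;
    if (!profiles.has(cookieId)) profiles.set(cookieId, new Set());
    profiles.get(cookieId).add(host);
  }
  return profiles;
}

// Print each identifier with the sites on which it was observed.
for (const [cookieId, hosts] of sitesByCookie(sampleLog)) {
  console.log(cookieId, [...hosts].join(", "));
}
```

In this sketch the identifier abc123 appears on three unrelated sites through three different embedded services, which is the kind of combined view the paragraph above warns about.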
Chatback
Google’s Chatback service enables web authors to embed a status indicator, a “badge,” directly into their web pages. When the page is loaded, the badge (see Figure 7-11) indicates whether the page’s author is available for communication via Google Talk. Merely visiting the page causes the visitor’s browser to pull the Chatback badge from Google’s servers, leaving behind footprints in Google’s logs. Clicking the link can start an online conversation, leaving open the eavesdropping and logging risks discussed in Chapter 5, “Communications.” Although this is a text-based service, similar risks exist via VoIP “call-me buttons” offered by companies such as Jajah, Jangle, Jaxtr, Tringme, and Grand Central.71, 72
Figure 7-11 Sample Google Chatback badge. Web authors place small snippets of Google-provided code in their web pages, and visitors to the page can see whether the author is available to chat via Google Talk.
YouTube Videos
Embedding YouTube videos is an extremely popular practice by web authors (see Figure 7-12). When doing so, authors place code similar to the following in their web pages.73
<object height="350" width="425">
 <param name="movie" value="http://www.youtube.com/v/KJukKpQDVLQ">
 <param name="wmode" value="transparent">
 <embed src="https://www.youtube.com/v/KJukKpQDVLQ" type="application/x-shockwave-flash" wmode="transparent" height="350" width="425">
 </embed>
</object>
Figure 7-12 Example of a YouTube video embedded in a web page. When the image is merely displayed in the user’s browser, that user can be immediately logged by YouTube.
Notice that the code embeds a movie object pulled from Google’s servers. Again, users need only visit a page containing an embedded YouTube video to leave themselves open to tracking by Google, even if the page is run by a third party and there are no DoubleClick or AdSense advertisements.
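One way to see which outside servers a page causes the browser to contact is to list the hosts of its embedded resources. The following is a rough sketch under stated assumptions: the function name thirdPartyHosts, the sample page, and the blog.example host are hypothetical, and a regular expression stands in for real HTML parsing, so it will miss many cases.

```javascript
// Return the hostnames of embedded resources that differ from the
// hostname of the page itself. A regular expression is a rough stand-in
// for real HTML parsing and will miss many cases.
function thirdPartyHosts(pageUrl, html) {
  const pageHost = new URL(pageUrl).hostname;
  const hosts = new Set();
  // Match src="..." and value="..." attributes holding absolute http(s) URLs.
  const pattern = /\b(?:src|value)\s*=\s*"(https?:\/\/[^"]+)"/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    const host = new URL(match[1]).hostname;
    if (host !== pageHost) hosts.add(host);
  }
  return [...hosts];
}

// Hypothetical page that embeds a first-party image, a video, and an
// analytics script.
const samplePage = `
  <img src="http://blog.example/logo.png">
  <object><param name="movie" value="http://www.youtube.com/v/KJukKpQDVLQ">
  <embed src="https://www.youtube.com/v/KJukKpQDVLQ"></embed></object>
  <script src="https://www.google-analytics.com/urchin.js"></script>`;

console.log(thirdPartyHosts("http://blog.example/post.html", samplePage));
```

Each host in the returned list is a server that receives a request, along with the usual browsing footprints, when a visitor loads the page.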
Search on Your Web Page
Another common practice is for web authors to include a Google Search box on their site (see Figure 7-13). Although some visitors find this useful, it also facilitates the disclosure of search queries, as well as the user’s IP address and the site he or she is visiting, to Google. In some implementations, the disclosure takes place only when the user clicks Submit, as in the following code:74
<form method="get" action="http://www.google.com/search">
 <input type="text" name="q" size="31" maxlength="255" value="" />
 <input type="submit" value="Google Search" />
 <input type="radio" name="sitesearch" value="" /> The Web
 <input type="radio" name="sitesearch" value="askdavetaylor.com" checked /> Ask Dave Taylor<br />
</form>
Figure 7-13 Many web sites include a Google search field, which encourages users to disclose search terms and the site they are visiting.
However, note the Google logo in the image. If the webmaster includes the logo on the page, he or she can choose to download the image directly from Google; this immediately informs Google when someone visits a given site.
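The form-based disclosure described in this section can be made concrete by building the request that a GET form with the fields q and sitesearch would generate. This is a minimal sketch: the helper name searchRequestUrl and the sample query text are assumptions for illustration, and a real browser may add further parameters or headers.

```javascript
// Build the URL a browser would request when a GET form with the fields
// "q" and "sitesearch" is submitted. The query text is a made-up example.
function searchRequestUrl(action, query, siteSearch) {
  const params = new URLSearchParams({ q: query, sitesearch: siteSearch });
  return `${action}?${params.toString()}`;
}

const url = searchRequestUrl(
  "http://www.google.com/search",
  "reset forgotten password",
  "askdavetaylor.com"
);
console.log(url);
```

The resulting URL carries both the search terms and the originating site's name in plain view, in addition to the IP address that accompanies any request.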
Google also offers AdSense for search, which helps webmasters earn revenue by creating a custom search engine for a site.75 Along with customized search results, users see targeted advertisements.
Friend Connect
Friend Connect is a new service offered by Google that enables web authors to add social networking facets to their sites by embedding small snippets of code. Google Friend Connect “offers a core set of social gadgets such as member management, message board, reviews, and picture sharing.”76 Figure 7-14 shows a sample site provided by Google and illustrates several concerns with Friend Connect. Visitors to the site are offered the opportunity to sign in using their existing credentials, which uniquely identifies them to Google or one of several other participating services, including Yahoo! and AOL. The site’s members, photos, and comments can be disclosed. In addition, because this site includes an embedded YouTube video and two Friend Connect widgets, the user can be logged three times by Google’s servers by merely visiting the site. Friend Connect is an interesting service that will likely be very popular. Therein lies the risk: Friend Connect and future generations of social networking applications will amplify user disclosure and facilitate uniquely identifying users.
Figure 7-14 Google’s new Friend Connect service enables web authors to add social networking functions to their sites.
Embedded Maps
Another common practice, and resulting cross-site disclosure risk, is embedding maps within web pages. From hotels to tourist attractions, to business and social events, web authors rely upon third-party mapping services such as Google Maps and MapQuest to provide easy-to-use, interactive maps for their site’s visitors. Unfortunately, the practice also informs the mapping service of the IP address of the visitor, HTTP cookies, the site the user came from, and a location he or she is interested in. For example, in Figure 7-15, a web author for an academic conference directly embedded a Google map into the conference web site.77 Thus, every potential conference attendee who browses the conference’s directions page immediately informs Google, and possibly Yahoo! and MapQuest, of his or her interest in the conference and probable attendance. With thousands, perhaps millions, of embedded maps in sites across the web, this practice greatly extends the cross-site visibility of large online companies such as Google and Yahoo!. The information-disclosure risks associated with embedded mapping are likely to worsen. Simple mapping is giving way to mapplets (or mashups), which combine mapping with virtually any type of location-based data (think homes for sale, local coffee shops, or driving ranges). The end result is a growth in the type and quantity of information disclosed via embedded maps and their progeny.
Figure 7-15 Embedded maps in web pages immediately inform the mapping service of the user’s visit to a given web page, as well as that user’s interest in a specific area, as is the case for this academic conference.
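The disclosure from a single embedded map can be sketched as follows. Everything specific here is an assumption for illustration: the maps.example and conference.example hosts, the q and ll parameter names, and the function name mapRequestDisclosure do not describe any real mapping service's URL format.

```javascript
// Summarize what a mapping service could learn from one embed request.
// The embed URL layout (the "q" and "ll" parameters) is hypothetical.
function mapRequestDisclosure(embedUrl, refererUrl) {
  const embed = new URL(embedUrl);
  return {
    mapHost: embed.hostname,                        // server that receives the request
    embeddingSite: new URL(refererUrl).hostname,    // site the visitor is viewing
    placeQuery: embed.searchParams.get("q"),        // place the visitor is interested in
    coordinates: embed.searchParams.get("ll"),      // map center, if present
  };
}

const disclosure = mapRequestDisclosure(
  "http://maps.example/embed?q=Conference+Hotel&ll=40.0,-75.0",
  "http://conference.example/directions.html"
);
console.log(disclosure);
```

Combined with the visitor's IP address and any cookies, these fields tie a particular browser to both the embedding site and a specific location of interest.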