The Commercialization of Personal Data

By Hal Abelson, Ken Ledeen, Harry Lewis, Wendy Seltzer
Dec 5, 2020

< Back Page 2 of 4 Next >

This chapter is from the book 

Blown to Bits: Your Life, Liberty, and Happiness After the Digital Explosion, 2nd Edition

Learn More Buy

Footprints and Fingerprints

As we do our daily business and lead our private lives, we leave footprints and fingerprints. We can see our footprints in mud on the floor and in the sand and snow outdoors. We would not be surprised that anyone who went to the trouble to match our shoes to our footprints could determine, or guess, where we had been. Fingerprints are different. It doesn’t even occur to us that we are leaving them as we open doors and drink out of tumblers. Those who have guilty consciences may think about fingerprints and worry about where they are leaving them, but the rest of us don’t.

In the digital world, we all leave both electronic footprints and electronic fingerprints—data trails we leave intentionally, and data trails of which we are unaware or unconscious. The identifying data may be useful for forensic purposes. Because most of us don’t consider ourselves criminals, however, we tend not to worry about that. What we don’t think about is that the various small smudges we leave on the digital landscape may be useful to someone else—someone who wants to use the data we left behind to make money or to get something from us. It is therefore important to understand how and where we leave these digital footprints and fingerprints.

Tracing Paper

If I send an email or download a web page, it should come as no surprise that I’ve left some digital footprints. After all, the bits have to get to me, so some part of the system knows where I am. In the old days, if I wanted to be anonymous, I could write a note, but my handwriting might be recognizable, and I might leave fingerprints (the oily kind) on the paper. I might have typed, but Perry Mason regularly solved crimes by matching a typewritten note with the unique signature of the suspect’s typewriter. More fingerprints.

So, today I would laser print the letter and wear gloves. But even that might not suffice to disguise me. Researchers at Purdue have developed techniques for matching laser-printed output to a particular printer.⁹ They analyze printed sheets and detect unique characteristics of each manufacturer and each individual printer—fingerprints that can be used, like the smudges of old typewriter hammers, to match output with source. It may be unnecessary to put the microscope on individual letters to identify what printer produced a page.

The Electronic Frontier Foundation has demonstrated that many color printers nearly invisibly encode the printer serial number, date, and time on every page they print (see Figure 3.1). Therefore, when you print a report, you should not assume that no one can tell who printed it.

Figure 3.1 Fingerprint left by a Xerox DocuColor 12 color laser printer. The dots are very hard to see with the naked eye; the photograph was taken under blue light. The dot pattern encodes the date (2005-05-21), the time (12:50), and the serial number of the printer (21052857).

Source: Electronic Frontier Foundation, http://w2.eff.org/Privacy/printers/docucolor/

There was a sensible rationale behind this technology. The government wanted to make sure that office printers could not be used to turn out sets of hundred-dollar bills. The technology that was intended to frustrate counterfeiters makes it possible to trace every page printed on color laser printers back to the source. Useful technologies often have unintended consequences.

Many people, for perfectly legal and valid reasons, would like to protect their anonymity. They might be whistleblowers or dissidents. Perhaps they are merely railing against injustice in their workplace. Will technologies that undermine anonymity in political discourse also stifle free expression? A measure of anonymity is essential in a healthy democracy—and in the United States, anonymity has been a weapon used to advance free speech since the time of the Revolution. We may regret a complete abandonment of anonymity in favor of communication technologies that leave fingerprints.

The problem is not just the existence of fingerprints but that no one told us that we are creating them.

When NSA contractor Reality Winner leaked classified information to The Intercept, she might have thought that sending a paper copy would thwart attempts to trace the leaks.¹⁰ The Intercept had shared the document with NSA to verify its authenticity, and Winner was arrested a few days later. Initial reports speculated that she was traced through printer microdots, but the truth appears to have been even more mundane: NSA logs showed that only six accounts, including Winner’s, had accessed the document, and Winner had used a personal account to contact The Intercept shortly beforehand.¹¹

Advertising

If you ride the T in Boston, you’ll see lots of advertisements for college and graduate programs. They all have phone numbers and URLs, and many direct you to places like college.edu/recruiting/redline. That web address isn’t saying the college has a special program on the Red Line, but it does have a special advertising program there. The “redline” at the end of the URL lets the college know that you were referred there by its subway ad. It might use that to direct you to the particular programs advertised on the poster and to track the effectiveness of this ad campaign.
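To make the idea concrete, here is a minimal sketch in Python of how a site might read such a URL decoration. The table of tags and the function name are illustrative assumptions, not a description of how any particular college handles its links.

```python
from urllib.parse import urlparse

# Hypothetical table mapping a URL suffix to the ad placement that printed it.
CAMPAIGNS = {
    "redline": "Red Line subway poster",
    "orangeline": "Orange Line subway poster",
}

def referral_source(url: str) -> str:
    """Return the ad placement implied by the last path segment of the URL."""
    path = urlparse(url).path
    # Take the final path segment, ignoring any trailing slash.
    tag = path.rstrip("/").rsplit("/", 1)[-1].lower()
    return CAMPAIGNS.get(tag, "unknown")

print(referral_source("https://college.edu/recruiting/redline"))
```

Counting how many visitors arrive with each tag is all it takes to compare one poster campaign with another.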

Ads on the Web use the referring page as just one of many signifiers; others are less visible than the URL decoration visible on the subway poster. When you follow a link to open a web page in your browser, that click kicks off a series of events that starts with an electronic request for the web page and a request for any cookies the site may have set previously. All but the simplest of pages will then trigger requests for more subresources: images, fonts, scripts to make the page dynamic. A commercial site may have dozens of advertisements and tracking pixels, or “web bugs”—invisible elements that make your computer call out to yet another source for the purpose of tracking your activity.

HOW SITES KNOW WHO YOU ARE (AN INCOMPLETE LIST)

1. You tell them. Log in to Gmail, Amazon, or eBay, and you are letting them know exactly who you are.
2. They’ve left cookies on one of your previous visits. A cookie is a small text file stored on your local hard drive that contains information that a particular website wants to have available during your current session (about your shopping cart, for example) or from one session to the next. Cookies give sites persistent information for tracking and personalization. Your browser has a command for showing cookies; if you use it, you may be surprised how many websites have left them!
3. They have your IP address. The web server has to know where you are so that it can ship its web pages to you. Your IP address is a number like 66.82.9.88 that locates your computer on the Internet. That address may change from one day to the next. But in a residential setting, your Internet service provider (ISP; typically your phone or cable company) knows who was assigned each IP address at any time. Those records are often subpoenaed in court cases.
4. You look like someone they already recognize. Users who log in to Facebook often share a lot of detail about their lives and networks: friends and family connections, favorite bands and restaurants, political leanings—and that’s just things they deliberately connect or “like.” Facebook also creates shadow audiences, matching people on whom they have little information with others they already know, who share these characteristics.
5. They’ve fingerprinted your browser and linked it to profiles from previous visits. Websites can access lots of seemingly innocent details about your browser (which type, version, graphics encoding, language, and much more). These tend to remain fairly static, and often will uniquely identify a particular browser instance. This technique is simple, and remarkably accurate and effective.
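The last item on this list, browser fingerprinting, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the attribute names and values are made up, and real fingerprinting scripts gather many more signals inside the browser itself. The point is only that a handful of stable details, hashed together, behaves like an identifier even when no cookie is present.

```python
import hashlib
import json

def browser_fingerprint(attributes: dict) -> str:
    """Hash a set of browser attributes into a short, stable identifier."""
    # Sort the keys so the same attributes always serialize the same way.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

# Hypothetical attribute values, chosen only for illustration.
visit_monday = {
    "browser": "Firefox",
    "version": "115.0",
    "language": "en-US",
    "screen": "1920x1080",
    "timezone": "America/New_York",
}
visit_friday = dict(visit_monday)                    # same browser, cookies cleared
other_user = {**visit_monday, "language": "de-DE"}   # one detail differs

print(browser_fingerprint(visit_monday) == browser_fingerprint(visit_friday))
print(browser_fingerprint(visit_monday) == browser_fingerprint(other_user))
```

The first comparison matches because nothing about the browser changed between visits; the second does not, because a single differing attribute produces a different hash.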

If you are curious about who is using a particular IP address, you can check the American Registry for Internet Numbers (www.arin.net). Services such as whatismyip.com, whatismyip.org, and ipchicken.com also allow you to check your own IP address. And www.whois.net allows you to check who owns a domain name such as harvard.com, which turns out to be the Harvard Bookstore, a privately owned bookstore right across the street from the university.

Unfortunately, IP address information won’t reveal who is sending you spam, since spammers routinely forge the source of email they send you. In addition, between the time you request a web page and the time its ads are displayed in your browser, there’s often a real-time auction, in which your eyeballs (or at least the ad spaces in the web page your browser is about to display) are sold to the highest bidder. Ad networks collect the information from tracking pixels and page context to determine what ads to offer and how much to bid to place them in these auctions.
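A toy version of such an auction, written in Python, shows how little machinery the core idea requires. Everything here is an assumption made for illustration: the advertiser names, the bid amounts, and the rule that the highest bid simply wins. Real ad exchanges use more elaborate pricing rules and run the whole process in a fraction of a second.

```python
from dataclasses import dataclass

@dataclass
class Bid:
    advertiser: str  # hypothetical advertiser name
    amount: float    # bid for one ad impression (units are illustrative)

def run_auction(bids, floor=0.0):
    """Return the highest bid at or above the floor price, or None if there is none."""
    eligible = [b for b in bids if b.amount >= floor]
    if not eligible:
        return None
    return max(eligible, key=lambda b: b.amount)

# A bidder who has already seen this visitor browsing shoes bids more.
bids = [
    Bid("shoe_retailer", 2.40),
    Bid("airline", 1.10),
    Bid("bank", 0.75),
]
winner = run_auction(bids, floor=0.50)
print(winner.advertiser, winner.amount)
```

In this sketch the tracking data matters only through the bid amounts: the more an advertiser believes it knows about you, the more it is willing to pay for the slot.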

Why are these shoes following me around? Maybe you saw them on Instagram, tagged them on Pinterest, or searched for a new pair of sneakers on your favorite retailer’s website. Maybe you even put them into a shopping cart before deciding they weren’t in your budget at this time. Now, you can’t seem to escape the shoes: whether you’re reading the news or Facebooking with friends, there are the shoes, stalking you from the ad banners, urging you to click “buy.”

Known in the trade as “retargeting,” these ads are some of the products of real-time bidding. The marketer who dropped a tracking cookie in your browser during an earlier browsing session or the shopping visit you cut short is using it to identify you as a shoe-interested shopper and bidding to show you those ads in the hopes of luring you back to purchase. If you clicked through any of the ads, the marketer would register a “conversion” and factor this data further into your profile for future ad opportunities.

Web browsing users haven’t taken all of this sitting quietly. The Economist calls data “the new oil,” and browsers who are unwilling to be seen as gushers download ad blockers. As of early 2020, all of the major web browsers have incorporated tracker-blocking features or announced plans to limit third-party cookies.

Arvind Narayanan and his team at Princeton University have set up a laboratory for web measurement¹² and discovered new techniques for browser tracking. Through web “crawls,” they find tracking techniques used in the wild to identify users and reidentify those who think they’ve cleared all previous interactions. One of the paradoxes of privacy on the Web is that browsers can be fingerprinted by their unique features, including features the user might enable with the goal of securing greater privacy. That means turning on such protections can make the privacy-seeking user stand out. In such cases, privacy depends on the actions of many to provide a crowd in which the privacy-seeking browser can blend. Standardized processes and well-thought-out default settings are necessary to preserve the opportunities for privacy.

Target Knows You’re Pregnant

In 2012, as Charles Duhigg reported in the New York Times,¹³ a man walked into a Minneapolis-area Target store, furiously asking to speak with the manager: “My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”

The store manager apologized to the Minneapolis man for their apparent mistake, but he returned a few weeks later with an apology of his own: His daughter was, in fact, pregnant. The store’s predictive models had recognized the young woman’s pregnancy even before her father had. Target’s models didn’t have access to her private information. They had the power of analytical tools and readily available data.

Like many other stores with loyalty cards or user accounts, Target built statistical models of shopper behavior to predict hot products for inventory and pricing and to make recommendations. Target correlated shopper purchase history based on an internal guest ID and purchased external data to supplement its logs. From those records, the company’s statistician could derive patterns, noticing, for instance, that women in the second trimester of pregnancy would often purchase unscented moisturizing lotions and supplements. After watching this pattern play out many times, the store could anticipate future purchases of baby clothes and diapers from the earlier unscented lotion—and advertise to the mother-to-be at a time when her shopping habits were in flux—responding to a signal she didn’t even know she was sending.

How can we solve a privacy problem that results from many developments, no one of which is really a problem in itself?

You Pay for the Mic, We’ll Just Listen In

Planting tiny microphones where they might pick up conversations of underworld figures used to be risky work for federal authorities. There are much safer alternatives, now that most people carry their own radio-equipped microphones with them all the time or invite Alexa, Siri, Cortana, or Google into their homes.

Many cell phones can be reprogrammed remotely so that the microphone is always on and the phone is transmitting, even if you think you have powered it off. The FBI used this technique in 2004 to listen to John Tomero’s conversations with other members of his organized crime family. A federal court ruled that this “roving bug,” installed after due authorization, constituted a legal form of wiretapping. Tomero could have prevented it by removing the battery, and now some nervous business executives routinely do exactly that.

The microphone in a General Motors car equipped with the OnStar system can also be activated remotely, a feature that can save lives when OnStar operators contact the driver after receiving a crash signal. OnStar warns, “OnStar will cooperate with official court orders regarding criminal investigations from law enforcement and other agencies,” and indeed, the FBI has used this method to eavesdrop on conversations held inside cars. In one case, a federal court ruled against this way of collecting evidence—but not on privacy grounds. The roving bug disabled the normal operation of OnStar, and the court simply thought that the FBI had interfered with the vehicle owner’s contractual right to chat with the OnStar operators!

Danielle, an Amazon Echo customer in Portland, Oregon, was alarmed by a call from one of her husband’s colleagues, who said, “Unplug your Alexa devices right now; you’re being hacked.”¹⁴ The gadget, which was supposed to record only when triggered by the wake word “Alexa,” must have heard both that and a “send message” command in Danielle’s conversation. Her chat about hardwood floors turned into a voice message to a business acquaintance. A freak occurrence, perhaps, but one that may be repeated as we invite tiny networked recorders into more corners of our lives. German authorities banned “My Friend Cayla,”¹⁵ a talking doll, over concerns about its spying and data-collecting abilities. To engage in conversation with children, Cayla uploaded the sounds she heard over the Internet. German parents were told to destroy the “illegal espionage apparatus.” Meanwhile, here in the United States, your smart TV may be watching your viewing habits to tailor advertising. Vizio’s CTO told the Consumer Electronics Show that TVs would cost more if it weren’t for this revenue stream.¹⁶

Venmo: It All Adds Up

Earlier we discussed the tracking that credit cards enable in credit reporting bureaus and data analysis firms. Newer payment technologies bring the reporting directly to you. Venmo lets you send someone money or split a bill by entering the person’s phone number. It’s so easy that as you send money to friends or roommates using the Venmo app, you might not notice that these payment transactions are public, including any memo you write along with the payment. A researcher who found the feed correlated just a few of the threads among millions of transactions into “Venmo stories”:¹⁷ a student’s fast food habit, a cannabis vendor’s sales, a budding relationship? You might not mind sharing your passion for elote (seasoned corn) but might feel differently about recreational marijuana purchases, even in states where those are legal. The researcher, Hang Do Thi Duc, anonymized the details but notes that the feed, which includes everything except dollar values, remained accessible to any visitor to Venmo’s public API. (Every page of the site Duc developed, publicbydefault.fyi, encourages Venmo users to change their privacy settings from the default to make transactions private between sender and recipient.)

DNA: The Ultimate Digital Fingerprint

In April 2018, the state of California arraigned Joseph James DeAngelo on a series of decades-old murder and rape charges. The Golden State Killer had been a cold case until an investigator uploaded DNA from a crime scene to a public genealogy website, GEDmatch. The investigator created a fake profile for the unknown person whose recovered DNA he uploaded. After GEDmatch compared this person’s DNA against its existing database to identify partial genetic matches, it showed profiles of people who were likely distant relatives of the suspected killer. Those names led to family trees and to genealogy that could be traced further through census records, obituaries, gravesites, and commercial and law enforcement databases. After these searches put a name to their suspect, investigators confirmed their suspicions by tracking him down and obtaining another DNA sample, from skin cells he left on the car door when he parked in a Hobby Lobby parking lot. That DNA matched the original crime scene samples.¹⁸

DeAngelo had not posted to the ancestry site, but because a parent passes roughly half of his or her genes to a child (notwithstanding a few mutations along the way), much of DeAngelo’s genetic record could be read or revealed by relatives. If your family members explore their genetic profiles and family trees on GEDmatch, they are also exposing information about traits you might share. Your privacy can be invaded through no actions of your own. While the Genetic Information Nondiscrimination Act prohibits employers or health insurers from discriminating based on DNA, the law doesn’t restrict numerous other ways DNA can be used.
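The arithmetic behind that exposure is simple enough to sketch in Python. Each parent-to-child step halves the expected amount of shared DNA, and relatives connected through two common ancestors add the contributions of both paths. These figures are averages only; actual sharing varies from one pair of relatives to the next, and the sketch makes no claim about how GEDmatch itself scores matches.

```python
from fractions import Fraction

def expected_shared(paths):
    """Expected fraction of DNA shared between two relatives.

    Each entry in `paths` is the number of parent-child steps along one
    line of descent connecting the two people through a common ancestor.
    """
    return sum(Fraction(1, 2) ** steps for steps in paths)

relationships = {
    "parent and child": [1],
    "grandparent and grandchild": [2],
    "full siblings": [2, 2],    # one path through each shared parent
    "first cousins": [4, 4],    # one path through each shared grandparent
    "second cousins": [6, 6],   # one path through each shared great-grandparent
}

for name, paths in relationships.items():
    share = expected_shared(paths)
    print(f"{name}: {share} ({float(share):.1%})")
```

Even a second cousin, expected to share only about 3 percent of a person's DNA, can be enough to point a genealogist toward the right family tree.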

The Golden State Killer case started a boom in DNA forensic genealogy. By the end of 2018, more than a dozen violent criminals and perpetrators of sexual assault had been identified through GEDmatch. But the site also heard privacy alarms and changed its terms of service to prohibit law enforcement matching of DNA profiles unless users opted in for their own records.
