Getting Digital
We are blind to the digital world. Unassisted, we have no way of knowing whether our website is thronged with visitors or as empty as a mall after closing time, whether our cash registers are overflowing or stubbornly silent, whether our customers are young or old, whether our content is read with rapt attention or barely skimmed. We need eyes and ears to help us see into the digital world. Certain tools have that very function: to track and make visible the otherwise unseen patterns of that world. These digital analytics tools are powerful and rich. They include hundreds or thousands of possible reports and options that seem to expose every aspect of digital behavior. It’s all too easy to forget how dependent we are on the exact nature of those tools and to assume that what they show us, and the way they show it to us, is all there is.
Our natural senses in the physical world have evolved to give us many advantages. We have adopted a deep and abiding faith in what we see. Yet even with our physical senses, it’s all too easy to forget that the window they provide into the world is a narrow one.
Remember the image of a dress that went viral in early 2015? Many people see the dress as black and blue. Others see it as white and gold. If you look long enough or over some period of time, you might see it each way. If you didn’t hear about the dress and you can’t believe that anyone could see it differently from whichever way happens to strike you, check it out online and show it around. You’ll be surprised.
Optical illusions are just one aspect of how our eyes can mislead us. We see color (no matter how much we might disagree about it), but we don’t see heat.
Why should we see heat?
Well, why shouldn’t we?
Infrared cameras see heat. Infrared is just another band of wavelengths, and for many purposes, seeing heat is far more useful than seeing visible light (when hunting at night, for example). For that matter, what if we could see radio waves? Hearing and vision seem fundamentally different to us, but each depends on waves that a different organ in our body detects. What would our radio-wave eyes make of a Madonna song? Probably not much.
The simple fact is this: Our reality is constrained by the tools we experience it with.
What does all this have to do with digital analytics? The digital analytics tools we have are our windows into the digital world. We see only what they can track or what their designers thought was important. We see what page a user requested from a server, but we don’t see how long that page took to load. We see what link a user clicked on, but we usually don’t see what part of the page that user scrolled to. We (sometimes) see what website the user came from, but we don’t (usually) see what website that user went to. These choices make a profound difference in how we think about the digital world and what we tend to value as important there.
What if our tools aren’t very good? What if the events they choose to capture or the ways they choose to show them to us give us only a shadowy impression of the real digital world or what’s important inside it?
I’ve been around since the very beginning of digital analytics. I witnessed firsthand and, in some small ways, even helped shape how those digital tools evolved. Having seen their history, I know that the decisions about what to track in the digital world and how to track it were often ad hoc and shallow.
The first digital analytics tools were built to read weblogs, the log files that web servers write. These logs weren’t built to understand and measure the digital world. They were built to create a record of what a web server was doing so that IT professionals might be able to track down operational problems (although they were hardly ever used for that, either). These logs recorded IT-focused information about which content file was requested, when exactly it was processed, what IP address requested it, how much content was sent, and whether the request was successful.
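To make that list of fields concrete, here is a minimal sketch of reading one log record. It assumes the widely used Common Log Format, which the text above does not name, and the sample line and the `parse_log_line` helper are invented for illustration.

```python
import re
from datetime import datetime

# Assumed layout: Common Log Format (the chapter does not specify a format).
CLF_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the IT-focused fields of one log record, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    status = int(match.group("status"))
    size = match.group("size")
    return {
        "ip": match.group("ip"),                      # what IP address requested it
        "time": datetime.strptime(                    # when it was processed
            match.group("time"), "%d/%b/%Y:%H:%M:%S %z"
        ),
        "path": match.group("path"),                  # which content file was requested
        "bytes_sent": 0 if size == "-" else int(size),  # how much content was sent
        "succeeded": 200 <= status < 400,             # whether the request was successful
    }


# Hypothetical sample record, made up for this example.
SAMPLE_LINE = (
    '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326'
)
record = parse_log_line(SAMPLE_LINE)
print(record)
```

Notice what is absent: nothing in the record says who the person was, what they wanted, or whether they got it.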
Because those were the fields in the logs, those were the fields we used when we first built digital analytics tools. And being clever folk, we inferred a whole lot from this bare-bones little set of fields. We figured out a way to group the records by the device requesting them (which we promptly anthropomorphized into a human visitor). By looking at the time between requests, we could group the requests into batches by creating an arbitrary time limit, and we labeled these batches of requests “visits.” Then we could look at what page a visitor looked at first in that batch and we called that an entry page. We could also look at what page was last in the batch and call that an exit page.
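The grouping just described can be sketched in a few lines. This is an illustration under stated assumptions, not the code any vendor shipped: the `group_into_visits` function, the sample requests, and the use of an IP address to stand in for a device are all hypothetical, and the 30-minute default is simply the limit that became conventional.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_into_visits(requests, timeout=timedelta(minutes=30)):
    """Group (device, timestamp, page) requests into visits.

    A new visit starts whenever the gap between two consecutive requests
    from the same device exceeds the timeout.
    """
    by_device = defaultdict(list)
    for device, timestamp, page in requests:
        by_device[device].append((timestamp, page))

    visits = []
    for device, hits in by_device.items():
        hits.sort()  # order each device's requests by time
        current = [hits[0]]
        for previous, hit in zip(hits, hits[1:]):
            if hit[0] - previous[0] > timeout:
                visits.append((device, current))  # close the batch
                current = []
            current.append(hit)
        visits.append((device, current))
    return visits


# Hypothetical requests, invented for this example.
start = datetime(2000, 1, 1, 9, 0)
requests = [
    ("10.0.0.1", start, "/home"),
    ("10.0.0.1", start + timedelta(minutes=5), "/products"),
    ("10.0.0.1", start + timedelta(minutes=50), "/contact"),
    ("10.0.0.2", start + timedelta(minutes=1), "/home"),
]

visits = group_into_visits(requests)
for device, hits in visits:
    entry_page = hits[0][1]   # first page in the batch
    exit_page = hits[-1][1]   # last page in the batch
    print(device, entry_page, exit_page, len(hits))
```

Change the `timeout` argument and the number of visits, and every entry and exit page with it, changes too.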
It’s important to realize just how arbitrary these decisions were. When a visitor first arrives on a website, that website sometimes records which website the visitor came from—this is called the referring site. By saving the referring site for each batch of records (a visit), you can get a sense of which sites are generating traffic to your pages. But here’s a peculiarity: By defining an arbitrary time limit of 30 minutes to group records, we created situations in which a visit sometimes had more than one referring site; in other situations, a visit had a referring site that was the last page the visitor viewed on the same website.
For example, imagine that a visitor searches on Google or Bing, finds your website, and views a page. Then that visitor returns to the search engine, does another search, goes to a different site, and links from there to you within 30 minutes of the first request. You’ll have a single visit with two referring sites. This might sound far-fetched, but in certain permutations, it’s not uncommon. Many visits will include multiple trips back to Google interspersed with views of your pages.
It’s even more likely, especially in our tabbed browser world (which came after all these definitions were created—you remember browsers without tabs, right?), that a visitor will view a page or tab elsewhere, spend some time there, and then return to your website and view another page. Same session? According to our tools, if that happens 25 minutes apart, it is. But if it happens 31 minutes apart, it isn’t. And if it does happen 31 minutes apart, you’ll have a whole new visit with a referring site of your own website!
It would have been perfectly plausible (maybe much more plausible) to decide that a new batch of records should begin whenever a request arrives with a referring site other than your own domain, regardless of time. But that’s not the way some early vendors did it, so the time-based definition stuck and came to be treated as truth.
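A small comparison shows how much rides on that choice. The sketch below applies both rules to the two scenarios described above. Everything in it is hypothetical: the `example.com` domain, the function names, and the sample hits are invented, and treating a missing referrer as the start of a new visit is my assumption rather than anything a vendor specified.

```python
OWN_DOMAIN = "example.com"  # hypothetical site used for illustration


def split_by_timeout(hits, timeout_minutes=30):
    """Start a new visit when the gap since the previous hit exceeds the timeout."""
    visits = []
    for hit in hits:
        if visits and hit["minute"] - visits[-1][-1]["minute"] <= timeout_minutes:
            visits[-1].append(hit)
        else:
            visits.append([hit])
    return visits


def split_by_referrer(hits, own_domain=OWN_DOMAIN):
    """Start a new visit whenever the referrer is not our own domain.

    Assumption: a missing referrer (None) also counts as a new visit.
    """
    visits = []
    for hit in hits:
        came_from_outside = hit["referrer"] != own_domain
        if visits and not came_from_outside:
            visits[-1].append(hit)
        else:
            visits.append([hit])
    return visits


# Scenario 1: two arrivals from different sites, ten minutes apart.
two_arrivals = [
    {"minute": 0, "page": "/landing", "referrer": "google.com"},
    {"minute": 10, "page": "/article", "referrer": "othersite.example"},
]

# Scenario 2: the visitor wanders off to another tab for 31 minutes.
tab_break = [
    {"minute": 0, "page": "/home", "referrer": "google.com"},
    {"minute": 31, "page": "/pricing", "referrer": OWN_DOMAIN},
]

for name, hits in [("two arrivals", two_arrivals), ("tab break", tab_break)]:
    by_time = split_by_timeout(hits)
    by_ref = split_by_referrer(hits)
    print(name, "| timeout rule:", len(by_time), "| referrer rule:", len(by_ref))
```

Under the timeout rule, the first scenario is one visit with two referring sites and the second is two visits, the latter "referred" by your own website. Under the referrer rule, the counts flip. Neither answer is obviously the right one, which is the point.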
And if a visit is merely a rough-and-ready and poorly defined artifact, then so are the entry page (the first page in a visit), the exit page (the last page in a visit), the visit time (the time between the first and last requests that are part of a visit), and the referring site (the domain recorded in the first record of the visit as the referrer, the site from which the user came)—all based on the way we defined a visit.
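To see how completely these metrics depend on the visit, consider a rough sketch that derives all four from a single batch of records. The `summarize_visit` function and the sample hits are hypothetical, written only to mirror the definitions in the paragraph above.

```python
from datetime import datetime


def summarize_visit(hits):
    """Derive the standard metrics from one visit.

    hits: time-ordered list of dicts with 'time', 'page', and 'referrer' keys.
    """
    first, last = hits[0], hits[-1]
    return {
        "entry_page": first["page"],          # first page in the visit
        "exit_page": last["page"],            # last page in the visit
        "visit_seconds": (last["time"] - first["time"]).total_seconds(),
        "referring_site": first["referrer"],  # referrer on the first record
    }


# Hypothetical visit, invented for this example.
visit = [
    {"time": datetime(2000, 1, 1, 9, 0), "page": "/home", "referrer": "google.com"},
    {"time": datetime(2000, 1, 1, 9, 4), "page": "/products", "referrer": "example.com"},
    {"time": datetime(2000, 1, 1, 9, 10), "page": "/checkout", "referrer": "example.com"},
]
summary = summarize_visit(visit)
print(summary)

# A visit with a single request has the same first and last record.
one_page_summary = summarize_visit(visit[:1])
print(one_page_summary["visit_seconds"])
```

Every value comes from the first or last record of the batch, so moving the boundary of a visit moves all of them. Note also that the visit time cannot include time spent on the final page, and a one-page visit has a visit time of zero, however long the visitor actually lingered.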
As with the words we read on a page, the numbers we see in a tool tend to take on privileged status in our minds. But if the only problem with standard web metrics were a certain sloppiness of definition, our situation wouldn’t be all that bad. How much difference can it make whether a visit is defined by 30 minutes of inactivity or a new referring domain? Hard to say.
By showing how arbitrary these standard metrics are in construction, I hope to lessen their privileged status and make it easier to convince you that they are not only arbitrary but also largely misguided.
Web measurement began with weblogs, whose goal was to measure digital assets. This long-ago bias has persisted through every generation of digital analytics tools. The implicit goal of analytics tools is to measure the website or the app. That misses, if not the whole point, a big part of it. In the digital world, our goal should be to understand our customers, not our digital assets.
Our tools have improved to do that—probably more than our practice has. Digital analytics tools now deliver significant and interesting segmentation capabilities that enable you to define and track cohorts, segment on fairly complex behaviors, and compare different types of users. They even provide limited capabilities for integrating nondigital data into their reporting.
This is all good, and the technology seems to improve continually. But although the capabilities of the tools have improved, the basic views they provide haven’t changed much. How many reports in a digital analytics tool tell you anything about customers? For that matter, how many of the digital reports you distribute in your organization have anything to do with customers? And how much do they really help you understand the digital world?
Close your eyes and picture your website. Imagine people of all sorts moving through it. They stop here or there. They go down certain pathways and ignore others. They look at this or that. They make a purchase or head for the exit. Can you see it?
Now open your web analytics tool. Do the reports help you visualize that scene? Do they help you understand who your customers are (beer and chips, or milk and eggs)? Can you find the different types of visits (going down every aisle, or just ran out of something) and see which are most common? Do they help you picture which customers do which visits most often (beer and chips, just ran out, Friday night)? In other words, do they actually help you understand the digital world or do they just confront you with a wall of numbers that elude meaning, even while seeming entirely plausible?
I first started measuring the digital world back when websites were brand new and people still talked about the World Wide Web. I’d spent the previous few years working with a couple of large credit card companies, analyzing the way people use their credit cards so that we could create marketing programs (yeah, sorry about all that crappy mail). We used some pretty fancy analysis techniques to group people together, to understand how they used their credit cards, and then to classify them. It was pretty easy and powerful. It didn’t take analytic genius to know that the cardholder who routinely dropped four figures at Neiman Marcus was a different beast from the two-digit shopper at the local Walmart.
When I first got my hands on web behavioral data, I ran the same kind of (neural net) analysis and proudly sold the results. But whereas my credit card segmentations had truly been interesting and useful, my digital segmentations looked like some inverted Egyptian monstrosity (see Figure 1.1).
Figure 1.1 Inverted pyramid
Nobody ever got rich targeting “people who viewed 3–5 pages.”
I spent years learning that the metrics digital tools provide aren’t that interesting and that, no matter how powerful a tool I used to study digital behavior, I wouldn’t get interesting results if I picked the wrong type of variables.
So put away your digital analytics tools for a minute. Forget all about page reports, referring sites, average page times, top exit pages, number of visits, average conversion rates, and the whole flavorless cornucopia of web metrics and reports that those tools spit out by default. It’s all garbage in the most literal sense—it takes up mental space and it smells bad.
You’re about to find a better way to understand the digital world.