Collecting Analytics from (almost any) Website
- Planning an Approach
- Planning the Workbook Structure
- The VBA Code
- The Analysis Sheets
Save 35% off the list price* of the related book or multi-format eBook (EPUB + MOBI + PDF) with discount code ARTICLE.
* See informit.com/terms
IN THIS CHAPTER
Planning an Approach
Planning the Workbook Structure
The VBA Code
The Analysis Sheets
The word analytics connotes, among other notions, the idea that the raw data that gets analyzed includes quantitative measures of online browsing behavior. Data collection instruments such as Adobe Analytics and Google Analytics are useful in part because they track a variety of behavior—hits, views, downloads, buys, and so on—along with information about the source of the visit. However, there are times that you want to measure product performance but can’t access the traffic data.
Suppose you supply a product that another company or companies resell on their own websites. Although those companies might share their web traffic information with you, it’s more likely that they regard it as proprietary. In that case, if you want to analyze end users’ purchasing behavior, you might be limited to any data that the resellers’ websites make generally available.
That’s the position that I’m in when it comes to sales of my books by web resellers such as Amazon. Although I hear rumors from time to time that Amazon shares data about actual sales with book authors, suppliers of music, and manufacturers of tangible products, it hasn’t reached me in any particularly useful form. Maybe I’m too far down the food chain. Whatever the reason, I have to roll my own.
Fortunately, one of the sales and marketing devices that Amazon employs is product rankings. As to books, there are a couple of different rankings: overall sales, and sales within categories such as Books > Computers & Technology > Software > Microsoft > Microsoft Excel. Although as an author I hope that my books achieve a high sales ranking in a category—that gives them greater visibility—I really hope that they achieve a nice, high overall sales ranking, which I believe to bear a closer relationship to actual sales.
So what I want to do is create a means of accessing information about books’ sales rankings (including titles that compete with mine), save the data in a form that I can use for analysis, and make inferences about the results of the analysis.
In this chapter, I show you how to do just that: get analytics without the active cooperation of the website. The techniques don’t require that you be interested in book sales. They can be used for anything from product sales to stock prices to yards gained per pass attempt.
Planning an Approach
Before you undertake something like the project I’m about to describe, there are a few issues you should keep in mind. If you can’t think of a satisfactory way to deal with them, you might want to consider taking a completely different approach than the one I describe here.
A Meaningful Variable
The most important issue is the availability of one or more variables on a website that bear on what you’re interested in, even if only indirectly.
For example, Amazon does not publish on a publicly accessible web page the number of copies of a book it has sold, whether actual, physical books or downloaded electronic copies. But as I mentioned in the prior section, Amazon does publish sales rankings. For the project described here, I decided that I could live with the sales rankings as an indicator of sales figures.
I also learned that although Amazon usually updates the sales ranking every hour, sometimes the updates don’t take place. Sometimes they’re a little late. Sometimes several hours pass without any update occurring. Sometimes days go by when the updates come every two hours instead of hourly. And sometimes the rankings don’t entirely make sense; particularly in the wee hours, a ranking can drift from, say, 20,000 at 4:00 a.m. to 19,995 at 5:00 a.m. to 19,990 at 6:00 a.m. and so on. That kind of movement can’t reflect sales, and the deltas are much smaller and more regular than at other times of day. But I found that most of the time the updates take place hourly—give or take a couple of minutes—and correspond either to no sales (the rankings get larger) or to a presumed sale (the rankings get smaller, often by a generous amount).
I have also found that on occasion the Amazon site does not recognize a query made by software other than a browser such as Google Chrome, Firefox or Internet Explorer. This sort of “outage” can last for several hours, and can come and go over a period lasting several days. During these time spans it appears that the only option is to arrive at an Amazon page manually, via one of those browsers. Otherwise I get no useful results.
Identifying Sales
I decided that I could live with changes in rankings as a stand-in for actual sales. I also decided that I could make an assumption about an increase in ranking, such as from 25,000 to 20,000. That’s a big enough jump that I can assume a sale took place. I can’t tell for sure how many units were sold. But these books don’t sell like an adventure novel. Amazon sells four or five copies of one of my books each day. So when I see an improvement in ranking of a few thousand ranks, it almost certainly indicates the sale of a single unit.
Given that information (or, maybe more accurately, educated guesses), it’s possible to get a handle on what you need to include in a workbook to access, analyze, and synthesize the data sensibly.