The VBA Code
This section walks you through the VBA code that I use to update my data collector hourly. It runs itself, so I can get it going, walk away, and not come back for hours or days. You can start by using Alt+F11 to open the Visual Basic Editor. Then arrange for a fresh module by choosing Insert, Module. The code that’s shown in this chapter, or your own VBA code, goes into the new module.
Although the figures in this chapter indicate that I’m obtaining data on several books, I’m limiting the code used here to just two. You can extend it to more products than two if you want.
I should also note that I’ve revised the code somewhat to make clearer what’s going on. A number of modifications would structure the code a bit more tightly and run a little faster. But they tend to obscure the general flow of the task.
- Option Explicit
I always specify Option Explicit at the top of my VBA modules. Doing so forces me to declare variables explicitly, often with Dim statements. For example:
- Dim NextRow as Integer
Dim, short for dimension, informs VBA that a variable named NextRow is intended to exist in the module’s code: The variable is declared to exist. Later on, if I mistype the name of the variable in the code
- NextRoe = NextRow + 1
for example, VBA complains that I’m trying to use a variable that I haven’t declared. If I don’t have Option Explicit at the top of the module, VBA assumes that I meant to implicitly declare a new variable, NextRoe, and assign to it the current value of NextRow + 1. The implicit, on-the-fly declaration of variables was common programming practice 40 years ago. VBA still tolerates it unless you protect yourself with Option Explicit.
The DoItAgain Subroutine
This very short, three-statement subroutine causes the main GetNewData subroutine to run an hour from now:
Private Sub DoItAgain() Application.OnTime Now + TimeValue("01:01:00"), "GetNewData" End Sub
The DoItAgain subroutine uses VBA’s OnTime method to schedule the time when a procedure is to run next. The time is specified first, using the fragment Now + TimeValue("01:01:00"). VBA has a function named Now that returns the current system date and time. VBA has another function named TimeValue that converts a string to a time value.
So, Now + TimeValue("01:01:00") gets the time as of right now and adds to it one hour, one minute, and zero seconds.
The second argument to OnTime as used here is GetNewData, which is the code’s main procedure. Its name must be enclosed in quotes when used as an argument.
The full statement instructs VBA to run the procedure named GetNewData, 61 minutes after the OnTime method is invoked.
I settled on an hour and a minute after some experimenting. Amazon states that it updates the rankings hourly, and that seems to be nearly correct. Sometimes the update doesn’t happen exactly when it should, and sometimes several hours pass before the next update takes place. These events are mildly atypical, though, and something close to an hour is normal.
However, it takes a bit of time for my code to run: to acquire new data by way of the queries and to test the changes in rankings for evidence of a sale. So I add a minute to the one hour in order to catch back up with Amazon’s schedule.
The GetNewData Subroutine
Here’s the main procedure: GetNewData. It executes web queries, obtains sales rankings, and copies and pastes formulas that convert rankings to assumed sales figures.
Sub GetNewData() Dim NextRow As Integer, NextRank As Long Dim UnitsLeft As Integer, NextLeft As Integer
I begin by declaring the variables I need in this procedure:
- NextRow is the worksheet row where I write the next set of query data.
- NextRank contains the next sales ranking for a given book.
- NextLeft contains the number of units that Amazon has left in stock, if that figure is reported on the book’s Amazon page.
Application.ScreenUpdating = False Application.Calculation = xlCalculationManual
I don’t like to be distracted by the screen if I happen to be looking at it when the code changes the active worksheet. So I turn off screen updating and the Excel window stays put until the code has finished running, or until a statement turning screen updating back on is executed (you set its value to True instead of False).
Also, I don’t want formulas recalculating as the queries write new data to the worksheets; I let them recalculate after the queries have finished running. So, I set Excel’s calculation property to manual—later on I reset it to automatic.
RefreshSheets "Stats" RefreshSheets "BAXL Kindle"
Now I call the RefreshSheets subroutine with the name of the query sheet as the argument. I call it once for each query sheet. As noted earlier, I actually call this subroutine eight times but I don’t want to subject you to all eight calls here—what happens is largely the same in each call. The RefreshSheets subroutine is shown and described following this main GetNewData subroutine at the end of the current section on the VBA code.
- NextRow = ThisWorkbook.Sheets("Summary").Cells(Rows.Count, 1).End(xlUp).Row + 1
The code needs to find the next available row on the Summary sheet so that it can write the new data from the queries to that sheet without overwriting existing records. Here’s how it does that.
If you’re using an Excel worksheet directly, one of the ways to move around is with Ctrl + arrow, where arrow means up or down arrow, right or left arrow. If the active cell is in a column of data, for example, you can find the last contiguous non-empty cell with Ctrl + Down Arrow. If you’re at the bottom of a column, something such as A65536, and that cell is empty, you can find the lowermost non-blank cell with Ctrl + Up Arrow.
Using VBA, the way to emulate that keyboard action is with the End property, which belongs to a cell object (really a range object because a cell is a special instance of a range). So, Cells(Rows.Count,1).End(xlUp) tells VBA to go to the final row in Column A and act as though you had pressed Ctrl + Up Arrow. That takes VBA to the lowermost non-empty cell, assuming as I do that the cell defined by the final row of Column A is itself empty.
Then all I have to do is get the row where the non-empty cell is found, add 1 to the row number, and I have the row number that I need to write into next. The result is assigned to the variable NextRow.
I do have to tell VBA which cell to regard as the active cell, and I do that by citing the Summary worksheet, and directing VBA’s attention to that worksheet’s final row, first column. I start things out with ThisWorkbook just in case I have more than one workbook open and some other workbook is active at the time that the code starts to run. The ThisWorkbook object specifies the workbook that contains the code that is running.
- ThisWorkbook.Sheets("Summary").Cells(NextRow, 1) = Now
Having found the next row to write in, I put the current date and time into the Summary worksheet, in the first column of that row.
- With ThisWorkbook.Sheets("Summary")
The With statement initiates a block of code in which objects that are preceded by a dot, such as .Cells(NextRow,2), are deemed by VBA to belong to the object named in the With block—here, that’s the worksheet named Summary.
- NextRank = GetRank("Stats", " in")
The GetRank procedure is called with the name of the query sheet named Stats, and the string " in", which helps GetRank locate the sales ranking that has been retrieved by the query for the statistics book.
The GetRank procedure, which is actually a function written in VBA, is discussed later in this section. The value returned by that function, which is the sales rank of the book in question, is assigned to the variable NextRank.
- .Cells(NextRow, 2) = NextRank
Then the value just assigned to NextRank is written to the cell at the intersection of the NextRow-th row and column 2. (Column 2 is specific to and reserved for sales rankings of the statistics book. When queries on other books are run later in the code, the value assigned at that point to NextRank is written to a different column.) Because of the earlier With statement, the cell in the NextRow-th row and the second column is taken to belong to the worksheet named Summary in the workbook that contains the running code.
- NextLeft = GetUnitsLeft("Stats")
Now the code gets into some functionality that I have abandoned because it’s a field that isn’t updated with any frequency by Amazon—not, at least, for the books I’m interested in tracking. When Amazon’s inventory of a book gets down to a certain level, often called the reorder point, the book’s web page gets a text string along the lines of “Only 5 left in stock—order soon (more on the way).”
It occurred to me that by quantifying the number of sales between the dates when this message disappeared, and then reappeared, I might be able to calculate the number of copies Amazon ordered from the publisher. So, for a while I tracked that string, but then it became clear that for some reason Amazon was not putting that message up for the books I was most interested in. Furthermore, the message probably has no relevance to a book’s Kindle edition. So I dropped the code, but I have restored it here in case you are interested in seeing how to deal with a message or other web data that might or might not be on a web page at the time you query it.
I get the figure by invoking another function in my code, GetUnitsLeft, which, like GetRank, takes the name of the book’s query sheet as an argument. The GetUnitsLeft function is also discussed below. It returns a value to the NextLeft variable.
If NextLeft <> -1 Then .Cells(NextRow, 20) = NextLeft End If
If information about the number of units left in stock is missing, NextLeft has been set equal to -1 and the code goes on to the next task. But if NextLeft has some value other than -1, that value is written to the cell in NextRow-th row and the twentieth column. Again, column 20 is reserved for units-on-hand information about the statistics book.
The next two statements get the sales rank for the book whose query sheet is named “BAXL Kindle” and write the sales rank to the NextRow-th row, third column on the Summary sheet:
NextRank = GetRank("BAXL Kindle", " Paid in") .Cells(NextRow, 3) = NextRank
Notice that the second argument to the GetRank function is now " Paid in" instead of " in". The reason is that this is a Kindle book, and Amazon follows the sales rank with " Paid in" rather than simply " in". The string passed to the GetRank function is used to help strip out the actual ranking from its surrounding verbiage; you find out how when we get to that function.
In the prior section titled “Identifying Sales,” I discussed how some worksheet formulas can be used to decide if a sale of a book occurred by comparing two consecutive rankings of the same book. Now the VBA code selects, copies, and pastes those formulas into the next row down. The newly pasted formulas then recalculate based on the differences between the most recent and the now-current rankings. Begin by activating the Summary worksheet:
- .Activate
Select the cells that contain the formulas to copy. They are found in the prior row—that is, in the row that precedes NextRow, where we are now writing the current query data. Select the cells in that row from the tenth column to the seventeenth:
- .Range(.Cells(NextRow - 1, 10), .Cells(NextRow - 1, 17)).Select
Autofill the selection one row down. The addresses used by the formulas automatically adjust so that they point to the newly current row:
Selection.AutoFill Destination:=.Range(.Cells(NextRow - 1, 10), _ .Cells(NextRow, 17)), Type:=xlFillDefault
Notice that the destination of the autofill includes the row NextRow-1 as well as NextRow. It’s a minor peculiarity of the Autofill method that the destination range must include the source range. In this case, the source range is the row NextRow-1 (columns 10 through 17) and that range is to be autofilled into the row NextRow (columns 10 through 17). But that destination range as specified in the VBA statement must include the source range.
.Rows(NextRow + 1).Select Selection.Insert Shift:=xlDown .Cells(NextRow, 1).Select
The prior three statements insert a blank row following the NextRow row—that is, the row that the code has just populated with sales rankings and sales formulas. The reason is that there is a table of analyses (discussed later in this chapter) that are found a few rows below the rankings. It’s helpful to push that table down a row when a new row has been populated with rankings.
- End With
Terminate the With block that deems any object beginning with a dot (such as .Cells) to belong to the Summary worksheet in ThisWorkbook.
- Application.Calculation = xlCalculationAutomatic
Turn automatic calculation back on.
- ThisWorkbook.Save
Save the workbook.
- Application.ScreenUpdating = True
Turn screen updates back on.
- DoItAgain
Run the three-statement DoItAgain subroutine, discussed earlier, that causes the GetNewData subroutine to run again an hour from now.
- End Sub
End the GetNewData subroutine.
The GetRank Function
Here’s the function named GetRank. Notice that it has two arguments, SheetName and TrailingString. The values are passed to the function by the statement that calls it. For example:
- NextRank = GetRank("Stats", " in")
where "Stats" is the worksheet name and " in" is a string of characters from the web page that follows—that is, trails—the actual ranking.
- Function GetRank(SheetName As String, TrailingString As String) As Long
The function returns a value to the statement that called it. That value is often of a certain type: integer, a text string, a decimal number, and so on. To accommodate that, the function itself is declared to be of a certain type. Here, that type is Long. In VBA, Long means an integer value that might be much larger than the maximum value for a regular integer, which tops out at 32,767. There are many more books than that in Amazon’s inventory, and this function is intended to return a book’s ranking. Therefore, to accommodate the possibility that the rank is greater than 32,767, I declare the function as type Long, which can return a value greater than 2 billion—more than large enough.
Dim FoundString As String Dim StartPos As Integer, StopPos As Integer
Three variables are declared. FoundString holds the value of the sales rank. It’s declared as a string because when the code strips it out of the surrounding text, it’s a string. Later it’s converted to a Long integer.
StartPos and StopPos determine the starting and stopping positions of the sales rank within the string. For example, this string:
- Amazon Best Sellers Rank: #64,788 in Books
has its first numeric representation in character number 28 of the string, so StartPos is assigned the value 28.
- On Error GoTo EarlyOut
This code is designed to run 24/7, so I can’t expect myself to nurse it along if it runs into an error that it can’t recover from. The On Error statement tells VBA that if it encounters an error such as a letter right in the middle of what is supposed to be a numeric sales ranking, it shouldn’t terminate processing. Instead, transfer control to a point named EarlyOut. That way, processing can continue. Even if it happens at 2:00 a.m., I can check it out at 9:00 a.m. and fix whatever happened, and I won’t necessarily have lost data from the intervening seven hours.
The next five statements are responsible for finding the cell in the worksheet that contains the sales ranking, stripping the numeric ranking out of that cell, and assigning the result to the GetRank function.
First, find the cell that contains the string "sellers Rank:". The rank we’re looking for comes directly after that string.
Cells.Find(What:="sellers Rank:", LookIn:= _ xlFormulas, LookAt:=xlPart, SearchOrder:=xlByRows, _ SearchDirection:= xlNext, MatchCase:=False).Activate
The Find method is used to locate the string. Most of the arguments, such as LookIn and LookAt, are there to keep the settings in place for the Find dialog box in Excel’s user interface. I don’t much care about forcing a Case match, so long as I get the string I’m looking for. Finally, I use the Activate method at the end to make the found cell active.
- FoundString = ActiveCell
For convenience in subsequent handling, I set the string variable FoundString to the contents of the cell where "sellers Rank:" is found.
- StartPos = InStr(FoundString, " #") + 2
I use VBA’s InStr function to locate the space and the pound sign (#) in the contents of the active cell. I want to start the string that contains the sales rank numbers immediately after the space and pound sign, so I add 2 to the position where that substring is found. The result is stored in StartPos.
- StopPos = InStr(FoundString, TrailingString)
Then I look inside FoundString again, this time to find the trailing string—the characters that Amazon supplies right after the sales ranking. In the case of tangible books, the ranking is followed by " in". In the case of Kindle books, the ranking is followed by " Paid in"—or at least that’s the case in December 2011. So the value of TrailingString is passed to the GetRank function, along with the name of the query sheet. If I’m querying for a tangible book, I pass " in" and if I’m querying for a Kindle book, I pass " Paid in".
- GetRank = 1 * Mid(FoundString, StartPos, StopPos - StartPos)
Finally, the code gets the value of the sales ranking using VBA’s Mid function, which returns a string that is inside another string. In this case, Mid is instructed to look in FoundString, beginning at StartPos, and to return (StopPos – StartPos) characters. Those characters are the ranking, but because they came from a string they are still a string. Therefore I multiply the result by 1 to coerce the string to a numeric value, and assign it to GetRank.
The prior statement assigns the value of the sales rank to the function GetRank itself. When the function terminates and control returns to the statement that called the function, GetRank itself equals the value that it calculated. And in the statement that called the function, which is NextRank = GetRank("BAXL", " in"), the variable NextRank is assigned the value of the GetRank function.
- On Error GoTo 0
This second On Error statement returns the default status. Telling VBA to “go to 0” in the event of an error causes VBA to terminate with a run-time error if it hits another error. When this function cedes control back to the calling statement, the prior On Error statement is no longer in effect and so this On Error GoTo 0 statement could be omitted, but I wanted you to know about it.
- Exit Function
In the normal course of events, the function terminates at the Exit Function statement and control returns to the statement that called the function. But if something goes wrong, the Exit Function statement is bypassed and some minor housekeeping takes place first.
Here’s the label I mentioned earlier. Control transfers here if something goes wrong in the portion of the function that locates the sales ranking and converts it to a Long integer:
EarlyOut: GetRank = 0
The value returned by the function is set to zero. That’s the value that is written to the Summary sheet as the sales rank. When I see that, I know that something went wrong. Perhaps Amazon’s code had removed the old ranking from its page and was getting ready to put the new ranking in place at the moment that this code executed its query.
- End Function
After a value, even an illogical one such as a sales ranking of zero, is assigned to the function, its task is completed and control returns to the calling statement.
The GetUnitsLeft Function
Here’s the function mentioned earlier that determines how many units of a particular book are left.
- Function GetUnitsLeft(SheetName As String) As Long
The name of the query sheet is passed as SheetName to the function. The function returns a value of the Long integer type.
- Dim FoundString As String
As in the GetRank function, FoundString will contain the value in the cell we’re looking for.
- Dim StartPos As Integer, StopPos As Integer
StartPos and StopPos have the same purposes that they have in GetRank: to bracket the value you want to return to the main procedure.
- GetUnitsLeft = -1
Start out by assigning a default value to the function.
- Sheets(SheetName).Activate
Activate the query sheet.
- On Error GoTo Nodata
Establish an error handler: go to the Nodata label in the event of a runtime error.
Cells.Find(What:="left in stock", LookIn:= xlFormulas, _ LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:= _ xlNext, MatchCase:=False).Activate
Execute the Find method. In this case, where you are looking for the number of units in stock, there is a particular, invariant string value that locates the cell we’re after: "left in stock". Therefore, you don’t need to pass a different string to the procedure as an argument, depending on which book we’re querying.
- FoundString = ActiveCell
The contents of the cell that Find found are assigned to the FoundString variable.
- StartPos = InStr(FoundString, "Only ") + 5
The InStr function returns the position in the searched string where "Only " starts. Therefore, add the five characters in "Only " to the result of InStr to locate the position where the number left in stock begins.
- StopPos = InStr(FoundString, " left in")
StopPos locates the position where " left in" begins.
- GetUnitsLeft = 1 * Mid(FoundString, StartPos, StopPos - StartPos)
Finally, look in FoundString starting at StartPos and use the Mid function to strip out as many characters as the difference between StopPos and StartPos. Multiply by 1 to convert the result from a text string to a number, and assign the result to the function.
- Exit Function
The Exit statement returns control to the calling statement. In case an error occurred, though, control has already passed to the Nodata label.
- Nodata:
More often than not the reorder point for a product has not been reached, and so an error usually occurs. When it does, control comes here and you simply end the function. Recall that you began by assigning the function the value of -1, so if the string you’re looking for could not be found then the function retains the value of -1 and passes that value back to the calling statement.
- End Function
The RefreshSheets Subroutine
This subroutine runs once for each query sheet. It’s called by the main GetNewData subroutine.
- Sub RefreshSheets(SheetName As String)
The name of the query sheet is passed as an argument.
- With ThisWorkbook.Sheets(SheetName)
A With block is initiated so that subsequent objects, properties, and methods that begin with a dot are taken to belong to the sheet named SheetName in ThisWorkbook.
- .Activate
The current query sheet is activated so that its query can be run.
- .Cells(1, 1).Select
Select cell A1. Earlier I recommended that you cause the queries to return their results beginning in cell A1. If you adopt that recommendation, you know that the query results include that cell A1 and that if you select it you’re able to refresh the query. (It’s necessary that a cell in the query results range be active for the refresh to take place.)
- Selection.QueryTable.Refresh BackgroundQuery:=False
Refresh the query. Set BackgroundQuery to False so that the code does not continue while the refresh is taking place.
End With End Sub
Terminate the With block and end the subroutine.