The VBA Code
This section walks you through the VBA code that I use to update my data collector hourly. It runs itself, so I can get it going, walk away, and not come back for hours or days. Meantime, the code runs hourly and obtains data from websites during each run. You can view the code by using the Visual Basic command on the Ribbon’s Developer tab.
You arrange for a fresh module by choosing Insert, Module. The code that’s shown in this chapter, or your own VBA code, goes into the new module.
I have provided all the code needed for the example given in this chapter. You’ll find it in a workbook named Collector.xlsm, which you can download from the publisher’s website. After opening the workbook, open the Visual Basic Editor as the prior Note describes. You see a Project window, usually at the left side of the screen. There’s an entry in that window that’s headed VBAProject(Collector.xlsm). If necessary, click the Expand button (with the plus sign) to the left of that project—this displays the objects stored in the project. The final object in the project’s list is named Module1. Double click it to display the VBA code that’s stored in that module. I describe that code in subsequent sections of this chapter.
The figures in this chapter show that I’m obtaining data on eight books. You can extend it to more products (or restrict it to fewer) if you want.
I should also note that I’ve revised the code somewhat to make clearer what’s going on. A number of modifications would structure the code a bit more tightly and run a little faster. But they tend to obscure the general flow of the task.
Option Explicit
I always specify Option Explicit at the top of my VBA modules. Doing so forces me to declare variables explicitly, often with Dim statements. For example:
Dim NextRow as Integer
Dim, short for dimension, informs VBA that a variable named NextRow is intended to exist in the module’s code: The variable is declared to exist. Later on, if I mistype the name of the variable in the code
NextRoe = NextRow + 1
for example, VBA complains that I’m trying to use a variable that I haven’t declared. If I don’t have Option Explicit at the top of the module, VBA assumes that I meant to implicitly declare a new variable, NextRoe, and assigns to it the current value of NextRow + 1. The implicit, on-the-fly declaration of variables was common programming practice 50 years ago. VBA still tolerates it unless you protect yourself with Option Explicit.
The DoItAgain Subroutine
This very short, three-statement subroutine causes the main GetNewData subroutine to run an hour from now:
Private Sub DoItAgain( )
GetNewData True
End Sub
The DoItAgain subroutine runs in two cases:
The user has clicked the button labeled Repeat on the Summary worksheet.
Excel’s timer has run to the point when Excel has been instructed to run the GetNewData subroutine again.
In either case. The DoItAgain subroutine calls the GetNewData subroutine with the argument True. I explain the effect of that argument in the section on GetNewData.
You normally initiate the code in the Collector workbook by clicking the Repeat button on the Summary worksheet, right after opening the workbook. Doing so gets things going and tells Excel to repeat the process an hour later without any further prompting from you.
The DontRepeat Subroutine
This is another very short subroutine, and it’s used at times when you want to query the online data source and record the query’s results, without initiating a sequence of repeating subroutines.
Suppose that you open the workbook at 8:00 p.m. and click the Repeat button. That causes the queries to execute, the results to be saved, and that sequence to repeat at 9:00 p.m. (or a little later). Fifteen minutes later, at 9:15 p.m., you decide that you want a fresh set of results. If you get that fresh set by clicking the Repeat button again, you initiate a new set of repeating subroutines. The first set will repeat at 10:00 p.m., 11:00 p.m. and so on, and the second set will repeat at 10:15 p.m., 11:15 p.m., and so on. You wind up with twice as many sets of query results as you wanted.
That’s the point of the DontRepeat subroutine:
Sub DontRepeat( )
GetNewData False
End Sub
Just like the DoItAgain subroutine, the DontRepeat subroutine calls GetNewData, but it does so with a False argument instead of True. As you’ll see, the False argument prevents DoItAgain from initiating a new set of queries one hour hence.
The PrepForAgain Subroutine
Yet another very brief subroutine. Its code is
Sub PrepForAgain( )
Application.OnTime Now + TimeValue("01:01:00"), "DoItAgain"
End Sub
As you’ll see in the next section, PrepForAgain runs only if the GetNewData subroutine has been called with its Again argument set to True. This True/False distinction enables the user to choose to initiate a new sequence of hourly queries, or to run the queries once only.
The PrepForAgain subroutine uses VBA’s OnTime method to schedule the time when a procedure is to run next. The time is specified first, using the fragment Now + TimeValue("01:01:00"). VBA has a function named Now that returns the current system date and time. VBA has another function named TimeValue that converts a string to a time value.
So, Now + TimeValue("01:01:00") gets the time as of right now and adds to it one hour, one minute, and zero seconds.
The second argument to OnTime as used here is DoItAgain, which calls the project’s main procedure, GetNewData. (The name of the subroutine DoItAgain must be enclosed in quotes when it’s used as an argument.) So, if it’s run, the PrepForAgain subroutine instructs VBA to run the procedure named DoItAgain, 61 minutes after the OnTime method is invoked.
I settled on an hour and a minute after some experimenting. Amazon states that it updates the rankings hourly, and that seems to be nearly correct. During and around national holidays the website seems to update sales figures every two hours instead of each hour. Sometimes the update doesn’t happen exactly when it should, and sometimes several hours pass before the next update takes place. These events are mildly atypical, though, and something close to an hour is normal.
However, it takes a bit of time for my code to run: to acquire new data by way of the queries and to test the changes in rankings for evidence of a sale. So I add a minute to the one hour in order to sync back up with Amazon’s schedule.
The GetNewData Subroutine
Here’s the project’s main procedure: GetNewData. It executes web queries, obtains sales rankings, and copies and pastes formulas that convert rankings to assumed sales figures.
Sub GetNewData(Again As Boolean)
The first statement in GetNewData, the Sub statement, informs VBA that the statement that calls GetNewData also passes a Boolean, True/False value to GetNewData. That Boolean value, which is to be referred to by the variable name Again, later determines whether GetNewData is to run again automatically, 61 minutes later.
Dim NextRow As Integer, NextRank As Long
I begin by declaring the variables I need in this procedure:
NextRow is the worksheet row where I write the next set of query data.
NextRank contains the next sales ranking for a given book.
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
I don’t like to be distracted by the screen if I happen to be looking at it when the code changes the active worksheet. So I turn off screen updating and the Excel window stays put until the code has finished running, or until a statement turning screen updating back on is executed (you set its value to True instead of False). As a side benefit, the code executes a trifle quicker with ScreenUpdating turned off.
Also, I don’t want formulas recalculating as the queries write new data to the worksheets; I let them recalculate after the queries have finished running. That can have a substantial effect on the project’s speed of execution. So, I set Excel’s calculation property to manual—later on I reset it to automatic.
Application.StatusBar = "Read 1"
RefreshSheets ("Decide")
Application.StatusBar = “Read 2"
RefreshSheets ("Decide Kindle")
The next 16 statements consist of 8 pairs of commands. The first command updates Excel’s Status Bar, so that the user can check the code’s progress. (The Status Bar’s value is independent of the ScreenUpdating property’s value.) The pairs of commands are redundant, so I’m showing only four of the 16 commands here.
The second of each pair of commands calls the RefreshSheets subroutine with the name of the query sheet (in quotes) as the argument. It calls RefreshSheets once for each query sheet. As noted earlier, Collector’s code actually calls this subroutine eight times but I don’t want to subject you to all eight calls here—what happens is largely the same in each call. The RefreshSheets subroutine is shown and described following this description of the main GetNewData subroutine.
NextRow = ThisWorkbook.Sheets("Summary").Cells(Rows.Count, 1).End(xlUp).Row + 1
After the query for the eighth query sheet is complete, the code needs to find the next available row on the Summary sheet so that it can write the new data from the queries to that sheet without overwriting existing records. Here’s how it does that.
If you’re using an Excel worksheet directly, one of the ways to move around is with Ctrl + arrow, where arrow means up or down arrow, right or left arrow. If the active cell is in a column of data, for example, you can find the last contiguous non-empty cell with Ctrl + Down Arrow. If you’re at the bottom of a column, something such as A1048576, and that cell is empty, you can find the lowermost non-blank cell with Ctrl + Up Arrow.
Using VBA, the way to emulate that keyboard action is with the End property that belongs to a cell object (really a range object because a cell is a special instance of a range). So, Cells(Rows.Count, 1).End(xlUp) tells VBA to go to the final row (Rows.Count) in Column A (1) and act as though you had pressed Ctrl + Up Arrow. That takes VBA to the lowermost non-empty cell in Column A, assuming as I do that the cell defined by the final row of Column A is itself empty. In the Collector project, I also assume that there are no extraneous values in column A between the final legitimate date and the last row of the worksheet.
Then all I have to do is get the row where the non-empty cell is found, add 1 to the row number, and I have the row number that I need to write into next. The result is assigned to the variable NextRow.
I do have to tell VBA which cell to regard as the active cell, and I do that by citing the Summary worksheet, and directing VBA’s attention to that worksheet’s final row, first column. I start things out with ThisWorkbook just in case I have more than one workbook open and some other workbook is active at the time that the code starts to run. The ThisWorkbook object specifies the workbook that contains the code that is running.
ThisWorkbook.Sheets(“Summary”).Cells(NextRow, 1) = Now
Having found the next row to write in, I put the current date and time into the Summary worksheet, in the first column of that row.
With ThisWorkbook.Sheets(“Summary”)
The With statement initiates a block of code in which objects that are preceded by a dot, such as .Cells(NextRow,2), are deemed by VBA to belong to the object named in the With block—here, that’s the worksheet named Summary. This convention in VBA relieves you of having to repeat the containing object over and over in a block of statements such as this one, whose end is indicated by the End With statement.
Application.StatusBar = "Write 1"
.Cells(NextRow, 2) = GetRank("Stats", " in")
Application.StatusBar = "Write 2"
.Cells(NextRow, 3) = GetRank("Decide", " in")
The four statements just shown are repeated for the next six books, making for eight pairs of statements. The first of each pair again informs the user of the code’s progress in the status bar. The second statement in a pair tells Excel which cell to write a book’s current ranking to (for example, NextRow, 2 and NextRow, 3). The second statement also tells Excel which query sheet contains the ranking of interest ("Stats" and "Decide") as well as a pointer ("in") to help Excel find the ranking in that sheet. More on that pointer shortly.
The GetRank procedure, which is actually a function written in VBA, is discussed later in this section. The value returned by that function, which is the current sales rank of the book in question, is written to the appropriate row and column. (Column 2 is specific to and reserved for sales rankings of the book identified in the worksheet tabs as Stats. When queries on other books are run later in the code, the value assigned at that point to NextRank is written to a different column.) Because of the earlier With statement, the cell in the NextRow-th row and the second column is taken to belong to the worksheet named Summary in the workbook that contains the running code.
Continuing with the code, the next two statements—like the prior four—show the progress, get the sales rank for the book whose query sheet is named "Stats Kindle," and write the sales rank to the NextRow-th row, fourth column on the Summary sheet:
Application.StatusBar = "Write 3"
.Cells(NextRow, 4) = GetRank("Stats Kindle", " Paid in")
Notice that the second argument to the GetRank function is now "Paid in" instead of "in". The reason is that this is a Kindle book, and its Amazon web page follows the sales rank with "Paid in" rather than simply "in". The string passed to the GetRank function is used to help strip out the actual ranking from its surrounding verbiage; you find out how when we get to that function.
In the prior section titled “Identifying Sales,” I discussed how some worksheet formulas can be used to decide if a sale of a book occurred by comparing two consecutive rankings of the same book. Now the VBA code selects, copies, and pastes those formulas into the next row down. The newly pasted formulas then recalculate based on the differences between the most recent and the now-current rankings. Begin by activating the Summary worksheet, which belongs to the ThisWorkbook object:
.Activate
Select the cells that contain the formulas to copy. They are found in the prior row—that is, in the row that precedes NextRow, where we have just written the current query data. Select the cells in that row from the tenth column to the seventeenth:
.Range(.Cells(NextRow - 1, 10), .Cells(NextRow - 1, 17)).Select
Autofill the selection one row down. The addresses used by the formulas automatically adjust so that they point to the newly current row:
Selection.AutoFill Destination:=.Range(.Cells(NextRow - 1, 10), _
.Cells(NextRow, 17)), Type:=xlFillDefault
Notice that the destination of the autofill includes the row NextRow - 1 as well as NextRow. It’s a minor peculiarity of the Autofill method that the destination range must include the source range. In this case, the source range is the row NextRow - 1 (columns 10 through 17) and that range is to be autofilled into the row NextRow (columns 10 through 17). But that destination range as specified in the VBA statement must include the source range.
.Rows(NextRow + 1).Select
Selection.Insert Shift:=xlDown
.Cells(NextRow, 1).Select
The prior three statements insert a blank row following the NextRow row—that is, the row that the code has just populated with sales rankings and sales formulas. The reason is that there is a pivot table (discussed later in this chapter) that is found a few rows below the rankings. It’s helpful to push that table down a row when a new row has been populated with rankings.
End With
Terminate the With block that deems any object beginning with a dot (such as .Cells) to belong to the Summary worksheet in ThisWorkbook.
Application.Calculation = xlCalculationAutomatic
Turn automatic calculation back on.
Application.StatusBar = False
Remove the most recent progress message from the status bar.
ThisWorkbook.Save
Save the workbook.
Application.ScreenUpdating = True
Turn screen updates back on.
If Again Then
PrepForAgain
End If
The value of the Boolean variable Again was passed to the GetNewData procedure by either DoItAgain, which passed the value True, or by DontRepeat, which passed the value False. If Again is True, then PrepForAgain is run, informing Excel to run GetNewData another time, one hour hence. If Again is False, Excel is not so informed and processing will cease until the next time the user clicks one of the buttons on the Summary sheet.
End Sub
End the GetNewData subroutine.
The GetRank Function
Here’s the function named GetRank. Notice that it has two arguments, SheetName and TrailingString. The values are passed to the function by the statement that calls it. For example:
.Cells(NextRow, 2) = GetRank("Stats", " in")
where "Stats" is the worksheet name and "in" is a string of characters that can be found on the web page for the Stats book. The string follows—that is, trails—the actual ranking.
Function GetRank(SheetName As String, TrailingString As String) As Long
The function returns a value to the statement that called it. That value is often of a certain type: integer, a text string, a decimal number, and so on. To accommodate that, the function itself is declared to be of a certain type. Here, that type is Long. In VBA, Long means an integer value that might be much larger than the maximum value for a regular integer, which tops out at 32,767. There are many more books than that in Amazon’s inventory, and this function is intended to return a book’s ranking. Therefore, to accommodate the possibility that the rank is greater than 32,767, I declare the function as type Long, which can return a value greater than 2 billion—more than large enough.
Dim FoundString As String
Dim StartPos As Integer, StopPos As Integer
Three variables are declared. FoundString holds the value of the sales rank. It’s declared as a string because when the code strips it out of the surrounding text from its web page, it’s a string. Later it’s converted to a Long integer.
StartPos and StopPos determine the starting and stopping positions of the sales rank within the string. For example, this string:
Amazon Best Sellers Rank: #64,788 in Books
has its first numeric representation in character number 28 of the string, so StartPos is assigned the value 28.
On Error GoTo EarlyOut
This code is designed to run 24/7, so I can’t expect myself to nurse it along if it runs into an error that it can’t recover from. The On Error statement tells VBA that if it encounters an error such as a letter right in the middle of what is supposed to be a numeric sales ranking, it shouldn’t terminate processing. Instead, transfer control to a point named EarlyOut. That way, processing can continue. Even if it happens at 2:00 a.m., I can check it out at 9:00 a.m. and fix whatever happened, and I won’t necessarily have missed data from the intervening seven hours.
The next five statements are responsible for finding the cell in the worksheet that contains the sales ranking, stripping the numeric ranking out of that cell, and assigning the result to the GetRank function.
First, find the cell that contains the string "sellers Rank:". The rank we’re looking for comes directly after that string.
Cells.Find(What:=”sellers Rank:”, After:=ActiveCell, LookIn:= _
xlFormulas, LookAt:=xlPart, SearchOrder:=xlByRows, SearchDirection:= _
xlNext, MatchCase:=False).Activate
The Find method is used to locate the string—in this case, the string is "sellers Rank:". Most of the arguments, such as LookIn and LookAt, are there to keep the settings in place for the Find dialog box in Excel’s user interface. I don’t much care about forcing a Case match, so long as I get the string I’m looking for. Note that if the string is not found, an error occurs and the earlier On Error statement transfers control to the subroutine’s EarlyOut label.
FoundString = ActiveCell
For convenience in subsequent handling, I set the string variable FoundString to the contents of the newly active cell, where "sellers Rank:" is found.
StartPos = InStr(FoundString, " #") + 2
I use VBA’s InStr function to locate the space and the pound sign (#) in the contents of the active cell. I want to start the string that contains the sales rank numbers immediately after the space and pound sign, so I add 2 to the position where that substring is found. The result is stored in StartPos.
StopPos = InStr(FoundString, TrailingString)
Then I look inside FoundString again, this time to find the trailing string—the characters that Amazon supplies right after the sales ranking. In the case of tangible books, the ranking is followed by "in". In the case of Kindle books, the ranking is followed by "Paid in"—or at least that’s the case in June 2017. So the value of TrailingString is passed to the GetRank function, along with the name of the query sheet. If I’m querying for a tangible book, I pass "in" and if I’m querying for a Kindle book, I pass "Paid in".
GetRank = 1 * Mid(FoundString, StartPos, StopPos - StartPos)
Finally, the code gets the value of the sales ranking using VBA’s Mid function, which returns a string that is inside another string. In this case, Mid is instructed to look in FoundString, beginning at StartPos, and to return (StopPos – StartPos) characters. Those characters are the ranking, but because they came from a string they are still a string. Therefore I multiply the result by 1 to coerce the string to a numeric value, and assign it to the function GetRank, which returns the numeric value to the calling procedure.
The prior statement assigns the value of the sales rank to the function GetRank itself. When the function terminates and control returns to the statement that called the function, GetRank itself equals the value that it calculated. And in the statement that called the function, which is
.Cells(NextRow, 2) = GetRank("Stats", " in")
the value of the GetRank function is written to the specified cell.
On Error GoTo 0
This second On Error statement returns Excel’s behavior if it encounters a runtime error to its default status. Telling VBA to “go to 0” in the event of an error causes VBA to terminate with a runtime error if it hits another error. When this function cedes control back to the calling statement, the prior On Error statement is no longer in effect and so this On Error GoTo 0 statement could be omitted, but I wanted you to know about it.
Exit Function
In the normal course of events, the function terminates at the Exit Function statement and control returns to the statement that called the function. But if something goes wrong, the Exit Function statement is bypassed (when the first On Error statement transfers control to the EarlyOut label) and some minor housekeeping takes place before the function terminates.
Here’s the label I mentioned earlier. Control transfers here if something goes wrong in the portion of the function that locates the sales ranking and converts it to a Long integer:
EarlyOut:
GetRank = 0
The value returned by the function is set to zero. That’s the value that is written to the Summary sheet as the sales rank. When I see that, I know that something went wrong. Perhaps Amazon’s code had removed the old ranking from its page and was getting ready to put the new ranking in place at the moment that this code executed its query.
End Function
After a value, even an illogical one such as a sales ranking of zero, is assigned to the function, its task is completed and control returns to the calling statement in GetNewData.
The RefreshSheets Subroutine
This subroutine runs once for each query sheet. It’s called by the main GetNewData subroutine.
Sub RefreshSheets(SheetName As String)
The name of the query sheet is passed as an argument.
With ThisWorkbook.Sheets(SheetName)
A With block is initiated so that subsequent objects, properties, and methods that begin with a dot are taken to belong to the sheet named SheetName in ThisWorkbook.
.Activate
The current query sheet is activated so that its query can be run.
.Cells(1, 1).Select
Select cell A1. Earlier I recommended that you cause the queries to return their results beginning in cell A1. If you adopt that recommendation, you know that the query results include that cell A1 and that if you select it you’re able to refresh the query. (It’s necessary that a cell in the query results range be active for the refresh to take place.)
Selection.QueryTable.Refresh BackgroundQuery:=False
Refresh the query. Set BackgroundQuery to False so that the code does not continue while the refresh is taking place.
End With
End Sub
Terminate the With block and end the subroutine.