Building the Script
NOTE
The script in this section was actually built first in the Visual Basic 6.0 integrated environment. Being able to compile and debug in the IDE environment speeds the process of program development. Then, to make it more versatile to readers, I converted it to VBScript. Visual Studio is not required to use this script.
The first thing build.vbs does is declare a minimum number of global variables: objExcel, oXMLDoc , rootElement, and workbooklevelElement. In addition, a few constants are declared: ExcelDirectory designates a directory specific to the machine being used (ExcelDirectory must be specific to your machine), and XMLDeclaration indicates the XML prologue and root element.
Next comes a call to the subroutine CreateApplicationandRootElement. The name of the rootElement is passed as a parameter. Then the program calls the GetandProcessFileList subroutine, saves the XML file wwout.xml, and exits Excel (see Listing 3).
Listing 3 Beginning of Script build.vbs
Option Explicit Dim objExcel Dim oXMLDoc Dim rootElement Dim workbooklevelElement Const ExcelDirectory="C:\__Weight Watchers Diaries\" Const XMLDeclaration="<?xml version='1.0' ?>" call CreateApplicationandRootElement("wwdiary") workbooklevelElement="wweek" call GetandProcessFileList() oXMLDoc.save("wwout.xml") objExcel.Quit Set objExcel = Nothing Public Sub CreateApplicationandRootElement(ElementName) Dim rootString rootString=XMLDeclaration & "<" & ElementName & ">" rootString=rootString & "</" & ElementName & ">" Set objExcel = CreateObject("Excel.Application") Set oXMLDoc = CreateObject("MSXML2.DOMDocument") 'Another way to load an XML Document 'oXMLDoc.Load ExcelDirectory & "xmltemplate.xml" oXMLDoc.loadXML (rootString) Set rootElement = oXMLDoc.documentElement End Sub
The GetandProcessFileList subroutine shown in Listing 4 effectively gets and verifies a list of Excel files to be processed. In this example, any document beginning with 2003 and ending in xls is used.
NOTE
Obviously, you need to customize the list with your own algorithm for obtaining a list of your spreadsheets.
Listing 4 GetandProcessFileList Function
Public Sub GetandProcessFileList() Dim objWorkbook Dim fso Dim objFolder Dim objFiles Dim objFile On Error Resume Next 'Substitute any logic to get a valid file list with directory paths set fso=CreateObject("Scripting.FileSystemObject") if err.number<>0 then msgbox "error creating fso" exit sub End if Set objFolder = fso.GetFolder(ExcelDirectory) if err.number<>0 then msgbox "error getting folder" & err.number _ & err.description exit sub End if 'document.write objFolder.name & "<BR>" Set objFiles = objFolder.Files for each objFile in objFiles 'Getting all 2003 files 'msgbox objFile.Name if fso.GetExtensionName(objFile.Name)="xls" _ and left(fso.GetBaseName(objFile.Name),4)="2003" then 'msgbox objFile.Path Set objWorkbook = objExcel.Workbooks.Open(objFile.Path) if Err.Number <> 0 then msgbox _"Excel Spreadsheet Not Found--Aborting: "_ & objFile.Name document.write _"Excel Spreadsheet Not Found--Aborting: " _ & objFile.Name Exit sub End if RunExcelXMLEngine objWorkbook end if next End Sub
The workhorse of the script is the RunExcelXMLEngine subroutine (see Listing 5). The workbook object that was opened is passed as a parameter. This is a loop; notice that the RunExcelXMLEngine subroutine is called once for every workbook in the file list. The first part of the RunExcelXMLEngine subroutine deals with capturing workbook-level data. First, the program creates the workbooklevelElement <wweek>. Then the workbook-level data is extracted from specific worksheet addresses, added as attributes, and appended to the <wweek> element, which in turn is appended or grafted to the root element.
Listing 5 RunExcelXMLEngine Capturing Global Workbook Variables
Public Sub RunExcelXMLEngine(objWorkbook) Dim objWorksheet Dim objNodeMap Dim workbookNode Dim worksheetDay Dim i, j, k Set workbookNode = oXMLDoc.createElement(workbooklevelElement) Set objNodeMap = workbookNode.Attributes Dim workBookVar 'Get the weekdate and starting day Set objWorksheet = objWorkbook.Worksheets("Totals") 'Get WeightLoss-Cell B1 workBookVar=objWorksheet.range("B1") createandAppendAttribute oXMLDoc, objNodeMap, "WeightLoss", workBookVar 'Get Weight-Cell D1 workBookVar=objWorksheet.range("D1") createandAppendAttribute oXMLDoc, objNodeMap, "Weight", workBookVar 'Get dayStart-Cell E1 workBookVar=objWorksheet.range("E1") createandAppendAttribute oXMLDoc, objNodeMap, "dayStart", workBookVar 'Get dtStart-Cell F1 workBookVar=objWorksheet.range("F1") createandAppendAttribute oXMLDoc, objNodeMap, "dtStart", workBookVar rootElement.appendChild (workbookNode)
Next we turn our attention to individual worksheets 17. In this subsection of the RunExcelXMLEngine subroutine, we capture the fixed or stationary entries on each daily worksheet. The <day> element is created to house the <foodEntry>, label, and <exerciseEntry> nodes/elements. The <day> node will house the attributes for Day and DayDate. The code is fairly straightforward (see Listing 6), using a for loop to iterate through worksheets 17.
Listing 6 RunExcelXMLEngine Capturing Worksheet Fixed or Stationary Entries
Const Breakfast = "Breakfast" Const Lunch = "Lunch" Const Dinner = "Dinner" Dim worksheetVar Dim foodEntryNode Dim labelNode Dim strTest Dim strPoints Dim strPortion Dim currentNode Dim WhichMeal Dim dayNode For i = 1 To 7 Set dayNode = oXMLDoc.createElement("day") Set objWorksheet = objWorkbook.Worksheets(i) Set objNodeMap = dayNode.Attributes worksheetVar = objWorksheet.range("A1") createandAppendAttribute oXMLDoc, objNodeMap, "Day", worksheetVar worksheetVar = objWorksheet.range("B1") createandAppendAttribute oXMLDoc, objNodeMap, "DayDate", worksheetVar
Continuing the worksheet for loop, the variable entries for <Meal>, <label>, and <foodEntry> are parsed.
The logic is as follows (see Listing 7):
Check positions A3 through A50 for the start of a <Meal>, <foodEntry>, <label>, or an empty cell.
If the cell matches "Breakfast", "Lunch", or "Dinner", a <Meal> node is starting.
If a <Meal> node is under construction (HaveCurrentNode = 1), append the under-construction node to its parent <day> node.
In any case, we create the <Meal> element using the variable CurrentNode, append its attributes, and set HaveCurrentNode to 1.
If the cell is not blank and the corresponding column C cell contains a value, we have a <foodEntry> node or a <label> node. In either case, we create the element, append attributes to the created element, and append the node to CurrentNode.
If the cell in question contains nothing, we ignore it.
Listing 7 RunExcelXMLEngine Capturing Worksheet foodEntry Nodes
'from A3 to a50 check for label or foodEntry 'Process Meals Dim HaveCurrentNode Dim HoldCell Dim HoldRow WhichMeal = "" strTest = "" HaveCurrentNode = 0 For j = 3 To 50 HoldCell = "A" & CStr(j) strTest = objWorksheet.range(HoldCell) strPortion = objWorksheet.range("B" & CStr(j)) strPoints = objWorksheet.range("C" & CStr(j)) If strTest = Breakfast Or _ strTest = Lunch Or _ strTest = Dinner Then 'We have a Section If HaveCurrentNode = 1 Then 'Finish Old Node dayNode.appendChild (currentNode) End If 'New Section Set currentNode = oXMLDoc.createElement("Meal") Set objNodeMap = currentNode.Attributes createandAppendAttribute oXMLDoc, objNodeMap, "MealTime", strTest WhichMeal = strTest HaveCurrentNode = 1 ElseIf strTest = "" Then 'ignore ElseIf strPoints = "" Then 'Assume it's a label Set labelNode = oXMLDoc.createElement("Label") Set objNodeMap = labelNode.Attributes createandAppendAttribute _ oXMLDoc, objNodeMap, "LabelText", strTest currentNode.appendChild (labelNode) Else 'Process foodEntry Nodes Set foodEntryNode = oXMLDoc.createElement("foodEntry") Set objNodeMap = foodEntryNode.Attributes createandAppendAttribute oXMLDoc, objNodeMap, "foodItem", strTest createandAppendAttribute_ oXMLDoc, objNodeMap, "Portion", strPortion createandAppendAttribute oXMLDoc, objNodeMap, "Points", strPoints currentNode.appendChild (foodEntryNode) End If Next 'j 'If HaveCurrentNode = 1 Then dayNode.appendChild (currentNode) 'End If strTest = "" HaveCurrentNode = 0
Continuing within the for loop, the <exerciseEntry> nodes are much easier to capture. Checking positions E5 through E10 on each worksheet, there either is an entry, or none. If an entry exists, we build the node with the corresponding attributes and append it to the day node. Also, we finish the main loop, set our workbook and worksheet objects to nothing, and exit.
Listing 7 RunExcelXMLEngine Capturing Worksheet exerciseEntry Nodes
'Get Exercise Dim exerciseEntryNode Dim strActivity Dim strDuration Dim strIntensity 'Goes from E5 to E10 'Activity should be in E5 'Duration should be in F5 'Intensity should be in G5 'Points should be in H5 for j=5 to 10 HoldCell = "E" & CStr(j) strActivity = objWorksheet.range(HoldCell) if strActivity<>"" then strActivity=objWorksheet.range("E" & CStr(j)) strDuration = objWorksheet.range("F" & CStr(j)) strIntensity = objWorksheet.range("G" & CStr(j)) strPoints = objWorksheet.range("H" & CStr(j)) Set exerciseEntryNode = oXMLDoc.createElement("exerciseEntry") Set objNodeMap = exerciseEntryNode.Attributes createandAppendAttribute oXMLDoc, _ objNodeMap, "Activity", strActivity createandAppendAttribute _ oXMLDoc, objNodeMap, "Duration", strDuration createandAppendAttribute _ oXMLDoc, objNodeMap, "Intensity", strIntensity createandAppendAttribute _ oXMLDoc, objNodeMap, "Points", strPoints dayNode.appendChild (exerciseEntryNode) End if next 'j workbookNode.appendChild (dayNode) Next 'i Set objWorksheet = Nothing Set objWorkbook = Nothing End Sub
The process gets progressively easier as you get accustomed to nodes, attributes, and a few loops. With a little additional work, you could build a template for the extraction specifications.
Figure 5 shows the output of transforming the XML file with an XSL style sheet.
Figure 5 Partial listing of an XSL/CSS transformation of the XML file.