Workbook Processing
The starting point for a conversion is deciding what information to extract. The next step, and a crucial one, is deciding on the overall structure or hierarchy to build. Figure 3 (shown earlier) illustrates the structure I selected for this conversion.
In the process of determining the structure, the main elements fall into place. In this document, the root element is <wwdiary>; the <wweek> element contains the <day> elements; a <day> element contains both <Meal> and <exerciseEntry> elements; and <Meal> elements contain <Label> and <foodEntry> elements.
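As a rough illustration of this hierarchy, the following Python sketch builds an empty skeleton with the standard library's xml.etree.ElementTree. The function name build_skeleton is hypothetical, and placing <wweek> directly under <wwdiary> is an assumption based on the description above; the conversion itself need not have been written this way.

```python
import xml.etree.ElementTree as ET

def build_skeleton():
    """Build one empty instance of each element in the described hierarchy."""
    diary = ET.Element("wwdiary")            # root element
    week = ET.SubElement(diary, "wweek")     # assumed to sit directly under the root
    day = ET.SubElement(week, "day")         # <wweek> contains <day> elements
    meal = ET.SubElement(day, "Meal")        # <day> contains <Meal> ...
    ET.SubElement(day, "exerciseEntry")      # ... and <exerciseEntry> elements
    ET.SubElement(meal, "Label")             # <Meal> contains <Label> ...
    ET.SubElement(meal, "foodEntry")         # ... and <foodEntry> elements
    return diary

if __name__ == "__main__":
    print(ET.tostring(build_skeleton(), encoding="unicode"))
```

A real diary would contain seven <day> elements and as many <Meal>, <foodEntry>, and <exerciseEntry> elements as the worksheets supply.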
Finally, the attributes get mapped to the elements. There is no hard-and-fast rule to determine whether a piece of data is an attribute or another element.
NOTE
The elements are the tags such as <Meal></Meal>. The attributes are inside the elements in a name/value pair, with the values surrounded by quotes: foodItem="Bacon".
Data for the diary workbook is divided logically into global workbook data and worksheet-specific data. The first task is to map the locations of the data for both the elements and the attributes.
NOTE
Part of this mapping process may involve tidying the data in the worksheets or making it more consistent. This step may simplify the capture process.
The global attributes (WeightLoss, Weight, dayStart, and dtStart) come from the Totals worksheet positions B1, D1, E1, and F1, respectively. These are easy to obtain directly without any parsing: basically, just transfer the cell's contents to a variable.
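The fixed-position mapping for the global attributes might be sketched in Python as follows. To keep the example self-contained, the Totals worksheet is modeled as a plain dictionary of cell address to cell contents. That dictionary, the names GLOBAL_CELLS and read_global_attributes, and the placeholder cell values are all assumptions made for illustration, not part of the original workbook code.

```python
# Attribute name -> cell address on the Totals worksheet (from the text above).
GLOBAL_CELLS = {
    "WeightLoss": "B1",
    "Weight": "D1",
    "dayStart": "E1",
    "dtStart": "F1",
}

def read_global_attributes(totals_sheet):
    """Copy each mapped cell's contents into a dict of attribute values.

    `totals_sheet` is a hypothetical stand-in for the worksheet:
    a dict such as {"B1": "...", "D1": "..."}. Missing cells become "".
    """
    attributes = {}
    for name, address in GLOBAL_CELLS.items():
        value = totals_sheet.get(address)
        attributes[name] = "" if value is None else str(value)
    return attributes

if __name__ == "__main__":
    # Placeholder values only; the real workbook contents are not shown here.
    sample = {"B1": "loss", "D1": "weight", "E1": "day", "F1": "date"}
    print(read_global_attributes(sample))
```

Because no parsing is involved, the whole step reduces to a lookup table plus a loop, which also makes it easy to adjust if a cell moves.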
The remaining elements and attributes come from the individual worksheet pages. Each of the first seven worksheets, 1 to 7, corresponds to a day of the week, Sunday through Saturday.
The Day and DayDate attributes of the <day> element are easiest to extract. These items correspond to the day of the week and its date, and are located in each of the daily worksheets in the fixed positions A1 and A2.
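A minimal Python sketch of this step, again treating a daily worksheet as a hypothetical dictionary of cell address to contents, could look like this. The function name make_day_element is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

def make_day_element(day_sheet):
    """Create a <day> element from the fixed cells A1 (Day) and A2 (DayDate).

    `day_sheet` is a hypothetical dict of cell address -> contents.
    """
    def cell(address):
        value = day_sheet.get(address)
        return "" if value is None else str(value).strip()

    return ET.Element("day", {"Day": cell("A1"), "DayDate": cell("A2")})

if __name__ == "__main__":
    # Placeholder values for illustration only.
    element = make_day_element({"A1": "Sunday", "A2": "date-goes-here"})
    print(ET.tostring(element, encoding="unicode"))
```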
The remaining elements take a little more parsing because they're not tied to fixed positions, but to a range of positions. The <Meal> and <exerciseEntry> elements fall into this category. The <Meal> element is actually the most complex because of its child element, <foodEntry>.
For the <Meal> element, the range of cells A3:A50 requires parsing. Each cell entry consists of one of the following:
Meal designation ("Breakfast", "Lunch", "Dinner")
Descriptive label of a group of <foodEntry> elements
<foodEntry> element
Empty string ("") indicating no data
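The four cases above can be sketched as a small classifier in Python. The text does not say how a descriptive label is told apart from a <foodEntry>, so the sketch leaves that rule to a caller-supplied function, is_label. That parameter, the dictionary used as a stand-in for the worksheet, and the function names are all assumptions.

```python
MEAL_NAMES = {"Breakfast", "Lunch", "Dinner"}

def classify_cell(value, is_label):
    """Return 'empty', 'meal', 'label', or 'food' for one cell's contents."""
    text = "" if value is None else str(value).strip()
    if text == "":
        return "empty"      # no data in this cell
    if text in MEAL_NAMES:
        return "meal"       # meal designation
    if is_label(text):
        return "label"      # descriptive label for a group of food entries
    return "food"           # anything else is treated as a <foodEntry>

def scan_meal_range(day_sheet, is_label):
    """Classify cells A3..A50 and return the non-empty ones in row order."""
    found = []
    for row in range(3, 51):            # rows 3 through 50 inclusive
        address = f"A{row}"
        value = day_sheet.get(address)
        kind = classify_cell(value, is_label)
        if kind != "empty":
            found.append((address, kind, str(value).strip()))
    return found

if __name__ == "__main__":
    # Hypothetical rule and placeholder data, for illustration only.
    sheet = {"A3": "Breakfast", "A4": "Bacon", "A5": "", "A6": "Snacks:"}
    for item in scan_meal_range(sheet, lambda text: text.endswith(":")):
        print(item)
```

Keeping the label test separate means the scan itself does not change if the worksheets are later made more consistent.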
For the <exerciseEntry> elements, cells in the range E5:E10 are parsed for either a text entry (indicating an exercise event) or an empty string (indicating no data).
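The exercise scan is simpler, since each cell is either text or empty. The following Python sketch shows one way to do it, with the worksheet again modeled as a hypothetical dictionary. Storing the cell text as the element's text content, rather than in an attribute, is an assumption, as is the function name add_exercise_entries.

```python
import xml.etree.ElementTree as ET

def add_exercise_entries(day_element, day_sheet):
    """Append one <exerciseEntry> per non-empty cell in E5..E10."""
    for row in range(5, 11):                     # rows 5 through 10 inclusive
        value = day_sheet.get(f"E{row}")
        text = "" if value is None else str(value).strip()
        if text:                                 # empty string means no data
            entry = ET.SubElement(day_element, "exerciseEntry")
            entry.text = text                    # assumed placement of the data
    return day_element

if __name__ == "__main__":
    # Placeholder data for illustration only.
    day = add_exercise_entries(ET.Element("day"), {"E5": "Walking", "E6": ""})
    print(ET.tostring(day, encoding="unicode"))
```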