Setting Up the Infrastructure
Before you start working with the XML output that Microsoft Word generates, you need a good understanding of XML and XSLT. You’ll find a wealth of information on InformIT and in related books. The following are good starting points:
- Frank Coyle’s series "Seven Steps to XML Mastery"
- Steve Holzner’s article "XSL Transformations"
- Steve Holzner’s book Sams Teach Yourself XML in 21 Days, Third Edition (at minimum, read "Day 9. Formatting XML by Using XSLT" in the online book)
The Word XML schemas are well documented in reference material that you can download from Microsoft. Download the documentation, install the attached Help file, and read the "Word Schema Overview" document in the "Word" section of the Help file.
If you need samples of WordProcessingML documents to better understand their structure, try this:
- Write a short Word document using the features that interest you. For example, if you wonder how images are handled in WordProcessingML, embed some sample images in your test document.
- Save the document as an XML file.
- Open the resulting XML file in a text editor (Notepad or WordPad will do just fine). Remove the following line, which tells Internet Explorer to open the file in Word, not in an Explorer window:
<?mso-application progid="Word.Document"?>
- Save the file.
- Open the WordProcessingML file with a web browser (Internet Explorer or Firefox) and inspect its content in tree-formatted structure.
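If you inspect sample documents often, the manual edit in the steps above can be scripted. The following is a minimal Python sketch, not part of Word or its tooling. The function names `strip_mso_application` and `strip_file` are hypothetical, and the sketch assumes that the file is UTF-8 encoded and that the processing instruction sits on a single line.

```python
import re
from pathlib import Path

# Matches the processing instruction that tells Internet Explorer to hand
# the file to Word, plus any whitespace that follows it.
MSO_PI = re.compile(r'<\?mso-application\b.*?\?>\s*')


def strip_mso_application(xml_text: str) -> str:
    """Return xml_text without the first mso-application processing instruction."""
    return MSO_PI.sub('', xml_text, count=1)


def strip_file(src: str, dst: str) -> None:
    """Copy src to dst with the processing instruction removed.

    Assumption: the file is UTF-8 encoded; adjust the encoding if your
    copy of Word writes something else.
    """
    text = Path(src).read_text(encoding="utf-8")
    Path(dst).write_text(strip_mso_application(text), encoding="utf-8")
```

Calling `strip_file("sample.xml", "sample-browser.xml")` (both file names are placeholders) leaves the original untouched and writes a copy that a browser should display as a tree.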
When your XSL transformation is finished, you can use it straight from the Save As dialog box in Word. During development, however, it helps to have a command-line XSLT processor that produces detailed error messages. Saxon, a free XSLT processor available from SourceForge or Saxonica, fits these requirements nicely.
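As a rough illustration of a command-line workflow, the Python sketch below assembles and runs a Saxon invocation and returns whatever the processor writes to standard error. It assumes a Java runtime on the path and a Saxon 9 or later jar, which accepts the `-s:`, `-xsl:`, and `-o:` options; older Saxon releases use a different argument syntax, so check the documentation for your version. The jar name `saxon-he.jar` and the function names are placeholders, not names from the Saxon distribution.

```python
import subprocess


def saxon_command(source, stylesheet, output, jar="saxon-he.jar"):
    """Build the argument list for a Saxon 9+ command-line transformation.

    Assumption: `jar` points at your local Saxon jar; the default name
    here is only a placeholder.
    """
    return [
        "java", "-jar", jar,
        f"-s:{source}",        # source document (the XML file saved from Word)
        f"-xsl:{stylesheet}",  # the stylesheet under development
        f"-o:{output}",        # where to write the result
    ]


def run_saxon(source, stylesheet, output, jar="saxon-he.jar"):
    """Run the transformation; return (exit code, error text)."""
    result = subprocess.run(
        saxon_command(source, stylesheet, output, jar),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr
```

A nonzero exit code together with the returned error text is usually enough to locate a problem in the stylesheet before you try it from Word.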