Generating XML
One of the reasons for separating presentation and semantic markup internally in your application is to make exporting the semantic markup easier. One thing that I wanted to do with SourceCodeKit was generate rich HTML for the ePub edition of my Objective-C 2.0 Phrasebook.
It would be pretty easy to generate an attributed string containing the highlighted version and just dump it with tags containing this markup, but that seems like a hack. XHTML is a format that encourages semantic markup, with presentation information separated out into CSS. If you dump an attributed string containing presentation markup to HTML by using the standard AppKit functions, you get something full of <span> tags with explicit color and font directives. To change these tags, you need to go back to the original and then export them again.
I preferred to have all of this information controlled from the CSS. The procedure for generating the XHTML was very similar to that for generating the attributed string with presentation markup: Iterate over the attributed string and write out an HTML tag for each range. This process is very simple; just use the same loop as before, with the following code in the body:
NSString *token = [attrs objectForKey: kSCKTextTokenType];
NSString *semantic = [attrs objectForKey: kSCKTextSemanticType];
// Identifiers are tagged with their semantic type; everything else with its token type.
if (token != SCKTextTokenTypeIdentifier)
{
    attributes = [NSDictionary dictionaryWithObject: token forKey: @"class"];
}
else
{
    attributes = [NSDictionary dictionaryWithObject: semantic forKey: @"class"];
}
// Emit <span class="...">text</span> for this range.
[writer startAndEndElement: @"span"
                attributes: attributes
                     cdata: [code substringWithRange: r]];
This technique uses the Étoilé XML writer class to generate a <span> tag, with the class attribute set to the name of the semantic attribute for identifiers, or the name of the token type for others.
By doing so, this Objective-C source line:
[mutable sortUsingSelector: @selector(localizedCompare:)];
is transformed into this HTML:
[<span class="SCKTextTypeDeclRef">mutable</span> <span class="SCKTextTypeMessageSend">sortUsingSelector</span>: <span class="SCKTextTokenTypeKeyword">@selector</span>(localizedCompare:)];
The @selector directive is tagged as a keyword. The message receiver is tagged as a reference to a local declaration, and the selector in the message's send expression is also tagged. All of these tagged items can then be styled using CSS. If I want to try some different styles or colors, I just need to tweak the CSS file and hit the refresh button in the browser; I don't need to regenerate the HTML.
This separation also makes it easier to provide multiple CSS files for different readers. For example, making the keywords red wouldn't be particularly helpful for people with E Ink displays, which only show grayscale.
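If you don't have the Étoilé XML writer to hand, the flat span generation described above can be approximated with plain Foundation. The following is only a minimal sketch under stated assumptions: the attribute keys and the identifier token name (everything prefixed Demo) are hypothetical stand-ins for the SourceCodeKit constants, the markup is built by string concatenation rather than by a real XML writer, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-ins for the SourceCodeKit attribute keys and token name.
static NSString * const kDemoTokenType = @"DemoTokenType";
static NSString * const kDemoSemanticType = @"DemoSemanticType";
static NSString * const DemoTokenTypeIdentifier = @"DemoTokenTypeIdentifier";

// Escape the characters that are unsafe in XHTML character data.
// The ampersand must be handled first so later entities are not re-escaped.
static NSString *DemoEscape(NSString *text)
{
    NSMutableString *s = [text mutableCopy];
    NSArray *pairs = @[ @[@"&", @"&amp;"], @[@"<", @"&lt;"], @[@">", @"&gt;"] ];
    for (NSArray *pair in pairs)
    {
        [s replaceOccurrencesOfString: pair[0]
                           withString: pair[1]
                              options: NSLiteralSearch
                                range: NSMakeRange(0, [s length])];
    }
    return s;
}

// Walk the attribute runs and wrap each tagged run in a <span>.
static NSString *DemoSpansForString(NSAttributedString *code)
{
    NSMutableString *html = [NSMutableString string];
    NSString *plain = [code string];
    [code enumerateAttributesInRange: NSMakeRange(0, [code length])
                             options: 0
                          usingBlock: ^(NSDictionary *attrs, NSRange r, BOOL *stop) {
        NSString *text = DemoEscape([plain substringWithRange: r]);
        NSString *token = [attrs objectForKey: kDemoTokenType];
        NSString *semantic = [attrs objectForKey: kDemoSemanticType];
        // Identifiers use the semantic type as the class; other tokens use the token type.
        NSString *cssClass =
            [token isEqualToString: DemoTokenTypeIdentifier] ? semantic : token;
        if (cssClass == nil)
        {
            // Untagged runs (whitespace, punctuation) are written out as plain text.
            [html appendString: text];
        }
        else
        {
            [html appendFormat: @"<span class=\"%@\">%@</span>", cssClass, text];
        }
    }];
    return html;
}
```

Unlike the loop shown earlier, this sketch leaves untagged runs unwrapped and escapes the text itself, because nothing else is doing that job for it.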
This is a pretty simple case for XML generation, because we're generating a very flat tree. Nothing stops you from generating something with a bit more structure, but that requires maintaining an attribute stack. You need to compare each range to the previous one, find the attributes that have been added or removed, and then open or close the corresponding tags.
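To make the attribute-stack idea a little more concrete, here is a rough sketch of the compare-and-diff step. It simplifies heavily: each run is assumed to arrive as a pair holding an ordered array of tag names (outermost first) and its text, which is a hypothetical representation and not something an attributed string gives you directly. The function name is also made up for the example.

```objc
#import <Foundation/Foundation.h>

// Each run is @[ tags, text ], where tags is an NSArray of tag names ordered
// from outermost to innermost. This layout is an assumption for the sketch.
// The text is assumed to be escaped already.
static NSString *DemoNestedMarkup(NSArray *runs)
{
    NSMutableString *out = [NSMutableString string];
    NSMutableArray *open = [NSMutableArray array];   // stack of currently open tags
    for (NSArray *run in runs)
    {
        NSArray *tags = [run objectAtIndex: 0];
        NSString *text = [run objectAtIndex: 1];

        // Find how many of the open tags this run shares with the previous one.
        NSUInteger common = 0;
        while (common < [open count] && common < [tags count]
               && [[open objectAtIndex: common] isEqual: [tags objectAtIndex: common]])
        {
            common++;
        }
        // Close the tags that were removed, innermost first.
        while ([open count] > common)
        {
            [out appendFormat: @"</%@>", [open lastObject]];
            [open removeLastObject];
        }
        // Open the tags that were added, outermost first.
        for (NSUInteger i = common; i < [tags count]; i++)
        {
            NSString *tag = [tags objectAtIndex: i];
            [out appendFormat: @"<%@>", tag];
            [open addObject: tag];
        }
        [out appendString: text];
    }
    // Close anything still open at the end of the text.
    while ([open count] > 0)
    {
        [out appendFormat: @"</%@>", [open lastObject]];
        [open removeLastObject];
    }
    return out;
}
```

The hard part in practice is deciding the nesting order, because the attributes on a range are an unordered dictionary. The sketch sidesteps that by taking the order as given.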
The attributed string isn't very well suited to this kind of structured text, which is why I wrote the EtoileText framework. It provides a set of classes that do for structured text what NSAttributedString does for unstructured text: they provide a tree that allows you to arrange text in a hierarchy with arbitrary attributes on the various nodes.
Going the other way, from an XML format to an attributed string, is very easy. Just store a stack of dictionaries containing the attributes. When you encounter a new open tag, make a mutable copy of the top dictionary, push it onto the stack, and then set it as the attributes for any character data you encounter. When you hit a close tag, pop the top dictionary from the stack.
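The stack of dictionaries can be sketched with Foundation's NSXMLParser. This is an illustration under assumptions, not the code used for the book: the builder class, the helper function, and the kDemoClassKey attribute name are all hypothetical, the only thing copied from each tag is its class attribute, and ARC is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical attribute key under which the element's class is stored.
static NSString * const kDemoClassKey = @"DemoClass";

@interface DemoAttributedStringBuilder : NSObject <NSXMLParserDelegate>
@property (nonatomic, strong) NSMutableAttributedString *result;
@property (nonatomic, strong) NSMutableArray *stack;
@end

@implementation DemoAttributedStringBuilder
- (instancetype)init
{
    if ((self = [super init]))
    {
        _result = [NSMutableAttributedString new];
        // Start with an empty dictionary so there is always a top of stack.
        _stack = [NSMutableArray arrayWithObject: [NSDictionary dictionary]];
    }
    return self;
}
// Open tag: copy the current attributes, add this tag's, and push the copy.
- (void)parser: (NSXMLParser *)parser
didStartElement: (NSString *)elementName
  namespaceURI: (NSString *)namespaceURI
 qualifiedName: (NSString *)qName
    attributes: (NSDictionary *)attributeDict
{
    NSMutableDictionary *top = [[self.stack lastObject] mutableCopy];
    NSString *cssClass = [attributeDict objectForKey: @"class"];
    if (cssClass != nil)
    {
        [top setObject: cssClass forKey: kDemoClassKey];
    }
    [self.stack addObject: top];
}
// Close tag: pop, which restores the enclosing element's attributes.
- (void)parser: (NSXMLParser *)parser
 didEndElement: (NSString *)elementName
  namespaceURI: (NSString *)namespaceURI
 qualifiedName: (NSString *)qName
{
    [self.stack removeLastObject];
}
// Character data: append it with whatever attributes are on top of the stack.
- (void)parser: (NSXMLParser *)parser foundCharacters: (NSString *)string
{
    NSAttributedString *run =
        [[NSAttributedString alloc] initWithString: string
                                        attributes: [self.stack lastObject]];
    [self.result appendAttributedString: run];
}
@end

// Returns nil if the input is not well-formed XML.
static NSAttributedString *DemoAttributedStringFromXML(NSString *xml)
{
    DemoAttributedStringBuilder *builder = [DemoAttributedStringBuilder new];
    NSXMLParser *parser =
        [[NSXMLParser alloc] initWithData: [xml dataUsingEncoding: NSUTF8StringEncoding]];
    [parser setDelegate: builder];
    return [parser parse] ? builder.result : nil;
}
```

Because each pushed dictionary starts as a copy of the one below it, nested tags inherit the attributes of their parents without any extra bookkeeping.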