- Introduction
- Mapping to Almost Isomorphic Tree Structures
- Structure Adjustment by XSLT
- Mapping to Tables
- Mapping to Hash Tables
- Mapping to Graph Structures
- Summary
8.6 Mapping to Graph Structures
Trees, tables, and hash tables are optimized for different access patterns. Here, we consider another frequently used data structure: graphs. Graphs are convenient when the access pattern of your program involves following links between shared data.
Let us take an application that computes the shortest path between two cities as our example. Figure 8.5 shows our problem. Circles represent cities, and arrows between them represent connections between cities. The numbers associated with the arrows are the cost of moving to one city from another. What is the shortest path from Fukuoka to Yamato? What is the cost of the shortest path? There is a well-known, simple but efficient algorithm known as Dijkstra's algorithm to solve this problem.
Figure 8.5 Graph data
We want to express this problem as an XML document and map it into a graph data structure, as shown in Figure 8.5, that is needed for implementing Dijkstra's algorithm efficiently. One natural XML encoding of the problem is something like cities.xml, shown in Listing 8.14.
Listing 8.14 Graph represented as XML, chap08/graph/cities.xml
<?xml version="1.0" encoding="UTF-8"?> <FindShortestPath xmlns="urn:shortest-path"> <StartCity>Fukuoka</StartCity> <TargetCity>Yamato</TargetCity> <Paths> <Path from="Fukuoka" to="Tokyo" cost="1"/> <Path from="Fukuoka" to="Yokohama" cost="4"/> <Path from="Fukuoka" to="Chiba" cost="7"/> <Path from="Tokyo" to="Chiba" cost="4"/> <Path from="Tokyo" to="Yokohama" cost="2"/> <Path from="Tokyo" to="Machida" cost="5"/> <Path from="Yokohama" to="Machida" cost="2"/> <Path from="Machida" to="Yamato" cost="5"/> <Path from="Chiba" to="Yamato" cost="8"/> </Paths> </FindShortestPath>
We will develop code to convert this XML document into a graph structure, whose vertexes are represented by Vertexobjects and whose edges are represented by Edgeobjects.
Part of the Vertexclass is shown in Listing 8.15. A Vertexinstance has a string, label, for the city's name and a list, outgoingEdges, that holds all the edges pointing to other cities (lines 1213). In addition, it has two variables, bestScore and bestPath, to keep track of working values during the computation. All the vertexes are kept in a single hash table named vertexList. When a vertex is needed, we call the static method register()instead of creating a new instance. This method checks whether there is already a vertex for the city and returns it if there is.
Listing 8.15 Vertexclass, chap08/graph/Vertex.java
class Vertex { [12] private String label; [13] private List outgoingEdges; private int bestScore; private Edge bestPath; private static Hashtable vertexList = new Hashtable(); static Vertex register(String label) { Vertex v = (Vertex)vertexList.get(label); if (v==null) { v = new Vertex(label); vertexList.put(label,v); } return v; } Vertex(String label) { this.label = label; this.outgoingEdges = new LinkedList(); this.bestScore = Integer.MAX_VALUE; this.bestPath = null; } void addEdge(Edge e) { this.outgoingEdges.add(e); } /* end excerpt1 */ ... }
The Edgeclass has fromVertexand toVertexto keep both ends of the connection (see Listing 8.16). When a new edge is created, it registers itself in outgoingEdgesof fromVertex.
Listing 8.16 Edgeclass, chap08/graph/Edge.java
class Edge { private Vertex fromVertex; private Vertex toVertex; private int cost; Edge(Vertex from, Vertex to, int c) { this.fromVertex = from; this.toVertex = to; this.cost = c; this.fromVertex.addEdge(this); } ... }
Now that we have both our application data structure and the input XML structure, we are ready to consider the mapping between them. Our strategy is this. We will scan an input XML document. Whenever we encounter a city, whether it appears in a Pathelement or in TargetCityor StartCity, we register it using Vertex.register(). When we see a Pathelement, we also need to create an Edgeinstance. Because the constructor of the Edgeclass takes care of updating the appropriate variables in the structure being built, we do not need to do anything other than create a new instance of Edge.
We implement our strategy in a SAX handler, ShortestPath.java, shown in Listing 8.17.
Listing 8.17 ShortestPathclass, chap08/graph/ShortestPath.java
package chap08.graph; import java.io.IOException; import org.xml.sax.SAXException; import org.xml.sax.Attributes; import org.xml.sax.helpers.DefaultHandler; import javax.xml.parsers.SAXParser; import javax.xml.parsers.SAXParserFactory; import java.util.*; public class ShortestPath extends DefaultHandler { static final String SHORTEST_PATH_URI = "urn:shortest-path"; StringBuffer buffer = null; static Vertex startCity; static Vertex targetCity; public void startElement(String uri, String local, String qname, Attributes atts) throws SAXException { if (!uri.equals(SHORTEST_PATH_URI)) return; if ( local.equals("StartCity") || local.equals("TargetCity")) { this.buffer = new StringBuffer(); } else if (local.equals("Path")) { [23] String fromString = atts.getValue("from"); [24] if (fromString==null) throw new SAXException("Attribute 'from' missing"); [25] Vertex from = Vertex.register(fromString); [26] [27] String toString = atts.getValue("to"); [28] if (toString==null) throw new SAXException("Attribute 'to' missing"); [29] Vertex to = Vertex.register(toString); String costString = atts.getValue("cost"); if (costString==null) throw new SAXException("Attribute 'cost' missing"); [33] new Edge(from,to,Integer.parseInt(costString)); } } public void endElement(String uri, String local, String qname) throws SAXException { if (!uri.equals(SHORTEST_PATH_URI)) return; if (local.equals("StartCity")) { [40] this.startCity = Vertex.register(new String(this.buffer)); buffer = null; } else if (local.equals("TargetCity")) { [43] this.targetCity = Vertex.register(new String(this.buffer)); buffer = null; } } public void characters(char[] ch, int start, int length) throws SAXException { if (this.buffer != null) { this.buffer.append(ch, start, length); } } /* end excerpt1 */
For the StartCityand TargetCityelements, we need to accumulate possibly fragmented characters in text content using a StringBuffer. This is the same technique we have seen repeatedly in this book. These city names are registered when end tags are processed (lines 40 and 43).5 When a pathelement is found, its three attributes are extracted, the from city and the to city are registered (lines 2329), and an Edgeobject is created (line 33). In this way, we can map an input XML document into our application data structure.
For completeness, we show the rest of our main class, ShortestPath, in Listing 8.18.
Listing 8.18 main()method for ShortestPathclass, chap08/graph/ ShortestPath.java(continued)
public static void main(String[] argv) throws Exception { if (argv.length < 1) { System.err.println("Usage: java chap08.graph.ShortestPath file"); System.exit(1); } SAXParserFactory factory = SAXParserFactory.newInstance(); factory.setNamespaceAware(true); SAXParser parser = factory.newSAXParser(); ShortestPath handler = new ShortestPath(); parser.parse(argv[0],handler); [68] SortedSet open = new TreeSet(new VertexComparator()); [69] startCity.update(0,null,open); [70] [71] while (!open.isEmpty()) { [72] Vertex v = (Vertex) open.first(); [73] open.remove(v); [74] for (Iterator i = v.getEdges(); i.hasNext(); ) { [75] Edge e = (Edge)i.next(); [76] e.getToVertex().update(v.getBestScore()+e.getCost(),e, open); [77] } [78] } Vertex.showAll(); }
Lines 6878 implement Dijkstra's algorithm. It uses a helper class,
VertexComparator, to compare the current cost values between two Vertex objects. Its source is included on the CD-ROM.
As we have done in this example, if you need to map XML documents into a graph structure, you need to give labels to each node and refer to these labels to express the links between nodes. Having a hash table to record the mapping between node labels and node objects is essential in this process.