3.3 Compilation Order
The first question that we address is that of specifying an order in which to build all of the targets. The primary consideration here is ensuring that before building a given target, all the targets that it depends on are already built. This is, in fact, the same problem as in §1.4.1, scheduling a set of errands.
3.3.1 Topological Sort via DFS
As mentioned in §1.4.2, a topological ordering can be computed using a depth-first search (DFS). To review, a DFS visits all of the vertices in a graph by starting at any vertex and then choosing an edge to follow. At the next vertex another edge is chosen to follow. This process continues until a dead end (a vertex with no out-edges that lead to a vertex not already discovered) is reached. The algorithm then backtracks to the last discovered vertex that is adjacent to a vertex that is not yet discovered. Once all vertices reachable from the starting vertex are explored, one of the remaining unexplored vertices is chosen and the search continues from there. The edges traversed during each of these separate searches form a depth-first tree; and all the searches form a depth-first forest. A depth-first forest for a given graph is not unique; there are typically several valid DFS forests for a graph because the order in which the adjacent vertices are visited is not specified. Each unique ordering creates a different DFS tree.
Two useful metrics in a DFS are the discover time and finish time of a vertex. Imagine that there is an integer counter that starts at zero. Every time a vertex is first visited, the value of the counter is recorded as the discover time for that vertex and the value of the counter is incremented. Likewise, once all of the vertices reachable from a given vertex have been visited, then that vertex is finished. The current value of the counter is recorded as the finish time for that vertex and the counter is incremented. The discover time of a parent in a DFS tree is always earlier than the discover time of a child. Similarly, the finish time of a parent is always later than the finish time of a child. Figure 3.2 shows a depth-first search of the file dependency graph, with the tree edges marked with black lines and with the vertices labeled with their discover and finish times (written as discover/finish).
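To make the counter concrete, the following is a minimal sketch of a DFS that stamps discover and finish times. It does not use the BGL: the adjacency type (a vector of out-edge lists), the dfs_times struct, and the functions dfs_visit() and dfs_all() are all hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graph: adj[u] lists the targets of u's out-edges.
typedef std::vector<std::vector<std::size_t> > adjacency;

// Hypothetical record of the times; -1 means "not yet stamped".
struct dfs_times {
  std::vector<int> discover;
  std::vector<int> finish;
  int counter; // starts at zero, incremented after every stamp
  explicit dfs_times(std::size_t n)
    : discover(n, -1), finish(n, -1), counter(0) {}
};

// Visit u and everything reachable from it that is not yet discovered.
void dfs_visit(const adjacency& adj, std::size_t u, dfs_times& t)
{
  assert(u < adj.size());
  t.discover[u] = t.counter++;   // first visit: record, then increment
  for (std::size_t i = 0; i < adj[u].size(); ++i) {
    std::size_t v = adj[u][i];
    if (t.discover[v] == -1)     // follow only edges to undiscovered vertices
      dfs_visit(adj, v, t);
  }
  t.finish[u] = t.counter++;     // everything reachable from u is done
}

// Start a new search from every vertex that is still undiscovered,
// which produces a depth-first forest.
dfs_times dfs_all(const adjacency& adj)
{
  dfs_times t(adj.size());
  for (std::size_t u = 0; u < adj.size(); ++u)
    if (t.discover[u] == -1)
      dfs_visit(adj, u, t);
  return t;
}
```

For a three-vertex graph with edges 0→1, 0→2, and 1→2, this sketch labels the vertices 0/5, 1/4, and 2/3, so each parent is discovered before and finished after its child.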
Figure 3.2 A depth-first search of the file dependency graph. Edges in the DFS tree are black and non-tree edges are gray. Each vertex is labeled with its discover and finish time.
The relationship between topological ordering and DFS can be explained by considering three different cases at the point in the DFS when an edge (u, v) is examined. In every case that can occur in an acyclic graph, the finish time of v is earlier than the finish time of u. Thus, listing the vertices in order of decreasing finish time gives a topological ordering.
Vertex v is not yet discovered. This means that v will become a descendant of u and will therefore end up with a finish time earlier than u because DFS finishes all descendants of u before finishing u.
Vertex v was discovered in an earlier DFS tree. Therefore, the finish time of v must be earlier than that of u.
Vertex v was discovered earlier in the current DFS tree. If v is already finished, its finish time is again earlier than that of u. If v is not yet finished, then v is an ancestor of u, so the graph contains a cycle and a topological ordering of the graph is not possible. A cycle is a path of edges such that the first vertex and last vertex of the path are the same vertex.
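The cycle case can be detected during the search if we distinguish vertices that are discovered but not yet finished from those that are finished. The sketch below does this with three colors. It is an illustration only and is not the function developed in this section: the adjacency type, the color enum, and the functions has_cycle_from() and has_cycle() are hypothetical names. A cycle is reported only when an edge reaches a vertex that is still in progress, because a vertex that has already finished cannot close a cycle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a graph: adj[u] lists the targets of u's out-edges.
typedef std::vector<std::vector<std::size_t> > adjacency;

enum color { white, gray, black }; // undiscovered, in progress, finished

// Returns true if the search from u runs into a cycle.
bool has_cycle_from(const adjacency& adj, std::size_t u, std::vector<color>& c)
{
  assert(u < adj.size());
  c[u] = gray;
  for (std::size_t i = 0; i < adj[u].size(); ++i) {
    std::size_t v = adj[u][i];
    if (c[v] == gray)
      return true;  // v is an unfinished ancestor of u: the edge closes a cycle
    if (c[v] == white && has_cycle_from(adj, v, c))
      return true;  // v becomes a descendant of u; a cycle was found below it
    // c[v] == black: v already finished, so it finishes before u
  }
  c[u] = black;
  return false;
}

// Checks every depth-first tree in the forest.
bool has_cycle(const adjacency& adj)
{
  std::vector<color> c(adj.size(), white);
  for (std::size_t u = 0; u < adj.size(); ++u)
    if (c[u] == white && has_cycle_from(adj, u, c))
      return true;
  return false;
}
```

A single 0/1 mark array, as used in the rest of this section, cannot make this distinction, so the functions developed here assume that the input graph is acyclic.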
The main part of the depth-first search is a recursive algorithm that calls itself on each adjacent vertex. We will create a function named topo_sort_dfs() that will implement a depth-first search modified to compute a topological ordering. This first version of the function will be a straightforward, nongeneric function. In the following sections we will make modifications that will finally result in a generic algorithm.
The parameters to topo_sort_dfs() include the graph, the starting vertex, a pointer to an array to record the topological order, and an array for recording which vertices have been visited. The topo_order pointer starts at the end of the array and then decrements to obtain the topological ordering from the reverse topological ordering. Note that topo_order is passed by reference so that the decrement made to it in each recursive call modifies the original object (if topo_order were instead passed by value, the decrement would happen instead to a copy of the original object).
void topo_sort_dfs(const file_dep_graph& g, vertex_t u,
                   vertex_t*& topo_order, int* mark)
{
  mark[u] = 1; // 1 means visited, 0 means not yet visited
  // (For each adjacent vertex, make recursive call; defined in §3.3.3)
  *--topo_order = u; // step back one slot, then record u
}
The vertex_t and edge_t types are the vertex and edge descriptors for the file_dep_graph.
typedef graph_traits<file_dep_graph>::vertex_descriptor vertex_t;
typedef graph_traits<file_dep_graph>::edge_descriptor edge_t;
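The effect of passing the output pointer by reference can be seen in isolation. The two helper functions below are hypothetical and exist only for this comparison: both write a value one slot before the cursor, but only the first moves the caller's cursor.

```cpp
#include <cassert>

// Cursor taken by reference: the caller's pointer is decremented.
void store_before_by_ref(int*& cursor, int value)
{
  assert(cursor != 0);
  *--cursor = value;
}

// Cursor taken by value: only the local copy is decremented.
void store_before_by_value(int* cursor, int value)
{
  assert(cursor != 0);
  *--cursor = value;
}
```

Starting with a cursor one past the end of a three-element array, two calls to store_before_by_ref() fill the last two slots from back to front. Two calls to store_before_by_value() both write to the last slot, so the second value overwrites the first. This is why topo_sort_dfs() declares topo_order as vertex_t*&.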
3.3.2 Marking Vertices Using External Properties
Each vertex should be visited only once during the search. To record whether a vertex has been visited, we can mark it by creating an array that stores the mark for each vertex. In general, we use the term external property storage to refer to the technique of storing vertex or edge properties (marks are one such property) in a data structure like an array or hash table that is separate from the graph object (i.e., that is external to the graph). Property values are looked up based on some key that can be easily obtained from a vertex or edge descriptor. In this example, we use a version of adjacency_list where the vertex descriptors are integers from zero to num_vertices(g) - 1. As a result, the vertex descriptors themselves can be used as indexes into the mark array.
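The two kinds of external storage mentioned above might look as follows. This is a sketch under assumptions: make_marks(), named_vertex, mark_map, and is_marked() are hypothetical names, and the string-keyed case merely stands in for any graph whose descriptors are not small integers.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Case 1: descriptors are the integers 0 .. n-1, so a descriptor is
// itself the index into the array of marks.
std::vector<int> make_marks(std::size_t num_vertices)
{
  return std::vector<int>(num_vertices, 0); // 0 means not yet visited
}

// Case 2: descriptors are not integers (here, hypothetical string keys),
// so a hash table maps each descriptor to its mark.
typedef std::string named_vertex;
typedef std::unordered_map<named_vertex, int> mark_map;

// A vertex with no entry in the table counts as unmarked.
bool is_marked(const mark_map& marks, const named_vertex& v)
{
  mark_map::const_iterator it = marks.find(v);
  return it != marks.end() && it->second != 0;
}
```

In both cases the graph object itself is untouched. The marks live in a separate container whose lifetime the caller controls, which is what makes the storage external.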
3.3.3 Accessing Adjacent Vertices
In the topo_sort_dfs() function we need to access all the vertices adjacent to the vertex u. The BGL concept AdjacencyGraph defines the interface for accessing adjacent vertices. The function adjacent_vertices() takes a vertex and graph object as arguments and returns a pair of iterators whose value type is a vertex descriptor. The first iterator points to the first adjacent vertex, and the second iterator points past the end of the last adjacent vertex. The adjacent vertices are not necessarily ordered in any way. The type of the iterators is the adjacency_iterator type obtained from the graph_traits class. The reference section for adjacency_list (§14.1.1) reveals that the graph type we are using, adjacency_list, models the AdjacencyGraph concept. We may therefore correctly use the function adjacent_vertices() with our file dependency graph. The code for traversing the adjacent vertices in topo_sort_dfs() follows.
// For each adjacent vertex, make recursive call
graph_traits<file_dep_graph>::adjacency_iterator vi, vi_end;
for (tie(vi, vi_end) = adjacent_vertices(u, g); vi != vi_end; ++vi)
  if (mark[*vi] == 0)
    topo_sort_dfs(g, *vi, topo_order, mark);
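The iterator-pair idiom used in this loop does not depend on the BGL. As a rough analogue, the sketch below defines a hypothetical out_neighbors() function that returns a [first, last) pair of iterators over a plain vector of out-edge lists, and unpacks it with std::tie in the same way that the loop above unpacks the result of adjacent_vertices(). The adjacency type, out_neighbors(), and count_unmarked_neighbors() are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a graph: adj[u] lists the targets of u's out-edges.
typedef std::vector<std::vector<std::size_t> > adjacency;
typedef std::vector<std::size_t>::const_iterator neighbor_iterator;

// Returns a pair: the first iterator points to the first adjacent vertex,
// the second points past the end of the last one.
std::pair<neighbor_iterator, neighbor_iterator>
out_neighbors(std::size_t u, const adjacency& adj)
{
  assert(u < adj.size());
  return std::make_pair(adj[u].begin(), adj[u].end());
}

// Counts the adjacent vertices of u that are not yet marked, using the
// same loop shape as the recursive step of topo_sort_dfs().
std::size_t count_unmarked_neighbors(std::size_t u, const adjacency& adj,
                                     const std::vector<int>& mark)
{
  std::size_t n = 0;
  neighbor_iterator vi, vi_end;
  for (std::tie(vi, vi_end) = out_neighbors(u, adj); vi != vi_end; ++vi)
    if (mark[*vi] == 0)
      ++n;
  return n;
}
```

Returning a pair of iterators rather than a container lets the graph expose its adjacency structure without copying it and without committing to a particular container type.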
3.3.4 Traversing All the Vertices
One way to ensure that an ordering is obtained for every vertex in the graph (and not just those vertices reachable from a particular starting vertex) is to surround the call to topo_sort_dfs() with a loop through every vertex in the graph. The interface for traversing all the vertices in a graph is defined in the VertexListGraph concept. The vertices() function takes a graph object and returns a pair of vertex iterators. The loop through all the vertices and the creation of the mark array is encapsulated in a function called topo_sort().
void topo_sort(const file_dep_graph& g, vertex_t* topo_order)
{
  std::vector<int> mark(num_vertices(g), 0);
  graph_traits<file_dep_graph>::vertex_iterator vi, vi_end;
  for (tie(vi, vi_end) = vertices(g); vi != vi_end; ++vi)
    if (mark[*vi] == 0)
      topo_sort_dfs(g, *vi, topo_order, &mark[0]);
}
To make the output from topo_sort() more user friendly, we need to convert the vertex integers to their associated target names. We have the list of target names stored in a file (in the order that matches the vertex number) so we read in this file and store the names in an array, which we then use when printing the names of the vertices.
std::vector<std::string> name(num_vertices(g));
std::ifstream name_in("makefile-target-names.dat");
graph_traits<file_dep_graph>::vertex_iterator vi, vi_end;
for (tie(vi, vi_end) = vertices(g); vi != vi_end; ++vi)
  name_in >> name[*vi];
Now we create the order array to store the results and then apply the topological sort function.
std::vector<vertex_t> order(num_vertices(g));
topo_sort(g, &order[0] + num_vertices(g));
for (int i = 0; i < num_vertices(g); ++i)
  std::cout << name[order[i]] << std::endl;
The output is
zag.cpp
zig.cpp
foo.cpp
bar.cpp
zow.h
boz.h
zig.o
yow.h
dax.h
zag.o
foo.o
bar.o
libfoobar.a
libzigzag.a
killerapp
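For readers who want to experiment without the makefile data, the pieces of this section can be assembled into a small stand-alone sketch. It is an adaptation, not the BGL version: the simple_graph type (a vector of out-edge lists) is a hypothetical replacement for file_dep_graph, vertex_t is redefined as std::size_t, and topo_sort() returns the order in a vector instead of writing through a caller-supplied pointer. Like the functions above, it assumes that the graph is acyclic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the BGL types used in this section.
typedef std::size_t vertex_t;
typedef std::vector<std::vector<vertex_t> > simple_graph; // g[u] = out-edges of u

// Recursive step: mark u, visit its unmarked adjacent vertices, then
// record u one slot before the current output position.
void topo_sort_dfs(const simple_graph& g, vertex_t u,
                   vertex_t*& topo_order, int* mark)
{
  mark[u] = 1; // 1 means visited, 0 means not yet visited
  for (std::size_t i = 0; i < g[u].size(); ++i) {
    vertex_t v = g[u][i];
    if (mark[v] == 0)
      topo_sort_dfs(g, v, topo_order, mark);
  }
  *--topo_order = u;
}

// Driver: start a search from every unmarked vertex so that vertices not
// reachable from the first starting point are ordered as well.
std::vector<vertex_t> topo_sort(const simple_graph& g)
{
  std::vector<vertex_t> order(g.size());
  if (g.empty())
    return order;
  std::vector<int> mark(g.size(), 0);
  vertex_t* cursor = &order[0] + g.size(); // one past the last slot
  for (vertex_t u = 0; u < g.size(); ++u)
    if (mark[u] == 0)
      topo_sort_dfs(g, u, cursor, &mark[0]);
  assert(cursor == &order[0]); // every vertex was recorded exactly once
  return order;
}
```

With four vertices and the edges 0→1, 1→2, and 3→1, this sketch yields the order 3, 0, 1, 2: every vertex appears before the vertices that its out-edges point to, which matches the way source files precede their object files in the output above.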