- Topological Sort
- Resource-Management Approach
- Conclusion
- Bibliography
Resource-Management Approach
The alternative to a string-based approach is to take resource management into our own hands. First, let's combine the string and the Item into a single data structure called Node. This time, however, the list of successors will be implemented as a vector of pointers to nodes. Since the same node may appear on several lists of successors, the list can't be the sole owner of a node. The simplest solution in such a case is to treat these pointers to nodes as weak pointers. That means, in particular, that we'll have to provide a separate "owner" data structure for them. Here's the declaration of Node:
class Node { public: typedef std::vector<Node *>::iterator iter; Node (string name): _name (name), _predCount (0) {} string GetName () { return _name; } void IncPred () { _predCount++; } void DecPred () { _predCount--; } void AddSucc (Node * node) { _succ.push_back (node); } int PredCount () const { return _predCount; } iter begin () { return _succ.begin (); } iter end () { return _succ.end (); } private string _name; int _predCount; vector<Node *> _succ; // weak pointers };
The owner of all the nodes will be a strong vector, nodeOwner. We'll use the auto_vector introduced previously. Conceptually, you might think of it as a standard vector of auto_ptr, keeping in mind the particularities of such implementation.
This is the code in main that creates and fills the auto_vector:
auto_vector<Node> nodeOwner; InputData (nodeOwner);
We fill this vector with dynamically allocated nodesone node per unique input string. The input algorithm is pretty straightforward. We still have to keep a map of string/node pairs in order to avoid allocating multiple nodes for the same string, but the lifetime of this map is limited to the duration of the input process.
void InputData (auto_vector<Node> & nodeOwner) { map<string, Node *> itemMap; string pred, succ; while (cin >> pred >> succ) { Node * node1 = itemMap [pred]; if (node1 == 0) { auto_ptr<Node> node (new Node (pred)); node1 = node.get (); itemMap [pred] = node1; nodeOwner.push_back (node); } Node * node2 = itemMap [succ]; if (node2 == 0) { auto_ptr<Node> node (new Node (succ)); node2 = node.get (); itemMap [succ] = node2; nodeOwner.push_back (node); } node1->AddSucc (node2); node2->IncPred (); } }
Notice the application of standard resource management rules. A new node is allocated in the context of the constructor of an auto_ptr. The ownership of the node is transferred from the auto_ptr to the nodeOwner by passing it by value. The order of statements is important here. The auto_ptr should not be accessed after performing the push_backsuch code would not work with the newer implementations of the standard library.
As before, we create a vector of items whose predecessor count is zero. This time, however, it's a vector of (weak) pointers to nodes. We prefill it by iterating over the strong vector of nodes. Remember that we have provided the auto_vector with a special iterator that returns pointers rather than auto_ptrs. (This way we won't transfer the ownership of nodes by accident.)
vector<Node *> zeroes; typedef auto_vector<Node>::iterator Iter; Iter it = nodeOwner.begin (); Iter end = nodeOwner.end (); for (; it != end; ++it) { Node * node = *it; if (node->PredCount () == 0) zeroes.push_back (node); }
The main loop of the algorithm is simplified by the fact that we can now access the successors of each element directly, rather than by going through string lookup:
int count = 0; while (zeroes.size () != 0) { Node * node = zeroes.back (); cout << node->GetName () << endl; count++; zeroes.pop_back (); for (Node::iter itSuc = node->begin (); itSuc != node->end (); ++itSuc) { Node * nodeSuc = *itSuc; nodeSuc->DecPred (); if (nodeSuc->PredCount () == 0) zeroes.push_back (nodeSuc); } }
Tidying Up
I have deliberately gone through this rather minimalist translation of a string-based algorithm into a resource-managementaware algorithm. The resulting implementation may be polished some more. For instance, it makes perfect sense to hide the details of node management in a special-purpose class, NodeAdder:
class NodeAdder { public: NodeAdder (auto_vector<Node> > & nodeOwner) :_nodeOwner (nodeOwner) {} Node * GetNode (string const & name) { map<string, Node *>::iterator it = _itemMap.find (name); if (it != _itemMap.end ()) return it->second; auto_ptr<Node> newNode (new Node (name)); Node * node = newNode.get (); _itemMap [name] = node; _nodeOwner.push_back (newNode); return node; } private auto_vector<Node> & _nodeOwner; map<string, Node *> _itemMap; };
Notice the use of the map's find method where we previously accessed it through associative indexing. This way, we avoid creating a temporary empty map entry in case the string is not found.
Our InputData procedure is now considerably simpler:
void InputData (auto_vecor<Node> & nodeOwner) { NodeAdder nodes (nodeOwner); string pred, succ; while (cin >> pred >> succ) { Node * node1 = nodes.GetNode (pred); Node * node2 = nodes.GetNode (succ); node1->AddSucc (node2); node2->IncPred (); } }
Performance
My program is obviously more complicated that the original string-based program presented by Andrew Koenig (making his version a better choice for a conference). It exposes the plumbing of resource management, especially during the initial input phase of the algorithm. So is it at least faster?
I instrumented both versions and ran them over a large data set. Here are the results (stringy uses strings, and strongy uses strong pointers):
E:\Work\topo\stringy>release\stringy < pairs.txt > sorted.txt Processing 23617 pairs Elapsed clocks: 3325 E:\Work\topo\strongy>release\strongy < pairs.txt > sorted.txt Processing 23617 pairs Elapsed clocks: 2213
As you can see, there indeed is a visible performance gain (33% in this case). It's not spectacular, since my implementation only cuts down on logarithmic-time lookups, but it's there! It's always good to know what the tradeoffs are.
Click here to download the code for these examples.