- Why Use Binary Trees?
- Tree Terminology
- An Analogy
- How Do Binary Search Trees Work?
- Finding a Node
- Inserting a Node
- Traversing the Tree
- Finding Minimum and Maximum Key Values
- Deleting a Node
- The Efficiency of Binary Search Trees
- Trees Represented as Arrays
- Printing Trees
- Duplicate Keys
- The BinarySearchTreeTester.py Program
- The Huffman Code
- Summary
- Questions
- Experiments
- Programming Projects
Traversing the Tree
Traversing a tree means visiting each node in a specified order. Traversing a tree is useful in some circumstances such as going through all the records to look for ones that need some action (for example, parts of a vehicle that are sourced from a particular country). This process may not be as commonly used as finding, inserting, and deleting nodes but it is important nevertheless.
You can traverse a tree in three simple ways. They’re called pre-order, in-order, and post-order. The most commonly used order for binary search trees is in-order, so let’s look at that first and then return briefly to the other two.
In-order Traversal
An in-order traversal of a binary search tree causes all the nodes to be visited in ascending order of their key values. If you want to create a list of the data in a binary tree sorted by their keys, this is one way to do it.
The simplest way to carry out a traversal is the use of recursion (discussed in Chapter 6). A recursive method to traverse the entire tree is called with a node as an argument. Initially, this node is the root. The method needs to do only three things:
Call itself to traverse the node’s left subtree.
Visit the node.
Call itself to traverse the node’s right subtree.
Remember that visiting a node means doing something to it: displaying it, updating a field, adding it to a queue, writing it to a file, or whatever.
The three traversal orders work with any binary tree, not just with binary search trees. The traversal mechanism doesn’t pay any attention to the key values of the nodes; it only concerns itself with the node’s children and data. In other words, in-order traversal means “in order of increasing key values” only when the binary search tree criteria are used to place the nodes in the tree. The in of in-order refers to a node being visited in between the left and right subtrees. The pre of pre-order means visiting the node before visiting its children, and post-order visits the node after visiting the children. This distinction is like the differences between infix and postfix notation for arithmetic expressions described in Chapter 4, “Stacks and Queues.”
To see how this recursive traversal works, Figure 8-12 shows the calls that happen during an in-order traversal of a small binary tree. The tree variable references a four-node binary search tree. The figure shows the invocation of an inOrderTraverse() method on the tree that will call the print function on each of its nodes.
FIGURE 8-12 In-order traversal of a small tree
The blue rounded rectangles show the recursive calls on each subtree. The name of the recursive method has been abbreviated as i_o_trav() to fit all the calls in the figure. The first (outermost) call is on the root node (key 45). Each recursive call performs the three steps outlined previously. First, it makes a recursive call on the left subtree, rooted at key 27. That shows up as another blue rounded rectangle on the left of the figure.
Processing the subtree rooted at key 27 starts by making a recursive call on its left subtree, rooted at key 16. Another rectangle shows that call in the lower left. As before, its first task is to make a recursive call on its left subtree. That subtree is empty because it is a leaf node and is shown in the figure as a call to i_o_trav() with no arguments. Because the subtree is empty, nothing happens and the recursive call returns.
Back in the call to i_o_trav(16), it now reaches step 2 and “visits” the node by executing the function, print, on the node itself. This is shown in the figure as print(16) in black. In general, visiting a node would do more than just print the node’s key; it would take some action on the data stored at the node. The figure doesn’t show that action, but it would occur when the print(16) is executed.
After visiting the node with key 16, it’s time for step 3: call itself on the right subtree. The node with key 16 has no right child, which shows up as the smallest-sized rectangle because it is a call on an empty subtree. That completes the execution for the subtree rooted at key 16. Control passes back to the caller, the call on the subtree rooted at key 27.
The rest of the processing progresses similarly. The visit to the node with key 27 executes print(27) and then makes a call on its empty right subtree. That finishes the call on node 27 and control passes back to the call on the root of the tree, node 45. After executing print(45), it makes a call to traverse its right (nonempty) subtree. This is the fourth and final node in the tree, node 62. It makes a call on its empty left subtree, executes print(62), and finishes with a call on its empty right subtree. Control passes back up through the call on the root node, 45, and that ends the full tree traversal.
Pre-order and Post-order Traversals
The other two traversal orders are similar: only the sequence of visiting the node changes. For pre-order traversal, the node is visited first, and for post-order, it’s visited last. The two subtrees are always visited in the same order: left and then right. Figure 8-13 shows the execution of a pre-order traversal on the same four-node tree as in Figure 8-12. The execution of the print() function happens before visiting the two subtrees. That means that the pre-order traversal would print 45, 27, 16, 62 compared to the in-order traversal’s 16, 27, 45, 62. As the figures show, the differences between the orders are small, but the overall effect is large.
FIGURE 8-13 Pre-order traversal of a small tree
Python Code for Traversing
Let’s look at a simple way of implementing the in-order traversal now. As you saw in stacks, queues, linked lists, and other data structures, it’s straightforward to define the traversal using a function passed as an argument that gets applied to each item stored in the structure. The interesting difference with trees is that recursion makes it very compact.
Because these trees are represented using two classes, BinarySearchTree and __Node, you need methods that can operate on both types of objects. In Listing 8-5, the inOrderTraverse() method handles the traversal on BinarySearchTree objects. It serves as the public interface to the traversal and calls a private method __inOrderTraverse() to do the actual work on subtrees. It passes the root node to the private method and returns.
LISTING 8-5 Recursive Implementation of inOrderTraverse()
class BinarySearchTree(object): # A binary search tree class … def inOrderTraverse( # Visit all nodes of the tree in-order self, function=print): # and apply a function to each node self.__inOrderTraverse( # Call recursive version starting at self.__root, function=function) # root node def __inOrderTraverse( # Visit a subtree in-order, recursively self, node, function): # The subtree's root is node if node: # Check that this is real subtree self.__inOrderTraverse( # Traverse the left subtree node.leftChild, function) function(node) # Visit this node self.__inOrderTraverse( # Traverse the right subtree node.rightChild, function)
The private method expects a __Node object (or None) for its node parameter and performs the three steps on the subtree rooted at the node. First, it checks node and returns immediately if it is None because that signifies an empty subtree. For legitimate nodes, it first makes a recursive call to itself to process the left child of the node. Second, it visits the node by invoking the function on it. Third, it makes a recursive call to process the node’s right child. That’s all there is to it.
Using a Generator for Traversal
The inOrderTraverse() method is straightforward, but it has at least three shortcomings. First, to implement the other orderings, you would either need to write more methods or add a parameter that specifies the ordering to perform.
Second, the function passed as an argument to “visit” each node needs to take a __Node object as argument. That’s a private class inside the BinarySearchTree that protects the nodes from being manipulated by the caller. One alternative that avoids passing a reference to a __Node object would be to pass in only the data field (and maybe the key field) of each node as arguments to the visit function. That approach would minimize what the caller could do to the node and prevent it from altering the other node references.
Third, using a function to describe the action to perform on each node has its limitations. Typically, functions perform the same operation each time they are invoked and don’t know about the history of previous calls. During the traversal of a data structure like a tree, being able to make use of the results of previous node visits dramatically expands the possible operations. Here are some examples that you might want to do:
Add up all the values in a particular field at every node.
Get a list of all the unique strings in a field from every node.
Add the node’s key to some list if none of the previously visited nodes have a bigger value in some field.
In all these traversals, it’s very convenient to be able to accumulate results somewhere during the traversal. That’s possible to do with functions, but generators make it easier. We introduced generators in Chapter 5, and because trees share many similarities with those structures, they are very useful for traversing trees.
We address these shortcomings in a recursive generator version of the traverse method, traverse_rec(), shown in Listing 8-6. This version adds some complexity to the code but makes using traversal much easier. First, we add a parameter, traverseType, to the traverse_rec() method so that we don’t need three separate traverse routines. The first if statement verifies that this parameter is one of the three supported orderings: pre, in, and post. If not, it raises an exception. Otherwise, it launches the recursive private method, __traverse(), starting with the root node, just like inOrderTraverse() does.
There is an important but subtle point to note in calling the __traverse() method. The public traverse_rec() method returns the result of calling the private __traverse() method and does not just simply call it as a subroutine. The reason is that the traverse() method itself is not the generator; it has no yield statements. It must return the iterator produced by the call to __traverse(), which will be used by the traverse_rec() caller to iterate over the nodes.
Inside the __traverse() method, there are a series of if statements. The first one tests the base case. If node is None, then this is an empty tree (or subtree). It returns to indicate the iterator has hit the end (which Python converts to a StopIteration exception). The next if statement checks whether the traversal type is pre-order, and if it is, it yields the node’s key and data. Remember that the iterator will be paused at this point while control passes back to its caller. That is where the node will be “visited.” After the processing is done, the caller’s loop invokes this iterator to get the next node. The iterator resumes processing right after the yield statement, remembering all the context.
When the iterator resumes (or if the order was something other than pre-order), the next step is a for loop. This is a recursive generator to perform the traversal of the left subtree. It calls the __traverse() method on the node’s leftChild using the same traverseType. That creates its own iterator to process the nodes in that subtree. As nodes are yielded back as key, data pairs, this higher-level iterator yields them back to its caller. This loop construction produces a nested stack of iterators, similar to the nested invocations of i_o_trav() shown in Figure 8-12. When each iterator returns at the end of its work, it raises a StopIteration. The enclosing iterator catches each exception, so the various levels don’t interfere with one another.
LISTING 8-6 The Recursive Generator for Traversal
class BinarySearchTree(object): # A binary search tree class … def traverse_rec(self, # Traverse the tree recursively in traverseType="in"): # pre, in, or post order if traverseType in [ # Verify type is an accepted value and 'pre', 'in', 'post']: # use generator to walk the tree return self.__traverse( # yielding (key, data) pairs self.__root, traverseType) # starting at root raise ValueError("Unknown traversal type: " + str(traverseType)) def __traverse(self, # Recursive generator to traverse node, # subtree rooted at node in pre, in, or traverseType): # post order if node is None: # If subtree is empty, return # traversal is done if traverseType == "pre": # For pre-order, yield the current yield (node.key, node.data) # node before all the others for childKey, childData in self.__traverse( # Recursively node.leftChild, traverseType): # traverse the left subtree yield (childKey, childData) # yielding its nodes if traverseType == "in": # If in-order, now yield the current yield (node.key, node.data) # node for childKey, childData in self.__traverse( # Recursively node.rightChild, traverseType):# traverse right subtree yield (childKey, childData) # yielding its nodes if traverseType == "post": # If post-order, yield the current yield (node.key, node.data) # node after all the others
The rest of the __traverse() method is straightforward. After finishing the loop over all the nodes in the left subtree, the next if statement checks for the in-order traversal type and yields the node’s key and data, if that’s the ordering. The node gets processed between the left and right subtrees for an in-order traversal. After that, the right subtree is processed in its own loop, yielding each of the visited nodes back to its caller. After the right subtree is done, a check for post-order traversal determines whether the node should be yielded at this stage or not. After that, the __traverse() generator is done, ending its caller’s loop.
Making the Generator Efficient
The recursive generator has the advantage of structural simplicity. The base cases and recursive calls follow the node and child structure of the tree. Developing the prototype and proving its correct behavior flow naturally from this structure.
The generator does, however, suffer some inefficiency in execution. Each invocation of the __traverse() method invokes two loops: one for the left and one for the right child. Each of those loops creates a new iterator to yield the items from their subtrees back through the iterator created by this invocation of the __traverse() method itself. That layering of iterators extends from the root down to each leaf node.
Traversing the N items in the tree should take O(N) time, but creating a stack of iterators from the root down to each leaf adds complexity that’s proportional to the depth of the leaves. The leaves are at O(log N) depth, in the best case. That means the overall traversal of N items will take O(N×log N) time.
To achieve O(N) time, you need to apply the method discussed at the end of Chapter 6 and use a stack to hold the items being processed. The items include both Node structures and the (key, data) pairs stored at the nodes to be traversed in a particular order. Listing 8-7 shows the code.
The nonrecursive method combines the two parts of the recursive approach into a single traverse() method. The same check for the validity of the traversal type happens at the beginning. The next step creates a stack, using the Stack class built on a linked list from Chapter 5 (defined in the LinkStack module).
Initially, the method pushes the root node of the tree on the stack. That means the remaining work to do is the entire tree starting at the root. The while loop that follows works its way through the remaining work until the stack is empty.
At each pass through the while loop, the top item of the stack is popped off. Three kinds of items could be on the stack: a Node item, a (key, data) tuple, or None. The latter happens if the tree is empty and when it processes the leaf nodes (and finds their children are None).
If the top of the stack is a Node item, the traverse() method determines how to process the node’s data and its children based on the requested traversal order. It pushes items onto the stack to be processed on subsequent passes through the while loop. Because the items will be popped off the stack in the reverse order from the way they were pushed onto it, it starts by handling the case for post-order traversal.
In post-order, the first item pushed is the node’s (key, data) tuple. Because it is pushed first, it will be processed last overall. The next item pushed is the node’s right child. In post-order, this is traversed just before processing the node’s data. For the other orders, the right child is always the last node processed.
After pushing on the right child, the next if statement checks whether the in-order traversal was requested. If so, it pushes the node’s (key, data) tuple on the stack to be processed in-between the two child nodes. That’s followed by pushing the left child on the stack for processing.
LISTING 8-7 The Nonrecursive traverse() Generator
from LinkStack import * class BinarySearchTree(object): # A binary search tree class … def traverse(self, # Non-recursive generator to traverse traverseType='in'): # tree in pre, in, or post order if traverseType not in [ # Verify traversal type is an 'pre', 'in', 'post']: # accepted value raise ValueError( "Unknown traversal type: " + str(traverseType)) stack = Stack() # Create a stack stack.push(self.__root) # Put root node in stack while not stack.isEmpty(): # While there is work in the stack item = stack.pop() # Get next item if isinstance(item, self.__Node): # If it's a tree node if traverseType == 'post': # For post-order, put it last stack.push((item.key, item.data)) stack.push(item.rightChild) # Traverse right child if traverseType == 'in': # For pre-order, put item 2nd stack.push((item.key, item.data)) stack.push(item.leftChild) # Traverse left child if traverseType == 'pre': # For pre-order, put item 1st stack.push((item.key, item.data)) elif item: # Every other non-None item is a yield item # (key, value) pair to be yielded
Finally, the last if statement checks whether the pre-order traversal was requested and then pushes the node’s data on the stack for processing before the left and right children. It will be popped off during the next pass through the while loop. That completes all the work for a Node item.
The final elif statement checks for a non-None item on the stack, which must be a (key, data) tuple. When the loop finds such a tuple, it yields it back to the caller. The yield statement ensures that the traverse() method becomes a generator, not a function.
The loop doesn’t have any explicit handling of the None values that get pushed on the stack for empty root and child links. The reason is that there’s nothing to do for them: just pop them off the stack and continue on to the remaining work.
Using the stack, you have now made an O(N) generator. Each node of the tree is visited exactly once, pushed on the stack, and later popped off. Its key-data pairs and child links are also pushed on and popped off exactly once. The ordering of the node visits and child links follows the requested traversal ordering. Using the stack and carefully reversing the items pushed onto it make the code slightly more complex to understand but improve the performance.
Using the Generator for Traversing
The generator approach (both recursive and stack-based) makes the caller’s loops easy. For example, if you want to collect all the items in a tree whose data is below the average data value, you could use two loops:
total, count = 0, 0 for key, data in random_tree.traverse('pre'): total += data count += 1 average = total / count below_average = [] for key, data in random_tree.traverse('in'): if data <= average: below_average.append((key, data))
The first loop counts the number of items in random_tree and sums up their data values. The second loop finds all the items whose data is below the average and appends the key and data pair to the below_average list. Because the second loop is done in in-order, the keys in below_average are in ascending order. Being able to reference the variables that accumulate results—total, count, and below_average—without defining some global (or nonlocal) variables outside a function body, makes using the generator very convenient for traversal.
Traversing with the Visualization Tool
The Binary Search Tree Visualization tool allows you to explore the details of traversal using generators. You can launch any of the three kinds of traversals by selecting the Pre-order Traverse, In-order Traverse, or Post-order Traverse buttons. In each case, the tool executes a simple loop of the form
for key, data in tree.traverse("pre"): print(key)
To see the details, use the Step button (you can launch an operation in step mode by holding down the Shift key when selecting the button). In the code window, you first see the short traversal loop. The example calls the traverse() method to visit all the keys and data in a loop using one of the orders such as pre.
Figure 8-14 shows a snapshot near the beginning of a pre-order traversal. The code for the traverse() method appears at the lower right. To the right of the tree above the code, the stack is shown. The nodes containing keys 59 and 94 are on the stack. The top of the stack was already popped off and moved to the top right under the item label. It shows the key, 77, with a comma separating it from its colored rectangle to represent the (key, data) tuple that was pushed on the stack. The yield statement is highlighted, showing that the traverse() iterator is about to yield the key and data back to caller. The loop that called traverse() has scrolled off the code display but will be shown on the next step.
FIGURE 8-14 Traversing a tree in pre-order using the traverse() iterator
When control returns to the calling loop, the traverse() iterator disappears from the code window and so does the stack, as shown in Figure 8-15. The key and data variables are now bound to 77 and the root node’s data. The print statement is highlighted because the program is about to print the key in the output box along the bottom of the tree. The next step shows key 77 being copied to the output box.
FIGURE 8-15 The loop calling the traverse() iterator
After printing, control returns to the for key, data in tree.traverse('pre') loop. That pushes the traverse() iterator back on the code display, along with its stack similar to Figure 8-14. The while loop in the iterator finds that the stack is not empty, so it pops off the top item. That item is node 59, the left child of node 77. The process repeats by pushing on node 59’s children and the node’s key, data pair on the stack. On the next loop iteration, that tuple is popped off the stack, and it is yielded back to the print loop.
The processing of iterators is complex to describe, and the Visualization tool makes it easier to follow the different levels and steps than reading a written description. Try stepping through the processing of several nodes, including when the iterator reaches a leaf node and pushes None on the stack. The stack guides the iterator to return to nodes that remain to be processed.
Traversal Order
What’s the point of having three traversal orders? One advantage is that in-order traversal guarantees an ascending order of the keys in binary search trees. There’s a separate motivation for pre- and post-order traversals. They are very useful if you’re writing programs that parse or analyze algebraic expressions. Let’s see why that is the case.
A binary tree (not a binary search tree) can be used to represent an algebraic expression that involves binary arithmetic operators such as +, –, /, and *. The root node and every nonleaf node hold an operator. The leaf nodes hold either a variable name (like A, B, or C) or a number. Each subtree is a valid algebraic expression.
For example, the binary tree shown in Figure 8-16 represents the algebraic expression
FIGURE 8-16 Binary tree representing an algebraic expression
(A+B) * C – D / E
This is called infix notation; it’s the notation normally used in algebra. (For more on infix and postfix, see the section “Parsing Arithmetic Expressions” in Chapter 4.) Traversing the tree in order generates the correct in-order sequence A+B*C–D/E, but you need to insert the parentheses yourself to get the expected order of operations. Note that subtrees form their own subexpressions like the (A+B) * C outlined in the figure.
What does all this have to do with pre-order and post-order traversals? Let’s see what’s involved in performing a pre-order traversal. The steps are
Visit the node.
Call itself to traverse the node’s left subtree.
Call itself to traverse the node’s right subtree.
Traversing the tree shown in Figure 8-16 using pre-order and printing the node’s value would generate the expression
–*+ABC/DE
This is called prefix notation. It may look strange the first time you encounter it, but one of its nice features is that parentheses are never required; the expression is unambiguous without them. Starting on the left, each operator is applied to the next two things to its right in the expression, called the operands. For the first operator, –, these two things are a product expression, *+ABC, and a division expression, /DE. For the second operator, *, the two things are a sum expression, +AB, and a single variable, C. For the third operator, +, the two things it operates on are the variables, A and B, so this last expression would be A+B in in-order notation. Finally, the fourth operator, /, operates on the two variables D and E.
The third kind of traversal, post-order, contains the three steps arranged in yet another way:
Call itself to traverse the node’s left subtree.
Call itself to traverse the node’s right subtree.
Visit the node.
For the tree in Figure 8-16, visiting the nodes with a post-order traversal generates the expression
AB+C*DE/–
This is called postfix notation. It means “apply the last operator in the expression, –, to the two things immediately to the left of it.” The first thing is AB+C*, and the second thing is DE/. Analyzing the first thing, AB+C*, shows its meaning to be “apply the * operator to the two things immediately to the left of it, AB+ and C.” Analyzing the first thing of that expression, AB+, shows its meaning to be “apply the + operator to the two things immediately to the left of it, A and B.” It’s hard to see initially, but the “things” are always one of three kinds: a single variable, a single number, or an expression ending in a binary operator.
To process the meaning of a postfix expression, you start from the last character on the right and interpret it as follows. If it’s a binary operator, then you repeat the process to interpret two subexpressions on its left, which become the operands of the operator. If it’s a letter, then it’s a simple variable, and if it’s a number, then it’s a constant. For both variables and numbers, you “pop” them off the right side of the expression and return them to the process of the enclosing expression.
We don’t show the details here, but you can easily construct a tree like that in Figure 8-16 by using a postfix expression as input. The approach is analogous to that of evaluating a postfix expression, which you saw in the PostfixTranslate.py program in Chapter 4 and its corresponding InfixCalculator Visualization tool. Instead of storing operands on the stack, however, you store entire subtrees. You read along the postfix string from left to right as you did in the PostfixEvaluate() method. Here are the steps when you encounter an operand (a variable or a number):
Make a tree with one node that holds the operand.
Push this tree onto the stack.
Here are the steps when you encounter an operator, O:
Pop two operand trees R and L off the stack (the top of the stack has the rightmost operand, R).
Create a new tree T with the operator, O, in its root.
Attach R as the right child of T.
Attach L as the left child of T.
Push the resulting tree, T, back on the stack.
When you’re done evaluating the postfix string, you pop the one remaining item off the stack. Somewhat amazingly, this item is a complete tree depicting the algebraic expression. You can then see the prefix and infix representations of the original postfix notation (and recover the postfix expression) by traversing the tree in one of the three orderings we described. We leave an implementation of this process as an exercise.