- Why Use Binary Trees?
- Tree Terminology
- An Analogy
- How Do Binary Search Trees Work?
- Finding a Node
- Inserting a Node
- Traversing the Tree
- Finding Minimum and Maximum Key Values
- Deleting a Node
- The Efficiency of Binary Search Trees
- Trees Represented as Arrays
- Printing Trees
- Duplicate Keys
- The BinarySearchTreeTester.py Program
- The Huffman Code
- Summary
- Questions
- Experiments
- Programming Projects
Trees Represented as Arrays
Up to now, we’ve represented the binary tree nodes using objects with references for the left and right children. There’s a completely different way to represent a tree: with an array.
In the array approach, the nodes are stored in an array and are not linked by references. The position of the node in the array corresponds to its position in the tree. We put the root node at index 0. The root’s left child is placed at index 1, and its right child at index 2, and so on, progressing from left to right along each level of the tree. This approach is shown in Figure 8-24, which is a binary search tree with letters for the keys.
FIGURE 8-24 A binary tree represented by an array
Every position in the tree, whether it represents an existing node or not, corresponds to a cell in the array. Adding a node at a given position in the tree means inserting the node into the equivalent cell in the array. Cells representing tree positions with no nodes are filled with 0, None, or some other special value that cannot be confused with a node. In the figure, the ° symbol is used in the array for empty nodes.
With this scheme, a node’s children and parent can be found by applying some simple arithmetic to the node’s index number in the array. If a node’s index number is index, this node’s left child is
2 * index + 1
its right child is
2 * index + 2
and its parent is
(index – 1) // 2
(where the // indicates integer division with no remainder). You can verify these formulas work by looking at the indices in Figure 8-24. Any algorithm that follows links between nodes can easily determine where to check for the next node. The scheme works for any binary tree, not just binary search trees. It has the nice feature that edges/links between nodes are just as easy to travel up as they are going down (without the double linking needed for lists). Even better, it can be generalized to any tree with a fixed number of children.
In most situations, however, representing a tree with an array isn’t very efficient. Unfilled nodes leave holes in the array, wasting memory. Even worse, when deletion of a node involves moving subtrees, every node in the subtree must be moved to its new location in the array, which is time-consuming in large trees. For insertions that insert nodes beyond the current maximum depth of the tree, the array may need to be resized.
If deletions aren’t allowed or are very rare and the maximum depth of the tree can be predicted, the array representation may be useful, especially if obtaining memory for each node dynamically is, for some reason, too time-consuming. That might be the case when programming in assembly language or a very limited operating system, or a system with no garbage collection.
Tree Levels and Size
When trees are represented as arrays, the maximum level and number of nodes is constrained by the size of the array. For linked trees, there’s no specific maximum. For both representations, the current maximum level and number of nodes can be determined only by traversing the tree. If there will be frequent calls to request these metrics, the BinarySearchTree object can maintain values for them, but the insert() and delete() methods must be modified to update the values as nodes are added and removed.
To count nodes in a linked tree, you can use the traverse() method to iterate over all the nodes and increment a count, as shown earlier in the example to find the average key value and again in the nodes() method of Listing 8-10. To find the maximum level, you cannot use the same technique because the level of each node during the traversal is not provided (although it could be added by modifying the generator). Instead, the recursive definition shown in Listing 8-10 gets the job done in a few lines of code.
LISTING 8-10 The levels() and nodes() Methods of BinarySearchTree
class BinarySearchTree(object): # A binary search tree class … def levels(self): # Count the levels in the tree return self.__levels(self.__root) # Count starting at root def __levels(self, node): # Recursively count levels in subtree if node: # If a node is provided, then level is 1 return 1 + max(self.__levels(node.leftChild), # more than self.__levels(node.rightChild)) # max child else: return 0 # Empty subtree has no levels def nodes(self): # Count the tree nodes, using iterator count = 0 # Assume an empty tree for key, data in self.traverse(): # Iterate over all keys in any count += 1 # order and increment count return count
Counting the levels of a subtree is somewhat different than what you’ve seen before in that each node takes the maximum level of each of its subtrees and adds one to it for the node itself. It might seem as if there should be a shortcut by looking at the depth of the minimum or maximum key so that you don’t need to visit every node. If you think about it, however, even finding the minimum and maximum keys shows the depth only on the left and right “flanks” of the tree. There could be longer paths somewhere in the middle, and the only way to find them is to visit all the nodes.