- Why Use Binary Trees?
- Tree Terminology
- An Analogy
- How Do Binary Search Trees Work?
- Finding a Node
- Inserting a Node
- Traversing the Tree
- Finding Minimum and Maximum Key Values
- Deleting a Node
- The Efficiency of Binary Search Trees
- Trees Represented as Arrays
- Printing Trees
- Duplicate Keys
- The BinarySearchTreeTester.py Program
- The Huffman Code
- Summary
- Questions
- Experiments
- Programming Projects
The Efficiency of Binary Search Trees
As you’ve seen, most operations with trees involve descending the tree from level to level to find a particular node. How long does this operation take? We mentioned earlier that the efficiency of finding a node could range from O(log N) to O(N), but let’s look at the details.
In a full, balanced tree, about half the nodes are on the bottom level. More precisely, in a full, balanced tree, there’s exactly one more node on the bottom row than in all the rest of the tree combined. Thus, about half of all searches, insertions, or deletions require finding a node on the lowest level. (About a quarter of all search operations require finding a node on the next-to-lowest level, and so on.)
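That claim about the bottom level can be checked directly. The sketch below (plain Python, not part of the book's classes) counts the nodes on the lowest level of a full, balanced tree and compares them with the nodes on all the levels above:

```python
# In a full, balanced tree with L levels, level i (counting from 1)
# holds 2**(i-1) nodes, so the bottom level holds one more node than
# all the levels above it combined.
L = 5
bottom = 2 ** (L - 1)                           # nodes on the lowest level
above = sum(2 ** (i - 1) for i in range(1, L))  # nodes on all higher levels
print(bottom, above)   # 16 15
assert bottom == above + 1
```

The same relationship holds for any number of levels, which is why roughly half of all operations end at the lowest level.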
During a search, you need to visit one node on each level. You can get a good idea of how long these operations take by knowing how many levels there are. Assuming a full, balanced tree, Table 8-1 shows how many levels are necessary to hold a given number of nodes.
Table 8-1 Number of Levels for Specified Number of Nodes
Number of Nodes | Number of Levels
---|---
1 | 1
3 | 2
7 | 3
15 | 4
31 | 5
… | …
1,023 | 10
… | …
32,767 | 15
… | …
1,048,575 | 20
… | …
33,554,431 | 25
… | …
1,073,741,823 | 30
The numbers are very much like those for searching the ordered array discussed in Chapter 2. In that case, the number of comparisons for a binary search was approximately equal to the base 2 logarithm of the number of cells in the array. Here, if you call the number of nodes in the first column N, and the number of levels in the second column L, you can say that N is 1 less than 2 raised to the power L, or
N = 2^L – 1
Adding 1 to both sides of the equation, you have
N + 1 = 2^L
Using what you learned in Chapter 2 about logarithms being the inverse of raising a number to a power, you can take the logarithm of both sides and rearrange the terms to get
log2(N + 1) = log2(2^L) = L
L = log2(N + 1)
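You can confirm that each row of Table 8-1 satisfies this formula with a few lines of Python, using the standard math module:

```python
import math

# Each row of Table 8-1 satisfies L = log2(N + 1), because a full,
# balanced tree with L levels holds exactly N = 2**L - 1 nodes.
table = [(1, 1), (3, 2), (7, 3), (15, 4), (31, 5),
         (1_023, 10), (32_767, 15), (1_048_575, 20),
         (33_554_431, 25), (1_073_741_823, 30)]

for n, levels in table:
    assert n == 2 ** levels - 1
    assert math.log2(n + 1) == levels
    print(f"{n:>13,} nodes -> {levels} levels")
```

The logarithms come out exact here because every N in the table is of the form 2^L – 1.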
Thus, the time needed to carry out the common tree operations is proportional to the base 2 log of N. In Big O notation, you say such operations take O(log N) time.
If the tree isn’t full or balanced, the analysis is more difficult. You can say that for a tree with a given number of levels, average search times will be shorter for a nonfull tree than for a full tree because fewer searches will proceed to the lower levels.
Compare the tree to the other data storage structures we’ve discussed so far. In an unordered array or a linked list containing 1,000,000 items, finding the item you want takes, on average, 500,000 comparisons, basically O(N). In a balanced tree of 1,000,000 items, only 20 (or fewer) comparisons are required because it’s O(log N).
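You can see the O(log N) behavior by counting comparisons directly. The following minimal sketch (not the book's BinarySearchTree class) builds a balanced tree from sorted keys and reports how many key comparisons a search makes:

```python
class Node:
    """Minimal BST node (a sketch, not the book's BinarySearchTree class)."""
    def __init__(self, key):
        self.key = key
        self.left = self.right = None

def build_balanced(keys):
    """Build a balanced BST from a sorted list of keys."""
    if not keys:
        return None
    mid = len(keys) // 2                 # middle key becomes the subtree root
    node = Node(keys[mid])
    node.left = build_balanced(keys[:mid])
    node.right = build_balanced(keys[mid + 1:])
    return node

def search(root, goal):
    """Return the number of key comparisons made while searching for goal."""
    comparisons, node = 0, root
    while node:
        comparisons += 1
        if goal == node.key:
            break
        node = node.left if goal < node.key else node.right
    return comparisons

# 2**17 - 1 = 131,071 keys fill 17 levels exactly; any search takes at
# most 17 comparisons, versus ~65,000 on average for a linear scan.
root = build_balanced(list(range(131_071)))
print(search(root, 131_070))   # 17 -- the deepest, rightmost node
```

A tree of 1,000,000 nodes behaves the same way, with the comparison count capped at 20 instead of 17.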
In an ordered array, you can find an item equally quickly, but inserting an item requires, on average, moving 500,000 items. Inserting an item in a tree with 1,000,000 items requires 20 or fewer comparisons, plus a small amount of time to connect the item. The extra time is constant and doesn’t depend on the number of items.
Similarly, deleting an item from a 1,000,000-item array requires moving an average of 500,000 items, while deleting an item from a 1,000,000-node tree requires 20 or fewer comparisons to find the item, plus a few more comparisons to find its successor, plus a short time to disconnect the item and connect its successor. Because the successor is somewhere lower in the tree than the node to delete, the total number of comparisons to find both the node and its successor will be 20 or fewer.
Thus, a tree provides high efficiency for all the common data storage operations: searches, insertions, and deletions. Traversing is not as fast as the other operations, but by definition it must be O(N) to cover all N items. In all the data structures you’ve seen so far, traversal has been O(N); we show some other data structures later where it can be greater. Traversing a tree does need a little more memory than traversing an array or list, because you must keep track of the recursive calls or maintain an explicit stack. That memory is O(log N), in contrast to the O(1) memory that arrays and lists need during traversal.
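The O(log N) traversal memory is easy to observe with an explicit stack. This sketch (hypothetical names, not the book's code) traverses a small tree in order and records the deepest the stack ever grows:

```python
class Node:
    """Minimal tree node for the traversal sketch."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def in_order(root):
    """Traverse iteratively; also report the deepest the stack ever grew."""
    keys, stack, max_depth = [], [], 0
    node = root
    while stack or node:
        while node:                  # descend left, saving the path
            stack.append(node)
            node = node.left
        max_depth = max(max_depth, len(stack))
        node = stack.pop()           # visit the node...
        keys.append(node.key)
        node = node.right            # ...then its right subtree
    return keys, max_depth

# A full, balanced 7-node tree: the stack never holds more than
# 3 nodes -- the tree's height, log2(7 + 1).
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
keys, max_depth = in_order(root)
print(keys, max_depth)   # [1, 2, 3, 4, 5, 6, 7] 3
```

The stack only ever holds the path from the root down to the current node, so in a balanced tree its peak size is the tree's height, about log2(N + 1).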