- Why Use Binary Trees?
- Tree Terminology
- An Analogy
- How Do Binary Search Trees Work?
- Finding a Node
- Inserting a Node
- Traversing the Tree
- Finding Minimum and Maximum Key Values
- Deleting a Node
- The Efficiency of Binary Search Trees
- Trees Represented as Arrays
- Printing Trees
- Duplicate Keys
- The BinarySearchTreeTester.py Program
- The Huffman Code
- Summary
- Questions
- Experiments
- Programming Projects
How Do Binary Search Trees Work?
Let’s see how to carry out the common binary tree operations of finding a node with a given key, inserting a new node, traversing the tree, and deleting a node. For each of these operations, we first show how to use the Binary Search Tree Visualization tool to carry it out; then we look at the corresponding Python code.
The Binary Search Tree Visualization Tool
For this example, start the Binary Search Tree Visualization tool (the program is called BinaryTree.py). You should see a screen something like that shown in Figure 8-5.
FIGURE 8-5 The Binary Search Tree Visualization tool
Using the Visualization Tool
The key values shown in the nodes range from 0 to 99. Of course, in a real tree, there would probably be a larger range of key values. For example, if telephone numbers were used for key values, they could range up to 999,999,999,999,999 (15 digits including country codes in the International Telecommunication Union standard). We focus on a simpler set of possible keys.
Another difference between the Visualization tool and a real tree is that the tool limits its tree to a depth of five; that is, there can be no more than five levels from the root to the bottom (level 0 through level 4). This restriction ensures that all the nodes in the tree will be visible on the screen. In a real tree the number of levels is unlimited (until the computer runs out of memory).
Using the Visualization tool, you can create a new tree whenever you want. To do this, enter a number of items and click the Erase & Random Fill button. You can ask to fill with 0 to 99 items. If you choose 0, you will create an empty tree. Using larger numbers will fill in more nodes, but some of the requested nodes may not appear. That’s due to the limit on the depth of the tree and the random order the items are inserted. You can experiment by creating trees with different numbers of nodes to see the variety of trees that come out of the random sequencing.
The nodes are created with different colors. The color represents the data stored with the key. We show a little later how that data is updated in some operations.
Constructing Trees
As shown in the Visualization tool, the tree’s shape depends both on the items it contains as well as the order the items are inserted into the tree. That might seem strange at first. If items are inserted into a sorted array, they always end up in the same order, regardless of their sequencing. Why are binary search trees different?
One of the key features of the binary search tree is that it does not have to fully order the items as they are inserted. When it adds a new item to an existing tree, it decides where to place the new leaf node by comparing its key with that of the nodes already stored in the tree. It follows a path from the root down to a missing child where the new node “belongs.” By choosing the left child when the new node’s key is less than the key of an internal node and the right child for other values, there will always be a unique path for the new node. That unique path means you can easily find that node by its key later, but it also means that the previously inserted items affect the path to any new item.
For example, if you start with an empty binary search tree and insert nodes in increasing key order, the unique path for each one will always be the rightmost path. Each insertion adds one more node at the bottom right. If you reverse the order of the nodes and insert them into an empty tree, each insertion will add the node at the bottom left because the key is lower than any other in the tree so far. Figure 8-6 shows what happens if you insert nodes with keys 44, 65, 83, and 87 in forward or reverse order.
FIGURE 8-6 Trees made by inserting nodes in sorted order
Unbalanced Trees
The trees shown in Figure 8-6, don’t look like trees. In fact, they look more like linked lists. One of the goals for a binary search tree is to speed up the search for a particular node, so having to step through a linked list to find the node would not be an improvement. To get the benefit of the tree, the nodes should be roughly balanced on both sides of the root. Ideally, each step along the path to find a node should cut the number of nodes to search in half, just like in binary searches of arrays and the guess-a-number game described in Chapter 2.
We can call some trees unbalanced; that is, they have most of their nodes on one side of the root or the other, as shown in Figure 8-7. Any subtree may also be unbalanced like the outlined one on the lower left of the figure. Of course, only a fully balanced tree will have equal numbers of nodes on the left and right subtrees (and being fully balanced, every node’s subtree will be fully balanced too). Inserting nodes one at a time on randomly chosen items means most trees will be unbalanced by one or more nodes as they are constructed, so you typically cannot expect to find fully balanced trees. In the following chapters, we look more carefully at ways to balance them as nodes are inserted and deleted.
FIGURE 8-7 An unbalanced tree (with an unbalanced subtree)
Trees become unbalanced because of the order in which the data items are inserted. If these key values are inserted randomly, the tree will be more or less balanced. When an ascending sequence (like 11, 18, 33, 42, 65) or a descending sequence is encountered, all the values will be right children (if ascending) or left children (if descending), and the tree will be unbalanced. The key values in the Visualization tool are generated randomly, but of course some short ascending or descending sequences will be created anyway, which will lead to local imbalances.
If a tree is created by data items whose key values arrive in random order, the problem of unbalanced trees may not be too much of a problem for larger trees because the chances of a long run of numbers in a sequence is small. Sometimes, however, key values will arrive in strict sequence; for example, when someone doing data entry arranges a stack of forms into alphabetical order by name before entering the data. When this happens, tree efficiency can be seriously degraded. We discuss unbalanced trees and what to do about them in Chapters 9 and 10.
Representing the Tree in Python Code
Let’s start implementing a binary search tree in Python. As with other data structures, there are several approaches to representing a tree in the computer’s memory. The most common is to store the nodes at (unrelated) locations in memory and connect them using references in each node that point to its children.
You can also represent a tree in memory as an array, with nodes in specific positions stored in corresponding positions in the array. We return to this possibility at the end of this chapter. For our sample Python code we’ll use the approach of connecting the nodes using references, similar to the way linked lists were implemented in Chapter 5.
The BinarySearchTree Class
We need a class for the overall tree object: the object that holds, or at least leads to, all the nodes. We’ll call this class BinarySearchTree. It has only one field, __root, that holds the reference to the root node, as shown in Listing 8-1. This is very similar to the LinkedList class that was used in Chapter 5 to represent linked lists. The BinarySearchTree class doesn’t need fields for the other nodes because they are all accessed from the root node by following other references.
LISTING 8-1 The Constructor for the BinarySearchTree Class
class BinarySearchTree(object): # A binary search tree class def __init__(self): # The tree organizes nodes by their self.__root = None # keys. Initially, it is empty.
The constructor initializes the reference to the root node as None to start with an empty tree. When the first node is inserted, __root will point to it as shown in the Visualization tool example of Figure 8-5. There are, of course, many methods that operate on BinarySearchTree objects, but first, you need to define the nodes inside them.
The __Node Class
The nodes of the tree contain the data representing the objects being stored (contact information in an address book, for example), a key to identify those objects (and to order them), and the references to each of the node’s two children. To remind us that callers that create BinarySearchTree objects should not be allowed to directly alter the nodes, we make a private __Node class inside that class. Listing 8-2 shows how an internal class can be defined inside the BinarySearchTree class.
LISTING 8-2 The Constructors for the __Node and BinarySearchTree Classes
class BinarySearchTree(object): # A binary search tree class … class __Node(object): # A node in a binary search tree def __init__( # Constructor takes a key that is self, # used to determine the position key, # inside the search tree, data, # the data associated with the key left=None, # and a left and right child node right=None): # if known self.key = key # Copy parameters to instance self.data = data # attributes of the object self.leftChild = left self.rightChild = right def __str__(self): # Represent a node as a string return "{" + str(self.key) + ", " + str(self.data) + "}" def __init__(self): # The tree organizes nodes by their self.__root = None # keys. Initially, it is empty. def isEmpty(self): # Check for empty tree return self.__root is None def root(self): # Get the data and key of the root node if self.isEmpty(): # If the tree is empty, raise exception raise Exception("No root node in empty tree") return ( # Otherwise return root data and its key self.__root.data, self.__root.key)
The __Node objects are created and manipulated by the BinarySearchTree’s methods. The fields inside __Node can be initialized as public attributes because the BinarySearchTree‘s methods take care never to return a __Node object. This declaration allows for direct reading and writing without making accessor methods like getKey() or setData(). The __Node constructor simply populates the fields from the arguments provided. If the child nodes are not provided, the fields for their references are filled with None.
We add a __str__() method for __Node objects to aid in displaying the contents while debugging. Notably, it does not show the child nodes. We discuss how to display full trees a little later. That’s all the methods needed for __Node objects; all the rest of the methods you define are for BinarySearchTree objects.
Listing 8-2 shows an isEmpty() method for BinarySearchTree objects that checks whether the tree has any nodes in it. The root() method extracts the root node’s data and key. It’s like peek() for a queue and raises an exception if the tree is empty.
Some programmers also include a reference to a node’s parent in the __Node class. Doing so simplifies some operations but complicates others, so we don’t include it here. Adding a parent reference achieves something similar to the DoublyLinkedList class described in Chapter 5, “Linked Lists”; it’s useful in certain contexts but adds complexity.
We’ve made another design choice by storing the key for each node in its own field. For the data structures based on arrays, we chose to use a key function that extracts the key from each array item. That approach was more convenient for arrays because storing the keys separately from the data would require the equivalent of a key array along with the data array. In the case of node class with named fields, adding a key field makes the code perhaps more readable and somewhat more efficient by avoiding some function calls. It also makes the key more independent of the data, which adds flexibility and can be used to enforce constraints like immutable keys even when data changes.
The BinarySearchTree class has several methods. They are used for finding, inserting, deleting, and traversing nodes; and for displaying the tree. We investigate them each separately.