- Why Use Binary Trees?
- Tree Terminology
- An Analogy
- How Do Binary Search Trees Work?
- Finding a Node
- Inserting a Node
- Traversing the Tree
- Finding Minimum and Maximum Key Values
- Deleting a Node
- The Efficiency of Binary Search Trees
- Trees Represented as Arrays
- Printing Trees
- Duplicate Keys
- The BinarySearchTreeTester.py Program
- The Huffman Code
- Summary
- Questions
- Experiments
- Programming Projects
Deleting a Node
Deleting a node is the most complicated common operation required for binary search trees. The fundamental operation of deletion can’t be ignored, however, and studying the details builds character. If you’re not in the mood for character building, feel free to skip to the Efficiency of Binary Search Trees section.
You start by verifying the tree isn’t empty and then finding the node you want to delete, using the same approach you saw in __find() and insert(). If the node isn’t found, then you’re done. When you’ve found the node and its parent, there are three cases to consider:
The node to be deleted is a leaf (has no children).
The node to be deleted has one child.
The node to be deleted has two children.
Let’s look at these three cases in turn. The first is easy; the second, almost as easy; and the third, quite complicated.
Case 1: The Node to Be Deleted Has No Children
To delete a leaf node, you simply change the appropriate child field in the node’s parent to None instead of to the node. The node object still exists, but it is no longer part of the tree, as shown when deleting node 17 in Figure 8-18.
FIGURE 8-18 Deleting a node with no children
If you’re using a language like Python or Java that has garbage collection, the deleted node’s memory will eventually be reclaimed for other uses (if you eliminate all references to it in the program). In languages that require explicit allocation and deallocation of memory, the deleted node should be released for reuse.
Using the Visualization Tool to Delete a Node with No Children
Try deleting a leaf node using the Binary Search Tree Visualization tool. You can either type the key of a node in the text entry box or select a leaf with your pointer device and then select Delete. You see the program use __find() to locate the node by its key, copy it to a temporary variable, set the parent link to None, and then “return” the deleted key and data (in the form of its colored background).
Case 2: The Node to Be Deleted Has One Child
This second case isn’t very difficult either. The node has only two edges: one to its parent and one to its only child. You want to “cut” the node out of this sequence by connecting its parent directly to its child. This process involves changing the appropriate reference in the parent (leftChild or rightChild or __root) to point to the deleted node’s child. Figure 8-19 shows the deletion of node 16, which has only one child.
FIGURE 8-19 Deleting a node with one child
After finding the node and its parent, the delete method has to change only one reference. The deleted node, key 16 in the figure, becomes disconnected from the tree (although it may still have a child pointer to the node that was promoted up (key 20). Garbage collectors are sophisticated enough to know that they can reclaim the deleted node without following its links to other nodes that might still be needed.
Now let’s go back to the case of deleting a node with no children. In that case, the delete method also made a single change to replace one of the parent’s child pointers. That pointer was set to None because there was no replacement child node. That’s a similar operation to Case 2, so you can treat Case 1 and Case 2 together by saying, “If the node to be deleted, D, has 0 or 1 children, replace the appropriate link in its parent with either the left child of D, if it isn’t empty, or the right child of D.” If both child links from D are None, then you’ve covered Case 1. If only one of D’s child links is non-None, then the appropriate child will be selected as the parent’s new child, covering Case 2. You promote either the single child or None into the parent’s child (or possibly __root) reference.
Using the Visualization Tool to Delete a Node with One Child
Let’s assume you’re using the Visualization tool on the tree in Figure 8-5 and deleting node 61, which has a right child but no left child. Click node 61 and the key should appear in the text entry area, enabling the Delete button. Selecting the button starts another call to __find() that stops with current pointing to the node and parent pointing to node 59.
After making a copy of node 61, the animation shows the right child link from node 59 being set to node 61’s right child, node 62. The original copy of node 61 goes away, and the tree is adjusted to put the subtree rooted at node 62 into its new position. Finally, the copy of node 61 is moved to the output box at the bottom.
Use the Visualization tool to generate new trees with single child nodes and see what happens when you delete them. Look for the subtree whose root is the deleted node’s child. No matter how complicated this subtree is, it’s simply moved up and plugged in as the new child of the deleted node’s parent.
Python Code to Delete a Node
Let’s now look at the code for at least Cases 1 and 2. Listing 8-8 shows the code for the delete() method, which takes one argument, the key of the node to delete. It returns either the data of the node that was deleted or None, to indicate the node was not found. That makes it behave somewhat like the methods for popping an item off a stack or deleting an item from a queue. The difference is that the node must be found inside the tree instead of being at a known position in the data structure.
LISTING 8-8 The delete() Method of BinarySearchTree
class BinarySearchTree(object): # A binary search tree class … def delete(self, goal): # Delete a node whose key matches goal node, parent = self.__find(goal) # Find goal and its parent if node is not None: # If node was found, return self.__delete( # then perform deletion at node parent, node) # under the parent def __delete(self, # Delete the specified node in the tree parent, node): # modifying the parent node/tree deleted = node.data # Save the data that's to be deleted if node.leftChild: # Determine number of subtrees if node.rightChild: # If both subtrees exist, self.__promote_successor( # Then promote successor to node) # replace deleted node else: # If no right child, move left child up if parent is self: # If parent is the whole tree, self.__root = node.leftChild # update root elif parent.leftChild is node: # If node is parent's left, parent.leftChild = node.leftChild # child, update left else: # else update right child parent.rightChild = node.leftChild else: # No left child; so promote right child if parent is self: # If parent is the whole tree, self.__root = node.rightChild # update root elif parent.leftChild is node: # If node is parent's left parent.leftChild = node.rightChild # child, then update else: # left child link else update parent.rightChild = node.rightChild # right child return deleted # Return the deleted node's data
Just like for insertion, the first step is to find the node to delete and its parent. If that search does not find the goal node, then there’s nothing to delete from the tree, and delete() returns None. If the node to delete is found, the node and its parent are passed to the private __delete() method to modify the nodes in the tree.
Inside the __delete() method, the first step is to store a reference to the node data being deleted. This step enables retrieval of the node’s data after the references to it are removed from the tree. The next step checks how many subtrees the node has. That determines what case is being processed. If both a left and a right child are present, that’s Case 3, and it hands off the deletion to another private method, __promote_successor(), which we describe a little later.
If there is only a left subtree of the node to delete, then the next thing to look at is its parent node. If the parent is the BinarySearchTree object (self), then the node to delete must be the root node, so the left child is promoted into the root node slot. If the parent’s left child is the node to delete, then the parent’s left child link is replaced with the node’s left child to remove the node. Otherwise, the parent’s right child link is updated to remove the node.
Notice that working with references makes it easy to move an entire subtree. When the parent’s reference to the node is updated, the child that gets promoted could be a single node or an immense subtree. Only one reference needs to change. Although there may be lots of nodes in the subtree, you don’t need to worry about moving them individually. In fact, they “move” only in the sense of being conceptually in different positions relative to the other nodes. As far as the program is concerned, only the parent’s reference to the root of the subtree has changed, and the rest of the contents in memory remain the same.
The final else clause of the __delete() method deals with the case when the node has no left child. Whether or not the node has a right child, __delete() only needs to update the parent’s reference to point at the node’s right child. That handles both Case 1 and Case 2. It still must determine which field of the parent object gets the reference to the node’s right child, just as in the earlier lines when only the left child was present. It puts the node.rightChild in either the __root, leftChild, or rightChild field of the parent, accordingly. Finally, it returns the data of the node that was deleted.
Case 3: The Node to Be Deleted Has Two Children
Now the fun begins. If the deleted node has two children, you can’t just replace it with one of these children, at least if the child has its own (grand) children. Why not? Examine Figure 8-20 and imagine deleting node 27 and replacing it with its right subtree, whose root is 33. You are promoting the right subtree, but it has its own children. Which left child would node 33 have in its new position, the deleted node’s left child, 16, or node 33’s left child, 28? And what do you do with the other left child? You can’t just throw it away.
FIGURE 8-20 Options for deleting a node with two subtrees
The middle option in Figure 8-20 shows potentially allowing three children. That would bring a whole host of other problems because the tree is no longer binary (see Chapter 9 for more on that idea). The right-hand option in the figure shows pushing the deleted node’s left child, 16, down and splicing in the new node’s left child, 28, above it. That approach looks plausible. The tree is still a binary search tree, at least. The problem, however, is what to do if the promoted node’s left child has a complicated subtree of its own (for example, if node 28 in the figure had a whole subtree below it). That could mean following a long path to figure out where to splice the left subtrees together.
We need another approach. The good news is that there’s a trick. The bad news is that, even with the trick, there are special cases to consider. Remember that, in a binary search tree, the nodes are arranged in order of ascending keys. For each node, the node with the next-highest key is called its in-order successor, or simply its successor. In the original tree of Figure 8-20, node 28 is the in-order successor of node 27.
Here’s the trick: To delete a node with two children, replace the node with its in-order successor. Figure 8-21 shows a deleted node being replaced by its successor. Notice that the nodes are still in order. All it took was a simple replacement. It’s going to be a little more complicated if the successor itself has children; we look at that possibility in a moment.
FIGURE 8-21 Node replaced by its successor
Finding the Successor
How do you find the successor of a node? Human beings can do this quickly (for small trees, anyway). Just take a quick glance at the tree and find the next-largest number following the key of the node to be deleted. In Figure 8-21 it doesn’t take long to see that the successor of 27 is 28, or that the successor of 35 is 44. The computer, however, can’t do things “at a glance"; it needs an algorithm.
Remember finding the node with the minimum or maximum key? In this case you’re looking for the minimum key larger than the key to be deleted. The node to be deleted has both a left and right subtree because you’re working on Case 3. So, you can just look for the minimum key in the right subtree, as illustrated in Figure 8-22. All you need to do is follow the left child links until you find a node with no left child.
FIGURE 8-22 Finding the successor
What about potential nodes in the trees rooted above the node to be deleted? Couldn’t the successor be somewhere in there? Let’s think it through. Imagine you seek the successor of node 27 in Figure 8-22. The successor would have to be greater than 27 and less than 33, the key of its right child. Any node with a key between those two values would be inserted somewhere in the left subtree of node 33. Remember that you always search down the binary search tree choosing the path based on the key’s relative order to the keys already in the tree. Furthermore, node 33 was placed as the right child of node 27 because it was less than the root node, 44. Any node’s right child key must be less than its parent’s key if it is the left child of that parent. So going up to parent, grandparent, or beyond (following left child links) only leads to larger keys, and those keys can’t be the successor.
There are a couple of other things to note about the successor. If the right child of the original node to delete has no left children, this right child is itself the successor, as shown in the example of Figure 8-23. Because the successor always has an empty left child link, it has at most one child.
FIGURE 8-23 The right child is the successor
Replacing with the Successor
Having found the successor, you can easily copy its key and data values into the node to be deleted, but what do you do with the subtree rooted at the successor node? You can’t leave a copy of the successor node in the tree there because the data would be stored in two places, create duplicate keys, and make deleting the successor a problem in the future. So, what’s the easiest way to get it out of the tree?
Hopefully, reading Chapter 6 makes the answer jump right out. You can now delete the successor from the tree using a recursive call. You want to do the same operation on the successor that you’ve been doing on the original node to delete—the one with the goal key. What’s different is that you only need to do the deletion in a smaller tree, the right subtree where you found the successor. If you tried to do it starting from the root of the tree after replacing the goal node, the __find() method would follow the same path and end at the node you just replaced. You could get around that problem by delaying the replacement of the key until after deleting the successor, but it’s much easier—and more importantly, faster—if you start a new delete operation in the right subtree. There will be much less tree to search, and you can’t accidentally end up at the previous goal node.
In fact, when you searched for the successor, you followed child links to determine the path, and that gave you both the successor and the successor’s parent node. With those two references available, you now have everything needed to call the private __delete() method shown in Listing 8-8. You can now define the __promote_successor() method, as shown in Listing 8-9. Remember, this is the method used to handle Case 3—when the node to delete has two children.
The __promote_successor() method takes as its lone parameter the node to delete. Because it is going to replace that node’s data and key and then delete the successor, it’s easier to refer to it as the node to be replaced in this context. To start, it points a successor variable at the right child of the node to be replaced. Just like the __find() method, it tracks the parent of the successor node, which is initialized to be the node to be replaced. Then it acts like the minNode() method, using a while loop to update successor with its left child if there is a left child. When the loop exits, successor points at the successor node and parent to its parent node.
LISTING 8-9 The __promote_successor() Method of BinarySearchTree
class BinarySearchTree(object): # A binary search tree class … def __promote_successor( # When deleting a node with both subtrees, self, # find successor on the right subtree, put # its data in this node, and delete the node): # successor from the right subtree successor = node.rightChild # Start search for successor in parent = node # right subtree and track its parent while successor.leftChild: # Descend left child links until parent = successor # no more left links, tracking parent successor = successor.leftChild node.key = successor.key # Replace node to delete with node.data = successor.data # successor's key and data self.__delete(parent, successor) # Remove successor node
All that’s left to do is update the key and data of the node to be replaced and delete the successor node using a recursive call to __delete(). Unlike previous recursive methods you’ve seen, this isn’t a call to the same routine where the call occurs. In this case, the __promote_successor() method calls __delete(), which in turn, could call __promote_successor(). This is called mutual recursion—where two or more routines call each other.
Your senses should be tingling now. How do you know this mutual recursion will end? Where’s the base case that you saw with the “simple” recursion routines? Could you get into an infinite loop of mutually recursive calls? That’s a good thing to worry about, but it’s not going to happen here. Remember that deleting a node broke down into three cases. Cases 1 and 2 were for deleting leaf nodes and nodes with one child. Those two cases did not lead to __promote_successor() calls, so they are the base cases. When you do call __promote_successor() for Case 3, it operates on the subtree rooted at the node to delete, so the only chance that the tree being processed recursively isn’t smaller than the original is if the node to delete is the root node. The clincher, however, is that __promote_successor() calls __delete() only on successor nodes—nodes that are guaranteed to have at most one child and at least one level lower in the tree than the node they started on. Those always lead to a base case and never to infinite recursion.
Using the Visualization Tool to Delete a Node with Two Children
Generate a tree with the Visualization tool and pick a node with two children. Now mentally figure out which node is its successor, by going to its right child and then following down the line of this right child’s left children (if it has any). For your first try, you may want to make sure the successor has no children of its own. On later attempts, try looking at the more complicated situation where entire subtrees of the successor are moved around, rather than a single node.
After you’ve chosen a node to delete, click the Delete button. You may want to use the Step or Pause/Play buttons to track the individual steps. Each of the methods we’ve described will appear in the code window, so you can see how it decides the node to delete has two children, locates the successor, copies the successor key and data, and then deletes the successor node.
Is Deletion Necessary?
If you’ve come this far, you can see that deletion is fairly involved. In fact, it’s so complicated that some programmers try to sidestep it altogether. They add a new Boolean field to the __Node class, called something like isDeleted. To delete a node, they simply set this field to True. This is a sort of a “soft” delete, like moving a file to a trash folder without truly deleting it. Then other operations, like __find(), check this field to be sure the node isn’t marked as deleted before working with it. This way, deleting a node doesn’t change the structure of the tree. Of course, it also means that memory can fill up with previously “deleted” nodes.
This approach is a bit of a cop-out, but it may be appropriate where there won’t be many deletions in a tree. Be very careful. Assumptions like that tend to come back to haunt you. For example, assuming that deletions might not be frequent for a company’s personnel records might encourage a programmer to use the isDeleted field. If the company ends up lasting for hundreds of years, there are likely to be more deletions than active employees at some point in the future. The same is true if the company experiences high turnover rates, even over a short time frame. That will significantly affect the performance of the tree operations.