Priority Queues and Heapsort in Java
- Elementary Implementations
- Heap Data Structure
- Algorithms on Heaps
- Heapsort
- Priority-Queue ADT
- Priority Queues for Client Arrays
- Binomial Queues
Priority Queues and Heapsort
Many applications require that we process records with keys in order, but not necessarily in full sorted order and not necessarily all at once. Often, we collect a set of records, then process the one with the largest key, then perhaps collect more records, then process the one with the current largest key, and so forth. An appropriate data structure in such an environment supports the operations of inserting a new element and deleting the largest element. Such a data structure is called a priority queue. Using priority queues is similar to using queues (remove the oldest) and stacks (remove the newest), but implementing them efficiently is more challenging. The priority queue is the most important example of the generalized queue ADT that we discussed in Section 4.7. In fact, the priority queue is a proper generalization of the stack and the queue, because we can implement these data structures with priority queues, using appropriate priority assignments (see Exercises 9.3 and 9.4).
Definition 9.1 A priority queue is a data structure of items with keys which supports two basic operations: insert a new item, and remove the item with the largest key.
Applications of priority queues include simulation systems, where the keys might correspond to event times, to be processed in chronological order; job scheduling in computer systems, where the keys might correspond to priorities indicating which users are to be served first; and numerical computations, where the keys might be computational errors, indicating that the largest should be dealt with first.
We can use any priority queue as the basis for a sorting algorithm by inserting all the records, then successively removing the largest to get the records in reverse order. Later on in this book, we shall see how to use priority queues as building blocks for more advanced algorithms. In Part 5, we shall see how priority queues are an appropriate abstraction for helping us understand the relationships among several fundamental graph-searching algorithms; and in Part 6, we shall develop a file-compression algorithm using routines from this chapter. These are but a few examples of the important role played by the priority queue as a basic tool in algorithm design.
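The sorting idea just described can be sketched as follows. This is only an illustration under stated assumptions: it uses the library class java.util.PriorityQueue, configured with Collections.reverseOrder() so that it behaves as a maximum-oriented queue, as a stand-in for the priority-queue implementations developed in this chapter. The names PQSortSketch and sortDescending are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class PQSortSketch {
    // Insert every key, then repeatedly remove the largest.
    // The keys come out in decreasing (reverse) order.
    static List<Integer> sortDescending(List<Integer> keys) {
        // Stand-in for the chapter's PQ: reverseOrder() makes the library
        // queue return the largest key first.
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        for (int k : keys) pq.add(k);                    // insert
        List<Integer> out = new ArrayList<>();
        while (!pq.isEmpty()) out.add(pq.poll());        // remove the maximum
        return out;
    }

    public static void main(String[] args) {
        // prints [9, 6, 5, 4, 3, 2, 1, 1]
        System.out.println(sortDescending(Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6)));
    }
}
```

The cost of a sort built this way depends entirely on the cost of insert and remove the maximum in the underlying priority queue.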
In practice, priority queues are more complex than the simple definition just given, because there are several other operations that we may need to perform to maintain them under all the conditions that might arise when we are using them. Indeed, one of the main reasons that many priority-queue implementations are so useful is their flexibility in allowing client application programs to perform a variety of different operations on sets of records with keys. We want to build and maintain a data structure containing records with numerical keys (priorities) that supports some of the following operations:
Construct a priority queue from N given items.
Insert a new item.
Remove the maximum item.
Change the priority of an arbitrary specified item.
Remove an arbitrary specified item.
Join two priority queues into one large one.
If records can have duplicate keys, we take "maximum" to mean "any record with the largest key value." As with many data structures, we also need to add a standard test if empty operation and perhaps a copy (clone) operation to this set.
There is overlap among these operations, and it is sometimes convenient to define other, similar operations. For example, certain clients may need frequently to find the maximum item in the priority queue, without necessarily removing it. Or, we might have an operation to replace the maximum item with a new item. We could implement operations such as these using our two basic operations as building blocks: Find the maximum could be remove the maximum followed by insert, and replace the maximum could be either insert followed by remove the maximum or remove the maximum followed by insert. We normally get more efficient code, however, by implementing such operations directly, provided that they are needed and precisely specified. Precise specification is not always as straightforward as it might seem. For example, the two options just given for replace the maximum are quite different: the former always makes the priority queue grow temporarily by one item, and the latter always puts the new item on the queue. Similarly, the change priority operation could be implemented as a remove followed by an insert, and construct could be implemented with repeated uses of insert.
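To make the difference between the two versions of replace the maximum concrete, here is a minimal sketch built only from insert and remove the maximum. It assumes java.util.PriorityQueue with a reversed comparator as a stand-in for a maximum-oriented queue, and it assumes that the queue is nonempty wherever an item is removed first. The class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class ReplaceMaxSketch {
    // Option 1: insert, then remove the maximum.
    // The queue grows by one item temporarily; the new item itself
    // comes straight back out if it is the largest.
    static int insertThenRemove(PriorityQueue<Integer> pq, int item) {
        pq.add(item);
        return pq.poll();
    }

    // Option 2: remove the maximum, then insert.
    // The new item always ends up on the queue. Assumes pq is nonempty.
    static int removeThenInsert(PriorityQueue<Integer> pq, int item) {
        int max = pq.poll();
        pq.add(item);
        return max;
    }

    // Find the maximum as remove the maximum followed by insert.
    // Assumes pq is nonempty.
    static int findMax(PriorityQueue<Integer> pq) {
        int max = pq.poll();
        pq.add(max);
        return max;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        pq.add(5); pq.add(3);
        System.out.println(insertThenRemove(pq, 9));  // 9: the new item is returned, queue still holds 5 and 3
        System.out.println(removeThenInsert(pq, 9));  // 5: the old maximum is returned, queue now holds 9 and 3
        System.out.println(findMax(pq));              // 9
    }
}
```

With the same starting queue and the same new item, the two options return different items and leave different contents behind, which is why the operation needs a precise specification.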
For some applications, it might be slightly more convenient to switch around to work with the minimum, rather than with the maximum. We stick primarily with priority queues that are oriented toward accessing the maximum key. When we do need the other kind, we shall refer to it (a priority queue that allows us to remove the minimum item) as a minimum-oriented priority queue.
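As a small illustration of the two orientations, the following sketch again uses java.util.PriorityQueue as a stand-in (an assumption, not one of this chapter's implementations). With the natural ordering that library class is minimum-oriented; reversing the comparison yields the maximum-oriented behavior that we use throughout. The names OrientationSketch, removeMax, and removeMin are hypothetical, and both methods assume a nonempty array of keys.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class OrientationSketch {
    // Maximum-oriented: the comparison is reversed, so poll() returns the largest key.
    static int removeMax(int[] keys) {
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        for (int k : keys) pq.add(k);
        return pq.poll();
    }

    // Minimum-oriented: natural ordering, so poll() returns the smallest key.
    static int removeMin(int[] keys) {
        PriorityQueue<Integer> pq = new PriorityQueue<>();
        for (int k : keys) pq.add(k);
        return pq.poll();
    }

    public static void main(String[] args) {
        int[] keys = {4, 9, 2};
        System.out.println(removeMax(keys));  // 9
        System.out.println(removeMin(keys));  // 2
    }
}
```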
The priority queue is a prototypical abstract data type (ADT) (see Chapter 4): It represents a well-defined set of operations on data, and it provides a convenient abstraction that allows us to separate applications programs (clients) from various implementations that we will consider in this chapter. The interface given in Program 9.1 defines the most basic priority-queue operations; we shall consider a more complete interface in Section 9.5. Strictly speaking, different subsets of the various operations that we might want to include lead to different abstract data structures, but the priority queue is essentially characterized by the remove-the-maximum and insert operations, so we shall focus on them.
Program 9.1 Basic priority-queue ADT
This interface defines operations for the simplest type of priority queue: initialize, test if empty, add a new item, remove the largest item. Elementary implementations of these methods using arrays and linked lists can require linear time in the worst case, but we shall see implementations in this chapter where all operations are guaranteed to run in time at most proportional to the logarithm of the number of items in the queue. The constructor's parameter specifies the maximum number of items expected in the queue and may be ignored by some implementations.
class PQ // ADT interface
  { // implementations and private members hidden
    PQ(int)
    boolean empty()
    void insert(ITEM)
    ITEM getmax()
  };
Different implementations of priority queues afford different performance characteristics for the various operations to be performed, and different applications need efficient performance for different sets of operations. Indeed, performance differences are, in principle, the only differences that can arise in the abstract-data-type concept. This situation leads to cost tradeoffs. In this chapter, we consider a variety of ways of approaching these cost tradeoffs, nearly reaching the ideal of being able to perform the remove the maximum operation in logarithmic time and all the other operations in constant time.
First, in Section 9.1, we illustrate this point by discussing a few elementary data structures for implementing priority queues. Next, in Sections 9.2 through 9.4, we concentrate on a classical data structure called the heap, which allows efficient implementations of all the operations but join. In Section 9.4, we also look at an important sorting algorithm that follows naturally from these implementations. In Sections 9.5 and 9.6, we look in more detail at some of the problems involved in developing complete priority-queue ADTs. Finally, in Section 9.7, we examine a more advanced data structure, called the binomial queue, that we use to implement all the operations (including join) in worst-case logarithmic time.
During our study of all these various data structures, we shall bear in mind both the basic tradeoffs dictated by linked versus sequential memory allocation (as introduced in Chapter 3) and the problems involved with making packages usable by applications programs. In particular, some of the advanced algorithms that appear later in this book are client programs that make use of priority queues.
Exercises
9.1 A letter means insert and an asterisk means remove the maximum in the sequence
P R I O * R * * I * T * Y * * * Q U E * * * U * E.
Give the sequence of values returned by the remove the maximum operations.
9.2 Add to the conventions of Exercise 9.1 a plus sign to mean join and parentheses to delimit the priority queue created by the operations within them. Give the contents of the priority queue after the sequence
( ( ( P R I O * ) + ( R * I T * Y * ) ) * * * ) + ( Q U E * * * U * E ).
9.3 Explain how to use a priority queue ADT to implement a stack ADT.
9.4 Explain how to use a priority queue ADT to implement a queue ADT.
9.1 Elementary Implementations
The basic data structures that we discussed in Chapter 3 provide us with numerous options for implementing priority queues. Program 9.2 is an implementation that uses an unordered array as the underlying data structure. The remove the maximum operation is implemented by scanning the array to find the maximum, then exchanging the maximum item with the last item and decrementing the queue size. Figure 9.1 shows the contents of the array for a sample sequence of operations. This basic implementation corresponds to similar implementations that we saw in Chapter 4 for stacks and queues (see Programs 4.7 and 4.17) and is useful for small queues. The significant difference has to do with performance. For stacks and queues, we were able to develop implementations of all the operations that take constant time; for priority queues, it is easy to find implementations where either the insert or the remove the maximum operation takes constant time, but finding an implementation where both operations will be fast is a more difficult task, and it is the subject of this chapter.
Figure 9.1 Priority-queue example (unordered array representation)
This sequence shows the result of the sequence of operations in the left column (top to bottom), where a letter denotes insert and an asterisk denotes remove the maximum. Each line displays the operation, the letter removed for the remove-the-maximum operations, and the contents of the array after the operation.
Program 9.2 Array implementation of a priority queue
This implementation, which may be compared with the array implementations for stacks and queues that we considered in Chapter 4 (see Programs 4.7 and 4.17), keeps the items in an unordered array. Items are added to and removed from the end of the array, as in a stack.
class PQ
  {
    static boolean less(ITEM v, ITEM w)
      { return v.less(w); }
    static void exch(ITEM[] a, int i, int j)
      { ITEM t = a[i]; a[i] = a[j]; a[j] = t; }
    private ITEM[] pq;
    private int N;
    PQ(int maxN)
      { pq = new ITEM[maxN]; N = 0; }
    boolean empty()
      { return N == 0; }
    void insert(ITEM item)
      { pq[N++] = item; }
    ITEM getmax()
      { int max = 0;
        for (int j = 1; j < N; j++)
          if (less(pq[max], pq[j])) max = j;
        exch(pq, max, N-1);
        return pq[--N];
      }
  };
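To see the exchange step of the unordered-array approach at work, the following sketch traces a short sequence of operations. It is a hypothetical character-keyed variant of Program 9.2 (the names CharPQ, CharPQTrace, contents, and run are not from the text), written so that it can be compiled without an ITEM class. As in the exercises, a letter means insert and an asterisk means remove the maximum.

```java
class CharPQ {
    private final char[] pq;
    private int n = 0;

    CharPQ(int maxN) { pq = new char[maxN]; }

    boolean empty() { return n == 0; }

    void insert(char c) { pq[n++] = c; }           // constant time: add at the end

    char getmax() {                                 // linear time: scan, exchange, remove
        int max = 0;
        for (int j = 1; j < n; j++)
            if (pq[max] < pq[j]) max = j;
        char t = pq[max]; pq[max] = pq[n - 1]; pq[n - 1] = t;
        return pq[--n];
    }

    String contents() { return new String(pq, 0, n); }
}

class CharPQTrace {
    // Applies the operations to pq, printing the array contents after each one,
    // and returns the removed letters in the order in which they were removed.
    static String run(String ops, CharPQ pq) {
        StringBuilder removed = new StringBuilder();
        for (char op : ops.toCharArray()) {
            if (op == '*') removed.append(pq.getmax());
            else pq.insert(op);
            System.out.println(op + "  " + pq.contents());
        }
        return removed.toString();
    }

    public static void main(String[] args) {
        // The first * exchanges S with E before removing it, leaving E A.
        System.out.println(run("SAE*Y**", new CharPQ(10)));  // last line printed: SYE
    }
}
```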
We can use unordered or ordered sequences, implemented as linked lists or as arrays. The basic tradeoff between leaving the items unordered and keeping them in order is that maintaining an ordered sequence allows for constant-time remove the maximum and find the maximum but might mean going through the whole list for insert, whereas an unordered sequence allows a constant-time insert but might mean going through the whole sequence for remove the maximum and find the maximum. The unordered sequence is the prototypical lazy approach to this problem, where we defer doing work until necessary (to find the maximum); the ordered sequence is the prototypical eager approach to the problem, where we do as much work as we can up front (keep the list sorted on insertion) to make later operations efficient. We can use an array or linked-list representation in either case, with the basic tradeoff that the (doubly) linked list allows a constant-time remove (and, in the unordered case, join), but requires more space for the links.
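For contrast with the lazy unordered array, the eager approach might look like the following minimal sketch, which keeps integer keys in increasing order in an array. It is an illustration under assumptions rather than a full implementation: the class name OrderedArrayPQSketch and the method findmax are hypothetical, keys are plain int values instead of ITEM objects, and there is no checking for overflow or for removal from an empty queue.

```java
class OrderedArrayPQSketch {
    private final int[] pq;   // pq[0..n-1] is kept in increasing order
    private int n = 0;

    OrderedArrayPQSketch(int maxN) { pq = new int[maxN]; }

    boolean empty() { return n == 0; }

    // Eager: do the work up front. Larger keys are shifted one position
    // to the right to make room, which is linear in the worst case.
    void insert(int key) {
        int i = n++;
        while (i > 0 && pq[i - 1] > key) { pq[i] = pq[i - 1]; i--; }
        pq[i] = key;
    }

    // The maximum is always at the end, so both operations take constant time.
    int getmax()  { return pq[--n]; }
    int findmax() { return pq[n - 1]; }

    public static void main(String[] args) {
        OrderedArrayPQSketch q = new OrderedArrayPQSketch(4);
        q.insert(3); q.insert(1); q.insert(2);
        System.out.println(q.findmax());               // 3
        while (!q.empty()) System.out.println(q.getmax());  // 3, then 2, then 1
    }
}
```

Note that insert is cheap when each new key is larger than those already present, since the shifting loop then stops immediately.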
The worst-case costs of the various operations (within a constant factor) on a priority queue of size N for various implementations are summarized in Table 9.1.
Developing a full implementation requires paying careful attention to the interface, particularly to how client programs access nodes for the remove and change priority operations, and how they access priority queues themselves as data types for the join operation. These issues are discussed in Sections 9.4 and 9.7, where two full implementations are given: one using doubly linked unordered lists, and another using binomial queues.
Table 9.1 Worst-case costs of priority-queue operations
Implementations of the priority queue ADT have widely varying performance characteristics, as indicated in this table of the worst-case time (within a constant factor for large N) for various methods. Elementary methods (first four lines) require constant time for some operations and linear time for others; more advanced methods guarantee logarithmic- or constant-time performance for most or all operations.
| | insert | remove maximum | remove | find maximum | change priority | join |
|---|---|---|---|---|---|---|
| ordered array | N | 1 | N | 1 | N | N |
| ordered list | N | 1 | 1 | 1 | N | N |
| unordered array | 1 | N | 1 | N | 1 | N |
| unordered list | 1 | N | 1 | N | 1 | 1 |
| heap | lg N | lg N | lg N | 1 | lg N | N |
| binomial queue | lg N | lg N | lg N | lg N | lg N | lg N |
| best in theory | 1 | lg N | lg N | 1 | 1 | 1 |
The running time of a client program using priority queues depends not just on the keys but also on the mix of the various operations. It is wise to keep in mind the simple implementations because they often can outperform more complicated methods in many practical situations. For example, the unordered-list implementation might be appropriate in an application where only a few remove the maximum operations are performed, as opposed to a huge number of insertions, whereas an ordered list would be appropriate if a huge number of find the maximum operations are involved, or if the items inserted tend to be larger than those already in the priority queue.
Exercises
9.5 Criticize the following idea: To implement find the maximum in constant time, why not keep track of the maximum value inserted so far, then return that value for find the maximum?
9.6 Give the contents of the array after the execution of the sequence of operations depicted in Figure 9.1.
9.7 Provide an implementation for the basic priority-queue interface that uses an ordered array for the underlying data structure.
9.8 Provide an implementation for the basic priority-queue interface that uses an unordered linked list for the underlying data structure. Hint: See Programs 4.8 and 4.16.
9.9 Provide an implementation for the basic priority-queue interface that uses an ordered linked list for the underlying data structure. Hint: See Program 3.11.
9.10 Consider a lazy implementation where the list is ordered only when a remove the maximum or a find the maximum operation is performed. Insertions since the previous sort are kept on a separate list, then are sorted and merged in when necessary. Discuss advantages of such an implementation over the elementary implementations based on unordered and ordered lists.
9.11 Write a performance driver client program that uses insert to fill a priority queue, then uses getmax to remove half the keys, then uses insert to fill it up again, then uses getmax to remove all the keys, doing so multiple times on random sequences of keys of various lengths ranging from small to large; measures the time taken for each run; and prints out or plots the average running times.
9.12 Write a performance driver client program that uses insert to fill a priority queue, then does as many getmax and insert operations as it can do in 1 second, doing so multiple times on random sequences of keys of various lengths ranging from small to large; and prints out or plots the average number of getmax operations it was able to do.
9.13 Use your client program from Exercise 9.12 to compare the unordered-array implementation in Program 9.2 with your unordered-list implementation from Exercise 9.8.
9.14 Use your client program from Exercise 9.12 to compare your ordered-array and ordered-list implementations from Exercises 9.7 and 9.9.
9.15 Write an exercise driver client program that uses the methods in our priority-queue interface (Program 9.1) on difficult or pathological cases that might turn up in practical applications. Simple examples include keys that are already in order, keys in reverse order, all keys the same, and sequences of keys having only two distinct values.
9.16 (This exercise is 24 exercises in disguise.) Justify the worst-case bounds for the four elementary implementations that are given in Table 9.1, by reference to the implementation in Program 9.2 and your implementations from Exercises 9.7 through 9.9 for insert and remove the maximum; and by informally describing the methods for the other operations. For remove, change priority, and join, assume that you have a handle that gives you direct access to the referent.