␡
- 2.1 collectionsContainer Data Types
- 2.2 arraySequence of Fixed-Type Data
- 2.3 heapqHeap Sort Algorithm
- 2.4 bisectMaintain Lists in Sorted Order
- 2.5 QueueThread-Safe FIFO Implementation
- 2.6 structBinary Data Structures
- 2.7 weakrefImpermanent References to Objects
- 2.8 copyDuplicate Objects
- 2.9 pprintPretty-Print Data Structures
This chapter is from the book
2.3 heapq—Heap Sort Algorithm
Purpose The heapq module implements a min-heap sort algorithm suitable for use with Python's lists.
Python Version New in 2.3 with additions in 2.5
A heap is a tree-like data structure where the child nodes have a sort-order relationship with the parents. Binary heaps can be represented using a list or an array organized so that the children of element N are at positions 2*N+1 and 2*N+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.
A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python's heapq module implements a min-heap.
2.3.1 Example Data
The examples in this section use the data in heapq_heapdata.py.
# This data was generated with the random module.
data = [19, 9, 4, 10, 11]
The heap output is printed using heapq_showtree.py.
import math
from cStringIO import StringIO
def show_tree(tree, total_width=36, fill=' '):
"""Pretty-print a tree."""
output = StringIO()
last_row = -1
for i, n in enumerate(tree):
if i:
row = int(math.floor(math.log(i+1, 2)))
else:
row = 0
if row != last_row:
output.write('\n')
columns = 2**row
col_width = int(math.floor((total_width * 1.0) / columns))
output.write(str(n).center(col_width, fill))
last_row = row
print output.getvalue()
print '-' * total_width
print
return
2.3.2 Creating a Heap
There are two basic ways to create a heap: heappush() and heapify().
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heap = []
print 'random :', data
print
for n in data:
print 'add %3d:' % n
heapq.heappush(heap, n)
show_tree(heap)
Using heappush(), the heap sort order of the elements is maintained as new items are added from a data source.
$ python heapq_heappush.py
random : [19, 9, 4, 10, 11]
add 19:
19
------------------------------------
add 9:
9
19
------------------------------------
add 4:
4
19 9
------------------------------------
add 10:
4
10 9
19
------------------------------------
add 11:
4
10 9
19 11
------------------------------------
If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print 'random :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)
The result of building a list in heap order one item at a time is the same as building it unordered and then calling heapify().
$ python heapq_heapify.py
random : [19, 9, 4, 10, 11]
heapified :
4
9 19
10 11
------------------------------------
2.3.3 Accessing Contents of a Heap
Once the heap is organized correctly, use heappop() to remove the element with the lowest value.
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
print 'random :', data
heapq.heapify(data)
print 'heapified :'
show_tree(data)
print
for i in xrange(2):
smallest = heapq.heappop(data)
print 'pop %3d:' % smallest
show_tree(data)
In this example, adapted from the stdlib documentation, heapify() and heappop() are used to sort a list of numbers.
$ python heapq_heappop.py
random : [19, 9, 4, 10, 11]
heapified :
4
9 19
10 11
------------------------------------
pop 4:
9
10 19
11
------------------------------------
pop 9:
10
11 19
------------------------------------
To remove existing elements and replace them with new values in a single operation, use heapreplace().
import heapq
from heapq_showtree import show_tree
from heapq_heapdata import data
heapq.heapify(data)
print 'start:'
show_tree(data)
for n in [0, 13]:
smallest = heapq.heapreplace(data, n)
print 'replace %2d with %2d:' % (smallest, n)
show_tree(data)
Replacing elements in place makes it possible to maintain a fixed-size heap, such as a queue of jobs ordered by priority.
$ python heapq_heapreplace.py
start:
4
9 19
10 11
------------------------------------
replace 4 with 0:
0
9 19
10 11
------------------------------------
replace 0 with 13:
9
10 19
13 11
------------------------------------
2.3.4 Data Extremes from a Heap
heapq also includes two functions to examine an iterable to find a range of the largest or smallest values it contains.
import heapq
from heapq_heapdata import data
print 'all :', data
print '3 largest :', heapq.nlargest(3, data)
print 'from sort :', list(reversed(sorted(data)[-3:]))
print '3 smallest:', heapq.nsmallest(3, data)
print 'from sort :', sorted(data)[:3]
Using nlargest() and nsmallest() is only efficient for relatively small values of n > 1, but can still come in handy in a few cases.
$ python heapq_extremes.py
all : [19, 9, 4, 10, 11]
3 largest : [19, 11, 10]
from sort : [19, 11, 10]
3 smallest: [4, 9, 10]
from sort : [4, 9, 10]
See Also:
heapq (http://docs.python.org/library/heapq.html) The standard library documentation for this module.
Heap (data structure) (http://en.wikipedia.org/wiki/Heap_(data_structure)) Wikipedia article that provides a general description of heap data structures.
Priority Queue (page 98) A priority queue implementation from Queue (page 96 in the standard library.