- 2.1 enum: Enumeration Type
- 2.2 collections: Container Data Types
- 2.3 array: Sequence of Fixed-Type Data
- 2.4 heapq: Heap Sort Algorithm
- 2.5 bisect: Maintain Lists in Sorted Order
- 2.6 queue: Thread-Safe FIFO Implementation
- 2.7 struct: Binary Data Structures
- 2.8 weakref: Impermanent References to Objects
- 2.9 copy: Duplicate Objects
- 2.10 pprint: Pretty-Print Data Structures
2.4 heapq: Heap Sort Algorithm
A heap is a tree-like data structure in which the child nodes have a sort-order relationship with the parents. Binary heaps can be represented using a list or array organized so that the children of element N are at positions 2*N+1 and 2*N+2 (for zero-based indexes). This layout makes it possible to rearrange heaps in place, so it is not necessary to reallocate as much memory when adding or removing items.
A max-heap ensures that the parent is larger than or equal to both of its children. A min-heap requires that the parent be less than or equal to its children. Python’s heapq module implements a min-heap.
2.4.1 Example Data
The examples in this section use the data in heapq_heapdata.py.
Listing 2.46: heapq_heapdata.py
# This data was generated with the random module. data = [19, 9, 4, 10, 11]
The heap output is printed using heapq_showtree.py.
Listing 2.47: heapq_showtree.py
import math from io import StringIO def show_tree(tree, total_width=36, fill=' '): """Pretty-print a tree.""" output = StringIO() last_row = -1 for i, n in enumerate(tree): if i: row = int(math.floor(math.log(i + 1, 2))) else: row = 0 if row != last_row: output.write('\n') columns = 2 ** row col_width = int(math.floor(total_width / columns)) output.write(str(n).center(col_width, fill)) last_row = row print(output.getvalue()) print('-' * total_width) print()
2.4.2 Creating a Heap
There are two basic ways to create a heap: heappush() and heapify().
Listing 2.48: heapq_heappush.py
import heapq from heapq_showtree import show_tree from heapq_heapdata import data heap = [] print('random :', data) print() for n in data: print('add {:>3}:'.format(n)) heapq.heappush(heap, n) show_tree(heap)
When heappush() is used, the heap sort order of the elements is maintained as new items are added from a data source.
$ python3 heapq_heappush.py random : [19, 9, 4, 10, 11] add 19: 19 ------------------------------------ add 9: 9 19 ------------------------------------ add 4: 4 19 9 ------------------------------------ add 10: 4 10 9 19 ------------------------------------ add 11: 4 10 9 19 11 ------------------------------------
If the data is already in memory, it is more efficient to use heapify() to rearrange the items of the list in place.
Listing 2.49: heapq_heapify.py
import heapq from heapq_showtree import show_tree from heapq_heapdata import data print('random :', data) heapq.heapify(data) print('heapified :') show_tree(data)
The result of building a list in heap order one item at a time is the same as building an unordered list and then calling heapify().
$ python3 heapq_heapify.py random : [19, 9, 4, 10, 11] heapified : 4 9 19 10 11 ------------------------------------
2.4.3 Accessing the Contents of a Heap
Once the heap is organized correctly, use heappop() to remove the element with the lowest value.
Listing 2.50: heapq_heappop.py
import heapq from heapq_showtree import show_tree from heapq_heapdata import data print('random :', data) heapq.heapify(data) print('heapified :') show_tree(data) print for i in range(2): smallest = heapq.heappop(data) print('pop {:>3}:'.format(smallest)) show_tree(data)
In this example, adapted from the standard library documentation, heapify() and heappop() are used to sort a list of numbers.
$ python3 heapq_heappop.py random : [19, 9, 4, 10, 11] heapified : 4 9 19 10 11 ------------------------------------ pop 4: 9 10 19 11 ------------------------------------ pop 9: 10 11 19 ------------------------------------
To remove existing elements and replace them with new values in a single operation, use heapreplace().
Listing 2.51: heapq_heapreplace.py
import heapq from heapq_showtree import show_tree from heapq_heapdata import data heapq.heapify(data) print('start:') show_tree(data) for n in [0, 13]: smallest = heapq.heapreplace(data, n) print('replace {:>2} with {:>2}:'.format(smallest, n)) show_tree(data)
Replacing elements in place makes it possible to maintain a fixed-size heap, such as a queue of jobs ordered by priority.
$ python3 heapq_heapreplace.py start: 4 9 19 10 11 ------------------------------------ replace 4 with 0: 0 9 19 10 11 ------------------------------------ replace 0 with 13: 9 10 19 13 11 ------------------------------------
2.4.4 Data Extremes from a Heap
heapq also includes two functions to examine an iterable and find a range of the largest or smallest values it contains.
Listing 2.52: heapq_extremes.py
import heapq from heapq_heapdata import data print('all :', data) print('3 largest :', heapq.nlargest(3, data)) print('from sort :', list(reversed(sorted(data)[-3:]))) print('3 smallest:', heapq.nsmallest(3, data)) print('from sort :', sorted(data)[:3])
Using nlargest() and nsmallest() is efficient only for relatively small values of n > 1, but can still come in handy in a few cases.
$ python3 heapq_extremes.py all : [19, 9, 4, 10, 11] 3 largest : [19, 11, 10] from sort : [19, 11, 10] 3 smallest: [4, 9, 10] from sort : [4, 9, 10]
2.4.5 Efficiently Merging Sorted Sequences
Combining several sorted sequences into one new sequence is easy for small data sets.
list(sorted(itertools.chain(*data)))
For larger data sets, this technique can use a considerable amount of memory. Instead of sorting the entire combined sequence, merge() uses a heap to generate a new sequence one item at a time, determining the next item using a fixed amount of memory.
Listing 2.53: heapq_merge.py
import heapq import random random.seed(2016) data = [] for i in range(4): new_data = list(random.sample(range(1, 101), 5)) new_data.sort() data.append(new_data) for i, d in enumerate(data): print('{}: {}'.format(i, d)) print('\nMerged:') for i in heapq.merge(*data): print(i, end=' ') print()
Because the implementation of merge() uses a heap, it consumes memory based on the number of sequences being merged, rather than the number of items in those sequences.
$ python3 heapq_merge.py 0: [33, 58, 71, 88, 95] 1: [10, 11, 17, 38, 91] 2: [13, 18, 39, 61, 63] 3: [20, 27, 31, 42, 45] Merged: 10 11 13 17 18 20 27 31 33 38 39 42 45 58 61 63 71 88 91 95