4.9. Alternative Considerations to Threads
Before you rush off and do some threading, let’s do a quick recap: threading in general is a good thing. However, because of the restrictions of the GIL in Python, threading is more appropriate for I/O-bound applications (I/O releases the GIL, allowing for more concurrency) than for CPU-bound applications. In the latter case, to achieve greater parallelism, you’ll need processes that can be executed by other cores or CPUs.
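As a rough illustration of the I/O-bound case, the sketch below uses time.sleep() as a stand-in for a blocking network call, because sleeping releases the GIL much as real I/O does. The fake_io() helper and the delay values are hypothetical, chosen only for this demonstration, and the exact timings will vary from machine to machine.

```python
import threading
import time

def fake_io(delay):
    """Pretend to wait on the network; sleep() releases the GIL."""
    time.sleep(delay)

def run_serial(delays):
    """Run the waits one after another and return the elapsed seconds."""
    start = time.perf_counter()
    for d in delays:
        fake_io(d)
    return time.perf_counter() - start

def run_threaded(delays):
    """Run each wait in its own thread and return the elapsed seconds."""
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(d,)) for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == '__main__':
    delays = [0.2, 0.2, 0.2]
    print('serial:   %.2fs' % run_serial(delays))
    print('threaded: %.2fs' % run_threaded(delays))
```

The threaded run should take roughly as long as the single longest wait, whereas the serial run takes about the sum of all of them. Replacing the sleep with pure computation would likely erase that gap, which is the CPU-bound situation described above.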
Without going into too much detail here (some of these topics have already been covered in the “Execution Environment” chapter of Core Python Programming or Core Python Language Fundamentals), when looking at multiple threads or processes, the primary alternatives to the threading module include:
4.9.1. The subprocess Module
This is the primary alternative when you want to spawn processes, whether purely to run a command or to communicate with another process via the standard files (stdin, stdout, stderr). It was introduced in Python 2.4.
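A minimal sketch of talking to a child process over its standard files might look like the following. It assumes Python 3.7 or later, because subprocess.run() and its capture_output and text parameters are not available in the 2.4 release mentioned above. The shout() helper and the one-line child program are hypothetical, invented for this illustration.

```python
import subprocess
import sys

def shout(text):
    """Send text to a child Python process via stdin; return its stdout."""
    # The child simply uppercases whatever arrives on its stdin.
    child_code = 'import sys; sys.stdout.write(sys.stdin.read().upper())'
    result = subprocess.run(
        [sys.executable, '-c', child_code],
        input=text,
        capture_output=True,   # collect the child's stdout and stderr
        text=True,             # exchange str objects rather than bytes
        check=True,            # raise CalledProcessError on a nonzero exit
    )
    return result.stdout

if __name__ == '__main__':
    print(shout('hello from the parent'))
```

Using sys.executable keeps the example self-contained; in practice the command list would usually name some external program.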
4.9.2. The multiprocessing Module
This module, added in Python 2.6, lets you spawn processes for multiple cores or CPUs but with an interface very similar to that of the threading module; it also contains various mechanisms to pass data between processes that are cooperating on shared work.
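To give a feel for how closely the interface mirrors threading, here is a small sketch in which multiprocessing.Process takes the place of threading.Thread and a multiprocessing.Queue carries results back to the parent. The count_primes() and worker() functions, and the limits chosen, are hypothetical examples of CPU-bound work and are not part of the chapter’s scripts.

```python
from multiprocessing import Process, Queue

def count_primes(limit):
    """CPU-bound work: count the primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def worker(limit, results):
    # Runs in a child process; hand the answer back through the queue.
    results.put((limit, count_primes(limit)))

def main():
    limits = [10000, 20000, 30000]
    results = Queue()
    procs = [Process(target=worker, args=(lim, results)) for lim in limits]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    answers = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    for lim in limits:
        print('primes below %d: %d' % (lim, answers[lim]))

if __name__ == '__main__':
    main()
```

The `if __name__ == '__main__'` guard matters more here than with threads: on platforms that start children by importing the main module, unguarded code would run again in every child.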
4.9.3. The concurrent.futures Module
This is a new high-level library that operates only at a “job” level, which means that you no longer have to fuss with synchronization or with managing threads or processes. You just specify a thread or process pool with a certain number of “workers,” submit jobs, and collate the results. It’s new in Python 3.2, but a port for Python 2.6+ is available at http://code.google.com/p/pythonfutures.
What would bookrank3.py look like with this change? Assuming everything else stays the same, here’s the new import and modified _main() function:
    from concurrent.futures import ThreadPoolExecutor
    . . .
    def _main():
        print('At', ctime(), 'on Amazon...')
        with ThreadPoolExecutor(3) as executor:
            for isbn in ISBNs:
                executor.submit(_showRanking, isbn)
        print('all DONE at:', ctime())
The argument given to concurrent.futures.ThreadPoolExecutor is the thread pool size, and our application is looking for the rankings of three books. Of course, this is an I/O-bound application for which threads are more useful. For a CPU-bound application, we would use concurrent.futures.ProcessPoolExecutor instead.
Once we have an executor (whether backed by threads or processes), which is responsible for dispatching the jobs and collating the results, we can call its submit() method to run what previously would have required spawning a thread ourselves.
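The snippet above ignores what submit() returns because _showRanking() prints its own output. Each call does, however, hand back a Future object, and a job that returns a value can be collected through it. The sketch below shows one way this might be done with as_completed(); fake_rank() is a hypothetical stand-in that performs no network access, and rank_all() is likewise invented for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_rank(isbn):
    """Hypothetical stand-in for a network lookup; no real I/O happens."""
    return 'rank-for-' + isbn

def rank_all(isbns, workers=3):
    """Submit one job per ISBN and gather the results as they finish."""
    results = {}
    with ThreadPoolExecutor(workers) as executor:
        # Map each Future back to the ISBN it was submitted for.
        futures = {executor.submit(fake_rank, isbn): isbn for isbn in isbns}
        for future in as_completed(futures):
            # result() returns the job's value or re-raises its exception.
            results[futures[future]] = future.result()
    return results

if __name__ == '__main__':
    print(rank_all(['0132269937', '0132356139', '0137143419']))
```

Because as_completed() yields futures in the order they finish, the dictionary from Future to ISBN is what ties each result back to its input.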
If we do a “full” port to Python 3 by replacing the string format operator with the str.format() method, making liberal use of the with statement, and using the executor’s map() method, we can actually delete _showRanking() and roll its functionality into _main(). In Example 4-13, you’ll find our final bookrank3CF.py script.
Example 4-13. Higher-Level Job Management (bookrank3CF.py)
Our friend, the book rank screenscraper, but this time using concurrent.futures.
    1  #!/usr/bin/env python
    2
    3  from concurrent.futures import ThreadPoolExecutor
    4  from re import compile
    5  from time import ctime
    6  from urllib.request import urlopen as uopen
    7
    8  REGEX = compile(br'#([\d,]+) in Books ')
    9  AMZN = 'http://amazon.com/dp/'
    10 ISBNs = {
    11     '0132269937': 'Core Python Programming',
    12     '0132356139': 'Python Web Development with Django',
    13     '0137143419': 'Python Fundamentals',
    14 }
    15
    16 def getRanking(isbn):
    17     with uopen('{0}{1}'.format(AMZN, isbn)) as page:
    18         return str(REGEX.findall(page.read())[0], 'utf-8')
    19
    20 def _main():
    21     print('At', ctime(), 'on Amazon...')
    22     with ThreadPoolExecutor(3) as executor:
    23         for isbn, ranking in zip(
    24                 ISBNs, executor.map(getRanking, ISBNs)):
    25             print('- %r ranked %s' % (ISBNs[isbn], ranking))
    26     print('all DONE at:', ctime())
    27
    28 if __name__ == '__main__':
    29     _main()
Line-by-Line Explanation
Lines 1–14
Outside of the new import statement, everything in the first half of this script is identical to the bookrank3.py file we looked at earlier in this chapter.
Lines 16–18
The new getRanking() uses the with statement and str.format(). You can make the same change to bookrank.py because both features are available in version 2.6+ (they are not unique to version 3.x).
Lines 20–26
In the previous code example, we used executor.submit() to spawn the jobs. Here, we tweak this slightly by using executor.map() because it allows us to absorb the functionality from _showRanking(), letting us remove it entirely from our code.
The output is nearly identical to what we’ve seen earlier:
    $ python3 bookrank3CF.py
    At Wed Apr 6 00:21:50 2011 on Amazon...
    - 'Core Python Programming' ranked 43,992
    - 'Python Fundamentals' ranked 1,018,454
    - 'Python Web Development with Django' ranked 502,566
    all DONE at: Wed Apr 6 00:21:55 2011
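One detail behind the zip() call in Example 4-13 is that executor.map() yields results in the order of its inputs, not in the order the jobs happen to finish, so each ranking lines up with the ISBN it belongs to. The sketch below demonstrates that behavior with sleeps of different lengths; slow_echo() and echo_all() are hypothetical helpers written only for this purpose.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_echo(delay):
    """Hypothetical job: sleep for delay seconds, then return delay."""
    time.sleep(delay)
    return delay

def echo_all(delays):
    """Run one job per delay and return the results as a list."""
    with ThreadPoolExecutor(len(delays)) as executor:
        # The slowest job is listed first, yet map() still yields it first.
        return list(executor.map(slow_echo, delays))

if __name__ == '__main__':
    print(echo_all([0.3, 0.1, 0.2]))
```

If results are wanted as soon as each job completes, submit() combined with as_completed() is the usual alternative, at the cost of tracking which result belongs to which input.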
You can read more about the concurrent.futures module and its origins at the links below.
- http://docs.python.org/dev/py3k/library/concurrent.futures.html
- http://code.google.com/p/pythonfutures/
- http://www.python.org/dev/peps/pep-3148/
A summary of these options and other threading-related modules and packages can be found in the next section.