4.7. Multithreading in Practice
None of the simplistic sample snippets we’ve seen so far represents code that you’d write in practice. They don’t really do anything useful beyond demonstrating threads and the different ways that you can create them: the ways we’ve started them up and waited for them to finish are all identical, and they all just sleep, too.
We also mentioned earlier in Section 4.3.1 that because the Python Virtual Machine is single-threaded (the GIL), greater concurrency in Python is only possible when threading is applied to an I/O-bound application (versus CPU-bound applications, which only do round-robin). Let’s look at an example of this. As a further exercise, try to port it to Python 3 to give you a sense of what that process entails.
4.7.1. Book Rankings Example
The bookrank.py script shown in Example 4-9 is very straightforward. It goes to one of my favorite online retailers, Amazon, and asks for the current rankings of books written by yours truly. In our sample code, you’ll see a function, getRanking(), that uses a regular expression to pull out and return the current ranking, plus _showRanking(), which displays the result to the user.
Note that, according to their Conditions of Use guidelines, “Amazon grants you a limited license to access and make personal use of this site and not to download (other than page caching) or modify it, or any portion of it, except with express written consent of Amazon.” For our application, all we’re doing is looking at the current book rankings for a specific book and then throwing everything away; we’re not even caching the page.
Example 4-9 is our first (but nearly-final) attempt at bookrank.py, which is a non-threaded version.
Example 4-9. Book Rankings “Screenscraper” (bookrank.py)
This script makes calls to download book ranking information via separate threads.
 1  #!/usr/bin/env python
 2
 3  from atexit import register
 4  from re import compile
 5  from threading import Thread
 6  from time import ctime
 7  from urllib2 import urlopen as uopen
 8
 9  REGEX = compile('#([\d,]+) in Books ')
10  AMZN = 'http://amazon.com/dp/'
11  ISBNs = {
12      '0132269937': 'Core Python Programming',
13      '0132356139': 'Python Web Development with Django',
14      '0137143419': 'Python Fundamentals',
15  }
16
17  def getRanking(isbn):
18      page = uopen('%s%s' % (AMZN, isbn)) # or str.format()
19      data = page.read()
20      page.close()
21      return REGEX.findall(data)[0]
22
23  def _showRanking(isbn):
24      print '- %r ranked %s' % (
25          ISBNs[isbn], getRanking(isbn))
26
27  def _main():
28      print 'At', ctime(), 'on Amazon...'
29      for isbn in ISBNs:
30          _showRanking(isbn)
31
32  @register
33  def _atexit():
34      print 'all DONE at:', ctime()
35
36  if __name__ == '__main__':
37      _main()
Line-by-Line Explanation
Lines 1–7
These are the startup and import lines. We’ll use the atexit.register() function to tell us when the script is over (you’ll see why later). We’ll also use the regular expression re.compile() function for the pattern that matches a book’s ranking on Amazon’s product pages. Then, we save the threading.Thread import for future improvement (coming up a bit later), time.ctime() for the current timestamp string, and urllib2.urlopen() for accessing each link.
Lines 9–15
We use three constants in this script: REGEX, the regular expression object (compiled from the regex pattern that matches a book’s ranking); AMZN, the base Amazon product link; and ISBNs, a dictionary that maps each book’s ISBN to its title. All we need to complete each link is a book’s International Standard Book Number (ISBN), which serves as a book’s ID, differentiating one written work from all others. There are two standards: the ten-character ISBN-10 and its successor, the thirteen-character ISBN-13. Currently, Amazon’s systems understand both ISBN types, so we’ll just use ISBN-10 because it’s shorter.
Lines 17–21
The purpose of getRanking() is to take an ISBN, create the final URL with which to communicate to Amazon’s servers, and then call urllib2.urlopen() on it. We used the string format operator to put together the URL (on line 18) but if you’re using version 2.6 and newer, you can also try the str.format() method, for example, '{0}{1}'.format(AMZN,isbn).
Once you have the full URL, call urllib2.urlopen() (we shortened it to uopen()) and expect the file-like object back once the Web server has been contacted. Then the read() call is issued to download the entire Web page, and the “file” is closed. If the regex is as precise as we have planned, there should be exactly one match, so we grab it from the generated list (any additional matches would be dropped) and return it to the caller.
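To see the extraction step on its own, without contacting Amazon, here is a minimal sketch that runs the same pattern against a made-up fragment of page text. The sample string is an assumption invented for illustration (real product pages differ and change over time), and the sketch uses Python 3 syntax.

```python
import re

# Same pattern as in bookrank.py, written here as a raw string.
REGEX = re.compile(r'#([\d,]+) in Books ')

# Hypothetical page fragment, invented for this sketch.
sample = '<li>Best Sellers Rank: #87,118 in Books (See Top 100)</li>'

matches = REGEX.findall(sample)   # list of captured groups
ranking = matches[0]              # keep only the first match
print(ranking)
```

Because the pattern contains one capturing group, findall() returns just the captured digits and commas rather than the whole matched text.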
Lines 23–25
The _showRanking() function is just a short snippet of code that takes an ISBN, looks up the title of the book it represents, calls getRanking() to get its current ranking on Amazon’s Web site, and then outputs both of these values to the user. The leading single-underscore notation indicates that this is a special function only to be used by code within this module and should not be imported by any other application using this as a library or utility module.
Lines 27–30
_main() is also a special function, only executed if this module is run directly from the command-line (and not imported for use by another module). It shows the start time and calls _showRanking() for each ISBN to look up and display each book’s current ranking on Amazon. The end time is displayed by the exit function described next, so users can tell how long it took to run the entire script.
Lines 32–37
These lines present something completely different. What is atexit.register()? It’s a function (used in a decorator role here) that registers an exit function with the Python interpreter, meaning it’s requesting a special function be called just before the script quits. (Instead of the decorator, you could have also done register(_atexit).)
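The two ways of registering an exit function can be sketched as follows. This is a small illustration in Python 3 syntax, and the function name report_done is made up for the example.

```python
import atexit
from time import ctime

def report_done():
    # Called by the interpreter just before a normal exit.
    print('all DONE at:', ctime())

# Pass the function object itself; do not call it here.
registered = atexit.register(report_done)
```

atexit.register() returns the function it was given, which is what allows it to double as a decorator: writing @register above a def has the same effect as the explicit call shown here.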
Why are we using it here? Well, right now, it’s definitely not needed. The print statement could very well go at the end of _main() in lines 27–30, but that’s not a really great place for it. Plus this is functionality that you might really want to use in a real production application at some point. We assume that you know what lines 36–37 are about, so onto the output:
$ python bookrank.py
At Wed Mar 30 22:11:19 2011 PDT on Amazon...
- 'Core Python Programming' ranked 87,118
- 'Python Fundamentals' ranked 851,816
- 'Python Web Development with Django' ranked 184,735
all DONE at: Wed Mar 30 22:11:25 2011
If you’re wondering, we’ve separated the process of retrieving (getRanking()) and displaying (_showRanking() and _main()) the data in case you wish to do something other than dumping the results out to the user via the terminal. In practice, you might need to send this data back via a Web template, store it in a database, text it to a mobile phone, etc. If you put all of this code into a single function, it makes it harder to reuse and/or repurpose.
Also, if Amazon changes the layout of their product pages, you might need to modify the regular expression “screenscraper” to continue to be able to extract the data from the product page. By the way, using a regex (or even plain old string processing) for this simple example is fine, but you might need a more powerful markup parser, such as HTMLParser from the standard library or third-party tools like BeautifulSoup, html5lib, or lxml. (We demonstrate a few of these in Chapter 9, “Web Clients and Servers.”)
Add threading
Okay, you don’t have to tell me that this is still a silly single-threaded program. We’re going to change our application to use threads instead. It is an I/O-bound application, which makes it a good candidate for threading. To simplify things, we won’t use any of the classes and object-oriented programming; instead, we’ll use threading.Thread directly, so you can think of this more as a derivative of mtsleepC.py than any of the succeeding examples. We’ll just spawn the threads and start them up immediately.
Take your application and modify the _showRanking(isbn) call to the following:
Thread(target=_showRanking, args=(isbn,)).start()
That’s it! Now you have your final version of bookrank.py and can see that the application (typically) runs faster because of the added concurrency. But you’re still only as fast as the slowest response.
$ python bookrank.py
At Thu Mar 31 10:11:32 2011 on Amazon...
- 'Python Fundamentals' ranked 869,010
- 'Core Python Programming' ranked 36,481
- 'Python Web Development with Django' ranked 219,228
all DONE at: Thu Mar 31 10:11:35 2011
As you can see from the output, instead of taking six seconds as our single-threaded version, our threaded version only takes three. Also note that the output is in “by completion” order, which is variable, versus the single-threaded display. With the non-threaded version, the order is always by key, but now the queries all happen in parallel with the output coming as each thread completes its work.
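You can reproduce this effect without any network access. The following sketch, in Python 3 syntax, replaces the Web request with a sleep() of an assumed fixed latency and times both approaches. The names fake_fetch, DELAY, and ITEMS are hypothetical, and the threaded version uses join() only so that the timing covers every thread.

```python
from threading import Thread
from time import sleep, perf_counter

DELAY = 0.2                              # assumed per-request latency (seconds)
ITEMS = ['book-a', 'book-b', 'book-c']   # hypothetical IDs

def fake_fetch(item):
    sleep(DELAY)                         # stands in for the blocking network call

def run_sequential():
    start = perf_counter()
    for item in ITEMS:
        fake_fetch(item)
    return perf_counter() - start

def run_threaded():
    start = perf_counter()
    threads = [Thread(target=fake_fetch, args=(item,)) for item in ITEMS]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                         # wait so the timing covers all threads
    return perf_counter() - start

sequential = run_sequential()
threaded = run_threaded()
print('sequential: %.2fs, threaded: %.2fs' % (sequential, threaded))
```

The sequential run costs roughly the sum of the delays, whereas the threaded run costs roughly the longest single delay, which is the “only as fast as the slowest response” behavior.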
In the earlier mtsleepX.py examples, we used Thread.join() on all the threads to block execution until each thread exits. This effectively prevents the main thread from continuing until all threads are done, so the print statement of “all DONE at” is called at the correct time.
In those examples, it’s not necessary to join() all the threads because none of them are daemon threads. The main thread is not going to exit the script until all the spawned threads have completed anyway. Because of this reasoning, we’ve dropped all the join()s in mtsleepF.py. However, realize that if we displayed “all done” from the same spot, it would be incorrect.
The main thread would have displayed “all done” before the threads have completed, so we can’t have that print call above in _main(). There are only 2 places we can put this print: after line 37 when _main() returns (the very final line executed of our script), or use atexit.register() to register an exit function. Because the latter is something we haven’t discussed before and might be something useful to you later on, we thought this would be a good place to introduce it to you. This is also one interface that remains constant between Python 2 and 3, our upcoming challenge.
Porting to Python 3
The next thing we want is a working Python 3 version of this script. As projects and applications continue down the migration path, this is something with which you need to become familiar, anyway. Fortunately, there are a few tools to help you, one of them being the 2to3 tool. There are generally two ways of using it:
$ 2to3 foo.py # only output diff
$ 2to3 -w foo.py # overwrites w/3.x code
In the first command, the 2to3 tool just displays the differences between the version 2.x original script and its generated 3.x equivalent. The -w flag instructs 2to3 to overwrite the original script with the newly minted 3.x version while renaming the 2.x version to foo.py.bak.
Let’s run 2to3 on bookrank.py, writing over the existing file. It not only spits out the differences, it also saves the new version, as we just described:
$ 2to3 -w bookrank.py
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
--- bookrank.py (original)
+++ bookrank.py (refactored)
@@ -4,7 +4,7 @@
 from re import compile
 from threading import Thread
 from time import ctime
-from urllib2 import urlopen as uopen
+from urllib.request import urlopen as uopen

 REGEX = compile('#([\d,]+) in Books ')
 AMZN = 'http://amazon.com/dp/'
@@ -21,17 +21,17 @@
     return REGEX.findall(data)[0]

 def _showRanking(isbn):
-    print '- %r ranked %s' % (
-        ISBNs[isbn], getRanking(isbn))
+    print('- %r ranked %s' % (
+        ISBNs[isbn], getRanking(isbn)))

 def _main():
-    print 'At', ctime(), 'on Amazon...'
+    print('At', ctime(), 'on Amazon...')
     for isbn in ISBNs:
         Thread(target=_showRanking, args=(isbn,)).start()#_showRanking(isbn)

 @register
 def _atexit():
-    print 'all DONE at:', ctime()
+    print('all DONE at:', ctime())

 if __name__ == '__main__':
     _main()
RefactoringTool: Files that were modified:
RefactoringTool: bookrank.py
The following step is optional for readers, but we renamed our files to bookrank.py and bookrank3.py by using these POSIX commands (Windows-based PC users should use the ren command):
$ mv bookrank.py bookrank3.py
$ mv bookrank.py.bak bookrank.py
If you try to run our new next-generation script, it’s probably wishful thinking that it’s a perfect translation and that you’re done with your work. Something bad happened, and you’ll get the following exception in each thread (this output is for just one thread as they’re all the same):
$ python3 bookrank3.py
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/threading.py", line 736, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/threading.py", line 689, in run
    self._target(*self._args, **self._kwargs)
  File "bookrank3.py", line 25, in _showRanking
    ISBNs[isbn], getRanking(isbn)))
  File "bookrank3.py", line 21, in getRanking
    return REGEX.findall(data)[0]
TypeError: can't use a string pattern on a bytes-like object
:
Darn it! Apparently the problem is that the regular expression is a (Unicode) string, whereas the data that comes back from the urlopen() file-like object’s read() method is an ASCII/bytes string. The fix here is to compile a bytes object instead of a text string. Therefore, change line 9 so that re.compile() is compiling a bytes string. To do this, add the bytes string designation b just before the opening quote, as shown here:
REGEX = compile(b'#([\d,]+) in Books ')
Now let’s try it again:
$ python3 bookrank3.py
At Sun Apr 3 00:45:46 2011 on Amazon...
- 'Core Python Programming' ranked b'108,796'
- 'Python Web Development with Django' ranked b'268,660'
- 'Python Fundamentals' ranked b'969,149'
all DONE at: Sun Apr 3 00:45:49 2011
Aargh! What’s wrong now? Well, it’s a little bit better (no errors), but the output looks weird. The ranking values grabbed by the regular expressions, when passed to str() show the b and quotes. Your first instinct might be to try ugly string slicing:
>>> x = b'xxx'
>>> repr(x)
"b'xxx'"
>>> str(x)
"b'xxx'"
>>> str(x)[2:-1]
'xxx'
However, it’s more appropriate to convert it to a real (Unicode) string, perhaps using UTF-8:
>>> str(x, 'utf-8')
'xxx'
To do that in our script, make a similar change to line 21 so that it now reads as:
return str(REGEX.findall(data)[0], 'utf-8')
Now, the output of our Python 3 script matches that of our Python 2 script:
$ python3 bookrank3.py
At Sun Apr 3 00:47:31 2011 on Amazon...
- 'Python Fundamentals' ranked 969,149
- 'Python Web Development with Django' ranked 268,660
- 'Core Python Programming' ranked 108,796
all DONE at: Sun Apr 3 00:47:34 2011
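Both fixes can be tried in isolation. The sketch below is Python 3 only and operates on a made-up bytes value standing in for what read() returns. The sample data and variable names are assumptions for illustration.

```python
import re

# Hypothetical bytes, standing in for the result of page.read().
data = b'<li>#108,796 in Books (See Top 100)</li>'

text_pattern = re.compile(r'#([\d,]+) in Books ')    # str pattern
bytes_pattern = re.compile(rb'#([\d,]+) in Books ')  # bytes pattern

# A str pattern cannot be applied to bytes data.
try:
    text_pattern.findall(data)
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc

raw = bytes_pattern.findall(data)[0]   # still a bytes object
ranking = str(raw, 'utf-8')            # equivalent to raw.decode('utf-8')
print(ranking)
```

An alternative (not the approach taken in this section) would be to decode the whole page once and keep the original text pattern. Which is preferable depends on whether you know the page’s encoding.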
In general, you’ll find that porting from version 2.x to version 3.x follows a similar pattern: you ensure that all your unit and integration tests pass, knock down all the basics using 2to3 (and other tools), and then clean up the aftermath by getting the code to run and pass the same tests. We’ll try this exercise again with our next example which demonstrates the use of synchronization with threads.
4.7.2. Synchronization Primitives
In the main part of this chapter, we looked at basic threading concepts and how to utilize threading in Python applications. However, we neglected to mention one very important aspect of threaded programming: synchronization. Often in threaded code, you will have certain functions or blocks that should never be executed by more than one thread at a time. Usually these involve modifying a database, updating a file, or anything similar that might cause a race condition. If you recall from earlier in the chapter, a race condition occurs when the program takes different code paths, behaves differently, or produces inconsistent data depending on which thread happens to run first. (You can read more about race conditions on the Wikipedia page at http://en.wikipedia.org/wiki/Race_condition.)
Such cases require synchronization. Synchronization is used when any number of threads can come up to one of these critical sections of code (http://en.wikipedia.org/wiki/Critical_section), but only one is allowed through at any given time. The programmer makes these determinations and chooses the appropriate synchronization primitives, or thread control mechanisms to perform the synchronization. There are different types of process synchronization (see http://en.wikipedia.org/wiki/Synchronization_(computer_science)) and Python supports several types, giving you enough choices to select the best one to get the job done.
We introduced them all to you earlier at the beginning of this section, so here we’d like to demonstrate a couple of sample scripts that use two types of synchronization primitives: locks/mutexes and semaphores. A lock is the simplest and lowest-level of all these mechanisms, while semaphores are for situations in which multiple threads are contending for a finite resource. Locks are easier to explain, so we’ll start there, and then discuss semaphores.
4.7.3. Locking Example
Locks have two states: locked and unlocked (surprise, surprise). They support only two functions: acquire and release. These actions mean exactly what you think.
As multiple threads vie for a lock, the first thread to acquire one is permitted to go in and execute code in the critical section. All other threads coming along are blocked until the first thread wraps up, exits the critical section, and releases the lock. At this moment, any of the other waiting threads can acquire the lock and enter the critical section. Note that there is no ordering (first come, first served) for the blocked threads; the selection of the “winning” thread is not deterministic and can vary between different implementations of Python.
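Here is a minimal sketch of that acquire/release cycle, written in Python 3 syntax. Several threads increment a shared counter, and the increment is the critical section. The names counter and add_many are made up for the example.

```python
from threading import Lock, Thread

counter = 0
lock = Lock()

def add_many(n):
    global counter
    for _ in range(n):
        lock.acquire()       # blocks while another thread holds the lock
        counter += 1         # critical section: one thread at a time
        lock.release()       # lets one of the waiting threads proceed

threads = [Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)
```

With the lock in place, every increment is applied, so the final count is the number of threads times the number of iterations.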
Let’s see why locks are necessary. mtsleepF.py is an application that spawns a random number of threads, each of which outputs when it has completed. Take a look at the core chunk of (Python 2) source here:
from atexit import register
from random import randrange
from threading import Thread, currentThread
from time import sleep, ctime

class CleanOutputSet(set):
    def __str__(self):
        return ', '.join(x for x in self)

loops = (randrange(2,5) for x in xrange(randrange(3,7)))
remaining = CleanOutputSet()

def loop(nsec):
    myname = currentThread().name
    remaining.add(myname)
    print '[%s] Started %s' % (ctime(), myname)
    sleep(nsec)
    remaining.remove(myname)
    print '[%s] Completed %s (%d secs)' % (
        ctime(), myname, nsec)
    print ' (remaining: %s)' % (remaining or 'NONE')

def _main():
    for pause in loops:
        Thread(target=loop, args=(pause,)).start()

@register
def _atexit():
    print 'all DONE at:', ctime()
We’ll have a longer line-by-line explanation once we’ve finalized our code with locking, but basically what mtsleepF.py does is expand on our earlier examples. Like bookrank.py, we simplify the code a bit by skipping object-oriented programming, drop the list of thread objects and thread join()s, and (re)use atexit.register() (for all the same reasons as bookrank.py).
Also as a minor change to the earlier mtsleepX.py examples, instead of hardcoding a pair of loops/threads sleeping for 4 and 2 seconds, respectively, we wanted to mix it up a little by randomly creating between 3 and 6 threads, each of which can sleep anywhere between 2 and 4 seconds.
One of the new features that stands out is the use of a set to hold the names of the remaining threads still running. The reason why we’re subclassing the set object instead of using it directly is because we just want to demonstrate another use case, altering the default printable string representation of a set.
When you display a set, you get output such as set([X, Y, Z,...]). The issue is that the users of our application don’t (and shouldn’t) need to know anything about sets or that we’re using them. We just want to display something like X, Y, Z, ..., instead; thus the reason why we derived from set and implemented its __str__() method.
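The subclass is easy to try on its own. The following sketch uses Python 3 syntax and makes one change that is our own assumption rather than part of mtsleepF.py: it sorts the names so that the display order is stable from run to run.

```python
class CleanOutputSet(set):
    def __str__(self):
        # sorted() is an addition for this sketch; the original joins
        # the elements in whatever order the set yields them.
        return ', '.join(sorted(self))

remaining = CleanOutputSet(['Thread-2', 'Thread-1'])
shown = str(remaining)

# An empty set is falsy, which is why "remaining or 'NONE'" works.
empty_shown = '%s' % (CleanOutputSet() or 'NONE')

print(shown)
print(empty_shown)
```

Only __str__() is overridden, so everything else (add(), remove(), membership tests, truthiness) behaves exactly like a regular set.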
With this change, and if you’re lucky, the output will be all nice and lined up properly:
$ python mtsleepF.py
[Sat Apr 2 11:37:26 2011] Started Thread-1
[Sat Apr 2 11:37:26 2011] Started Thread-2
[Sat Apr 2 11:37:26 2011] Started Thread-3
[Sat Apr 2 11:37:29 2011] Completed Thread-2 (3 secs)
 (remaining: Thread-3, Thread-1)
[Sat Apr 2 11:37:30 2011] Completed Thread-1 (4 secs)
 (remaining: Thread-3)
[Sat Apr 2 11:37:30 2011] Completed Thread-3 (4 secs)
 (remaining: NONE)
all DONE at: Sat Apr 2 11:37:30 2011
However, if you’re unlucky, you might get strange output such as this pair of example executions:
$ python mtsleepF.py
[Sat Apr 2 11:37:09 2011] Started Thread-1
[Sat Apr 2 11:37:09 2011] Started Thread-2
[Sat Apr 2 11:37:09 2011] Started Thread-3
[Sat Apr 2 11:37:12 2011] Completed Thread-1 (3 secs)
[Sat Apr 2 11:37:12 2011] Completed Thread-2 (3 secs)
 (remaining: Thread-3)
 (remaining: Thread-3)
[Sat Apr 2 11:37:12 2011] Completed Thread-3 (3 secs)
 (remaining: NONE)
all DONE at: Sat Apr 2 11:37:12 2011

$ python mtsleepF.py
[Sat Apr 2 11:37:56 2011] Started Thread-1
[Sat Apr 2 11:37:56 2011] Started Thread-2
[Sat Apr 2 11:37:56 2011] Started Thread-3
[Sat Apr 2 11:37:56 2011] Started Thread-4
[Sat Apr 2 11:37:58 2011] Completed Thread-2 (2 secs)
[Sat Apr 2 11:37:58 2011] Completed Thread-4 (2 secs)
 (remaining: Thread-3, Thread-1)
 (remaining: Thread-3, Thread-1)
[Sat Apr 2 11:38:00 2011] Completed Thread-1 (4 secs)
 (remaining: Thread-3)
[Sat Apr 2 11:38:00 2011] Completed Thread-3 (4 secs)
 (remaining: NONE)
all DONE at: Sat Apr 2 11:38:00 2011
What’s wrong? Well, for one thing, the output might appear partially garbled (because multiple threads might be executing I/O in parallel). You can see this in the preceding sample runs, in which the output of different threads is interleaved. Another problem arises when two threads modify the same variable (the set containing the names of the remaining threads).
Both the I/O and access to the same data structure are part of critical sections; therefore, we need locks to prevent more than one thread from entering them at the same time. To add locking, you need to add a line of code to import the Lock (or RLock) object and create a lock object, so add/modify your code to contain these lines in the right places:
from threading import Thread, Lock, currentThread
lock = Lock()
Now you must use your lock. The following code highlights the acquire() and release() calls that we should insert into our loop() function:
def loop(nsec):
    myname = currentThread().name
    lock.acquire()
    remaining.add(myname)
    print '[%s] Started %s' % (ctime(), myname)
    lock.release()
    sleep(nsec)
    lock.acquire()
    remaining.remove(myname)
    print '[%s] Completed %s (%d secs)' % (
        ctime(), myname, nsec)
    print ' (remaining: %s)' % (remaining or 'NONE')
    lock.release()
Once the changes are made, you should no longer get strange output:
$ python mtsleepF.py
[Sun Apr 3 23:16:59 2011] Started Thread-1
[Sun Apr 3 23:16:59 2011] Started Thread-2
[Sun Apr 3 23:16:59 2011] Started Thread-3
[Sun Apr 3 23:16:59 2011] Started Thread-4
[Sun Apr 3 23:17:01 2011] Completed Thread-3 (2 secs)
 (remaining: Thread-4, Thread-2, Thread-1)
[Sun Apr 3 23:17:01 2011] Completed Thread-4 (2 secs)
 (remaining: Thread-2, Thread-1)
[Sun Apr 3 23:17:02 2011] Completed Thread-1 (3 secs)
 (remaining: Thread-2)
[Sun Apr 3 23:17:03 2011] Completed Thread-2 (4 secs)
 (remaining: NONE)
all DONE at: Sun Apr 3 23:17:03 2011
The modified (and final) version of mtsleepF.py is shown in Example 4-10.
Example 4-10. Locks and More Randomness (mtsleepF.py)
In this example, we demonstrate the use of locks and other threading tools.
 1  #!/usr/bin/env python
 2
 3  from atexit import register
 4  from random import randrange
 5  from threading import Thread, Lock, currentThread
 6  from time import sleep, ctime
 7
 8  class CleanOutputSet(set):
 9      def __str__(self):
10          return ', '.join(x for x in self)
11
12  lock = Lock()
13  loops = (randrange(2,5) for x in xrange(randrange(3,7)))
14  remaining = CleanOutputSet()
15
16  def loop(nsec):
17      myname = currentThread().name
18      lock.acquire()
19      remaining.add(myname)
20      print '[%s] Started %s' % (ctime(), myname)
21      lock.release()
22      sleep(nsec)
23      lock.acquire()
24      remaining.remove(myname)
25      print '[%s] Completed %s (%d secs)' % (
26          ctime(), myname, nsec)
27      print ' (remaining: %s)' % (remaining or 'NONE')
28      lock.release()
29
30  def _main():
31      for pause in loops:
32          Thread(target=loop, args=(pause,)).start()
33
34  @register
35  def _atexit():
36      print 'all DONE at:', ctime()
37
38  if __name__ == '__main__':
39      _main()
Line-by-Line Explanation
Lines 1–6
These are the usual startup and import lines. Be aware that threading.currentThread() is renamed to threading.current_thread() starting in version 2.6 but with the older name remaining intact for backward compatibility.
Lines 8–10
This is the set subclass we described earlier. It contains an implementation of __str__() to change the output from the default to a comma-delimited string of its elements.
Lines 12–14
Our global variables consist of the lock, an instance of our modified set from above, and a random number of threads (between three and six), each of which will pause or sleep for between two and four seconds.
Lines 16–28
The loop() function saves the name of the current thread executing it, then acquires a lock so that the addition of that name to the remaining set and an output indicating the thread has started is atomic (where no other thread can enter this critical section). After releasing the lock, this thread sleeps for the predetermined random number of seconds, then re-acquires the lock in order to do its final output before releasing it.
Lines 30–39
The _main() function is only executed if this script was not imported for use elsewhere. Its job is to spawn and execute each of the threads. As mentioned before, we use atexit.register() to register the _atexit() function that the interpreter can execute before exiting.
As an alternative to maintaining your own set of currently running threads, you might consider using threading.enumerate(), which returns a list of all threads that are still running (including daemon threads, but not those which haven’t started yet). We didn’t use it for our example here because it gives us two extra threads that we need to remove to keep our output short: the current thread (because it hasn’t completed yet) as well as the main thread (not necessary to show this either).
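If you do want to go the threading.enumerate() route, the filtering might look like the following sketch (Python 3 syntax). The worker function and thread names are hypothetical. Note that here the query is made from the main thread, so the “current thread” and the “main thread” are one and the same; inside loop() they would be two different threads to exclude.

```python
import threading

stop = threading.Event()

def worker():
    stop.wait()                  # stay alive until told to finish

workers = [threading.Thread(target=worker, name='worker-%d' % i)
           for i in range(2)]
for t in workers:
    t.start()

# All live threads, minus the main thread and the thread doing the asking.
others = [t.name for t in threading.enumerate()
          if t is not threading.main_thread()
          and t is not threading.current_thread()]

stop.set()                       # let the workers finish
for t in workers:
    t.join()
print(sorted(others))
```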
Also don’t forget that you can also use the str.format() method instead of the string format operator if you’re using Python 2.6 or newer (including version 3.x). In other words, this print statement
print '[%s] Started %s' % (ctime(), myname)
can be replaced by this one in 2.6+
print '[{0}] Started {1}'.format(ctime(), myname)
or this call to the print() function in version 3.x:
print('[{0}] Started {1}'.format(ctime(), myname))
If you just want a count of currently running threads, you can use threading.activeCount() (renamed to active_count() starting in version 2.6), instead.
Using Context Management
Another option for those of you using Python 2.5 and newer is to have neither the lock acquire() nor release() calls at all, simplifying your code. When using the with statement, the context manager for each object is responsible for calling acquire() before entering the suite and release() when the block has completed execution.
The threading module objects Lock, RLock, Condition, Semaphore, and BoundedSemaphore, all have context managers, meaning they can be used with the with statement. By using with, you can further simplify loop() to:
from __future__ import with_statement # 2.5 only

def loop(nsec):
    myname = currentThread().name
    with lock:
        remaining.add(myname)
        print '[%s] Started %s' % (ctime(), myname)
    sleep(nsec)
    with lock:
        remaining.remove(myname)
        print '[%s] Completed %s (%d secs)' % (
            ctime(), myname, nsec)
        print ' (remaining: %s)' % (
            remaining or 'NONE',)
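One practical benefit of the with statement, beyond brevity, is that the lock is released even if the suite raises an exception. The following sketch (Python 3 syntax, with a deliberately failing function whose name is made up) illustrates this.

```python
from threading import Lock

lock = Lock()

def risky_update():
    with lock:                   # acquire() happens on entry
        raise RuntimeError('simulated failure inside the critical section')

try:
    risky_update()
except RuntimeError:
    pass                         # by now the context manager has released the lock

released = not lock.locked()
print('lock released:', released)
```

With explicit acquire() and release() calls, you would need a try-finally block around the critical section to get the same guarantee.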
Porting to Python 3
Now let’s do a seemingly easy port to Python 3.x by running the 2to3 tool on the preceding script (this output is truncated because we saw a full diff dump earlier):
$ 2to3 -w mtsleepF.py
RefactoringTool: Skipping implicit fixer: buffer
RefactoringTool: Skipping implicit fixer: idioms
RefactoringTool: Skipping implicit fixer: set_literal
RefactoringTool: Skipping implicit fixer: ws_comma
:
RefactoringTool: Files that were modified:
RefactoringTool: mtsleepF.py
After renaming mtsleepF.py to mtsleepF3.py and mtsleepF.py.bak to mtsleepF.py, we discover, much to our pleasant surprise, that this is one script that ported perfectly, with no issues:
$ python3 mtsleepF3.py
[Sun Apr 3 23:29:39 2011] Started Thread-1
[Sun Apr 3 23:29:39 2011] Started Thread-2
[Sun Apr 3 23:29:39 2011] Started Thread-3
[Sun Apr 3 23:29:41 2011] Completed Thread-3 (2 secs)
 (remaining: Thread-2, Thread-1)
[Sun Apr 3 23:29:42 2011] Completed Thread-2 (3 secs)
 (remaining: Thread-1)
[Sun Apr 3 23:29:43 2011] Completed Thread-1 (4 secs)
 (remaining: NONE)
all DONE at: Sun Apr 3 23:29:43 2011
Now let’s take our knowledge of locks, introduce semaphores, and look at an example that uses both.
4.7.4. Semaphore Example
As stated earlier, locks are pretty simple to understand and implement. It’s also fairly easy to decide when you should need them. However, if the situation is more complex, you might need a more powerful synchronization primitive, instead. For applications with finite resources, using semaphores might be a better bet.
Semaphores are some of the oldest synchronization primitives out there. They’re basically counters that decrement when a resource is being consumed (and increment again when the resource is released). You can think of semaphores representing their resources as either available or unavailable. The action of consuming a resource and decrementing the counter is traditionally called P() (from the Dutch word probeer/proberen) but is also known as wait, try, acquire, pend, or procure. Conversely, when a thread is done with a resource, it needs to return it to the pool. The action used to do this is named V() (from the Dutch word verhogen/verhoog) but is also known as signal, increment, release, post, or vacate. Python simplifies all the naming and uses the same function/method names as locks: acquire and release. Semaphores are more flexible than locks because you can have multiple threads, each using one of the instances of the finite resource.
For our example, we’re going to simulate an oversimplified candy vending machine. This particular machine has only five slots available to hold inventory (candy bars). If all slots are taken, no more candy can be added to the machine, and similarly, if there are no more of one particular type of candy bar, consumers wishing to purchase that product are out-of-luck. We can track these finite resources (candy slots) by using a semaphore.
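To illustrate the “multiple threads, finite resource” idea before we get to the candy machine, here is a small sketch in Python 3 syntax. Six threads compete for two units of a pretend resource, and we record the highest number of units in use at any moment. All names (SLOTS, use_resource, peak, and so on) are invented for this example, and the sleep() merely simulates holding the resource.

```python
from threading import BoundedSemaphore, Lock, Thread
from time import sleep

SLOTS = 2                        # assumed number of resource units
pool = BoundedSemaphore(SLOTS)
stats_lock = Lock()              # protects the two counters below
in_use = 0
peak = 0

def use_resource():
    global in_use, peak
    pool.acquire()               # P(): take a unit, blocking if none is left
    try:
        with stats_lock:
            in_use += 1
            peak = max(peak, in_use)
        sleep(0.05)              # pretend to work with the resource
        with stats_lock:
            in_use -= 1
    finally:
        pool.release()           # V(): return the unit to the pool

threads = [Thread(target=use_resource) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('peak units in use:', peak)
```

A plain lock would have forced the six threads through one at a time. The semaphore lets up to SLOTS of them proceed together and holds back the rest.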
Example 4-11 shows the source code (candy.py).
Example 4-11. Candy Vending Machine and Semaphores (candy.py)
This script uses locks and semaphores to simulate a candy vending machine.
 1  #!/usr/bin/env python
 2
 3  from atexit import register
 4  from random import randrange
 5  from threading import BoundedSemaphore, Lock, Thread
 6  from time import sleep, ctime
 7
 8  lock = Lock()
 9  MAX = 5
10  candytray = BoundedSemaphore(MAX)
11
12  def refill():
13      lock.acquire()
14      print 'Refilling candy...',
15      try:
16          candytray.release()
17      except ValueError:
18          print 'full, skipping'
19      else:
20          print 'OK'
21      lock.release()
22
23  def buy():
24      lock.acquire()
25      print 'Buying candy...',
26      if candytray.acquire(False):
27          print 'OK'
28      else:
29          print 'empty, skipping'
30      lock.release()
31
32  def producer(loops):
33      for i in xrange(loops):
34          refill()
35          sleep(randrange(3))
36
37  def consumer(loops):
38      for i in xrange(loops):
39          buy()
40          sleep(randrange(3))
41
42  def _main():
43      print 'starting at:', ctime()
44      nloops = randrange(2, 6)
45      print 'THE CANDY MACHINE (full with %d bars)!' % MAX
46      Thread(target=consumer, args=(randrange(
47          nloops, nloops+MAX+2),)).start() # buyer
48      Thread(target=producer, args=(nloops,)).start() #vndr
49
50  @register
51  def _atexit():
52      print 'all DONE at:', ctime()
53
54  if __name__ == '__main__':
55      _main()
Line-by-Line Explanation
Lines 1–6
The startup and import lines are quite similar to examples earlier in this chapter. The only thing new is the semaphore. The threading module comes with two semaphore classes, Semaphore and BoundedSemaphore. As you know, semaphores are really just counters; they start off with some fixed number of a finite resource.
This counter decrements when one unit of the resource is allocated, and when that unit is returned to the pool, the counter increments. The additional feature you get with a BoundedSemaphore is that the counter can never increment beyond its initial value; an extra release raises ValueError instead. In other words, it prevents the aberrant use case where a semaphore is released more times than it’s acquired.
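The difference between the two classes can be seen in a short sketch (Python 3 syntax; the helper over_release() is a hypothetical name used only here). It releases a semaphore that was never acquired and reports whether the extra release was accepted.

```python
from threading import BoundedSemaphore, Semaphore

def over_release(sem):
    """Release a semaphore that was never acquired and report the outcome."""
    try:
        sem.release()
    except ValueError:
        # BoundedSemaphore refuses to count past its initial value
        return 'rejected'
    return 'accepted'

plain_result = over_release(Semaphore(5))
bounded_result = over_release(BoundedSemaphore(5))
print(plain_result, bounded_result)   # accepted rejected
```

A plain Semaphore silently counts up to six here, which is why the candy machine uses the bounded variant: the ValueError is what signals a full tray.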
Lines 8–10
The global variables in this script are the lock, a constant representing the maximum number of items that can be inventoried, and the tray of candy.
Lines 12–21
The refill() function is performed when the owner of the fictitious vending machine comes to add one more item to inventory. The entire routine is a critical section, which is why the function acquires the lock first and releases it last: only the thread holding the lock can execute these lines. The code reports its action to the user and warns when someone tries to exceed the maximum inventory (lines 17–18).
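One possible variation, not the form used in candy.py, is to guard the critical section with a with statement so that the lock is released even if the body raises an unexpected exception. The sketch below uses Python 3 syntax and, as an assumption made purely for illustration, returns True or False instead of printing.

```python
from threading import BoundedSemaphore, Lock

lock = Lock()
candytray = BoundedSemaphore(5)

def refill():
    """Add one bar; return True on success, False if the tray is full."""
    with lock:                     # released automatically on any exit path
        try:
            candytray.release()
        except ValueError:         # tray already holds the maximum
            return False
        return True

candytray.acquire()                # take one bar so there is room for one
first = refill()                   # the empty slot is filled
second = refill()                  # the tray is full again
print(first, second)               # True False
```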
Lines 23–30
buy() is the converse of refill(); it allows a consumer to acquire one unit of inventory. The conditional (line 26) detects when all finite resources have already been consumed. The counter can never go below zero, so acquire() would normally block until the counter is incremented again. Passing False as the blocking flag tells the call not to block but to return False immediately if it would have blocked, indicating that no resources are left.
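The nonblocking behavior is easy to check in isolation. This sketch (Python 3 syntax) uses a hypothetical two-slot tray and makes three purchase attempts; the third finds the counter at zero and returns False rather than waiting.

```python
from threading import BoundedSemaphore

tray = BoundedSemaphore(2)         # hypothetical two-slot tray, starts full

# Three nonblocking purchase attempts against two bars
results = [tray.acquire(False) for _ in range(3)]
print(results)                     # [True, True, False]

tray.release()                     # one refill
after_refill = tray.acquire(blocking=False)   # keyword form of the same flag
print(after_refill)                # True
```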
Lines 32–40
The producer() and consumer() functions merely loop and make corresponding calls to refill() and buy(), pausing momentarily between calls.
Lines 42–55
The remainder of the code contains _main(), which starts the pair of threads representing the producer and consumer of the candy inventory, followed by the registration of the exit function and, finally, the call to _main() if the script was executed from the command line.
The extra arithmetic in creating the consumer/buyer thread adds a random positive bias so that a customer might attempt to buy more candy bars than the vendor/producer puts in the machine (otherwise, the code would never enter the situation in which the consumer attempts to buy a candy bar from an empty machine).
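The ranges involved can be spelled out with the same randrange() calls as in the listing (Python 3 syntax; the name buyer_loops is hypothetical, since the original passes the value straight to the thread).

```python
from random import randrange

MAX = 5
nloops = randrange(2, 6)                            # vendor refills: 2..5
buyer_loops = randrange(nloops, nloops + MAX + 2)   # nloops..nloops+MAX+1

# Most bars the machine could ever supply during one run:
# the MAX it starts with plus at most nloops successful refills.
max_supply = MAX + nloops
print(nloops, buyer_loops, max_supply)
```

Because randrange() excludes its upper bound, the buyer makes at most nloops + MAX + 1 attempts. When that maximum is drawn, the attempts outnumber max_supply, so at least one purchase must find the machine empty.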
Running the script results in output similar to the following:
$ python candy.py
starting at: Mon Apr 4 00:56:02 2011
THE CANDY MACHINE (full with 5 bars)!
Buying candy... OK
Refilling candy... OK
Refilling candy... full, skipping
Buying candy... OK
Buying candy... OK
Refilling candy... OK
Buying candy... OK
Buying candy... OK
Buying candy... OK
all DONE at: Mon Apr 4 00:56:08 2011
Porting to Python 3
Similar to mtsleepF.py, candy.py is another example of how the 2to3 tool is sufficient to generate a working Python 3 version, which we have renamed to candy3.py. We’ll leave this as an exercise for the reader to confirm.
Summary
We’ve demonstrated only a couple of the synchronization primitives that come with the threading module. There are plenty more for you to explore. However, keep in mind that that’s still only what they are: “primitives.” There’s nothing wrong with using them to build your own classes and data structures that are thread-safe. The Python Standard Library comes with one, the Queue object.