- 4.1. Introduction/Motivation
- 4.2. Threads and Processes
- 4.3. Threads and Python
- 4.4. The thread Module
- 4.5. The threading Module
- 4.6. Comparing Single vs. Multithreaded Execution
- 4.7. Multithreading in Practice
- 4.8. Producer-Consumer Problem and the Queue/queue Module
- 4.9. Alternative Considerations to Threads
- 4.10. Related Modules
- 4.11. Exercises
4.4. The thread Module
Let’s take a look at what the thread module has to offer. In addition to being able to spawn threads, the thread module also provides a basic synchronization data structure called a lock object (a.k.a. primitive lock, simple lock, mutual exclusion lock, mutex, and binary semaphore). As we mentioned earlier, such synchronization primitives go hand in hand with thread management.
Table 4-1 lists the more commonly used thread functions and LockType lock object methods.
Table 4-1. thread Module and Lock Objects

| Function/Method | Description |
| --- | --- |
| **thread Module Functions** | |
| start_new_thread(function, args, kwargs=None) | Spawns a new thread and executes function with the given args and optional kwargs |
| allocate_lock() | Allocates and returns a LockType lock object |
| exit() | Instructs the calling thread to exit |
| **LockType Lock Object Methods** | |
| acquire(wait=None) | Attempts to acquire the lock object |
| locked() | Returns True if the lock has been acquired, False otherwise |
| release() | Releases the lock |
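The lock methods in the table can be exercised directly. One hedge for readers on a modern interpreter: in Python 3 the thread module was renamed _thread, so this minimal sketch uses that name, but the method behavior is the same:

```python
import _thread  # in Python 3, the old "thread" module lives on as _thread

lock = _thread.allocate_lock()
print(lock.locked())      # a freshly allocated lock starts out unlocked: False

lock.acquire()            # take the lock; blocks if another thread holds it
print(lock.locked())      # True

lock.release()            # give the lock back
print(lock.locked())      # False

# acquire() also supports a non-blocking attempt: pass a false waitflag and
# it returns True if the lock was taken, False if someone else holds it.
print(lock.acquire(0))    # True -- the lock was free
```

The non-blocking `acquire(0)` form is what makes these primitive locks usable as simple "done" flags, which is exactly how Example 4-3 below employs them.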
The key function of the thread module is start_new_thread(). It takes a function (object) plus arguments and optionally, keyword arguments. A new thread is spawned specifically to invoke the function.
Let’s take our onethr.py example and integrate threading into it. By slightly changing the call to the loop*() functions, we now present mtsleepA.py in Example 4-2:
Example 4-2. Using the thread Module (mtsleepA.py)
The same loops from onethr.py are executed, but this time using the simple multithreaded mechanism provided by the thread module. The two loops are executed concurrently (with the shorter one finishing first, obviously), and the total elapsed time is only as long as the slowest thread rather than the total time for each separately.
```python
#!/usr/bin/env python

import thread
from time import sleep, ctime

def loop0():
    print 'start loop 0 at:', ctime()
    sleep(4)
    print 'loop 0 done at:', ctime()

def loop1():
    print 'start loop 1 at:', ctime()
    sleep(2)
    print 'loop 1 done at:', ctime()

def main():
    print 'starting at:', ctime()
    thread.start_new_thread(loop0, ())
    thread.start_new_thread(loop1, ())
    sleep(6)
    print 'all DONE at:', ctime()

if __name__ == '__main__':
    main()
```
start_new_thread() requires its first two arguments, which is why an empty tuple must be passed in even when the target function takes no arguments.
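To see the required argument tuple in action, here is a small sketch (written for Python 3, where the module is named _thread, and with a made-up 0.05-second sleep so it runs quickly) that spawns one child, passes it arguments through the tuple, and signals completion through a lock:

```python
import _thread
from time import sleep

done = _thread.allocate_lock()
done.acquire()                      # held until the child finishes

results = []

def child(nloop, nsec):
    sleep(nsec)
    results.append('loop %d slept %.2fs' % (nloop, nsec))
    done.release()                  # tell the main thread we are finished

# the argument tuple is mandatory -- even a no-argument target needs ()
_thread.start_new_thread(child, (0, 0.05))

while done.locked():                # crude busy-wait, as in the text
    sleep(0.01)

print(results[0])                   # loop 0 slept 0.05s
```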
Upon execution of this program, our output changes drastically. The two loops, which would have taken 6 seconds run back to back, now complete concurrently in only 4 seconds, the length of our longest loop plus any overhead. Note, however, that the script as a whole still runs for 6 seconds, because of the sleep(6) call holding the main thread (more on that shortly).
```
$ mtsleepA.py
starting at: Sun Aug 13 05:04:50 2006
start loop 0 at: Sun Aug 13 05:04:50 2006
start loop 1 at: Sun Aug 13 05:04:50 2006
loop 1 done at: Sun Aug 13 05:04:52 2006
loop 0 done at: Sun Aug 13 05:04:54 2006
all DONE at: Sun Aug 13 05:04:56 2006
```
The pieces of code that sleep for 4 and 2 seconds now occur concurrently, contributing to the lower overall runtime. You can even see how loop 1 finishes before loop 0.
The only other major change to our application is the addition of the sleep(6) call. Why is this necessary? The reason is that if we did not stop the main thread from continuing, it would proceed to the next statement, display "all DONE," and exit, killing both threads running loop0() and loop1().
We did not have any code that directed the main thread to wait for the child threads to complete before continuing. This is what we mean by threads requiring some sort of synchronization. In our case, we used another sleep() call as our synchronization mechanism. We used a value of 6 seconds because we know that both threads (which take 4 and 2 seconds) should have completed by the time the main thread has counted to 6.
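The cost of this approach is easy to quantify: the main thread's fixed sleep must be at least as long as the slowest worker, so any safety margin is pure wasted time. A hedged sketch (the 0.05- and 0.5-second timings are made up for illustration, and the Python 3 module name _thread is used):

```python
from time import sleep, time
import _thread

finished = _thread.allocate_lock()
finished.acquire()

def worker(nsec):
    sleep(nsec)
    finished.release()          # worker signals completion via the lock

start = time()
_thread.start_new_thread(worker, (0.05,))
sleep(0.5)                      # fixed guess, as mtsleepA.py does with sleep(6)
waited = time() - start

print(finished.locked())        # False -- the worker finished long ago
print(waited >= 0.5)            # True -- yet we still paid for the full guess
```

Flip the timings the other way (a worker slower than the guess) and the main thread exits too early instead, killing the worker mid-run.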
You are probably thinking that there should be a better way of managing threads than creating that extra delay of 6 seconds in the main thread. Because of this delay, the overall runtime is no better than in our single-threaded version. Using sleep() for thread synchronization as we did is not reliable. What if our loops had independent and varying execution times? We could be exiting the main thread too early or too late. This is where locks come in.
Making yet another update to our code to include locks as well as getting rid of the separate loop functions, we get mtsleepB.py, which is presented in Example 4-3. Running it, we see that the output is similar to that of mtsleepA.py. The only difference is that we did not have to wait a fixed extra delay for the script to conclude: by using locks, we were able to exit as soon as both threads had completed execution. This renders the following output:
```
$ mtsleepB.py
starting at: Sun Aug 13 16:34:41 2006
start loop 0 at: Sun Aug 13 16:34:41 2006
start loop 1 at: Sun Aug 13 16:34:41 2006
loop 1 done at: Sun Aug 13 16:34:43 2006
loop 0 done at: Sun Aug 13 16:34:45 2006
all DONE at: Sun Aug 13 16:34:45 2006
```
Example 4-3. Using thread and Locks (mtsleepB.py)
Rather than using a call to sleep() to hold up the main thread as in mtsleepA.py, the use of locks makes more sense.
```python
 1  #!/usr/bin/env python
 2
 3  import thread
 4  from time import sleep, ctime
 5
 6  loops = [4, 2]
 7
 8  def loop(nloop, nsec, lock):
 9      print 'start loop', nloop, 'at:', ctime()
10      sleep(nsec)
11      print 'loop', nloop, 'done at:', ctime()
12      lock.release()
13
14  def main():
15      print 'starting at:', ctime()
16      locks = []
17      nloops = range(len(loops))
18
19      for i in nloops:
20          lock = thread.allocate_lock()
21          lock.acquire()
22          locks.append(lock)
23
24      for i in nloops:
25          thread.start_new_thread(loop,
26              (i, loops[i], locks[i]))
27
28      for i in nloops:
29          while locks[i].locked(): pass
30
31      print 'all DONE at:', ctime()
32
33  if __name__ == '__main__':
34      main()
```
So how did we accomplish our task with locks? Let’s take a look at the source code.
Line-by-Line Explanation
Lines 1–6
After the Unix startup line, we import the thread module and a few familiar attributes of the time module. Rather than hardcoding separate functions to count to 4 and 2 seconds, we use a single loop() function and place these constants in a list, loops.
Lines 8–12
The loop() function acts as a proxy for the deleted loop*() functions from our earlier examples. We had to make some cosmetic changes to loop() so that it can now perform its duties using locks. The obvious changes are that we need to be told which loop number we are as well as the sleep duration. The last piece of new information is the lock itself. Each thread will be allocated an acquired lock. When the sleep() time has concluded, we release the corresponding lock, indicating to the main thread that this thread has completed.
Lines 14–34
The bulk of the work is done here in main(), using three separate for loops. We first create a list of locks, which we obtain by using the thread.allocate_lock() function and acquire (each lock) with the acquire() method. Acquiring a lock has the effect of “locking the lock.” Once it is locked, we add the lock to the lock list, locks. The next loop actually spawns the threads, invoking the loop() function per thread, and for each thread, provides it with the loop number, the sleep duration, and the acquired lock for that thread. So why didn’t we start the threads in the lock acquisition loop? There are two reasons. First, we wanted to synchronize the threads, so that all the horses started out the gate around the same time, and second, locks take a little bit of time to be acquired. If your thread executes too fast, it is possible that it completes before the lock has a chance to be acquired.
It is up to each thread to unlock its lock object when it has completed execution. The final loop just sits and spins (pausing the main thread) until both locks have been released before continuing execution. Because we are checking each lock sequentially, we might be at the mercy of all the slower loops if they are more toward the beginning of the set of loops. In such cases, the majority of the wait time may be for the first loop(s). When that lock is released, remaining locks may have already been unlocked (meaning that corresponding threads have completed execution). The result is that the main thread will fly through those lock checks without pause. Finally, you should be well aware that the final pair of lines will execute main() only if we are invoking this script directly.
As hinted in the earlier Core Note, we presented the thread module only to introduce the reader to threaded programming. Your MT application should use higher-level modules such as the threading module, which we discuss in the next section.
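As a preview of that higher-level approach, here is the same two-loop program expressed with the threading module (Python 3 syntax, with the sleeps shortened purely so the sketch runs quickly). Thread.join() replaces all of the manual lock bookkeeping from Example 4-3:

```python
import threading
from time import sleep, ctime

loops = [0.4, 0.2]   # shortened from [4, 2] for illustration

def loop(nloop, nsec):
    print('start loop', nloop, 'at:', ctime())
    sleep(nsec)
    print('loop', nloop, 'done at:', ctime())

def main():
    print('starting at:', ctime())
    threads = [threading.Thread(target=loop, args=(i, loops[i]))
               for i in range(len(loops))]
    for t in threads:
        t.start()
    for t in threads:      # join() blocks until the thread finishes,
        t.join()           # so no per-thread lock or busy-wait is needed
    print('all DONE at:', ctime())

if __name__ == '__main__':
    main()
```

The join() loop also avoids the busy-wait spin of Example 4-3's final for loop, since join() sleeps the caller rather than burning CPU while it waits.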