4.10 Generators
There’s no subject in Python about which more confusion abounds than generators. It’s not a difficult feature once you understand it. Explaining it is the hard part.
But first, what does a generator do? The answer: It enables you to deal with a sequence one element at a time.
Suppose you need to process a sequence of elements that would be expensive to produce, or to hold in memory, all at once. For example, you might want to examine all the Fibonacci numbers up to 10 to the 50th power, producing each number only when you need it instead of calculating and storing the entire sequence first. Or you may want to deal with an infinite sequence, such as all even numbers, which could never be stored in full.
The advantage of a generator is that it enables you to deal with one member of a sequence at a time. This creates a kind of “virtual sequence.”
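As a minimal sketch of such a "virtual sequence," the following generator describes all even numbers (the name evens_forever is our own, chosen for illustration; the yield mechanics are explained later in this section). Nothing is computed until a value is requested, so the sequence never has to exist in memory. The standard library's itertools.islice takes just the first few values.

```python
from itertools import islice


def evens_forever():
    """Yield 2, 4, 6, ... without end (hypothetical example name)."""
    n = 2
    while True:
        yield n
        n += 2


# islice stops after five values, so the infinite loop above never runs away.
first_five = list(islice(evens_forever(), 5))
print(first_five)    # [2, 4, 6, 8, 10]
```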
4.10.1 What’s an Iterator?
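One of the central concepts in Python is that of the iterator (sometimes confused with the iterable). An iterator is an object that produces a stream of values, one at a time, each time the built-in next function is called on it. An iterable is any object that can supply an iterator, such as a list; calling iter on an iterable returns an iterator for it.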
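Here is a small sketch of the iterable/iterator distinction, using only built-ins. A list is iterable, and calling iter on it returns a separate iterator object, which hands out values through next. An iterator is itself iterable: calling iter on it simply returns the same object.

```python
numbers = [10, 20, 30]               # a list: iterable, but not an iterator

it = iter(numbers)                   # ask the list for a fresh iterator
print(next(it))                      # 10
print(next(it))                      # 20

# An iterator returns itself from iter(); a list hands out a new iterator each time.
print(iter(it) is it)                # True
print(iter(numbers) is numbers)      # False

# The list has no __next__ method, so next(numbers) would raise TypeError.
print(hasattr(numbers, '__next__'))  # False
print(hasattr(it, '__next__'))       # True
```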
Every list can be iterated, but not everything that can be iterated is a list. Many functions, such as reversed, produce iterators that are not lists. These cannot be indexed or printed in a useful way, at least not directly. Here’s an example:
```python
>>> iter1 = reversed([1, 2, 3, 4])
>>> print(iter1)
<list_reverseiterator object at 0x1111d7f28>
```
However, you can convert an iterator to a list and then print it, index it, or slice it:
```python
>>> print(list(iter1))
[4, 3, 2, 1]
```
Iterators in Python work with for statements. For example, because iter1 is an iterator, the following lines of code work perfectly well.
```python
>>> iter1 = reversed([1, 2, 3, 4])
>>> for i in iter1:
...     print(i, end=' ')
...
4 3 2 1
```
An iterator keeps state information; after reaching the end of its series, it is exhausted. If we used iter1 again without creating a new iterator (by calling reversed again), it would produce no more values.
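A quick way to see exhaustion in action: once the for loop has consumed iter1, converting it to a list produces an empty list, and calling next raises StopIteration. The only way to start over is to call reversed again to get a new iterator.

```python
iter1 = reversed([1, 2, 3, 4])
for i in iter1:
    print(i, end=' ')            # 4 3 2 1
print()

print(list(iter1))               # []: the iterator is exhausted

try:
    next(iter1)
except StopIteration:
    print('StopIteration: no values left')

iter1 = reversed([1, 2, 3, 4])   # a brand-new iterator starts from the beginning
print(next(iter1))               # 4
```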
4.10.2 Introducing Generators
A generator is one of the easiest ways to produce an iterator. But the generator function is not itself an iterator. Here’s the basic procedure.
1. Write a generator function. You do this by using a yield statement anywhere in the definition.
2. Call the function you completed in step 1 to get an iterator object.
3. The iterator created in step 2 is what yields values in response to the next function. This object contains state information. It cannot be rewound, but you can start over at any time by repeating step 2 to get a fresh iterator.
Figure 4.4 illustrates the process.
Figure 4.4. Returning a generator from a function
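To see why a generator is one of the easiest ways to get an iterator, the sketch below compares a hand-written iterator class (EvensIterator, a hypothetical name) with a generator function that produces the same values. The class must implement __iter__ and __next__ and raise StopIteration itself; the generator gets all of that for free. The last lines follow steps 1 through 3: calling the generator function produces the iterator, and next drives it.

```python
class EvensIterator:
    """Hand-written iterator for 2, 4, ..., 10 (hypothetical illustration)."""

    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self              # an iterator is its own iterable

    def __next__(self):
        self.n += 2
        if self.n > 10:
            raise StopIteration  # signal the end of the series
        return self.n


def make_evens_gen():            # step 1: a function containing yield
    for n in range(2, 11, 2):
        yield n


gen = make_evens_gen()           # step 2: calling it returns an iterator
print(next(gen), next(gen))      # step 3: the iterator yields values: 2 4
print(list(EvensIterator()))     # [2, 4, 6, 8, 10]
print(list(make_evens_gen()))    # [2, 4, 6, 8, 10]
```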
Here’s what almost everybody gets wrong when trying to explain this process: It looks as if the yield statement, placed in the generator function (the thing on the left in Figure 4.4), is doing the yielding. That’s “sort of” true, but it’s not really what’s going on.
The generator function defines the behavior of the iterator. But the iterator object, the thing to its right in Figure 4.4, is what actually executes this behavior.
When you include one or more yield statements in a function, the function is no longer an ordinary Python function; yield describes a behavior in which the function does not return a value but sends a value back to the caller of next. State information is saved, so when next is called again, the iterator advances to the next value in the series without starting over. This part, everyone seems to understand.
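The save-and-resume behavior is easy to observe by adding print calls to a small generator (traced_gen is a made-up name for this sketch). Notice that calling the function prints nothing at all: the body does not start running until next is first called on the returned object, and each later call resumes right after the previous yield.

```python
def traced_gen():
    print('  starting body')
    yield 'a'
    print('  resumed after first yield')
    yield 'b'
    print('  resumed after second yield; body ends')


g = traced_gen()                 # prints nothing: no body code has run yet
print('calling next:')
print(next(g))                   # runs up to the first yield
print('calling next:')
print(next(g))                   # resumes, runs up to the second yield
try:
    next(g)                      # resumes, falls off the end of the body
except StopIteration:
    print('StopIteration raised')
```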
But—and this is where people get confused—it isn’t the generator function that performs these actions, even though that’s where the behavior is defined. Fortunately, you don’t need to understand it; you just need to use it. Let’s start with a function that prints even numbers from 2 to 10:
```python
def print_evens():
    for n in range(2, 11, 2):
        print(n)
```
Now replace print(n) with the statement yield n. Doing so changes the nature of what the function does. While we’re at it, let’s change the name to make_evens_gen, which describes the function more accurately.
```python
def make_evens_gen():
    for n in range(2, 11, 2):
        yield n
```
The first thing you might say is “This function no longer returns anything; instead, it yields the value n, suspending its execution and saving its internal state.”
But this revised function, make_evens_gen, does indeed have a return value! As shown in Figure 4.4, the value returned is not n; the return value is an iterator object, also called a “generator object.” Look what happens if you call make_evens_gen and examine the return value.
```python
>>> make_evens_gen()
<generator object make_evens_gen at 0x1068bd410>
```
What did the function do? Yield a value for n? No! Instead, it returned an iterator object, and that’s the object that yields a value. We can save the iterator object (or generator object) and then pass it to next.
```python
>>> my_gen = make_evens_gen()
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
```
Eventually, calling next exhausts the series, and a StopIteration exception is raised. But what if you want to reset the sequence of values to the beginning? Easy. You can do that by calling make_evens_gen again, producing a new instance of the iterator. This has the effect of starting over.
```python
>>> my_gen = make_evens_gen()   # Start over
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
>>> my_gen = make_evens_gen()   # Start over
>>> next(my_gen)
2
>>> next(my_gen)
4
>>> next(my_gen)
6
```
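Here is a sketch of what happens when the series runs out. After the fifth value, next raises StopIteration. The built-in next also accepts an optional second argument: a default value that is returned, instead of raising the exception, once the iterator is exhausted.

```python
def make_evens_gen():
    for n in range(2, 11, 2):
        yield n


my_gen = make_evens_gen()
for _ in range(5):
    print(next(my_gen), end=' ')  # 2 4 6 8 10
print()

try:
    next(my_gen)                  # the sixth request: nothing left
except StopIteration:
    print('StopIteration: series exhausted')

print(next(my_gen, None))         # None: the default instead of an exception
```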
What happens if you call make_evens_gen every time you call next? In that case, you keep starting over, because each call creates a new generator object. This is most certainly not what you want.
```python
>>> next(make_evens_gen())
2
>>> next(make_evens_gen())
2
>>> next(make_evens_gen())
2
```
Generators can be used in for statements, and that’s one of the most frequent uses. For example, we can call make_evens_gen as follows:
```python
for i in make_evens_gen():
    print(i, end=' ')
```
This block of code produces the result you’d expect:
2 4 6 8 10
But let’s take a look at what’s really happening. The for statement calls make_evens_gen one time, and that call returns a generator object. That object then provides the values in the for loop. The following code has the same effect, but it makes the function call in a separate, earlier statement.
```python
>>> my_gen = make_evens_gen()
>>> for i in my_gen:
...     print(i, end=' ')
...
```
Remember that my_gen is an iterator object. If you instead referred to the function make_evens_gen itself, without calling it, Python would raise a TypeError, because a function object is not iterable.
```python
for i in make_evens_gen:    # ERROR! Not an iterable!
    print(i, end=' ')
```
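It can help to spell out roughly what a for statement does with the object it is given. The sketch below is a simplified equivalent written with a while loop: for first calls iter on the object (for a generator object, iter returns the object itself), then calls next repeatedly until StopIteration ends the loop. This is also why the failing example raises a TypeError: iter cannot be applied to a plain function.

```python
def make_evens_gen():
    for n in range(2, 11, 2):
        yield n


my_gen = make_evens_gen()
it = iter(my_gen)                 # for calls iter() first
print(it is my_gen)               # True: a generator is its own iterator

collected = []
while True:                       # roughly what "for i in my_gen:" does
    try:
        i = next(it)
    except StopIteration:
        break                     # the loop ends quietly on StopIteration
    collected.append(i)
print(collected)                  # [2, 4, 6, 8, 10]

try:
    iter(make_evens_gen)          # the function itself, not its return value
except TypeError as e:
    print('TypeError:', e)        # 'function' object is not iterable
```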
Once you understand that the object returned by the generator function is the generator object, also called the iterator, you can use it anywhere an iterable or iterator is accepted in the syntax. For example, you can convert a generator object to a list, as follows.
```python
>>> my_gen = make_evens_gen()
>>> a_list = list(my_gen)
>>> a_list
[2, 4, 6, 8, 10]
>>> a_list = list(my_gen)    # Oops! No reset!
>>> a_list
[]
```
The problem with the last few statements in this example is that iterating through a generator object exhausts it; to go through the values again, you need a new generator object.
```python
>>> my_gen = make_evens_gen()    # Reset!
>>> a_list = list(my_gen)
>>> a_list
[2, 4, 6, 8, 10]
```
You can of course combine the function call and the list conversion. The list itself is stable and (unlike a generator object) will retain its values.
```python
>>> a_list = list(make_evens_gen())
>>> a_list
[2, 4, 6, 8, 10]
```
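The same principle applies to other built-ins that accept any iterable, such as sum, max, sorted, and tuple. In this sketch each call gets a fresh generator object, so none of them sees an exhausted one. The last two lines show what happens when the same generator object is reused: the second sum finds nothing left to add.

```python
def make_evens_gen():
    for n in range(2, 11, 2):
        yield n


print(sum(make_evens_gen()))                   # 30
print(max(make_evens_gen()))                   # 10
print(sorted(make_evens_gen(), reverse=True))  # [10, 8, 6, 4, 2]
print(tuple(make_evens_gen()))                 # (2, 4, 6, 8, 10)

spent = make_evens_gen()
print(sum(spent), sum(spent))                  # 30 0
```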
One of the most practical uses of an iterator is with the in and not in operators. We can, for example, create an iterator that produces Fibonacci numbers up to and including n, but not larger than n.
```python
def make_fibo_gen(n):
    a, b = 1, 1
    while a <= n:
        yield a
        a, b = a + b, a
```
The yield statement changes this function from an ordinary function to a generator function, so it returns a generator object (iterator). We can now determine whether a number is a Fibonacci number by using the following test:
```python
n = int(input('Enter number: '))
if n in make_fibo_gen(n):
    print('number is a Fibonacci. ')
else:
    print('number is not a Fibonacci. ')
```
This example works because the iterator does not produce an infinite sequence; if it did, testing a number that is not a Fibonacci number would never finish. Instead, the iterator stops as soon as the next Fibonacci number would exceed n, so the in test ends either by finding n or by running out of values.
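Two details are worth seeing concretely. First, in stops pulling values as soon as it finds a match, so a generator may be left partly consumed. Second, if you prefer an unbounded Fibonacci generator (fibo_forever and is_fibonacci below are hypothetical names), you can still test membership safely by wrapping it in itertools.takewhile, which ends the series once values exceed n. This is a sketch of an alternative design, not the approach used above.

```python
from itertools import takewhile


def make_fibo_gen(n):
    a, b = 1, 1
    while a <= n:
        yield a
        a, b = a + b, a


gen = make_fibo_gen(100)
print(8 in gen)                   # True: stops right after yielding 8
print(next(gen))                  # 13: the rest of the series is still there


def fibo_forever():
    """Unbounded Fibonacci generator (hypothetical name)."""
    a, b = 1, 1
    while True:
        yield a
        a, b = a + b, a


def is_fibonacci(n):
    # Without takewhile, "n in fibo_forever()" would never finish
    # for a number that is not in the sequence.
    return n in takewhile(lambda x: x <= n, fibo_forever())


print(is_fibonacci(21), is_fibonacci(22))   # True False
```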
Remember (and we state this one last time) that putting yield into the function make_fibo_gen makes it a generator function, which returns the generator object we need. The previous example could have been written as follows, so that the function call is made in a separate statement. The effect is the same.
```python
n = int(input('Enter number: '))
my_fibo_gen = make_fibo_gen(n)
if n in my_fibo_gen:
    print('number is a Fibonacci. ')
else:
    print('number is not a Fibonacci. ')
```
As always, remember that a generator function (which contains the yield statement) is not a generator object at all, but rather a generator factory. This is confusing, but you just have to get used to it. In any case, Figure 4.4 shows what’s really going on, and you should refer to it often.
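If you ever need to check which side of Figure 4.4 you are holding, the standard library can tell you. In this sketch, inspect.isgeneratorfunction identifies the factory, and types.GeneratorType identifies the object it returns; each call to the factory produces a new, independent generator object.

```python
import inspect
import types


def make_evens_gen():
    for n in range(2, 11, 2):
        yield n


g1 = make_evens_gen()
g2 = make_evens_gen()

print(inspect.isgeneratorfunction(make_evens_gen))      # True: the factory
print(isinstance(make_evens_gen, types.GeneratorType))  # False
print(isinstance(g1, types.GeneratorType))              # True: the product
print(g1 is g2)                                         # False: two separate objects
print(next(g1), next(g1), next(g2))                     # 2 4 2
```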