Effective Python Items 15 & 23: How and Why to Use Closures
Item 15: Know How Closures Interact with Variable Scope
Say you want to sort a list of numbers but prioritize one group of numbers to come first. This pattern is useful when you’re rendering a user interface and want important messages or exceptional events to be displayed before everything else.
A common way to do this is to pass a helper function as the key argument to a list’s sort method. The helper’s return value will be used as the value for sorting each item in the list. The helper can check whether the given item is in the important group and can vary the sort key accordingly.
def sort_priority(values, group):
def helper(x):
if x in group:
return (0, x)
return (1, x)
values.sort(key=helper)
This function works for simple inputs.
numbers = [8, 3, 1, 2, 5, 4, 7, 6]
group = {2, 3, 5, 7}
sort_priority(numbers, group)
print(numbers)
>>>
[2, 3, 5, 7, 1, 4, 6, 8]
There are three reasons why this function operates as expected:
- Python supports closures: functions that refer to variables from the scope in which they were defined. This is why the helper function is able to access the group argument to sort_priority.
- Functions are first-class objects in Python, meaning you can refer to them directly, assign them to variables, pass them as arguments to other functions, compare them in expressions and if statements, etc. This is how the sort method can accept a closure function as the key argument.
- Python has specific rules for comparing tuples. It first compares items in index zero, then index one, then index two, and so on. This is why the return value from the helper closure causes the sort order to have two distinct groups.
It’d be nice if this function returned whether higher-priority items were seen at all so the user interface code can act accordingly. Adding such behavior seems straightforward. There’s already a closure function for deciding which group each number is in. Why not also use the closure to flip a flag when high-priority items are seen? Then the function can return the flag value after it’s been modified by the closure.
Here, I try to do that in a seemingly obvious way:
def sort_priority2(numbers, group):
found = False
def helper(x):
if x in group:
found = True # Seems simple
return (0, x)
return (1, x)
numbers.sort(key=helper)
return found
I can run the function on the same inputs as before.
found = sort_priority2(numbers, group)
print('Found:', found)
print(numbers)
>>>
Found: False
[2, 3, 5, 7, 1, 4, 6, 8]
The sorted results are correct, but the found result is wrong. Items from group were definitely found in numbers, but the function returned False. How could this happen?
When you reference a variable in an expression, the Python interpreter will traverse the scope to resolve the reference in this order:
- The current function’s scope
- Any enclosing scopes (like other containing functions)
- The scope of the module that contains the code (also called the global scope)
- The built-in scope (that contains functions like len and str)
If none of these places have a defined variable with the referenced name, then a NameError exception is raised.
Assigning a value to a variable works differently. If the variable is already defined in the current scope, then it will just take on the new value. If the variable doesn’t exist in the current scope, then Python treats the assignment as a variable definition. The scope of the newly defined variable is the function that contains the assignment.
This assignment behavior explains the wrong return value of the sort_priority2 function. The found variable is assigned to True in the helper closure. The closure’s assignment is treated as a new variable definition within helper, not as an assignment within sort_priority2.
def sort_priority2(numbers, group):
found = False # Scope: 'sort_priority2'
def helper(x):
if x in group:
found = True # Scope: 'helper' -- Bad!
return (0, x)
return (1, x)
numbers.sort(key=helper)
return found
Encountering this problem is sometimes called the scoping bug because it can be so surprising to newbies. But this is the intended result. This behavior prevents local variables in a function from polluting the containing module. Otherwise, every assignment within a function would put garbage into the global module scope. Not only would that be noise, but the interplay of the resulting global variables could cause obscure bugs.
Getting Data Out
In Python 3, there is special syntax for getting data out of a closure. The nonlocal statement is used to indicate that scope traversal should happen upon assignment for a specific variable name. The only limit is that nonlocal won’t traverse up to the module-level scope (to avoid polluting globals).
Here, I define the same function again using nonlocal:
def sort_priority3(numbers, group):
found = False
def helper(x):
nonlocal found
if x in group:
found = True
return (0, x)
return (1, x)
numbers.sort(key=helper)
return found
The nonlocal statement makes it clear when data is being assigned out of a closure into another scope. It’s complementary to the global statement, which indicates that a variable’s assignment should go directly into the module scope.
However, much like the anti-pattern of global variables, I’d caution against using nonlocal for anything beyond simple functions. The side effects of nonlocal can be hard to follow. It’s especially hard to understand in long functions where the nonlocal statements and assignments to associated variables are far apart.
When your usage of nonlocal starts getting complicated, it’s better to wrap your state in a helper class. Here, I define a class that achieves the same result as the nonlocal approach. It’s a little longer, but is much easier to read (see Item 23: “Accept Functions for Simple Interfaces Instead of Classes” for details on the __call__ special method).
class Sorter(object):
def __init__(self, group):
self.group = group
self.found = False
def __call__(self, x):
if x in self.group:
self.found = True
return (0, x)
return (1, x)
sorter = Sorter(group)
numbers.sort(key=sorter)
assert sorter.found is True
Scope in Python 2
Unfortunately, Python 2 doesn’t support the nonlocal keyword. In order to get similar behavior, you need to use a work-around that takes advantage of Python’s scoping rules. This approach isn’t pretty, but it’s the common Python idiom.
# Python 2
def sort_priority(numbers, group):
found = [False]
def helper(x):
if x in group:
found[0] = True
return (0, x)
return (1, x)
numbers.sort(key=helper)
return found[0]
As explained above, Python will traverse up the scope where the found variable is referenced to resolve its current value. The trick is that the value for found is a list, which is mutable. This means that once retrieved, the closure can modify the state of found to send data out of the inner scope (with found[0] = True).
This approach also works when the variable used to traverse the scope is a dictionary, a set, or an instance of a class you’ve defined.
Things to Remember
- Closure functions can refer to variables from any of the scopes in which they were defined.
- By default, closures can’t affect enclosing scopes by assigning variables.
- In Python 3, use the nonlocal statement to indicate when a closure can modify a variable in its enclosing scopes.
- In Python 2, use a mutable value (like a single-item list) to work around the lack of the nonlocal statement.
- Avoid using nonlocal statements for anything beyond simple functions.
Item 23: Accept Functions for Simple Interfaces Instead of Classes
Many of Python’s built-in APIs allow you to customize behavior by passing in a function. These hooks are used by APIs to call back your code while they execute. For example, the list type’s sort method takes an optional key argument that’s used to determine each index’s value for sorting. Here, I sort a list of names based on their lengths by providing a lambda expression as the key hook:
names = ['Socrates', 'Archimedes', 'Plato', 'Aristotle']
names.sort(key=lambda x: len(x))
print(names)
>>>
['Plato', 'Socrates', 'Aristotle', 'Archimedes']
In other languages, you might expect hooks to be defined by an abstract class. In Python, many hooks are just stateless functions with well-defined arguments and return values. Functions are ideal for hooks because they are easier to describe and simpler to define than classes. Functions work as hooks because Python has first-class functions: Functions and methods can be passed around and referenced like any other value in the language.
For example, say you want to customize the behavior of the defaultdict class (see Item 46: “Use Built-in Algorithms and Data Structures” for details). This data structure allows you to supply a function that will be called each time a missing key is accessed. The function must return the default value the missing key should have in the dictionary. Here, I define a hook that logs each time a key is missing and returns 0 for the default value:
def log_missing():
print('Key added')
return 0
Given an initial dictionary and a set of desired increments, I can cause the log_missing function to run and print twice (for 'red' and 'orange').
current = {'green': 12, 'blue': 3}
increments = [
('red', 5),
('blue', 17),
('orange', 9),
]
result = defaultdict(log_missing, current)
print('Before:', dict(result))
for key, amount in increments:
result[key] += amount
print('After: ', dict(result))
>>>
Before: {'green': 12, 'blue': 3}
Key added
Key added
After: {'orange': 9, 'green': 12, 'blue': 20, 'red': 5}
Supplying functions like log_missing makes APIs easy to build and test because it separates side effects from deterministic behavior. For example, say you now want the default value hook passed to defaultdict to count the total number of keys that were missing. One way to achieve this is using a stateful closure (see Item 15: “Know How Closures Interact with Variable Scope” for details). Here, I define a helper function that uses such a closure as the default value hook:
def increment_with_report(current, increments):
added_count = 0
def missing():
nonlocal added_count # Stateful closure
added_count += 1
return 0
result = defaultdict(missing, current)
for key, amount in increments:
result[key] += amount
return result, added_count
Running this function produces the expected result (2), even though the defaultdict has no idea that the missing hook maintains state. This is another benefit of accepting simple functions for interfaces. It’s easy to add functionality later by hiding state in a closure.
result, count = increment_with_report(current, increments)
assert count == 2
The problem with defining a closure for stateful hooks is that it’s harder to read than the stateless function example. Another approach is to define a small class that encapsulates the state you want to track.
class CountMissing(object):
def __init__(self):
self.added = 0
def missing(self):
self.added += 1
return 0
In other languages, you might expect that now defaultdict would have to be modified to accommodate the interface of CountMissing. But in Python, thanks to first-class functions, you can reference the CountMissing.missing method directly on an object and pass it to defaultdict as the default value hook. It’s trivial to have a method satisfy a function interface.
counter = CountMissing()
result = defaultdict(counter.missing, current) # Method ref
for key, amount in increments:
result[key] += amount
assert counter.added == 2
Using a helper class like this to provide the behavior of a stateful closure is clearer than the increment_with_report function above. However, in isolation it’s still not immediately obvious what the purpose of the CountMissing class is. Who constructs a CountMissing object? Who calls the missing method? Will the class need other public methods to be added in the future? Until you see its usage with defaultdict, the class is a mystery.
To clarify this situation, Python allows classes to define the __call__ special method. __call__ allows an object to be called just like a function. It also causes the callable built-in function to return True for such an instance.
class BetterCountMissing(object):
def __init__(self):
self.added = 0
def __call__(self):
self.added += 1
return 0
counter = BetterCountMissing()
counter()
assert callable(counter)
Here, I use a BetterCountMissing instance as the default value hook for a defaultdict to track the number of missing keys that were added:
counter = BetterCountMissing()
result = defaultdict(counter, current) # Relies on __call__
for key, amount in increments:
result[key] += amount
assert counter.added == 2
This is much clearer than the CountMissing.missing example. The __call__ method indicates that a class’s instances will be used somewhere a function argument would also be suitable (like API hooks). It directs new readers of the code to the entry point that’s responsible for the class’s primary behavior. It provides a strong hint that the goal of the class is to act as a stateful closure.
Best of all, defaultdict still has no view into what’s going on when you use __call__. All that defaultdict requires is a function for the default value hook. Python provides many different ways to satisfy a simple function interface depending on what you need to accomplish.
Things to Remember
- Instead of defining and instantiating classes, functions are often all you need for simple interfaces between components in Python.
- References to functions and methods in Python are first class, meaning they can be used in expressions like any other type.
- The __call__ special method enables instances of a class to be called like plain Python functions.
- When you need a function to maintain state, consider defining a class that provides the __call__ method instead of defining a stateful closure (see Item 15: “Know How Closures Interact with Variable Scope”).