Python Classes and Interfaces
- Item 37: Compose Classes Instead of Nesting Many Levels of Built-in Types
- Item 38: Accept Functions Instead of Classes for Simple Interfaces
- Item 39: Use @classmethod Polymorphism to Construct Objects Generically
- Item 40: Initialize Parent Classes with super
- Item 41: Consider Composing Functionality with Mix-in Classes
- Item 42: Prefer Public Attributes Over Private Ones
- Item 43: Inherit from collections.abc for Custom Container Types
Python is an object-oriented language. Getting things done in Python often requires writing new classes and defining how they interact through their interfaces and hierarchies. Learn how to use classes to express your intended behaviors with objects.
Save 35% off the list price* of the related book or multi-format eBook (EPUB + MOBI + PDF) with discount code ARTICLE.
* See informit.com/terms
As an object-oriented programming language, Python supports a full range of features, such as inheritance, polymorphism, and encapsulation. Getting things done in Python often requires writing new classes and defining how they interact through their interfaces and hierarchies.
Python’s classes and inheritance make it easy to express a program’s intended behaviors with objects. They allow you to improve and expand functionality over time. They provide flexibility in an environment of changing requirements. Knowing how to use them well enables you to write maintainable code.
Item 37: Compose Classes Instead of Nesting Many Levels of Built-in Types
Python’s built-in dictionary type is wonderful for maintaining dynamic internal state over the lifetime of an object. By dynamic, I mean situations in which you need to do bookkeeping for an unexpected set of identifiers. For example, say that I want to record the grades of a set of students whose names aren’t known in advance. I can define a class to store the names in a dictionary instead of using a predefined attribute for each student:
class SimpleGradebook: def __init__(self): self._grades = {} def add_student(self, name): self._grades[name] = [] def report_grade(self, name, score): self._grades[name].append(score) def average_grade(self, name): grades = self._grades[name] return sum(grades) / len(grades)
Using the class is simple:
book = SimpleGradebook() book.add_student('Isaac Newton') book.report_grade('Isaac Newton', 90) book.report_grade('Isaac Newton', 95) book.report_grade('Isaac Newton', 85) print(book.average_grade('Isaac Newton')) >>> 90.0
Dictionaries and their related built-in types are so easy to use that there’s a danger of overextending them to write brittle code. For example, say that I want to extend the SimpleGradebook class to keep a list of grades by subject, not just overall. I can do this by changing the _grades dictionary to map student names (its keys) to yet another dictionary (its values). The innermost dictionary will map subjects (its keys) to a list of grades (its values). Here, I do this by using a defaultdict instance for the inner dictionary to handle missing subjects (see Item 17: “Prefer defaultdict Over setdefault to Handle Missing Items in Internal State” for background):
from collections import defaultdict class BySubjectGradebook: def __init__(self): self._grades = {} # Outer dict def add_student(self, name): self._grades[name] = defaultdict(list) # Inner dict
This seems straightforward enough. The report_grade and average_grade methods gain quite a bit of complexity to deal with the multilevel dictionary, but it’s seemingly manageable:
def report_grade(self, name, subject, grade): by_subject = self._grades[name] grade_list = by_subject[subject] grade_list.append(grade) def average_grade(self, name): by_subject = self._grades[name] total, count = 0, 0 for grades in by_subject.values(): total += sum(grades) count += len(grades) return total / count
Using the class remains simple:
book = BySubjectGradebook() book.add_student('Albert Einstein') book.report_grade('Albert Einstein', 'Math', 75) book.report_grade('Albert Einstein', 'Math', 65) book.report_grade('Albert Einstein', 'Gym', 90) book.report_grade('Albert Einstein', 'Gym', 95) print(book.average_grade('Albert Einstein')) >>> 81.25
Now, imagine that the requirements change again. I also want to track the weight of each score toward the overall grade in the class so that midterm and final exams are more important than pop quizzes. One way to implement this feature is to change the innermost dictionary; instead of mapping subjects (its keys) to a list of grades (its values), I can use the tuple of (score, weight) in the values list:
class WeightedGradebook: def __init__(self): self._grades = {} def add_student(self, name): self._grades[name] = defaultdict(list) def report_grade(self, name, subject, score, weight): by_subject = self._grades[name] grade_list = by_subject[subject] grade_list.append((score, weight))
Although the changes to report_grade seem simple—just make the grade list store tuple instances—the average_grade method now has a loop within a loop and is difficult to read:
def average_grade(self, name): by_subject = self._grades[name] score_sum, score_count = 0, 0 for subject, scores in by_subject.items(): subject_avg, total_weight = 0, 0 for score, weight in scores: subject_avg += score * weight total_weight += weight score_sum += subject_avg / total_weight score_count += 1 return score_sum / score_count
Using the class has also gotten more difficult. It’s unclear what all of the numbers in the positional arguments mean:
book = WeightedGradebook() book.add_student('Albert Einstein') book.report_grade('Albert Einstein', 'Math', 75, 0.05) book.report_grade('Albert Einstein', 'Math', 65, 0.15) book.report_grade('Albert Einstein', 'Math', 70, 0.80) book.report_grade('Albert Einstein', 'Gym', 100, 0.40) book.report_grade('Albert Einstein', 'Gym', 85, 0.60) print(book.average_grade('Albert Einstein')) >>> 80.25
When you see complexity like this, it’s time to make the leap from built-in types like dictionaries, tuples, sets, and lists to a hierarchy of classes.
In the grades example, at first I didn’t know I’d need to support weighted grades, so the complexity of creating classes seemed unwarranted. Python’s built-in dictionary and tuple types made it easy to keep going, adding layer after layer to the internal bookkeeping. But you should avoid doing this for more than one level of nesting; using dictionaries that contain dictionaries makes your code hard to read by other programmers and sets you up for a maintenance nightmare.
As soon as you realize that your bookkeeping is getting complicated, break it all out into classes. You can then provide well-defined interfaces that better encapsulate your data. This approach also enables you to create a layer of abstraction between your interfaces and your concrete implementations.
Refactoring to Classes
There are many approaches to refactoring (see Item 89: “Consider warnings to Refactor and Migrate Usage” for another). In this case, I can start moving to classes at the bottom of the dependency tree: a single grade. A class seems too heavyweight for such simple information. A tuple, though, seems appropriate because grades are immutable. Here, I use the tuple of (score, weight) to track grades in a list:
grades = [] grades.append((95, 0.45)) grades.append((85, 0.55)) total = sum(score * weight for score, weight in grades) total_weight = sum(weight for _, weight in grades) average_grade = total / total_weight
I used _ (the underscore variable name, a Python convention for unused variables) to capture the first entry in each grade’s tuple and ignore it when calculating the total_weight.
The problem with this code is that tuple instances are positional. For example, if I want to associate more information with a grade, such as a set of notes from the teacher, I need to rewrite every usage of the two-tuple to be aware that there are now three items present instead of two, which means I need to use _ further to ignore certain indexes:
grades = [] grades.append((95, 0.45, 'Great job')) grades.append((85, 0.55, 'Better next time')) total = sum(score * weight for score, weight, _ in grades) total_weight = sum(weight for _, weight, _ in grades) average_grade = total / total_weight
This pattern of extending tuples longer and longer is similar to deepening layers of dictionaries. As soon as you find yourself going longer than a two-tuple, it’s time to consider another approach.
The namedtuple type in the collections built-in module does exactly what I need in this case: It lets me easily define tiny, immutable data classes:
from collections import namedtuple Grade = namedtuple('Grade', ('score', 'weight'))
These classes can be constructed with positional or keyword arguments. The fields are accessible with named attributes. Having named attributes makes it easy to move from a namedtuple to a class later if the requirements change again and I need to, say, support mutability or behaviors in the simple data containers.
Next, I can write a class to represent a single subject that contains a set of grades:
class Subject: def __init__(self): self._grades = [] def report_grade(self, score, weight): self._grades.append(Grade(score, weight)) def average_grade(self): total, total_weight = 0, 0 for grade in self._grades: total += grade.score * grade.weight total_weight += grade.weight return total / total_weight
Then, I write a class to represent a set of subjects that are being studied by a single student:
class Student: def __init__(self): self._subjects = defaultdict(Subject) def get_subject(self, name): return self._subjects[name] def average_grade(self): total, count = 0, 0 for subject in self._subjects.values(): total += subject.average_grade() count += 1 return total / count
Finally, I’d write a container for all of the students, keyed dynamically by their names:
class Gradebook: def __init__(self): self._students = defaultdict(Student) def get_student(self, name): return self._students[name]
The line count of these classes is almost double the previous implementation’s size. But this code is much easier to read. The example driving the classes is also more clear and extensible:
book = Gradebook() albert = book.get_student('Albert Einstein') math = albert.get_subject('Math') math.report_grade(75, 0.05) math.report_grade(65, 0.15) math.report_grade(70, 0.80) gym = albert.get_subject('Gym') gym.report_grade(100, 0.40) gym.report_grade(85, 0.60) print(albert.average_grade()) >>> 80.25
It would also be possible to write backward-compatible methods to help migrate usage of the old API style to the new hierarchy of objects.
Things to Remember
Avoid making dictionaries with values that are dictionaries, long tuples, or complex nestings of other built-in types.
Use namedtuple for lightweight, immutable data containers before you need the flexibility of a full class.
Move your bookkeeping code to using multiple classes when your internal state dictionaries get complicated.