4.8 Example: Set Implementation
Now let's consider a more elaborate example: a class AbstractSet (Figure 45) is the superclass of two other classes, ListSet (Figure 46) and DictSet (Figure 47). A set, in the mathematical sense, is a collection of elements (objects) without duplicates. These classes may be considered a sketch of how sets could be implemented. Together, the three classes provide two implementations of sets, as follows:
Figure 45 AbstractSet.
Figure 46 ListSet.
Figure 47 DictSet.
AbstractSet declares all the set operations, but it doesn't implement them all. It provides some common code, but leaves many operations up to the subclasses.
ListSet implements a set using a list to hold the elements.
DictSet implements a set using a dictionary to hold the elements.
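Why have two implementations? Lists and dictionaries may each be more efficient than the other for some set sizes and some uses. Testing membership in a list means scanning it, which is cheap only while the list is short; a dictionary locates a key by hashing, which pays off as the set grows, but it can hold only hashable elements. In Chapter 17, we will settle on the dictionary implementation of sets and provide one that has a more complete collection of methods than these.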
The operations provided by these sets are as follows:
s = ListSet(elems) or s = DictSet(elems): Creates a set initially containing the elements of the (optional) sequence elems.
s.insert(x): Adds element x to set s if it is not already present. Returns s.
s.contains(x): Returns true (1) if s contains x, false (0) otherwise.
s.delete(x): Removes element x from set s. Performs no operation if s does not contain x. Returns s.
s.members(): Returns a list of all the elements of set s.
s.new(): Returns a new empty set of the same type as s, e.g., a ListSet for a ListSet.
s.copy(): Returns a copy of set s.
s.size(): Returns the number of elements in set s.
s.insertAll(q): Inserts all the elements in sequence q into set s. Returns s.
s.removeAny(): Removes and returns an arbitrary element of set s. If s is empty, it returns None.
s.union(t): Returns a new set of the same type as s that contains all the elements contained in either s or t.
s.intersection(t): Returns a new set of the same type as s that contains all the elements contained in both s and t.
str(s): Returns a string representation of s, listing all the elements. This is the __str__() method; it tells str() how to do its job.
repr(s): This is the __repr__() method. For these sets, it is the same as str(s).
You can find all the methods in AbstractSet, but not all of them are implemented there. The methods whose bodies raise NotImplementedError are actually implemented in the subclasses. In a language like Java, we would have to declare them "abstract." That would tell the compiler that they must be implemented in a subclass and that no instances of AbstractSet itself may be created; only instances of subclasses that supply code for those methods can be created.
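Python has no keyword for declaring "abstract" methods, but raising NotImplementedError in them is the custom. (Since Python 2.6, the standard abc module has also offered ABCMeta and the abstractmethod decorator, which together prevent a class with unimplemented abstract methods from being instantiated.) With the NotImplementedError convention, if a subclass fails to override the method, you will find out at run time, when the method is called.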
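A minimal sketch of the custom, assuming modern Python 3 and a cut-down AbstractSet that declares only size(); ForgetfulSet is a hypothetical subclass invented here to show what happens when an override is missing. Creating its instance succeeds, and the omission surfaces only when the method is called.

```python
class AbstractSet:
    def size(self):
        # "Abstract" by custom: every subclass is expected to override this.
        raise NotImplementedError


class ListSet(AbstractSet):
    def __init__(self):
        self.rep = []

    def size(self):
        # A real override, so calling size() works.
        return len(self.rep)


class ForgetfulSet(AbstractSet):
    # Hypothetical subclass that forgot to override size().
    pass


print(ListSet().size())      # 0
f = ForgetfulSet()           # creating the instance is allowed...
try:
    f.size()                 # ...the omission shows up only when called
except NotImplementedError:
    print("ForgetfulSet.size() is not implemented")
```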
What about removing a method from a class by implementing a subclass that overrides it with a method that raises NotImplementedError? You can do that, but it is considered an extremely bad programming practice. An instance of the subclass is supposed to have an is-a relationship to its superclass. That means that it can be used anywhere an instance of the superclass can be used, but if it lacks one of the methods of the superclass, then it cannot be used anywhere that method is needed.
The __init__() method for AbstractSet does nothing when it is called: the pass statement performs no operation. Why is it present? It honors the programming practice that a class instance ought to be given a chance to initialize itself. If at some future time we were to change AbstractSet so that it did need to perform some initialization, it would be easier if the __init__() method already existed and the subclasses already called it.
Why have an AbstractSet? It is not essential in Python, although it would be in statically-typed object-oriented languages. It documents the operations that all sets must have. If you specify that an algorithm requires an AbstractSet, then that algorithm should use only the operations that AbstractSet provides. Since ListSet and DictSet are subclasses of AbstractSet, either of them can be provided to the algorithm and it will still work.
In object-oriented languages that use static typing, the AbstractSet class would be required to allow ListSet and DictSet objects to be used interchangeably. Variables and attributes would have to be declared with type AbstractSet, and then objects of either subclass could be assigned to them. Python does not require this. Any object that has the required methods can be used. We could eliminate AbstractSet here if we were willing to duplicate the code for the insertAll(), removeAny(), union(), intersection(), and __str__() methods in both ListSet and DictSet.
The reason that AbstractSet would be required in statically-typed languages, but not in Python, is that the compiler of a statically-typed language must know the type of every expression. You have to declare the types of variables and functions. The compiler needs these declarations to check that you are performing only permissible operations and to figure out the data types of their results. So you would need the class AbstractSet in order to declare all the methods you could call for a set. You could then declare a variable of type AbstractSet, assign either a ListSet or a DictSet to it, and use it without knowing which one is there.
Python, however, doesn't know in general what kind of value a variable contains or whether an operation will work or not. All that's required is for Python to find the methods it's calling at run-time. So we didn't really need AbstractSet. If both ListSet and DictSet implement all the set operations, they can be used interchangeably.
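To illustrate, here is a sketch in which two stripped-down set classes share no base class at all; the trimmed ListSet and DictSet and the dedupe() function are hypothetical, written for this example. dedupe() works with either, because Python only has to find insert() and contains() on the object at run time.

```python
class ListSet:
    """Trimmed, hypothetical list-backed set; deliberately has no base class."""
    def __init__(self):
        self.rep = []

    def insert(self, x):
        if x not in self.rep:
            self.rep.append(x)
        return self

    def contains(self, x):
        return x in self.rep


class DictSet:
    """Trimmed, hypothetical dict-backed set, unrelated to ListSet."""
    def __init__(self):
        self.rep = {}

    def insert(self, x):
        self.rep[x] = x
        return self

    def contains(self, x):
        return x in self.rep


def dedupe(items, s):
    # Works with any object that has insert() and contains();
    # the methods are looked up at run time, so no common superclass is needed.
    result = []
    for x in items:
        if not s.contains(x):
            s.insert(x)
            result.append(x)
    return result


print(dedupe([3, 1, 3, 2, 1], ListSet()))  # [3, 1, 2]
print(dedupe([3, 1, 3, 2, 1], DictSet()))  # [3, 1, 2]
```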
However, ListSet and DictSet do not implement all the set operations themselves. Some set operations, such as union and intersection, are implemented in AbstractSet and inherited. This demonstrates one of the simplest uses of inheritance: code sharing.
The basis for the division of methods between those implemented in ListSet and DictSet on one hand and those implemented in AbstractSet on the other is this: ListSet and DictSet contain those methods that depend on the implementation of the set, on the kind of data structure it uses. AbstractSet implements those methods that are the same for all implementations.
A method can call other methods of the same object. When the calling method and the called method are defined in different classes, two cases occur: up calls and down calls. If a method in a subclass calls a method in a superclass, it is called an up call (superclasses are conventionally pictured above their subclasses). If a method in a superclass calls a method in a subclass, it is called a down call.
If you come from a non-object-oriented background, you may be saying, "I can see how an up call works. The subclass imports the superclass, so it knows the methods defined there. But how does the superclass know the names of methods defined in a subclass?" The question, however, assumes that the compiler must know what method is being called before the program runs. If you are calling a method on an object from outside the object's classes, you usually don't know what the actual class of the object will be. You just know it is supposed to have a method with a certain name, say M, that will do a certain kind of thing for you. At run-time, Python searches for method M in the class and superclasses of the object. It's exactly the same with the call self.M() within a method. Again, Python will take the actual class of the current object, self, and search that class and its superclasses for method M. Where will Python find M? Maybe in the same class the call is in, maybe in a superclass, maybe in a subclass. You don't know. You shouldn't have to care.
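The run-time search can be seen in a few lines. In this sketch, Base, Derived, describe(), and name() are hypothetical names chosen for illustration: describe() in Base calls self.name(), and which name() runs depends on the actual class of self. The class's __mro__ attribute shows the order in which Python searches the classes.

```python
class Base:
    def describe(self):
        # self.name() is looked up on the actual class of self at run time,
        # so it may be found in a subclass that Base knows nothing about.
        return "I am " + self.name()

    def name(self):
        return "Base"


class Derived(Base):
    def name(self):
        # Overrides Base.name(); Base.describe() will find this one
        # when self is a Derived.
        return "Derived"


print(Base().describe())                      # I am Base
print(Derived().describe())                   # I am Derived
print([c.__name__ for c in Derived.__mro__])  # ['Derived', 'Base', 'object']
```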
In each of ListSet and DictSet, there is an example of an up call. The __init__() method calls insertAll(), defined in AbstractSet, to initialize the set to the sequence of elements. insertAll() is in AbstractSet because it does not depend on the implementation of the set.
Method insertAll() contains a down call to insert(). Method insert() does depend on the representation of the set. At run-time this down call will either call the insert() in ListSet or the insert() in DictSet, depending on which type of set is present.
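A sketch of both directions, assuming Python 3 and keeping only the methods involved (the book's figures may differ in detail): each subclass's __init__() makes up calls to AbstractSet.__init__() and insertAll(), and insertAll() makes a down call to whichever insert() the object's class provides. Note that rep must be set before insertAll() runs, because insert() uses it.

```python
class AbstractSet:
    def __init__(self):
        pass  # nothing to initialize yet

    def insert(self, x):
        raise NotImplementedError

    def insertAll(self, q):
        # Down call: self.insert() resolves to ListSet.insert or DictSet.insert.
        for x in q:
            self.insert(x)
        return self


class ListSet(AbstractSet):
    def __init__(self, elems=()):
        super().__init__()     # up call: let the superclass initialize itself
        self.rep = []
        self.insertAll(elems)  # up call to the inherited insertAll()

    def insert(self, x):
        if x not in self.rep:
            self.rep.append(x)
        return self


class DictSet(AbstractSet):
    def __init__(self, elems=()):
        super().__init__()
        self.rep = {}
        self.insertAll(elems)

    def insert(self, x):
        self.rep[x] = x
        return self


print(ListSet("abca").rep)          # ['a', 'b', 'c']
print(sorted(DictSet("abca").rep))  # ['a', 'b', 'c']
```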
There are two other things to notice about the __init__() methods in ListSet and DictSet:
They call the __init__() method of AbstractSet, which is somewhat pointless, since it does nothing. This is considered a good programming practice. A class should be given the chance to initialize itself. Knowing that a class's initialization method does nothing is the sort of knowledge you shouldn't use. It shouldn't be part of the public definition of the class. It could be changed in some later release.
They initialize an attribute, rep, to the representation of a set. ListSet initializes it to an empty list. DictSet initializes it to an empty dictionary.
ListSet keeps the elements of the set in a list. It checks for the presence of an element with the in operator. It uses list's append() method to insert an element into the set and remove() to delete it. The members() method just returns a copy of the list. The new() method returns a new ListSet, while copy() returns a new ListSet with a copy of the current object's rep attribute.
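The list-based methods just described might look like the following Python 3 sketch. It is a reconstruction from the description, not the code of Figure 46; for brevity it does not derive from AbstractSet, and the loop in __init__() stands in for the inherited insertAll().

```python
class ListSet:
    def __init__(self, elems=()):
        self.rep = []            # the elements, in insertion order
        for x in elems:          # stands in for the inherited insertAll()
            self.insert(x)

    def insert(self, x):
        if x not in self.rep:    # 'in' scans the list
            self.rep.append(x)
        return self

    def contains(self, x):
        return x in self.rep

    def delete(self, x):
        if x in self.rep:        # list.remove() raises ValueError otherwise
            self.rep.remove(x)
        return self

    def members(self):
        return self.rep[:]       # a copy, so callers cannot alter rep

    def new(self):
        return ListSet()

    def copy(self):
        s = ListSet()
        s.rep = self.rep[:]      # the copy gets its own list
        return s

    def size(self):
        return len(self.rep)


s = ListSet([1, 2, 2, 3])
s.delete(2).delete(9)            # deleting a missing element is harmless
print(s.members())               # [1, 3]
t = s.copy().insert(4)
print(s.size(), t.size())        # 2 3
```

Because insert() and delete() return the set, calls can be chained, as in s.delete(2).delete(9).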
DictSet keeps the elements as keys in a dictionary. To insert an element, the element is put into the dictionary with itself as its value. The value isn't actually important, only the key. It checks for the presence of an element with the dictionary's has_key() method. It deletes an element with a del statement. It gets a list of the members of the set using the dictionary's keys() method. (In Python 3, has_key() no longer exists, and the in operator does the same job; keys() returns a view rather than a list, so list(d.keys()) is needed to get a list.)
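A Python 3 sketch of the dictionary-based methods, reconstructed from the description rather than copied from Figure 47. Since has_key() does not exist in Python 3, the sketch uses the in operator, and it wraps keys() in list() because keys() returns a view. Like the ListSet sketch, it omits the AbstractSet base for brevity.

```python
class DictSet:
    def __init__(self, elems=()):
        self.rep = {}
        for x in elems:          # stands in for the inherited insertAll()
            self.insert(x)

    def insert(self, x):
        self.rep[x] = x          # only the key matters; the value is x again
        return self

    def contains(self, x):
        return x in self.rep     # Python 3 replacement for has_key()

    def delete(self, x):
        if x in self.rep:        # del raises KeyError for a missing key
            del self.rep[x]
        return self

    def members(self):
        return list(self.rep.keys())

    def new(self):
        return DictSet()

    def copy(self):
        s = DictSet()
        s.rep = self.rep.copy()  # shallow copy of the dictionary
        return s

    def size(self):
        return len(self.rep)


s = DictSet("hello")
print(sorted(s.members()))       # ['e', 'h', 'l', 'o']
s.delete("h").delete("z")
print(s.size())                  # 3
```

One design consequence: because the elements are dictionary keys, they must be hashable, so a DictSet cannot hold lists, while a ListSet can.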
In both ListSet and DictSet, there are if statements to test for the presence of an element before removing it. These tests are necessary because list's remove() method raises a ValueError, and the del statement raises a KeyError, when the element isn't present.2
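A small demonstration of the errors those tests avoid, using plain lists and dictionaries (the variable names are arbitrary):

```python
items = [1, 2, 3]
try:
    items.remove(4)              # 4 is not in the list
except ValueError as e:
    print("list.remove:", type(e).__name__)   # list.remove: ValueError

table = {1: 1}
try:
    del table[4]                 # 4 is not a key
except KeyError as e:
    print("del:", type(e).__name__)           # del: KeyError

# The guarded forms, as used in ListSet and DictSet, raise nothing:
if 4 in items:
    items.remove(4)
if 4 in table:
    del table[4]
```

For the dictionary case, table.pop(4, None) is a common one-call alternative that also ignores a missing key.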
AbstractSet has the code that can be common to all sets. The method insertAll() iterates over a sequence, inserting all the elements into the set. The call t.union(s) copies set t and then inserts all the elements of set s into it. The call t.intersection(s) uses new() to create a new set of the same class as t, and then inserts all the elements of t into it that are also in s.
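The shared code can be sketched as follows, under these assumptions: Python 3; the brace format produced by __str__() is invented here, since the book only says it lists the elements; removeAny() picks the first member, which is one valid choice of "arbitrary"; and a minimal ListSet is included only so the shared methods have something to run on.

```python
class AbstractSet:
    """Operations common to all sets, written only in terms of the
    representation-dependent methods that subclasses supply."""

    def __init__(self):
        pass  # nothing to initialize yet

    # Representation-dependent methods ("abstract" by custom).
    def insert(self, x): raise NotImplementedError
    def contains(self, x): raise NotImplementedError
    def delete(self, x): raise NotImplementedError
    def members(self): raise NotImplementedError
    def new(self): raise NotImplementedError
    def copy(self): raise NotImplementedError
    def size(self): raise NotImplementedError

    # Common code shared by every implementation.
    def insertAll(self, q):
        for x in q:
            self.insert(x)
        return self

    def removeAny(self):
        if self.size() == 0:
            return None
        x = self.members()[0]    # any element will do
        self.delete(x)
        return x

    def union(self, s):
        # Copy self, then insert every element of s into the copy.
        return self.copy().insertAll(s.members())

    def intersection(self, s):
        # Start from an empty set of self's type; keep elements also in s.
        t = self.new()
        for x in self.members():
            if s.contains(x):
                t.insert(x)
        return t

    def __str__(self):
        return "{" + ", ".join(repr(x) for x in self.members()) + "}"

    def __repr__(self):
        return str(self)


class ListSet(AbstractSet):
    """Just enough of a concrete subclass to exercise the shared code."""

    def __init__(self, elems=()):
        super().__init__()
        self.rep = []
        self.insertAll(elems)

    def insert(self, x):
        if x not in self.rep:
            self.rep.append(x)
        return self

    def contains(self, x): return x in self.rep

    def delete(self, x):
        if x in self.rep:
            self.rep.remove(x)
        return self

    def members(self): return self.rep[:]
    def new(self): return ListSet()
    def copy(self): return ListSet(self.rep)
    def size(self): return len(self.rep)


a = ListSet([1, 2, 3])
b = ListSet([2, 3, 4])
print(a.union(b))            # {1, 2, 3, 4}
print(a.intersection(b))     # {2, 3}
```

Neither union() nor intersection() modifies its operands; both build a new set.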
Later in the book, we will look at object-oriented design patterns. There are two present here:
Factories: The new() method is a factory method. It manufactures a new set object. When it is called in AbstractSet, we don't know what kind of set it will create. Why do we have it? Because when we create an actual set, we must specify the actual class, but AbstractSet shouldn't have to know anything about the actual sets, only what is common to them. It is the subclasses that know about, well, themselves.
Template methods: The methods union() and intersection() are being used as template methods. They have the basic algorithm, but they are missing the details. These details are filled in by methods like contains() and insert(), which are defined in subclasses. The idea of a template method is that the superclass contains the general algorithm, but omits some details that are filled in by methods in a subclass. Thus the same algorithm can be implemented in several versions, sharing much of the code between them.
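Both patterns can be seen together in this Python 3 sketch. intersection() is the template method; new() is the factory it relies on. Besides a minimal ListSet, it adds BuiltinSet, a hypothetical third implementation backed by Python's built-in set, which is not part of the book's design and is used only to show that a new representation inherits the algorithm unchanged. Because the receiver's new() builds the result, the result always has the receiver's type, even when the operands use different representations.

```python
class AbstractSet:
    def new(self):
        # Factory method: only a subclass knows which class to build.
        raise NotImplementedError

    def intersection(self, s):
        # Template method: the algorithm is fixed here, while the steps
        # new(), members(), contains() and insert() come from a subclass.
        t = self.new()
        for x in self.members():
            if s.contains(x):
                t.insert(x)
        return t


class ListSet(AbstractSet):
    def __init__(self, elems=()):
        super().__init__()
        self.rep = []
        for x in elems:
            self.insert(x)

    def new(self):
        return ListSet()

    def insert(self, x):
        if x not in self.rep:
            self.rep.append(x)
        return self

    def contains(self, x):
        return x in self.rep

    def members(self):
        return self.rep[:]


class BuiltinSet(AbstractSet):
    """Hypothetical third implementation backed by Python's built-in set."""

    def __init__(self, elems=()):
        super().__init__()
        self.rep = set(elems)

    def new(self):
        return BuiltinSet()

    def insert(self, x):
        self.rep.add(x)
        return self

    def contains(self, x):
        return x in self.rep

    def members(self):
        return list(self.rep)


a = ListSet([1, 2, 3])
b = BuiltinSet([2, 3, 4])
ab = a.intersection(b)
ba = b.intersection(a)
print(type(ab).__name__, ab.members())          # ListSet [2, 3]
print(type(ba).__name__, sorted(ba.members()))  # BuiltinSet [2, 3]
```

Some Python programmers use type(self)() as a generic factory instead of writing new() in every subclass; that saves code but assumes every subclass's constructor can be called with no arguments.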