Building a Bag Class in Swift: Explorations into Generics and the Swift Generator Underbelly
I recently decided to build a Bag class in Swift. I thought it would be an exciting way to learn to create a custom generator. Generators are the Swift feature that supports collection enumeration, letting you iterate across the members of a collection. It was also an opportunity to pick up practical experience with generics, the feature that enables collections to store a broad range of types. Along the way, I learned how generics and generators work hand in hand to produce the enumeration I was aiming for.
Building Bags
If you’re not familiar with them, bags are counted sets. Sets are collections of distinct objects; their counted variant also stores how many of each item the collection holds. In Cocoa and Cocoa Touch, NSCountedSet and CFBag provide the Foundation and Core Foundation versions of bags. I often used this data type when developing in Smalltalk, and it seemed a good candidate for a Swift implementation. At the time of writing, Swift offered native arrays and dictionaries but no built-in set or counted set types.
A bag might hold integers, strings, or, say, socks. If you add a sock twice, you have two socks. Add five more socks, and you have seven. Each add and remove operation keeps track of that count. When you remove the last copy of an item, it's gone from the bag. Removed items won't participate in enumeration or appear in a list of members. Like sets, bags are unordered: iterating through a bag visits each member, but in no guaranteed order.
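The sock arithmetic above can be traced with nothing more than a dictionary that maps each item to its count. This is a minimal sketch in current Swift; `addItem(_:to:)` and `removeItem(_:from:)` are hypothetical helper names chosen to echo the Bag methods, not a library API.

```swift
// Counted-set semantics sketched with a plain dictionary: item -> count.
var socks: [String: Int] = [:]

// Hypothetical helper: bump the count, starting from 0 for new items.
func addItem(_ item: String, to counts: inout [String: Int]) {
    counts[item, default: 0] += 1
}

// Hypothetical helper: decrement the count and drop the key at zero.
func removeItem(_ item: String, from counts: inout [String: Int]) {
    guard let count = counts[item] else { return }  // absent items are ignored
    if count > 1 {
        counts[item] = count - 1
    } else {
        counts.removeValue(forKey: item)            // last one: gone entirely
    }
}

addItem("sock", to: &socks)
addItem("sock", to: &socks)                          // two socks
for _ in 1...5 { addItem("sock", to: &socks) }       // five more: seven
print(socks["sock"] ?? 0)                            // 7

for _ in 1...7 { removeItem("sock", from: &socks) }
print(socks["sock"] == nil)                          // true
```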
Because dictionaries store a value for each key, they make a fine internal representation for bags. The key acts as the set member, and the value stores its current count. In the following Bag class definition, the _storage dictionary acts as a backing store. To signal that the internal storage isn't meant for outside consumption, the _storage variable uses an underscore prefix. At the time of writing, Swift did not support private variables or offer any access control mechanisms, though Apple had promised the feature in a future update.
class Bag<T: Hashable> {
    var _storage = Dictionary<T, Int>()
}
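Access control did arrive in a later Swift release, so in current Swift the backing dictionary can be hidden with `private` rather than by naming convention alone. A hedged sketch follows; `count(of:)` and `add(_:)` are hypothetical methods added so outside code can still use the bag.

```swift
// Current-Swift sketch: the backing store is genuinely private.
final class Bag<T: Hashable> {
    // Only this type (and its same-file extensions) can touch the dictionary.
    private var storage: [T: Int] = [:]

    // Hypothetical read-only accessor; 0 means "not in the bag".
    func count(of item: T) -> Int {
        return storage[item] ?? 0
    }

    // Hypothetical mutator so the example can put something in the bag.
    func add(_ item: T) {
        storage[item, default: 0] += 1
    }
}

let bag = Bag<String>()
bag.add("sock")
print(bag.count(of: "sock"))   // 1
// bag.storage                 // would not compile: storage is private
```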
Generics
You might be wondering about those T’s that appear in the declaration between angle brackets. They refer to generics. Generics permit you to work with elements whose type you don’t know in advance. A generic dictionary typed Dictionary<T, Int> says, "give me a dictionary whose values are always integers but whose key type is not yet defined."
The concrete type is fixed when the class is used, either because you spell it out or because the compiler infers it. When you build an actual bag, it can contain numbers, strings, structures, or any other hashable element. But when you design the bag, you don’t know how it will end up being used. That’s where generics come in: you specify the behavior but not the type.
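The same "behavior without a type" idea works for functions as well as classes. In this minimal current-Swift sketch, the hypothetical `tally` function builds the item-to-count dictionary a bag relies on, and the compiler settles what T means separately at each call site.

```swift
// Hypothetical generic function: one implementation, any Hashable element type.
func tally<T: Hashable>(_ items: [T]) -> [T: Int] {
    var counts: [T: Int] = [:]
    for item in items {
        counts[item, default: 0] += 1
    }
    return counts
}

let numberCounts = tally([3, 5, 7, 5, 2])          // T is inferred as Int
let wordCounts = tally(["sock", "shoe", "sock"])   // T is inferred as String
print(numberCounts[5] ?? 0)    // 2
print(wordCounts["sock"] ?? 0) // 2
```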
In the minimal Bag definition, the generic type appears twice: first in the class declaration and then in the variable declaration. The letter “T” is conventional. It means “some type,” but you are not tied to using T. You might substitute KeyType, BagElementType, or DancingHippopotami as the generic type token if you want to. (Although I sadly discourage the last of these.) These full names have the advantage of clarity and self-documentation. However, longer names are not as instantly identifiable as the single-letter convention; single letters pop from the code.
class Bag<KeyType: Hashable> {
    var _storage = Dictionary<KeyType, Int>()
}
Whatever generic token you use for this implementation, the type it stands for must conform to the Hashable protocol. This protocol lets each key be reduced to a hash value so it can be used with the _storage dictionary. It mirrors the hashing requirement NSDictionary places on its keys in Objective-C.
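Any Hashable type can serve as a key, including your own structures. In this hedged current-Swift sketch, the hypothetical `Sock` struct gets its Hashable conformance synthesized by the compiler (supported since Swift 4.1 when all stored properties are Hashable), so equal socks collapse into one counted entry.

```swift
// Hypothetical element type; Hashable is synthesized because
// both stored properties are themselves Hashable.
struct Sock: Hashable {
    let color: String
    let size: Int
}

// Minimal dictionary-backed bag, following the article's layout.
final class Bag<T: Hashable> {
    var _storage: [T: Int] = [:]
    func add(_ item: T) { _storage[item, default: 0] += 1 }  // hypothetical helper
}

let drawer = Bag<Sock>()
drawer.add(Sock(color: "red", size: 9))
drawer.add(Sock(color: "red", size: 9))   // equal value: same key, count 2
drawer.add(Sock(color: "blue", size: 9))  // different value: new key
print(drawer._storage.count)              // 2 distinct socks
```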
Resolving Generics
With generics, the type token is resolved when the bag is created. In the following code, test1 stores a bag of Ints and test2 stores one of Strings. Specifying the type on creation builds a type-safe bag that can hold only values of that type.
var test1 = Bag<Int>()
var test2 = Bag<String>()
Spelling out the Class<Type> form every time can be clunky. If you expand the class to include a variadic initializer, the bag can infer its type from the arguments you pass. This avoids the angle brackets and explicit type and leverages Swift’s strong type inference.
class Bag<T: Hashable> {
    var _storage = Dictionary<T, Int>()
    init(_ items: T...) {}
}
var test3 = Bag(3, 5, 7, 5, 2)
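Both ways of resolving T can be checked at run time by comparing metatypes. This minimal current-Swift sketch gives the variadic initializer a body that actually records its arguments (the version above leaves it empty to focus on inference).

```swift
final class Bag<T: Hashable> {
    var _storage: [T: Int] = [:]

    // Variadic initializer: T is inferred from the arguments.
    // Calling it with no arguments yields an empty bag.
    init(_ items: T...) {
        for item in items {
            _storage[item, default: 0] += 1
        }
    }
}

let test1 = Bag<Int>()              // explicit: T is Int
let test2 = Bag<String>()           // explicit: T is String
let test3 = Bag(3, 5, 7, 5, 2)      // inferred: T is Int

print(type(of: test3) == Bag<Int>.self)  // true
print(test3._storage[5] ?? 0)            // 2
```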
Building the Bag
The code that follows fills out a trivial Bag implementation. It allows you to add items and remove them, and by conforming to the Printable protocol, its description property produces a representation suitable for inspection.
class Bag<T: Hashable> : Printable {
    var _storage = Dictionary<T, Int>()

    init(_ items: T...) {
        for item in items {
            self.addItem(item)
        }
    }

    // Increment the item's count, inserting it with a count of 1 if new.
    func addItem(item: T) {
        if let count = _storage[item] {
            _storage[item] = count + 1
        } else {
            _storage[item] = 1
        }
    }

    // Decrement the item's count; drop the key when the last copy goes.
    func removeItem(item: T) {
        if let count = _storage[item] {
            if count > 1 {
                _storage[item] = count - 1
            } else {
                _storage.removeValueForKey(item)
            }
        }
    }

    // Printable: reuse the dictionary's own description.
    var description: String {
        return _storage.description
    }
}
Each method remains agnostic about the type of item it stores, using the generic T in place of a concrete type name. So long as the type can be hashed and used as a dictionary key, the bag can store it.
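For readers on a current toolchain, here is a hedged port of the class above. Printable has since been renamed CustomStringConvertible, removeValueForKey is now removeValue(forKey:), and the item parameters take an explicit `_` so calls read `addItem(x)`; the logic is otherwise unchanged. The `count(of:)` method is a hypothetical convenience added for inspection.

```swift
final class Bag<T: Hashable>: CustomStringConvertible {
    var _storage: [T: Int] = [:]

    init(_ items: T...) {
        for item in items {
            addItem(item)
        }
    }

    // Increment the item's count, inserting it with count 1 if new.
    func addItem(_ item: T) {
        _storage[item, default: 0] += 1
    }

    // Decrement the item's count; remove the key when the last copy goes.
    func removeItem(_ item: T) {
        guard let count = _storage[item] else { return }
        if count > 1 {
            _storage[item] = count - 1
        } else {
            _storage.removeValue(forKey: item)
        }
    }

    // Hypothetical helper: how many copies of item the bag holds.
    func count(of item: T) -> Int {
        return _storage[item] ?? 0
    }

    var description: String {
        return _storage.description
    }
}

let socks = Bag("red", "blue", "red")
socks.removeItem("blue")
print(socks.count(of: "red"))   // 2
print(socks.count(of: "blue"))  // 0
```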
Although this implementation provides the base functionality required by a bag, it does not yet support the enumeration mentioned earlier in this write-up, an important element for custom collections. Adding enumeration builds on the generic implementation and produces a stream of objects from the collection that stores them.
Adding Enumeration
To support enumeration, a class conforms to the Sequence protocol. You may have encountered this protocol in the WWDC Intermediate Swift video. A sequence is something you can iterate through in its entirety. Adopt the protocol by declaring a GeneratorType type alias and implementing a generate() function, as in the following extension.
extension Bag : Sequence {
    typealias GeneratorType = BagGenerator<T>
    func generate() -> BagGenerator<T> {
        return BagGenerator<T>(_storage)
    }
}
In the preceding snippet, enumeration relies on a BagGenerator. This is a new type (here, a structure) built around the generic key type, as the <T> following each mention shows. You build generators like this one to support enumeration.
A simple generator requires just a few lines of code: a declaration of the kind of Element it produces at each iteration and a next() function that returns each element in turn, or nil once iteration is finished. The element can be a single value or a tuple. One of Swift’s really cool features is that iteration isn't limited to one value per step: a generator can return a tuple bundling every element that matters for the collection. A standard array enumerates to a sequence of items. A dictionary enumerates to key/value tuples. Bags should enumerate to keys and their counts.
Here is an example of a bag generator implementation. With the Sequence extension and this generator in place, bags are ready for enumeration.
struct BagGenerator<T: Hashable> : Generator {
    var _backingGenerator: DictionaryGenerator<T, Int>

    init(_ backingDictionary: Dictionary<T, Int>) {
        _backingGenerator = backingDictionary.generate()
    }

    typealias Element = (T, Int)

    // Pass each (key, count) pair straight through from the dictionary.
    mutating func next() -> (T, Int)? {
        return _backingGenerator.next()
    }
}
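In current Swift, generators are called iterators: the protocol is IteratorProtocol, Sequence asks for makeIterator() rather than generate(), and a dictionary's iterator type is Dictionary<Key, Value>.Iterator. This hedged sketch keeps the article's structure under today's names; `BagIterator` is a hypothetical name, and the sketch unpacks the dictionary's labeled (key:, value:) pair explicitly.

```swift
final class Bag<T: Hashable> {
    var _storage: [T: Int] = [:]
    init(_ items: T...) {
        for item in items { _storage[item, default: 0] += 1 }
    }
}

// Hypothetical current-Swift counterpart of BagGenerator.
struct BagIterator<T: Hashable>: IteratorProtocol {
    private var _backingIterator: Dictionary<T, Int>.Iterator

    init(_ backingDictionary: [T: Int]) {
        _backingIterator = backingDictionary.makeIterator()
    }

    typealias Element = (T, Int)

    // Advance the wrapped iterator; nil signals the end of iteration.
    mutating func next() -> (T, Int)? {
        guard let pair = _backingIterator.next() else { return nil }
        return (pair.key, pair.value)
    }
}

extension Bag: Sequence {
    func makeIterator() -> BagIterator<T> {
        return BagIterator(_storage)
    }
}

let bag = Bag("sock", "shoe", "sock")
for (item, count) in bag {
    print("\(item): \(count)")   // order is unspecified
}
```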
Enumeration enables you to iterate through your bag to print out a list of its contents. The following enumeration retrieves a sequence of key-count pairs and prints them to the console.
for (key, count) in test3 {
    println("\(key): \(count)")
}
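Conforming to Sequence buys more than for-in loops: in current Swift, the standard library's sequence algorithms such as reduce and filter work on the bag as well. This minimal sketch assumes a stripped-down Bag whose makeIterator() simply returns the backing dictionary's iterator, the same one-to-one forwarding the first BagGenerator performs.

```swift
final class Bag<T: Hashable> {
    var _storage: [T: Int] = [:]
    init(_ items: T...) {
        for item in items { _storage[item, default: 0] += 1 }
    }
}

extension Bag: Sequence {
    // Forward to the backing dictionary; elements are (key:, value:) pairs.
    func makeIterator() -> Dictionary<T, Int>.Iterator {
        return _storage.makeIterator()
    }
}

let test3 = Bag(3, 5, 7, 5, 2)

// Sum every count to get the total number of items in the bag.
let total = test3.reduce(0) { sum, entry in sum + entry.value }

// Items held more than once, sorted so the output is stable.
let duplicates = test3.filter { $0.value > 1 }.map { $0.key }.sorted()

print(total)       // 5
print(duplicates)  // [5]
```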
Extending Enumeration
The BagGenerator implementation shown earlier leveraged a one-to-one correspondence between the bag’s enumeration and that of the underlying dictionary. That’s not always possible or desirable. The following implementation tweaks the next() function to build the tuple explicitly instead of passing the dictionary's result straight through. The change is small, but it exposes how to structure a custom return type.
struct BagGenerator<T: Hashable> : Generator {
    var _backingGenerator: DictionaryGenerator<T, Int>

    init(_ backingDictionary: Dictionary<T, Int>) {
        _backingGenerator = backingDictionary.generate()
    }

    typealias Element = (T, Int)

    // Unpack the dictionary's pair and build the return tuple by hand.
    mutating func next() -> (T, Int)? {
        if let (key, value) = _backingGenerator.next() {
            return (key, value)
        }
        return nil
    }
}
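Building the tuple yourself also lets you shape it. In this hedged current-Swift sketch, the iterator declares a labeled element, `(item: T, count: Int)`, so callers can write `entry.item` and `entry.count`. The labels and the `BagIterator` name are illustrative choices, not something the article specifies.

```swift
final class Bag<T: Hashable> {
    var _storage: [T: Int] = [:]
    init(_ items: T...) {
        for item in items { _storage[item, default: 0] += 1 }
    }
}

// Hypothetical iterator whose Element is a labeled tuple.
struct BagIterator<T: Hashable>: IteratorProtocol {
    private var _backingIterator: Dictionary<T, Int>.Iterator

    init(_ backingDictionary: [T: Int]) {
        _backingIterator = backingDictionary.makeIterator()
    }

    // Element is inferred as (item: T, count: Int) from this signature.
    mutating func next() -> (item: T, count: Int)? {
        guard let pair = _backingIterator.next() else { return nil }
        return (item: pair.key, count: pair.value)
    }
}

extension Bag: Sequence {
    func makeIterator() -> BagIterator<T> {
        return BagIterator(_storage)
    }
}

let bag = Bag("sock", "sock", "shoe")
for entry in bag {
    print("\(entry.item): \(entry.count)")   // order is unspecified
}
```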
You can push custom enumeration further by expanding BagGenerator yet again. The following example returns a 3-tuple instead of a pair, consisting of the key, its count, and an internal index. Although exposing an index runs counter to the unordered nature of a bag, it shows how to enumerate additional elements to create whatever return type your application needs.
struct BagGenerator<T: Hashable> : Generator {
    var _backingGenerator: DictionaryGenerator<T, Int>
    var index = 0

    init(_ backingDictionary: Dictionary<T, Int>) {
        _backingGenerator = backingDictionary.generate()
    }

    typealias Element = (T, Int, Int)

    mutating func next() -> (T, Int, Int)? {
        if let (key, value) = _backingGenerator.next() {
            // Postfix ++ yields the current index, then increments it.
            return (key, value, index++)
        } else {
            return nil
        }
    }
}
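The `++` operator was removed in Swift 3, so current Swift has to advance the index explicitly. This hedged sketch of the 3-tuple iterator uses `defer` so the index increments only after the return value is built; `BagIterator` is a hypothetical name.

```swift
final class Bag<T: Hashable> {
    var _storage: [T: Int] = [:]
    init(_ items: T...) {
        for item in items { _storage[item, default: 0] += 1 }
    }
}

// Hypothetical iterator yielding (item, count, position in this iteration).
struct BagIterator<T: Hashable>: IteratorProtocol {
    private var _backingIterator: Dictionary<T, Int>.Iterator
    private var index = 0

    init(_ backingDictionary: [T: Int]) {
        _backingIterator = backingDictionary.makeIterator()
    }

    mutating func next() -> (T, Int, Int)? {
        guard let pair = _backingIterator.next() else { return nil }
        defer { index += 1 }   // runs after the return value is computed
        return (pair.key, pair.value, index)
    }
}

extension Bag: Sequence {
    func makeIterator() -> BagIterator<T> {
        return BagIterator(_storage)
    }
}

let test3 = Bag(3, 5, 7, 5, 2)
for (key, count, index) in test3 {
    print("\(index). \(key): \(count)")
}
```

If all you need is a running position, the standard library's `enumerated()` already pairs each element of any sequence with an offset, without a custom iterator.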
Wrap-Up
A Bag that can store any hashable type offers a reusability advantage over one limited to a single class or structure. Generics let a single implementation serve many use cases. And by adding enumeration, your custom collections can participate in for-in loops just like Swift’s built-in arrays and dictionaries.