Interfaces
Readers of collection-based code are looking for answers to different questions when they look at the interfaces you have declared for your variables and the implementations you chose for those variables. The interface declaration tells the reader about the collection: whether the collection is in a particular order, whether there are duplicate elements, and whether there is any way to look up elements by key or only through iteration.
The interfaces described below are:
- Array—Arrays are the simplest and least flexible collection: fixed size, simple accessing syntax, and fast.
- Iterable—The basic collection interface, allowing a collection to be used for iteration but nothing else.
- Collection—Offers adding, removing, and testing for elements.
- List—A collection whose elements are ordered and can be accessed by their location in the collection (i.e., “give me the third element”).
- Set—A collection with no duplicates.
- SortedSet—An ordered collection with no duplicates.
- Map—A collection whose elements are stored and retrieved by key.
Array
Arrays are the simplest interface for collections. Unfortunately, they don’t have the same protocol as other collections, so it’s harder to change from an array to a collection than from one kind of collection to another. Unlike most collections, the size of an array is fixed when it is created. Arrays are also different as they are built into the language, not provided by a library.
Arrays are more efficient in time and space than other collections for simple operations. The timing tests I did to accompany writing this suggest that array access (i.e. elements[i]) is more than ten times faster than the equivalent ArrayList operation (elements.get(i)). (As these numbers vary substantially in different operating environments, if you care about the performance difference you should time the operations yourself.) The flexibility of the other collection classes makes them more valuable in most cases, but arrays are a handy trick to be able to pull out when you need more performance in a small part of an application.
Iterable
Declaring a variable Iterable only says that it contains multiple values. Iterable is the basis for the loop construct in Java 5. Any object declared as Iterable can be used in a for loop. This is implemented by quietly calling the method iterator().
One of the issues to be communicated when using collections is whether clients are expected to modify them. Unfortunately, Iterable and its helper, Iterator, provide no way to state declaratively that a collection shouldn’t be modified. Once you have an Iterator, you can invoke its remove() method, which deletes an element from the underlying Iterable. While your Iterables are safe from having elements added, they can have elements removed without the object that owns the collection being notified.
As described in “Collection Accessor Method” on page 91, there are a few ways to ensure that a collection is not modified: wrapping it in a unmodifiable collection, creating a custom iterator that throws an exception when a client tries to modify the collection, or returning a safe copy.
Iterable is simple. It doesn’t even allow you to measure the size of instances; all you can do is iterate over the elements. Sub-interfaces of Iterable provide more useful behavior.
Collection
Collection inherits from Iterable, but it adds methods to add, remove, search for and count elements. Declaring a variable or method as a Collection leaves many options for an implementation class. By leaving the declaration as vaguely specified as possible, you retain the freedom to change implementation classes later without having the change ripple through the code.
Collections are a bit like the mathematical notion of sets, except that the operations performing the equivalent of union, intersection, and difference (addAll(), retainAll(), and removeAll()) modify the receiver instead of returning newly allocated collections.
List
To Collection, List adds the idea that elements are in a stable order. An element can be retrieved by providing its index to the collection. A stable sequence is important when the elements of a collection interact with each other. For example, a queue of messages that should be processed in their arrival order should be stored in a list.
Set
A Set is a collection that contains no duplicates (elements that would report that they are equal() to each other). This corresponds closely to the mathematical notion of set, although the metaphor is thin because adding an element to a Set modifies the collection rather than returning a new collection including the added element.
A Set discards information that most collections keep—the number of times an element appears. This is not a problem in cases where the presence or absence of an element is interesting but the number of times the element appears is not. For example, if I want to know who all the authors of books are in a library, I don’t care how many books each author wrote. I just want to know who they are. A Set is an appropriate way to implement such a query.
The elements in a Set are in no particular order. Just because you iterate through them in a certain order once does not mean that the elements will appear in the same order the next time. This lack of predictable order is not a limitation in cases where the elements don’t interact with each other.
Sometimes you want to store duplicates in a collection but remove them for a particular operation. Create a temporary Set and pass it to the operation:
printAuthors(new HashSet<Author>(getAuthors());
SortedSet
The ordering and uniqueness attributes of collections are not mutually exclusive. At times you’d like to keep a collection in order but eliminate duplicates. SortedSet stores ordered-but-unique elements.
Unlike the ordering of a List, which is related to the order in which elements were added or by explicit indexes passed to add(int, Object), the ordering in a SortedSet is provided by a Comparator. In the absence of an explicit order, the “natural order” of the elements is used. For example, strings are sorted in lexicographical order.
To compute the authors contributing to a library, you could use a SortedSet:
public Collection<String> getAlphabeticalAuthors() { SortedSet<String> results= new TreeSet<String>(); for (Book each: getBooks()) results.add(each.getAuthor()); return results; }
This example uses the default sorting of strings. If a Book had its author represented by an object, the code above might look like this:
public Collection<String> getAlphabeticalAuthors() { Comparator<Author> sorter= new Comparator<Author>() { public int compare(Author o1, Author o2) { if (o1.getLastName().equals(o2.getLastName())) return o1.getFirstName().compareTo(o2.getFirstName()); return o1.getLastName().compareTo(o2.getLastName()); } }; SortedSet<String> results= new TreeSet<String>(sorter); for (Book each: getBooks()) results.add(each.getAuthor()); return results; }
Map
The final collection interface is Map, which is a hybrid of the other interfaces. A Map stores values by key, but unlike a List, the key can be any object and not just an integer. The keys of a Map must be unique, a bit like sets, although the values can contain duplicates. The elements of a Map are in no particular order, also like a Set.
Because Map is not completely like any of the other collection interfaces, it stands alone, not inheriting from any of them. Maps are two collections at the same time; a collection of keys connected to a collection of values. You can’t simply ask a Map for its iterator, because it is not clear whether you want an iterator over the keys, over the values, or over the pairs of keys-and-values.
Maps are useful for implementing two of the implementation patterns: extrinsic state and variable state. Extrinsic state suggests storing special-purpose data related to an object separately from the object itself. One way to implement extrinsic state is with a Map whose keys are the objects and whose values are the related data. In variable state, different instances of the same class store different data fields. To implement this, have the object hold a map which maps from strings (representing the names of the virtual fields) to the values.