12.2 vector
The most useful standard-library container is vector. A vector is a sequence of elements of a given type. The elements are stored contiguously in memory. A typical implementation of vector (§5.2.2, §6.2) will consist of a handle holding pointers to the first element, one-past-the-last element, and one-past-the-last allocated space (§13.1) (or the equivalent information represented as a pointer plus offsets):
In addition, it holds an allocator (here, alloc), from which the vector can acquire memory for its elements. The default allocator uses new and delete to acquire and release memory (§12.7). Using a slightly advanced implementation technique, we can avoid storing any data for simple allocators in a vector object.
We can initialize a vector with a set of values of its element type:
vector<Entry> phone_book = { {"David Hume",123456}, {"Karl Popper",234567}, {"Bertrand Arthur William Russell",345678} };
Elements can be accessed through subscripting. So, assuming that we have defined << for Entry, we can write:
void print_book(const vector<Entry>& book) { for (int i = 0; i!=book.size(); ++i) cout << book[i] << '\n'; }
As usual, indexing starts at 0 so that book[0] holds the entry for David Hume. The vector member function size() gives the number of elements.
The elements of a vector constitute a range, so we can use a range-for loop (§1.7):
void print_book(const vector<Entry>& book) { for (const auto& x : book) // for "auto" see §1.4 cout << x << '\n'; }
When we define a vector, we give it an initial size (initial number of elements):
vector<int> v1 = {1, 2, 3, 4}; // size is 4 vector<string> v2; // size is 0 vector<Shape*> v3(23); // size is 23; initial element value: nullptr vector<double> v4(32,9.9); // size is 32; initial element value: 9.9
An explicit size is enclosed in ordinary parentheses, for example, (23), and by default, the elements are initialized to the element type’s default value (e.g., nullptr for pointers and 0 for numbers). If you don’t want the default value, you can specify one as a second argument (e.g., 9.9 for the 32 elements of v4).
The initial size can be changed. One of the most useful operations on a vector is push_back(), which adds a new element at the end of a vector, increasing its size by one. For example, assuming that we have defined >> for Entry, we can write:
void input() { for (Entry e; cin>>e; ) phone_book.push_back(e); }
This reads Entrys from the standard input into phone_book until either the end-of-input (e.g., the end of a file) is reached or the input operation encounters a format error.
The standard-library vector is implemented so that growing a vector by repeated push_back()s is efficient. To show how, consider an elaboration of the simple Vector from Chapter 5 and Chapter 7 using the representation indicated in the diagram above:
template<typename T> class Vector { allocator<T> alloc; // standard-library allocator of space for Ts T* elem; // pointer to first element T* space; // pointer to first unused (and uninitialized) slot T* last; // pointer to last slot public: // ... int size() const { return space-elem; } // number of elements int capacity() const { return last-elem; } // number of slots available for elements // ... void reserve(int newsz); // increase capacity() to newsz // ... void push_back(const T& t); // copy t into Vector void push_back(T&& t); // move t into Vector };
The standard-library vector has members capacity(), reserve(), and push_back(). The reserve() is used by users of vector and other vector members to make room for more elements. It may have to allocate new memory and when it does, it moves the elements to the new allocation. When reserve() moves elements to a new and larger allocation, any pointers to those elements will now point to the wrong location; they have become invalidated and should not be used.
Given capacity() and reserve(), implementing push_back() is trivial:
template<typename T> void Vector<T>::push_back(const T& t) { if (capacity()<=size()) // make sure we have space for t reserve(size()==0?8:2*size()); // double the capacity construct_at(space,t); // initialize *space to t ("place t at space") ++space; }
Now allocation and relocation of elements happens only infrequently. I used to use reserve() to try to improve performance, but that turned out to be a waste of effort: the heuristic used by vector is on average better than my guesses, so now I only explicitly use reserve() to avoid reallocation of elements when I want to use pointers to elements.
A vector can be copied in assignments and initializations. For example:
vector<Entry> book2 = phone_book;
Copying and moving vectors are implemented by constructors and assignment operators as described in §6.2. Assigning a vector involves copying its elements. Thus, after the initialization of book2, book2 and phone_book hold separate copies of every Entry in the phone book. When a vector holds many elements, such innocent-looking assignments and initializations can be expensive. Where copying is undesirable, references or pointers (§1.7) or move operations (§6.2.2) should be used.
The standard-library vector is very flexible and efficient. Use it as your default container; that is, use it unless you have a solid reason to use some other container. If you avoid vector because of vague concerns about “efficiency,” measure. Our intuition is most fallible in matters of the performance of container uses.
12.2.1 Elements
Like all standard-library containers, vector is a container of elements of some type T, that is, a vector<T>. Just about any type qualifies as an element type: built-in numeric types (such as char, int, and double), user-defined types (such as string, Entry, list<int>, and Matrix<double,2>), and pointers (such as const char*, Shape*, and double*). When you insert a new element, its value is copied into the container. For example, when you put an integer with the value 7 into a container, the resulting element really has the value 7. The element is not a reference or a pointer to some object containing 7. This makes for nice, compact containers with fast access. For people who care about memory sizes and run-time performance this is critical.
If you have a class hierarchy (§5.5) that relies on virtual functions to get polymorphic behavior, do not store objects directly in a container. Instead store a pointer (or a smart pointer; §15.2.1). For example:
vector<Shape> vs; // No, don't - there is no room for a Circle or a Smiley (§5.5) vector<Shape*> vps; // better, but see §5.5.3 (don't leak) vector<unique_ptr<Shape>> vups; // OK
12.2.2 Range Checking
The standard-library vector does not guarantee range checking. For example:
void silly(vector<Entry>& book) { int i = book[book.size()].number; // book.size() is out of range // ... }
That initialization is likely to place some random value in i rather than giving an error. This is undesirable, and out-of-range errors are a common problem. Consequently, I often use a simple range-checking adaptation of vector:
template<typename T> struct Vec : std::vector<T> { using vector<T>::vector; // use the constructors from vector (under the name Vec) T& operator[](int i) { return vector<T>::at(i); } // range check const T& operator[](int i) const { return vector<T>::at(i); } // range check const objects; §5.2.1 auto begin() { return Checked_iter<vector<T>>{*this}; } // see §13.1 auto end() { return Checked_iter<vector<T>>{*this, vector<T>::end()};} };
Vec inherits everything from vector except for the subscript operations that it redefines to do range checking. The at() operation is a vector subscript operation that throws an exception of type out_of_range if its argument is out of the vector’s range (§4.2).
For Vec, an out-of-range access will throw an exception that the user can catch. For example:
void checked(Vec<Entry>& book) { try { book[book.size()] = {"Joe",999999}; // will throw an exception // ... } catch (out_of_range&) { cerr << "range error\n"; } }
The exception will be thrown, and then caught (§4.2). If the user doesn’t catch an exception, the program will terminate in a well-defined manner rather than proceeding or failing in an undefined manner. One way to minimize surprises from uncaught exceptions is to use a main() with a try-block as its body. For example:
int main() try { // your code } catch (out_of_range&) { cerr << "range error\n"; } catch (...) { cerr << "unknown exception thrown\n"; }
This provides default exception handlers so that if we fail to catch some exception, an error message is printed on the standard error-diagnostic output stream cerr (§11.2).
Why doesn’t the standard guarantee range checking? Many performance-critical applications use vectors and checking all subscripting implies a cost on the order of 10%. Obviously, that cost can vary dramatically depending on hardware, optimizers, and an application’s use of subscripting. However, experience shows that such overhead can lead people to prefer the far more unsafe builtin arrays. Even the mere fear of such overhead can lead to disuse. At least vector is easily range checked at debug time and we can build checked versions on top of the unchecked default.
A range-for avoids range errors at no cost by implicitly accessing all elements in the range. As long as their arguments are valid, the standard-library algorithms do the same to ensure the absence of range errors.
If you use vector::at() directly in your code, you don’t need my Vec workaround. Furthermore, some standard libraries have range-checked vector implementations that offer more complete checking than Vec.