Containers
Arrays have several disadvantages. Earlier in this chapter we discussed their lack of size information, which means you must use two arguments to pass an array to a function. It also means that you cannot check an array index at runtime to see whether it's out of bounds. It is easy to crash a program by using the wrong index; what is perhaps worse (because the program seems to work) is that memory can be silently overwritten. All C programmers will tell you that these are some of the worst bugs to solve. Built-in arrays are also inflexible in that they have a fixed size that must be a constant. Although it is very fast to access array data randomly, insertions and deletions are slow.
The standard library defines a number of container types. A container holds a number of elements, like an array, but it is more intelligent. In particular, it has size information and is resizable. We will discuss three kinds of standard containers in the following sections: vector, which is used like a built-in array, but is resizable; list, which is easy to insert elements into; and map, which is an associative array; that is, it associates values of one type with values of another type.
Resizable Arrays: std::vector
You use the vector container type the same way you use an ordinary array, but a vector can grow when required. The following is a vector of 10 ints:
;> vector<int> vi(10);
;> for(int i = 0; i < 10; i++) vi[i] = i+1;
;> vi[5];
(int&) 6
;> vector<int> v2;
;> v2.size();
(int) 0
;> v2 = vi;
;> v2.size();
(int) 10
vector is called a parameterized type. The type in angle brackets (<>) that must follow the name is called the type parameter. vector is called parameterized because each specific type (vector<int>, vector<double>, vector<string>, and so on) is built on a specific base type, like a built-in array. In Chapter 10, "Templates," I will show you how you can build your own parameterized types, but for now it's only important that you know how to use them.
vi is a perfectly ordinary object that behaves like an array. That is, you can access any element very quickly using an index; this is called random access. Please note that the initial size (what we would call the array dimension) is in parentheses, not in square brackets. If there is no size (as with v2) then the vector is initially of size zero. It keeps its own size information, which you can access by using the size() method. In the C++ used in this book you cannot initialize a vector in the same way as an array (with a list of numbers), although C++11 and later allow it with braces. You can, however, assign vectors to each other. The statement v2 = vi actually causes all the elements of vi to be copied into v2. A vector variable behaves just like an ordinary variable, in fact. You can pass the vi vector as an argument to a function, and you won't need to pass the size, as in the following example:
void show_vect(vector<int> v)
{
  for(int i = 0; i < v.size(); i++) cout << v[i] << ' ';
  cout << endl;
}
;> show_vect(vi);
1 2 3 4 5 6 7 8 9 10
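The claim that assignment copies every element can be checked in a small compiled program. The following is only a minimal sketch and not part of the interpreter session; the function name copy_is_independent() is made up for illustration.

```cpp
#include <vector>

// Hypothetical helper: returns true if changing a copy leaves the original alone.
bool copy_is_independent() {
    std::vector<int> vi(10);              // ten ints, all initially zero
    for (int i = 0; i < 10; i++) vi[i] = i + 1;

    std::vector<int> v2;                  // no size given, so v2 starts empty
    v2 = vi;                              // copies every element of vi into v2
    v2[0] = 99;                           // modify the copy only

    return vi[0] == 1 && v2[0] == 99 && v2.size() == vi.size();
}
```

If the two vectors shared their storage, vi[0] would also have become 99 and the function would return false.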
You can resize the vector vi at any point. In the following example the elements of vi are set to random numbers between 0 and 99 (n % 100 is always in that range for non-negative n). vi is then resized to 15 elements:
;> for(int i = 0; i < 10; i++) vi[i] = rand() % 100;
;> show_vect(vi);
41 67 34 0 69 24 78 58 62 64
;> vi.resize(15); show_vect(vi);
41 67 34 0 69 24 78 58 62 64 0 0 0 0 0
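The same behavior can be sketched as a compiled function. The name filled_then_grown() and its parameters are assumptions made for this illustration; the only library calls used are the vector constructor and resize().

```cpp
#include <vector>

// Hypothetical helper: fills a vector with 1..n, then grows it by `extra` elements.
std::vector<int> filled_then_grown(int n, int extra) {
    std::vector<int> v(n);                // n elements, all zero
    for (int i = 0; i < n; i++) v[i] = i + 1;
    v.resize(n + extra);                  // existing values are kept; new ints are zero
    return v;
}
```

For example, filled_then_grown(3, 2) should produce the five values 1 2 3 0 0.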
You can resize the vi vector without destroying its values, but this can sometimes be quite a costly operation because the old values must be copied. Note that vectors are passed to functions by value, not by reference. Remember that passing by value involves making a copy of the whole object. In the following example, the function try_change() tries to modify its argument, but doesn't succeed. Earlier in this chapter ("Passing Arrays to Functions") you saw a similar example with built-in arrays, which did modify the first element of its array argument.
;> vector<int> v2 = vi;
;> v2.size();
(int) 15
;> v2[0];
(int&) 41
;> void try_change(vector<int> v) { v[0] = 747; }
;> try_change(v2);
;> v2[0];
(int&) 41
At this point, you may be tired of typing vector<int>. Fortunately, C++ provides a shortcut. You can create an alias for a type by using the typedef statement. The form of the typedef statement is just like the form of a declaration, except the declared names are not variables but type aliases. You can use these typedef names wherever you would have used the original type. Here are some examples of how to use typedef, showing how the resulting aliases can be used instead of the full type:
;> typedef unsigned int uint;
;> typedef unsigned char uchar;
;> typedef vector<int> VI;
;> typedef vector<double> DV;
;> uint arr[10];
;> DV d1(10),d2(10);
;> VI v1,v2;
;> int get(VI v, int i) { return v[i]; }
Think of typedef names as the equivalent of constants. Symbolic constants make typing easier (typing pi to 12 decimal places each time is tedious) and make later changes easier because there is only one statement to be changed. In the same way, if I consistently use VI throughout a large program, then the code becomes easier to type (and to read). If I later decide to use some other type instead of vector<int>, then that change becomes straightforward.
As you have learned, passing a vector (or any standard container) to a function involves a full copy of that vector. This can make a noticeable difference to a program's performance if the function is called enough times. You can mark an argument so that it is passed by reference, by putting an ampersand (&) after its type. You can further insist that it remains constant, as we did earlier in the chapter for arrays and as shown in the following example:
void by_reference (vector<int>& vi)
{
  vi[0] = 0;
}
void no_modify (const vector<int>& vi)
{
  cout << vi[0] << endl;
}
Generally, you should pass vectors and other containers by reference, and make such reference arguments const unless you are going to modify the vector. If the function needs a copy, it's best to make one explicitly inside the function. When experienced programmers see something passed by a non-const reference, they assume that someone is going to try to change it. So the preferred way of passing containers is by const reference, as in no_modify() in the preceding example. You can always use the typedef names to make things look easier on the eye, as shown here:
int passing_a_vector (const VI& vi) { return vi[0]; }
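The three ways of passing a vector can be put side by side. This is a sketch only: the function names change_copy(), change_original(), and first_element() are invented here, and std:: is written out because the code is meant for an ordinary compiler rather than the interpreter.

```cpp
#include <vector>

typedef std::vector<int> VI;

// Receives a copy; the caller's vector is untouched.
void change_copy(VI v) { v[0] = 747; }

// Receives a reference; the caller's vector is modified.
void change_original(VI& v) { v[0] = 747; }

// Receives a read-only reference; no copy is made and no change is possible.
int first_element(const VI& v) { return v[0]; }
```

Calling change_copy(v) leaves v[0] as it was, whereas change_original(v) sets it to 747. An attempt to assign to v[0] inside first_element() would be rejected by the compiler.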
The standard string is very much like a vector<char>, and it is considered an "almost container." Strings can also be indexed like arrays, so if s is a string, then s[0] would be the first character (not substring), and s[s.size()-1] would be the last character.
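Note that s[s.size()-1] is only meaningful when the string is not empty. A cautious version might look like the following sketch; last_char() is a hypothetical helper, and returning '\0' for an empty string is just one possible convention.

```cpp
#include <string>

// Hypothetical helper: returns the last character of s, or '\0' if s is empty.
char last_char(const std::string& s) {
    if (s.size() == 0) return '\0';   // there is no last character to index
    return s[s.size() - 1];
}
```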
Linked Lists: std::list
vectors have strengths and weaknesses. Inserting anywhere but at the end requires moving elements, so if a vector contained several million elements (and why not?), insertion could be unacceptably slow. Although vectors grow automatically, that process can also be slow because it may involve copying all the elements in the vector.
Lists are also sequences of elements, but they are not accessed randomly, and they are therefore not like arrays. Starting with an empty list, you append values by using push_back(), and you insert values at the front of the list by using push_front(). back() and front() give the current values at each end. To remove values from the ends, you use pop_front() and pop_back(). The following is an example of creating a list:
;> list<int> li;
;> li.push_back(10);
;> li.push_front(20);
;> li.back();
(int) 10
;> li.front();
(int) 20
;> li.size();
(int) 2
;> li.pop_back();
;> li.back();
(int) 20
You can remove from a list all items with a certain value. After the remove operation, the list contains only "two":
;> list<string> ls;
;> ls.push_back("one"); ls.push_back("two"); ls.push_back("one");
;> ls.remove("one");
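The effect of remove() can be wrapped in a small function and checked. The helper name without() is an assumption made for this sketch; it takes its list by value on purpose, so that the caller's list is left unchanged.

```cpp
#include <list>
#include <string>

// Hypothetical helper: returns a copy of `items` with every occurrence of `value` removed.
std::list<std::string> without(std::list<std::string> items, const std::string& value) {
    items.remove(value);   // removes all elements equal to value
    return items;
}
```

Given the list one, two, one, the call without(ls, "one") should return a list containing only "two".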
Associative Arrays: std::map
In mathematics, a map takes members of some input set (say 0..n-1) to another set of values; a simple example would be an array. The standard C++ map is not restricted to contiguous (that is, consecutive) values like an array or a vector, however. Here is a simple map from int to int:
;> map<int,int> mii;
;> mii[4] = 2;
(int&) 2
;> mii[88] = 7;
(int&) 7
;> mii.size();
(int) 2
;> mii[4];
(int&) 2
;> mii[2];
(int&) 0
You access maps the same way you access arrays, but the key values used in the subscripting don't have to cover the full range. To create the map in the preceding example by using arrays, you would need at least 89 elements in the array, whereas the map needs only 2. If you consider a map of phone numbers and contact names, you can see that an ordinary array is not an option. maps become very interesting when the key values are not integers. With string keys, for example, a map associates strings with values, and hence maps are often called associative arrays. Typically, looking up a key in a map is about as fast as a binary search; that is, the time grows with the logarithm of the number of entries.
;> map<int,string> mis;
;> mis[6554321] = "James";
(string&) "James"
;> mis.size();
(int) 1
;> map<string,int> msi;
;> msi["James"] = 6554321;
(int&) 6554321
;> msi.size();
(int) 1
;> msi["Jane"];
(int&) 0
;> msi.size();
(int) 2
Something that is important to note about maps is that they get bigger if you are continuously querying them with different keys. Say you are reading in a large body of text, looking for a few words. If you are using array notation, each time you look up a key that is not yet in the map, the map gets another entry. So a map of a few entries can end up with thousands of entries, most of which are trivial. Fortunately, there is a straightforward way around this: You can use the map's find() method. First, you can define some typedef names to simplify things:
;> typedef map<string,int> MSI;
;> typedef MSI::iterator IMSI;
;> IMSI ii = msi.find("Fred");
;> ii == msi.end();
(bool) true
The find() method returns a map iterator, which either refers to an existing item or is equal to the end of the map.
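A lookup that never enlarges the map can be sketched as follows. The function lookup() and its fallback parameter are assumptions made for this illustration. When the key is found, the iterator refers to a key/value pair whose value is reached through its second member.

```cpp
#include <map>
#include <string>

typedef std::map<std::string, int> MSI;

// Hypothetical helper: looks up `key` without adding it to the map.
// Returns `fallback` when the key is absent.
int lookup(const MSI& m, const std::string& key, int fallback) {
    MSI::const_iterator it = m.find(key);
    if (it == m.end()) return fallback;   // not found; the map is unchanged
    return it->second;                    // the value stored under key
}
```

Because the map is passed by const reference, the compiler guarantees that lookup() cannot add entries; msi["Fred"] would not even compile inside it.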
Maps are some of the most entertaining goodies in the standard library. They are useful tools, and you can use them to write very powerful routines in just a few lines. Here is a function that counts word frequencies in a large body of text (the test case here is the first chapter of Conan Doyle's Hound of the Baskervilles, courtesy of Project Gutenberg):
int word_freq(string file, MSI& msi)
{
  ifstream in(file.c_str());
  string word;
  while (in >> word) msi[word]++;
  return msi.size();
}
;> word_freq("chap1.txt",msi);
(int) 945
;> msi["the"];
(int&) 94
This example uses the shorthand for opening a file, and it assumes that the file will always exist. The real fun happens in the while loop. For each word in the file, you increment the map's value. If a word is not yet present in the map, msi[word] creates a new entry with the value zero, which is then incremented to one. Otherwise, the existing value is incremented. Eventually, msi will contain all unique words, along with the number of times they have been used. This example is the first bit of code in this book that really exercises a machine. The UnderC implementation is too slow for analyzing large amounts of text, but Chapter 4, "Programs and Libraries," shows how to set up a C++ program that can be compiled into an executable program.
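To try the counting loop without a text file, the same idea can be applied to any input stream. This is a variation on the function above, not a replacement for it: the stream parameter and the wrapper word_freq_text() are assumptions made for this sketch.

```cpp
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, int> MSI;

// Counts whitespace-separated words from any input stream.
// Returns the number of distinct words in the map afterwards.
int word_freq(std::istream& in, MSI& msi) {
    std::string word;
    while (in >> word) msi[word]++;   // a new word starts at 0 and becomes 1
    return static_cast<int>(msi.size());
}

// Hypothetical convenience wrapper: counts the words in a string.
int word_freq_text(const std::string& text, MSI& msi) {
    std::istringstream in(text);
    return word_freq(in, msi);
}
```

With the text "the hound and the moor", the map should end up with four entries, and the count for "the" should be 2. Note that punctuation and capitalization are not handled, so "The" and "the" would be counted separately.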
Stacks and Queues
Sometimes it's useful to build up a vector element by element. This works exactly like adding to the end of a list. You can add new elements at the end with push_back(); back() gives the value of the last element of the vector, and pop_back() removes the last element, decrementing the size.
;> typedef vector<int> VI;
;> VI vs;
;> vs.push_back(10);
;> vs.push_back(20);
;> show_vect(vs);
10 20
;> vs.size();
(int) 2
;> vs.back();
(int) 20
;> vs.pop_back();
;> vs.back();
(int) 10
;> vs.size();
(int) 1
void read_some_numbers(VI& vi, string file)
{
  int val;
  ifstream in(file.c_str());
  while (in >> val) vi.push_back(val);
}
Often you are given input without any idea of how many numbers to expect. If you use push_back(), the vector automatically increases in size to accommodate the new numbers. So the function read_some_numbers() will read an arbitrary number of integers and add them to the end of the vector.
There is no push_front() method because that would potentially be an expensive operation. If you really need to do it, you can use vi.insert(vi.begin(),val).
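Wrapped in a function, the insert() call looks like the following sketch. The name prepend() is made up for illustration; insert() and begin() are the standard vector methods mentioned above.

```cpp
#include <vector>

// Hypothetical helper: inserts val before the first element of v.
// Every existing element is shifted up by one, so the cost grows with v.size().
void prepend(std::vector<int>& v, int val) {
    v.insert(v.begin(), val);
}
```

Starting from a vector holding 20 and 30, prepend(v, 10) should leave it holding 10, 20, and 30.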
The operations push and pop define a stack. A stack is similar to the spring-loaded device often used for dispensing plates in cafeterias. As you remove plates from the top of the device (that is, "pop the stack"), more plates rise and are ready to be taken. You can also push plates onto the pile. A stack operates in first-in, last-out (FILO) fashion: if you push 10, 20, and 30, then you will pop 30, 20, and 10. Stacks are one of the basic workhorses of computer science, and you see them all over the place. A common use is to save a value, as in the following example:
void push(int val)
{
  vi.push_back(val);
}
int pop()
{
  int val = vi.back();
  vi.pop_back();
  return val;
}
;> int val = 1;
;> push(val);    // save val;
;> val = 42;     // modify val;
(int) 42
.... do things with val.......
;> val = pop();  // restore val;
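The functions above rely on a global vector called vi. A variation that passes the stack explicitly might look like this sketch; the extra parameter is an assumption made for the illustration, and pop() still expects the caller to make sure the stack is not empty.

```cpp
#include <vector>

// Pushes val onto the top of the stack.
void push(std::vector<int>& stack, int val) {
    stack.push_back(val);
}

// Removes and returns the top value.
// The caller must make sure the stack is not empty.
int pop(std::vector<int>& stack) {
    int val = stack.back();
    stack.pop_back();
    return val;
}
```

Pushing 10, 20, and 30 and then popping three times should give 30, 20, and 10, which is the first-in, last-out order described above.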
A queue, on the other hand, operates in first-in, first-out (FIFO) fashion, similarly to a line of waiting people, who are served in first come, first served order. A vector is not a good implementation of a queue because inserting at the front causes all entries to shuffle along. lists, however, are good candidates for queuing. You add an item to a queue by using push_front(), and you take an item off the end by using pop_back(). Queues are commonly used in data communications, where you can have data coming in faster than it can be processed. So incoming data is buffered; that is, it is kept in a queue until it is used or the buffer overflows. The good thing about a list is that it never overflows (short of running out of memory), although it can underflow when someone tries to take something off an empty queue; therefore, it is important to check size() first. Graphical user interface systems such as Windows typically maintain a message queue, which contains all the user's input. So it is possible to type faster than a program can process keystrokes.
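A list-based queue with an underflow check can be sketched in a few lines. The function names enqueue() and dequeue(), and the convention of returning false for an empty queue, are assumptions made for this example.

```cpp
#include <list>

// Adds a value at the front of the queue.
void enqueue(std::list<int>& q, int val) {
    q.push_front(val);
}

// Takes the oldest value off the back and stores it in `out`.
// Returns false, leaving `out` alone, if the queue is empty (no underflow).
bool dequeue(std::list<int>& q, int& out) {
    if (q.size() == 0) return false;
    out = q.back();
    q.pop_back();
    return true;
}
```

If 10, 20, and 30 are enqueued in that order, three calls to dequeue() should deliver 10, 20, and 30, and a fourth call should return false.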