How to Design a Generic Algorithm
Here is our task. We are given a vector of integer values. We are asked to return a new vector holding all the values that are less than 10. A quick but inflexible solution is the following:
vector<int> less_than_10( const vector<int> &vec ) { vector<int> nvec; for ( int ix = 0; ix < vec.size(); ++ix ) if ( vec[ ix ] < 10 ) nvec.push_back( vec[ ix ] ); return nvec; }
If the user wants all the elements less than 11, we must either create a new function or generalize this one to allow the user to specify a value against which to compare the elementsfor example:
vector<int> less_than( const vector<int> &vec, int less_than_val );
But our next task is actually somewhat more difficult. We must allow the user to specify an alternative operation, such as greater than, less than, and so on. How can we parameterize an operation?
One solution is to replace the less-than operator with a function call. We add a third parameter, pred, specifying a pointer to a function having a parameter list of two integers and returning a bool. less_than() isn't the right name any longer, so let's call it filter():
vector<int> filter( const vector<int> &vec, int filter_value, bool (*pred)( int, int ));
For our user's convenience, we also define a number of relational functions that can be passed to filter():
bool less_than( int v1, int v2 ) { return v1 < v2 ? true : false; } bool greater_than( int v1, int v2 ) { return v1 > v2 ? true : false; }
and so on. The user can then either pass one of these functions to an invocation of filter() or define her own relational function. The only constraint is that the function passed must return bool and accept two integers in its parameter list. Here is how filter() might be invoked:
vector<int> big_vec; int value; // ... fill big_vec and value vector<int> lt_10 = filter( big_vec, value, less_than );
The only task left for us is actually to implement filter():
vector<int> filter( const vector<int> &vec, int filter_value, bool (*pred)( int, int )) { vector<int> nvec; for ( int ix = 0; ix < vec.size(); ++ix ) // invokes the function addressed by pred // tests element vec[ix] against filter_value if ( pred( vec[ ix ], filter_value )) nvec.push_back( vec[ ix ] ); return nvec; }
This implementation of filter() explicitly iterates across each element using a for loop. Let's replace the use of the for loop with the find_if() generic algorithm. We repeatedly apply find_if() to the sequence to identify each element that meets the criteria defined by the user-specified pointer to function. How might we do that?
Let's start with finding every element equal to 10. The find() generic algorithm takes three arguments: the two iterators marking the first and 1 past the last element to examine, and the value we are looking for. In the following code, count_occurs() illustrates how to apply find() repeatedly to a container without looking at any element twice:
int count_occurs( const vector<int> &vec, int val ) { vector<int>::const_iterator iter = vec.begin(); int occurs_count = 0; while (( iter = find( iter, vec.end(), val )) != vec.end() ) { ++occurs_count; ++iter; // address next element } return occurs_count; }
The while loop assigns iter the return value of find(). find() returns an iterator addressing an element equal to val or, if no matching element is found, an iterator equal to vec.end(). When iter is equal to vec.end(), the loop terminates.
The success of the while loop depends on our advancing iter 1 past the matching element with each iteration of the loop. For example, let's say that vec contains the following elements: {6,10,8,4,10,7,10}. The declaration statement
vector<int>::const_iterator iter = vec.begin();
initializes iter to address the first element of the vector that holds the value 6. find() returns an iterator addressing the second element. Before we reinvoke find(), we must advance iter by 1. find() is next invoked with iter addressing the third element. find() returns an iterator addressing the fifth element, and so on.
Function Objects
Before we reimplement filter() to support find_if(), let's look at the predefined function objects provided by the standard library. A function object is an instance of a class that provides an overloaded instance of the function call operator. Overloading the call operator allows a function object to be used just as if it were a function.
A function object implements what we would otherwise define as an independent function. Why do we bother? The primary reason is efficiency. We can inline the call operator, thereby eliminating the function call overhead that comes with invoking the operation through a pointer to function.
The standard library predefines a set of arithmetic, relational, and logical function objects. In the following list, type is replaced by a built-in or class type in an actual use of the function object:
Six arithmetic function objects: plus<type>, minus<type>, negate<type>, multiplies<type>, divides<type>, and modulus<type>
Six relational function objects: less<type>, less_equal<type>, greater<type>, greater_equal<type>, equal_to<type>, and not_equal_to<type>
Three logical function objects, using the &&, ||, and ! operators, respectively: logical_and<type>, logical_or<type>, and logical_not<type>
To use the predefined function objects, we must include the associated header file:
#include <functional>
For example, by default, sort() orders its elements in ascending order using the less-than operator of the underlying element type. If we pass sort() the greater-than function object, the elements are now sorted in descending order:
sort( vec.begin(), vec.end(), greater<int>() );
The syntax
greater<int>()
causes an unnamed greater class template object to be created and passed into sort().
binary_search() expects the elements that it searches to be sorted by the less-than operator. For it to search our vector correctly, we must now pass it an instance of the function object used to sort our vector:
binary_search( vec.begin(), vec.end(), elem, greater<int>() );
Let's display the Fibonacci series in a series of increasingly impenetrable disguises: each element added to itself, each element multiplied by itself, each element added to its associated Pell series element, and so on. One way to do this is by using the transform() generic algorithm and the function objects plus<int> and multiplies<int>.
transform() must be passed (1) a pair of iterators to mark the element range to transform, (2) an iterator to point to the beginning of the container from which to fetch the values to apply to the transformation, (3) an iterator to point to the beginning of the container where we are to place the result of each transformation, and (4) the function object (or pointer to function) representing the operation to apply. For example, here is our addition of the Pell elements to those of the Fibonacci:
transform( fib.begin(), fib.end(), // (1) pell.begin(), // (2) fib_plus_pell.begin(), // (3) plus< int >() ); // (4)
In this example, the target vector, pell, must be at least as large as fib, or else the transform() algorithm will overflow pell.
In this next call of transform(), we multiply each element by itself and store the result by overriding the original element:
transform( fib.begin(), fib.end(), // (1) fib.begin(), fib.begin(), // (2), (3) multiplies< int >() ); // (4)
Function Object Adapters
These function objects do not quite work with what we need to do with find_if(). The less<type> function object, for example, expects two values. It evaluates to true if the first value is less than the second. In our case, each element must be compared against the value specified by the user. Ideally, what we need to do is to turn less<type> into a unary operator by binding the second value to that specified by the user. In this way, less<type> compares each element against that value. Can we actually do that? Yes. The standard library provides an adapter mechanism to do just that.
A function object adapter modifies a function object. A binder adapter converts a binary function object into a unary object by binding one of the arguments to a particular value. This is just what we need. There are two binder adapters: bind1st, which binds the value to the first operand, and bind2nd, which binds the value to the second. Here is a possible modification of filter() using the bind2nd adapter:
vector<int> filter( const vector<int> &vec, int val, less<int> < ) { vector<int> nvec; vector<int>::const_iterator iter = vec.begin(); // bind2nd( less<int>, val ) // binds val to the second value of less<int> // less<int> now compares each value against val while (( iter = find_if( iter, vec.end(), bind2nd( lt, val ))) != vec.end() ) { // each time iter != vec.end(), // iter addresses an element less than val nvec.push_back( *iter ); iter++; } return nvec; }
How might we generalize filter() further to eliminate its dependence both on the element type of the vector and on the vector container itself? To eliminate the dependency on the element type, we turn filter() into a template function and add the type to our template declaration. To eliminate the vector container dependency, we pass in a first,last iterator pair. Instead, we add another iterator to the parameter list indicating where we should begin copying the elements. Here is our reimplementation:
template <typename InputIterator, typename OutputIterator, typename ElemType, typename Comp> OutputIterator filter( InputIterator first, InputIterator last, OutputIterator at, const ElemType &val, Comp pred ) { while (( first = find_if( first, last, bind2nd( pred, val ))) != last ) { // just to see what is going on ... cout << "found value: " << *first << endl; // assign value, then advance both iterators *at++ = *first++; } return at; }
Can you see how you might actually call filter()? Let's write a small program to test it using both a built-in array and a vector. We need two of each container type: one to hold the values to be filtered, and one to hold the elements that get filtered. For the moment, we define the target container to be the same size as the original container. In another section, we look at an alternative solution using insert iterator adapters.
int main() { const int elem_size = 8; int ia[ elem_size ] = { 12, 8, 43, 0, 6, 21, 3, 7 }; vector<int> ivec( ia, ia+elem_size ); // containers to hold the results of our filter() int ia2[ elem_size ]; vector<int> ivec2( elem_size ); cout << "filtering integer array for values less than 8\n"; filter( ia, ia+elem_size, ia2, elem_size, less<int>() ); cout << "filtering integer vector for values greater than 8\n"; filter( ivec.begin(), ivec.end(), ivec2.begin(), elem_size, greater<int>() ); }
When compiled and executed, this program generates the following output:
filtering integer array for values less than 8 found value: 0 found value: 6 found value: 3 found value: 7 filtering integer vector for values greater than 8 found value: 12 found value: 43 found value: 21
A negator adapter reverses the truth value of a function object. not1 reverses the truth value of a unary function object. not2 reverses the truth value of a binary function object. For example, to identify the elements greater than or equal to 10, we can negate the result of the less<int>() function object:
while (( iter = find_if( iter, vec.end(), not1( bind2nd( less<int>, 10 )))) != vec.end() )
In general, there is no one solution to a problem. Our approach to finding all the elements less than a value, for example, involves looking at each element in turn and copying each value if it is less than the specified value. That solves our problem but is not the only approach we might have taken.
An alternative approach is the following: First, we sort the vector. Next, using find_if(), we locate the first element that is greater than the value. Finally, we delete all the elements from that found element to the end of the vector. Actually, we'll sort a local copy of the vector. Users might not appreciate our changing the element order of their vector. Here is a nontemplate version of this solution:
vector<int> sub_vec( const vector<int> &vec, int val ) { vector<int> local_vec( vec ); sort( local_vec.begin(), local_vec.end() ); vector<int>::iterator iter = find_if( local_vec.begin(), local_vec.end(), bind2nd( greater<int>, val )); local_vec.erase( iter, local_vec.end() ); return local_vec; }
Okay, whew. This is an intense section, and making sense of it might require a second reading and possibly writing some code. A good exercise is to try your hand at turning sub_vec() into a template function along the lines of filter(). Let me summarize what we've done.
We start with a function to find the elements in a vector of integers that have a value less than 10. We decide that hard-coding the value is too restrictive.
We first add a value parameter to allow the user to indicate a value against which to compare the vector elements.
We next add a pointer to a function parameter to allow the user to indicate which comparison filter to apply.
We then introduce function objects, which provide an alternative, more efficient method of passing an operation into a function. We briefly review the built-in function objects provided by the standard library. (In Chapter 4, we look at how to write our own function objects.)
Finally, we reimplement the function as a template function. To support multiple container types, we pass an iterator pair marking the first and 1 past the last element to traverse. To support multiple element types, we parameterize the element type. To support both pointers to functions and function objects, we parameterize the comparison operation to apply to the elements.
Our function is now independent of the element type, the comparison operation, and the container. In short, we have transformed our original function into a generic algorithm.