Using Library Algorithms in C++

Feb 15, 2002


Many container operations apply to more than one type of container and have the same interface for each type that supports them. In this article, Andrew Koenig and Barbara Moo show how the library exploits these common interfaces to provide a collection of standard algorithms. By using these algorithms, you can avoid writing (and rewriting) the same code repeatedly. More important, you can write programs that are smaller and simpler than you would write otherwise -- sometimes astonishingly so.

This sample chapter is derived from the book Accelerated C++ (Addison Wesley Longman, 2000), by Andrew Koenig and Barbara Moo. It is part of Bjarne Stroustrup's C++ In-Depth series.


Many container operations apply to more than one type of container. For example, vector, string, and list allow us to insert elements by calling insert and remove elements by calling erase. These operations have the same interface for each type that supports them. For that matter, many container operations also apply to the string class.

Every container—as well as the string class—provides companion iterator types, which let us navigate through a container and examine its elements. Again, the library ensures that every iterator that supplies an operation does so through the same interface. For example, we can use the ++ operator to advance any type of iterator from one element to the next, we can use the * operator to access the element associated with any type of iterator, and so on.

In this article, we'll see how the library exploits these common interfaces to provide a collection of standard algorithms. By using these algorithms, we can avoid writing (and rewriting) the same code over and over again. More important, we can write programs that are smaller and simpler than we would write otherwise—sometimes astonishingly so.

Like containers and iterators, algorithms also use consistent interface conventions. This consistency lets us learn a few of the algorithms and then apply that knowledge to others as the need arises. In this chapter, we'll use several of the library algorithms to solve problems related to processing strings and student grades. Along the way, we'll cover most of the core concepts in the algorithm library.

Unless we say otherwise, the <algorithm> header defines all the algorithms that we introduce in this chapter.

Analyzing `string`s

We can use this loop to concatenate two character pictures:

for (vector<string>::const_iterator it = bottom.begin();
     it != bottom.end(); ++it)
    ret.push_back(*it);

This loop is equivalent to inserting a copy of the elements of bottom at the end of ret, an operation that vectors provide directly:

ret.insert(ret.end(), bottom.begin(), bottom.end());

This problem has an even more general solution: We can separate the notion of copying elements from that of inserting elements at the end of a container, as follows:

copy(bottom.begin(), bottom.end(), back_inserter(ret));

Here, copy is an example of a generic algorithm, and back_inserter is an example of an iterator adaptor.

A generic algorithm is an algorithm that is not part of any particular kind of container, but instead it takes a cue from its arguments' types about how to access the data it uses. The standard library's generic algorithms usually take iterators among their arguments, which they use to manipulate the elements of the underlying containers. So, for example, the copy algorithm takes three iterators, which we'll call begin, end, and out, and copies all the elements in the range [begin, end) to a sequence of elements starting at out and extending as far as necessary. In other words,

copy(begin, end, out);

has the same effect as

while (begin != end) *out++ = *begin++;

except that the while body changes the values of the iterators, and copy doesn't.
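To make the equivalence concrete, here is a minimal sketch that runs both forms on the same data and compares the results. The names copy_by_hand and check_copy_equivalence are ours, chosen for illustration; they are not library functions. Because the hand-written version takes its iterators by value, the caller's iterators are unchanged, just as they are with copy.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef std::vector<int>::const_iterator in_iter;
typedef std::vector<int>::iterator out_iter;

// Hand-written equivalent of copy for vector<int> (illustrative name).
out_iter copy_by_hand(in_iter begin, in_iter end, out_iter out)
{
    while (begin != end)
        *out++ = *begin++;
    return out;   // like copy, report where writing stopped
}

// Copy src into two destinations that already have room for it,
// once with each method, and check that the results agree.
void check_copy_equivalence(const std::vector<int>& src)
{
    std::vector<int> a(src.size()), b(src.size());
    std::copy(src.begin(), src.end(), a.begin());
    copy_by_hand(src.begin(), src.end(), b.begin());
    assert(a == b);
    assert(a == src);
}
```

Both destinations are sized in advance here, so writing through a.begin() and b.begin() is safe.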

Before we describe iterator adaptors, we should note that this loop depends on the use of the postfix version of the increment operators. These operators differ from the prefix versions, which we have used up to now, in that begin++ returns a copy of the original value of begin, incrementing the stored value of begin as a side effect. In other words,

it = begin++;

is equivalent to

it = begin; ++begin;

The postfix increment operator binds more tightly than the unary * operator, which means that *out++ has the same meaning as *(out++). Thus,

*out++ = *begin++;

is equivalent to the more verbose

{ *out = *begin; ++out; ++begin; }
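The following small sketch checks these claims about postfix increment on a vector<int>. The function name check_postfix_increment and the sample values are ours, for illustration only.

```cpp
#include <cassert>
#include <vector>

void check_postfix_increment()
{
    std::vector<int> v;
    v.push_back(10);
    v.push_back(20);
    v.push_back(30);

    // it receives the old value of begin; begin itself moves on
    std::vector<int>::iterator begin = v.begin();
    std::vector<int>::iterator it = begin++;
    assert(*it == 10);
    assert(*begin == 20);

    // *out++ = x writes through the old position, then advances out
    std::vector<int>::iterator out = v.begin();
    *out++ = 99;
    assert(v[0] == 99);
    assert(out == v.begin() + 1);
}
```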

Let's return to iterator adaptors, which are functions that yield iterators with properties that are related to their arguments in useful ways. The iterator adaptors are defined in <iterator>. The most common iterator adaptor is back_inserter, which takes a container as its argument and yields an iterator that, when used as a destination, appends values to the container. For example, back_inserter(ret) is an iterator that, when used as a destination, appends elements to ret. Therefore,

copy(bottom.begin(), bottom.end(), back_inserter(ret));

copies all of the elements of bottom and appends them to the end of ret. After this function completes, the size of ret will have increased by bottom.size().
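As a quick check of that size claim, here is a sketch that wraps the call in a helper and measures how much ret grows. The helper append_all and the sample picture rows are our own illustrative choices.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

typedef std::vector<std::string> picture;

// Append every element of bottom to ret; return how much ret grew.
picture::size_type append_all(picture& ret, const picture& bottom)
{
    const picture::size_type before = ret.size();
    std::copy(bottom.begin(), bottom.end(), std::back_inserter(ret));
    return ret.size() - before;
}

void check_append_all()
{
    picture top(2, "***");      // two rows
    picture bottom(3, "---");   // three more rows
    assert(append_all(top, bottom) == bottom.size());
    assert(top.size() == 5);
    assert(top[1] == "***" && top[2] == "---");
}
```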

Notice that we could not call

// error: ret is not an iterator
copy(bottom.begin(), bottom.end(), ret);

because copy's third parameter is required to be an iterator, and we supplied a container as the corresponding argument. Nor could we call

// error: no element at ret.end()
copy(bottom.begin(), bottom.end(), ret.end());

This latter mistake is particularly insidious because the program will compile. What it does when you try to run it is another story entirely. The first thing copy will try to do is assign a value to the element at ret.end(). There's no element there, so what the implementation will do is anybody's guess.

Why is copy designed this way? Because separating the notions of copying elements and expanding a container allows programmers to choose which operations to use. For example, we might want to copy elements on top of elements that already exist in a container, without changing the container's size. As another example, we might want to use back_inserter to append elements to a container that are not merely copies of another container's elements.
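Both uses can be sketched briefly. In the example below, the first call overwrites existing elements without changing the container's size, and the second appends values that are not copies of any container's elements. We use the library's fill_n algorithm for the second case; that choice, like the function name and sample values, is ours rather than the text's.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

void check_overwrite_and_append()
{
    std::vector<int> src(3, 7);   // 7 7 7
    std::vector<int> dst(5, 0);   // 0 0 0 0 0

    // Overwrite the first three elements; the size stays at 5.
    // Safe only because dst already holds at least src.size() elements.
    std::copy(src.begin(), src.end(), dst.begin());
    assert(dst.size() == 5);
    assert(dst[0] == 7 && dst[2] == 7 && dst[3] == 0);

    // Append two new values through a back_inserter.
    std::fill_n(std::back_inserter(dst), 2, 1);
    assert(dst.size() == 7);
    assert(dst[5] == 1 && dst[6] == 1);
}
```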

Another Way to `split`

Another function that we can write more directly using the standard algorithms is split. The hard part of writing that function was dealing with the indices that delimited each word in the input line. We can replace the indices by iterators and use standard-library algorithms to do much of the work for us:

// true if the argument is whitespace, false otherwise
bool space(char c)
{
    return isspace(c);
}

// false if the argument is whitespace, true otherwise
bool not_space(char c)
{
    return !isspace(c);
}

vector<string> split(const string& str)
{
    typedef string::const_iterator iter;
    vector<string> ret;

    iter i = str.begin();
    while (i != str.end()) {

        // ignore leading blanks
        i = find_if(i, str.end(), not_space);

        // find end of next word
        iter j = find_if(i, str.end(), space);

        // copy the characters in [i, j)
        if (i != str.end())
            ret.push_back(string(i, j));
        i = j;
    }
    return ret;
}

This code uses a lot of new functions, so it will take a bit of explanation. The key idea to keep in mind is that it implements the same algorithm as the original, using i and j to delimit each word in str. Once we've found a word, we copy it from str and push the copy onto the back of ret.

This time, i and j are iterators, not indices. We use typedef to abbreviate the iterator type so that we can use iter instead of the longer string::const_iterator. Although the string type does not support all of the container operations, it does support iterators. Therefore, we can use the standard-library algorithms on the characters of a string, just as we can use them on the elements of a vector.

The algorithm that we use in this example is find_if. Its first two arguments are iterators that denote a sequence; the third is a predicate, which tests its argument and returns true or false. The find_if function calls the predicate on each element in the sequence, stopping when it finds an element for which the predicate yields true.
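Here is a small sketch of find_if on its own, showing both a successful search and the return value when no element satisfies the predicate. The predicate is_digit_char and the sample strings are hypothetical names and data of ours.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Illustrative predicate: true for the characters '0' through '9'.
bool is_digit_char(char c)
{
    return c >= '0' && c <= '9';
}

void check_find_if()
{
    const std::string s = "abc42";
    std::string::const_iterator it =
        std::find_if(s.begin(), s.end(), is_digit_char);
    assert(it != s.end());
    assert(*it == '4');
    assert(it - s.begin() == 3);

    // No digit present: find_if returns its second argument.
    const std::string t = "letters";
    assert(std::find_if(t.begin(), t.end(), is_digit_char) == t.end());
}
```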

The standard library provides an isspace function to test whether a character is a space. However, that function is overloaded so that it will work with languages such as Japanese that use other character types, such as wchar_t (§1.3/14). It's not easy to pass an overloaded function directly as an argument to a template function. The trouble is that the compiler doesn't know which version of the overloaded function we mean because we haven't supplied any arguments that the compiler might use to select a version. Accordingly, we'll write our own predicates, called space and not_space, that make clear which version of isspace we intend.

The first call to find_if seeks the first nonspace character that begins a word. Remember that one or more spaces might begin a line or might separate adjacent words in the input. We don't want to include these spaces in the output.

After the first call to find_if, i will denote the first nonspace, if any, in str. We use i in the next call to find_if, which looks for the first space in [i, str.end()). If find_if fails to find a value that satisfies the predicate, it returns its second argument, which, in this case, is str.end(). Therefore, j will be initialized to denote the blank that separates the next word in str from the rest of the line, or, if we are on the last word in the line, j will be equal to str.end().

At this point, i and j delimit a word in str. All that's left is to use these iterators to copy the data from str into ret. In the earlier version of split, we used string::substr to create the copy. However, that version of split operated on indices, not iterators, and there isn't a version of substr that operates on iterators. Instead, we construct a new string directly from the iterators that we have. We do so by using an expression, string(i, j), that is somewhat similar to the definition of spaces. Our present example constructs a string that is a copy of the characters in the range [i, j). We push this new string onto the back of ret.

It is worth pointing out that this version of the program omits the tests of the index i against str.size(). Nor are there the obvious equivalent tests of the iterator against str.end(). The reason is that the library algorithms are written to handle gracefully calls that pass an empty range. For example, at some point the first call to find_if will set i to the value returned by str.end(), but there is no need to check i before passing it to the second call to find_if. The reason is that find_if will look in the empty range [i, str.end()) and will return str.end() to indicate that there is no match.
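To see the empty-range behavior in practice, the sketch below repeats split in self-contained form and exercises it on ordinary input, an empty string, and a string of blanks. The function check_split and the test strings are ours. We have also added a cast to unsigned char before calling isspace, a defensive step that the text's version omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

bool space(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

bool not_space(char c)
{
    return !space(c);
}

std::vector<std::string> split(const std::string& str)
{
    typedef std::string::const_iterator iter;
    std::vector<std::string> ret;

    iter i = str.begin();
    while (i != str.end()) {
        i = std::find_if(i, str.end(), not_space);     // skip blanks
        iter j = std::find_if(i, str.end(), space);    // end of word
        if (i != str.end())
            ret.push_back(std::string(i, j));
        i = j;
    }
    return ret;
}

void check_split()
{
    std::vector<std::string> words = split("  the quick   fox ");
    assert(words.size() == 3);
    assert(words[0] == "the" && words[1] == "quick" && words[2] == "fox");

    // Empty and all-blank inputs yield no words and need no extra tests.
    assert(split("").empty());
    assert(split("   ").empty());
}
```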

Palindromes

Another character-manipulation problem that we can use the library to solve succinctly is determining whether a word is a palindrome. Palindromes are words that are spelled the same way front to back and back to front. For example, civic, eye, level, madam, and rotor are all palindromes.

Here is a compact solution that uses the library:

bool is_palindrome(const string& s)
{
    return equal(s.begin(), s.end(), s.rbegin());
}

The return statement in this function's body calls the equal function and the rbegin member function, neither of which we have seen yet.

Like begin, rbegin returns an iterator, but this time it is an iterator that starts with the last element in the container and marches backward through the container.

The equal function compares two sequences to determine whether they contain equal values. As usual, the first two iterators passed to equal specify the first sequence. The third argument is the starting point for the second sequence. The equal function assumes that the second sequence is the same size as the first, so it does not need an ending iterator. Because we pass s.rbegin() as the starting point for the second sequence, the effect of this call is to compare values from the back of s to values in the front. The equal function will compare the first character in s with the last. Then it will compare the second to the next-to-last, and so on. This behavior is precisely what we want.
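The sketch below exercises is_palindrome on a few inputs. It also shows a variation of our own, is_palindrome_half, which compares only the first half of the string against the reversed sequence; this is an assumption-labeled alternative, not something the text proposes. Note that the comparison is case-sensitive.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

bool is_palindrome(const std::string& s)
{
    return std::equal(s.begin(), s.end(), s.rbegin());
}

// Our variation: comparing the first half against the reversed
// sequence is enough, because the second half repeats those tests.
bool is_palindrome_half(const std::string& s)
{
    return std::equal(s.begin(), s.begin() + s.size() / 2, s.rbegin());
}

void check_palindromes()
{
    assert(is_palindrome("level"));
    assert(is_palindrome("rotor"));
    assert(!is_palindrome("levels"));
    assert(!is_palindrome("Madam"));   // 'M' and 'm' differ
    assert(is_palindrome(""));         // empty range compares equal

    assert(is_palindrome_half("rotor"));
    assert(!is_palindrome_half("ab"));
    assert(is_palindrome_half(""));
}
```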

Finding URLs

As the last of our examples of character manipulation, let's write a function that finds Web addresses, called uniform resource locators (URLs), that are embedded in a string. We might use such a function by creating a single string that holds the entire contents of a document. The function would then scan the document and find all the URLs in it.

A URL is a sequence of characters of the form:

protocol-name://resource-name

where protocol-name contains only letters, and resource-name may consist of letters, digits, and certain punctuation characters. Our function will take a string argument and will look for instances of :// in that string. Each time we find such an instance, we'll look for the protocol-name that precedes it and the resource-name that follows it.

Because we want our function to find all the URLs in its input, we'll want it to return a vector<string>, with one element for each URL. The function executes by moving the iterator b through the string, looking for the characters :// that might be a part of a URL. If we find these characters, it looks backward to find the protocol-name, and it looks forward to find the resource-name:

vector<string> find_urls(const string& s)
{
    vector<string> ret;
    typedef string::const_iterator iter;
    iter b = s.begin(), e = s.end();

    // look through the entire input
    while (b != e) {

        // look for one or more letters followed by ://
        b = url_beg(b, e);

        // if we found it
        if (b != e) {
            // get the rest of the URL
            iter after = url_end(b, e);

            // remember the URL
            ret.push_back(string(b, after));

            // advance b and check for more URLs on this line
            b = after;
        }
    }
    return ret;
}

We start by declaring ret, which is the vector into which we will put the URLs as we find them, and by obtaining iterators that delimit the string. We will have to write the url_beg and url_end functions, which will find the beginning and end of any URL in the input. The url_beg function will be responsible for identifying whether a valid URL is present and, if so, for returning an iterator that refers to the first character of the protocol-name. If it does not identify a URL in the input, then it will return its second argument (e, in this case), to indicate failure.

If url_beg finds a URL, the next task is to find the end of the URL by calling url_end. That function will search from the given position until it reaches either the end of the input or a character that cannot be part of a URL. It will return an iterator positioned one past the last character in the URL.

Thus, after the calls to url_beg and url_end, the iterator b denotes the beginning of a URL, and the iterator after denotes the position one past the last character in the URL:

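[Figure: the input "text http://www.acceleratedcpp.com more text", with b = url_beg(b, e) denoting the h of http, after = url_end(b, e) denoting the position one past the end of the URL, and e denoting the end of the input.]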

We construct a new string from the characters in this range and push that string onto the back of ret.

All that remains is to increment the value of b and to look for the next URL. Because URLs cannot overlap one another, we set b to (one past) the end of the URL that we just found and continue the while loop until we've looked at all the input. Once that loop exits, we return the vector that contains the URLs to our caller.

Now we have to think about url_beg and url_end. The url_end function is simpler, so we'll start there:

string::const_iterator
url_end(string::const_iterator b, string::const_iterator e)
{
    return find_if(b, e, not_url_char);
}

This function just forwards its work to the library find_if function. The predicate that we pass to find_if is one that we will write, named not_url_char. It will return true when passed a character that cannot be in a URL:

bool not_url_char(char c)
{
    // characters, in addition to alphanumerics, that can appear in a URL
    static const string url_ch = "~;/?:@=&$-_.+!*'(),";

    // see whether c can appear in a URL and return the negative
    return !(isalnum(c) ||
             find(url_ch.begin(), url_ch.end(), c) != url_ch.end());
}

Despite being small, this function uses a fair bit of new material. First is the use of the static storage class specifier. Local variables that are declared to be static are preserved across invocations of the function. Thus, we will construct and initialize the string url_ch only on the first call to not_url_char. Subsequent calls will use the object that the first call constructed. Because url_ch is a const string, its value will not change once we have initialized it.
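One way to observe that a static local is initialized only once is to count how often its initializer runs. The sketch below does that with hypothetical helpers of our own (init_count, make_alphabet, in_alphabet); it illustrates the language rule and is not part of the URL-finding program.

```cpp
#include <cassert>
#include <string>

// Illustrative counter of how many times make_alphabet runs.
int& init_count()
{
    static int n = 0;
    return n;
}

std::string make_alphabet()
{
    ++init_count();
    return "abc";
}

// The static local is constructed on the first call only;
// later calls reuse the same object.
bool in_alphabet(char c)
{
    static const std::string alphabet = make_alphabet();
    return alphabet.find(c) != std::string::npos;
}

void check_static_local()
{
    assert(in_alphabet('a'));
    assert(!in_alphabet('z'));
    assert(in_alphabet('c'));
    assert(init_count() == 1);   // three calls, one initialization
}
```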

The not_url_char function also uses the isalnum function, which the <cctype> header defines. This function tests whether its argument is an alphanumeric character (a letter or a digit).

Finally, find is another algorithm that we haven't used yet. It is similar to find_if, except that, instead of calling a predicate, it looks for the specific value given as its third argument. As with find_if, if the value that we want is present, the function returns an iterator denoting the first occurrence of the value in the given sequence. If the value is not found, then find returns its second argument.

With this information in hand, we can now understand the not_url_char function. Because we negate the value of the entire expression before we return it, not_url_char will yield false if c is a letter, a digit, or any of the characters in url_ch. If c is any other value, the function returns true.
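A few spot checks make the negation easier to follow. This sketch repeats not_url_char in self-contained form and tests it on characters from each category. The function check_not_url_char is ours, and we have added a cast to unsigned char before calling isalnum, a defensive step the text's version omits.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

bool not_url_char(char c)
{
    // characters, in addition to alphanumerics, that can appear in a URL
    static const std::string url_ch = "~;/?:@=&$-_.+!*'(),";

    return !(std::isalnum(static_cast<unsigned char>(c)) ||
             std::find(url_ch.begin(), url_ch.end(), c) != url_ch.end());
}

void check_not_url_char()
{
    assert(!not_url_char('a'));   // letter
    assert(!not_url_char('7'));   // digit
    assert(!not_url_char('/'));   // listed in url_ch
    assert(not_url_char(' '));    // a space ends a URL
    assert(not_url_char('<'));    // not listed in url_ch
    assert(not_url_char('"'));    // not listed in url_ch
}
```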

Now the hard part begins: implementing url_beg. This function is messy because it must deal with the possibility that the input might contain :// in a context that cannot be a valid URL. In practice, we'd probably have a list of acceptable protocol-names and look only for those. For simplicity, though, we'll limit ourselves to being sure that one or more letters precede the :// separator and at least one character follows it:

string::const_iterator
url_beg(string::const_iterator b, string::const_iterator e)
{
    static const string sep = "://";

    typedef string::const_iterator iter;

    // i marks where the separator was found
    iter i = b;

    while ((i = search(i, e, sep.begin(), sep.end())) != e) {

        // make sure the separator isn't at the beginning or end of the line
        if (i != b && i + sep.size() != e) {

            // beg marks the beginning of the protocol-name
            iter beg = i;
            while (beg != b && isalpha(beg[-1]))
                --beg;

            // is there at least one appropriate character before and after the separator?
            if (beg != i && !not_url_char(i[sep.size()]))
                return beg;
        }

        // the separator we found wasn't part of a URL; advance i past this separator
        i += sep.size();
    }
    return e;
}

The easy part is writing the function header. We know that we'll be passed two iterators denoting the range in which to look and that we'll return an iterator that denotes the beginning of the first URL in that range, if one exists. We also declare and initialize a local string, which will hold the characters that make up the separator that identifies a potential URL. Like url_ch in the not_url_char function, this string is static and const. Thus, we will not be able to change the string, and its value will be created only on the first invocation of url_beg.

The function executes by placing two iterators into the string delimited by b and e:

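[Figure: the input "text http://www.acceleratedcpp.com more text", delimited by b and e, with beg denoting the h of http and i denoting the : that begins the :// separator.]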

The iterator i will denote the beginning of the URL separator, if any, and beg will indicate the beginning of the protocol-name, if any.

The function first looks for the separator by calling search, a library function that we haven't used before. This function takes two pairs of iterators: The first pair denotes the sequence in which we are looking, and the second pair denotes the sequence that we want to locate. As with other library functions, if search fails, it returns the second iterator. Therefore, after the call to search, either i denotes (one past) the end of the input string, or it denotes a : that is followed by //.
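Here is a small sketch of search in isolation, covering both the successful case and the failing one. The function name check_search and the sample strings are ours.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

void check_search()
{
    const std::string sep = "://";

    // Found: the result denotes the ':' that begins the separator.
    const std::string text = "see http://x";
    std::string::const_iterator i =
        std::search(text.begin(), text.end(), sep.begin(), sep.end());
    assert(i != text.end());
    assert(*i == ':');
    assert(i - text.begin() == 8);

    // Not found: search returns the end of the sequence searched.
    const std::string plain = "no separator: here";
    assert(std::search(plain.begin(), plain.end(),
                       sep.begin(), sep.end()) == plain.end());
}
```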

If we found a separator, the next task is to get the letters (if any) that make up the protocol-name. We first check whether the separator is at the beginning or the end of the input. If the separator is in either of those places, we know that we don't have a URL because a URL has at least one character on each side of its separator. Otherwise, we need to try to position the iterator beg. The inner while loop moves beg backward through the input until it hits either a nonalphabetic character or the beginning of the string. It uses two new ideas: The first is the notion that if a container supports indexing, so do its iterators. In other words, beg[-1] is the character at the position immediately before the one that beg denotes. We can think of beg[-1] as an abbreviation for *(beg - 1). The second new idea is the isalpha function, defined in <cctype>, which tests whether its argument is a letter.
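The backward scan can be tried on its own. The sketch below extracts the inner loop into a helper that we have named back_over_letters (a hypothetical name, not part of the text's program) and confirms that beg[-1] and *(beg - 1) denote the same character. The cast to unsigned char before isalpha is our defensive addition.

```cpp
#include <cassert>
#include <cctype>
#include <string>

typedef std::string::const_iterator iter;

// Move backward from i, but not before b, while the preceding
// character is a letter; return the resulting position.
iter back_over_letters(iter b, iter i)
{
    iter beg = i;
    while (beg != b && std::isalpha(static_cast<unsigned char>(beg[-1])))
        --beg;
    return beg;
}

void check_back_over_letters()
{
    const std::string s = "go to http://x";
    iter i = s.begin() + 10;              // the ':' of "://"
    assert(*i == ':');
    assert(i[-1] == *(i - 1));            // indexing is shorthand
    assert(i[-1] == 'p');

    iter beg = back_over_letters(s.begin(), i);
    assert(std::string(beg, i) == "http");

    // Letters all the way to the start: the scan stops at b.
    const std::string t = "http://";
    assert(back_over_letters(t.begin(), t.begin() + 4) == t.begin());
}
```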

If we were able to advance the iterator over as much as a single character, we can assume that we've found a protocol-name. Before returning beg, we still have to check that there's at least one valid character following the separator. This test is more complicated. We know that there is at least one more character in the input because we're inside the body of an if that compares the value of i + sep.size() with e. We can access the first such character as i[sep.size()], which is an abbreviation for *(i + sep.size()). We test whether that character can appear in a URL by passing the character to not_url_char. This function returns true if the character is not valid, so we negate the return to check whether the character is valid.

If the separator is not part of a URL, then the function advances i past the separator and keeps looking.

This code uses the decrement operator, which we have not previously used. It works like the increment operator, but it decrements its operand instead. As with the increment operator, it comes in prefix and postfix versions. The prefix version, which we use here, decrements its operand and returns the new value.
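Putting the pieces together, the following self-contained sketch assembles not_url_char, url_end, url_beg, and find_urls and runs them on sample input. The test strings (including the example.org address) and check_find_urls are ours, and we cast to unsigned char before calling isalnum and isalpha as a defensive step; otherwise the logic follows the functions shown in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

typedef std::string::const_iterator iter;

bool not_url_char(char c)
{
    static const std::string url_ch = "~;/?:@=&$-_.+!*'(),";
    return !(std::isalnum(static_cast<unsigned char>(c)) ||
             std::find(url_ch.begin(), url_ch.end(), c) != url_ch.end());
}

iter url_end(iter b, iter e)
{
    return std::find_if(b, e, not_url_char);
}

iter url_beg(iter b, iter e)
{
    static const std::string sep = "://";
    iter i = b;
    while ((i = std::search(i, e, sep.begin(), sep.end())) != e) {
        if (i != b && i + sep.size() != e) {
            iter beg = i;
            while (beg != b &&
                   std::isalpha(static_cast<unsigned char>(beg[-1])))
                --beg;
            if (beg != i && !not_url_char(i[sep.size()]))
                return beg;
        }
        i += sep.size();   // not a URL; keep looking
    }
    return e;
}

std::vector<std::string> find_urls(const std::string& s)
{
    std::vector<std::string> ret;
    iter b = s.begin(), e = s.end();
    while (b != e) {
        b = url_beg(b, e);
        if (b != e) {
            iter after = url_end(b, e);
            ret.push_back(std::string(b, after));
            b = after;
        }
    }
    return ret;
}

void check_find_urls()
{
    std::vector<std::string> urls = find_urls(
        "text http://www.acceleratedcpp.com more text ftp://example.org/x end");
    assert(urls.size() == 2);
    assert(urls[0] == "http://www.acceleratedcpp.com");
    assert(urls[1] == "ftp://example.org/x");

    assert(find_urls("no separator here").empty());
    assert(find_urls("://nothing-before").empty());   // separator at start
}
```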
