This chapter is from the book 

20.7. String Tokenizing

Although the functionality in the preceding sections can perform nearly any form of pattern matching, C++11 also provides string-tokenizing functionality that is a superior alternative to the C-library strtok function. Tokenization is the process of breaking a string into a series of individual words, or tokens.

To take advantage of this feature, use the following syntax, in which str represents a string object containing the target string:

sregex_token_iterator  iter_name(str.begin(), str.end(), regex_obj, -1);
sregex_token_iterator  end_iter_name;

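As with sregex_iterator, sregex_token_iterator is a specialization of an underlying template, regex_token_iterator, for iterators over the string class; you can use the template with other kinds of strings, such as C strings (cregex_token_iterator) or wide strings (wsregex_token_iterator).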
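As a minimal sketch of using the underlying template with another kind of string, the following function tokenizes a C string through cregex_token_iterator, the standard alias for regex_token_iterator<const char*>. The function name split_c_string and the choice of delimiters (spaces and/or commas) are assumptions made for illustration, not part of the book's text.

```cpp
#include <cstring>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): tokenizes a C string directly,
// without first copying it into a std::string.
std::vector<std::string> split_c_string(const char* text) {
    // Assumed delimiters: any run of whitespace and/or commas.
    static const std::regex delims("[\\s,]+");
    const char* last = text + std::strlen(text);

    // cregex_token_iterator iterates over a [const char*, const char*) range.
    std::cregex_token_iterator it(text, last, delims, -1);
    std::cregex_token_iterator end;   // default-constructed: end of sequence

    std::vector<std::string> tokens;
    for (; it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Calling split_c_string("one, two,three") would be expected to return the three strings one, two, and three.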

sregex_token_iterator performs a range of operations, most of which are similar to what sregex_iterator does, as described in Section 20.5, "'Find All,' or Iterative Searches." Specifying -1 as the fourth argument makes the iterator skip over any text matching regex_obj, so that it steps through the tokens instead; these consist of the text between each occurrence of the pattern.
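To make the role of the fourth argument more concrete, the sketch below collects whatever the iterator yields for a given value of that argument. With -1 it yields the text between matches; with 0 it yields the matches themselves (submatch 0 is the entire match). The helper name collect is hypothetical and exists only for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): gathers every item the token
// iterator produces for the given fourth argument.
// The regex is taken by reference because the iterator refers to it
// rather than copying it; the caller's object must outlive the loop.
std::vector<std::string> collect(const std::string& s,
                                 const std::regex& re,
                                 int submatch) {
    std::sregex_token_iterator it(s.begin(), s.end(), re, submatch);
    std::sregex_token_iterator end;

    std::vector<std::string> out;
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Given a regex object built from "[\\s,]+" and the string "a, b", collect with -1 would be expected to return a and b, whereas collect with 0 would return the single delimiter run ", ".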

For example, the following statements find each word, in which words are delimited by any series of spaces and/or commas.

#include <iostream>
#include <regex>
#include <string>
using std::regex;
using std::string;
using std::sregex_token_iterator;
. . .
// Delimiters are spaces (\s) and/or commas
regex re("[\\s,]+");
string s = "The White Rabbit,  is very,late.";
sregex_token_iterator it(s.begin(), s.end(), re, -1);
sregex_token_iterator reg_end;
for (; it != reg_end; ++it) {
     std::cout << it->str() << std::endl;
}

When executed, these statements print the following; the spaces and commas serve only as delimiters and do not appear in the output:

The
White
Rabbit
is
very
late.
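One detail worth knowing when reusing this pattern: if the target string begins with a delimiter, the first token produced is an empty string, because nothing precedes the first match. The following sketch wraps the loop in a function that gathers tokens into a vector and skips empty ones. The function name tokenize and the decision to drop empty tokens are assumptions for illustration, not something the example above requires.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (illustration only): splits s on runs of
// whitespace and/or commas and returns the non-empty tokens.
std::vector<std::string> tokenize(const std::string& s) {
    static const std::regex delims("[\\s,]+");

    std::sregex_token_iterator it(s.begin(), s.end(), delims, -1);
    std::sregex_token_iterator end;

    std::vector<std::string> tokens;
    for (; it != end; ++it) {
        // A leading delimiter yields an empty first token; skip it.
        if (it->length() > 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

Applied to the string from the example, tokenize should return the same six tokens shown above; applied to a string such as "  ,leading", it should return only leading.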
