20.2. Dealing with Escape Sequences (\)
Escape sequences are a little tricky in C++ regular expressions, because they occur in two contexts.
- C++ assigns special meaning to the backslash within a string literal and requires it to be escaped to be read as an actual backslash: To represent a single backslash, it’s necessary to place double backslashes (\\) in the source code. (Exception: Raw literals, supported by C++11, remove the need to escape characters.)
- The regular-expression interpreter also recognizes a backslash as the escape character. To render a special character literally, you must precede it with a backslash (\).
Consequently, if you want to render a special character literally, then, within a C++ literal string, you must precede the character with two backslashes, not just one.
For example, suppose you want to specify a pattern that matches an actual plus sign (+). The pattern is specified in source code this way:
std::regex reg("\\+");
When the C++ compiler reads the literal, “\\+”, it interprets \\ as an escape sequence that represents a single backslash. The actual string data that gets stored in memory is therefore:
\+
This is the string read by the regular-expression interpreter. It interprets “\+” as an actual plus sign (+).
Consider the following regular-expression pattern:
std::regex reg("\\++");
Notice what’s going on here: The first three characters (\\+) represent a literal plus sign (+). The fourth character (+) has its usual—and special—meaning; this second plus sign modifies the overall pattern to mean, “Match one or more copies of the preceding expression.” The string as a whole therefore matches any of the following:
+ ++ +++++
How do you represent a literal backslash, should you ever need to do that? That is, what is the regular-expression pattern that matches a target string consisting of one or more backslashes? The answer is that you need four backslashes.
using std::regex;
regex reg("\\\\+");   // Matches one or more backslashes.
This regular-expression object, reg, would match any of the following:
char str1[] = "\\";       // Represents "\".
char str2[] = "\\\\";     // Represents "\\".
char str3[] = "\\\\\\";   // Represents "\\\".
Note that if you use raw-string literals, supported by the C++11 specification, you don’t have to deal with C++ literal-string escape conventions. A raw literal places its text between R"( and )", so this example would be coded as:
regex reg(R"(\\+)");        // Matches one or more backslashes.
char str1[] = R"(\)";       // Represents "\".
char str2[] = R"(\\)";      // Represents "\\".
char str3[] = R"(\\\)";     // Represents "\\\".
The use of R does not change the type of the string literal (it is still an array of const char, which converts to const char*); it merely changes how the text inside the literal is interpreted.