Escape Characters and Other Pitfalls
If you already understand the basics of using regular expressions and want to start learning about search-and-replace functions in the new C++ library, you can skip this section. For everyone else, here's a little bit more review.
The C++ regular-expression library uses the backslash (\) as an escape character, which answers the question, "What if I want to match an actual parenthesis or bracket symbol, an actual asterisk (*), or an actual plus sign (+)?" For example, this expression matches the set of strings I listed earlier (ct, cat, caat, and so on):
ca*t
By contrast, the following expression matches only the literal string ca*t:
ca\*t
That's because the backslash, as an escape character, says "Ignore the special meaning of the next character, and treat it literally."
But a problem occurs when we start writing actual C++ code, because the C++ language also uses the backslash as an escape character. Consider what happens if you specify ca\*t as a regular-expression object:
#include <regex>
. . .
regex reg("ca\*t");
The C++ compiler processes the string literal before the regex library ever sees it. Because \* is not a standard C++ escape sequence, compilers typically warn about it and drop the backslash, so the library receives ca*t rather than the pattern you intended. Instead, you have to code it this way:
regex reg("ca\\*t");
C++ interprets two backslashes (\\) as a single backslash, so the following string is actually what gets passed to the regex functions:
ca\*t
That will do what we want.
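To see the difference in practice, here is a minimal sketch that uses std::regex_match, which succeeds only when the entire string matches the pattern. The names matches_whole and check_escaped_asterisk are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper for this illustration: returns true if the
// entire text matches the pattern (regex_match requires a full match).
bool matches_whole(const std::string& pattern, const std::string& text) {
    const std::regex reg(pattern);
    return std::regex_match(text, reg);
}

// Exercises the two patterns discussed above.
void check_escaped_asterisk() {
    // "ca*t" reaches the regex library as ca*t:
    // a c, zero or more a characters, then a t.
    assert(matches_whole("ca*t", "ct"));
    assert(matches_whole("ca*t", "caaat"));
    assert(!matches_whole("ca*t", "ca*t"));

    // "ca\\*t" reaches the regex library as ca\*t:
    // the asterisk is now a literal character.
    assert(matches_whole("ca\\*t", "ca*t"));
    assert(!matches_whole("ca\\*t", "cat"));
}
```

Calling check_escaped_asterisk() from main should complete silently if the escaping behaves as described; a failed assertion would abort the program.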
The following table gives examples of how this technique works. Notice that matching an actual backslash in the text to be searched requires four backslashes, not one or two!
C++ Source Code | Resulting String | Regex Action
ca*t | ca*t | Match ct with any number of a characters in between.
ca\\*t | ca\*t | Match ca*t exactly.
ca\\\\t | ca\\t | Match ca\t exactly.
\\(x\\) | \(x\) | Match (x) precisely.
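The backslash and parenthesis rows of the table can be checked with a short sketch like the following. The names full_match and check_table_rows are hypothetical, introduced only for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper for this illustration: builds a regex from the
// pattern and reports whether it matches the entire text.
bool full_match(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}

void check_table_rows() {
    // Four backslashes in source code become two in the string,
    // so the pattern holds five characters: c a \ \ t.
    const std::string pattern = "ca\\\\t";
    assert(pattern.size() == 5);

    // The regex library reads those two backslashes as one literal
    // backslash. The text "ca\\t" is four characters: c a \ t.
    assert(full_match(pattern, "ca\\t"));

    // "ca\t" in source code contains a tab, not a backslash.
    assert(!full_match(pattern, "ca\t"));

    // Escaped parentheses match literal parentheses.
    assert(full_match("\\(x\\)", "(x)"));
    assert(!full_match("\\(x\\)", "x"));

    // Unescaped parentheses form a group, so (x) matches just x.
    assert(full_match("(x)", "x"));
}
```

Note that the text being searched goes through the same C++ string-literal processing as the pattern, which is why the literal for a text containing one backslash is written with two.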
Remember that when you want a special character to lose its meaning and be interpreted literally, you have to use a backslash, which means using two backslashes (\\) in C++ source code. Parentheses are one such case: they normally form groups, so matching a literal (x) requires \\(x\\).
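As an alternative that this section does not cover, and assuming a compiler that supports C++11 raw string literals, the doubling can be avoided by writing the pattern as R"(...)". The sketch below compares the two spellings; the function name check_raw_literals is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical demonstration: an escaped literal and a raw literal
// that produce the same pattern string.
void check_raw_literals() {
    // In a raw string literal, backslashes are not escape characters,
    // so the text between R"( and )" is passed through unchanged.
    const std::string escaped = "\\(x\\)";
    const std::string raw = R"(\(x\))";
    assert(escaped == raw);

    // Either spelling gives the regex library \(x\),
    // which matches the literal text (x).
    const std::regex reg(raw);
    assert(std::regex_match("(x)", reg));
    assert(!std::regex_match("x", reg));
}
```

The regex library still needs its own backslash to escape special characters; the raw literal only removes the extra layer added by C++ string-literal processing.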