Sequences
As ever, the simplest construct is a sequence and the simplest regular expression is just a sequence of characters:
red
This will match, or find, any occurrence of the three letters r, e, and d, in that order in a string.
Thus the words red, lettered, and credible would all be picked up because they contain "red" within them.
Anchors
To provide greater control over the outcome of matches, we can supply some special characters (known as metacharacters to limit the scope of the search (see Table 1):
Table 1
Metacharacters Used in Sequences
Example |
Meaning |
^red |
Only at the start of a line |
red$ |
Only at the end of a line |
/Wred |
Only at the start of a word |
red/W |
Only at the end of a word |
These metacharacters are often called anchors because they fix, or anchor, the position of the re within a sentence or word. There are several other anchors defined in the re module documentation, which we don't cover in this article.
Wildcards
Sequences can also contain wildcard characters that can substitute for any character. The most general wildcard character is a period. Try this in Python:
>>> import re >>> re.match('be.t', 'best') <re.MatchObject instance at 864460> >>> re.match('be.t', 'bess')
The message in angle brackets tells us that the regular expression 'be.t', passed as the first argument, matches the string 'best' passed as the second argument. 'be.t' will also match 'beat', 'bent', 'belt', and so on. The second example did not match because 'bess' didn't end in t, so no MatchObject was created. Try out a few more matches to see how this works.
Ranges or Sets
The next type of wildcard is a range or set. This consists of a collection of letters enclosed in square brackets; the regular expression will search for any one of the enclosed letters.
>>> re.match('s[pwl]am', 'spam') <re.MatchObject instance at 7cab40>
By using a "^" sign as the first element of the group, we can say that it should look for any character except those listed. Thus, in this example
>>> re.match('[^f]ool', 'cool') <re.MatchObject instance at 864890> >>> re.match('[^f]ool','fool')
we can match 'cool' and 'pool', but we will not match 'fool'.
Finally, we can group sequences of characters, or other units, together by enclosing them in parentheses. Although this is not particularly useful in isolation, it is useful when combined with the repetition and conditional capabilities.