1.6 Regular Expressions
Perl is rich with capabilities. Probably its two most important features are hashes and regular expressions, each of which is described further in a separate chapter. In this section, we introduce regular expressions and give a few simple examples. A regular expression is a sequence of characters that expresses a pattern. The pattern is then used to determine whether a string matches it. Once a pattern has matched, the matching text may be stored, printed, or replaced, or some other action may be taken as a result of the match.
1.6.1 Pattern matching
Programs repeatedly need to verify that some input is an integer or that it is composed only of alphabetic characters. Many other well-known patterns, such as e-mail addresses and phone numbers, call for the same kind of check.
Pattern matching may be used to verify input. Although there are many issues with regard to regular expressions and pattern matching, our intent is to show a few simple examples and postpone the details until later.
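As a preview of input verification, here is a minimal sketch of the two checks mentioned above. The helper names is_integer and is_alphabetic and the sample inputs are hypothetical, and the patterns rely on character classes ([...]) and the + quantifier, which are among the details deferred until later.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
# [0-9]+ means "one or more digits"; [A-Za-z]+ means "one or more letters".
# ^ and $ force the whole string, not just part of it, to fit the pattern.
sub is_integer    { my ($s) = @_; return $s =~ /^[+-]?[0-9]+$/ ? 1 : 0; }
sub is_alphabetic { my ($s) = @_; return $s =~ /^[A-Za-z]+$/   ? 1 : 0; }

# Sample inputs (assumed).
for my $input ("42", "-7", "4x2", "hello", "hello world", "") {
    printf "%-14s integer=%d alphabetic=%d\n",
        "\"$input\"", is_integer($input), is_alphabetic($input);
}
```

Each helper returns 1 or 0, so the results can be printed directly or used in a condition.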
In the simplest case, a regular expression is enclosed in forward slashes and is matched against $_, the default variable into which while(<STDIN>) reads each line. Thus, the following program prints all lines that match the pattern mike, that is, the lines that contain the consecutive characters "m," "i," "k," and "e":
while(<STDIN>) { print if ( /mike/ ); }
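Of course, this is different from the question "Does $line equal the string mike?" That question uses the string equality operator eq and, because a line read from STDIN keeps its trailing newline, it succeeds only if the newline has first been removed with chomp. It is coded as: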
if ($line eq "mike" ) { print $line; }
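The difference between "contains" and "equals" can be checked on a few in-memory strings. This is only a sketch: the helper names contains_mike and equals_mike and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
sub contains_mike { my ($s) = @_; return $s =~ /mike/  ? 1 : 0; }
sub equals_mike   { my ($s) = @_; return $s eq "mike"  ? 1 : 0; }

# Sample strings (assumed). The third mimics a line as read from STDIN,
# which still carries its trailing newline.
for my $s ("mike", "call mike now", "mike\n", "Mike") {
    (my $shown = $s) =~ s/\n/\\n/;   # make the newline visible when printing
    printf "%-16s contains=%d equals=%d\n",
        "\"$shown\"", contains_mike($s), equals_mike($s);
}
```

Only the first string both contains and equals mike. The string with the trailing newline contains mike but does not equal it, and "Mike" fails both tests because of the capital "M."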
There are many regular expression metacharacters, that is, characters that do not represent themselves but have special meanings. For example, the period means "any single character" (other than a newline, by default); therefore, the following program prints any line that contains an "r," followed by any one character, followed by a "t":
while(<STDIN>) { print if ( /r.t/ ); }
This would include lines containing "rat," "rot," "rut," and also "strut" and "rotation." It would not match "rt3" or "rt." In other words, the period must match a single character regardless of what it is. You can use ^ and $ to anchor the pattern to the beginning or end of a string, respectively. You may also match against any string, not just $_, but in doing so, you need to use the operator =~. The following example illustrates these points:
while($line = <STDIN>) { print $line if ($line =~ /^r.t/ ); }
The above example prints every line that begins with "r," followed by any single character, followed by "t"; the line need not end there. By default, regular expression matching is case-sensitive. To ignore case, place an "i" after the closing slash of the pattern.
while($line = <STDIN>) { print $line if ($line =~ /r.t$/i ); }
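The above code prints those lines that end with "r," "any character," "t," regardless of case. (The $ anchor also matches just before a trailing newline, which is why the pattern works on lines that have not been chomped.) Keep in mind that regular expressions apply to strings in general and not only to lines, even though the examples thus far have all dealt with lines.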
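The period, the two anchors, and the "i" modifier can be compared side by side on ordinary strings rather than input lines. The sketch below uses an assumed list of sample strings and Perl's built-in grep to collect the strings that each pattern matches.

```perl
use strict;
use warnings;

# Sample strings (assumed for illustration); none is a line read from a file.
my @samples = ("rat", "strut", "rotation", "rt3", "Rot", "carrot");

my @anywhere   = grep { /r.t/   } @samples;  # "r", any one character, "t" anywhere
my @at_start   = grep { /^r.t/  } @samples;  # the same, anchored at the beginning
my @at_end     = grep { /r.t$/  } @samples;  # the same, anchored at the end
my @at_end_any = grep { /r.t$/i } @samples;  # anchored at the end, ignoring case

print "anywhere:            @anywhere\n";    # rat strut rotation carrot
print "at start:            @at_start\n";    # rat rotation
print "at end:              @at_end\n";      # rat strut carrot
print "at end, ignore case: @at_end_any\n";  # rat strut Rot carrot
```

Inside the grep block each element is placed in $_, so the patterns need no =~ operator. "rt3" never matches because nothing stands between the "r" and the "t," and "Rot" matches only when case is ignored.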
1.6.2 Substitutions
Another reason for doing a pattern match is to make a substitution when there is a match. Perl uses the s operator for this action. The following code reads each line from the standard input and prints it on the standard output with Michael replacing Mike:
while(<STDIN>) { s/Mike/Michael/g; print; }
By default, the substitution operator operates on $_. The "g" makes the substitution global; if you omit it, only the first occurrence of the pattern on each line is replaced. To operate on a variable other than $_, bind it to the substitution with =~, as shown here:
while($line = <STDIN>) { $line =~ s/Mike/Michael/g; print $line; }
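The effect of the "g" can be seen by running both forms of the substitution on copies of the same string. This is a minimal sketch with an assumed sample string; it also shows that s returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $text = "Mike met Mike's brother; Mike waved.";   # sample string (assumed)

# Copy the string, then substitute in the copy so $text stays unchanged.
(my $first_only = $text) =~ s/Mike/Michael/;    # no "g": first occurrence only
(my $all        = $text) =~ s/Mike/Michael/g;   # "g": every occurrence

# The s operator returns the number of substitutions made
# (a false value if the pattern did not match at all).
my $copy  = $text;
my $count = ($copy =~ s/Mike/Michael/g);

print "$first_only\n";            # Michael met Mike's brother; Mike waved.
print "$all\n";                   # Michael met Michael's brother; Michael waved.
print "replacements: $count\n";   # replacements: 3
```

Because the return value is a count, a substitution can also serve as the condition of an if statement when a program needs to know whether anything was changed.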