8.2 Expression Modifiers and Simple Statements
A simple statement is an expression terminated with a semicolon. Perl supports a set of modifiers that allow you to further evaluate an expression based on some condition. A simple statement may contain an expression ending with a single modifier. The modifier and its expression are always terminated with a semicolon. When evaluating regular expressions, the modifiers may be simpler to use than the full-blown conditional constructs (discussed in Chapter 7, "If Only, Unconditionally, Forever").
The modifiers are
- if
- unless
- while
- until
- foreach
8.2.1 Conditional Modifiers
The if Modifier
The if modifier is used to control a simple statement consisting of two expressions, written in the form Expression2 if Expression1;. If Expression1 is true, Expression2 is executed.
Example 8.3
(In Script)
$_ = "xabcy\n";
print if /abc/;    # Could be written: print $_ if $_ =~ /abc/;
(Output)
xabcy
Explanation
- The $_ scalar variable is assigned the string xabcy.
- When the if modifier is followed directly by a regular expression, Perl assumes that the line being matched is $_, the default placeholder for pattern matching. The value of $_, xabcy, is printed if the regular expression abc is matched anywhere in the string. The expression could have been written as if $_ =~ /abc/. (The =~ match operator will be discussed at the end of this chapter.)
Example 8.4
(In Script)
$_ = "I lost my gloves in the clover.";
print "Found love in gloves!\n" if /love/;    # Long form: if $_ =~ /love/
(Output)
Found love in gloves!
Explanation
- The $_ is assigned the string I lost my gloves in the clover.
- The regular expression love is matched in the $_ variable, and the string Found love in gloves! is printed; otherwise, nothing will be printed. The regular expression love is found in both gloves and clover. The search starts at the left-hand side of the string, so that matching love in gloves will produce the true condition before clover is reached. If $_ (or, for that matter, any other scalar) is used explicitly after the if modifier, then the =~ pattern matching operator is necessary when evaluating the regular expression.
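The difference between the implicit and explicit forms can be sketched as follows. This is a minimal illustration, not part of the original examples; the names $sentence and @report are hypothetical and chosen only for this sketch.

```perl
use strict;
use warnings;

# $sentence and @report are hypothetical names used only in this sketch.
my $sentence = "I lost my gloves in the clover.";
my @report;

# Implicit form: with no =~ operator, the pattern is tested against $_.
$_ = $sentence;
push @report, "implicit match on \$_" if /love/;

# Explicit form: a named scalar must be bound to the pattern with =~.
push @report, "explicit match on \$sentence" if $sentence =~ /love/;

# When the pattern is not found, the statement is simply not executed.
push @report, "found mittens" if $sentence =~ /mittens/;

print "$_\n" foreach @report;
```

Only the first two messages are collected, because mittens does not occur in the string.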
8.2.2 The DATA Filehandle
In the following examples, the special filehandle called DATA is used as an expression in a while loop. This allows us to directly get the data from the same script that is testing it, rather than reading input from a separate text file. (You will learn all about filehandles in Chapter 10, "Getting a Handle on Files.") The data itself is located after the __DATA__ special literal at the bottom of each of the example scripts. The __DATA__ literal marks the logical end of the script and opens the DATA filehandle for reading. Each time a line of input is read from <DATA>, it is assigned by default to the special $_ scalar. Although $_ is implied, you could also use it explicitly, or even some other scalar. The format used is shown in the following examples.
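As a rough sketch of reading DATA into some other scalar, the loop below assigns each line to a named variable instead of relying on $_. The names $line and @found are hypothetical, and the sample names are borrowed from the examples in this section.

```perl
use strict;
use warnings;

my @found;    # hypothetical array that collects the matching lines

# Read each line into $line rather than into the default $_.
while (my $line = <DATA>) {
    chomp $line;
    # A named scalar must be bound to the pattern with =~.
    push @found, $line if $line =~ /Norma/;
}

print "$_\n" foreach @found;

__DATA__
Steve Blenheim
Betty Boop
Norma Cord
```

Because the line is no longer in $_, a bare /Norma/ would test the wrong variable, which is why the =~ operator appears here.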
Example 8.5
(The Script)
while(<DATA>){
    print if /Norma/;    # Print the line if it matches Norma
}
__DATA__
Steve Blenheim
Betty Boop
Igor Chevsky
Norma Cord
Jon DeLoach
Karen Evich
(Output)
Norma Cord
Explanation
- The special DATA filehandle gets its input from the text after the __DATA__ token. When the while loop is entered, a line of input is stored in the $_ scalar variable. The first line stored in $_ is Steve Blenheim. The next time around the loop, Betty Boop is stored in $_, and this continues until all of the lines following the __DATA__ token are read and processed.
- Only the lines containing the regular expression Norma are printed. $_ is the default for pattern matching; it could also have been written as print $_ if $_ =~ /Norma/;.
- The DATA filehandle gets its data from the lines that follow the __DATA__ token.
Example 8.6
(The Script)
while(<DATA>){
    if /Norma/ print;    # Wrong!
}
__DATA__
Steve Blenheim
Betty Boop
Igor Chevsky
Norma Cord
Jon DeLoach
Karen Evich
(Output)
Execution of script aborted due to compilation errors.
Explanation
- The special DATA filehandle gets its input from the text after the __DATA__ token. The while loop iterates through each line of text. Each line of input is assigned to $_, the default scalar used to hold a line of input and to test pattern matches.
- The modifier must be at the end of the expression, or a syntax error results. This statement should be print if /Norma/; or if(/Norma/) {print;}. (The corrected statement behaves much like the grep command for UNIX.)
The unless Modifier
The unless modifier is used to control a simple statement consisting of two expressions, written in the form Expression2 unless Expression1;. If Expression1 is false, Expression2 is executed. Like the if modifier, unless is placed at the end of the statement.
Example 8.8
(The Script)
while(<DATA>){
    print unless /Norma/;    # Print line if it doesn't match Norma
}
__DATA__
Steve Blenheim
Betty Boop
Igor Chevsky
Norma Cord
Jon DeLoach
Karen Evich
(Output)
Steve Blenheim
Betty Boop
Igor Chevsky
Jon DeLoach
Karen Evich
Explanation
- The special DATA filehandle gets its input from the text after the __DATA__ token. The while loop is entered and the first line below the __DATA__ token is read in and assigned to $_, and so on.
- All lines that don't contain the pattern Norma are printed. (Similar to the grep -v command for UNIX.)
- The DATA filehandle gets its data from the lines that follow the __DATA__ token.
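One way to see that unless is simply the negation of if is to run both forms over the same input and compare the results. The sketch below uses a plain list in place of the DATA filehandle; the names @names, @with_unless, and @with_if_not are hypothetical.

```perl
use strict;
use warnings;

# Sample names taken from the examples above; the array names are hypothetical.
my @names = ("Steve Blenheim", "Betty Boop", "Norma Cord", "Jon DeLoach");
my (@with_unless, @with_if_not);

foreach my $name (@names) {
    # Executed only when the match fails.
    push @with_unless, $name unless $name =~ /Norma/;

    # The same test written with if and an explicit negation.
    push @with_if_not, $name if not ($name =~ /Norma/);
}

print "$_\n" foreach @with_unless;
```

Both arrays end up holding the three names that do not contain Norma.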
8.2.3 Looping Modifiers
The while Modifier
The while modifier, written in the form Expression2 while Expression1;, repeatedly executes Expression2 as long as Expression1 is true.
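A minimal sketch of the while modifier, assuming a simple counter rather than a pattern match; the names $count and @seen are hypothetical.

```perl
use strict;
use warnings;

my $count = 1;    # hypothetical counter
my @seen;

# Statement first, condition last: the push repeats while $count <= 3.
# The condition is tested before each execution of the statement.
push @seen, $count++ while $count <= 3;

print "@seen\n";    # prints: 1 2 3
```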
The until Modifier
The until modifier, written in the form Expression2 until Expression1;, repeatedly executes Expression2 as long as Expression1 is false.
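The until modifier can be sketched the same way, again assuming a simple counter; the names $countdown and @ticks are hypothetical.

```perl
use strict;
use warnings;

my $countdown = 3;    # hypothetical counter
my @ticks;

# The push repeats as long as the test is false, that is,
# until $countdown reaches 0.
push @ticks, $countdown-- until $countdown == 0;

print "@ticks\n";    # prints: 3 2 1
```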
The foreach Modifier
The foreach modifier evaluates the statement once for each element in its list, with $_ aliased to each element of the list in turn.
Example 8.11
(The Script)
@alpha=('a' .. 'z', "\n");    # Quoted range endpoints avoid bareword errors under use strict
print foreach @alpha;
(Output)
abcdefghijklmnopqrstuvwxyz
Explanation
- A list of lowercase letters, followed by a newline, is assigned to the array @alpha.
- Each item in the list is aliased to $_ and printed, one at a time, until there are no more items in the list.
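Because $_ is an alias rather than a copy, assigning to it inside a foreach modifier changes the underlying array. The following sketch illustrates this under that assumption; the array name @words and its contents are hypothetical.

```perl
use strict;
use warnings;

my @words = ("glove", "clover", "lovely");    # hypothetical sample list

# $_ is an alias for each element, so assigning to it changes the array itself.
$_ = uc($_) foreach @words;

print "$_\n" foreach @words;
```

After the first statement runs, every element of @words has been uppercased in place.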