Effective Perl Programming: Files and Filehandles
- Item 51. Don't ignore the file test operators.
- Item 52. Always use the three-argument open.
- Item 53. Consider different ways of reading from a stream.
- Item 54. Open filehandles to and from strings.
- Item 55. Make flexible output.
- Item 56. Use File::Spec or Path::Class to work with paths.
- Item 57. Leave most of the data on disk to save memory.
It's easy to work with files in Perl. Its heritage includes some of the most powerful utilities for processing data, so it has the tools it needs to examine the files that contain those data and to easily read the data and write them again.
Perl's strength goes beyond mere files, though. You probably think of files as things on your disk with nice icons. However, Perl can apply its filehandle interface to almost anything, and that interface does most of the heavy lifting for you. You can also store filehandles in scalar variables and select which one you want to use later.
Item 51. Don't ignore the file test operators.
One of the more frequently heard questions from newly minted Perl programmers is, "How do I find the size of a file?" Invariably, another newly minted Perler will give a wordy answer that works, but requires quite a bit of typing:
my ( $dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size, $atime, $mtime, $ctime, $blksize, $blocks ) = stat($filename);
Or, perhaps they know how to avoid the extra variables that they don't want, so they use a slice (Item 9):
my ($size) = ( stat $filename )[7];
When you are working this hard to get something that should be common, stop to think for a moment. Perl is specifically designed to make the common things easy, so this should be really easy. And indeed, it is if you use the -s file test operator, which tells you the file size in bytes:
my $size = -s $filename;
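As a quick check that the two approaches agree, here is a minimal sketch that writes a scratch file and compares -s with the size field from stat. The scratch file and its contents are made up for illustration; File::Temp is a core module.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file with known contents (hypothetical test data).
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} "x" x 1234;
close $fh or die "Could not close $filename: $!";

my $size_from_stat = ( stat $filename )[7];    # eighth element of the stat list
my $size_from_test = -s $filename;             # same number, less typing

print "stat: $size_from_stat, -s: $size_from_test\n";
```

Both values come from the same underlying stat data, so they should always match.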
Many people overlook Perl's file test operators. Maybe they are old C programmers, maybe they've seen only the programs that other people write, or they just don't trust them. This is a shame; they are succinct and efficient, and tend to be more readable than equivalent constructs written using the stat operator. Curiously, the file test operators are the first functions listed in perlfunc, because they are under the literal -X. If you want to read about them, you tell perldoc to give you the function named -X:
% perldoc -f -X
File tests fit into loops and conditions very well. Here, for example, is a list of the text files in a directory. The -T file test decides if the contents are text by sampling part of the file and guessing.
Almost all file tests use $_ by default:
my @textfiles = grep { -T } glob "$dir_name/*";
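To see the guessing in action, the following sketch builds a throwaway directory with one text file and one file full of NUL bytes, then filters it with -T. The file names and contents are invented for the example, and it uses bsd_glob from the core File::Glob module instead of plain glob only because bsd_glob does not split its pattern on whitespace, which matters if the temporary directory path contains a space.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Glob qw(bsd_glob);

my $dir_name = tempdir( CLEANUP => 1 );

# Two made-up files: one plain text, one containing NUL bytes.
my %contents = (
    'notes.txt' => "Just some ordinary text.\n" x 10,
    'blob.bin'  => "\0\1\2\3" x 64,
);

for my $name ( keys %contents ) {
    open my $out, '>:raw', "$dir_name/$name"
        or die "Could not write $dir_name/$name: $!";
    print {$out} $contents{$name};
    close $out or die "Could not close $dir_name/$name: $!";
}

# -T samples the start of each file; a NUL byte marks the file as binary.
my @textfiles = grep { -T $_ } bsd_glob("$dir_name/*");

print "Text files: @textfiles\n";
```

Only notes.txt should survive the filter. Because -T is a heuristic, treat its answer as a good guess and not as a guarantee.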
The -M and -A file tests return the modification and access times of the file, but in days relative to the start of the program. That is, Perl takes the time the program was started, subtracts the time the file was modified or accessed, and gives you back the result in days. Positive values are in the past, and negative values indicate times after the start of the program. That seems really odd, but it makes it easy to measure age in terms a human can understand. If you want to find the files that haven't been modified in the past seven days, you look for a -M value that is greater than 7:
my @old_files = grep { -M $_ > 7 } glob '*';
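The arithmetic behind -M can be sketched by hand. This example relies on the documented $^T variable, which holds the program's start time in epoch seconds; the scratch file and its ten-day-old timestamp (set with utime) are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $seconds_per_day = 24 * 60 * 60;

my ( $fh, $filename ) = tempfile( UNLINK => 1 );
close $fh or die "Could not close $filename: $!";

# Backdate the access and modification times to ten days before start-up.
my $ten_days_ago = $^T - 10 * $seconds_per_day;
utime $ten_days_ago, $ten_days_ago, $filename
    or die "Could not set times on $filename: $!";

my $age_in_days = -M $filename;

# The same figure computed from stat: (start time - mtime) in days.
my $by_hand = ( $^T - ( stat $filename )[9] ) / $seconds_per_day;

printf "-M says %.2f days; computed by hand: %.2f days\n",
    $age_in_days, $by_hand;
print "$filename has not been modified in the past week\n"
    if $age_in_days > 7;
```

Since the baseline is the program's start time and not the current time, a long-running program sees the same -M value for an unchanged file no matter how long it has been running.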
If you want to find the files modified after your program started, you look for negative values. In this example, if -M returns something less than zero, map gives an anonymous array that has the name of the file and the modification age in days; otherwise, it gives the empty list. The $_ is written out here so that Perl cannot mistake the < that follows -M for the start of a <...> operator:
my @new_files = map { -M $_ < 0 ? [ $_, -M $_ ] : () } glob '*';
Reusing work
If you want to find all of the files owned by the user running the program that are executable, you can combine the file tests in a grep:
my @my_executables = grep { -o and -x } glob '*';
The file test operators actually do the stat call for you, figure out the answer, and give it back to you. Each time you run a file test, Perl does another stat. In the last example, Perl did two stats on $_.
If you want to use another file test operator on the same file, you can use the virtual _ filehandle (the single underscore). It tells the file test operator to not call stat and instead reuse the information from the last file test or stat. Simply put the _ after the file test you want. Now you call only one stat for each item in the list:
my @my_executables = grep { -o and -x _ } glob '*';
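The same idea extends to any number of follow-up tests. In this sketch, describe_file is a hypothetical helper (not part of Perl or of any module) that makes one real stat call through -e and then answers three more questions from the cached result; the scratch file is made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file with six bytes of made-up content.
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
binmode $fh;
print {$fh} "hello\n";
close $fh or die "Could not close $filename: $!";

# Hypothetical helper: one real stat, then three answers from the cache.
sub describe_file {
    my ($path) = @_;

    return unless -e $path;    # the only test here that calls stat

    my %info = (
        size     => ( -s _ ) || 0,     # reuses the cached stat data
        is_plain => ( -f _ ) ? 1 : 0,
        is_dir   => ( -d _ ) ? 1 : 0,
    );
    return \%info;
}

my $info = describe_file($filename);
printf "%s: %d bytes, plain file: %d, directory: %d\n",
    $filename, @{$info}{qw(size is_plain is_dir)};
```

The _ filehandle always refers to the most recent stat, lstat, or file test, so keep the follow-up tests right next to the one that did the real work.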
Stacked file tests
Starting with Perl 5.10, you can stack file test operators. That is, you test the same file or filehandle for several properties at the same time. For instance, if you want to check that a file is both readable and writable by the current user, you list the -r and -w file tests before the file:
use 5.010;
if ( -r -w $file ) {
print "File is readable and writable\n";
}
There's nothing especially magic about this, since it's a syntactic shortcut for running the tests one after another. Notice that the equivalent long form does the test closest to the file first, and that the later test reuses that stat through the _ filehandle:
if ( -w $file and -r _ ) {
    print "File is readable and writable\n";
}
Rewriting the example from the previous section, you'd have:
my @my_executables = grep { -o -x } glob '*';
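As a rough check that stacking is only shorthand, this sketch runs a stacked test and its spelled-out equivalent against a scratch file and against a path that does not exist. The file names are made up, and the %result hash exists only to hold the comparison.

```perl
use 5.010;
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

my $dir = tempdir( CLEANUP => 1 );
my ( $fh, $file ) = tempfile( DIR => $dir );
close $fh or die "Could not close $file: $!";

my $missing = "$dir/no-such-file";    # deliberately never created

my %result;
for my $path ( $file, $missing ) {
    # Stacked form: tests run right to left, starting nearest the file.
    $result{$path}{stacked} = ( -f -r -w $path ) ? 1 : 0;

    # Long form: the first test stats the file, the others reuse _.
    $result{$path}{long} = ( -w $path and -r _ and -f _ ) ? 1 : 0;

    say "$path: stacked=$result{$path}{stacked} long=$result{$path}{long}";
}
```

Both forms should agree for every path: true for the scratch file, false for the missing one.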
Things to remember
- Don't call stat directly when a file test operator will do.
- Use the _ virtual filehandle to reuse data from the last stat.
- Stack file test operators in Perl 5.10 or later.