Item 3: Avoid Ruby’s Cryptic Perlisms
If you’ve ever used the Perl programming language then you undoubtedly recognize its influence on Ruby. The majority of Ruby’s perlisms have been adopted in such a way that they blend perfectly with the rest of the ecosystem. But others either stick out like an unnecessary semicolon or are so obscure that they leave you scratching your head trying to figure out how a particular piece of code works.
Over the years, as Ruby matured, alternatives to some of the more cryptic perlisms were added. As more time went on, some of these holdovers from Perl were deprecated or even completely removed from Ruby. Yet, a few still remain, and you’re likely to come across them in the wild. This item can be used as a guide to deciphering those perlisms while acting as a warning to avoid introducing them into your own code.
The corner of Ruby where you’re most likely to encounter features borrowed from Perl is a set of cryptic global variables. In fact, Ruby has some pretty liberal naming rules when it comes to global variables. Unlike with local variables, instance variables, or even constants, you’re allowed to use all sorts of characters as variable names. Recalling that global variables begin with a “$” character, consider this:
def extract_error(message)
  if message =~ /^ERROR:\s+(.+)$/
    $1
  else
    "no error"
  end
end
There are two perlisms packed into this code example. The first is the use of the “=~” operator from the String class. It returns the position within the string where the right operand (usually a regular expression) matches, or nil if no match can be found. When the regular expression matches, several global variables will be set so you can extract information from the string. In this example, I’m extracting the contents of the first capture group using the $1 global variable. And this is where things get a bit weird. That variable might look and smell like a global variable, but it surely doesn’t act like one.
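To make that odd behavior concrete, here is a minimal sketch. The method names first_capture and capture_seen_by_caller are hypothetical helpers invented for this illustration; only the “=~” operator and the $1 variable come from the example above.

```ruby
# Returns the match position together with the first capture group.
# Both values are read inside the same method that performed the match.
def first_capture(message)
  position = (message =~ /^ERROR:\s+(.+)$/)
  [position, $1]
end

# Calls first_capture and then reads $1 from the caller's side.
# The match happened in another method, so $1 is not set here.
def capture_seen_by_caller(message)
  first_capture(message)
  $1
end

position, capture = first_capture("ERROR: disk full")
# position is 0 and capture is "disk full"

leaked = capture_seen_by_caller("ERROR: disk full")
# leaked is nil, even though the match succeeded inside first_capture
```

A true global variable would still hold "disk full" after first_capture returned. Here it does not, which is the behavior worth keeping in mind when reading code that relies on $1.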
The variables created by the “=~” operator are called special global variables. That’s because they’re scoped locally to the current thread and method. Essentially, they’re local values with global names. Outside of the extract_error method from the previous example, the $1 “global” variable is nil, even after using the “=~” operator. In the example, returning the value of the $1 variable is just like returning the value of a local variable. The whole situation can be confusing. The good news is that it’s completely unnecessary. Consider this alternative:
def extract_error(message)
  if m = message.match(/^ERROR:\s+(.+)$/)
    m[1]
  else
    "no error"
  end
end
Using String#match is much more idiomatic and doesn’t use any of the special global variables set by the “=~” operator. That’s because the match method returns a MatchData object (when the regular expression matches) and it contains all of the same information that was previously available in those special global variables. In this version of the extract_error method, you can see that using the index operator with a value of 1 gives you the same string that $1 would have given you in the previous example. The bonus feature is that the MatchData object is a plain old local variable and you get to choose the name of it. (It’s fairly common to make an assignment inside the conditional part of an if expression like this. That said, it’s all too easy to use “=” when you really meant “==”. Watch out for these kinds of mistakes.)
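As a further sketch of what a MatchData object makes possible, the example below uses named capture groups so that each piece of the match can be read by name. The parse_log_line method, the level and text group names, and the log format itself are assumptions made for illustration. The extra parentheses around the assignment are one common way to signal that “=” is intentional.

```ruby
# Splits a line such as "ERROR: disk full" into its parts.
# Returns nil when the line does not look like "LEVEL: text".
def parse_log_line(line)
  if (m = line.match(/^(?<level>[A-Z]+):\s+(?<text>.+)$/))
    # Named groups are read from the MatchData object by symbol.
    { level: m[:level], text: m[:text] }
  end
end

parse_log_line("ERROR: disk full")
parse_log_line("all good")
```

The first call should return a hash with "ERROR" and "disk full", and the second should return nil. No special global variables are involved in either case.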
Besides those set by the “=~” operator, there are other global variables borrowed from Perl. The one you’re most likely to see is $:, which is an array of strings representing the directories where Ruby will search for libraries that are loaded with the require method. Instead of using the $: global variable, you should use its more descriptive alias: $LOAD_PATH. As a matter of fact, there are more descriptive versions for all of the other cryptic global variables such as $; and $/. But there’s a catch. Unlike with $LOAD_PATH, you have to load a library to access the other global variables’ aliases:
require('English')
Once the English library is loaded, you can replace all those strange global variables with their longer, more descriptive aliases. For a full list of these aliases, take a look at the documentation for the English module.
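The following sketch shows a few of these aliases side by side with their cryptic counterparts. The two method names are hypothetical and exist only for this illustration; the aliases themselves ($INPUT_RECORD_SEPARATOR, $PROCESS_ID, and $ERROR_INFO) are defined by the English library.

```ruby
require 'English'

# Each long name refers to the same value as its short counterpart.
def aliases_agree?
  $LOAD_PATH.equal?($:) &&           # available without the English library
    $INPUT_RECORD_SEPARATOR == $/ && # line separator used when reading input
    $PROCESS_ID == $$                # process ID of the running interpreter
end

# $ERROR_INFO is the descriptive name for $!, the most recent exception.
def last_error_message
  Integer("not a number")
rescue ArgumentError
  $ERROR_INFO.message
end
```

Calling aliases_agree? should return true, and last_error_message should return the message of the ArgumentError raised by Integer.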
There’s one last perlism you should be aware of. Not surprisingly, it also has something to do with a global variable. Consider this:
while readline
  print if ~ /^ERROR:/
end
If you think this code is a bit obfuscated, then congratulations, you’re in good company. You might be wondering what the print method is actually printing and what that regular expression is matching against. It just so happens that all of the methods in this example are working with a global variable—the $_ variable to be more precise.
So, what’s going on here? It all starts with the readline method. More specifically, it’s the Kernel#readline method. (In Item 6, we’ll dig more into how Ruby determines that, in this context, readline comes from the Kernel module.) This version of readline is a little different from its counterpart in the IO class. You can probably gather that it reads a line from standard input and returns it. The subtle part is that it also stores that line of input in the $_ variable. (Kernel#gets does the same thing but doesn’t raise an exception when the end-of-file marker is reached.) In a similar fashion, if Kernel#print is called without any arguments, it will print the contents of the $_ variable to standard output.
You can probably guess what that unary “~” operator and the regular expression are doing. The Regexp#~ operator tries to match the contents of the $_ variable against the regular expression to its right. If there’s a match, it returns the position of the match; otherwise, it returns nil. While all these methods might look like they are somehow magically working together, you now know that it’s all thanks to the $_ global variable. But why does Ruby even support this?
The only legitimate use for these methods (and the $_ variable) is for writing short, simple scripts on the command line, so-called “one-liners.” This allows Ruby to compete with tools such as Perl, awk, and sed. When you’re writing real code you should avoid methods that implicitly read from, or write to, the $_ global variable. These include other similar Kernel methods I haven’t listed here such as chomp, sub, and gsub. The difference with those is that they can no longer be used in recent versions of Ruby without using either the “-n” or the “-p” command-line option to the Ruby interpreter. That is, it’s like these methods don’t even exist without one of those command-line options. That’s a good thing.
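For comparison, here is one way the earlier loop might be written without touching $_ at all. This is a sketch rather than a drop-in replacement: the print_errors method name and its two parameters are assumptions, String#match? requires Ruby 2.4 or later, and each_line simply stops at the end of input instead of raising an exception the way readline does.

```ruby
require 'stringio'

# Copies every line that starts with "ERROR:" from input to output.
# The current line lives in an ordinary block parameter, not in $_.
def print_errors(input = $stdin, output = $stdout)
  input.each_line do |line|
    output.print(line) if line.match?(/^ERROR:/)
  end
end

# Using StringIO objects in place of standard input and output:
log = StringIO.new("INFO: started\nERROR: disk full\n")
out = StringIO.new
print_errors(log, out)
# out.string now holds "ERROR: disk full\n"
```

Every value the loop depends on is now visible in the code, so a reader no longer has to know which methods quietly share a global variable.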
Now you can see how some of the more cryptic perlisms can affect the readability, and thus maintainability, of your code. This is especially true of the obscure global variables and the ones that are global in name only. It is best to use the more Ruby-like methods (String#match vs. String#=~) and the longer, more descriptive names for global variables ($LOAD_PATH vs. $:).
Things to Remember
- Prefer String#match to String#=~. The former returns all the match information in a MatchData object instead of several special global variables.
- Use the longer, more descriptive global variable aliases as opposed to their short cryptic names (e.g., $LOAD_PATH instead of $:). Most of the longer names are only available after loading the English library.
- Avoid methods that implicitly read from, or write to, the $_ global variable (e.g., Kernel#print and Regexp#~).