- 2.1 Representing Ordinary Strings
- 2.2 Representing Strings with Alternate Notations
- 2.3 Using Here-Documents
- 2.4 Finding the Length of a String
- 2.5 Processing a Line at a Time
- 2.6 Processing a Byte at a Time
- 2.7 Performing Specialized String Comparisons
- 2.8 Tokenizing a String
- 2.9 Formatting a String
- 2.10 Using Strings As IO Objects
- 2.11 Controlling Uppercase and Lowercase
- 2.12 Accessing and Assigning Substrings
- 2.13 Substituting in Strings
- 2.14 Searching a String
- 2.15 Converting Between Characters and ASCII Codes
- 2.16 Implicit and Explicit Conversion
- 2.17 Appending an Item Onto a String
- 2.18 Removing Trailing Newlines and Other Characters
- 2.19 Trimming Whitespace from a String
- 2.20 Repeating Strings
- 2.21 Embedding Expressions Within Strings
- 2.22 Delayed Interpolation of Strings
- 2.23 Parsing Comma-Separated Data
- 2.24 Converting Strings to Numbers (Decimal and Otherwise)
- 2.25 Encoding and Decoding rot13 Text
- 2.26 Encrypting Strings
- 2.27 Compressing Strings
- 2.28 Counting Characters in Strings
- 2.29 Reversing a String
- 2.30 Removing Duplicate Characters
- 2.31 Removing Specific Characters
- 2.32 Printing Special Characters
- 2.33 Generating Successive Strings
- 2.34 Calculating a 32-Bit CRC
- 2.35 Calculating the MD5 Hash of a String
- 2.36 Calculating the Levenshtein Distance Between Two Strings
- 2.37 Encoding and Decoding base64 Strings
- 2.38 Encoding and Decoding Strings (uuencode/uudecode)
- 2.39 Expanding and Compressing Tab Characters
- 2.40 Wrapping Lines of Text
- 2.41 Conclusion
2.3 Using Here-Documents
If you want to represent a long string spanning multiple lines, you can certainly use a regular quoted string:
str = "Once upon a midnight dreary,
       While I pondered, weak and weary..."
However, the indentation will be part of the string.
Another way is the here-document, a string that is inherently multiline. (The concept and the term are borrowed from older languages, notably the Unix shells and Perl.) The syntax is the << symbol, followed by an end marker, then zero or more lines of text, and finally the same end marker on a line by itself:
str = <<EOF
Once upon a midnight dreary,
While I pondered weak and weary,...
EOF
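To see exactly what each form produces, here is a quick comparison of the two literals from this section. The quoted string keeps the spaces used to line up its second line. The here-document does not include the end marker itself, but it does include the newline that ends its last text line.

```ruby
# A regular double-quoted string spanning two source lines:
# the leading spaces on the second line become part of the string.
quoted = "Once upon a midnight dreary,
       While I pondered, weak and weary..."

# The same text as a here-document: no indentation, and the
# final text line keeps its trailing newline.
here = <<EOF
Once upon a midnight dreary,
While I pondered weak and weary,...
EOF

p quoted  # => "Once upon a midnight dreary,\n       While I pondered, weak and weary..."
p here    # => "Once upon a midnight dreary,\nWhile I pondered weak and weary,...\n"
```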
Be careful about trailing spaces on the closing end-marker line. Ruby recognizes the end marker only when nothing else follows it on that line, so even a single trailing space leaves the here-document unterminated, which typically ends in a syntax error.
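One way to observe this without breaking a real source file is to hand two tiny programs to eval, a sketch for experimentation only. The variable names good and bad are illustrative; the only difference between the two strings is one space after the closing EOF.

```ruby
# Two small programs as source text. In the second, the closing
# end marker is followed by a trailing space.
good = "s = <<EOF\nhello\nEOF\ns"
bad  = "s = <<EOF\nhello\nEOF \ns"

p eval(good)  # => "hello\n"

recognized = begin
  eval(bad)
  true
rescue SyntaxError
  false  # "EOF " never matches, so the here-document is never closed
end

p recognized  # => false
```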
Note that here-documents may be "stacked"; for example, here is a method call with three such strings passed to it:
some_method(<<str1, <<str2, <<str3)
first piece
of text...
str1
second piece...
str2
third piece
of text.
str3
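To make the stacked form concrete, the sketch below supplies a stand-in for some_method (a hypothetical method that simply returns its three arguments) so the resulting strings can be inspected. The bodies are consumed in order: the first body after the call line belongs to str1, the next to str2, and so on.

```ruby
# Hypothetical stand-in for some_method: returns its arguments unchanged.
def some_method(a, b, c)
  [a, b, c]
end

pieces = some_method(<<str1, <<str2, <<str3)
first piece
of text...
str1
second piece...
str2
third piece
of text.
str3

p pieces[0]  # => "first piece\nof text...\n"
p pieces[1]  # => "second piece...\n"
p pieces[2]  # => "third piece\nof text.\n"
```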
By default, a here-document is like a double-quoted string—that is, its contents are subject to interpretation of escape sequences and interpolation of embedded expressions. But if the end marker is single-quoted, the here-document behaves like a single-quoted string:
str = <<'EOF'
This isn't a tab: \t
and this isn't a newline: \n
EOF
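A side-by-side comparison of the two quoting styles, using an illustrative variable name: the unquoted marker interpolates #{name} and turns \t into a tab, while the single-quoted marker leaves both exactly as typed.

```ruby
name = "world"

# Unquoted end marker: behaves like a double-quoted string.
double = <<EOF
Hello, #{name}!\tDone.
EOF

# Single-quoted end marker: behaves like a single-quoted string.
single = <<'EOF'
Hello, #{name}!\tDone.
EOF

p double  # => "Hello, world!\tDone.\n"
p single  # => "Hello, \#{name}!\\tDone.\n"
```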
If a here-document's end marker is preceded by a hyphen (as in <<-EOF), the closing end marker may be indented. The hyphen affects only that closing line; any leading spaces on the text lines remain part of the string.
str = <<-EOF
  Each of these lines
  starts with a pair
  of blank spaces.
  EOF
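For comparison, Ruby 2.3 and later also accept a "squiggly" form, <<~, which removes the indentation of the least-indented text line from every line. The sketch below shows the hyphen form keeping the two leading spaces and the squiggly form dropping them; it assumes a Ruby version of 2.3 or newer.

```ruby
# <<- : only the closing marker may be indented; text keeps its spaces.
dash = <<-EOF
  Each of these lines
  starts with a pair
  of blank spaces.
  EOF

# <<~ (Ruby 2.3+): the common leading indentation is stripped.
squiggly = <<~EOF
  Each of these lines
  starts with a pair
  of blank spaces.
  EOF

p dash      # => "  Each of these lines\n  starts with a pair\n  of blank spaces.\n"
p squiggly  # => "Each of these lines\nstarts with a pair\nof blank spaces.\n"
```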
Here is a style I personally like. Let's assume the existence of the margin method defined here:
class String
  def margin
    arr = self.split("\n")              # Split into lines
    arr.map! {|x| x.sub(/\A\s*\|/, "")}  # Remove leading whitespace and the bar
    str = arr.join("\n")                # Rejoin into a single string
    self.replace(str)                   # Replace contents of string
  end
end
I've commented this fairly heavily for clarity. Parts of it involve features explained elsewhere in this chapter or later chapters.
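One detail worth checking in a method like this: String#sub! returns nil when nothing matches, whereas String#sub always returns a string. Inside map!, that difference decides whether a line without a vertical bar survives or turns into nil. The sample lines below are illustrative.

```ruby
lines = ["  |kept", "no bar here"]

# sub! yields nil for the line with no match (dup avoids mutating lines).
with_bang = lines.map { |x| x.dup.sub!(/\A\s*\|/, "") }

# sub returns the original text unchanged when there is no match.
without_bang = lines.map { |x| x.sub(/\A\s*\|/, "") }

p with_bang     # => ["kept", nil]
p without_bang  # => ["kept", "no bar here"]
```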
It's used in this way:
str = <<end.margin
  |This here-document has a "left margin"
  |at the vertical bar on each line.
  |
  |  We can do inset quotations,
  |  hanging indentions, and so on.
end
The word end is used naturally enough as an end marker. (This, of course, is a matter of taste. It "looks" like the reserved word end but is really just an arbitrary marker.) Each line starts with a vertical bar, which margin strips off along with any whitespace preceding it.
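A variation on the same idea, offered as a sketch rather than a replacement for margin: a hypothetical helper, strip_margin, that does not reopen String, does not modify the receiver, and keeps the here-document's final newline. It anchors the pattern with ^ so only a bar at the start of a line (after spaces or tabs) is removed; [ \t]* is used instead of \s* so the match cannot run across a newline.

```ruby
# Hypothetical helper (not part of Ruby's standard library):
# returns a new string with leading spaces/tabs and the first "|"
# removed from the start of every line.
def strip_margin(text)
  text.gsub(/^[ \t]*\|/, "")
end

str = strip_margin(<<TEXT)
  |This here-document has a "left margin"
  |at the vertical bar on each line.
  |
  |  We can do inset quotations,
  |  hanging indentions, and so on.
TEXT

puts str
```

Unlike the split/join approach, gsub works on the whole string at once, so the newline after the last line is preserved.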