- 2.1 Representing Ordinary Strings
- 2.2 Representing Strings with Alternate Notations
- 2.3 Using Here-Documents
- 2.4 Finding the Length of a String
- 2.5 Processing a Line at a Time
- 2.6 Processing a Character or Byte at a Time
- 2.7 Performing Specialized String Comparisons
- 2.8 Tokenizing a String
- 2.9 Formatting a String
- 2.10 Using Strings as IO Objects
- 2.11 Controlling Uppercase and Lowercase
- 2.12 Accessing and Assigning Substrings
- 2.13 Substituting in Strings
- 2.14 Searching a String
- 2.15 Converting Between Characters and ASCII Codes
- 2.16 Implicit and Explicit Conversion
- 2.17 Appending an Item onto a String
- 2.18 Removing Trailing Newlines and Other Characters
- 2.19 Trimming Whitespace from a String
- 2.20 Repeating Strings
- 2.21 Embedding Expressions within Strings
- 2.22 Delayed Interpolation of Strings
- 2.23 Parsing Comma-Separated Data
- 2.24 Converting Strings to Numbers (Decimal and Otherwise)
- 2.25 Encoding and Decoding rot13 Text
- 2.26 Encrypting Strings
- 2.27 Compressing Strings
- 2.28 Counting Characters in Strings
- 2.29 Reversing a String
- 2.30 Removing Duplicate Characters
- 2.31 Removing Specific Characters
- 2.32 Printing Special Characters
- 2.33 Generating Successive Strings
- 2.34 Calculating a 32-Bit CRC
- 2.35 Calculating the SHA-256 Hash of a String
- 2.36 Calculating the Levenshtein Distance Between Two Strings
- 2.37 Encoding and Decoding Base64 Strings
- 2.38 Expanding and Compressing Tab Characters
- 2.39 Wrapping Lines of Text
- 2.40 Conclusion
2.8 Tokenizing a String
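The split method parses a string and returns an array of tokenized strings. It accepts two optional parameters: a delimiter (a string or a regular expression) and a field limit (an integer).
If no delimiter is given, split uses the value of $; (or its English-library alias $FIELD_SEPARATOR). That variable is normally nil, in which case the string is split on runs of whitespace and leading whitespace is ignored. If the delimiter is a string, that exact string is used as the token separator; if it is a regular expression, every match of the pattern separates two tokens: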
```ruby
s1 = "It was a dark and stormy night."
words = s1.split            # ["It", "was", "a", "dark", "and",
                            #  "stormy", "night."]
s2 = "apples, pears, and peaches"
list = s2.split(", ")       # ["apples", "pears", "and peaches"]
s3 = "lions and tigers and bears"
zoo = s3.split(/ and /)     # ["lions", "tigers", "bears"]
```
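The difference between the default whitespace splitting and splitting on a literal space is easy to miss, so here is a short illustration (assuming Ruby 1.9 or later). Passing a single-space string behaves like the default, whereas a regex matching one space produces an empty field for every extra space. A regex containing a capture group also returns the captured delimiters as tokens, which is handy when the separators themselves matter:

```ruby
s = "  leading   and trailing  "

s.split        # => ["leading", "and", "trailing"]
s.split(" ")   # => ["leading", "and", "trailing"]  (same as the default)
s.split(/ /)   # => ["", "", "leading", "", "", "and", "trailing"]

# A capture group in the pattern keeps the matched delimiters in the result
"3+4-5".split(/([-+])/)   # => ["3", "+", "4", "-", "5"]
```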
The limit parameter places an upper limit on the number of fields returned and also controls whether trailing empty fields are kept, according to these rules:
- If it is omitted (or zero), the number of fields is unlimited and trailing empty strings are suppressed.
- If it is a positive number, the number of entries will be limited to that number (stuffing the rest of the string into the last field as needed). Trailing empty strings are retained.
- If it is a negative number, there is no limit to the number of fields, and trailing empty strings are retained.
These three rules are illustrated here:
```ruby
str = "alpha,beta,gamma,,"
list1 = str.split(",")      # ["alpha", "beta", "gamma"]
list2 = str.split(",", 2)   # ["alpha", "beta,gamma,,"]
list3 = str.split(",", 4)   # ["alpha", "beta", "gamma", ","]
list4 = str.split(",", 8)   # ["alpha", "beta", "gamma", "", ""]
list5 = str.split(",", -1)  # ["alpha", "beta", "gamma", "", ""]
```
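A few edge cases round out these rules. As a small sketch: a limit of zero behaves exactly like omitting it, a limit of one returns the whole string as a single field, and splitting an empty string yields an empty array rather than an array containing one empty string:

```ruby
str = "alpha,beta,gamma,,"

str.split(",", 0)   # => ["alpha", "beta", "gamma"]   (same as no limit)
str.split(",", 1)   # => ["alpha,beta,gamma,,"]       (one field: the whole string)
"".split(",")       # => []
```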
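The scan method approaches the problem from the other side: instead of describing the separators, you describe the tokens. It matches a regular expression or string against the target string and returns an array of every match: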
```ruby
str = "I am a leaf on the wind..."

# A string is interpreted literally, not as a regex
arr = str.scan("a")     # ["a", "a", "a"]

# A regex will return all matches
arr = str.scan(/\w+/)   # ["I", "am", "a", "leaf", "on", "the", "wind"]

# A block will be passed each match, one at a time
str.scan(/\w+/) {|x| puts x }
```
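When the pattern contains capture groups, scan returns the groups rather than the whole match: each element of the result is an array of captured strings. The sketch below (the input string is made up for illustration) pairs keys with values this way and then turns the pairs into a hash:

```ruby
pairs = "width=80, height=24"

# Each match contributes one [key, value] array of captures
pairs.scan(/(\w+)=(\d+)/)        # => [["width", "80"], ["height", "24"]]
pairs.scan(/(\w+)=(\d+)/).to_h   # => {"width"=>"80", "height"=>"24"}

# With a block, the captures can be destructured into separate parameters
pairs.scan(/(\w+)=(\d+)/) { |key, value| puts "#{key} -> #{value}" }
```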
The StringScanner class, from the standard library, is different in that it maintains state for the scan: it keeps a position pointer and advances it past each piece it matches, rather than finding all matches at once:
```ruby
require 'strscan'

str = "Watch how I soar!"
ss = StringScanner.new(str)
loop do
  word = ss.scan(/\w+/)    # Grab a word at a time
  break if word.nil?
  puts word
  sep = ss.scan(/\W+/)     # Grab next non-word piece
  break if sep.nil?
end
```
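Because StringScanner remembers where it is, it is well suited to tokenizers that must try several patterns at the current position. The following is a minimal sketch of a tokenizer for simple arithmetic expressions; the method name tokenize and the token kinds :num, :op, and :paren are hypothetical names chosen for this illustration. It uses skip (which advances without returning the text), eos?, pos, and peek alongside scan:

```ruby
require 'strscan'

# Hypothetical example: tokenize expressions such as "12 + 3*(4 - 5)"
# into [kind, value] pairs.
def tokenize(expr)
  ss = StringScanner.new(expr)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next                                  # whitespace is consumed and discarded
    elsif (num = ss.scan(/\d+/))
      tokens << [:num, num.to_i]
    elsif (op = ss.scan(%r{[-+*/]}))
      tokens << [:op, op]
    elsif (paren = ss.scan(/[()]/))
      tokens << [:paren, paren]
    else
      # Nothing matched at the current position; report where it happened
      raise ArgumentError,
            "unexpected character at position #{ss.pos}: #{ss.peek(1).inspect}"
    end
  end
  tokens
end

p tokenize("12 + 3*(4 - 5)")
# => [[:num, 12], [:op, "+"], [:num, 3], [:op, "*"], [:paren, "("],
#     [:num, 4], [:op, "-"], [:num, 5], [:paren, ")"]]
```

The order of the branches matters only when patterns can overlap; here each branch matches a disjoint set of characters, so any order would work.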