Regular Expression Solutions to Common Problems
NOTE
The examples presented here are not the ultimate solutions to the problems presented. By now it should be clear that there rarely is an ultimate solution. More often, multiple solutions exist with varying degrees of tolerance for the unpredictable, and there is always a trade-off between performance of a pattern and its capability to handle any and all scenarios thrown at it. With that understanding, feel free to use the patterns presented here (and if needed, tweak them as suits you best).
North American Phone Numbers
The North American Numbering Plan defines how North American telephone numbers are formatted. As per the plan, telephone numbers (in the U.S.A., Canada, much of the Caribbean, and several other locations) are made up of a three-digit area code (technically, the NPA or numbering plan area) and then a seven-digit number (which is formatted as a three-digit prefix followed by a hyphen and a four-digit line number). Any digits may be used in a phone number with two exceptions: The first digit of the area code and the first digit of the prefix may not be 0 or 1. The area code is often enclosed within parentheses, and is often separated from the actual phone number by a hyphen. Matching one of (555) 555-5555 or (555)555-5555 or 555-555-5555 is easy; matching any of them (assuming that that is what you need) is a bit trickier.
J. Doe: 248-555-1234 B. Smith: (313) 555-1234 A. Lee: (810)555-1234
\(?[2-9]\d\d\)?[ -]?[2-9]\d\d-\d{4}
J. Doe: 248-555-1234 B. Smith: (313) 555-1234 A. Lee: (810)555-1234
The pattern begins with the curious-looking \(?. Parentheses are optional; \( matches (, and ? matches 0 or 1 instance of that (. [2-9]\d\d matches a three-digit area code (the first digit must be 2 through 9). \)? matches the optional closing parenthesis. [ -]? matches a single space or a hyphen, if either is present. [2-9]\d\d-\d{4} matches the rest of the phone number: the three-digit prefix (the first digit of which must be 2 through 9), followed by a hyphen and four more digits.
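As a quick check, the pattern can be tried against the sample text in Python. This is only a minimal sketch: the names PHONE, text, matches, rejected, and unbalanced are illustrative and not part of the original example, and Python's re module is assumed as the regular expression engine.

```python
import re

# The pattern described above: optional "(", three-digit area code,
# optional ")", optional space or hyphen, three-digit prefix,
# a hyphen, and a four-digit line number.
PHONE = re.compile(r"\(?[2-9]\d\d\)?[ -]?[2-9]\d\d-\d{4}")

text = "J. Doe: 248-555-1234 B. Smith: (313) 555-1234 A. Lee: (810)555-1234"

# findall returns every non-overlapping match, left to right.
matches = PHONE.findall(text)
print(matches)

# An area code starting with 0 or 1 is rejected by [2-9].
rejected = PHONE.search("123-555-1234")
print(rejected)

# The two parentheses are optional independently of each other,
# so an unbalanced "(" is still accepted.
unbalanced = PHONE.search("(248-555-1234")
print(unbalanced.group())
```

The last check shows the kind of tolerance trade-off mentioned in the note at the start of this section: the pattern does not require that an opening parenthesis be paired with a closing one.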
This pattern could easily be modified to handle other presentation formats. For example, 555.555.5555:
J. Doe: 248-555-1234 B. Smith: (313) 555-1234 A. Lee: (810)555-1234 M. Jones: 734.555.9999
[\(.]?[2-9]\d\d[\).]?[ -]?[2-9]\d\d[-.]\d{4}
J. Doe: 248-555-1234 B. Smith: (313) 555-1234 A. Lee: (810)555-1234 M. Jones: 734.555.9999
The opening match now tests for ( or . as an optional set, using pattern [\(.]?. Similarly, [\).]? tests for ) or . (also both optional), and [-.] tests for - or .. Other phone number formats could be added just as easily.
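The extended pattern can be checked the same way. Again, this is a sketch that assumes Python's re module; PHONE_DOTTED, text, and matches are hypothetical names chosen for illustration.

```python
import re

# The extended pattern: "(" or "." may open the area code, ")" or "."
# may close it, and the prefix and line number may be separated by
# either a hyphen or a period.
PHONE_DOTTED = re.compile(r"[\(.]?[2-9]\d\d[\).]?[ -]?[2-9]\d\d[-.]\d{4}")

text = (
    "J. Doe: 248-555-1234 B. Smith: (313) 555-1234 "
    "A. Lee: (810)555-1234 M. Jones: 734.555.9999"
)

# All four presentation formats in the sample text are matched.
matches = PHONE_DOTTED.findall(text)
print(matches)
```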