3.12. Replacing Substrings Using Regular Expressions
You want to replace every substring that matches a regular expression with new text, where the new text can itself refer to parts of each match.
Technique
Create a Regex object, passing the regular expression used to match characters in the input string to the Regex constructor. Next, call the Regex method Replace, passing the input string to process and the replacement string to substitute for each match, as shown in the last line of Listing 3.10. You can also use the static Regex.Replace method, which takes the input string as its first parameter, the regular expression as its second, and the replacement string as its third.
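The two forms of Replace described above can be compared side by side. The following is a minimal sketch, not part of the original listing: the class name ReplaceForms, the helper method names, and the sample card number are made up for illustration, while the pattern and replacement string are the ones used in Listing 3.10.

```csharp
using System;
using System.Text.RegularExpressions;

class ReplaceForms
{
    // Pattern and replacement string as used in Listing 3.10.
    const string CardPattern = @"(\d{4})-(\d{4})-(\d{4})-(\d{4})";
    const string SafeOutput = "$1-xxxx-xxxx-$4";

    // Instance form: the pattern is fixed when the Regex is constructed.
    public static string MaskWithInstance( string cardNum )
    {
        Regex cardExp = new Regex( CardPattern );
        return cardExp.Replace( cardNum, SafeOutput );
    }

    // Static form: input first, then the pattern, then the replacement.
    public static string MaskWithStatic( string cardNum )
    {
        return Regex.Replace( cardNum, CardPattern, SafeOutput );
    }

    static void Main()
    {
        string sample = "1234-5678-9012-3456";   // hypothetical sample value
        Console.WriteLine( MaskWithInstance( sample ) );
        Console.WriteLine( MaskWithStatic( sample ) );
    }
}
```

Both calls produce the same result. The instance form is the natural choice when the same expression is reused, as it is in Listing 3.10, where the Regex object both validates the input and performs the replacement.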
Listing 3.10 Using Regular Expressions to Replace Numbers in a Credit Card with xs
using System;
using System.Text.RegularExpressions;

namespace _12_RegExpReplace
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            // Four parenthesized groups of four digits, separated by dashes.
            Regex cardExp = new Regex( @"(\d{4})-(\d{4})-(\d{4})-(\d{4})" );

            // Keep groups 1 and 4; mask groups 2 and 3.
            string safeOutputExp = "$1-xxxx-xxxx-$4";
            string cardNum;

            Console.Write( "Please enter your credit card number: " );
            cardNum = Console.ReadLine();

            // Keep prompting until the input contains a card number.
            while( !cardExp.IsMatch( cardNum ) )
            {
                Console.WriteLine( "Invalid card number. Try again." );
                Console.Write( "Please enter your credit card number: " );
                cardNum = Console.ReadLine();
            }

            Console.WriteLine( "Secure Output Result = {0}",
                cardExp.Replace( cardNum, safeOutputExp ));
        }
    }
}
Comments
Although input validation is an extremely useful application of regular expressions, regular expressions also work well as text parsers. The previous recipe used a regular expression to verify that a particular string matched it exactly. However, you can also use regular expressions to match substrings within a string and return each of those substrings as a group. Furthermore, you can use a separate replacement pattern that acts on the result of the regular-expression evaluation to replace substrings within the original input string.
Listing 3.10 creates a regular expression that matches the format of a credit card number: four groups of four digits, separated by dashes. Notice that each of these groups is surrounded by parentheses. In an earlier recipe, I mentioned that to match a literal parenthesis, you must escape it with a backslash because parentheses are also the regular-expression grouping symbols. In this case, grouping is exactly what you want. When you place a portion of a regular expression within parentheses, you create a numbered group. Groups are numbered starting with 1, in the order in which their opening parentheses appear, so this expression has four numbered groups. (Group 0 always refers to the entire match.)

The replacement string, which is contained in the string safeOutputExp, uses these groups. To reference a numbered group, use the $ symbol followed by the number of the group. This sequence represents the characters within the input string that matched the corresponding group in the regular expression. The replacement string in the listing therefore copies the characters from the first group, replaces the characters in the second and third groups with xs, and finally copies the characters from the fourth group.
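To make the group numbering concrete, the sketch below reads individual groups through the Groups collection of a Match and uses $ references in a replacement string. It is an illustration only: the class name GroupDemo, the helper methods, and the sample card number are assumptions introduced here and do not appear in the listings.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupDemo
{
    // Same pattern as Listing 3.10: four numbered groups.
    const string CardPattern = @"(\d{4})-(\d{4})-(\d{4})-(\d{4})";

    // Returns the text captured by one group, or "" when there is no match.
    // Group 0 is the entire match; groups 1 through 4 are the four fields.
    public static string GroupText( string cardNum, int groupNumber )
    {
        Match m = Regex.Match( cardNum, CardPattern );
        return m.Success ? m.Groups[groupNumber].Value : "";
    }

    // Uses $ references to swap the first and last fields.
    public static string SwapEnds( string cardNum )
    {
        return Regex.Replace( cardNum, CardPattern, "$4-$2-$3-$1" );
    }

    static void Main()
    {
        string sample = "1234-5678-9012-3456";   // hypothetical sample value
        for( int i = 0; i <= 4; i++ )
        {
            Console.WriteLine( "Group {0} = {1}", i, GroupText( sample, i ) );
        }
        Console.WriteLine( SwapEnds( sample ) );
    }
}
```

Because a $ reference can appear anywhere in the replacement string, groups can be reordered or dropped as easily as they can be copied.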
You can also use the Regex class to examine each match yourself. If you change the regular expression to "\d{4}", each run of four digits becomes a separate match, and you can use the Matches method to enumerate those matches with the foreach keyword, as shown in Listing 3.11. In the listing, the program first checks that at least four matches were found, one for each group of four digits. Next, it uses a foreach enumeration on each Match object returned from the Matches method. If the match is the second or third group of digits, its value is replaced with xs; otherwise, the Match object's value (the characters within that match) is concatenated to the result string.
Listing 3.11 Enumerating Through the Match Collection to Perform Special Operations on Each Match in a Regular Expression
static void TestManualGrouping()
{
    Regex cardExp = new Regex( @"\d{4}" );
    string cardNum;
    string safeOutputExp = "";

    Console.Write( "Please enter your credit card number: " );
    cardNum = Console.ReadLine();

    // Each match is one run of four digits; a card number needs at least four.
    MatchCollection fields = cardExp.Matches( cardNum );
    if( fields.Count < 4 )
    {
        Console.WriteLine( "Invalid card number" );
        return;
    }

    foreach( Match field in fields )
    {
        // Separate the fields with a single dash.
        if( safeOutputExp.Length > 0 )
        {
            safeOutputExp += "-";
        }

        // Index is the match's offset in the input. Offsets 5 and 10 are the
        // second and third fields of a dddd-dddd-dddd-dddd number.
        if( field.Index == 5 || field.Index == 10 )
        {
            safeOutputExp += "xxxx";
        }
        else
        {
            safeOutputExp += field.Value;
        }
    }

    Console.WriteLine( "Secure Output Result = {0}", safeOutputExp );
}
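The per-match logic in Listing 3.11 can also be handed to Replace itself. The Regex class has Replace overloads that accept a MatchEvaluator delegate, which is called once for each match and returns the text to substitute. This is an alternative to the approach in the listing, not the listing's own technique; the class name EvaluatorDemo, the method names, and the sample card number are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class EvaluatorDemo
{
    // Called once per match; the return value replaces the matched text.
    static string MaskMiddle( Match field )
    {
        // Offsets 5 and 10 are the second and third fields of a
        // dddd-dddd-dddd-dddd number, the same test Listing 3.11 uses.
        if( field.Index == 5 || field.Index == 10 )
        {
            return "xxxx";
        }
        return field.Value;
    }

    public static string Mask( string cardNum )
    {
        Regex cardExp = new Regex( @"\d{4}" );
        return cardExp.Replace( cardNum, new MatchEvaluator( MaskMiddle ) );
    }

    static void Main()
    {
        // Hypothetical sample value.
        Console.WriteLine( Mask( "1234-5678-9012-3456" ) );
    }
}
```

Only the matched digits are replaced, so the dashes between the fields pass through untouched and the method does not need to rebuild the separators by hand.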