- Introduction
- Creating and Using String Objects
- Formatting Strings
- Accessing Individual String Characters
- Analyzing Character Attributes
- Case-Insensitive String Comparison
- Working with Substrings
- Using Verbatim String Syntax
- Choosing Between Constant and Mutable Strings
- Optimizing StringBuilder Performance
- Understanding Basic Regular Expression Syntax
- Validating User Input with Regular Expressions
- Replacing Substrings Using Regular Expressions
- Building a Regular Expression Library
3.4. Analyzing Character Attributes
You want to evaluate the individual characters in a string to determine a character's attributes.
Technique
The System.Char structure contains several static functions that let you test individual characters. You can test whether a character is a digit, letter, or punctuation symbol or whether the character is lowercase or uppercase.
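As a minimal sketch of these tests, the following program builds a short description of a character from a handful of the static System.Char methods. The class name CharTests and the helper Describe are hypothetical names chosen for illustration; only the Char methods themselves come from the .NET Framework.

```csharp
using System;

static class CharTests
{
    // Hypothetical helper: builds a short description of a character
    // from several of the static tests in System.Char.
    public static string Describe(char ch)
    {
        string result = "";
        if (Char.IsDigit(ch)) result += "digit ";
        if (Char.IsLetter(ch)) result += "letter ";
        if (Char.IsUpper(ch)) result += "uppercase ";
        if (Char.IsLower(ch)) result += "lowercase ";
        if (Char.IsPunctuation(ch)) result += "punctuation ";
        return result.Trim();
    }

    static void Main()
    {
        // Prints one line per character, for example: 'a': letter lowercase
        foreach (char ch in "a7!Z")
            Console.WriteLine("'{0}': {1}", ch, Describe(ch));
    }
}
```

Because each test is independent, a character can satisfy more than one of them, which is why the helper accumulates results instead of returning at the first match.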
Comments
One of the hardest issues to handle when writing software is making sure users input valid data. You can use many different methods, such as restricting input to only digits, but ultimately, you always need an underlying validating test of the input data.
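One way such an underlying test might look for digits-only input is sketched below. The class InputChecks and the method IsAllDigits are hypothetical names used only for this example; the sketch assumes that an empty or null string should be rejected.

```csharp
using System;

static class InputChecks
{
    // Hypothetical validator: returns true only when the text is
    // non-empty and every character passes Char.IsDigit.
    public static bool IsAllDigits(string text)
    {
        if (text == null || text.Length == 0)
            return false;

        foreach (char ch in text)
        {
            if (!Char.IsDigit(ch))
                return false;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine(IsAllDigits("12345"));  // True
        Console.WriteLine(IsAllDigits("12a45"));  // False
        Console.WriteLine(IsAllDigits(""));       // False
    }
}
```

Note that Char.IsDigit accepts any Unicode decimal digit, not only the ASCII characters 0 through 9. If the input must be strictly ASCII, compare each character against the range '0' to '9' instead.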
You can use the System.Char structure to perform a variety of text-validation procedures. Listing 3.5 demonstrates validating user input as well as inspecting the characteristics of a character. It begins by displaying a menu and then waiting for user input using the Console.ReadLine method. Once a user enters a command, the input is checked by the method ValidateMainMenuInput. This method rejects the input if its first character is a digit, a punctuation mark, or a symbol, or if it is anything other than N or Q. If the validation passes and the user chose N, the program reads a phrase and passes it to a method that inspects each character. This method simply enumerates all the characters in the phrase and prints descriptive messages based on their characteristics. Listing 3.5 does not use every System.Char inspection method; Table 3.3 lists the inspection methods and their functionality. The results of running the application in Listing 3.5 appear in Figure 3.1.
Listing 3.5 Using the Static Methods in System.Char to Inspect the Details of a Single Character
using System;

namespace _4_CharAttributes
{
    class Class1
    {
        [STAThread]
        static void Main(string[] args)
        {
            char cmd = 'x';
            string input;

            do
            {
                DisplayMainMenu();
                input = Console.ReadLine();

                // ReadLine returns null at the end of the input stream
                if( input == null )
                    break;

                if( input.Length == 0 ||
                    ValidateMainMenuInput( Char.ToUpper(input[0]) ) == 0 )
                {
                    Console.WriteLine( "Invalid command!" );
                }
                else
                {
                    cmd = Char.ToUpper(input[0]);
                    switch( cmd )
                    {
                        case 'Q':
                        {
                            break;
                        }
                        case 'N':
                        {
                            Console.Write( "Enter a phrase to inspect: " );
                            input = Console.ReadLine();
                            if( input != null )
                                InspectPhrase( input );
                            break;
                        }
                    }
                }
            } while ( cmd != 'Q' );
        }

        // prints the attributes of each character in the phrase
        private static void InspectPhrase( string input )
        {
            foreach( char ch in input )
            {
                Console.Write( ch + " - ");
                if( Char.IsDigit(ch) )
                    Console.Write( "IsDigit " );
                if( Char.IsLetter(ch) )
                {
                    Console.Write( "IsLetter " );
                    Console.Write( "(lowercase={0}, uppercase={1})",
                        Char.ToLower(ch), Char.ToUpper(ch));
                }
                if( Char.IsPunctuation(ch) )
                    Console.Write( "IsPunctuation " );
                if( Char.IsWhiteSpace(ch) )
                    Console.Write( "IsWhitespace" );
                Console.Write("\n");
            }
        }

        // returns 0 for an invalid command; otherwise returns the command's ordinal value
        private static int ValidateMainMenuInput( char input )
        {
            // a simple check to see if input == 'N' or 'Q' is good enough
            // the following is for illustrative purposes
            if( Char.IsDigit( input ) )
                return 0;
            else if( Char.IsPunctuation( input ) )
                return 0;
            else if( Char.IsSymbol( input ) )
                return 0;
            else if( input != 'N' && input != 'Q' )
                return 0;

            return (int) input;
        }

        private static void DisplayMainMenu()
        {
            Console.WriteLine( "\nPhrase Inspector\n-------------------" );
            Console.WriteLine( "N)ew Phrase" );
            Console.WriteLine( "Q)uit\n" );
            Console.Write( ">> " );
        }
    }
}
Table 3.3 System.Char Inspection Methods
| Name | Description |
|---|---|
| IsControl | Denotes a control character such as a tab or carriage return. |
| IsDigit | Indicates a single decimal digit. |
| IsLetter | Used for alphabetic characters. |
| IsLetterOrDigit | Returns true if the character is a letter or a digit. |
| IsLower | Used to determine whether a character is lowercase. |
| IsNumber | Tests whether a character belongs to any Unicode number category, which covers decimal digits as well as characters such as fractions and Roman numerals. |
| IsPunctuation | Denotes whether a character is a punctuation symbol. |
| IsSeparator | Denotes a character used to separate strings. An example is the space character. |
| IsSurrogate | Checks whether a character is a surrogate, one half of the pair of 16-bit values used to encode a Unicode character that does not fit in a single char. |
| IsSymbol | Used for symbolic characters such as $ or +. |
| IsUpper | Used to determine whether a character is uppercase. |
| IsWhiteSpace | Indicates a character classified as whitespace such as a space character, tab, or carriage return. |
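The following sketch exercises several of the methods in Table 3.3 that Listing 3.5 does not call. The class LessCommonTests and the helper Flags are hypothetical names used for illustration; the sample characters were picked so that each one highlights a different test.

```csharp
using System;

static class LessCommonTests
{
    // Hypothetical helper: lists which of the less common
    // System.Char tests a character passes.
    public static string Flags(char ch)
    {
        string flags = "";
        if (Char.IsControl(ch)) flags += "Control ";
        if (Char.IsSeparator(ch)) flags += "Separator ";
        if (Char.IsWhiteSpace(ch)) flags += "WhiteSpace ";
        if (Char.IsSymbol(ch)) flags += "Symbol ";
        if (Char.IsNumber(ch)) flags += "Number ";
        if (Char.IsSurrogate(ch)) flags += "Surrogate ";
        return flags.Trim();
    }

    static void Main()
    {
        // tab, space, dollar sign, the fraction one-half (U+00BD),
        // and a high surrogate (U+D83D)
        char[] samples = { '\t', ' ', '$', '\u00BD', '\uD83D' };
        foreach (char ch in samples)
            Console.WriteLine("U+{0:X4}: {1}", (int) ch, Flags(ch));
    }
}
```

The tab is reported as both a control character and whitespace but not as a separator, whereas the space is a separator and whitespace. The fraction passes IsNumber even though Char.IsDigit returns false for it.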
Figure 3.1 Use the static methods in the System.Char structure to inspect character attributes.
The System.Char structure is designed to work with a single Unicode character. Because a char occupies 2 bytes (one 16-bit value), the range of a character is from 0 to 0xFFFF. For portability reasons in future systems, you can always check the upper bound of a char by using the MaxValue constant declared in the System.Char structure. One thing to keep in mind when working with characters is to avoid the confusion of mixing char types with integer types. Characters have an ordinal value, which is an integer value used as a lookup into a table of symbols. One example of a table is the ASCII table, which contains 128 characters and includes the digits 0 through 9, letters, punctuation symbols, and control characters. The confusion lies in the fact that the character 6, for instance, has an ordinal value of 0x36 rather than 6. Therefore, the line of code meant to initialize a character to the number 6
char ch = (char) 6;
is wrong because the resulting character is ^F (ordinal value 6), the ACK control character used in modem handshaking protocols. Displaying this value in the console does not produce the 6 that you were looking for. There are two correct ways to initialize the variable. The first way is
char ch = (char) 0x36;
which produces the desired result and prints the number 6 to the console if passed to the Console.Write method. However, unless you have the ASCII table memorized, this procedure can be cumbersome. The second, simpler way is to place the character between single quotes:
char ch = '6';
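The three initializations can be compared side by side in a short sketch. The class CharVersusInt and the helper DigitValue are hypothetical names introduced for this example; DigitValue assumes that only the ASCII digits 0 through 9 need to be handled.

```csharp
using System;

static class CharVersusInt
{
    // Hypothetical helper: converts an ASCII digit character to its
    // numeric value, or returns -1 if the character is not '0'..'9'.
    public static int DigitValue(char ch)
    {
        if (ch < '0' || ch > '9')
            return -1;
        return ch - '0';
    }

    static void Main()
    {
        char control = (char) 6;      // the ACK control character, not the digit
        char fromCode = (char) 0x36;  // the digit 6, written as its ordinal value
        char literal = '6';           // the digit 6, written as a character literal

        Console.WriteLine((int) control);        // 6
        Console.WriteLine(fromCode == literal);  // True
        Console.WriteLine((int) literal);        // 54
        Console.WriteLine(DigitValue(literal));  // 6
    }
}
```

Casting a char to int yields its ordinal value, so going from a digit character back to the number it represents requires subtracting the ordinal value of '0', as DigitValue does.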