Coding Encoding
Let’s look at some code samples.
Text comes down the wire as an NSData instance, where the “wire” could be a network connection or a file I/O operation. To decode the text you receive, you can allocate an NSString and initialize it via the initWithData:encoding: method. Notice the second parameter: encoding! So you need to know how the text was encoded.
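To see why the encoding parameter matters, here is a minimal sketch that decodes the same bytes under different encodings. DecodeBytes is a hypothetical helper, not a Foundation API, and the byte values in the usage note are chosen purely for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): wraps raw bytes in NSData and
// decodes them with the given encoding. Returns nil if the bytes are not
// valid in that encoding. Assumes ARC.
static NSString *DecodeBytes(const void *bytes, NSUInteger length, NSStringEncoding encoding) {
    NSData *data = [NSData dataWithBytes:bytes length:length];
    return [[NSString alloc] initWithData:data encoding:encoding];
}
```

Given the two bytes 0xC3 0xBC, which are the UTF-8 encoding of ü, DecodeBytes with NSUTF8StringEncoding yields ü, while NSISOLatin1StringEncoding yields the two characters Ã¼. The single byte 0xFC (ü in ISO Latin-1) is not valid UTF-8, so decoding it with NSUTF8StringEncoding returns nil.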
Example 1: Encoding to ASCII
Listing 2.1 takes a string that contains accented characters, in this case the German word “Fußgängerübergänge.” We’ll examine what characters get “lost” from this encoding.
Listing 2.1 Encoding Text to ASCII
// Fußgängerübergänge - "pedestrian crossings"
NSString *uberUmlats = @"Fußgängerübergänge";
// Lossy conversion: characters outside ASCII are substituted instead of causing a nil result
NSData *ASCIIData = [uberUmlats dataUsingEncoding:NSASCIIStringEncoding
                             allowLossyConversion:YES];
NSString *encodeToASCII = [[NSString alloc] initWithData:ASCIIData
                                                encoding:NSASCIIStringEncoding];
NSLog(@"Encoded to ASCII - %@", encodeToASCII);
The encoding in place is ASCII, specified with NSASCIIStringEncoding. The reason we get Fusgangerubergange as our return value is twofold. First, by specifying ASCII we are limited to characters with code point values under 128, so no accented or other non-ASCII characters, of which our string has three: “ß,” “ä,” and “ü.” Second, by passing YES to NSString’s dataUsingEncoding:allowLossyConversion: instance method, we are saying, in a sense, “Handle all the characters I give you, and if it’s an accented character, I’m okay with your losing that accent.” Although the result is not a correct German word, its display is very close, and its meaning can reasonably be interpreted. If we change the encoding type to NSMacOSRomanStringEncoding, there’s no guarantee how the characters will be converted and encoded. In fact, with this encoding, the result is an unintelligible Fu§g?¿nger¿berg¿nge.
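When losing characters is not acceptable, the conversion can be made strict instead. The sketch below uses a hypothetical helper, StrictASCIIData (not a Foundation API), to show one way to detect that a string will not survive ASCII encoding.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): converts text to ASCII bytes
// only when nothing would be lost; returns nil otherwise.
static NSData *StrictASCIIData(NSString *text) {
    if (![text canBeConvertedToEncoding:NSASCIIStringEncoding]) {
        return nil; // at least one character falls outside ASCII
    }
    // allowLossyConversion:NO makes this method return nil on failure by
    // itself, so the check above is a readability aid, not a requirement.
    return [text dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:NO];
}
```

Calling StrictASCIIData with Fußgängerübergänge returns nil, while a plain ASCII string such as Fussgaenger comes back as 11 bytes of data.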
Example 2: Returning the ASCII Value from a Character
Listing 2.2 takes a single character as its argument and returns the ASCII value associated with it.
Listing 2.2 Returning the ASCII Value for a Character
NSString *encodingFun = @"a";
if ([encodingFun length] > 0) {
    unichar ASCIIValue = [encodingFun characterAtIndex:0];
    NSLog(@"ASCII value is %d", ASCIIValue);
}
The returned ASCII value for the character a is 97. Note that the variable is declared as unichar, a typedef for unsigned short, because that is what the characterAtIndex: method returns. The value is a UTF-16 code unit, printed here in decimal; for characters in the ASCII range it is identical to the ASCII value. An NSString object is conceptually an array of unichars, which is why unichar is the return type.
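The same call behaves differently once a character leaves the ASCII range. As a minimal sketch, the hypothetical helper FirstCodeUnit (not a Foundation API) returns whatever characterAtIndex: reports for the first position.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns the first UTF-16 code
// unit of a string, or 0 for an empty string.
static unichar FirstCodeUnit(NSString *text) {
    return [text length] > 0 ? [text characterAtIndex:0] : 0;
}
```

FirstCodeUnit returns 97 for a and 252 for ü (U+00FC), which is a valid code unit but not an ASCII value. A character outside the Basic Multilingual Plane, such as U+1F600, occupies two unichars: the string’s length is 2 and the first code unit is 0xD83D, a high surrogate rather than a complete character.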
Example 3: Encoding an ASCII String to UTF-8
Listing 2.3 shows the potential for repairing some “damage.” Typically, if you are working with text that has accented characters, those characters are misinterpreted when encoded to ASCII. This example starts with a misinterpreted string and correctly encodes it to UTF-8. Note that a little magic incantation is involved with this snippet because we need to take an NSString object and convert it to a plain C string.
Listing 2.3 Encoding an ASCII String to UTF-8
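// "NÃ¼rnberg" is the UTF-8 encoding of "Nürnberg" misread as ISO Latin-1
NSString *notUTF = @"NÃ¼rnberg";
// Recover the raw bytes (NULL if the conversion would be lossy), then reread them as UTF-8
const char *rawBytes = [notUTF cStringUsingEncoding:NSISOLatin1StringEncoding];
NSString *nowUTF = rawBytes ? [NSString stringWithUTF8String:rawBytes] : nil;
NSLog(@"Now a UTF8 string: %@", nowUTF);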
The returned value is Nürnberg.
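The same repair can be sketched with NSData instead of a C string, which avoids passing a NULL pointer along when the conversion fails. RepairMisreadUTF8 is a hypothetical helper, not a Foundation API, and it assumes the damage came from UTF-8 bytes being read under a known wrong encoding.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API). Assumption: the damaged string
// was produced by reading UTF-8 bytes as wrongEncoding. Returns nil if that
// assumption does not hold for this string. Assumes ARC.
static NSString *RepairMisreadUTF8(NSString *damaged, NSStringEncoding wrongEncoding) {
    // Step 1: undo the wrong decoding to get the original bytes back.
    NSData *rawBytes = [damaged dataUsingEncoding:wrongEncoding allowLossyConversion:NO];
    if (rawBytes == nil) {
        return nil; // some character does not exist in wrongEncoding
    }
    // Step 2: decode those bytes as UTF-8; nil if they are not valid UTF-8.
    return [[NSString alloc] initWithData:rawBytes encoding:NSUTF8StringEncoding];
}
```

With NSISOLatin1StringEncoding, the damaged string NÃ¼rnberg is repaired to Nürnberg. The Mac OS Roman misreading of the same bytes looks different, N√ºrnberg, and is repaired by passing NSMacOSRomanStringEncoding. Pairing a damaged string with the wrong guess tends to produce nil rather than a silent mismatch, so the caller can try another encoding.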
Example 4: Returning a String from an Encoded URL
Listing 2.4 takes an encoded string, in this case encoded from a valid URL, and returns text. With an encoded URL, many of the punctuation symbols and nonprinting characters are encoded, such as a space character encoded as %20. In the following listing, the chevrons < and > are encoded as %3C and %3E, respectively. The ampersand & is encoded as %26.
Listing 2.4 Returning a String from an Encoded URL
NSString *currentEncodedString = @"%3CTom%26Jerry%3E";
// Deprecated since iOS 9 / OS X 10.11; stringByRemovingPercentEncoding is the newer UTF-8 equivalent
NSString *currentDecodedString = [currentEncodedString stringByReplacingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSLog(@"Decoded string: %@", currentDecodedString);
The code returns <Tom&Jerry>.
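For completeness, here is a sketch of the round trip using the newer percent-encoding methods on NSString. PercentEncode and PercentDecode are hypothetical wrappers, not Foundation APIs, and the allowed character set is an assumption made for illustration: escaping everything except letters and digits is stricter than most URL components require.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): escapes every character that
// is not a letter or digit. The allowed set is an illustrative choice.
static NSString *PercentEncode(NSString *text) {
    NSCharacterSet *allowed = [NSCharacterSet alphanumericCharacterSet];
    return [text stringByAddingPercentEncodingWithAllowedCharacters:allowed];
}

// Hypothetical helper (not a Foundation API): reverses percent encoding.
// Returns nil if the escapes do not decode to valid UTF-8.
static NSString *PercentDecode(NSString *encoded) {
    return [encoded stringByRemovingPercentEncoding];
}
```

PercentEncode turns <Tom&Jerry> into %3CTom%26Jerry%3E, and PercentDecode turns that string back into <Tom&Jerry>.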