Hello AV Foundation
Now that you have a high-level understanding of AV Foundation and some deeper insight into the details of digital media, let’s wrap up this chapter by having a little fun.
Mac OS X has long had the NSSpeechSynthesizer class, making it easy to add text-to-speech features in Cocoa applications. You can add similar functionality to your iOS apps using AV Foundation’s AVSpeechSynthesizer class. This class is used to speak one or more utterances, which are instances of a class called AVSpeechUtterance. If you wanted to speak the phrase “Hello World!” you could do so as follows:
AVSpeechSynthesizer *synthesizer = [[AVSpeechSynthesizer alloc] init];
AVSpeechUtterance *utterance =
    [[AVSpeechUtterance alloc] initWithString:@"Hello World!"];
[synthesizer speakUtterance:utterance];
If you ran this code, you would hear the phrase “Hello World!” being spoken in the default voice for your locale. Let’s put this functionality into action by building a simple app that will carry on a conversation with AV Foundation.
All the projects you’ll build throughout this book have a “starter” and “final” version in the book’s sample code repository. The final version is the completed project, ready to build and run. The starter version has the user interface and supporting classes completed and contains stubbed versions of the classes you’ll be developing. Additionally, most of the sample projects are factored to isolate the AV Foundation code from the rest of the application. This will make it easy for us to stay focused on AV Foundation without getting bogged down in user interface details; it also makes the sample apps accessible to you whether your primary experience is in OS X or iOS.
In the book’s sample code repository, you’ll find a starter project in the Chapter 1 directory called HelloAVF_Starter. Figure 1.10 shows this app in action.
Figure 1.10 Hello AV Foundation!
In the project you’ll find a class called THSpeechController. This is the class in which you’ll develop the application’s text-to-speech functionality. Listing 1.1 shows the interface for this class.
Listing 1.1 THSpeechController.h
#import <AVFoundation/AVFoundation.h>

@interface THSpeechController : NSObject

@property (strong, nonatomic, readonly) AVSpeechSynthesizer *synthesizer;

+ (instancetype)speechController;

- (void)beginConversation;

@end
This class has a simple interface with just a couple of points to note. The header begins with an import of <AVFoundation/AVFoundation.h>, which is the umbrella header for the framework; this import will be a common fixture in all the code you write throughout the course of this book. The key method in this class is beginConversation, which kicks off the text-to-speech functionality you’ll be building in a minute. Let’s switch over to the class implementation (see Listing 1.2).
Listing 1.2 THSpeechController.m
#import "THSpeechController.h" #import <AVFoundation/AVFoundation.h> @interface THSpeechController () @property (strong, nonatomic) AVSpeechSynthesizer *synthesizer; // 1 @property (strong, nonatomic) NSArray *voices; @property (strong, nonatomic) NSArray *speechStrings; @end @implementation THSpeechController + (instancetype)speechController { return [[self alloc] init]; } - (id)init { self = [super init]; if (self) { _synthesizer = [[AVSpeechSynthesizer alloc] init]; // 2 _voices = @[[AVSpeechSynthesisVoice voiceWithLanguage:@"en-US"], // 3 [AVSpeechSynthesisVoice voiceWithLanguage:@"en-GB"]]; _speechStrings = [self buildSpeechStrings]; } return self; } - (NSArray *)buildSpeechStrings { // 4 return @[@"Hello AV Foundation. How are you?", @"I'm well! Thanks for asking.", @"Are you excited about the book?", @"Very! I have always felt so misunderstood", @"What's your favorite feature?", @"Oh, they're all my babies. I couldn't possibly choose.", @"It was great to speak with you!", @"The pleasure was all mine! Have fun!"]; } - (void)beginConversation { } @end
- Define the class’s required properties in the class extension, redefining the synthesizer property that was defined in the header so that it’s read/write. Additionally, define properties for the voices and speech strings that will be used in the conversation.
- Create a new instance of AVSpeechSynthesizer. This is the object performing the text-to-speech conversion. It acts as a queue for one or more instances of AVSpeechUtterance and provides you with the interface to control and monitor the progress of the ongoing speech.
- Create an NSArray containing two instances of AVSpeechSynthesisVoice. Voice support is currently very limited. You don’t have the ability to specify named voices like you can on the Mac. Instead, each language/locale has one predefined voice. In this case, speaker #1 will use the U.S. English voice and speaker #2 will use the British English voice. You can get a complete listing of supported voices by calling the speechVoices class method on AVSpeechSynthesisVoice, as shown in the sketch following this list.
- Create an array of strings defining the back and forth of the contrived conversation.
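If you’re curious which voices a given device actually supports, you can log them out. The following is a minimal sketch (not part of the starter project) that prints the language code of each available voice:

NSArray *voices = [AVSpeechSynthesisVoice speechVoices];
for (AVSpeechSynthesisVoice *voice in voices) {
    // Each voice is identified by its BCP-47 language code, such as en-US.
    NSLog(@"Supported voice: %@", voice.language);
}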
With the basic setup of the class complete, let’s move on to the implementation of the beginConversation method, shown in Listing 1.3.
Listing 1.3 Implementing the beginConversation Method
- (void)beginConversation {
    for (NSUInteger i = 0; i < self.speechStrings.count; i++) {
        AVSpeechUtterance *utterance =                                    // 1
            [[AVSpeechUtterance alloc] initWithString:self.speechStrings[i]];
        utterance.voice = self.voices[i % 2];                             // 2
        utterance.rate = 0.4f;                                            // 3
        utterance.pitchMultiplier = 0.8f;                                 // 4
        utterance.postUtteranceDelay = 0.1f;                              // 5
        [self.synthesizer speakUtterance:utterance];                      // 6
    }
}
- Loop through the collection of speech strings, and for each one create a new instance of AVSpeechUtterance, passing the string to its initWithString: initializer.
- Toggle back and forth between the two voices you defined previously. Even iterations will speak in the U.S. voice and odd iterations will speak in the British voice.
- Specify the rate at which this utterance will be spoken. I’m setting this to a value of 0.4 to slow it down slightly from its default. The documentation states the allowed rate falls between AVSpeechUtteranceMinimumSpeechRate and AVSpeechUtteranceMaximumSpeechRate, which currently have values of 0.0 and 1.0, respectively. However, because these are constants, their values could change in a future iOS release. If you’re modifying the rate property, it may be safer to calculate the rate as a percentage of the min–max range, as shown in the sketch following this list.
- Specify the pitchMultiplier for the utterance. This changes the pitch of the voice as it speaks this particular utterance. The allowed values for the pitchMultiplier are between 0.5 (low pitch) and 2.0 (high pitch).
- Specify a postUtteranceDelay of 0.1f. This causes the speech synthesizer to pause slightly before speaking the next utterance. You can similarly set a preUtteranceDelay.
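Following up on the advice in step 3, one way to avoid hard-coding the 0.4f rate is to compute it from the published constants. The helper below is hypothetical (it isn’t part of the framework or the sample project), but it illustrates the idea:

// Hypothetical helper: maps a 0.0-1.0 percentage onto whatever range
// the framework constants define, so the code remains correct even if
// their values change in a future iOS release.
static float THRateForPercentage(float percentage) {
    float range = AVSpeechUtteranceMaximumSpeechRate -
                  AVSpeechUtteranceMinimumSpeechRate;
    return AVSpeechUtteranceMinimumSpeechRate + (range * percentage);
}

With this in place, the assignment in Listing 1.3 would become utterance.rate = THRateForPercentage(0.4f);.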
Run the application and listen to the conversation. It’s Hello World done AV Foundation-style!
Experiment with the various AVSpeechUtterance settings to get an understanding of how they work. Audition some of the other available voices. Create an instance of AVSpeechUtterance with the entire text of War and Peace and sit back and relax.
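If you’d like to go a step further and watch the conversation’s progress programmatically, the synthesizer reports its activity through the AVSpeechSynthesizerDelegate protocol. The following sketch assumes you adopt the protocol in THSpeechController’s class extension and assign _synthesizer.delegate = self in init; neither change is part of the starter project:

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
  didStartSpeechUtterance:(AVSpeechUtterance *)utterance {
    // Called as the synthesizer begins speaking each queued utterance.
    NSLog(@"Started: %@", utterance.speechString);
}

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer
 didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    // Called when the synthesizer finishes speaking an utterance.
    NSLog(@"Finished: %@", utterance.speechString);
}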