AV Foundation
Sounds good, but usually game apps provide their own soundtracks. How do you play your own music? This is one of the most common media tasks of all, and has multiple solutions on iOS. One of the simplest is over in another framework, AV Foundation. There, you'll find the AVAudioPlayer. This class makes it easy to load in an audio file in an iOS-supported format and play it:
NSString *audioFilePath = [[NSBundle mainBundle] pathForResource: @"MySong" ofType: @"m4a"];
NSURL *audioFileURL = [NSURL fileURLWithPath: audioFilePath];
NSError *playerErr = nil;
AVAudioPlayer *player = [[AVAudioPlayer alloc] initWithContentsOfURL: audioFileURL error: &playerErr];
// Check the player itself, not the NSError, to decide whether the init succeeded
if (player) {
    [player play];
}
This is nice, but it's only a taste of what AV Foundation can do. Actually, it's a bit of a misnomer—AV Foundation originally housed a few simple audio-only classes (AVAudioPlayer, AVAudioRecorder, and AVAudioSession) before its scope was dramatically expanded in iOS 4.0 to be an all-encompassing audio/video production framework. Ironically, Media Player was home to the only video player prior to iOS 4, in the form of the MPMoviePlayerController. All of this means the simplest players for audio and video files are in frameworks where you wouldn't expect to find them.
But who needs simple? Let's get fancy. AV Foundation offers capture, editing, export, and playback of audio and video, and has been ported from iOS to Mac OS X Lion, where it is the heir-apparent to the 20-year-old QuickTime media library.
For playback, AV Foundation offers the AVPlayer, which is responsible for decompressing and rendering an audio or video file or network stream. To do video, you use a helper class, the AVPlayerLayer, which provides a CALayer that renders the visual portion of the item being played by a given AVPlayer. To play a video file from your app bundle, a simple approach would look like this:
// Note the .m4v extension: unlike the earlier example, this is a video file
NSString *videoFilePath = [[NSBundle mainBundle] pathForResource: @"myVideo" ofType: @"m4v"];
NSURL *videoFileURL = [NSURL fileURLWithPath: videoFilePath];
AVPlayer *player = [[AVPlayer alloc] initWithURL: videoFileURL];
AVPlayerLayer *playerLayer = [AVPlayerLayer playerLayerWithPlayer: player];
// The layer needs a frame, or the video won't be visible
playerLayer.frame = self.view.layer.bounds;
[self.view.layer addSublayer: playerLayer];
// Unlike MPMoviePlayerController, AVPlayer doesn't start on its own
[player play];
Using a CALayer here is an interesting design decision, especially since the similar MPMoviePlayerController provides a UIView for its display. The essential difference is that a UIView is a subclass of UIResponder, which means it responds to touch events. That makes sense for MPMoviePlayerController, because it provides its own playback controls. With the AVPlayerLayer, it's up to you to provide a GUI, which makes the class much more amenable to a custom presentation.
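Since AVPlayerLayer draws no controls, even a simple play/pause toggle is something you write yourself. A minimal sketch of such an action method might look like this (the player property and the button wired to the action are assumptions, not part of the framework):

- (IBAction) togglePlayback: (id) sender {
    // AVPlayer has no isPlaying flag; a rate of 0.0 means it is paused
    if (self.player.rate == 0.0) {
        [self.player play];
    } else {
        [self.player pause];
    }
}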
Playback only scratches the surface of what AV Foundation can do, given that it was built as the engine to power the iOS version of iMovie. The A/V features of the framework are basically split into two groups: editing/export/playback and capture. To understand the former, think of each item you want to play. This is an AVAsset, an abstraction over some kind of playable media, which consists of multiple tracks, such as one or more synchronized streams of audio, video, text, and so on. You can also build assets on the fly, via the AVComposition subclass (and its mutable variant, AVMutableComposition), meaning you can create a composition with one or more video tracks built from segments of other assets' video tracks, and audio tracks assembled the same way. This is how you make a home video with your iOS device: Create an audio track with some favorite music, drop in segments of audio and video captured with the camera and mic, and export the result as an .m4v when you're done.
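To give a sense of how this looks in code, here's a hedged sketch of building a one-track composition from an existing clip and exporting it. The clipURL and outputURL variables are placeholders (they'd point at a previously captured movie and a writable destination), and error handling is kept to a bare minimum:

AVMutableComposition *composition = [AVMutableComposition composition];
AVMutableCompositionTrack *videoTrack =
    [composition addMutableTrackWithMediaType: AVMediaTypeVideo
                             preferredTrackID: kCMPersistentTrackID_Invalid];

// clipURL stands in for some existing movie file
AVURLAsset *clip = [AVURLAsset URLAssetWithURL: clipURL options: nil];
AVAssetTrack *clipVideoTrack =
    [[clip tracksWithMediaType: AVMediaTypeVideo] objectAtIndex: 0];

// Drop the first five seconds of the clip's video at the start of the composition
NSError *editErr = nil;
[videoTrack insertTimeRange: CMTimeRangeMake(kCMTimeZero, CMTimeMake(5, 1))
                    ofTrack: clipVideoTrack
                     atTime: kCMTimeZero
                      error: &editErr];

// Audio tracks are assembled the same way, with AVMediaTypeAudio.
// When the composition is finished, hand it to an export session.
AVAssetExportSession *exporter =
    [[AVAssetExportSession alloc] initWithAsset: composition
                                     presetName: AVAssetExportPresetMediumQuality];
exporter.outputURL = outputURL;   // outputURL is another placeholder
exporter.outputFileType = AVFileTypeAppleM4V;
[exporter exportAsynchronouslyWithCompletionHandler: ^{
    NSLog(@"export finished with status %d", (int) exporter.status);
}];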
That brings us to capture. In AV Foundation, the capture classes are almost wholly unrelated to editing/playback, and with good reason. Editing is an offline action: You build an AVComposition like a document, performing one edit after another, and then play or export it. With capture, you're interested in processing the media as it enters the system: writing it to a file, performing some kind of image processing, etc. To capture video with an iOS device, you create an AVCaptureSession, discover the available inputs (the front and back cameras, and the various mics), and connect them to the session. You then add outputs to the session, such as an AVCaptureMovieFileOutput to record into a .mov file, and/or an AVCaptureVideoDataOutput that delivers incoming video frames via a callback so you can perform some kind of processing on them.
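A bare-bones capture setup, sketched on the assumption that the hosting class declares itself an AVCaptureVideoDataOutputSampleBufferDelegate, might look like this:

AVCaptureSession *session = [[AVCaptureSession alloc] init];

// Connect the default camera as an input
AVCaptureDevice *camera = [AVCaptureDevice defaultDeviceWithMediaType: AVMediaTypeVideo];
NSError *captureErr = nil;
AVCaptureDeviceInput *cameraInput =
    [AVCaptureDeviceInput deviceInputWithDevice: camera error: &captureErr];
if (cameraInput && [session canAddInput: cameraInput]) {
    [session addInput: cameraInput];
}

// Add an output that delivers every captured frame to a delegate callback
AVCaptureVideoDataOutput *videoOutput = [[AVCaptureVideoDataOutput alloc] init];
[videoOutput setSampleBufferDelegate: self
                               queue: dispatch_queue_create("videoFrames", NULL)];
if ([session canAddOutput: videoOutput]) {
    [session addOutput: videoOutput];
}

[session startRunning];

Frames then arrive, one CMSampleBuffer at a time, in the delegate method:

- (void) captureOutput: (AVCaptureOutput *) captureOutput
 didOutputSampleBuffer: (CMSampleBufferRef) sampleBuffer
        fromConnection: (AVCaptureConnection *) connection {
    // Write the frame to a file, run it through a filter, etc.
}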
AV Foundation builds on a low-level C framework called Core Media, which defines common abstractions like the CMTime, a count of time in arbitrary timescales. The reason for this is that each track in a composition likely has different time-keeping needs; the video may be 30 frames per second, but the audio could have 48,000 samples per second. Rather than using floating-point approximations that could eventually get out of sync due to rounding errors, the video can keep track of time in 30ths of a second, and the audio can count in 48,000ths of a second. Core Media also provides the CMSampleBuffer, a content-agnostic abstraction around media samples: It could refer to one frame of video captured from the camera, or tens of thousands of audio samples read from a file. By using a common abstraction, the methods in AV Foundation can be kept simpler than if they had to offer multiple versions for each possible media type.
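In code, a CMTime is just a value paired with a timescale, created and combined with C functions. A couple of illustrative lines (nothing here beyond what Core Media itself provides):

// One frame of 30 fps video: value 1 at a timescale of 30
CMTime oneVideoFrame = CMTimeMake(1, 30);

// Half a second of 48 kHz audio: 24,000 samples at a timescale of 48,000
CMTime halfSecondOfAudio = CMTimeMake(24000, 48000);

// Arithmetic reconciles the timescales exactly, with no floating-point drift
CMTime total = CMTimeAdd(oneVideoFrame, halfSecondOfAudio);
CMTimeShow(total);   // dumps the resulting value and timescale to the console

// Every CMSampleBuffer carries its timing as a CMTime, whatever its media type:
// CMTime when = CMSampleBufferGetPresentationTimeStamp(sampleBuffer);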