Core Audio
The existence of the lower-level Core Media framework might have you asking, "How low can I go?" The name "Core Media" is deceptive, as it is just about data types and convenient conversions; it does no significant capture/processing/playback work of its own. Core Audio, on the other hand, is another story entirely.
Core Audio is the lowest-level audio API available to third-party developers. Its focus is the processing of audio by the system as it happens: usually samples coming off the mic, samples going out to the headphones, or both. The design of Core Audio allows this processing to be performed with extremely low latency; the samples your app works with can be going out the headphones as little as 10 to 20 milliseconds after they've passed through your code. This is ideal for things like virtual instruments, where there needs to be an immediate audible response to the user's input, or the illusion of the instrument will be destroyed.
Core Audio's main processing abstraction is the audio unit, a distinct software component that processes audio in some way. Some audio units generate audio (by synthesizing it or playing it from a file), others manipulate the audio (effects, stereo mixers), and still others manage the interaction with hardware (receiving captured samples from the mic, playing samples out to the speakers or headphones). You can build an AUGraph of units, connecting them together in interesting ways to create a processing chain for audio, or you can insert your own code into the audio processing by registering for callbacks when one or more units process a buffer of samples.
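As a minimal sketch of the graph-building side (not a complete, production-ready setup), the plain-C snippet below wires Apple's multichannel mixer unit into the RemoteIO output unit. The function name is ours, and every OSStatus a real app would check is ignored for brevity.

```c
#include <AudioToolbox/AudioToolbox.h>

// Sketch only: connect a multichannel mixer unit to the RemoteIO output unit.
// Every call below returns an OSStatus that real code would check.
static AUGraph buildMixerToOutputGraph(void)
{
    AUGraph graph;
    NewAUGraph(&graph);

    // Describe the two components we want in the graph.
    AudioComponentDescription outputDesc = {
        .componentType         = kAudioUnitType_Output,
        .componentSubType      = kAudioUnitSubType_RemoteIO,  // speaker/headphone I/O on iOS
        .componentManufacturer = kAudioUnitManufacturer_Apple
    };
    AudioComponentDescription mixerDesc = {
        .componentType         = kAudioUnitType_Mixer,
        .componentSubType      = kAudioUnitSubType_MultiChannelMixer,
        .componentManufacturer = kAudioUnitManufacturer_Apple
    };

    AUNode outputNode, mixerNode;
    AUGraphAddNode(graph, &outputDesc, &outputNode);
    AUGraphAddNode(graph, &mixerDesc,  &mixerNode);

    // Open the graph (which instantiates the units), connect the mixer's output
    // bus 0 to the output unit's input bus 0, then initialize and start rendering.
    AUGraphOpen(graph);
    AUGraphConnectNodeInput(graph, mixerNode, 0, outputNode, 0);
    AUGraphInitialize(graph);
    AUGraphStart(graph);
    return graph;
}
```

Audio session setup, stream format negotiation, and teardown are all left out here; the point is simply how nodes and connections express the processing chain.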
Not that this is easy! To achieve its high level of flexibility and low latency, the Core Audio programming idiom is one of setting and getting properties and processing buffers in callbacks, some of which occur on real-time threads and must complete their work against a hard deadline, or be abandoned by their caller. With all the procedural C and all the malloc() and sizeof() calls that go with it, gaining Core Audio confidence is a tough task. Fortunately, we have a whole book for that (check out Learning Core Audio: A Hands-On Guide to Audio Programming for Mac and iOS, published by Addison-Wesley Professional).
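To give a feel for that idiom, here is a bare-bones sketch of a render callback and the property call that attaches it to a unit. The callback name is ours, the AudioUnit is assumed to have been created already, and the callback simply writes silence where a real app would synthesize or copy LPCM samples.

```c
#include <AudioToolbox/AudioToolbox.h>
#include <string.h>

// Called on a real-time thread whenever the unit needs inNumberFrames more
// frames of audio. No locks, no allocation, no file I/O in here; miss the
// deadline and the output glitches.
static OSStatus MyRenderCallback(void                        *inRefCon,
                                 AudioUnitRenderActionFlags  *ioActionFlags,
                                 const AudioTimeStamp        *inTimeStamp,
                                 UInt32                       inBusNumber,
                                 UInt32                       inNumberFrames,
                                 AudioBufferList             *ioData)
{
    // Placeholder: fill every buffer with silence.
    for (UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
        memset(ioData->mBuffers[i].mData, 0, ioData->mBuffers[i].mDataByteSize);
    }
    return noErr;
}

// Attaching the callback is itself a property set on the unit
// (ioUnit stands for an already-created AudioUnit).
static void attachCallback(AudioUnit ioUnit)
{
    AURenderCallbackStruct callback = { .inputProc       = MyRenderCallback,
                                        .inputProcRefCon = NULL };
    AudioUnitSetProperty(ioUnit,
                         kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Input,
                         0,                  // bus 0
                         &callback,
                         sizeof(callback));
}
```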
OpenAL
One final media framework to keep in mind is OpenAL. Its programming conventions are nothing like those of the usual Cocoa Touch or Core Foundation frameworks, since it was developed outside of Apple and designed to resemble the OpenGL graphics API, but its implementation is actually built atop Core Audio, and it shares Core Audio's low latency.
Low-latency audio is crucial for games, where you want to hear in-game events at the same time you see them, and games are the primary use case for OpenAL. In OpenAL, you define a listener that represents the user's position in 3D space and then place sources in that space. When you attach audio buffers to a source, the listener perceives that sound as coming from the source's position. Add lots of sources, move them around, and stream buffers to the sources, and before long you have a very immersive audio environment.
As in Core Audio, nearly everything in OpenAL is done by setting properties: a source's position in space is one property, its current buffer of audio is another, as is whether to loop that buffer over and over or go silent when it finishes.
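A minimal sketch of that property-driven style, with error checking omitted and the buffer left unfilled for brevity, might look like the following; the function name is ours.

```c
#include <OpenAL/al.h>
#include <OpenAL/alc.h>

// Sketch only: one listener, one looping source placed to the listener's right.
// Error checks (alGetError()) are omitted.
static void startPositionalLoop(void)
{
    // One-time setup: open the default device and make a context current.
    ALCdevice  *device  = alcOpenDevice(NULL);
    ALCcontext *context = alcCreateContext(device, NULL);
    alcMakeContextCurrent(context);

    // The listener sits at the origin of the 3D space.
    alListener3f(AL_POSITION, 0.0f, 0.0f, 0.0f);

    // A buffer of LPCM samples; a real app would fill it with alBufferData()
    // (the ExtAudioFile discussion below shows where those samples can come from).
    ALuint buffer;
    alGenBuffers(1, &buffer);

    // Create a source five units to the listener's right, attach the buffer,
    // loop it indefinitely, and start playback.
    ALuint source;
    alGenSources(1, &source);
    alSource3f(source, AL_POSITION, 5.0f, 0.0f, 0.0f);
    alSourcei(source, AL_BUFFER, buffer);
    alSourcei(source, AL_LOOPING, AL_TRUE);
    alSourcePlay(source);
}
```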
You'll often find OpenAL code on the web that makes use of utility functions defined in ALUT.h. These functions were deprecated around 2005 and don't exist in the iOS version of OpenAL. This trips up some developers, because many examples rely on a LoadWAVFile() call to get audio into an OpenAL buffer. Fortunately, Core Audio has functions for audio file I/O, and it goes a step further with the ExtAudioFile API, which lets you read from encoded files such as .mp3 and .aac and automatically converts them into the uncompressed LPCM (linear pulse code modulation) representation that OpenAL needs.
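A rough sketch of that conversion path follows, assuming you already have a CFURLRef to an encoded file; the function name is ours, and error handling and the read loop itself are omitted.

```c
#include <AudioToolbox/AudioToolbox.h>

// Sketch only: open an encoded file (.mp3, .aac, .caf, ...) and ask
// ExtAudioFile to convert reads into 16-bit stereo LPCM.
static ExtAudioFileRef openFileAsLPCM(CFURLRef fileURL)
{
    ExtAudioFileRef audioFile;
    ExtAudioFileOpenURL(fileURL, &audioFile);

    // The "client format": what we want handed back to us on every read.
    // Here, 44.1 kHz, stereo, 16-bit signed integer, interleaved LPCM.
    AudioStreamBasicDescription clientFormat = {0};
    clientFormat.mSampleRate       = 44100.0;
    clientFormat.mFormatID         = kAudioFormatLinearPCM;
    clientFormat.mFormatFlags      = kAudioFormatFlagIsSignedInteger |
                                     kAudioFormatFlagIsPacked;
    clientFormat.mChannelsPerFrame = 2;
    clientFormat.mBitsPerChannel   = 16;
    clientFormat.mFramesPerPacket  = 1;
    clientFormat.mBytesPerFrame    = 2 * sizeof(SInt16);  // 2 channels x 2 bytes
    clientFormat.mBytesPerPacket   = clientFormat.mBytesPerFrame;

    ExtAudioFileSetProperty(audioFile,
                            kExtAudioFileProperty_ClientDataFormat,
                            sizeof(clientFormat),
                            &clientFormat);

    // Subsequent ExtAudioFileRead() calls now fill an AudioBufferList with
    // decoded, converted LPCM, which is just what alBufferData() needs.
    // Call ExtAudioFileDispose() when finished with the file.
    return audioFile;
}
```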
So What to Do?
With so many media frameworks, is it all too much? No, because the reason we have different frameworks is that each does different things. Once you have nontrivial needs, the framework to use becomes obvious, because often only one will do what you want. To offer a general set of guidelines:
If You Need | Use
Access to songs, podcasts, and audiobooks in the user's media library | MPMediaItem and MPMediaQuery in Media Player
Audio file playback | AVAudioPlayer or AVPlayer in AV Foundation
Recording to an audio file | AVCaptureSession in AV Foundation or Audio Queue in Core Audio
Real-time processing of captured audio | Audio Queue or Audio Units in Core Audio
Positional game audio | OpenAL
Audio mixing from multiple sources | Audio Units in Core Audio
MIDI instruments connected via Wi-Fi or the Camera Connection Kit | Core MIDI in Core Audio
Video file playback | MPMoviePlayerController in Media Player or AVPlayer/AVPlayerLayer in AV Foundation
Video capture | AVCaptureSession in AV Foundation
Video editing and export | AVComposition and AVAssetExportSession in AV Foundation
Hopefully, this broad overview gives you a taste of all the things that are possible with the media frameworks on iOS, and a sense of where to get started. The multimedia feature set exposed by the iOS SDK is unparalleled, and the potential is utterly unlimited.