iTV Food Chain
Chapter Objectives
- Introduce the content food chain
- Describe the entities involved, their roles, and their relationships
- Describe content authoring, integration, packaging, and emission issues
- Explain the relationships between MPEG-based and IP-based transports
Bringing iTV content to the consumer's receiver is complex, requiring manipulation by various equipment operated by numerous organizations. iTV content is more complex than traditional TV content in that it combines data with video. This chapter serves as a food-chain overview rather than a complete reference. It discusses issues relating to the complex steps that iTV content passes through, from the authoring stage, through production and broadcasting, all the way to viewing. The protocols and processes that enable the various aspects of content delivery and interactivity are described, as are the relationships between them.
3.1 Overview
Content passes through multiple steps on its way to the consumer TV display (see Figure 3.1). Video and audio may be produced in a motion picture studio or at a live show recording, and encoded into a digital TV transport bit stream. An iTV software application may be produced by software developers and compiled into a format that can be embedded within a digital TV transport stream. It is also possible for iTV content to be combined with content received from a satellite.
Figure 3.1. The passage of content from the studio or downlink to the viewer's TV display.
Regardless of the source, all content received is multiplexed, or combined, into a single transport stream that is then modulated, or reformatted, to be transmissible via a TV broadcast antenna, a satellite uplink, or an IP-based network. At the consumer's home, regardless of the path, the digital TV signal received by the set-top box is processed to produce a meaningful GUI and video format that can be displayed, and audio that can be rendered through the speakers.
3.1.1 Distribution Model
In a typical (satellite) distribution model, digital programming is compressed at an uplink and delivered compressed, using the MPEG-2 digital video format and a streaming audio format (e.g., AC-3), to a number of receiving facilities hosting cable headend equipment (see Figure 3.2). The receiving stations need not be connected to the main content uplink by physical cables or any means other than the satellite receiver dish. Each uplink generates multi-program MPEG-2 transport streams carrying up to hundreds of channels. Each satellite may be fed by more than one uplink through multiplexing, and each receiving facility may in turn receive content from multiple satellites. It is therefore possible to have many uplinks map to many satellites, which in turn map to many receiving-station headends.
Figure 3.2. Cable content processing pipeline from feed to the TV display.
The uplink processes the information as follows: a real-time MPEG-2 encoder processes the various components of a program (i.e., video, audio, and data) and generates a single-program MPEG-2 transport stream for each program. Each of these transport streams is secured using a Conditional Access (CA) system. A real-time multiplexer then combines the CA-protected single-program streams into a multi-program MPEG-2 transport stream that occupies the entire satellite channel capacity, typically 50 to 60 Mbps. Modulation equipment then applies additional error correction and modulates the signal using QPSK onto an L-band carrier (approximately 1.15–1.75 GHz). Finally, the transmission equipment up-converts each L-band carrier to the satellite transmission bands of C-band (approximately 5.85–6.5 GHz), X-band (approximately 7.9–8.5 GHz), or Ku-band (approximately 14–14.5 GHz). Typically, 24 L-band carriers are amplified, combined, and fed to the dish through a waveguide.
At the receiving station, the headend equipment is used to process and manage the signal received from the satellite. A headend room typically contains multiple racks that host a wide range of equipment performing a down-conversion process reversing the up-conversion done by the uplink station and further preparing the content for distribution to individual subscribers.
The received signal is down-converted to L-band using a low-noise block converter and demodulated to recover the multi-program MPEG-2 transport stream from the QPSK signal. A demultiplexer then separates each multi-program transport into its single-program components. The CA system then decrypts each single-program transport using a key matching the uplink encryption key, and re-encrypts each single-program transport separately using a different local key, enabling separate authorization of each program. In some cases the headend must enhance the single-program streams to add or drop program elements; the current trend is to perform such grooming at the uplink side, thus reducing headend equipment costs.
Distributing the content to individual subscribers is one of the main functions of the cable system headend. Intuitively, one might view the headend as sitting at the edge of the broadband IP network. It features all of the components necessary for live content encoding, aggregation, and distribution, as well as subscriber management (see Figure 3.3). The headend broadcasts video content protected with CA encryption. The receiver is connected through a public IP network to an ISP, which typically connects via a high-bandwidth (probably leased) private network to the headend. Set-top boxes at the subscriber's site communicate with the headend to receive the GUI, content, and Internet access information.
Figure 3.3. A simplified headend, ISP, and receiver interconnectivity diagram.
Content is delivered from the headend to individual subscribers over cables. To carry the digital signal over the analog physical cable, the MPEG transport streams are modulated using Quadrature Amplitude Modulation (QAM) according to ITU J.83 Annex B. Multiple single-program transport streams, each typically at a bit rate of 6 Mbps, are multiplexed into a single multi-program transport stream with a combined bit rate of approximately 26.97 Mbps for 64-QAM, or approximately 38.81 Mbps for 256-QAM.
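As a back-of-the-envelope illustration of these figures, the following sketch (the class and method names are ours; the 6 Mbps per-program rate is simply the typical figure quoted above) computes how many single-program streams fit into one QAM channel:

/**
 * A minimal sketch estimating how many single-program transport
 * streams fit into one ITU J.83 Annex B QAM channel, using the
 * payload rates quoted in the text.
 */
public class QamCapacity {
    static final double QAM64_MBPS = 26.97;   // 64-QAM payload rate
    static final double QAM256_MBPS = 38.81;  // 256-QAM payload rate

    static int programsPerChannel(double channelMbps, double programMbps) {
        return (int) Math.floor(channelMbps / programMbps);
    }

    public static void main(String[] args) {
        double programMbps = 6.0; // typical single-program rate cited above
        System.out.printf("64-QAM:  %d programs%n",
                programsPerChannel(QAM64_MBPS, programMbps));
        System.out.printf("256-QAM: %d programs%n",
                programsPerChannel(QAM256_MBPS, programMbps));
    }
}

In practice, statistical multiplexing lets operators pack programs more tightly than this fixed-rate estimate suggests.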
At the consumer's home, a hybrid digital and analog set-top box performs the final processing of the signal, resulting in a format that can be projected on a TV display. This final processing requires the viewer to perform a (virtual) channel selection, which, for digital TV, specifies major and minor channel numbers identifying a specific virtual channel. Through the use of EPGs, this selection is used to set the tuner frequency so as to extract the appropriate transport stream from the collection of streams carried on a wide range of frequencies (approximately 54 MHz to 750 MHz). The QAM signal received by the tuner is demodulated to recover the multi-program MPEG-2 transport stream. That stream is fed to a demultiplexer that extracts the program associated with the selected virtual channel. Subsequently, a Point of Deployment (POD) device, which is a component of the CA system, decrypts the content to recover the program's audio, video, and data. The receiver's middleware processes that data and generates the GUI that is shown on the TV display.
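To make the demultiplexing step concrete, the following sketch (an illustration only; in a real receiver this work is typically hardware-assisted) parses the fixed 4-byte header of a 188-byte MPEG-2 transport packet to recover the 13-bit PID by which program elements are routed to decoders:

/**
 * A minimal sketch of MPEG-2 transport packet header parsing,
 * illustrating the fixed 4-byte header of a 188-byte packet.
 */
public class TsPacketHeader {
    public static final int PACKET_SIZE = 188;
    public static final int SYNC_BYTE = 0x47;

    public final boolean payloadUnitStart; // a new PES packet begins here
    public final int pid;                  // 13-bit packet identifier
    public final int continuityCounter;    // 4-bit loss-detection counter

    public TsPacketHeader(byte[] packet) {
        if (packet.length < PACKET_SIZE || (packet[0] & 0xFF) != SYNC_BYTE) {
            throw new IllegalArgumentException("not a transport packet");
        }
        payloadUnitStart = (packet[1] & 0x40) != 0;
        pid = ((packet[1] & 0x1F) << 8) | (packet[2] & 0xFF);
        continuityCounter = packet[3] & 0x0F;
    }
}

The demultiplexer routes each packet to the appropriate audio, video, or data decoder based on its PID.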
To enable TV-commerce, as well as other scenarios that require interactivity, each of the consumer's cable modems interacts with a dedicated port within a Cable Modem Termination System (CMTS), also known as the cable router, that is part of the headend. The CMTS enables transparent bidirectional IP traffic between the iTV set-top box and the headend; in effect, it enables the headend to serve as the ISP of the set-top box. The CMTS equipment may have a variety of backbone interfaces, including various ATM Adaptation Layers (AALn) over Synchronous Optical NETwork (SONET) STS-3c (155.52 Mbps) or DS3 fiber optic (45 Mbps), Fiber Distributed Data Interface (FDDI), or IEEE 802.3 Ethernet over 10BASE-T or 100BASE-T.
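As an illustration of the transparent IP service the CMTS provides, the following sketch shows a set-top application posting a purchase transaction to a headend server over TCP; the host name, port, and message format are hypothetical:

import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** A minimal sketch of a return-channel transaction over the CMTS path. */
public class ReturnChannelClient {
    public static void main(String[] args) throws Exception {
        // Hypothetical headend transaction server reachable via the CMTS.
        try (Socket socket = new Socket("txn.headend.example", 7070)) {
            OutputStream out = socket.getOutputStream();
            out.write("PURCHASE item=4711 account=42\n"
                    .getBytes(StandardCharsets.UTF_8));
            out.flush();
            // Read a single-line reply from the headend.
            InputStream in = socket.getInputStream();
            StringBuilder reply = new StringBuilder();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                reply.append((char) b);
            }
            System.out.println("Headend reply: " + reply);
        }
    }
}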
3.1.2 Short History
To place this in context, Figure 3.4 illustrates past trends in the technology and capacity of cable systems in the US; Japan and South Korea have already improved over the US by at least one order of magnitude. The first-generation set-top boxes relied on a 50 MHz cable connection and featured a return channel, Impulse Pay Per View (IPPV), interactive services, cable modem upstream, and IP telephony.
Figure 3.4. Cable systems technology and capacity development.
Next came 450 MHz connections delivering approximately 60 channels, which were succeeded by 550 MHz connections carrying approximately 80 channels. To reduce deployment costs, these systems did not require set-top boxes; only a cable-ready TV was required. This is why, despite the huge increase in capacity, most services remained in the analog domain even though the headend was digital. Each analog video channel occupies 6 MHz, with Channel 2 starting at 54 MHz. Basic services included off-air broadcasts (e.g., NBC, government access, etc.). Expanded basic services included a wide range of cable-only channels (e.g., CNN, ESPN, TWC, etc.). Premium services added movie channels (e.g., HBO and Showtime). In all cases the CA systems were analog.
Next came the MPEG-based Digital Video Services (DVS), utilizing 750 MHz connections to deliver about 250 digital channels in addition to the analog channels. Services included Pay Per View (PPV), movie channels and Near Video On Demand (NVOD), and niche programming. A single 6 MHz band carrying 27 Mbps delivered 8–12 channels occupying 2–3 Mbps each. The additional spectrum is sometimes used for IP services such as 24-hour Internet connections or IP telephony.
Connections in the 1 GHz range are the near future, with trials underway. These new high-speed connections are not likely to speed up the Internet, as they represent the last mile to the consumer's home and cannot alleviate traffic congestion on backbone systems. Although the additional bandwidth could be used for additional programming, we are more likely to see an increase in customized services such as VOD, as well as new types of services such as HDTV, video conferencing, and video telephony.
3.1.3 Content Production Food Chain
One of the major driving forces behind interactive TV is the convergence of Internet technologies with television. As a result, many of the standards that have been proposed are derivatives of Internet content formats, such as HTML and JavaScript. The DVB MHP, OCAP [OCAP], ATVEF, and the ATSC Digital TV Application Software Environment (DASE) are all based on various HTML formats with extensions for interactive TV. The same route was followed by the SMPTE, concerned with standardization of technologies that support the production and life cycle of iTV content, when it developed the Declarative Data Essence (DDE) standard (version 1.0 is used by ATVEF). Related SMPTE standards include content exchange formats, asset management formats, communication protocols and connectors, and techniques for assembly and archiving of content.
The life of an interactive TV program starts when its video and software application are produced (see Figure 3.5). The video is produced using traditional filmmaking processes, and the software application is produced using traditional software engineering processes. Once the two components are produced, they need to be integrated by a group of creative personnel. The process of achieving a high-quality viewer experience is iterative and may require, at times, back-and-forth interaction between the integration professionals and those producing the individual components. Such interaction is often challenging and time consuming due to language gaps and misaligned goals.
Figure 3.5. A simplification of the complex food chain from authoring to viewing.
Integration is often a daunting task as well. Quality assurance and testing engineers are often needed to ensure that the content indeed plays correctly on the target set-top box. A meta-data and configuration management professional is often needed to enable various downstream distribution processes such as ad insertion and syndication.
Once the components have been edited and integrated, they are stored in archives as files on various media such as DVDs, CDs, and tapes. Many issues relate to the format and means of storage. Although traditional formats have been relatively stable, new formats have been developing so rapidly that each remains usable for only a short period of time. For example, a program recorded using a format popular in the year 2000 might not be readable by players in 2010. In contrast, a tape recorded in VHS format is usable decades after the date of recording.
Once iTV programs are released by their producers, they are likely to pass to distributors (or advertisers). The distributors may deal with content aggregators, such as cable system owners or broadcasters, who are responsible for assembling material for the media distribution channels they own. Whereas broadcasters control TV stations, cable system owners control headends. The food chain issues are further complicated as the relationships between broadcasters and cable system owners develop. For example, headend operators can extract programs from a free broadcast and pass them through to cable subscribers.
Some iTV programs provide local interactivity, whereby the viewer interacts with the receiver or set-top box, without the need for outside communications. For iTV programs that require support for remote interactivity (e.g., to perform purchases or other financial transactions online), the set-top box needs access to a return channel. To enable remote interactivity, a bi-directional virtual interactive channel needs to be maintained between the set-top box and a server hosting the iTV program. The viewer, at one end of this channel, should be connected through an ISP; the headend operator or broadcaster, at the other end of the interaction channel, should be connected through a possibly different ISP for Internet access.
The entire suite of processes is complex and detailed. As an example, gradually more detailed views of the subprocesses leading from initial scripting through production to publication, are depicted in Figures 3.6 and 3.7.
Figure 3.6. A simplified content authoring process.
Figure 3.7. A simplified process leading from production to publication.
The input to the authoring process is a collection of concepts and abstract ideas about the project. Brainstorming and detailed teamwork are often required to produce usable scripts. Taking a script to the next level, the acquisition level, requires the right combination of talent and capital, typically possessed by large studios. During the acquisition phase, content is gathered from various sources such as cameras (i.e., filming), tapes (i.e., of existing content), and libraries of iTV software components or applications. Next is a phase in which various contributions are made to the material to fill whatever gaps were identified but not filled during the acquisition phase. In the editing phase, the material is assembled into a coherent presentation and viewer experience. Because many gaps are identified during the editing phase, contribution and editing are often intertwined and regarded as a single step.
The subsequent post-production process is multifaceted. Additional video editing may be performed on one system, special effects may be introduced, and additional audio work may be needed. The final steps prior to completion are preview (i.e., quality assurance) and packaging. Added in this phase is the production meta-data, which summarizes much of the creative effort, including information about section transitions, special effects, and the synchronization of those transitions and effects with images and sounds. Once the packaging is complete, the content is ready for archiving, syndication, and streaming.
The input to the subprocess from production to broadcasting is the original material and production meta-data, such as format and resolution information, packaged at the end of the authoring process. The meta-data is passed through an editing station, which associates it with the original material in an editing storage facility. During the editing, captioning meta-data is linked to, and synchronized with, the material in the editing storage facility. From the editing storage, material is archived and organized using a content inventory system, or an asset management system, from which syndication is typically performed.
Once content is selected for broadcast, it is copied to a play-out storage facility and associated with a broadcast time slot. Next, it is reformatted and compiled according to the target broadcast venue. For on-the-air broadcast, the material is converted, using an encoder and multiplexer, into an MPEG-2 transport stream, modulated, and fed into the broadcast equipment. For Web publication, the material is placed on a Web server and reformatted to fit in a Web page. For cable broadcast, the material is encoded and multiplexed into an MPEG-2 transport stream that is placed on a data server accessible to the headend.
Some complex requirements arise even in simplified examples:
- The boundary between data essence (i.e., the content) and meta-data (i.e., information about the content) is blurry.
- Preproduction meta-data is required during acquisition for planning and guidance.
- There is a need to manage the flow of essence and meta-data from acquisition devices into multiple editing and authoring tools, and possibly preview and packaging.
- The flow of packaged material essence and meta-data into production and broadcast involves composition, integration, and, all too often, format translation.
- The meta-data must be synchronized with, and track, the essence as it is copied through a succession of physical and file-based media.
- Packaged content may be further reused and syndicated.
- Multiple versions of the content are required to support a wide range of distribution types.
- Management of bindings between essence and meta-data requires careful configuration management.
The increasing capability of multimedia authoring tools to work in a networked environment is enabling changes in these workflows. The traditional workflow, based around tape interchange, isolated nonlinear editing and authoring tools, and ad hoc meta-data systems, is being recast as a more integrated networked system with a consistent approach to the format and interchange of essence and meta-data. To enable modern, more complex automated workflows, an open, standards-based approach is advocated and pursued. The Advanced Authoring Format (AAF) is one such solution, with particular strengths in the film and television post-production industries; these strengths carry over well to the iTV realm [AAF-ASSOC] [AAF-SDK].
3.1.4 Transport versus Application Layers
Similar to Internet technologies, iTV technologies can be divided into transport and application layer groups; the underlying network, data-link, and physical layers are not within the scope of this chapter. We classify a technology as belonging to the transport layer if it is concerned with the delivery (as opposed to the processing) of data. Once the data is delivered, application layer technologies are used to process it.
3.1.4.1 Transport Layer
The transport of iTV data is based on specifications published by the IETF (e.g., IP) and by MPEG. Implementations of these transport technologies are subject to compliance with standards.
IP, defined in Request For Comments (RFC) 791, is a derivative of the Department of Defense (DoD) standard Internet protocol and is based on six earlier editions of the Advanced Research Projects Agency (ARPA) IP specification dating back to the late 1970s. IP provides for the transmission of blocks of data, called datagrams, from sources to destinations, where sources and destinations are hosts identified by fixed-length addresses. It allows large chunks of data to be fragmented and reassembled for transmission through small-packet networks.
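As a concrete illustration of the datagram model, the sketch below (a simplified reading of the RFC 791 header; the class name is ours) extracts the three fields a receiver uses to group and reorder fragments during reassembly:

/**
 * A minimal sketch reading the IPv4 header fields involved in
 * fragmentation and reassembly, per RFC 791.
 */
public class Ipv4Header {
    public final int identification;   // groups fragments of one datagram
    public final boolean moreFragments; // MF flag: more fragments follow
    public final int fragmentOffset;    // position, in units of 8 octets

    public Ipv4Header(byte[] datagram) {
        int version = (datagram[0] >> 4) & 0x0F;
        if (version != 4) {
            throw new IllegalArgumentException("not an IPv4 datagram");
        }
        identification = ((datagram[4] & 0xFF) << 8) | (datagram[5] & 0xFF);
        moreFragments = (datagram[6] & 0x20) != 0;
        fragmentOffset = ((datagram[6] & 0x1F) << 8) | (datagram[7] & 0xFF);
    }
}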
MPEG is a working group of ISO/IEC in charge of the development of standards for coded representation of digital audio and video. Established in 1988, the group has produced MPEG-1 (the standard on which products such as Video CD and MP3 are based), MPEG-2 (the standard on which products such as digital set-top boxes and DVD are based), MPEG-4 (the standard for multimedia for the fixed and mobile Web), and MPEG-7 (the standard for description and search of audio and visual content). Work on the new MPEG-21 Multimedia Framework standard started in June 2000, with the goal of providing an extensible and scalable multimedia packaging standard.
MPEG defines the transport stream format (used in DTV broadcast) and the program stream format (used on DVDs). The program format is widely used to store and play audio and video from files stored on devices such as hard disks or DVDs. The MPEG transport format differs from the program format in that it is robust against transmission errors and allows for multiplexing; in exchange, with transport streams saved to a disk, it is harder to perform operations such as rewinding.
The transport protocol stack relies on the encapsulation of audio, video, and optionally data, into a collection of MPEG packetized elementary streams, which are in turn encapsulated within MPEG transport packets (see Figure 3.8). These packets can be carried over various modulations, such as DVB-PI (for CATV/SMATV headends), 8-VSB (for ATSC broadcasts), QAM (for headends), and QPSK (for IP-based CMTS routers), and over high-speed ATM and IP networks (e.g., AALn over SONET/SDH).
Figure 3.8. The iTV transport protocol stack.
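To make the encapsulation of Figure 3.8 concrete, here is a deliberately simplified sketch of splitting one packetized elementary stream (PES) packet across fixed-size transport packets on a single PID. It is not a conformant multiplexer; a real one, for example, stuffs a partial final packet via the adaptation field rather than trailing filler bytes:

/**
 * A simplified sketch of PES-into-TS encapsulation: one PES packet
 * is split across 188-byte transport packets on a single PID.
 */
public class PesPacketizer {
    static final int TS_SIZE = 188;
    static final int HEADER_SIZE = 4;
    static final int PAYLOAD_SIZE = TS_SIZE - HEADER_SIZE;

    public static byte[][] packetize(byte[] pes, int pid) {
        int n = (pes.length + PAYLOAD_SIZE - 1) / PAYLOAD_SIZE;
        byte[][] packets = new byte[n][TS_SIZE];
        for (int i = 0; i < n; i++) {
            byte[] p = packets[i];
            p[0] = 0x47;  // sync byte
            // Payload-unit-start flag on the first packet, plus PID high bits.
            p[1] = (byte) ((i == 0 ? 0x40 : 0x00) | ((pid >> 8) & 0x1F));
            p[2] = (byte) (pid & 0xFF);  // PID low bits
            // Adaptation control = payload only; 4-bit continuity counter.
            p[3] = (byte) (0x10 | (i & 0x0F));
            int off = i * PAYLOAD_SIZE;
            int len = Math.min(PAYLOAD_SIZE, pes.length - off);
            System.arraycopy(pes, off, p, HEADER_SIZE, len);
            for (int j = HEADER_SIZE + len; j < TS_SIZE; j++) {
                p[j] = (byte) 0xFF;  // simplification: filler, not adaptation field
            }
        }
        return packets;
    }
}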
iTV content is a combination of video content, usually encoded as an MPEG bitstream, and data, usually encoded in one of a variety of standard file formats, such as class files or image formats. iTV content differs from traditional video and Internet content in that it contains links between the video and the data.
There are several approaches to linking video and data (see Figure 3.9). One approach is to provide a pointer from within the MPEG video content to data delivered over an IP network (see Figure 3.9a); this approach was taken by ATVEF. Another approach is to deliver the data together with the video, encapsulated as an MPEG elementary stream (see Figure 3.9b); this approach was taken by DVB-MHP, ATSC-DASE, SCTE-OCAP, and MPEG-4. The two approaches can also be combined (see Figure 3.9c).
Figure 3.9. The three possible approaches for enhancing TV with data.
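In the first approach, the pointer carried with the video is typically a small trigger naming the data's location. The sketch below models such a trigger generically; the types and fields are ours and do not reproduce the ATVEF wire format:

import java.net.URI;

/** A generic trigger: a pointer from video to IP-delivered data.
 *  Field names are illustrative, not the ATVEF wire format. */
record Trigger(URI resource, String name, long expiresMillis) {

    /** True if the trigger may still be offered to the viewer. */
    boolean isLive(long nowMillis) {
        return nowMillis < expiresMillis;
    }
}

class TriggerHandler {
    /** Fetch the referenced enhancement over the IP return path. */
    void onTrigger(Trigger t) {
        if (t.isLive(System.currentTimeMillis())) {
            System.out.println("Fetching enhancement: " + t.resource());
            // e.g., HTTP GET over the IP network, then render the data
        }
    }
}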
3.1.4.2 Application Layer
The iTV application layer is essentially an environment in which iTV applications execute. As an example, a Web browser could be regarded as an application layer environment, as it enables the execution of scripts, applets, and plug-ins. A typical iTV application is a collection of files containing the code and data required for the application to run; the detailed definition is problematic and is discussed in Chapter 4. An execution environment is the software program, considered part of the receiver's built-in software, that launches the iTV application.
Many organizations and consortia are working on Java-based (i.e., JVM-based) application execution environments, including DVB MHP, OpenCable OCAP, and ATSC DASE. Each application must specify a single entry (or root) file or class. When the execution environment receives an event indicating that a new application is available, it instantiates the entry class, which implements the JavaTV Xlet interface, and invokes its methods to launch the application [JavaTV].
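As an illustration, a minimal entry class might look like the following sketch; the lifecycle interface is the standard javax.tv.xlet.Xlet, while the application logic here is placeholder:

import javax.tv.xlet.Xlet;
import javax.tv.xlet.XletContext;
import javax.tv.xlet.XletStateChangeException;

/** A skeletal iTV application entry class, instantiated by the
 *  execution environment when the application is launched. */
public class HelloITvXlet implements Xlet {
    private XletContext context;

    public void initXlet(XletContext ctx) throws XletStateChangeException {
        this.context = ctx;  // keep the context for later use
        // Initialization must be quick; allocate scarce resources lazily.
    }

    public void startXlet() throws XletStateChangeException {
        // Build and show the GUI, subscribe to stream events, etc.
        System.out.println("iTV application started");
    }

    public void pauseXlet() {
        // Release scarce resources (e.g., media decoders) while paused.
    }

    public void destroyXlet(boolean unconditional)
            throws XletStateChangeException {
        // Release all resources; the environment may reclaim memory.
    }
}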
From the IP point of view, the application layer includes the methods by which packet transmission is utilized to implement the File Transfer Protocol (FTP) [FTP] or a request-response mechanism such as the Hyper Text Transfer Protocol (HTTP). From the iTV point of view, however, FTP and HTTP may be regarded as part of the transport, since their implementation is not the responsibility of the iTV application.
Some application layer technologies are subject to standardization whereas others are not. As a general rule of thumb, standardization compliance is required whenever a technology impacts the predictability of execution of an iTV application authored in an environment different from the target receiver. The implication is that the Java APIs should be standardized and receiver support should be required. Further, some of the interactions between transport events and applications (e.g., acquisition of data from the transport) should be standardized; especially complex issues arise when support for synchronization is expected.
3.1.5 Relationship between Transport and Application Technologies
Transport technologies enable the data portion of the iTV program to be encapsulated by the broadcasting equipment in a fashion that can be decoded by receivers. Decoder technologies enable detecting data, extracting it from the transport, and storing it in the receiver's memory. Whereas the input interfaces to decoder technologies must comply with standards, the output interfaces, which define the seam between the transport and application layer technologies, are not standardized.
As data is received, the transport layer components within a receiver notify the application layer components (see Figure 3.10). If the received data contains sufficient information to enable a launch, minimally including the entry file or class, then the application execution environment should either notify the viewer of the application's availability or launch the application without prompting the viewer.
Figure 3.10. The event-based relationship between transport and application layer technologies.
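Because this seam is not standardized, its shape varies by implementation; the following sketch (all names are ours) illustrates one plausible event interface between a demultiplexer and an application manager:

/**
 * A hypothetical seam between the layers: the transport side notifies
 * the application side as data modules (e.g., files) are extracted.
 */
interface TransportEventListener {
    /** Called when a complete data module is available. */
    void dataAvailable(String path, byte[] contents);
}

class ApplicationManager implements TransportEventListener {
    public void dataAvailable(String path, byte[] contents) {
        // Once the entry class of a new application has arrived, the
        // environment may notify the viewer or auto-launch the application.
        if (path.endsWith("Main.class")) {
            System.out.println("New application available: " + path);
        }
    }
}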
An MPEG-2 network is a collection of MPEG-2 transport streams; each transport stream contains one or more MPEG-2 programs, each program contains one or more program elements, and each program element is delivered by a sequence of packets. Each transport stream is identified by a transport stream ID, which must be unique within a network.
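This containment hierarchy is easy to summarize in code; the toy model below (type names are ours, expressed as Java 16+ records for brevity) mirrors the nesting just described:

import java.util.List;

/** A toy model of the MPEG-2 containment hierarchy. */
record ProgramElement(int pid, String kind) {}          // e.g., video, audio, data
record Program(int programNumber, List<ProgramElement> elements) {}
record TransportStream(int transportStreamId,           // unique within a network
                       List<Program> programs) {}
record Mpeg2Network(int networkId, List<TransportStream> streams) {}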