- Overview
- Broadcast Model
- Interactivity
- Data Delivery
- Authoring Content
- Packaging Content
- References
3.6 Packaging Content
Once authored, content is typically stored in a format that cannot be broadcast directly. A key step in the packaging process is to encode the content using MPEG encoders to produce MPEG transport streams (see Figure 3.24). Each elementary stream, namely each video, audio, and data component, is first encoded in the Packetized Elementary Stream (PES) format. These large packets, typically kilobytes long, are then broken down into transport stream packets of 188 bytes each.
Figure 3.24. Steps of content packaging.
The conversion illustrated in Figure 3.24, from a video program of any format into an MPEG transport stream, requires specifying the desired bit-rate. Since the content has a known duration and a known size, the bit-rate required for encoding that content is the total size divided by the duration. For example, a 5 MByte, 10 second MPEG video would require a bandwidth of 4 Mbps.
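The size-over-duration rule can be sketched as a one-line calculation (a sketch; the function name is hypothetical):

```python
def required_bitrate_mbps(size_mbytes: float, duration_s: float) -> float:
    """Bit-rate needed to deliver a file of known size in a known duration."""
    return size_mbytes * 8 / duration_s

# The 5 MByte, 10 second example from the text:
print(required_bitrate_mbps(5, 10))  # 4.0 (Mbps)
```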
The dominant factors directly impacting the size of the video file, and indirectly the bandwidth required for its transmission, are the resolution and the frame rate. The higher the resolution of each frame, the more data is stored per frame and the larger the file becomes; the more frames per second are encoded, the more data is stored per second. Consequently, one can reduce the bandwidth requirements by reducing the resolution, the frame rate, or both.
3.6.1 Null Packet Insertion
At the physical layer, broadcast channels typically have a fixed bandwidth. That is, the total number of bits delivered per second over a physical frequency spectrum is fixed. Just as data cannot be delivered faster than the channel's bandwidth, the channel also cannot deliver no data: receivers read bits from the communications line regardless of whether those bits are meaningful.
In contrast, at the logical layer, e.g., the MPEG transport layer, data can be transmitted at a variety of bandwidths. Exceeding the bandwidth of a broadcast channel is not within the scope of the MPEG standard, although some extensions provide such capability; the only case that can realistically be considered is the one in which the bandwidth requirements are smaller than the physical bandwidth. For this case, the MPEG standard provides a means to distinguish between bits that carry content and bits that do not. For example, a broadcast channel in which 50% of the bits deliver content has a utilization of 50%.
The smallest data unit in an MPEG transport stream is a packet. Instead of specifying which bits carry data and which should be ignored, MPEG specifies which packets carry data and which packets should be ignored. Null packets, identified by a PID of 0x1FFF, carry no data and are therefore ignored by decoders. The process of adding, or stuffing, null packets into a transport stream to reach a desired target bandwidth is called null packet insertion.
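Null packet insertion can be sketched as follows; the 4-byte header layout is simplified (continuity counters and adaptation fields are ignored), and the PIDs and per-interval packet count are illustrative assumptions:

```python
TS_PACKET_SIZE = 188
NULL_PID = 0x1FFF

def make_packet(pid: int, payload: bytes = b"") -> bytes:
    """Build a minimal 188-byte transport packet: sync byte, 13-bit PID, padding."""
    header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
    body = payload[: TS_PACKET_SIZE - 4]
    return header + body + b"\xff" * (TS_PACKET_SIZE - 4 - len(body))

def stuff_to_rate(content_packets, packets_per_interval):
    """Pad a burst of content packets with null packets so each interval
    carries exactly the channel's fixed packet count."""
    out = list(content_packets)
    while len(out) < packets_per_interval:
        out.append(make_packet(NULL_PID))
    return out

burst = [make_packet(0x100) for _ in range(3)]   # hypothetical video PID 0x100
interval = stuff_to_rate(burst, 4)               # channel carries 4 packets/interval
print(len(interval), interval[-1][1:3].hex())    # 4 1fff -- last packet is a null
```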
Assuming that the frame rate is fixed, the number of bits required to encode each frame varies, and therefore the number of bits carrying content over short periods (e.g., 1 ms) may vary as well (see Figure 3.25). As an example, assume broadcasting over a channel with a fixed bandwidth of 6 Mbps; on average, 6 Kbits are delivered during each millisecond. At 30 frames per second, this translates to a maximum of about 200 Kbits, or 25 KB, per frame. It is not uncommon for some frames to require 5 KB while others require 20 KB. During the 33.3 ms in which a 5 KB frame is transported, the content uses a bandwidth of about 1.2 Mbps, and null packets fill the remaining 4.8 Mbps of unused bandwidth; during the 33.3 ms in which a 20 KB frame is transported, the content uses about 4.8 Mbps, and null packets fill the remaining 1.2 Mbps.
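The per-frame utilization arithmetic for the 6 Mbps, 30 fps example can be checked with a short sketch:

```python
CHANNEL_MBPS = 6.0
FPS = 30.0
FRAME_PERIOD_S = 1 / FPS  # ~33.3 ms per frame

def utilization(frame_kbytes: float) -> float:
    """Fraction of channel capacity used while one frame is delivered."""
    content_mbps = frame_kbytes * 8 / 1000 / FRAME_PERIOD_S
    return content_mbps / CHANNEL_MBPS

for kb in (5, 20):
    mbps = utilization(kb) * CHANNEL_MBPS
    print(f"{kb} KB frame: {mbps:.1f} Mbps ({utilization(kb):.0%} utilization)")
# 5 KB frame: 1.2 Mbps (20% utilization)
# 20 KB frame: 4.8 Mbps (80% utilization)
```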
Figure 3.25. The notion of varying bit-rate.
Since the maximum physical bandwidth cannot be exceeded and the number of bytes required per frame is not fixed, 100% utilization of the transport channel is not achievable. Good utilization can be achieved by encoding the data so that the largest frames approach the full bandwidth while the variation in frame sizes is kept small.
Since the bandwidth is typically not fully utilized, the unused portion can be used for the carriage of data. Replacing null packets with packets delivering content (which may be unrelated to the video) is called opportunistic insertion.
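Opportunistic insertion amounts to replacing null packets in place; a toy sketch, with packets modeled as simple dictionaries rather than real transport packets (the PIDs are illustrative):

```python
NULL_PID = 0x1FFF

def opportunistic_insert(stream, data_queue):
    """Replace null packets with queued data packets; the stream's timing is
    unchanged because the total packet count stays the same."""
    data = iter(data_queue)
    return [next(data, pkt) if pkt["pid"] == NULL_PID else pkt for pkt in stream]

stream = [{"pid": 0x100}, {"pid": NULL_PID}, {"pid": 0x101}, {"pid": NULL_PID}]
filled = opportunistic_insert(stream, [{"pid": 0x1FF0}])  # one data packet queued
print([hex(p["pid"]) for p in filled])
# ['0x100', '0x1ff0', '0x101', '0x1fff'] -- second null stays; the queue ran dry
```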
3.6.2 Multiplexing Data with Video and Audio
Typically, a single broadcast stream is a multiplex of packets carrying various content types (see Figure 3.26). Data can be transported in packets in the same way that audio and video content is. Therefore, broadcasting data files requires converting them into a packetized format that can be multiplexed and streamed within MPEG transport streams. The MPEG standard specifies the Data Carousel and Object Carousel formats for that purpose.
Figure 3.26. Multiplexing data with video and audio.
As opposed to video and audio streams, data files are bounded in length, and typically a file can be processed only once it has been fully loaded by an application. Receivers that are not tuned to a channel at the time a file's transmission starts cannot load the file. Because it is impossible to control at what point during the broadcast viewers tune to the channel carrying the data files, the file's data must be transmitted repeatedly. As an example, Figure 3.26 illustrates repeating a group of packets delivering data1 twice.
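The repeated transmission can be sketched as an endless cycle over a file's blocks, loosely in the spirit of a DSM-CC carousel; the file name and the 4066-byte block size are illustrative assumptions:

```python
import itertools

def carousel(files, block_size=4066):
    """Endlessly cycle the blocks of each file so that a receiver tuning in
    at any time eventually sees every block."""
    blocks = []
    for name, data in files.items():
        for i in range(0, len(data), block_size):
            blocks.append((name, i // block_size, data[i : i + block_size]))
    return itertools.cycle(blocks)

c = carousel({"app.html": b"x" * 9000})      # hypothetical 9 KB file -> 3 blocks
seen = [next(c)[:2] for _ in range(5)]
print(seen)
# [('app.html', 0), ('app.html', 1), ('app.html', 2), ('app.html', 0), ('app.html', 1)]
```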
Multiplexers typically receive input from various content packetizers, which in turn take their input from various streamers (see Figure 3.27). Video and audio streams are encapsulated within MPEG transport stream packets. Data services are streamed out of a data server and encapsulated as DSM-CC data or object carousel streams within MPEG transport stream packets. The multiplexer stores the packets it receives in a buffer until they are transmitted. A conditional access (CA) module generates packets carrying the information that receivers use to decipher the data on reception. The bandwidth allocation is controlled by packet schedulers, also known as emission managers, which signal to the multiplexer information used to determine which buffered packet to place on the output. Once a packet is placed on the output, it is removed from the buffer.
Figure 3.27. A simplified multiplexing and emission diagram.
Typically, multiplexing is parameterized by the bandwidth allocation for each input elementary stream, as well as by the output bit-rate. On initialization, the multiplexer preloads data into a temporary buffer whose size is sufficient to store anywhere from a few milliseconds to several seconds of data. Packets from all packetizers enter the buffer at each elementary stream's bit-rate. Intuitively, the multiplexer orders the packets so that every packet in the buffer is guaranteed to be transmitted before the buffer is emptied. Because in practice circular (round-robin) buffers are used, the buffer is never actually emptied; instead, the guarantee holds for the duration of one buffer round.
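A toy version of such a buffer-and-schedule multiplexer, using fixed per-stream packet weights per scheduling round (the stream names and weights are illustrative assumptions):

```python
from collections import deque

class Multiplexer:
    """Toy multiplexer: buffers packets per input stream and emits them
    according to fixed per-stream packet weights (bandwidth shares)."""

    def __init__(self, weights):
        self.weights = weights                       # e.g. {"video": 3, "data": 1}
        self.buffers = {name: deque() for name in weights}

    def push(self, stream, packet):
        self.buffers[stream].append(packet)

    def round(self):
        """Emit one scheduling round: each stream sends up to its weight."""
        out = []
        for name, weight in self.weights.items():
            for _ in range(weight):
                if self.buffers[name]:
                    out.append(self.buffers[name].popleft())
        return out

mux = Multiplexer({"video": 3, "data": 1})
for i in range(4):
    mux.push("video", f"v{i}")
mux.push("data", "d0")
first = mux.round()
print(first)  # ['v0', 'v1', 'v2', 'd0'] -- v3 waits for the next round
```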
The data server is a complex component that may not reside in the same organization as the multiplexer and its related components. The data server often contains a Web application server and a database server (see Figure 3.28). In addition to media streaming, the organization controlling the data server may be responsible for the operation of the data service, including billing, collecting usage statistics, and providing customer service.
Figure 3.28. The iTV Server architecture used by Microsoft TV.
The use of a buffer introduces a delay between the time packets enter the multiplexer and the time multiplexed packets exit. If a buffer can accommodate all packets transmitted during 1 second, then there can be up to a 1 second delay between input and output. Cascading multiplexers causes the delays to accumulate linearly; as a result, the same content traveling along different transmission paths may arrive with different delays. In practice, the total accumulated delay can reach 3 seconds. For example, there might be a 3 second delay between the signal received from a TV antenna and the same signal received through a cable set-top box.
Increasing the size of the buffer increases the delay but, at the same time, increases the average accuracy of the bandwidth allocation. The minimum buffer size depends on the required bandwidth-allocation accuracy, the bit-rates to be supported, and the computational power of the multiplexer. For example, assuming a 19.2 Mbps HDTV channel, about 13 packets (of 188 bytes each) need to be processed, scheduled for output, and removed from the buffer every millisecond. If the bandwidth requirements are described in units of 1/13, then a small 1 millisecond buffer suffices: for example, one could allocate 10 of the 13 packets (about 77%, or ~14.77 Mbps) to video, 1 of the 13 (about 7.7%, or ~1.47 Mbps) to a dozen high-fidelity audio streams, and 2 of the 13 (about 15.4%, or ~2.95 Mbps) to data streams. If, instead, only 1% of the bandwidth must be allocated to audio, units of 1/13 are too coarse; achieving 1% accuracy in bandwidth allocation requires buffering 100 packets, namely 18,800 bytes of data. At a bit-rate of 19.2 Mbps, this translates to a delay of about 8 milliseconds, roughly 1/4 of a video frame. To improve the quality of the multiplexing, each of the elementary streams may already be padded with null packets to achieve a fixed bit-rate that is used in the multiplexing.
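The relationship between allocation granularity, buffer size, and delay can be computed directly (a sketch; the function name is hypothetical):

```python
TS_PACKET_BYTES = 188

def min_buffer_for_accuracy(granularity: float, bitrate_bps: float):
    """Smallest scheduling window able to express bandwidth shares with the
    given granularity: packet count, byte size, and the delay it introduces."""
    packets = round(1 / granularity)             # e.g. 1% accuracy -> 100 packets
    size_bytes = packets * TS_PACKET_BYTES
    delay_ms = size_bytes * 8 / bitrate_bps * 1000
    return packets, size_bytes, delay_ms

packets, size_bytes, delay_ms = min_buffer_for_accuracy(0.01, 19.2e6)
print(packets, size_bytes, round(delay_ms, 1))  # 100 18800 7.8
```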
The allocation of bandwidth drastically impacts the viewer experience. Table 3.4 lists the standard ATSC data service profiles; opportunistic insertion implies insertion of data into null packets already present in the transport. Figure 3.29 illustrates how bandwidth allocation could be done for ATSC broadcasts. The MPEG PSI would point to a small 4 KB Data Service Table (DST) that is transmitted using about a third of one G1 profile and repeated about twice per second (i.e., a receiver acquires it within at most 0.5 second of tuning). That DST points to the Initial Working Set (IWS) of all applications, which in total should not exceed 96 KB, transmitted using three (actually 2.66) G1 profiles with an approximate repeat rate of 3 seconds (i.e., an application could launch within at most 3 seconds after tuning). The IWS contains code that loads the bulk of the application's data, assumed to average at most 2.8125 MB per data service, repeating every 30 seconds; assuming 4 applications are delivered and may share data, one could use 2 MB per application plus 3.25 MB of shared data (i.e., 2 + 3.25/4 = 2.8125 MB). Additional large data, such as video clips, must be delivered outside the transport, either through a dedicated channel or over the Internet.
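The repeat-rate arithmetic behind these (approximate) figures reduces to size divided by repeat period, where the repeat period is also the worst-case acquisition time after tuning; a minimal sketch:

```python
def repeat_bandwidth_bps(table_bytes: float, repeat_period_s: float) -> float:
    """Bandwidth consumed by retransmitting a table every repeat_period_s
    seconds; the repeat period is also the worst-case acquisition time."""
    return table_bytes * 8 / repeat_period_s

print(repeat_bandwidth_bps(4_000, 0.5))   # 4 KB DST twice per second -> 64000.0 bps
print(repeat_bandwidth_bps(96_000, 3.0))  # 96 KB IWS every 3 seconds -> 256000.0 bps
```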
Figure 3.29. An example of bandwidth allocation scheme.
Table 3.4. ATSC Data Service Profiles
| Profile Attribute | Profile G1 | Profile G2 | Profile G3 | Opportunistic A1 |
| --- | --- | --- | --- | --- |
| Max terrestrial data rate | 383,896 bps | 3,838,960 bps | ~19.2 Mbps | up to ~19.2 Mbps |
| Smoothing buffer size | 4500 bytes | 4500 bytes | 10000 bytes | 10000 bytes |
3.6.3 Signaling Insertion
Although the video, audio, and data streams coexist in the transport, they may be unrelated or otherwise not linked. To specify that a certain group of data files is associated with specific video and audio, packets must be inserted that specify the bindings, or links, between the streams. The insertion of such packets carrying these bindings is called signaling.
The MPEG standard provides a signaling mechanism called the PSI. The PSI is delivered using the PAT, PMT, and Conditional Access Table (CAT) described earlier. Each instance of these tables may be transported in multiple packets. To bind the various video, audio, and data streams that comprise a program, the PMT specifies one entry for each stream.
The construction of a PAT, PMT, and CAT is performed when the content is converted into a transport stream format. Because there are strict timing requirements on the repeat frequency of the PMT, multiplexers need to be careful to place the packets carrying the PMT such that the number of packets between the end of one occurrence and the beginning of the next does not exceed the requirement.
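The PMT spacing constraint can be expressed as a packet budget; a sketch assuming a hypothetical 100 ms maximum repetition interval (the actual requirement depends on the applicable standard):

```python
TS_PACKET_BITS = 188 * 8

def max_packet_gap(mux_bitrate_bps: float, max_interval_s: float) -> int:
    """Largest number of packets allowed between two occurrences of the PMT
    so the repetition-interval requirement is met at a given mux rate."""
    return int(mux_bitrate_bps * max_interval_s / TS_PACKET_BITS)

# Assumed 100 ms maximum repetition interval on a 19.2 Mbps multiplex:
print(max_packet_gap(19.2e6, 0.1))  # 1276
```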
When inserting data into an MPEG transport, the potential complexity of data streams requires a dedicated data binding table in addition to the PMT. The DVB MHP uses the Application Information Table (AIT) to signal the binding of applications to the video and audio streams; the ATSC DASE uses the DST of the A90 Data Broadcast Standard.
The ATSC Data Broadcast Standard, for example, introduces the Service Description Framework (SDF), bridging between program components delivered in a broadcast and those components accessible through the Internet. The bridging is performed by introducing an entry in the PMT pointing to a DST which points to the data elementary streams delivered within the MPEG transports. In addition, a Network Resource Table (NRT), embedded within the MPEG transport, is used to specify URLs of resources that are used in conjunction with the video of the program. The ATSC A90 specification explains how the SDF is used for the discovery of program elements and the binding of iTV applications to these program elements.
The specification describes the additions to the ATSC Program and System Information Protocol (PSIP) standard to enable announcement of data services within the existing EPG mechanisms.
Virtual channels are associated with data services through PSIP as well. Each virtual channel in the VCT points to a program number, which uniquely identifies a PMT pointing to the SDF of the data services for that virtual channel. This implies a model in which each virtual channel is associated with a collection of data elementary streams, in addition to the audio and video streams.
3.6.4 Announcement Insertion
To enable live broadcast of EPG, iTV broadcasting standards provide various mechanisms for inserting EPG data streams announcing programs before they are aired. This announcement is performed by inserting MPEG tables that are defined by each announcement standard.
For example, the ATSC PSIP standard is used in North America to transport program guide information (see Chapter 13 for details). PSIP is a collection of hierarchically arranged tables for describing system information and program guide data. The following tables are used:
- The Master Guide Table (MGT), defining the types, PIDs, and versions of all the other PSIP tables in the transport stream, except for the STT.
- The System Time Table (STT), defining the current date and time of day, expected to be accurate to within seconds. The STT enables receivers to perform time-related operations, such as displaying the show time or starting to record a program; as an example, cell phones use a similar concept to determine the local time.
- The Virtual Channel Table (VCT), carried within the ATSC MPEG-2 transport stream, tabulating the virtual channel attributes required for navigation and tuning. All virtual channels are listed in the VCT and are associated with a service location descriptor. The VCT exists in two versions, one for terrestrial and one for cable applications; the two are similar in structure, with the latter redefining the semantics of some fields pertinent to cable operations.
- The Rating Region Table (RRT), defining the TV parental guideline system referenced by any content advisory descriptor carried within the transport stream. Different rating tables are valid for different regions or countries; they are often mandated by law.
- The Event Information Tables (EIT), used to announce upcoming and currently airing programs. The first four EITs (EIT-0 through EIT-3) together describe 12 hours of events (TV programs), each covering 3 hours.
- An Extended Text Table (ETT), associated with each event to provide a description of that event.
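A simplified sketch of the EIT slot arithmetic (real EITs are aligned to 3-hour UTC boundaries rather than measured from the current time):

```python
def eit_index(hours_ahead: float) -> int:
    """Which EIT announces an event starting this many hours ahead, with
    EIT-0 through EIT-3 each covering a consecutive 3-hour slot (simplified)."""
    return int(hours_ahead // 3)

print([eit_index(h) for h in (0, 2.5, 3, 7, 11.9)])  # [0, 0, 1, 2, 3]
```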
3.6.5 Security Considerations
Participants in the iTV food chain must give consideration to their threat models. The security of any specific path through the food chain depends heavily on the security and integrity of its operating procedures, its personnel, and the administrative enforcement of those procedures. Some believe that the design of a security framework should be driven by the anticipated attacks. Table 3.5 lists some of the threats that a security framework should address, Table 3.6 lists some of the security enhancement strategies that could be used, and Table 3.7 lists security architecture patterns and services addressing these threats. See Chapter 10 for more details.
Table 3.5. Threats to Consider
| Threat | Description |
| --- | --- |
| Masquerade | In a masquerade attack, the attacker impersonates a trusted entity, and may thereby gain access to sensitive data and perform privileged operations. |
| Modification | In a modification attack, content is altered or malicious content is injected during its transport from one entity or organization to another. |
| Repudiation | In a repudiation attack, an entity denies having performed an action, such as having sent or received content, undermining accountability. |
| Unauthorized operation | The most common attack of this kind is the purging of data. Redundant storage and conservative update policies can reduce the risk of data loss. |
| Interception | Sensitive data, such as credit card information, can be intercepted by adversaries without changing its route. |
| Denial of Service | Attackers may flood a server with traffic beyond its capacity to process it. |
| MisRouting | Attackers may take control of data switches and servers for the purpose of misrouting data. |
Table 3.6. Security Strategies
| Issue | Description |
| --- | --- |
| Know who to "trust" | Because broadcasters are fully accountable for the content they air, they must know who prepared the content to have some degree of confidence that the terms of the contract with the distributor or producer are met. |
| Mind your own "business" | It should be possible to perform all security services on the relevant portions of the content without touching irrelevant portions. |
| Only what is signed is secure | Signatures over updated content do not secure any information discarded by the update; only what is signed is secure. |
| Only what is "seen" should be signed | A signature secures any information introduced by enhancements to content, including both visible and invisible enhancements. |
| "See" what is signed | Food-chain participants should only have access to, and be able to manipulate, content that has been updated and signed. |
| Avoid unnecessary encryption | An encrypted data file may be archived together with other files, and the entire archive encrypted again; a failure then risks loss of access to critical data. |
| Secure the updates | One should be extremely careful about potential weaknesses introduced between the original and a temporary non-secure version of the data. |
| Know who "has" which keys | With public key signatures, a very large number of parties can hold a variety of keys. |
| Algorithms, key lengths, certificates | The strength of a particular signature depends on all the links in the security chain. |
Table 3.7. Security Architecture Patterns and Services Addressing Threats.
| Service | Addressed Threats |
| --- | --- |
| Authentication of Client and Server Operation | Masquerade, Unauthorized Operation |
| Electronic Signatures | Masquerade, Modification |
| Virtual Private Network (VPN) | Masquerade, Unauthorized Operation, Interception |
| Firewall and proxy services | Interception, Repudiation |
| Encryption | Interception, Unauthorized Operation |
| Dynamic IP addressing, Dynamic DNS | MisRouting, Denial of Service |
| Network monitoring | Denial of Service, Repudiation |
3.6.6 Conclusion
The iTV food chain is long and complex. Due to this complexity, participants in the food chain often have to consider a variety of new technical issues that greatly impact the viewer experience of the integrated iTV program. These include envisioning the types of interactivity desired, the interactivity model (i.e., local vs. remote), the data integration points (e.g., during scripting, editing, or production), the technologies to be used (e.g., Java vs. HTML), the delivery model (i.e., push vs. pull), protocols and bandwidth (i.e., network infrastructure), the target receivers (i.e., receiver requirements), security considerations (if any), EPG announcement options, and the asset management model (e.g., AAF). Whereas the traditional TV food chain has addressed many of these issues in isolation, the introduction of interactivity often requires considering the interactions between them. For example, the use of remote interactivity raises the issue of hot spots, as millions of receivers may request simultaneous access to the same data (e.g., an ad's Web page). These interactions are often critical in determining the viability of an iTV program.