The Moving Picture Experts Group
Leonardo Chiariglione – Convenor, ISO/IEC JTC 1/SC 29/WG 11 (MPEG)
H. Nyquist [1] and W. R. Bennett [2] laid the foundations of digital signal processing, the former by establishing the conditions for statistical equivalence between time-continuous and sampled signals, and the latter by setting statistical bounds to errors for quantised (so-called Pulse Code Modulation or PCM) signals, i.e. converted to a form suitable for handling by digital computing machines.
If analogue signals of primary interest to humans – audio and video – are converted to digital according to Nyquist’s and Bennett’s precepts (a process that will henceforth be called “digitisation”), very high bitrate PCM signals are obtained. Although “high” is a reflection of the technological times (1.41 Mbit/s was exceedingly “high” in the early days of the internet, thus prompting users to adopt the highly efficient MP3 compression format, see later), the 216 Mbit/s of digital television is unmanageable even today in most open environments. This obstacle, along with the advantages to be gained by overcoming it, led to the creation of a new field of study: reduction of the bitrate of digitised audio and video signals, if possible without distortion, otherwise with a controlled distortion.
The first target application was in the speech area, because of the drive started in the 1960s to digitise the telecommunication networks and because telephone speech has from the beginning been confined to the frequency band of 0.3 to 3.4 kHz and therefore yields a rather low bitrate. Sampling at 8 kHz with 8 bits per sample (companded, i.e. non-linearly quantised) gives 64 kbit/s, as enshrined in International Telecommunication Union, Telecommunication Standardisation Sector (ITU-T) Recommendation G.711 [3].
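The bitrates quoted above follow from multiplying sampling rate, bits per sample and number of channels. The sketch below reproduces them under stated assumptions: that the 1.41 Mbit/s figure refers to CD-style stereo audio (44.1 kHz, 16 bits) and that the 216 Mbit/s figure refers to 4:2:2 component video with 13.5 MHz luminance and two 6.75 MHz chrominance signals at 8 bits each. The text does not spell out these parameters, and the function name is illustrative.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Bitrate in bit/s of an uncompressed PCM stream."""
    return sample_rate_hz * bits_per_sample * channels


# Telephone speech as in G.711: 8 kHz sampling, 8 bits per sample.
telephone = pcm_bitrate(8_000, 8)

# Assumption: stereo audio at 44.1 kHz with 16 bits per sample.
cd_audio = pcm_bitrate(44_100, 16, channels=2)

# Assumption: 4:2:2 component video, one luminance signal sampled at
# 13.5 MHz plus two chrominance signals at 6.75 MHz, 8 bits each.
digital_tv = pcm_bitrate(13_500_000, 8) + 2 * pcm_bitrate(6_750_000, 8)

print(telephone, cd_audio, digital_tv)
```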
Various algorithms have been employed to compress speech signals. The most straightforward ones, based on DPCM (i.e. differential PCM), were not particularly successful because they could compress only down to 32 kbit/s, a saving generally not large enough to justify adoption of the technology in the network.
Digital video took longer to surface because the bitrate resulting from digitisation was 3 orders of magnitude larger. Still, ITU-T Recommendation H.120 applied DPCM to contiguous video samples within a video frame (hence called “intraframe coding”) of a subsampled version of TV signals for videoconference, and achieved further reduction by exploiting correlation between contiguous frames (hence called “interframe coding”). Thus the input bitrate of about 40 Mbit/s could be reduced to 1.5 or 2 Mbit/s. This system, too, was not particularly successful because the bitrate was still too high and the compression/decompression equipment too expensive.
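As an illustration of the DPCM principle mentioned above, the following is a minimal sketch, not the algorithm of any particular Recommendation. It assumes the simplest possible predictor (the previously reconstructed sample) and a uniform quantiser with a hypothetical step size; real speech and video coders use more elaborate predictors and quantisers.

```python
def dpcm_encode(samples, step=4):
    """Quantise the difference between each sample and its prediction."""
    codes = []
    prediction = 0  # the decoder starts from the same value
    for sample in samples:
        code = round((sample - prediction) / step)
        codes.append(code)
        # Track the decoder's reconstruction so that quantisation
        # errors do not accumulate from sample to sample.
        prediction += code * step
    return codes


def dpcm_decode(codes, step=4):
    """Rebuild the samples by accumulating the dequantised differences."""
    samples = []
    prediction = 0
    for code in codes:
        prediction += code * step
        samples.append(prediction)
    return samples


original = [100, 102, 105, 104, 101, 99]
codes = dpcm_encode(original)
decoded = dpcm_decode(codes)
print(codes)
print(decoded)
```

Because neighbouring samples are correlated, most codes after the first are small numbers that need fewer bits than the samples themselves. The price is a reconstruction error of at most half the step size per sample.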
In the 1980s many were working on video and audio coding: Nippon Hoso Kyokai (NHK) developed and deployed an innovative hybrid (analogue/digital) HDTV transmission system called MUSE that led the Europeans to devise their own solution called HD-MAC; ITU-T developed a new video compression Recommendation, H.261, that applied intraframe Discrete Cosine Transform (DCT) coding with motion-compensated interframe prediction; RAI-Telettra and General Instrument developed and manufactured HDTV codecs at bitrates that were thought to be unachievable until then; Philips and RCA developed and manufactured systems for interactive video on compact disc (CD) called CD-i and DVI, respectively; another branch of the ITU-T called CMTT studied a so-called “contribution” (i.e. “between studios”) codec; and a group of European companies and institutions developed the Digital Audio Broadcasting (DAB) system specifications within the Eureka project EU 147 DAB.
One might have thought that a buoyant competitive market should have been left free to produce its own results.
Instead MPEG was established as a working group of the International Organisation for Standardisation (ISO) with the idea that the only way for digital audio and video to succeed, in a relatively short time, was based on a reference standard without the myriad technological barriers that had been imposed on analogue audio and video. The right time for that standard was toward the end of the 1980s because video and audio compression performance and VLSI implementability were heading for their first intersection sometime in the early 1990s.
Interactive audio and video on CD was thought to be the first business case for the standard that was eventually called MPEG-1 [5]. The standard is organised in five parts:
Part 1 | Systems
Part 2 | Video
Part 3 | Audio
Part 4 | Conformance testing
Part 5 | Software simulation
Systems (defined in part 1 of the standard) is a packet-based multiplexer that can carry m video streams and n audio streams, all with the same time base. The stream carries timing information so that the receiving device can reconstruct a faithful replica – within the accuracy enabled by the standard – of the information generated at the encoder.
Video (defined in part 2 of the standard) provides a powerful compression technique based on the following assumptions:
MPEG-1 Video is a generic algorithm that can work with any parameter set. As this does not give enough guidance to build interoperable devices, MPEG-1 defines a Constrained Parameter Set satisfying the following conditions:
M | ≤768
N | ≤576
#macroblocks/picture | ≤396 (352x288/256)
#macroblocks/second | ≤9900 (396x25)
Picture rate | ≤30 Hz
Interpolated pictures | ≤2
Bitrate | ≤1856 kbit/s
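The limits listed above can be turned into a simple check. The sketch below is illustrative only: it assumes that M and N denote the picture width and height in samples and that a macroblock covers 16x16 samples (which is what the "/256" in the list suggests). The function and parameter names are hypothetical.

```python
import math


def macroblocks_per_picture(width, height):
    """Number of 16x16 macroblocks needed to cover a picture."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def satisfies_cps(width, height, picture_rate_hz, bitrate_kbit_s,
                  interpolated_pictures=2):
    """Check a parameter set against the limits listed above.

    Assumption: M is the picture width and N the picture height.
    """
    mbs = macroblocks_per_picture(width, height)
    return (
        width <= 768
        and height <= 576
        and mbs <= 396
        and mbs * picture_rate_hz <= 9900
        and picture_rate_hz <= 30
        and interpolated_pictures <= 2
        and bitrate_kbit_s <= 1856
    )


print(satisfies_cps(352, 288, 25, 1150))  # 396 macroblocks at 25 Hz
print(satisfies_cps(352, 240, 30, 1150))  # 330 macroblocks at 30 Hz
print(satisfies_cps(720, 576, 25, 1150))  # too many macroblocks
```

Note how the macroblock-rate limit couples picture size and picture rate: a 352x288 picture fits at 25 Hz but not at 30 Hz, even though each limit taken alone would allow it.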
Audio (defined in part 3 of the standard) includes three compatible versions called “layers” where
A “layer n” decoder is capable of decoding bitstreams of lower layers but not of higher layers.
A reference MPEG-1 diagram is given in Figure 1.
Figure 1 – MPEG-1 reference diagram
“MPEG-1 stream decoder” is specified by Part 1, “Video decoder” is specified by Part 2 and “Audio decoder” is specified by Part 3.
Specifically, MPEG-1 standardises the syntax and semantics of the bitstream. In addition, only the decoding process is subject to the standard, while the encoding process and the decoder’s internal data representation are non-normative.
Additionally MPEG-1 has innovated the landscape of standards by providing
The performance of MPEG-1 Audio, as tested in the early 1990s, is transparency at 384 kbit/s (Layer I), at 256 kbit/s (Layer II) and at 192 kbit/s (Layer III), where “transparency” means that experts (so-called golden ears) are statistically unable to distinguish the original PCM stereo sound, sampled at 48 kHz with 16 bits/sample, from the coded version.
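The compression ratios implied by these figures can be worked out from the bitrate of the PCM source described above (stereo, 48 kHz, 16 bits/sample). The short calculation below uses only numbers from the text; the variable names are illustrative.

```python
# Bitrate of the PCM reference signal: 2 channels x 48 kHz x 16 bits.
PCM_KBIT_S = 2 * 48_000 * 16 / 1000

# Bitrates at which each layer was found to be transparent.
TRANSPARENT_KBIT_S = {"Layer I": 384, "Layer II": 256, "Layer III": 192}

ratios = {layer: PCM_KBIT_S / rate
          for layer, rate in TRANSPARENT_KBIT_S.items()}

for layer, ratio in ratios.items():
    print(f"{layer}: {ratio:.0f}:1")
```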
Part 4 of the MPEG-1 standard, “Conformance testing”, provides the means to check that an instance of a decoder and an instance of a bitstream conform to the standard.
Early on, MPEG saw the benefit of developing a software implementation of the standard. Part 5 of MPEG-1, the reference software, contains the C implementation of encoders and decoders. It is to be noted that the encoders are not optimised (in quality and real-time performance). However, the encoders generate, and the decoders are capable of handling, conforming bitstreams. Some commercial implementations have reportedly been derived from part 5 of MPEG-1.
MPEG-2 [6] was designed to be the standard enabling the digital transformation of the analogue television system designed half a century before. It is a set of 10 standards:
Part 1 | Systems
Part 2 | Video
Part 3 | Audio
Part 4 | Conformance testing
Part 5 | Software simulation
Part 6 | System extensions - DSM-CC
Part 7 | Advanced Audio Coding
Part 8 | VOID
Part 9 | System extension RTI
Part 10 | Conformance extension - DSM-CC
Part 11 | IPMP on MPEG-2 Systems
Systems defines an entity called Packetised Elementary Stream (PES). This is a compressed stream combined with system level information and packetised for use in two types of MPEG-2 Systems streams
Video contains
MPEG-2 Systems and Video were developed jointly with the ITU-T, where they are known as Recommendations H.222.0 and H.262, respectively.
MPEG-2 Audio provides a multichannel-compatible extension of MPEG-1/Audio in the sense that it is
The standard also contains technology to extend the stereo compression features of MPEG-1 Audio. Unfortunately the backward compatibility of MPEG-2 Audio with MPEG-1 Audio limits its performance.
To overcome this limitation MPEG developed part 7, Advanced Audio Coding (AAC), to provide a multichannel solution without the backward-compatibility constraint of Part 3. AAC employs a new algorithm to encode multichannel audio, and the improved performance materialises as transparency at 128 kbit/s. The coding gain is achieved through redundancy removal by means of a high-resolution transform, coefficient quantisation driven by a model of the human auditory system that reduces perceived noise, and entropy coding.
In addition to Conformance and Reference Software (parts 4 and 5, respectively), MPEG-2 also includes part 6 with the title Digital Storage Media Command and Control (DSM-CC) for device-to-device and device-to-network interaction and other standards.
Figure 2 illustrates the main components of the standard.
Figure 2 – MPEG-2 reference diagram
“MPEG-2 stream decoder” is specified by Part 1, “Video decoder” by Part 2, “Audio decoder” by Part 3 and “Interaction” by Part 6.
MPEG-4 [7] started as a standard for very low bitrate audio-visual coding, e.g. 10 kbit/s. Eventually MPEG-4 became that and a rather long list of other digital media technologies, some of which are
MPEG-4 comprises 25 parts, some of which are still under development:
Part 1 | Systems
Part 2 | Visual
Part 3 | Audio
Part 4 | Conformance testing
Part 5 | Reference Software
Part 6 | Delivery Multimedia Integration Framework
Part 7 | Optimised software for MPEG-4 tools
Part 8 | 4 on IP framework
Part 9 | Reference Hardware Description
Part 10 | Advanced Video Coding
Part 11 | Scene Description and Application Engine
Part 12 | ISO Base Media File Format
Part 13 | IPMP Extensions
Part 14 | MP4 File Format
Part 15 | AVC File Format
Part 16 | Animation Framework eXtension (AFX)
Part 17 | Streaming Text Format
Part 18 | Font compression and streaming
Part 19 | Synthesized Texture Stream
Part 20 | Lightweight Application Scene Representation
Part 21 | MPEG-J Extension for rendering
Part 22 | Open Font Format
Part 23 | Symbolic Music Representation
Part 24 | Audio-System interaction
Part 25 | 3D Graphics Compression Model
Systems (part 1) provides the architecture of the standard and roughly corresponds to the Systems parts of the MPEG-1 and MPEG-2 standards.
Visual (part 2) contains a large number of video coding tools that are employed in two very popular profiles: Simple Profile (SP) and Advanced Simple Profile (ASP).
In 2001, MPEG teamed with the Video Coding Experts Group of the ITU-T and established a Joint Video Team (JVT), which developed a new-generation video codec called Advanced Video Coding (AVC) as part 10 of MPEG-4. AVC has roughly twice the compression capability of MPEG-2 Video and MPEG-4 Visual. Subsequently AVC was extended with scalability functions, yielding Scalable Video Coding (SVC). Currently AVC is being further extended with Multiview Video Coding (MVC) capabilities.
Audio contains a large set of coding tools through which it is possible to construct several audio and speech coding algorithms
In addition to the usual Conformance and Reference Software (parts 4 and 5, respectively), MPEG-4 also includes Part 7 “Optimised software for MPEG-4 tools”, which provides examples of reference software that not only implements the standard correctly but does so in optimised form, and Part 9 “Reference Hardware Description”, where the reference software is written in VHSIC Hardware Description Language (VHDL) for synthesis of VLSI chips.
Part 6 “Delivery Multimedia Integration Framework” (DMIF) provides a standard interface to access various transport mechanisms.
Part 8 “4 on IP framework” complements the generic MPEG-4 RTP payload defined by IETF as RFC 3640 [8].
MPEG-1 and MPEG-2 assume that information in decoded form leaves the decoder as sequences of PCM samples, but the standards are silent on what is done with them. MPEG-4 Scene Description (part 11) provides technologies for the new functionality of “composing” different information elements in a “scene”.
The original technology is called Binary Format for MPEG-4 Scenes (BIFS), of which there exists a Java-powered version called MPEG-J. A newer technology with similar functionalities is provided by Part 20 “Lightweight Application Scene Representation” (LASeR).
MPEG-4 provides standard solutions for coding of synthetic visual information for 3D graphics. These tools are specified in Part 2 (Face and Body Animation and 3D Mesh Compression), Part 11 (Interpolator Compression) and Part 16, a complete framework called Animation Framework eXtension (AFX) for efficiently coding the shape, texture and animation of interactive synthetic 3D objects. AFX attempts to unify MPEG-4’s tools related to 3D graphics.
An important component of AFX is 3D Mesh Coding to provide efficient encoding of 3-D polygonal meshes with
AFX introduces as well an advanced animation model for articulated models, a hierarchical representation of urban environments and several modern coding tools for 3D data.
Part 25 “3D Graphics Compression Model” specifies an architectural model able to accommodate third-party eXtensible Markup Language (XML) based description of scene graphs and graphics primitives with (potential) binarisation tools and with MPEG-4 3D Graphics Compression tools.
Synthetic Audio, called “Structured Audio”, is included in part 3. It provides the means to code sound using structured descriptions that are interpreted by a Structured Audio decoder to perform music and sound-effect synthesis. The Structured Audio Tools are: Structured Audio Orchestra Language (SAOL) providing synthesis methods, Structured Audio Score Language (SASL/MIDI) providing control parameters and Structured Audio Sample Bank Format (SASBF) providing the actual sample data.
The ISO Base Media File Format (part 12 of MPEG-4) is designed to contain timed media information for a presentation in a flexible, extensible format that facilitates interchange, management, editing, and presentation of the media. The presentation may be ‘local’ to the system containing it, or may be delivered via a network or other stream delivery mechanism. Part 14 “MP4 File Format” extends the File Format to cover the needs of MPEG-4 scenes, while part 15 “AVC File Format” supports the storage of AVC and MVC bitstreams.
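To give a feel for how the extensibility of the file format works in practice, the sketch below walks the top-level structure of a file. It relies on a detail not stated in this article but defined in the published file format: a file is a sequence of “boxes”, each starting with a 32-bit big-endian size and a four-character type, where a size of 1 signals a following 64-bit size and a size of 0 means “until the end of the file”. The function name and the sample bytes are illustrative, not taken from a real file.

```python
import struct


def iter_boxes(data):
    """Yield (type, offset, size) for each top-level box in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the data.
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        yield box_type.decode("ascii", "replace"), offset, size
        offset += size


# Synthetic example: a 16-byte 'ftyp' box followed by an empty 'free' box.
sample = (
    b"\x00\x00\x00\x10ftypisom\x00\x00\x00\x00"
    b"\x00\x00\x00\x08free"
)
boxes = list(iter_boxes(sample))
print(boxes)
```

A reader that does not recognise a box type can skip it using the size field alone, which is what allows later parts, such as the MP4 and AVC file formats, to add their own boxes.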
The Streaming Text Format (part 17 of MPEG-4) defines text streams that are capable of carrying Third Generation Partnership Project (3GPP) Timed Text (specified in 3GPP TS 26.245). To transport the text streams, a flexible framing structure is specified that can be adapted to the various transport layers, such as RTP/UDP/IP and MPEG-2 Transport and Program Stream, for use in media such as broadcast and optical discs.
Among the remaining MPEG-4 technologies it is worth mentioning the Open Font Format (part 22). MPEG received a request from rights holders to convert the widely adopted OpenType specification to an ISO standard. As is the rule with MPEG standards, the OpenType specification was converted to a Working Draft and then balloted through the ISO-specified process of Committee Draft (CD), Final Committee Draft (FCD) and Final Draft International Standard (FDIS) stages.
The figure below provides a conceptual diagram of the structure of an MPEG-4 decoder with the role played by the main MPEG-4 technologies.
Figure 3 – MPEG-4 reference diagram
With reference to the figure the parts of the MPEG-4 standard specify the blocks as follows:
With MPEG-7 [9] MPEG made a kind of departure from its previous audio and video compression standards because it addressed the issue of “describing features of multimedia content”.
MPEG-7 provides the world’s most comprehensive set of audio-visual description tools, namely
MPEG-7 is organised in 12 parts and is still structured in a way that reminds one of the earlier MPEG standards.
Part 1 | Systems
Part 2 | Description Definition Language
Part 3 | Visual
Part 4 | Audio
Part 5 | Multimedia Description Schemes
Part 6 | Reference Software
Part 7 | Conformance
Part 8 | Extraction and Use of MPEG-7 Descriptions
Part 9 | Profiles
Part 10 | Schema definition
Part 11 | Profile schemas
Part 12 | Query Format
Systems (part 1) specifies the means for binarising DDL data, a methodology for carrying descriptions as streams and the means for accessing and synchronously consuming data.
Description Definition Language (part 2) standardises a language to specify Description Schemes and Descriptors derived from XML Schema to express relations, object orientation, composition, partial instantiation, etc.
Visual (part 3) offers a broad range of visual descriptors
Audio (part 4) offers a broad range of audio descriptors
Part 5 “Multimedia Description Schemes” (MDS) defines elements (Ds and DSs) that are generic (neither purely visual nor purely audio). This is a summary list
Part 12 “Query Format” specifies the interface between a requester and a responder for multimedia content retrieval systems (e.g. MPEG-7 databases). This enables users to describe their search criteria with a set of precise input parameters and additionally allows users to specify a set of preferred output parameters to shape the returned result sets.
In 1999, well before the Web 2.0 hype, MPEG started a project driven by the vision of
a future where every human on the Earth is potentially an element of a network involving billions of content providers, value adders, packagers, service providers, resellers, consumers etc.
While many technologies were already available, it was clear that to make this future real there was a need for an infrastructure enabling electronic commerce of digital content.
At the basis of this project, soon called MPEG-21 [10], there are two key concepts:
MPEG-21 is a collection of seventeen standards whose integration enables Users to perform all functions on Digital Items that enable the realisation of the vision described above.
Part 1 | Vision, Technologies and Strategy
Part 2 | Digital Item Declaration
Part 3 | Digital Item Identification and Description
Part 4 | IPMP Components
Part 5 | Rights Expression Language
Part 6 | Rights Data Dictionary
Part 7 | Digital Item Adaptation
Part 8 | Reference Software
Part 9 | File Format
Part 10 | Digital Item Processing
Part 11 | Evaluation Tools for Persistent Association
Part 12 | Test Bed for MPEG-21 Resource Delivery
Part 13 | VOID
Part 14 | Conformance
Part 15 | Event reporting
Part 16 | Binary format
Part 17 | Fragment Identification
Part 18 | Digital Item Streaming
Part 19 | Media Value Chain Ontology
Part 1 Vision, Technologies and Strategy is a Technical Report, and lays down the scope and development plan of the project.
The foundational element of MPEG-21 is the definition of a structure that can flexibly accommodate the many components of a multimedia object. This includes, of course, the resources (media), but also identifiers, metadata, encryption keys, licenses etc. The specification of this structure is provided by Part 2 Digital Item Declaration (DID).
Identification of Digital Items is a key requirement in the digital space where everything must be uniquely and unambiguously identified in order to be managed. In MPEG-21 this function is provided by Part 3 Digital Item Identification (DII), a standard to handle identifiers in Digital Items.
A Digital Item can contain resources or even portions of a Digital Item that are protected. The component technologies that are needed to process those resources (i.e. to make them available in a form that can be processed by a machine) need to be standardised. This is done by Part 4 Intellectual Property Management and Protection (IPMP) Components. IPMP is the MPEG term for Digital Rights Management (DRM).
In the digital space, licences play a role similar to that of licences in the real world. The difference is that real-world licences are expressed in natural language and are understood by humans, while digital licences must be expressed in a form that can be processed by a machine. Part 5 Rights Expression Language (REL) provides the technology to express rights in a rich form that is comparable to the richness of human language.
The language mentioned above is only capable of expressing the syntax of a rights expression but says nothing of the semantics of the “verbs”, e.g. copy, store, display etc., that are employed by the language (even though the MPEG REL provides the semantics of a few key verbs). A standard semantics for verbs commonly used in the media environment in general is given by Part 6 Rights Data Dictionary (RDD).
When a Digital Item and its resources are transported over the network it may be necessary to “adapt” them (e.g. reduce their bitrate) to varying conditions. When a Digital Item and its resources reach a device, the resources may need to be “adapted” (e.g. subsampled) to match, for example, the device capabilities. Part 7 Digital Item Adaptation (DIA) specifies the syntax and semantics of the tools that may be used to assist in the adaptation of Digital Items, metadata and resources.
As for most other MPEG standards, MPEG-21 has a reference software implementation. This is provided by Part 8 Reference Software.
A Digital Item is an XML structure that can be moved from one device to another “as is”. However, it may be convenient to use a standard file format because in this case a device knows, by virtue of the definition of the file format itself, where specific Digital Item structures can be found. This is provided by Part 9 File Format.
A Digital Item is a static XML structure that contains all elements necessary to describe the resources contained in it, e.g. description of content, DRM information, etc. However, a Digital Item does not natively provide a way for a Digital Item creator to suggest how a user can interact with the Digital Item. Providing this additional information is the scope of Part 10 Digital Item Processing (DIP).
It is possible to establish associations – called Persistent Association Technologies (PAT) in MPEG-21 – between resources and certain metadata related to the resource using such technologies as “watermarking” and “fingerprinting”. As it is probably not necessary, and certainly premature at this stage, to standardise these association methods, Part 11 Evaluation Tools for Persistent Association provides the means to evaluate the performance of a given PAT to see how well it fulfils the requirements of the intended application. This, however, is a Technical Report, i.e. it is simply a guide for users.
A software test bed has been developed to enable experimentation with different means of resource delivery. The software is provided by Part 12 Test Bed for MPEG-21 Resource Delivery. This, however, is a Technical Report, i.e. it is simply a tool to help users experiment.
Conformance of an implementation is of course needed for MPEG-21 technologies as well. The purpose of Part 14 Conformance is to provide the necessary test methodologies and suites to be used to assess the conformity of a bitstream (typically an XML document) and a decoder (typically a parser) to the relevant MPEG-21 standard.
Certain application domains require a technology that can generate an event every time an action specified in the “Event Report Request” (ERR) contained in a Digital Item is made on a resource. The technology achieving this is specified in Part 15 Event Reporting (ER).
In MPEG-7 Systems MPEG had standardised a technology that allows the lossless conversion of a typically very bulky XML document to a binary format, preserving the ability to efficiently parse the binarised XML format. That technology has now been moved to MPEG-B Part 1 “Binary MPEG format for XML” (BiM). Now MPEG-7 Part 1 Systems and MPEG-21 Part 16 Binary format essentially reference the BiM technology specified in MPEG-B Part 1.
There are cases where it is necessary to identify a specific fragment of a resource as opposed to the entire set of data. Part 17 Fragment Identification (FID) specifies a normative syntax for URI Fragment Identifiers to be used for addressing parts of a resource from a number of Internet Media Types.
While part 9 provides a solution to transport a Digital Item in a file, Digital Items may also be transported over a streaming mechanism (e.g. in broadcasting or over IP networks). Therefore part 18 Digital Item Streaming (DIS) provides the technology to achieve this when the streaming mechanism employed is MPEG-2 Transport Stream or RTP/UDP/IP.
Part 19 Media Value Chain Ontology provides a standard representation of the terms in a vocabulary and their corresponding relationships for use in media value chains. An example is personal and commercial movies that include not only the movie itself but also related information like movie producer, movie owner, rights and limitations to modify the movie, as well as personal notes available to a certain user group.
As is clear from the above, MPEG has produced many component standards. However, technology integration has been left to implementers. The result has been that, for example, ATSC uses MPEG-2 Systems and Video but a different Audio than the one specified by MPEG, and DivX uses MPEG-4 Visual, MP3 and AVI.
It is obviously within the remit of implementers to make such decisions; however, this has shortcomings. It may take a long time to go from an MPEG standard to a product, while gratuitous incompatibilities between different implementations, which often trouble end users, could be avoided with more careful choices.
With MPEG-A [11] MPEG has decided to engage in the area of “standard integration” considering that MPEG has (most of) the technologies needed, the internal expertise to do the integration job and the appropriate industry representation.
An interesting side-effect of the integration effort is that, while doing the integration, MPEG may discover (and actually has discovered) that not all components are there.
MPEG-A is still in full development (several parts are still to be completed). It currently comprises twelve parts.
Part 1 | Purpose for Multimedia Application Formats
Part 2 | Music Player Application Format
Part 3 | Photo Player Application Format
Part 4 | Musical Slide Show Application Format
Part 5 | Media Streaming Application Format
Part 6 | Professional Archival Application Format
Part 7 | Open Access Application Format
Part 8 | Portable Video Application Format
Part 9 | Digital Multimedia Broadcasting Application Format
Part 10 | Video Surveillance Application Format
Part 11 | Video Stereoscopic Application Format
Part 12 | Interactive Music Player Application Format
Part 1 “Purpose for Multimedia Application Formats” is a Technical Report, and lays down the scope and development plan of the project.
Part 2 “Music Player Application Format” has the purpose of enabling users to achieve an augmented experience of their sound resources by providing an “extended MP3 format”. This is achieved by adding more information in the now-ubiquitous MPEG File Format, namely MP3 Audio compression, MPEG-4/MPEG-21 File Format, an ID3 subset as MPEG-7 metadata and JPEG still picture compression.
Part 3 “Photo Player Application Format” has the purpose of enabling users to achieve an augmented experience of their photo resources by adding more information to the ubiquitous JPEG File Format, namely
The Music Player Application Format was designed as a simple format for enhanced MP3 players and the Photo Player Application Format combines JPEG still images with MPEG-7 metadata. Part 4 “Musical Slide Show Application Format” builds on top of the Music Player and the Photo Player Application Formats and is a superset of these two MAFs.
Part 5 “Media Streaming Application Format” specifies how to use specific MPEG technologies to build a full-fledged media player for streaming governed content. However, in order to have a complete media streaming set-up, it is necessary to deploy a number of devices: a Content Provider Device containing the Digital Items and the actual resources, a License Provider Device containing the associated licences, an IPMP Tool Provider Device that end user devices can access to get any IPMP Tools needed to make the resources usable, a Domain Management Device that handles sets of devices and users and a Media Streaming Player. The standard specifies the data formats and the protocols exchanged between a Media Streaming Player and the other devices.
The purpose of part 6 “Professional Archival Application Format” is to provide a standard packaging format for carriage of digital multimedia content, metadata to describe context information related to digital multimedia content stored in the archive, metadata to describe the logical structure of how the digital multimedia content is stored in the archive, identification of processing tools that are applied to the digital multimedia content as well as data protection and integrity tools, data governance tools, and data compression tools.
Part 7 “Open Access Application Format” defines a format designed for users who own rights to a piece of content and have an interest in releasing it in such a way that other users can freely access it, but without placing it in the public domain. The solution is the release of content that is governed in a “light-weight” form. The Open Access Application Format packages different contents into a single container file and provides a mechanism to attach metadata information by using MPEG-7 and MPEG-21 technologies. The MPEG-21 REL is used to model the intentions of the licence. MPEG-21 Event Reporting provides a feedback mechanism that can notify the author when a user wants to derive new content from the container file or extract an item from it.
Part 8 “Portable Video Application Format” defines a format for the use of video files on portable devices giving users the possibility to use the content interactively.
Digital Multimedia Broadcasting (DMB) is a mobile TV service enabling users to acquire and consume information anywhere. However, users may not be able to consume content at a time convenient to them. Part 9 “Digital Multimedia Broadcasting Application Format” defines a standard file format that can be used to store DMB content and to exchange it between DMB terminals. The DMB Application Format specifies how to combine the variety of DMB contents with associated information for a presentation in a well-defined format that facilitates interchange, management, editing, and presentation of the DMB contents.
Part 10 “Video Surveillance Application Format” provides a lightweight wrapper to the video content from the MPEG technologies, video coding, related metadata and file format, suitable for video surveillance.
Part 11 “Video Stereoscopic Application Format” provides a format for a creator to take and for a service provider to distribute stereoscopic images, enabling users to have more realistic experiences (with or without special glasses) and to store the stereoscopic content for possible redistribution.
Part 12 “Interactive Music Application Format” defines a format to package interactive music content with audio tracks before mixing, so users can freely control the individual audio tracks. This allows the producer to create several versions (producer mixing 1, producer mixing 2, karaoke, rhythmic, and so on) with just one piece of music, using the metadata structure for mixing information.
The maturing of multimedia technology is making less compelling the need to provide systems-video-audio “packages” as in previous MPEG standards (up to and including MPEG-7). Indeed various products and services currently available in the marketplace freely mix different technologies from the different standards and MPEG has done the same in its MPEG-A standards. To respond to the continuing need to cope with technological advances with new systems, video and audio standards, MPEG has started three new systems, video and audio standards “containers” called MPEG-B, MPEG-C and MPEG-D, respectively.
MPEG-B [12] currently contains five parts.
Part 1 | Binary MPEG format for XML
Part 2 | Fragment Request Unit
Part 3 | XML Representation of IPMP-X messages
Part 4 | Codec Configuration Representation
Part 5 | Bitstream Syntax Description Language
Part 1 “Binary MPEG format for XML” (BiM) provides a standard set of generic technologies to transmit and compress XML documents, addressing a broad spectrum of applications and requirements. It relies on schema knowledge between encoder and decoder in order to reach high compression efficiency, and provides fragmentation mechanisms for ensuring transmission and processing flexibility.
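The role of shared schema knowledge can be illustrated with a deliberately small sketch. It is not the BiM algorithm: the element vocabulary `SCHEMA`, the byte layout and the helper names `encode` and `decode` are all assumptions made for illustration. The point is only that, when both sides know the schema, element names need not be transmitted at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary assumed to be known to both encoder and decoder.
SCHEMA = ["catalogue", "item", "title", "creator"]

XML_TEXT = ("<catalogue><item><title>Example</title>"
            "<creator>Someone</creator></item></catalogue>")


def encode(elem):
    """Toy layout: tag index, text length, text bytes, child count, children.

    Each field is one byte, so texts and child lists must stay below 256.
    Attributes and mixed content are ignored in this sketch.
    """
    text = (elem.text or "").strip().encode("utf-8")
    out = bytes([SCHEMA.index(elem.tag), len(text)]) + text + bytes([len(elem)])
    for child in elem:
        out += encode(child)
    return out


def decode(data, pos=0):
    """Rebuild the element tree; returns (element, next read position)."""
    tag_index, text_len = data[pos], data[pos + 1]
    elem = ET.Element(SCHEMA[tag_index])
    elem.text = data[pos + 2:pos + 2 + text_len].decode("utf-8") or None
    pos += 2 + text_len
    child_count = data[pos]
    pos += 1
    for _ in range(child_count):
        child, pos = decode(data, pos)
        elem.append(child)
    return elem, pos


root = ET.fromstring(XML_TEXT)
encoded = encode(root)
restored, end = decode(encoded)
print(len(XML_TEXT.encode("utf-8")), "bytes as text,", len(encoded), "bytes encoded")
```

Because each element is self-delimiting in this layout, any sub-tree could also be sent on its own, which loosely mirrors the fragmentation idea mentioned above.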
Part 2 “Fragment Request Unit” specifies a technology enabling a terminal to request XML fragments of immediate interest. This significantly reduces processing and storage requirements at the terminal and can enable applications on constrained devices that would not otherwise be possible.
Part 3 “XML Representation of IPMP-X Messages” provides an XML representation of the IPMP-X messages defined in MPEG-4 part 13 with extensions.
Part 4 “Codec Configuration Representation” provides a compressed digital representation of a video decoder and of the corresponding bitstream, assuming that the receiving terminal shares a library of video coding tools with the transmitter.
Part 5 “Bitstream Syntax Description Language” provides a normative grammar to describe, in XML, the high-level syntax of a bitstream. The resulting XML document is called a Bitstream Syntax Description (BSD). A BSD does not replace the original binary format and, in most cases, it does not describe the bitstream bit by bit, but rather its high-level structure, e.g. how the bitstream is organised in layers or packets of data. A BSD is itself scalable, i.e. it may describe the bitstream at different syntactic layers (e.g. finer or coarser levels of detail), depending on the application.
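A minimal sketch of the idea follows, assuming a hypothetical packet layout (a 1-byte layer identifier and a 2-byte payload length in front of each payload). The XML element and attribute names are invented for illustration and are not BSDL syntax. The description records only where each packet sits, and an adaptation step then drops enhancement layers by reading the description rather than by parsing the payloads.

```python
import struct
import xml.etree.ElementTree as ET

HEADER = struct.Struct(">BH")  # assumed header: layer id, payload length


def make_packet(layer, payload):
    """Build one packet of the hypothetical toy format."""
    return HEADER.pack(layer, len(payload)) + payload


def describe(bitstream):
    """Produce a high-level XML description: one element per packet."""
    root = ET.Element("bitstream", size=str(len(bitstream)))
    pos = 0
    while pos < len(bitstream):
        layer, length = HEADER.unpack_from(bitstream, pos)
        total = HEADER.size + length
        ET.SubElement(root, "packet", layer=str(layer),
                      start=str(pos), length=str(total))
        pos += total
    return root


def adapt(bitstream, description, max_layer):
    """Keep only packets whose layer does not exceed max_layer."""
    out = bytearray()
    for packet in description.findall("packet"):
        if int(packet.get("layer")) <= max_layer:
            start = int(packet.get("start"))
            out += bitstream[start:start + int(packet.get("length"))]
    return bytes(out)


stream = (make_packet(0, b"base-1") + make_packet(1, b"enhancement-1") +
          make_packet(0, b"base-2") + make_packet(2, b"enhancement-2"))
bsd = describe(stream)
print(ET.tostring(bsd, encoding="unicode"))
base_only = adapt(stream, bsd, max_layer=0)
```

The description stays small because it carries offsets and lengths only; the payload bytes remain in the original bitstream.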
MPEG-C [13] currently contains four parts.
Part 1 | Accuracy specification for implementation of integer-output IDCT
Part 2 | Fixed point 8x8 DCT/IDCT
Part 3 | Auxiliary Video Data Representation
Part 4 | Video Tool Library
Part 1 “Accuracy specification for implementation of integer-output IDCT” specifies an IDCT accuracy requirement that is equivalent to, or extends, that of the IEEE 1180 standard, which has been withdrawn.
Part 2 “Fixed-point 8x8 inverse discrete cosine transform and discrete cosine transform” specifies a particular fixed-point approximation to the ideal 8x8 IDCT and DCT function, fulfilling the 8x8 IDCT conformance requirements for the MPEG-1, MPEG-2 and MPEG-4 part 2 video coding standards.
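To give a feel for what a fixed-point approximation and an accuracy check involve, here is a small sketch. It is not the transform specified in this part: the 13-bit constant precision (`SHIFT`), the single final rounding and the random test range are assumptions chosen for readability. It compares an integer-only 8x8 IDCT against a floating-point reference and reports the peak difference.

```python
import math
import random

N = 8
SHIFT = 13  # assumed number of fractional bits for the cosine constants


def basis(k, n):
    """Orthonormal DCT-II basis value for frequency k and sample n."""
    scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return scale * math.cos((2 * n + 1) * k * math.pi / (2 * N))


A = [[basis(k, n) for n in range(N)] for k in range(N)]
A_INT = [[round(a * (1 << SHIFT)) for a in row] for row in A]


def idct_float(F):
    """Reference 2-D IDCT in floating point, rounded to integers."""
    return [[round(sum(A[u][y] * F[u][v] * A[v][x]
                       for u in range(N) for v in range(N)))
             for x in range(N)] for y in range(N)]


def idct_fixed(F):
    """Integer-only 2-D IDCT: row pass, column pass, one final rounding."""
    tmp = [[sum(F[u][v] * A_INT[v][x] for v in range(N)) for x in range(N)]
           for u in range(N)]
    half = 1 << (2 * SHIFT - 1)
    return [[(sum(A_INT[u][y] * tmp[u][x] for u in range(N)) + half) >> (2 * SHIFT)
             for x in range(N)] for y in range(N)]


def peak_error(blocks=100, seed=0):
    """Largest absolute sample difference over random coefficient blocks."""
    rng = random.Random(seed)
    worst = 0
    for _ in range(blocks):
        F = [[rng.randint(-256, 255) for _ in range(N)] for _ in range(N)]
        ref, fix = idct_float(F), idct_fixed(F)
        worst = max(worst, max(abs(ref[y][x] - fix[y][x])
                               for y in range(N) for x in range(N)))
    return worst


print("peak error:", peak_error())
```

A conformance-style test would additionally look at mean and mean-square error statistics over prescribed input ranges; the peak error above is only the simplest such measure.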
Part 3 “Auxiliary Video Data Representation” specifies how auxiliary data, such as pixel-related depth or parallax values, are to be represented when encoded by MPEG video standards in the same way as ordinary picture data.
Part 4 “Video Tool Library” contains a collection of descriptions of video coding tools, called Functional Units, as referenced in MPEG-B Part 4.
MPEG-D, formally ISO/IEC 23003 MPEG Audio Technologies, currently contains three parts.
Part 1 | MPEG Surround
Part 2 | Spatial Audio Object Coding
Part 3 | Unified speech and audio coding
Part 1 “MPEG Surround” provides an efficient bridge between stereo and multichannel presentations in low-bitrate applications. The MPEG Surround technology supports very efficient parametric coding of multi-channel audio signals, so as to permit transmission of such signals over channels that typically support only the transmission of stereo (or even mono) signals. Moreover, MPEG Surround provides complete backward compatibility with non-multichannel audio systems.
Part 2 “Spatial Audio Object Coding” represents several audio objects by first combining the object signals into a mono or stereo signal, whilst extracting parameters from the individual object signals based on knowledge of human perception of the sound stage. These parameters are coded as a low bitrate side-channel that the decoder uses to render an audio scene from the stereo or mono down-mix, such that the aspects of the output composition can be decided at the time of decoding.
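The principle of a down-mix plus a low-rate parameter side-channel can be sketched very roughly as follows. This is a toy under strong assumptions, not the SAOC algorithm: it works on whole time frames of `FRAME` samples instead of time-frequency tiles, uses a mono down-mix, and the only parameter is each object's share of the frame energy. The function names are invented for illustration.

```python
import math

FRAME = 4  # samples per parameter frame (tiny, for readability)


def encode(objects):
    """Return a mono down-mix and, per frame, each object's energy share."""
    n = len(objects[0])
    downmix = [sum(obj[i] for obj in objects) for i in range(n)]
    params = []
    for start in range(0, n, FRAME):
        energies = [sum(s * s for s in obj[start:start + FRAME])
                    for obj in objects]
        total = sum(energies) or 1.0  # avoid dividing by zero in silence
        params.append([e / total for e in energies])
    return downmix, params


def render(downmix, params, gains):
    """Re-weight the down-mix per frame using gains chosen at decode time."""
    out = []
    for frame, shares in enumerate(params):
        g = math.sqrt(sum(share * gain * gain
                          for share, gain in zip(shares, gains)))
        out.extend(g * s for s in downmix[frame * FRAME:(frame + 1) * FRAME])
    return out


voice = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
music = [0.0, 0.0, 0.0, 0.0, 2.0, 2.0, -2.0, -2.0]
downmix, params = encode([voice, music])
voice_only = render(downmix, params, gains=[1.0, 0.0])
print(voice_only)
```

The decoder never receives the individual objects; it only scales the down-mix according to the parameters and the gains requested by the listener, which is why objects that overlap within a frame cannot be separated perfectly in this sketch.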
Part 3 “Unified speech and audio coding”, a standard still in the early phases of development, aims at defining a single technology that codes speech, music, and speech mixed with music, and that is consistently as good as the best of the state-of-the-art speech coders such as Adaptive Multi Rate – WideBand plus (AMR-WB+) and the state-of-the-art music coders (HE-AAC V2) in the 24 kbit/s stereo to 12 kbit/s mono operating range.
MPEG-E, also called MPEG Multimedia Middleware (M3W) [14], is a complete set of standards defining technologies required in a multimedia device. It is organised in eight parts:
Part 1 | Architecture
Part 2 | Multimedia API
Part 3 | Component Model
Part 4 | Resource and Quality Management
Part 5 | Component Download
Part 6 | Fault Management
Part 7 | System Integrity Management
Part 8 | Reference Software and Conformance
Part 1 “Architecture” describes the M3W architecture and APIs.
Part 2 “Multimedia API” specifies access to the functionalities provided by conforming multimedia platforms, such as Media Processing Services (including coding, decoding and transcoding), Media Delivery Services (through files, streams and messages), Digital Rights Management (DRM) Services, access to data (e.g. media content), and access to, editing of and searching of metadata.
Part 3 “Component Model” specifies a technology enabling cost effective software development and an increase in productivity through software reuse and easy software integration.
Part 4 “Resource and Quality Management” specifies a framework for resource management aiming to optimise and guarantee the Quality of Service that is delivered to the end-user in a situation where resources are constrained.
Part 5 “Component Download” specifies a download framework enabling controlled download of software components to a device.
Part 6 “Fault Management” specifies a framework for fault management with the goal of keeping the system dependable in the presence of faults. Faults can be introduced by upgrades and extensions outside the control of the device vendor, or can arise because it is impossible to test all traces and configurations in today’s complex software systems.
Part 7 “System Integrity Management” specifies a framework for integrity management whose goal is controlled upgrading and extension: reducing the chance of breaking the system during an upgrade or extension, and providing the ability to restore a consistent configuration.
Part 8 “Reference Software and Conformance” is the usual complement as with the other MPEG standards.
MPEG eXtensible Middleware (MXM) [16] is a standard designed to provide access to the most relevant MPEG technologies via standard APIs. One goal is to accelerate adoption and use of MPEG technologies by making it easy to design and deploy MPEG standards-based media value chains. It is organised in four parts:
Part 1 | Architecture and Technologies
Part 2 | Application Programming Interface
Part 3 | Reference Software and Conformance
Part 4 | MXM Protocols
The MPEG Rich Media User Interface standard [17] is organised in three parts:
Part 1 | Widgets
Part 2 | Advanced User Interaction Interface
Part 3 | Reference Software and Conformance
The Media Context and Control standard [15] provides a standard framework enabling interoperability between virtual worlds (i.e. virtual spaces where people can work, interact, play, travel, learn and augment real life) and aspects of the real world (sensors, actuators, social and welfare systems, banking, insurance, travel, real estate and many others). It is organised in seven parts:
Part 1 | Architecture
Part 2 | Control Information
Part 3 | Sensory Information
Part 4 | Virtual World Object Characteristics
Part 5 | Data Formats for Interaction Devices
Part 6 | Common Types and Tools
Part 7 | Reference Software
In its 20 years of existence MPEG has operated very much like a company churning out new products (standards) for its customers – the multimedia industry – very often by anticipating industry needs based on industry inputs and internal assessments.
These are some of the areas under investigation, at different stages of development.
Many products and services impacting the lives of millions of people are based on MPEG standards. This chapter will mention the most important.
MPEG is an offspring of traditional standardisation but has continuously innovated itself to cope with evolving technology and the inflow of new industries in need of multimedia standards.
Some of the innovations are: the definition of decoder-only standards, which allows industry to compete on encoders; the definition of profiles and levels, to increase interoperability between application domains without burdening some of them with unnecessary features; the execution of subjective tests to verify the performance of the audio and video coding standards; and the release of a normative reference software implementation of a decoder and an informative software implementation of an encoder.
MPEG produces standards that are deliberately kept generic so that more industries can use them: industries share the common format while independently adding the elements specific to their application fields. This contrasts with the traditional approach, in which industries define vertical standards without considering horizontal commonalities.
MPEG provides a unique route to convert new technology into standards because of its process of selecting technologies for introduction in new standards entirely on the basis of commonly agreed technical parameters. This has the advantage that MPEG standards are typically the best technical standards in a given field but also the disadvantage that sometimes a significant number of patents may be needed to practice the standards. Patent pools are typically established to solve this problem.
[1] H. Nyquist, “Certain topics in telegraph transmission theory,” Trans. AIEE, vol. 47, pp. 617-644, Apr. 1928
[2] W. R. Bennett, “Spectra of Quantized Signals,” Bell Syst. Tech. J., vol. 27, pp. 446-472, July 1948
[3] ITU-T Recommendation G.711, Pulse code modulation (PCM) of voice frequencies
[4] ITU-T Recommendation H.120, Codecs for videoconferencing using primary digital group transmission
[5] ISO/IEC 11172, Information Technology – Coding of moving pictures and associated audio at up to about 1.5 Mbit/s
[6] ISO/IEC 13818, Information Technology – Generic coding of moving pictures and associated audio
[7] ISO/IEC 14496, Information Technology – Coding of audio-visual objects
[8] IETF Request for Comments 3640, RTP Payload Format for Transport of MPEG-4 Elementary Streams
[9] ISO/IEC 15938, Information Technology – Multimedia content description interface
[10] ISO/IEC 21000, Information Technology – Multimedia framework
[11] ISO/IEC 23000, Information Technology – Multimedia Application Format
[12] ISO/IEC 23001, Information Technology – MPEG Systems Technologies
[13] ISO/IEC 23002, Information Technology – MPEG Video Technologies
[14] ISO/IEC 23004, Information Technology – MPEG Multimedia Middleware (M3W)
[15] ISO/IEC 23005, Information Technology – Media Context and Control
[16] ISO/IEC 23006, Information Technology – MPEG Extensible Middleware
[17] ISO/IEC 23007, Information Technology – MPEG Rich Media User Interface
3DV
3D Video
3GPP |
Third Generation Partnership Project
AAC |
Advanced Audio Coding |
AFX |
Animation Framework eXtension |
AMR-WB+ |
Adaptive Multi Rate – WideBand plus |
ASP |
Advanced Simple Profile |
AVC |
Advanced Video Coding |
BIFS |
Binary Format for MPEG-4 Scenes |
BiM |
Binary MPEG format for XML |
BSD |
Bitstream Syntax Description |
BSDL |
BSD Language |
CD |
Committee Draft |
CD |
Compact Disc |
CELP |
Code Excited Linear Predictive coding |
DAB |
Digital Audio Broadcasting |
DCT |
Discrete Cosine Transform |
DDL |
Description Definition Language |
DIA |
Digital Item Adaptation |
DID |
Digital Item Declaration |
DII |
Digital Item Identification |
DIP |
Digital Item Processing |
DIS |
Digital Item Streaming |
DMB |
Digital Multimedia Broadcasting |
DMIF |
Delivery Multimedia Integration Framework |
DMP |
Digital Media Project |
DPCM |
Differential PCM |
DRM |
Digital Rights Management |
DS |
Description Schemes |
DSM-CC |
Digital Storage Media Command and Control |
EPG |
Electronic Program Guide |
ER |
Event Reporting |
ERR |
Event Report Request |
EXIF |
EXchangeable Image Format |
FCD |
Final Committee Draft |
FDIS |
Final Draft International Standard |
FID |
Fragment Identification |
FTV |
Free-viewpoinT Video |
HE-AAC
High Efficiency AAC |
IDCT |
Inverse DCT |
IETF |
Internet Engineering Task Force |
IPMP |
Intellectual Property Management and Protection |
IPMP-X |
IPMP eXtensions |
ISO |
International Organisation for Standardisation |
ITU |
International Telecommunication Union |
ITU-T |
ITU, Telecommunication Standardisation Sector |
JVT |
Joint Video Team |
LASeR |
Lightweight Application Scene Representation |
LOD |
Level of Detail |
M3W |
MPEG Multimedia Middleware |
MAF |
Multimedia Application Format |
MDS |
Multimedia Description Schemes |
MP3 |
MPEG-1 Audio Layer III |
MPEG |
Moving Picture Experts Group |
MVC |
Multiview Video Coding |
PAT |
Persistent Association Technologies |
PCM |
Pulse Code Modulation |
PES |
Packetised Elementary Stream |
PS |
Program Stream |
PSI |
Presentation of Structured Information |
RDD |
Rights Data Dictionary |
REL |
Rights Expression Language |
RFC |
Request For Comments |
RoSE |
Representation of Sensory Experience |
RTP |
Real-time Transport Protocol
SAOL |
Structured Audio Orchestra Language |
SASBF |
Structured Audio Sample Bank Format |
SASL |
Structured Audio Score Language |
SBR |
Spectral Band Replication
SP |
Simple Profile |
SVC |
Scalable Video Coding |
TS |
Transport Stream |
VHDL |
VHSIC Hardware Description Language |
WIM TV |
Web, IP and Mobile TV |
XML |
eXtensible Markup Language |
XMT |
eXtensible MPEG-4 Textual format |