
Audio and MPEG

by Leonardo Chiariglione

Although it has so much to say to the human psyche, the audio signal is relatively easy to deal with compared to the much broader-band visual signal. For this reason, when digital technologies became convenient for the transmission of speech signals, the PCM representation of speech was soon standardized by the ITU-T (A and mu laws), because even the early digital technology of the 1960s could manage digital signals of 56-64 kbit/s. In the following decades a number of ITU-T standards for speech compression were produced.
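The 56-64 kbit/s figure and the idea behind the logarithmic "laws" can be illustrated with a small sketch. It assumes 8 kHz sampling with 7 or 8 bits per sample, and it uses the continuous mu-law curve with mu = 255; none of these values are given in the text, and the standardized coder uses a segmented approximation that is not reproduced here. All function names are illustrative.

```python
import math

MU = 255  # assumed companding parameter for the continuous mu-law curve


def pcm_bitrate_kbps(sample_rate_hz=8000, bits=8):
    """Bitrate of a mono PCM speech channel in kbit/s (assumed parameters)."""
    return sample_rate_hz * bits / 1000


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] onto the logarithmic mu-law curve."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def quantize(y, bits=8):
    """Uniformly quantize a value in [-1, 1] to a signed integer code."""
    levels = 2 ** (bits - 1) - 1
    return round(y * levels)


if __name__ == "__main__":
    print(pcm_bitrate_kbps(bits=8), pcm_bitrate_kbps(bits=7))
    quiet = 0.001
    print(quantize(quiet), quantize(mu_law_compress(quiet)))
```

Under these assumptions, a quiet sample such as 0.001 rounds to code 0 with uniform 8-bit quantization but keeps a nonzero code after companding, which is why logarithmic laws suit speech at such low bitrates.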

Because the audio signal was so easy to handle, the entertainment industry had little motivation to apply compression to sound signals. Bandwidth in a broadcast channel is (or used to be) plentiful, and so was the ability to store bits on compact discs and magnetic tapes.

Things started changing in the second half of the 1980s, when broadcasters envisioned providing a large number of high-quality music channels as a new form of radio service. At the same time, digital television, then under development, also needed a high-quality sound compression standard because bandwidth was constrained by the accompanying, much wider-band video signal. Some consumer electronics companies were also thinking of a new type of digital cassette recorder that would store music bits on the old compact cassette.

MPEG – the Moving Picture Experts Group – was established in 1988 and set out to develop a full audio-visual coding standard with the name of MPEG-1. The audio portion, so-called MPEG-1 Audio, is designed to compress digital stereo sound with a total bitrate of 1.4 to 1.5 Mbit/s (depending on the sampling frequency) down to a few hundred kbit/s while preserving high quality. MPEG-1 Audio is structured in Layers, from I to III, with each higher Layer giving higher compression at the cost of higher complexity in the device. Layer I gives transparency, i.e., subjective equivalence with the uncompressed original, at 384 kbit/s; Layer II does the same at 256 kbit/s and Layer III at 192 kbit/s.
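The bitrates quoted above can be checked with simple arithmetic. The sketch below assumes 16-bit samples and sampling frequencies of 44.1 kHz and 48 kHz; neither value is stated in the text, though together they reproduce the 1.4 to 1.5 Mbit/s range. The function and table names are illustrative only.

```python
# Transparency bitrates for each Layer, in kbit/s, as quoted in the text.
LAYER_TRANSPARENCY_KBPS = {"Layer I": 384, "Layer II": 256, "Layer III": 192}


def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=2):
    """Bitrate of uncompressed PCM audio in kbit/s (16-bit stereo assumed)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def compression_ratios(sample_rate_hz):
    """Ratio of the uncompressed bitrate to each Layer's transparency bitrate."""
    source = pcm_bitrate_kbps(sample_rate_hz)
    return {layer: source / rate for layer, rate in LAYER_TRANSPARENCY_KBPS.items()}


if __name__ == "__main__":
    for rate in (44100, 48000):
        print(rate, "Hz:", pcm_bitrate_kbps(rate), "kbit/s uncompressed")
        for layer, ratio in compression_ratios(rate).items():
            print("  ", layer, "compression ratio", round(ratio, 2))
```

Under these assumptions, transparency at 44.1 kHz corresponds to a compression ratio of roughly 3.7 for Layer I, 5.5 for Layer II and 7.4 for Layer III.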

MPEG-1 was approved in November 1992 and its Layer I and II versions were immediately deployed. MPEG-1 Layer III, however, remained dormant for a few years. But the simultaneous coming together of the multimedia PC with a CD-ROM unit, the Pentium with its number-crunching capability and the Internet produced a new phenomenon: MP3, a.k.a. MPEG-1 Audio Layer III. Since then the music world has not been the same again.

[Figure: Basic Structure of the MPEG-1 Audio Encoder]

In 1990 MPEG started a new project on digital television, MPEG-2. MPEG-1 Audio already provided a good technology for the audio part of the digital television project, but what about multichannel audio? MPEG-2 Audio, approved in 1994, provided a technology that would allow those who had already deployed MPEG-1 Audio (stereo) services to upgrade to multichannel. Subjective tests showed that 640 kbit/s (2.5 times the bitrate for MPEG-1 Audio Layer II) was needed to get transparent quality.

[Photo: Assembled experts at the 35th MPEG gathering in Tampere, July 1996]

MPEG, however, came to realize that while "backwards compatibility" (the technical name given to the ability to upgrade an existing service) was a plus for those already in the business, those who were not constrained by past deployments would prefer to offer an "unconstrained" multichannel coding scheme. This was the motivation for the MPEG AAC (Advanced Audio Coding) standard, started in 1993 and completed in 1997. With AAC it is possible to get transparent stereo quality at 128 kbit/s and transparent multichannel quality at 320 kbit/s.

At the same meeting in which the AAC project was kicked off, MPEG decided to embark on another, more ambitious project, MPEG-4 "Coding of audio-visual objects," completed in 1999. Content usually comes to the end user in a pre-packaged form (radio, television, compact cassette, CD and so on) with little possibility of interaction beyond switching channels, turning off the radio, changing the CD track or fast forwarding. The Web, however, offers a rich form of interaction: you open a Web page and you can watch it, click on a hot spot, jump to another page and so on. Nothing of that kind can be done with radio or television.

MPEG-4 is a standard that lets you interact with content at a much finer level of detail than is possible today. If the author has decided to let you do so, you can compose your own version of Beethoven's Ninth Symphony by adjusting the level of the different instruments to suit your wishes. Not only that, MPEG-4 allows you to compose the different audio sources in a different spatial arrangement.

[Figure: Example of MPEG-4 audiovisual objects composited in a scene]

Music, however, is not just sound. While listening to a piece of music you might like to follow the lyrics, well synchronized with the music itself and possibly without the voice of the singer (for example, karaoke). If you are practicing your instrument, you might like to see the score. Or you might like to use a MIDI file instead because you are in such a remote place that nothing more than that can reach you. Or you might like to see a photo of your favorite singer taken at that particular moment, or listen to an interview with the guitar player, the statement of an opinion maker, the rating of an independent agency, some news related to the song or information about how the sales of the song are going. And of course the music itself could be accompanied by some videos or animations.

In terms of compression coding, MPEG-4 supports the coding of speech signals at bitrates from 2 kbit/s up to 24 kbit/s. The coding of general audio covers a wide range of bitrates and bandwidths, from very low bitrates up to high quality: it starts at a bitrate of 6 kbit/s and a bandwidth below 4 kHz but also includes broadcast-quality audio from mono up to multichannel. An important feature is that AAC is an integral part of MPEG-4 Audio.

Besides an effective compression ratio MPEG-4 Audio provides a number of features that make it unique. Among them are:

speed change that allows the change of the time scale without altering the pitch during the decoding process,
pitch change that allows the change of the pitch without altering the time scale,
bitrate scalability that allows a bitstream to be parsed into a bitstream of lower bitrate such that the combination can still be decoded into a meaningful signal,
error robustness that provides improved performance on error-prone transmission channels, and
audio effects that provide the ability to process decoded audio signals to achieve functions for mixing, reverberation, spatialization, etc.

In addition to providing an effective technology for natural speech and sound, MPEG-4 Audio also provides solutions for their synthetic equivalents. For speech, it allows intelligible synthetic speech to be generated from a text, or from a text with prosodic parameters (pitch contour, phoneme duration, and so on), given as input to a standard text-to-speech interface. For sound, it supports the generation of synthetic audio by means of a Structured Audio Decoder that applies score-based control information to musical instruments described in a special language.

MPEG-4 is a standard that has anticipated the needs of the rights holder community because it contains a framework in which it is possible to plug in external components devised to manage and protect an MPEG-4 application from unauthorized use. This framework, called the IPMP (Intellectual Property Management and Protection) framework, is effectively a new layer that is put on top of the traditional compression layer and has been designed to retain, in the protected domain, all real-time and synchronization features of the compression layer.

MPEG did not stop at this point. In 1997 it started a new project, MPEG-7 "Multimedia Content Description Interface." Unlike the other MPEG standards, whose goal was to reduce the number of bits required to represent the information without losing its quality, MPEG-7 represents the semantic meaning of the information. What is the use of MPEG-7 Audio? Just think of being able to find a song by humming, or of creating a compilation based on the type of melody or the background music. Final approval of MPEG-7 is scheduled for July 2001.

One might think that with MPEG-7 everything necessary to enable digital audio in all its forms had been completed, but in June 2000 MPEG started a new project, MPEG-21 "Multimedia Framework." The vision here is one where 6 billion people potentially cover all roles of the traditional, well-structured value chain: authors, performers, producers, retailers, consumers, resellers, etc. The project is at the very beginning, but its main thrust can be described as the development of all technologies that allow the creation of an environment where multimedia objects at different stages can be traded by big corporations as well as by individuals. This requires the standardization of such technologies as:

content identification – the ID card of a multimedia object,
linkage of data with metadata – the linkage of content with its description and vice-versa,
interoperable content protection – going beyond the current MPEG-4 IPMP framework to enable a use of protection that is transparent to the end user, and
interfacing with financial transaction platforms,

along with other technologies.

While unprotected "MP3" has possibly been a surprising awakening to the recording industry – but also one with the potential of creating a new rapport between authors, producers and consumers – in the long term the relationship between Audio and MPEG is better compared to an unending love affair with an exciting future ahead.

Leonardo Chiariglione is head of the Multimedia Technologies and Services Research Division of CSELT, the corporate research center of Telecom Italia. He founded the Moving Picture Experts Group in 1988, and the group won an Emmy for its work in 1996. He has a master's degree in Electronic Engineering from the Polytechnic of Turin and received his Ph.D. from the University of Tokyo. In addition to serving as the executive director of SDMI, he is also active in other international standards development projects.

