The vision and the role of
MPEG-4 in the future of multimedia
Leonardo Chiariglione, CSELT - Italy
The multimedia world of today did not come about the way it was expected
to. Instead of receiving Gbit/s of multimedia data through optical fibres,
people today receive digital television programs through satellite and cable,
talk and exchange messages and files on mobile phones, watch movies from DVD,
find all sorts of information and maintain all sorts of relationships on the
web, listen to music compilations downloaded from the web, watch
postage-stamp-size video from the web and play computer games on game consoles.
Convergence, that much-abused word, is nowhere to be seen. The world
is populated by vertical systems where proprietary technologies abound. This is
all the more surprising if one thinks that the basic information units, audio
and video, are technologically all the same.
Started in July 1993, MPEG-4 has grown into a very comprehensive and
industry-neutral set of tools capable of satisfying the needs of the multimedia
world:
it is agnostic to delivery system and transport, so that users (both content providers and end users) can effectively abstract from the transport layer and the layers below it;
it provides a full set of compression tools for audio (speech and music) and video, from very low to very high bitrates, supporting a multiplicity of functionalities;
it provides tools to represent special types of synthetic audio-visual information, such as synthetic music, character strings annotated with other information, human faces and bodies;
it provides efficient tools for compressing time-varying 2D and 3D objects;
it enables bit-efficient composition in a 2D or 3D space of different objects;
it comprises a framework supporting Management and Protection of content that is being extended to provide interoperability at the level of protected content.
Therefore MPEG-4 is capable of providing the technology platform on top
of which the world of multimedia can flourish. This is already happening but
there is a long way to go.
Some examples:
Fixed-line terminals connected by ISDN or ADSL can receive high-quality moving
pictures and audio in streaming mode, but this should also be possible on
mobile terminals which can only use a few tens of kbit/s;
Rights holders would like to exploit the benefits of music distribution over
the web to deliver high-quality audio on mobile and portable devices in such a
way that their rights are not compromised;
The chimera of offering web services on TV sets has attracted many companies,
which have invested resources with no results, because television cannot be
extended with an alien paradigm; it can only be extended with a compatible
multimedia paradigm.
Unfortunately, the fact that the premises are there is no a priori
guarantee that things will happen, because industries, and companies within
industries, tend to operate with remarkable shortsightedness. Some
examples:
VRML has largely failed because the size of VRML files, where information is
encoded with characters, was too big to be carried by today’s Internet.
Still, 3GPP, an initiative to develop specifications for 3rd generation mobile
networks, is adopting SMIL, which again uses characters to describe media
composition, with the justification that bit efficiency is not important
because composition information is used only occasionally and is in any case
small. That may be so on the devices evolving from today’s cellphones, but
there is no reason why the two assumptions should hold for PDAs with larger
screens, which are likely to appear at about the same time. Therefore we are
likely to find two incompatible types of devices, which will artificially
segment the market and undermine the chances of success of an environment
about which concerns are already being loudly raised.
The desire of rights holders to retain control of their assets can only be
shared, but the desire to create walled gardens where users will enter to
consume protected content from only one source cannot. MP3 has shown that
consumers have plenty of technology that allows them to access and consume
music that is free and indistinguishable from music that is purchased.
The idea that consumers will leave free content based on a technology that
offers total interoperability to move to paid-for content based on
technologies that create walled gardens is so naïve as to border on insanity.
Being aware of the dangers is one way to avoid them. I am sure that, maybe
not at the first try, industries and companies will eventually see the
shortsightedness of creating islands of products, services and applications and
will fully embrace the complete set of MPEG-4 technologies. MPEG-4 tools have
been designed to operate separately to satisfy individual application needs,
yet they can still be combined to provide more sophisticated applications
because they have been designed to allow that. Besides this advantage, MPEG-4
users have the assurance that MPEG-4 versioning will keep the technology moving,
either because existing audio and video compression tools will be upgraded (as
the new 3D object compression and the ongoing audio call for evidence and
video call for proposals show) or because new system-level functionalities will
be added (as the character-to-binary XMT compiler encompassing SMIL and X3D
and the Multiuser World call for proposals show).
In addition, MPEG-4 users will benefit from the soon-to-be-completed
developments in the MPEG-7 area, which will give users the ability to
innovate the way content is accessed and consumed, and in the MPEG-21 area,
which will create the foundations of new forms of content usage for a networked
society for which the world is indeed a village.