FIPA96/06/05 10:11
FOUNDATION FOR INTELLIGENT PHYSICAL AGENTS nyws007
Source: Yeun-Bae Kim (NHK)

 

Agent-based Broadcast Indexing

We are currently developing a new type of value-added broadcasting service under a framework called Integrated Services Digital Broadcasting (ISDB). In the ISDB framework, images, sounds, text, and other hypermedia information will be tightly coupled to one another so as to provide a variety of highly flexible and interactive services to audiences via intelligent agents (1) (2).

The creation of such an agent-based ISDB system will require fundamental changes in all aspects of the present broadcasting scheme, from TV program production to program transmission and reception. For instance, in TV program production, more intelligent and powerful tools for gathering and manipulating information will be required to increase productivity. Similar tools (e.g., agents) will also be needed to assist audiences in filtering, storing, and retrieving desired programs or specific video clips from a large volume of incoming broadcast programs.

One of the most important factors in building such a system is how an agent can manipulate these combined media according to their semantic contents, in particular video images, which are considered the most important and the most difficult medium to handle in ISDB.

To overcome this difficulty, we are conducting research on content-based video indexing and retrieval methods as one of the key technologies toward the realization of such a system. In ISDB, TV programs will be broadcast together with index data that describe their actual contents; agents can use these data to navigate at will and to gather or retrieve programs and other information. In this context, the content-based index may play an important role in agent technologies, because it may help increase an agent's ability and reliability by simplifying its recognition mechanisms, which are the most complicated of its components.

Our indexing method exploits the conventional program production procedure (3). In conventional program production, many directors manually record the contents of each video clip, together with precise time codes (in frame numbers), in natural language sentences, either on a personal computer or as written notes. This manual preparation of content descriptions is currently a necessary step, because the notes are referred to later during final video editing. The notes describe what is happening in the video clips (e.g., someone is doing something somewhere), rather than the low-level syntactic contents (e.g., camera movement, color, texture, shapes, or time codes). The notes are of crucial importance because they provide information that cannot be obtained by current state-of-the-art machine cognition and image processing technology. Most of the currently available video indexing methods use the low-level syntactic contents as their main source for describing video clips, and these contents have little relation to the actual semantic contents. Keyword-based methods, on the other hand, fail to provide a satisfactory description of the semantic video contents.

Our approach uses a natural language parser to analyze the notes provided by directors and automatically generate content-based video indexes for use in video manipulation. The video contents are represented in a stream-based representation called a spatiotemporal script, which keeps the video data intact in its original context. This representation achieves better retrieval than conventional methods such as keyword-based ones, can be generated by an automated procedure that parses the sentences in the notes, and has a compact and flexible structure that is easy to handle.
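
The idea of turning time-coded notes into a content-based index can be illustrated with a small sketch. It is not the spatiotemporal script or the parser described above; it is a toy under several assumptions. The note format ("start_frame-end_frame: sentence"), the names ScriptEntry, build_index, and find, and the sample notes are all hypothetical, and a single regular expression for sentences of the form "someone is doing something somewhere" stands in for a real natural language parser.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical note format: "<start_frame>-<end_frame>: <sentence>".
NOTE_RE = re.compile(r"^\s*(\d+)\s*-\s*(\d+)\s*:\s*(.+?)\s*$")

# Toy stand-in for a natural language parser. It only accepts
# "<who> is <doing something> [in|at|on <where>]".
SENTENCE_RE = re.compile(
    r"^(?P<who>.+?)\s+is\s+(?P<what>.+?)"
    r"(?:\s+(?:in|at|on)\s+(?P<where>.+?))?\.?$",
    re.IGNORECASE,
)


@dataclass
class ScriptEntry:
    """One indexed clip: a frame range plus who/what/where roles."""
    start: int
    end: int
    who: str
    what: str
    where: Optional[str]
    text: str  # the director's original sentence, kept unchanged


def build_index(notes: str) -> List[ScriptEntry]:
    """Parse time-coded notes into entries ordered by start frame."""
    entries = []
    for line in notes.splitlines():
        note = NOTE_RE.match(line)
        if not note:
            continue  # not a time-coded note line
        sentence = note.group(3)
        parsed = SENTENCE_RE.match(sentence)
        if not parsed:
            continue  # a real parser would cover many more sentence forms
        where = parsed.group("where")
        entries.append(ScriptEntry(
            start=int(note.group(1)),
            end=int(note.group(2)),
            who=parsed.group("who").lower(),
            what=parsed.group("what").lower(),
            where=where.lower() if where else None,
            text=sentence,
        ))
    return sorted(entries, key=lambda e: e.start)


def find(index: List[ScriptEntry], who: Optional[str] = None,
         what: Optional[str] = None,
         where: Optional[str] = None) -> List[Tuple[int, int]]:
    """Return frame ranges of clips whose roles contain the query strings."""
    def ok(query: Optional[str], value: Optional[str]) -> bool:
        return query is None or (value is not None and query.lower() in value)
    return [(e.start, e.end) for e in index
            if ok(who, e.who) and ok(what, e.what) and ok(where, e.where)]


# Invented sample notes, for illustration only.
NOTES = """\
0-449: A reporter is interviewing a farmer in a rice field.
450-899: A farmer is driving a tractor.
900-1200: A reporter is walking in a market.
"""

if __name__ == "__main__":
    index = build_index(NOTES)
    print(find(index, who="farmer"))    # clips where a farmer is the actor
    print(find(index, where="market"))  # clips located in a market
```

In this sketch the word "farmer" appears in two of the sample notes, but a query on the who role returns only the clip in which the farmer is the actor. A plain keyword match over the note text could not make that distinction, which is one way to read the contrast with keyword-based methods drawn above.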

References

(1) Yeun-Bae Kim, et al., "An Integration of Natural Language and Vision Processing towards an Agent-based Future TV system", In Proceedings of AAAI-94-Workshop for Integration of Natural Language and Vision Processing, Seattle, Aug., 1994.

(2) Yeun-Bae Kim, et al., "When Agents Combine Broadcasting and Multimedia", In Proceedings of the International Broadcasting Symposium '95 (IBS'95), Tokyo, Nov., 1995.

(3) Yeun-Bae Kim, Masahiro Shibata, "A Video Indexing Method using Natural Language Memo for TV Program Production", In Proceedings of 12th European Conference on AI, ECAI-96, Budapest, Aug., 1996.