FIPA, 96/06/05 10:22
FOUNDATION FOR INTELLIGENT PHYSICAL AGENTS, nyws017
Source: Ennio Grasso, Fabio Malabocchia, Roberto Manione and Claudio Rullent (CSELT)
In this contribution we focus on the levels of standard support that would facilitate the broad exploitation of agent-based applications.
In particular, we believe there is both room and need for standardizing the interactions of agent-based subsystems. This is the topic of the next section.
A second goal we want to highlight is a proper definition of standards against which the quality of an agent can be assessed. This is the topic of Section 3.
Section 4 outlines a standardization proposal for one of the levels of interaction outlined in Section 2, namely the application level.
Interaction among agents requires support at three different levels (listed here from top to bottom): the application level, the semantic level, and the distributed computing level.
The application level concerns standards for specific application domains. One such domain is speech understanding, discussed later in this paper. Standardization at this level is clearly appropriate only for the corresponding domain, and refers to what a typical subsystem needs to receive and has to produce.
Standards of this kind are achievable only when the field is mature enough and the subdivision into subsystems derives from "natural" classes of functionalities, or can be agreed upon as a straightforward evolution of current offerings.
A more ambitious goal would be a domain-independent design methodology for agent-based applications, associated with some reference applicative architectures. This would be very profitable to standardize, although perhaps not in the first calls.
This level of communication is probably adequately supported by formalisms like KQML and KIF. Goals, motivations, justifications and the like are kinds of information typical of more complex interactions, and they too will call for support.
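To make the level concrete, the sketch below composes a KQML performative as an s-expression string. The performative name and the :keyword parameter style follow the KQML drafts; the agent names and the KIF content are invented for illustration.

```python
# Minimal sketch of serializing a KQML performative. The helper and the
# agent names are hypothetical; only the general message shape follows KQML.

def kqml_message(performative, **params):
    """Serialize a performative and its :keyword parameters."""
    fields = " ".join(f":{key} {value}" for key, value in params.items())
    return f"({performative} {fields})"

# One agent asking another a question, with KIF content.
msg = kqml_message(
    "ask-one",
    sender="agent-a",
    receiver="agent-b",
    language="KIF",
    content="(capital italy ?x)",
)
print(msg)
# → (ask-one :sender agent-a :receiver agent-b :language KIF :content (capital italy ?x))
```

The point of the format is that the semantic content (here KIF) travels opaquely inside a standard communicative envelope, so agents can agree on the envelope independently of the content language.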
Mobility should be supported at this level, through services devoted to receiving, authenticating, and negotiating services with the hosted agents. Telescript and (Safe-)Java have interpreters that can provide some services of this kind, and CORBA itself is being extended to deal with this problem.
Services of this kind need to be standardized, and must be designed in integration with the current standards in distributed computing; this will allow a seamless integration between agent computing and distributed computing.
Security in particular needs to be adequately ensured, and is a prerequisite for any development in this field.
An important topic to be considered is proper validation of an agent. When I buy an agent-based subsystem, I don't really care how intelligent it is, but I do care how good it is at performing its job.
Leaving aside the problem of having the platform services that protect the encapsulation imposed on the agent, the problem is how we can ensure that the software we have:
To this purpose we can, of course, use the tools and methods employed to assess the proper working of distributed computations; formal methods that also allow simulating the behaviour of objects against their specifications can be of great help.
Agent testing can be performed by injecting the agent into a simulated environment where it can establish interactions representative of those found in the target environment. This environment acts like a kind of flight simulator, in which the pilot's skills can be safely challenged.
Agents' capabilities and rights must be specified in a standard, platform- and language-independent way, so that agent-based computation can achieve complete mobility.
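The "flight simulator" idea above can be sketched as a test harness that replays scripted stimuli and checks the agent's replies. The agent interface (a callable taking a request string) is an assumption made purely for illustration, not part of any standard.

```python
# Toy simulated environment for validating an agent: replay stimuli
# representative of the target environment and record any misbehaviour.
# The callable-agent interface is a hypothetical simplification.

def run_simulation(agent, script):
    """script is a list of (stimulus, expected_reply) pairs.
    Returns the cases where the agent's reply diverged."""
    failures = []
    for stimulus, expected in script:
        reply = agent(stimulus)
        if reply != expected:
            failures.append((stimulus, reply, expected))
    return failures

# A trivial agent under test.
def echo_agent(request):
    return request.upper()

script = [("ping", "PING"), ("status", "STATUS")]
print(run_simulation(echo_agent, script))  # → [] : all checks passed
```

A real harness would of course drive the agent through the platform's communication services rather than a direct call, but the validation principle is the same: the agent is challenged safely, outside the target environment.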
The goal of a speech recognition agent is to identify the user utterance, i.e. the sequence of words uttered by the user. Some speech recognition agents perform only isolated word recognition (the user is supposed to utter a single word in response to a system prompt), while others have the capability of recognizing continuous speech (the user can utter a sentence composed of many words).
The definition of a standard input and output interface for a speech agent could consider the aspects discussed below.
In a traditional approach, the input of a speech agent is usually the speech signal represented in one of the various PCM formats.
Another possibility is to give the speech agent a more elaborate input, for example a set of spectral parameters for each time frame. This solution has the advantage that part of the computation can be performed locally (for instance in a mobile telephone), reducing the amount of data that needs to be transmitted to the speech agent. Of course, a standard representation of the spectral parameters has to be defined and commonly agreed upon. This approach is currently being studied by a group of industrial partners (the AURORA project) and is being considered in ETSI to become a formal Work Item.
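The data reduction argument can be illustrated with a minimal front end that splits the signal into fixed-length frames and sends one compact parameter per frame. Log energy is used here only because it is the simplest frame parameter; a real front end would compute an agreed set of spectral coefficients, and the frame length is likewise illustrative.

```python
import math

# Sketch of terminal-side processing: one log-energy value per frame
# instead of raw PCM samples. Frame length and parameter choice are
# illustrative assumptions, not a standard representation.

def frame_log_energies(samples, frame_len=160):
    """Return the log energy of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame)
        energies.append(math.log(energy + 1e-10))
    return energies

# At 8 kHz, 160 samples correspond to one 20 ms frame.
signal = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(1600)]
features = frame_log_energies(signal)
print(len(features))  # → 10 values transmitted instead of 1600 samples
```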
The output of a speech agent that performs only isolated word recognition could be just the word that has been recognized. In practice it is useful to obtain the set of N words that have the highest probability of having been uttered, each one possibly characterized by a score that estimates that probability. In this way an application has the chance, for instance, of using the second-best hypothesis if the user disconfirms the first one.
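The fallback behaviour just described can be sketched as follows; the words and scores are invented for illustration.

```python
# Sketch of consuming an N-best result from an isolated word recognizer:
# each hypothesis carries a score estimating the probability that the
# word was uttered. The data is hypothetical.

n_best = [("torino", 0.62), ("torano", 0.21), ("tortona", 0.09)]

def next_hypothesis(n_best, rejected):
    """Return the best-scoring word the user has not yet disconfirmed."""
    for word, score in n_best:
        if word not in rejected:
            return word
    return None

print(next_hypothesis(n_best, set()))        # → torino
print(next_hypothesis(n_best, {"torino"}))   # → torano (second best)
```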
In the case of a continuous speech agent, the result could be a sequence of words (representing an utterance). As in the previous case, it could be useful to have a set of such sequences (N-best), each with an associated score. Another possibility is the production of a lattice of word hypotheses, each one characterized by the word, its starting and ending time frames, and a score.
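A lattice entry of the kind just described has an obvious minimal representation; the words, frame indices, and scores below are invented.

```python
from collections import namedtuple

# Sketch of a word lattice: each hypothesis is a word with its start
# frame, end frame, and score. The data is hypothetical.
Hyp = namedtuple("Hyp", "word start end score")

lattice = [
    Hyp("call", 0, 12, 0.9),
    Hyp("tall", 0, 12, 0.3),
    Hyp("rome", 13, 30, 0.8),
    Hyp("home", 14, 30, 0.7),
]

def hypotheses_at(lattice, frame):
    """All word hypotheses whose time span covers the given frame."""
    return [h for h in lattice if h.start <= frame <= h.end]

print([h.word for h in hypotheses_at(lattice, 20)])  # → ['rome', 'home']
```

Unlike a flat N-best list, a lattice lets downstream components (understanding, dialogue) combine word hypotheses into sentence hypotheses of their own.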
In the case of spoken dialogue applications, the continuous speech agent should interact closely with other agents that provide important functionalities: understanding of the user utterance, managing the man-machine dialogue, generation of natural language utterances, and speech synthesis. Spoken dialogue systems are still at the research prototype level, and the interactions between the various components are far from being standardized.
It should be relatively easy to standardize the interface between the recognition agent and the other components, even if a specific effort has to be made to address the sharing of knowledge required among the different components (the application dictionary, for instance, is used by many components: recognition, understanding, and even dialogue).
The interaction between dialogue management and speech synthesis is also quite simple: the messages could be plain strings, with the possibility of predefined escape commands.
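The string interface just suggested might look as follows on the synthesis side; the \pause{...} escape syntax is purely hypothetical, standing in for whatever predefined commands would be agreed.

```python
import re

# Sketch of parsing a dialogue-to-synthesis message: plain text with
# embedded escape commands. The \pause{ms} syntax is an invented example.

def parse_synthesis_string(message):
    """Split a message into ("text", ...) and ("command", ...) chunks."""
    chunks = []
    for part in re.split(r"(\\pause\{\d+\})", message):
        if not part:
            continue
        kind = "command" if part.startswith("\\pause") else "text"
        chunks.append((kind, part))
    return chunks

msg = "Welcome.\\pause{300}How can I help you?"
print(parse_synthesis_string(msg))
# → [('text', 'Welcome.'), ('command', '\\pause{300}'), ('text', 'How can I help you?')]
```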
The interface between understanding and dialogue management is more difficult to standardize, because it requires the representation of semantic knowledge at different levels of abstraction (depending on the approach followed by the system developers). Similar considerations hold for the interface between dialogue and message generation; here we can also assume the agent is the same.
In this contribution we have described the opportunities for a fruitful standardization activity. Among other things, we suggest that in 1997 there could be a standardization effort for each level described in the introduction (application, semantics, distributed computing).
Speech is a candidate for the application level, and its applicability is broad enough to attract a large number of contributors.
At the same time, parallel efforts at the other two levels, when adequately coordinated, can provide a good overall framework.
Issues like validation, quality assurance, and safeness are mandatory prerequisites for an industrial technology, and must drive the standardization efforts at all three levels.
Ennio Grasso, Fabio Malabocchia, Roberto Manione and Claudio Rullent
CSELT
via Reiss Romoli 274
I-10148 Torino, Italy