Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
The Media Analysis Management Interface (MAMI) enables the understanding of the real world at a low cost by using analysis engines such as video image processing engines, sensor data analysis engines, and so on. It also enables various services to be easily provided, such as physical security, environmental load reduction, and intelligent accessibility services. The MAMI Incubator Group (MAMI-XG) discussed the requirements and determined the feasibility of the MAMI, which consists of the data models and exchange protocols for the analysis data of various media. In this document, the findings of the group are summarized.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of Final Incubator Group Reports is available. See also the W3C technical reports index at http://www.w3.org/TR/.
This document was developed by the Media Analysis Management Interface Incubator Group (MAMI).
Publication of this document by W3C as part of the W3C Incubator Activity indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. Participation in Incubator Groups and publication of Incubator Group Reports at the W3C site are benefits of W3C Membership.
Incubator Groups have as a goal to produce work that can be implemented on a Royalty Free basis, as defined in the W3C Patent Policy. Participants in this Incubator Group have made no statements about whether they will offer licenses according to the licensing requirements of the W3C Patent Policy for portions of this Incubator Group Report that are subsequently incorporated in a W3C Recommendation.
The number of systems required to respond to situations in the real world is increasing. Examples of these systems in use are physical security and environmental load reduction services. In particular, the number of image analysis services is increasing rapidly with marked improvement in image-recognition technologies.
Such services are realized with various sensors that collect different types of data and with analysis engines that analyze those data.
For example, human and vehicle trajectories captured with video images are used for security and marketing services. In one example, information on age, sex, and clothes extracted from images of people is used for an information retrieval service. Electric power data gathered from wattmeters is used for equipment control for energy saving.
However, such systems require a lot of specialized software depending on the features of individual services or analysis engines. In addition, specialized knowledge about individual analysis engines is required to use these analysis engines. This causes a rise in development cost and time. It also restricts analysis engine usage.
Conversely, lowering development costs would make systems that use analysis engines more prevalent. To that end, an architecture and a standard interface that reduce cost are required.
In the MAMI-XG, we discussed a system architecture, use cases, and the requirements of an interface in order to specify an interface standard.
The target standard defines an interface between the analysis data manager and applications and between the analysis data manager and analysis engines. It also defines a framework for multi-vocabulary, but specific vocabulary definitions are out of the scope of this research. Implementing the analysis data manager is out of the scope of this research as well.
There are standardized APIs in specific fields such as surveillance camera control or biometrics. Some of them include the output format of an analysis engine. The Open Network Video Interface Forum (ONVIF) and ISO/IEC JTC1 SC37 (Biometrics) are examples of that.
MPEG-7 and SPARQL are related general-purpose standards.
The Multimodal Interaction Working Group from the W3C also specifies related standards.
ONVIF defines standards for network cameras. It specifies APIs for camera discovery, for setting and reading control parameters, and for retrieving images and the output of image analyses. It also covers standards for object detection.
Combining several standards is necessary in order to build a complex system that uses more than one analysis engine. Therefore, more knowledge is required for system development.
ISO/IEC JTC 1/SC 37 specifies common APIs and data exchange formats for biometric authentication such as fingerprint or face recognition.
MPEG-7 is a standard for metadata description which is associated with multimedia content such as voice or image data. It includes a standard for face recognition results.
MPEG-7 is limited to the metadata description of multimedia content and cannot treat other data such as sensor analysis data.
SPARQL is a W3C Recommendation for an RDF query language. It is a general-purpose standard for RDF retrieval and does not specialize in analysis data. Therefore, it is not efficient enough to handle the results of analysis engines or to express and operate on analysis data.
The MMI-WG standardizes Extensible MultiModal Annotations (EMMA) and the Multimodal Architecture. EMMA is an interface for the Multimodal Architecture and has been published as a W3C Recommendation. It treats input from human interaction such as voice or ink data. The Multimodal Architecture virtualizes various input and output devices and integrates information from plural devices.
We propose a system architecture that reduces the development cost of systems that use analysis engines. The metadata integration platform (MIP) is placed between applications and sensors. It integrates analysis data and provides them to applications (See diagram below).
Applications use the results of analysis engines.
The external interface is an interface through which applications retrieve analysis data from the MIP. Through this interface, applications can clip the required data from the analysis data managed in the MIP.
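The report does not define a concrete query syntax for this "clipping" operation, but its direction can be sketched as follows. The record fields and the `clip` helper are hypothetical, introduced only for illustration of filtering stored analysis data by type and time window:

```python
from datetime import datetime

# Analysis data as the MIP might store them; all field names are assumptions.
records = [
    {"type": "trajectory", "target": "person-1",
     "time": datetime(2012, 5, 1, 9, 0), "position": (3.2, 7.5)},
    {"type": "trajectory", "target": "person-1",
     "time": datetime(2012, 5, 1, 9, 5), "position": (4.0, 7.1)},
    {"type": "face", "target": "person-2",
     "time": datetime(2012, 5, 1, 10, 0), "age": 34, "sex": "F"},
]

def clip(records, data_type, start, end):
    """Return only the records of one type that fall inside a time window."""
    return [r for r in records
            if r["type"] == data_type and start <= r["time"] < end]

# An application clips just the morning trajectory data it needs.
morning = clip(records, "trajectory",
               datetime(2012, 5, 1, 8, 0), datetime(2012, 5, 1, 10, 0))
```

In this sketch the application never addresses an analysis engine directly; it only describes the slice of managed data it wants.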
The metadata integration platform (MIP), which consists of the analysis data manager and analysis engines, integrates the analysis data and provides them to applications.
The analysis data manager stores and manages the analysis data received from analysis engines and provides them to applications.
The internal interface is the interface through which the analysis data manager receives analysis data from various analysis engines. Through this interface, the analysis data manager stores and manages the analysis data retrieved from various analysis engines.
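A minimal in-memory stand-in for the analysis data manager can illustrate this direction of data flow. Implementing the manager is out of the report's scope, so the class and method names below are assumptions, not a specified API:

```python
class AnalysisDataManager:
    """Minimal in-memory sketch of the MIP's analysis data manager.

    It only illustrates the internal interface: engines push analysis
    data in, and the manager stores them for later retrieval.
    """

    def __init__(self):
        self._store = []

    def receive(self, engine_id, record):
        # Tag each record with its source engine so provenance can be
        # traced when applications query through the external interface.
        self._store.append({"engine": engine_id, **record})

    def count(self):
        return len(self._store)

manager = AnalysisDataManager()
# A video-analysis engine and a wattmeter-analysis engine both push
# their results through the same call, regardless of sensor type.
manager.receive("camera-engine-1", {"type": "trajectory", "target": "person-1"})
manager.receive("wattmeter-engine-1", {"type": "power", "watts": 420})
```

The point of the sketch is that the call shape is identical for every engine; engine-specific detail lives only inside the record payload.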
Analysis engines analyze raw data from various sensors and generate analysis results which show the status of the real world.
Sensors collect various data from the real world through rays of light, radio waves, sound waves, and so on.
We discussed a common interface standard that hides the individuality of each analysis engine. Analysis results expressed by the interface are then used commonly between services and engines.
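One way to hide engine individuality is to adapt each engine's native output into a common record shape before it enters the manager. The engine output formats and the common `kind`/`attributes` envelope below are hypothetical, shown only to make the idea concrete:

```python
# Two engines report the same kind of fact (a detected person) in
# different, engine-specific shapes; both formats are invented here.
face_engine_output = {"faceId": 7, "estAge": 34, "estSex": "F"}
detector_output = {"ObjectId": 12, "Class": "Human"}

def from_face_engine(out):
    """Adapt a face-recognition result to the common record shape."""
    return {"kind": "person",
            "attributes": {"age": out["estAge"], "sex": out["estSex"]}}

def from_object_detector(out):
    """Adapt an object-detection result to the common record shape."""
    return {"kind": out["Class"].lower(), "attributes": {}}

common = [from_face_engine(face_engine_output),
          from_object_detector(detector_output)]
# Applications see one shape ("kind" plus "attributes") and need no
# engine-specific knowledge.
```

A vocabulary framework, as mentioned above, would govern which values of `kind` and which attribute names are allowed; the report leaves concrete vocabulary definitions out of scope.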
We introduce use cases in three fields: energy saving, video surveillance, and operational improvement.
The office worker trajectory visualizer and facilities controller is a system that controls lights, air-conditioners, and office automation equipment on the basis of the locations of people in an office for energy saving.
The home facilities controller is a system that uses video cameras and human detection sensors in houses in order to detect humans. It controls lights, air-conditioners, and other electrical appliances for energy saving.
The intrusion detection system detects human or vehicle intrusions by analyzing images captured by surveillance cameras. In addition, it raises an alarm when unauthorized intrusion is found with face or license plate recognition.
The person search system uses face and clothing features to enable image search services. These services retrieve video images by using personal features such as age, sex, face, and clothes.
The sales analysis based on customer trajectories is a system that provides analyses for increasing sales and improving shop layout. For that purpose, it extracts customers' trajectories from video images and RFID data and then links the trajectories to the layout of goods and to POS data.
The factory operation analysis, which uses worker trajectories, is a system that analyzes work procedures and the layout of factories in order to improve work efficiency and safety. For that purpose, it generates workers' trajectories from video images of factories and from RFID data and generates visualizations of these trajectories.
We have discussed use cases and requirements of the MAMI. Hereafter, we will continue to discuss standardization in a working group activity.
The Multimodal Interaction Working Group (MMI-WG) standardizes the Multimodal Architecture, which integrates information from various input and output devices. This architecture is similar to our system architecture: its Interaction Manager virtualizes devices and integrates information from plural devices.
EMMA V1.0 treats input by human interaction such as voice or ink data. The next version, EMMA 2.0, is scheduled to be extended to sensor data and biometrics data.
We plan to contribute the achievements of the MAMI-XG to the MMI-WG, particularly regarding how analysis results are used.
We discussed the Media Analysis Management Interface (MAMI), an analysis result exchange interface, in order to reduce the development cost and time of systems that use analysis engines.
The individuality of an analysis engine is a major factor that increases development cost. Therefore, the analysis data manager, which hides this individuality, is placed between analysis engines and applications. The manager stores and provides analysis data. In addition, the MAMI is required for data exchange between analysis engines and applications.
We described the requirements of the MAMI and six use cases in three fields: energy saving, video surveillance, and operational improvement. We also studied other WG activities and found that the MMI-WG has much in common with our system.
We will continue to collaborate with other WGs like the MMI-WG to specify the MAMI.