Evaluation report for activities to be targeted

Project acronym: QUESTION-HOW
Project Full Title: Quality Engineering Solutions via Tools, Information and Outreach for the New Highly-enriched Offerings from W3C: Evolving the Web in Europe
Project/Contract No. IST-2000-28767
Workpackage 2, Deliverable D2.1

Project Manager: Daniel Dardailler <danield@w3.org>
Author of this document: same

Introduction

The objective of this workpackage (WP2) is to provide additional tools to support new W3C technologies (in addition to those in WP1). The set of tools described below is also derived from the needs of the W3C working groups.

During the past three months of the project, the state of the current set of tools for newer technologies in the W3C community has been assessed. As a result, new functionality for existing tools, as well as new tools, is being proposed.

The expected result of this work is enhanced value in W3C Recommendations through the provision of tool support shifting the user from awareness to understanding of the emergent technology.

This deliverable describes the projects that will be delivered in the next 12 months of activity under WP2, for Deliverables WP2.2-6, in the areas of XHTML, Metadata, Multimedia, and Device Independence. Each project has a cost evaluation in person-months for its development.

Project: Namespace-aware SVG validator

Validation has different meanings depending on context. It can mean validity against a DTD or a schema, or conformance to the prose of a specification.

This project is producing a modular and extensible validation framework that checks all those aspects. It can validate not only stand-alone SVG but also SVG embedded in other document types.

The deliverables of the project will be packaged as an online service for users to validate over the web, and a standalone validator for local, offline use.
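
As a rough illustration of what a modular, namespace-aware validation framework could look like, the following Python sketch chains independent checks over a parsed document. The check functions, the `validate` entry point and the choice of rules are hypothetical and are not taken from the project; DTD and schema validation are left out because the Python standard library does not provide them.

```python
import xml.etree.ElementTree as ET
from typing import Callable, List

SVG_NS = "http://www.w3.org/2000/svg"

# A check receives the parsed document and returns a list of error messages.
Check = Callable[[ET.Element], List[str]]


def check_has_svg(root: ET.Element) -> List[str]:
    """Look for an <svg> element in the SVG namespace, at any depth."""
    tag = f"{{{SVG_NS}}}svg"
    found = any(el.tag == tag for el in root.iter())
    return [] if found else ["no SVG fragment found"]


def check_circle_radius(root: ET.Element) -> List[str]:
    """Prose-level rule: the r attribute of a circle must not be negative."""
    errors = []
    for circle in root.iter(f"{{{SVG_NS}}}circle"):
        value = circle.get("r", "0")
        try:
            negative = float(value) < 0
        except ValueError:
            continue  # lengths with units are not handled in this sketch
        if negative:
            errors.append(f"circle has negative radius: {value}")
    return errors


DEFAULT_CHECKS: List[Check] = [check_has_svg, check_circle_radius]


def validate(source: str, checks: List[Check] = DEFAULT_CHECKS) -> List[str]:
    """Run every check; an empty list means no problem was found."""
    try:
        root = ET.fromstring(source)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    errors: List[str] = []
    for check in checks:
        errors.extend(check(root))
    return errors
```

Because the checks match on namespace-qualified names, the same code handles a stand-alone SVG file and an SVG fragment embedded in, say, an XHTML document. A new rule is added by appending another function to the list of checks.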

Project: Shrink-wrap requirements for Web sites

To improve quality on the Web, there is a need for ready-to-use business cases that set out the benefits of building Web sites with W3C standards.

There is a lack of good metrics and studies about these benefits, but it is commonly said in the Web community that developing a Web site that does not respect the standards carries a direct work overhead of 20 to 30%. Only a few people are aware of the need for standards, and those who are often do not know exactly how to express the benefits for their application or how to explain them to other parties.

We think it is necessary to create a requirements document (a cahier des charges) that explains the benefits of standards, together with a template that any audience can adapt to its needs and that covers the case of a commercial relationship.

These two documents will be freely available to anyone and translated (using the voluntary Translations policy of W3C) into other languages, so that they are available in the language of the person using them.

Project: Collaborative Interactive Web Editing Tool

The Web was originally designed to be read-write. Unfortunately, although write support is already part of the communication protocol used on the Web (HTTP), nearly all current Web servers and clients allow remote reading but not remote writing.

The goal of this project is to enable read-write access. Most HTTP PUT-enabled tools are editors, so tools for uploading a batch of files, or non-textual content, are missing.

Such a tool will be created with an interactive graphical user interface that allows the user to select which files or data are transferred to the server. It will also interoperate with the server to allow conflict detection as well as removal of remote content.
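
To make the upload, conflict detection and removal steps more concrete, here is a minimal Python sketch that only builds the HTTP requests, without sending them. It assumes that conflict detection relies on HTTP/1.1 entity tags (the `If-Match` and `If-None-Match` headers), which is one common approach and is not stated in the project description; the function names are hypothetical.

```python
import urllib.request
from typing import Optional


def build_put(url: str, body: bytes, content_type: str,
              etag: Optional[str] = None) -> urllib.request.Request:
    """Build a PUT request for textual or binary content.

    If the caller knows the entity tag of the version it last fetched,
    If-Match asks the server to refuse the upload when the remote copy
    has changed since then. Without a tag, If-None-Match: * asks the
    server to refuse the upload when the resource already exists.
    """
    headers = {"Content-Type": content_type}
    if etag is not None:
        headers["If-Match"] = etag
    else:
        headers["If-None-Match"] = "*"
    return urllib.request.Request(url, data=body, headers=headers, method="PUT")


def build_delete(url: str) -> urllib.request.Request:
    """Build a DELETE request to remove remote content."""
    return urllib.request.Request(url, method="DELETE")
```

A batch upload would call `build_put` once per selected file and pass each request to `urllib.request.urlopen`. A server that honours the preconditions answers with status 412 (Precondition Failed) on a conflict, which the tool can report to the user.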

Project: QA Log Validator

The burden of validating a large number of HTML pages is often too heavy for the webmaster of a large site.

A new tool is needed for that specific purpose, one that will help webmasters improve the quality of their content in a manageable way.

The design principle is to perform the task in incremental steps. Each day, week or month (freely configurable by the webmaster), the webmaster will receive a list of the most popular Web pages (as defined by the site policy) that do NOT validate. The list is kept short, for example 10 or 20 pages, so that the webmaster can fix them right away.

The program will be built in Perl, for ease of integration on the server side, in a modular and pluggable way, so that a plug-in adding new functionality is easy to develop (e.g. for validation plus spell checking, the webmaster only has to develop the SpellChecker module). This tool will be used on the w3.org site as well.
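
The incremental, pluggable design can be sketched as follows. The sketch is written in Python for illustration only (the project itself plans a Perl implementation). It assumes a Common Log Format access log, defines popularity as the number of successful GET requests, and uses hypothetical names throughout; each plug-in is simply a function that returns True when a page passes.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

# Matches successful GET requests in a Common Log Format line,
# e.g. ... "GET /index.html HTTP/1.0" 200 2326
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+" 200 ')

# A plug-in maps a page path to True (passes) or False (fails).
Checker = Callable[[str], bool]


def count_hits(log_lines: Iterable[str]) -> Counter:
    """Count successful GET requests per path."""
    hits: Counter = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            hits[match.group(1)] += 1
    return hits


def worst_pages(log_lines: Iterable[str], checkers: Dict[str, Checker],
                limit: int = 10) -> List[Tuple[str, int, List[str]]]:
    """Return the most popular pages that fail at least one plug-in."""
    report = []
    for path, hits in count_hits(log_lines).most_common():
        failed = [name for name, passes in checkers.items() if not passes(path)]
        if failed:
            report.append((path, hits, failed))
        if len(report) == limit:
            break
    return report
```

In this arrangement a spell checker is one more entry in the `checkers` dictionary, and the `limit` parameter keeps each report down to a number of pages that can be fixed right away.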

Project: Conversion of QA Matrix resources in Metadata format

The QA Matrix is a complex table of W3C specifications aimed at developers.

This Matrix is one of the main tools that help developers gather information about the status of a Specification (Stage, validation tools, Test Suites, additional documents, Conformance section, etc.). It is maintained by the Quality Assurance Activity of W3C.

To increase the benefits of such information we are working on the development of an RDF version of this Matrix. RDF is the language of choice for the Semantic Web and representing Metadata.

The RDF version will give us a better semantic representation of the data and will therefore be more useful to external people. To achieve this, we first need to develop a vocabulary based on ontologies and/or Topic Maps.

In addition to this new RDF version of the Matrix, this project will deliver a series of tools based on XSLT and CVS to automate updates. The results will be used online on the w3.org site.
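
To give an idea of what one row of the Matrix might look like in RDF, the following Python sketch serializes a few table rows as RDF/XML. The `qa` namespace, the property names and the row layout are placeholders invented for this example, since the actual vocabulary is still to be developed; Python stands in here for the XSLT-based tools the project plans to deliver.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
QA = "http://example.org/qa-matrix#"  # hypothetical vocabulary namespace


def matrix_to_rdf(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize Matrix rows as RDF/XML.

    Each row is a dict with a "uri" key identifying the specification;
    every other non-empty key becomes a property in the QA namespace.
    """
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("qa", QA)
    root = ET.Element(f"{{{RDF}}}RDF")
    for row in rows:
        desc = ET.SubElement(root, f"{{{RDF}}}Description",
                             {f"{{{RDF}}}about": row["uri"]})
        for key, value in row.items():
            if key != "uri" and value:
                ET.SubElement(desc, f"{{{QA}}}{key}").text = value
    return ET.tostring(root, encoding="unicode")
```

Empty cells of the table are skipped rather than emitted as empty properties, so a missing test suite simply results in no statement about it.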

Project: P3P Preference manager

Privacy protection has strong relationships with access control. Models for privacy protection are emerging (cf. P3P and APPEL), but a complete picture does not yet exist. Access control, on the other hand, has a long history, and models have been proposed and evaluated. The piece of work described here contributes to an understanding of how privacy preference policies can be managed in a practical way, by building and evaluating a preference management system based on technology from the area of access control.

In the web environment, the user's privacy is protected by means of server privacy policies, user privacy preferences, and functionality that can match these against each other. APPEL is a proposed language for expressing privacy preferences, and P3P is a language for expressing server privacy policies. Provided that the tool through which the user accesses the web can understand the current set of user preferences (expressed in APPEL), the tool can prevent personal information from being made accessible to the server. Where do the user's stated preferences come from? There are typically several different reasons behind the totality of the preferences. Reasons may depend on each other, or they may be independent. Over time, reasons change, and so do the concrete representation of each reason and its consequences. Hence the user's preference document is a living document that will evolve and adapt. This evolution needs to be managed; otherwise, the preferences will diverge from what is intended, and the preferences, as expressed, may become impossible to understand.

This project consists of developing a demonstration system centered around a preference manager, a piece of software that supports other components involved in accessing servers on the web. The preference manager delivers valid preferences, represented in APPEL. These preferences can then be used by a preference evaluator, which evaluates a server privacy policy against current (user) preferences. This preference evaluator may be inside a web browser, in a proxy server, or in some other component accessing web resources. The main purpose of the preference manager is to provide complete preferences, for specific use or for general use. By "specific use" we mean the preference appropriate for a single specific access to some specific server. By "general use" we mean a preference that encodes the totality of constraints in a situation, independent of what services will be accessed. The intent of this subproject is to evaluate principles and mechanisms for administration of preferences.

To build on existing work, we will reuse functionality and components from ongoing work at SICS in the area of Policy-Based Reasoning, specifically practical results from the Delegent system. To simplify evaluation, it is highly likely that a proxy-server approach will be adopted, where preference evaluation is performed on request from an ordinary web browser.
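
The division of labour between a preference manager and a preference evaluator can be illustrated with a deliberately simplified Python sketch. It does not implement APPEL or P3P: a policy is reduced to the set of purposes it declares, a rule fires when it shares a purpose with the policy, and all names, the rule ordering and the default behaviour are assumptions made for this example only.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set, Tuple


class Rule(NamedTuple):
    behavior: str              # "block" or "request" in this sketch
    purposes: FrozenSet[str]   # the rule fires if the policy declares any of these
    reason: str                # why the user holds this preference


def compose(reasons: Dict[str, List[Rule]]) -> List[Rule]:
    """Preference manager: merge the rules of all reasons into one list.

    Blocking rules are placed first so that a permissive rule from one
    reason can never shadow a restrictive rule from another.
    """
    rules = [rule for group in reasons.values() for rule in group]
    return sorted(rules, key=lambda rule: rule.behavior != "block")


def evaluate(policy_purposes: Set[str], preference: List[Rule],
             default: str = "block") -> Tuple[str, str]:
    """Preference evaluator: the first rule that fires decides."""
    for rule in preference:
        if rule.purposes & policy_purposes:
            return rule.behavior, rule.reason
    return default, "no rule matched"
```

Keeping the reason attached to each rule is what allows the composed preference to remain understandable as individual reasons are added, changed or withdrawn over time.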

Project: Simultaneous Visual and Audio Communication on the Web

The project builds a basic multimodal dialog demonstrator used to query a multimedia database from an FhG project named iFinder. (iFinder automatically acquires speeches held in the German Bundestag, performs speaker recognition and cuts the recordings into clips.)

The demonstrator uses the following W3C standards: XHTML Basic for the graphical interface, VoiceXML for the voice interface, XSLT to transform results from XML into either of these formats, and XPath to specify queries and selections.

The demonstrator runs on IBM's WebSphere but can be ported to any other system that supports the same functions.

Besides the demonstrator, we will deliver an overview of the architecture and sample files that demonstrate the use of W3C technologies and clarify how they interact.
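
The idea of selecting results with XPath and rendering them once for the visual and once for the voice interface can be sketched in a few lines of Python. This is only an illustration: the result format, the speaker labels and the function names are invented, Python code replaces the XSLT stylesheets used by the demonstrator, namespaces are omitted for brevity, and only the XPath subset supported by the standard library is used.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical query result, not the actual iFinder format.
RESULTS = """<clips>
  <clip speaker="Speaker A" title="Budget debate"/>
  <clip speaker="Speaker B" title="Question time"/>
  <clip speaker="Speaker A" title="Closing statement"/>
</clips>"""


def select_clips(xml_text: str, speaker: str) -> List[ET.Element]:
    """Select the clips of one speaker with an XPath predicate."""
    root = ET.fromstring(xml_text)
    return root.findall(f"./clip[@speaker='{speaker}']")


def to_xhtml(clips: List[ET.Element]) -> str:
    """Visual rendering: a list fragment for an XHTML Basic page."""
    ul = ET.Element("ul")
    for clip in clips:
        ET.SubElement(ul, "li").text = clip.get("title")
    return ET.tostring(ul, encoding="unicode")


def to_voicexml(clips: List[ET.Element]) -> str:
    """Voice rendering: a VoiceXML form that reads the titles aloud."""
    vxml = ET.Element("vxml", {"version": "2.0"})
    block = ET.SubElement(ET.SubElement(vxml, "form"), "block")
    titles = ", ".join(clip.get("title") for clip in clips)
    ET.SubElement(block, "prompt").text = f"Found {len(clips)} clips: {titles}"
    return ET.tostring(vxml, encoding="unicode")
```

Both renderings start from the same selection, so the query logic is written once and only the presentation differs per modality.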

Project: Automatic inclusion of RDF annotation in SMIL 2.0

Web publishing systems have to take into account a plethora of Web-enabled devices, user preferences and abilities. Technologies generating these presentations will need to be explicitly aware of the context in which the information is being presented. Semantic Web technology can be a fundamental part of the solution to this problem by explicitly modeling the knowledge needed to adapt presentations to a specific delivery context.

This project focuses on the automatic inclusion of RDF annotation in generated SMIL 2.0 presentations. This is being incorporated in the Cuypers hypermedia presentation generation system. An initial proof of concept using Dublin Core to annotate individual media items is being implemented.

This uses information stored in the underlying database of multimedia items (the database is owned by the Rijksmuseum). The information stored in the database is translated to Dublin Core, associated with the media items used in the final presentation and included in the SMIL file.
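
A minimal Python sketch of this translation step is given below. It is not the Cuypers implementation: the database field names, the mapping table and the sample record layout are assumptions made for the example. The sketch places an RDF description per media item inside the `metadata` element of the SMIL head, using Dublin Core properties.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

SMIL = "http://www.w3.org/2001/SMIL20/Language"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical database field names mapped to Dublin Core elements.
FIELD_TO_DC = {"object_title": "title", "artist": "creator", "year": "date"}


def build_smil(records: Iterable[Dict[str, str]]) -> str:
    """Generate a SMIL slideshow whose head carries Dublin Core metadata."""
    ET.register_namespace("", SMIL)
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dc", DC)
    smil = ET.Element(f"{{{SMIL}}}smil")
    head = ET.SubElement(smil, f"{{{SMIL}}}head")
    metadata = ET.SubElement(head, f"{{{SMIL}}}metadata")
    rdf = ET.SubElement(metadata, f"{{{RDF}}}RDF")
    body = ET.SubElement(smil, f"{{{SMIL}}}body")
    seq = ET.SubElement(body, f"{{{SMIL}}}seq")
    for rec in records:
        # One image per record in the presentation itself...
        ET.SubElement(seq, f"{{{SMIL}}}img", {"src": rec["src"], "dur": "5s"})
        # ...and one RDF description about that image in the head.
        desc = ET.SubElement(rdf, f"{{{RDF}}}Description",
                             {f"{{{RDF}}}about": rec["src"]})
        for field, dc_name in FIELD_TO_DC.items():
            if field in rec:
                ET.SubElement(desc, f"{{{DC}}}{dc_name}").text = str(rec[field])
    return ET.tostring(smil, encoding="unicode")
```

Fields that have no entry in the mapping table are ignored, so extending the annotation to further Dublin Core elements only requires extending the table.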

As we make the knowledge used during the different steps of the generation process more explicit, this knowledge can also be included in the final presentation: for example, the device the presentation was created for, the user characteristics it assumes, and the application domain it uses.

Such explicit annotations improve the traceability and trust level of information by explaining the origin of selected information as well as the design decisions taken during the (server-side) generation process.

We also plan to include the exploration of domain-specific ontologies to improve metadata by using ontological reasoning and the exploration of annotation of the overall presentation by using presentation/narrative oriented vocabularies.

Project: CC/PP module

A module that makes the Apache Server capable of understanding CC/PP information contained in HTTP headers.

We follow the HTTP-ext format using the Profile and Profile-Diff fields in the HTTP header. To demonstrate the functionality of the module, we will add limited transcoding capabilities to Apache. The file formats that will be subject to transcoding will be chosen among the MPEG video format, the MP3 audio format and the GIF image format.

We are currently implementing a module that can co-operate with the Apache core like all the other modules loaded by Apache. For testing purposes we plan to create HTTP+CC/PP headers from a Java application running as a proxy. After building the CC/PP module for Apache servers, we plan to investigate whether this module can be used for Apache proxies as well. For that purpose we will be investigating the module mod_proxy.

mod_proxy is one possible place where the CC/PP module can be integrated. The transcoding process will be implemented on top of existing libraries, such as ImageMagick (for transcoding GIF images) and FFmpeg (for transcoding MPEG and MP3 files).

To improve efficiency even further, we plan to study whether the CC/PP information will be in RDF or in XML format, since RDF parsers are generally slower than XML parsers.
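
Once the device capabilities have been extracted from the CC/PP profile, the transcoding decision itself is straightforward. The Python sketch below shows one way to plan it; it is not the Apache module. It assumes the profile has already been parsed into a dictionary with hypothetical keys (`screen_width`, `screen_height`, `audio_kbps`), it only builds the command line for the ImageMagick and FFmpeg tools without running it, and it leaves MPEG video out.

```python
import os
from typing import Dict, List, Optional


def plan_transcode(path: str, profile: Dict[str, int]) -> Optional[List[str]]:
    """Return the argument list of a transcoding command, or None.

    None means the resource can be served unchanged, either because the
    format is not handled here or because the profile gives no constraint.
    """
    base, ext = os.path.splitext(path)
    out = f"{base}.adapted{ext}"
    kind = ext.lower()
    if kind == ".gif" and "screen_width" in profile and "screen_height" in profile:
        # The trailing ">" tells ImageMagick to shrink only images that
        # are larger than the target size, never to enlarge small ones.
        size = f'{profile["screen_width"]}x{profile["screen_height"]}>'
        return ["convert", path, "-resize", size, out]
    if kind == ".mp3" and "audio_kbps" in profile:
        return ["ffmpeg", "-i", path, "-b:a", f'{profile["audio_kbps"]}k', out]
    return None
```

Returning an argument list rather than a shell string lets the caller pass it directly to `subprocess.run` without any quoting issues around the `>` character.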

Project: XML/RDF Digital library

The main task will be to develop a user interface to query corpora of complex and specialized XML documents (such as legal documents, cultural heritage cataloguing cards, user manuals, etc.).

At the lower level, we will take an already developed library. The core of the library (indexing and compressing algorithms) will remain the property of its authors. The library is written in C and provides a set of algorithms and data structures for indexing and searching an XML document collection.

The documents must be well-formed and may be heterogeneous in that they may reflect different DTDs. The library supports the storage and management of these XML files in native form, that is, it operates directly at the File System level. The main features of the library are: state-of-the-art algorithms and data structures for text indexing, compressed space occupancy, and novel succinct data structures for the management of the hierarchical structure of the XML document.

The library provides an API with a rich set of functions to operate on its whole collection of data structures and algorithms. It may implement most of the basic functionalities of XQuery, and it may support more complex IR-like searches. The user interface will get the document structure from the XML Schema, and will make use of some RDF facilities for broadening or narrowing query terms, possibly implementing graphical browsing of thesauri, and for supporting semantic equivalences for more effective searches.

The possibility of querying different document collections, where semantically equivalent data elements are stored in different XML document structures, will be considered. One of the most significant points is that the system will work on compressed documents, and space reduction might play a crucial role in space-demanding applications that run on small-memory devices such as PDAs and ebooks.

Project: Using RDF to query multiple SQL Databases

This is the continuation of the work started in WP1: produce a demonstrator to show the benefits of RDF over XML Schema, raw XML, or the SQL output formats for both query and integration of information.

Background: Access to multiple databases is a classic problem in IT. Using a single distributed schema across multiple databases is one solution to this problem where it can be imposed, but in practice this solution does not address legacy systems, or systems created with different vocabularies (e.g. car vs automobile). The ambiguity to be resolved is a semantic problem for the RDBMS which is addressed by ontologies expressible in RDF.

The demonstrator will provide the complete package shown above to demonstrate the benefits of using RDF as a Semantic Web solution to this problem. Many of the parts are already implemented in the DAML+OIL project, by Eric Miller at W3C (lead of W3C Semantic Web activity) or in commercially available tools (e.g. XML Spy). Other parts will require design and implementation. The whole package will need to be put together, along with example queries, DB entries, etc., to produce a documented demonstration showing the cost effectiveness of the approach.

The activity for WP2 is to integrate the components of the architecture, produce the demonstration materials, ensure that the demonstration is robust, and document the demonstration to show the cost-benefits of using semantic web technologies. The aim is to provide a completed demonstration that can be used to show the benefits of the approach to potential adopters of semantic web technologies.
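
The vocabulary problem (car vs automobile) and its resolution through a shared set of terms can be illustrated with a small Python sketch using two in-memory SQLite databases. This is not the demonstrator: the table layouts, the source names and the sample rows are invented, and a plain dictionary stands in for the mapping that the project would express as an ontology in RDF.

```python
import sqlite3
from typing import Dict, List, Tuple

# Stand-in for the ontology: how the shared terms "label" and "price"
# map onto the local vocabulary of each database.
MAPPINGS = {
    "dealer_a": {"table": "car", "label": "model", "price": "price"},
    "dealer_b": {"table": "automobile", "label": "name", "price": "cost"},
}


def demo_connections() -> Dict[str, sqlite3.Connection]:
    """Create two databases that describe the same kind of thing differently."""
    a = sqlite3.connect(":memory:")
    a.execute("CREATE TABLE car (model TEXT, price INTEGER)")
    a.executemany("INSERT INTO car VALUES (?, ?)",
                  [("Clio", 9000), ("Laguna", 18000)])
    b = sqlite3.connect(":memory:")
    b.execute("CREATE TABLE automobile (name TEXT, cost INTEGER)")
    b.executemany("INSERT INTO automobile VALUES (?, ?)", [("Golf", 12000)])
    return {"dealer_a": a, "dealer_b": b}


def query_vehicles(connections: Dict[str, sqlite3.Connection],
                   max_price: int) -> List[Tuple[str, str, int]]:
    """Answer one query in the shared vocabulary across all databases."""
    results = []
    for source, conn in connections.items():
        m = MAPPINGS[source]
        # Identifiers come from the trusted mapping table; the user-supplied
        # value is passed as a bound parameter.
        sql = (f'SELECT {m["label"]}, {m["price"]} FROM {m["table"]} '
               f'WHERE {m["price"]} <= ?')
        for label, price in conn.execute(sql, (max_price,)):
            results.append((source, label, price))
    return sorted(results, key=lambda row: row[2])
```

Adding a third legacy database then means adding one mapping entry, without touching the existing schemas or the query code.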

Project: Mapping of Complex Forms

This is the continuation of WP1 work in the same area, where forms are the current interface between organizations and their customers.

The motivations for this project are numerous: organizations need efficient and powerful tools to handle forms, and users ask for continuous access to organizational interfaces. In addition, XForms intends to become the bridge across these expectations, and validation of XForms is an essential capability.

The goals are to be able to map complex HTML forms to the new XForms markup language and, in doing so, to analyze XForms and gain the ability to handle complex forms easily.

We also want to develop a dictionary of modules to help in producing forms and to help in extracting information from forms.
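
As a first approximation of such a mapping, the Python sketch below extracts the named controls of an HTML form and emits the corresponding XForms controls. It is a minimal illustration under several assumptions: only text inputs, password inputs, text areas and selection lists are handled, the field name is reused as the label, the XForms model and submission are not generated, and the class and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from typing import List, Tuple

XF = "http://www.w3.org/2002/xforms"

# HTML input types handled by this sketch and their XForms counterparts.
INPUT_TYPES = {"text": "input", "password": "secret"}


class FormMapper(HTMLParser):
    """Collect (XForms control name, field name) pairs from HTML markup."""

    def __init__(self) -> None:
        super().__init__()
        self.controls: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name")
        if not name:
            return  # unnamed controls carry no data
        if tag == "input":
            kind = INPUT_TYPES.get((attributes.get("type") or "text").lower())
            if kind:
                self.controls.append((kind, name))
        elif tag == "textarea":
            self.controls.append(("textarea", name))
        elif tag == "select":
            multiple = "multiple" in attributes
            self.controls.append(("select" if multiple else "select1", name))


def html_form_to_xforms(html: str) -> str:
    """Map the controls of an HTML form to XForms controls in a group."""
    mapper = FormMapper()
    mapper.feed(html)
    ET.register_namespace("xforms", XF)
    group = ET.Element(f"{{{XF}}}group")
    for kind, name in mapper.controls:
        control = ET.SubElement(group, f"{{{XF}}}{kind}", {"ref": name})
        ET.SubElement(control, f"{{{XF}}}label").text = name
    return ET.tostring(group, encoding="unicode")
```

The mapping table `INPUT_TYPES` is the natural place to grow a dictionary of reusable correspondences as more complex forms are analyzed.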

Deviations from plan

No deviation from the plan outlined in the Technical Annex (we're just correcting the milestone agenda which seems to be wrong for this workpackage).

As announced in the D1.1 deliverable, the ratio of W3C team/office technical work is now reversed, with the W3C offices taking up most of the technical work in year 2. The overall cost for WP1 + WP2 development remains stable.

Also, the Technical Annex describes for WP2 a series of deliverables and milestones which need to be mapped to the set of "projects" indicated in this deliverable.