Using W3C's Recommendations for Enterprise Applications

by Rigo Wenning, Policy Analyst, W3C/INRIA (see also the slides)

Note: All content that I'm referring to is available from the W3C site. I will provide links to the appropriate pages in this presentation.

Introduction

The reason we are meeting is to specify the technical objectives that must be met to make the next generation of enterprise applications interoperable. What is the role of W3C in this scenario? The goal of W3C is to lead the Web to its full potential. In our context, this means that W3C focuses first of all on things that are important for the Web. So we could end the presentation here and say: this is out of scope for W3C.

But industry at large did not exactly follow the W3C policy I just described. After W3C defined XML, the technology took off. Today XML is used all over the world, and its use goes far beyond the Web: XML is used in most business-to-business solutions. W3C continues to develop this technology and adds new functionality to define an overall architecture. XML is designed for a webbed world, but business today is also webbed together in many ways.

So it was natural that a large variety of actors came to W3C and asked us to develop the XML for real estate, the XML for car manufacturers, for the chemical industry, for whatever you could imagine. Until now, W3C has responded that it does not have sufficient knowledge within its Team to lead these kinds of activities. But this could change at the request of its Members, if they are willing to dedicate new resources to the development of XML applications. At the moment, we spend all our capacity on finalizing the basic technology needed to use XML effectively, e.g. XML Schema, before spending time on defining applications based on our existing specifications.

In a very recent presentation, Tim Berners-Lee presented the overall architecture, which will be based on XML. I will try to translate this architecture to give you an idea of how it can also make business applications interoperable. I also recommend reading Berners-Lee's thoughts on Web Architecture from 50,000 feet.

The Basic Architecture

XML - The basic specification

Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents.

XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

A software module called an XML processor is used to read XML documents and provide access to their content and structure. It is assumed that an XML processor is doing its work on behalf of another module, called the application. This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
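
The split between XML processor and application can be illustrated with a minimal sketch. It assumes Python's standard xml.etree.ElementTree module in the role of the XML processor; the purchase-order document and the function names are invented for illustration and come from no specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element and attribute names are invented.
ORDER_XML = """<order id="A-17">
  <customer>Example GmbH</customer>
  <item sku="X100" quantity="2">Widget</item>
  <item sku="X200" quantity="5">Gadget</item>
</order>"""


def summarize_order(xml_text):
    """Application code: relies on the processor for content and structure."""
    root = ET.fromstring(xml_text)  # the XML processor reads the document
    items = [
        (item.get("sku"), int(item.get("quantity")), item.text)
        for item in root.findall("item")
    ]
    return {
        "id": root.get("id"),
        "customer": root.findtext("customer"),
        "items": items,
    }


def is_well_formed(xml_text):
    """The processor, not the application, detects well-formedness errors."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(summarize_order(ORDER_XML))
    print(is_well_formed("<order><item></order>"))  # mismatched tags
```

The application never touches angle brackets itself; it only sees the elements, attributes and character data that the processor hands over.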

The design goals for XML were:

XML shall be straightforwardly usable over the Internet.
XML shall support a wide variety of applications.
XML shall be compatible with SGML.
It shall be easy to write programs which process XML documents.
The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
XML documents should be human-legible and reasonably clear.
The XML design should be prepared quickly.
The design of XML shall be formal and concise.
XML documents shall be easy to create.
Terseness in XML markup is of minimal importance.

The XML specification was a huge success in the marketplace. At CeBIT in February, a large majority of companies promoted their products by saying that they are XML-based. As a first introduction, Bert Bos, a Team member, has written a document he called "XML in 10 Points", although in the end it had only seven points, a magic number. XML stands for eXtensible Markup Language. XML allows anyone to define their own markup language for their own use. XML is object-oriented and was also intended to integrate databases into the Web. But how could this be interoperable? A good description of how and why can be found in an article by Jon Bosak, published in Scientific American.

In the Activity Statement of XML-Schema, there is a list of expectations for the future:

XML will

Enable internationalized media-independent electronic publishing
Allow industries to define platform-independent protocols for the exchange of data, especially the data of electronic commerce
Deliver information to user agents in a form that allows automatic processing after receipt
Make it easier to develop software to handle specialized information distributed over the Web
Make it easy for people to process data using inexpensive software
Allow people to display information the way they want it, under stylesheet control
Make it easier to provide metadata -- data about information -- that will help people find information and help information producers and consumers find each other

For more information, you should look at one of the numerous XML FAQs. But using XML alone isn't sufficient to create interoperability. If everyone uses their own flavour of XML, how can we distinguish all these markup languages? Can we even merge documents with different markup into one meaningful document? That raises a lot of questions. W3C tried to solve this by specifying that every XML markup language should use Namespaces.

Namespaces

The Namespaces specification was written because W3C envisioned applications of Extensible Markup Language (XML) where a single XML document may contain elements and attributes (here referred to as a "markup vocabulary") that are defined for and used by multiple software modules. One motivation for this was modularity; if such a markup vocabulary exists which is well-understood and for which there is useful software available, it is better to re-use this markup rather than re-invent it.

Such documents, containing multiple markup vocabularies, pose problems of recognition and collision. Software modules need to be able to recognize the tags and attributes which they are designed to process, even in the face of "collisions" occurring when markup intended for some other software package uses the same element type or attribute name.

An XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names. XML namespaces differ from the "namespaces" conventionally used in computing disciplines in that the XML version has internal structure and is not, mathematically speaking, a set. These issues are discussed in "A. The Internal Structure of XML Namespaces".

URI references which identify namespaces are considered identical when they are exactly the same character-for-character. Note that URI references which are not identical in this sense may in fact be functionally equivalent. Examples include URI references which differ only in case, or which are in external entities which have different effective base URIs.

The Namespaces definition allows us to merge ERP documents built with software from vendor A with documents from vendor B, without forcing the two vendors to use exactly the same markup. Namespaces also help us avoid confusion when vendors use the same tag for different things.
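
As a small illustration of such a merge, the sketch below puts two vendors' markup into one document. Both namespace URIs and all element names are hypothetical; the parsing uses Python's standard xml.etree.ElementTree module, which expands every qualified name to the form "{namespace-URI}local-name".

```python
import xml.etree.ElementTree as ET

# Both namespace URIs are hypothetical, chosen only for illustration.
VENDOR_A = "http://example.com/vendor-a/erp"
VENDOR_B = "http://example.org/vendor-b/erp"

# Vendor A uses <order> for a purchase order number,
# vendor B uses <order> for a sort direction.
MERGED = f"""<report xmlns:a="{VENDOR_A}" xmlns:b="{VENDOR_B}">
  <a:order>PO-1001</a:order>
  <b:order>ascending</b:order>
</report>"""

root = ET.fromstring(MERGED)

# The two <order> elements no longer collide once their names are expanded.
tags = [child.tag for child in root]

a_order = root.find(f"{{{VENDOR_A}}}order").text
b_order = root.find(f"{{{VENDOR_B}}}order").text

# Namespace names are compared character for character, so a URI that
# differs only in case identifies a different namespace: nothing is found.
upper_case_lookup = root.find(f"{{{VENDOR_A.upper()}}}order")

if __name__ == "__main__":
    print(tags)
    print(a_order, b_order, upper_case_lookup)
```

The prefixes a and b are local shorthand only; software should match on the namespace URI, never on the prefix.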

But Namespaces by themselves say nothing about the meaning of a tag or its function in the architecture of the application. A computer can't find out on its own what the tag <ERP> means in the context of a specific document. So how could one describe this? The tags must be defined somewhere.

XML Schema

In SGML and, as a consequence, in HTML, the tags are defined by a DTD, a Document Type Definition. For HTML, this DTD was defined by the HTML Working Group within W3C. But DTDs have limitations, and W3C is currently trying to fill the gap by defining a schema language for XML called XML Schema. While XML 1.0 supplies a mechanism, the Document Type Definition (DTD), for declaring constraints on the use of markup, automated processing of XML documents requires more rigorous and comprehensive facilities in this area. Requirements are for constraints on how the component parts of an application fit together, the document structure, attributes, data-typing, and so on. The XML Schema Working Group [members only] is addressing means for defining the structure, content and semantics of XML documents.

The XML Schema Working Group is currently working on the following issues:

structural schemas

a mechanism somewhat analogous to DTDs for constraining document structure (order, occurrence of elements, attributes). Specific goals beyond DTD functionality are

integration with namespaces
definition of incomplete constraints on the content of an element type
integration of structural schemas with primitive data types
inheritance: Existing mechanisms use content models to specify part-of relations. But they only specify kind-of relations implicitly or informally. Making kind-of relations explicit would make both understanding and maintenance easier

primitive data typing

integers, dates, and the like, based on experience with SQL, Java primitives, etc.; byte sequences ("binary data") also need to be considered

conformance

The relation of schemata to XML document instances, and obligations on schema-aware processors, must be defined. The Working Group will define a process for checking to see that the constraints expressed in a schema are obeyed in a document (schema-validation); the relationship between schema-validity and validity as defined in XML 1.0 will be defined.
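
To give a feeling for what checking structural and datatype constraints involves, here is a hand-rolled sketch in Python. It is only an analogy: the rules table is not XML Schema syntax, and the invoice vocabulary, the INVOICE_RULES name and the check_invoice function are all invented for this example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical constraints for the children of <invoice>: a converter that
# plays the role of a primitive datatype, and a flag saying whether the
# element is required. This is NOT XML Schema syntax.
INVOICE_RULES = {
    "number": (int, True),
    "issued": (date.fromisoformat, True),
    "note": (str, False),
}


def check_invoice(xml_text):
    """Return a list of constraint violations (an empty list means valid)."""
    root = ET.fromstring(xml_text)
    errors = []
    seen = set()
    for child in root:
        rule = INVOICE_RULES.get(child.tag)
        if rule is None:
            errors.append(f"unexpected element <{child.tag}>")
            continue
        seen.add(child.tag)
        convert, _required = rule
        try:
            convert(child.text or "")  # datatype check
        except ValueError:
            errors.append(f"<{child.tag}> has an invalid value: {child.text!r}")
    for name, (_convert, required) in INVOICE_RULES.items():
        if required and name not in seen:  # occurrence check
            errors.append(f"missing required element <{name}>")
    return errors


if __name__ == "__main__":
    good = "<invoice><number>42</number><issued>2000-10-04</issued></invoice>"
    bad = "<invoice><number>forty-two</number><total>9</total></invoice>"
    print(check_invoice(good))
    print(check_invoice(bad))
```

A schema language moves tables like this one out of the program and into a shared, declarative document, so that every vendor's software can check the same constraints.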

The XML Schema work is interdependent with several other areas of W3C activity. These are listed below under Design Principles. A good overview can be found in W3C's Activity Statement for work on XML Schema. The requirements document also helps to understand the scope of that work. XML Schema is currently still work in progress.

DOM

The DOM is the API for XML. A short summary from the Activity Statement:

W3C's Document Object Model (DOM) is a standard internal representation of the document structure and aims to make it easy for programmers to access components and delete, add or edit their content, attributes and style. In essence, the DOM makes it possible for programmers to write applications which work properly on all browsers and servers, and on all platforms. While programmers may need to use different programming languages, they do not need to change their programming model.

W3C's Document Object Model thus offers programmers a platform- and language-neutral program interface which will make programming reliably across platforms with languages such as Java and ECMAScript a reality.

W3C is currently completing DOM level 2, with many new features, such as the ability to query and set style properties, and the provision of a generalized model of event handling.

Using the DOM together with XML-based solutions is a way to build reliable solutions, even in heterogeneous environments.
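
The following sketch shows the add, edit and delete operations mentioned above, using the DOM implementation in Python's standard library (xml.dom.minidom). The catalog document is invented for illustration; the method names (createElement, setAttribute, appendChild, removeChild) are those of the DOM interfaces, so the same calls should look familiar in Java or ECMAScript.

```python
from xml.dom.minidom import parseString

# Hypothetical catalog document.
doc = parseString('<catalog><item id="1">Widget</item></catalog>')
catalog = doc.documentElement

# Add: create a new element with an attribute and text content.
new_item = doc.createElement("item")
new_item.setAttribute("id", "2")
new_item.appendChild(doc.createTextNode("Gadget"))
catalog.appendChild(new_item)

# Edit: change an attribute and the text of the first item.
first = catalog.getElementsByTagName("item")[0]
first.setAttribute("id", "1a")
first.firstChild.data = "Improved widget"

after_add = catalog.toxml()

# Delete: remove the element that was added above.
catalog.removeChild(new_item)

after_delete = catalog.toxml()

if __name__ == "__main__":
    print(after_add)
    print(after_delete)
```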

URIs

The requirement to have all resources in a hypermedia system addressable was identified long ago in Douglas Engelbart's seminal paper (see also, An Evaluation of the World Wide Web with respect to Engelbart's Requirements). The ability to make a reference to a resource with a URL enables linking, searching, and a variety of navigation and access techniques.

Some services make information available via the web, but not addressable. For example, results of database queries using POST (rather than GET) are not addressable. Items in a catalog put on the web this way can't be linked to, and cannot participate in third-party search services. This unfortunate choice by some information providers reduces automation and scalability in the web.

It is also unfortunate that, for example, headings in HTML documents are not addressable unless they are marked up as anchors explicitly. See XML section in the Addressing-Page.

For enterprise applications, this means that they should base their addressing scheme on URIs, to allow seamless integration into an intranet or very easy publishing of parts onto the Internet. If this is not 100% possible, an application should provide a second addressing scheme based on URIs wherever possible.
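
A minimal sketch of what an addressable query result can look like: the query criteria are encoded into the URI itself, as with GET, so the result can be linked to, bookmarked or indexed. The intranet host name, path and parameter names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical intranet catalog endpoint.
BASE = "http://intranet.example.com/catalog/search"


def query_uri(**criteria):
    """Build a GET-style URI that fully identifies a query result."""
    # Sorting gives the same URI for the same criteria, whatever their order.
    return f"{BASE}?{urlencode(sorted(criteria.items()))}"


def criteria_from_uri(uri):
    """Recover the query criteria from such a URI."""
    return {key: values[0] for key, values in parse_qs(urlsplit(uri).query).items()}


uri = query_uri(plant="Lyon", part="X100")

if __name__ == "__main__":
    print(uri)
    print(criteria_from_uri(uri))
```

With POST, the same criteria would travel in the request body and the result would have no address of its own.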

XML-Protocol

The goal of XML Protocol is to develop technologies which allow two or more peers to communicate in a distributed environment, using XML as its encapsulation language. Solutions developed by this activity allow a layered architecture on top of an extensible and simple messaging format, which provides robustness, simplicity, reusability and interoperability.

The initial focus of the XML Protocol Working Group is to develop a framework for XML-based messaging systems, which includes specifying a message envelope format and a method for data serialization, directed mainly, but not exclusively, to RPC applications, and conforming with the abovementioned principles. More specifics are available from the Charter.

A broad range of applications will eventually be interconnected through the Web. The initial focus of this Working Group is to create simple protocols that can be ubiquitously deployed and easily programmed through scripting languages, XML tools, interactive Web development tools, etc. The goal is a layered system which will directly meet the needs of applications with simple interfaces (e.g. getStockQuote, validateCreditCard), and which can be incrementally extended to provide the security, scalability, and robustness required for more complex application interfaces. Experience with SOAP, XML-RPC, WebBroker, etc. suggests that simple XML-based messaging and remote procedure call (RPC) systems, layered on standard Web transports such as HTTP and SMTP, can effectively meet these requirements. Specifically, the XML Protocol Working Group is chartered to design the following four components:

An envelope for encapsulating XML data to be transferred in an interoperable manner that allows for distributed extensibility and evolvability as well as intermediaries.
A convention for the content of the envelope when used for RPC (Remote Procedure Call) applications. The protocol aspects of this should be coordinated closely with the IETF and make an effort to leverage any work they are doing, see below for details.
A mechanism for serializing data representing non-syntactic data models such as object graphs and directed labeled graphs, based on the datatypes of XML Schema.
A mechanism for using HTTP transport in the context of an XML Protocol. This does not mean that HTTP is the only transport mechanism that can be used for the technologies developed, nor that support for HTTP transport is mandatory. This component merely addresses the fact that HTTP transport is expected to be widely used, and so should be addressed by this Working Group.
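
To make the first two components more concrete, here is a sketch of an envelope carrying an RPC-style call to getStockQuote. The namespace URIs and the element names Envelope, Body and symbol are placeholders invented for this example; they are not the format the Working Group is chartered to define.

```python
import xml.etree.ElementTree as ET

# Placeholder namespaces, invented for this sketch.
ENV_NS = "http://example.org/2000/envelope"
APP_NS = "http://example.org/2000/stock"


def build_request(symbol):
    """Wrap an RPC-style call in an envelope and serialize it."""
    envelope = ET.Element(f"{{{ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}getStockQuote")
    ET.SubElement(call, f"{{{APP_NS}}}symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


def read_request(message):
    """Receiver side: extract the method name and its arguments."""
    envelope = ET.fromstring(message)
    call = envelope.find(f"{{{ENV_NS}}}Body")[0]
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" part
    args = {arg.tag.split("}", 1)[1]: arg.text for arg in call}
    return method, args


if __name__ == "__main__":
    message = build_request("ACME")
    print(message)
    print(read_request(message))
```

Because the envelope and the application payload live in different namespaces, an intermediary can process the envelope without understanding the payload.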

CSS

So now, we have our content described with angle brackets, located and identified with URIs. We can transform and select parts of the content. But how do we show it to the user of such a system?

There is a general design principle that strongly recommends separating structure and content from presentation. W3C presenters have also called it the separation of structure and style (see the presentation by Bert Bos at WWW9). But why should we do so? There are many reasons:

Allow different devices (like mobile, PDA, text-browsers) with different capabilities to show the content
Make your content accessible (see the Web Accessibility Initiative)
Make direct rendering of XML possible with CSS
Easier to edit
Smaller documents

Conclusion

There is a very flexible globally standardized system for information management, ready to be used. But there are many ways to use this framework for Enterprise Applications. So there is still a need for further standardization for the specific context in question.

W3C Solutions based on the XML-Framework

XSLT

XSLT provides a standardized way to transform XML documents.

This specification defines the syntax and semantics of the XSLT language. A transformation in the XSLT language is expressed as a well-formed XML document [XML] conforming to the Namespaces in XML Recommendation [XML Names], which may include both elements that are defined by XSLT and elements that are not defined by XSLT. XSLT-defined elements are distinguished by belonging to a specific XML namespace (see [2.1 XSLT Namespace]), which is referred to in this specification as the XSLT namespace. Thus this specification is a definition of the syntax and semantics of the XSLT namespace.

A transformation expressed in XSLT describes rules for transforming a source tree into a result tree. The transformation is achieved by associating patterns with templates. A pattern is matched against elements in the source tree. A template is instantiated to create part of the result tree. The result tree is separate from the source tree. The structure of the result tree can be completely different from the structure of the source tree. In constructing the result tree, elements from the source tree can be filtered and reordered, and arbitrary structure can be added.

A transformation expressed in XSLT is called a stylesheet. This is because, in the case when XSLT is transforming into the XSL formatting vocabulary, the transformation functions as a stylesheet.
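
The idea of associating patterns with templates can be imitated in a few lines of Python. This is an analogy only, not XSLT: the "patterns" are bare element names, the "templates" are Python functions, and the order vocabulary is invented. One deliberate simplification is that elements without a rule are dropped here, whereas real XSLT has built-in template rules for unmatched nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE = """<order>
  <item sku="X100">Widget</item>
  <note>internal only</note>
  <item sku="X200">Gadget</item>
</order>"""


def order_template(node, apply):
    ul = ET.Element("ul")   # structure that does not exist in the source
    ul.extend(apply(node))  # process the children of <order>
    return [ul]


def item_template(node, apply):
    li = ET.Element("li")
    li.text = f"{node.text} ({node.get('sku')})"
    return [li]


# Pattern (element name) -> template (function building result nodes).
RULES = {"order": order_template, "item": item_template}


def transform(root):
    """Build a separate result tree from the source tree."""
    def apply(parent):
        result = []
        for child in parent:
            template = RULES.get(child.tag)
            if template is not None:  # no rule: filtered out in this sketch
                result.extend(template(child, apply))
        return result

    return RULES[root.tag](root, apply)[0]


html = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode")

if __name__ == "__main__":
    print(html)
```

The source tree is left untouched, and the result tree has a different structure: the note is filtered out and a list wrapper is added.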

XPath

XPath is the result of an effort to provide a common syntax and semantics for functionality shared between XSL Transformations [XSLT] and XPointer [XPointer]. The primary purpose of XPath is to address parts of an XML [XML] document. In support of this primary purpose, it also provides basic facilities for manipulation of strings, numbers and booleans. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document.

As with an SQL query, XPath allows you to select a part of an XML document for further processing, so that XML documents can be used like databases.
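
A few path expressions against a small, invented catalog show the database-like flavour. The sketch uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath (child steps, attribute and child-text predicates, positions), so it illustrates the notation rather than the full language.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document.
CATALOG = ET.fromstring("""<catalog>
  <item sku="X100" category="tools"><name>Widget</name><price>4.50</price></item>
  <item sku="X200" category="toys"><name>Gadget</name><price>12.00</price></item>
  <item sku="X300" category="tools"><name>Sprocket</name><price>7.25</price></item>
</catalog>""")

# Roughly: SELECT name FROM item WHERE category = 'tools'
tool_names = [n.text for n in CATALOG.findall("./item[@category='tools']/name")]

# Roughly: SELECT price FROM item WHERE name = 'Gadget'
gadget_price = CATALOG.findtext("./item[name='Gadget']/price")

# Positional selection: the second <item> child.
second_sku = CATALOG.find("./item[2]").get("sku")

if __name__ == "__main__":
    print(tool_names, gadget_price, second_sku)
```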

XForms

But how do you interact with your application? Forms were introduced into HTML in 1993. Since then they have gone on to become a critical part of the Web. The existing mechanisms in HTML for forms are now outdated, and the HTML Charter calls for work on an improved match to workflow and database applications, as well as support for an increasingly diverse range of browser capabilities.

W3C is currently developing a new generation of markup for forms that is much richer than what we know today. The goals are:

Support for handheld, television, and desktop browsers, plus printers and scanners
Richer user interface to meet the needs of business, consumer and device control applications
Decoupled data, logic and presentation
Improved internationalization
Support for structured form data
Advanced forms logic
Multiple forms per page, and pages per form
Suspend and Resume support
Seamless integration with other XML tag sets

XML-Signature

How do you secure your data? How can you make the information provided reliable? Digital signatures provide integrity, signature assurance and non-repudiation for Web data. Such features are especially important for documents that represent commitments such as contracts, price lists, and manifests. In view of recent Web technology developments, future work will address the digital signing of XML -- and any of its applications such as RDF (Resource Description Framework) or P3P (Platform for Privacy Preferences). This capability is critical for a variety of electronic commerce applications, including payment tools.

XML-Signature is being developed jointly by IETF and W3C. Its goal is to develop an XML-compliant syntax for representing the signature of Web resources and portions of protocol messages (anything referenceable by a URI), together with procedures for computing and verifying such signatures.

XML-Signature is currently in Last Call. At this stage, the Working Group submits its draft to public scrutiny.
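
One step in computing a signature over XML can be sketched with the Python standard library: bring the document into a canonical form, then compute a digest over it, so that differences in attribute order or empty-element notation do not change the result. This is only the integrity-check half of the story and rests on assumptions: it uses the canonicalization that Python 3.8+ ships (ET.canonicalize) and SHA-256, which are not necessarily the algorithms named by XML-Signature, and a real signature would additionally sign the digest with a private key.

```python
import hashlib
import xml.etree.ElementTree as ET


def xml_digest(xml_text):
    """Digest of the canonical form of an XML document (hex string)."""
    canonical = ET.canonicalize(xml_data=xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical price-list entries: same content, different surface syntax.
original = '<price item="X100" currency="EUR">4.50</price>'
reordered = "<price currency='EUR'   item='X100'>4.50</price>"
tampered = '<price item="X100" currency="EUR">0.45</price>'

if __name__ == "__main__":
    print(xml_digest(original) == xml_digest(reordered))
    print(xml_digest(original) == xml_digest(tampered))
```

The receiver recomputes the digest over what it received; a mismatch means the content changed after it was signed.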

RDF

We all know that too much information can hinder the building of knowledge. How do you find something in the incredible amount of information provided by the Internet and the intranets of today? Companies are adopting flatter and flatter structures. At the 9th World Wide Web Conference in Amsterdam, a paper was presented that tried to use the Resource Description Framework to help structure information.

The FAQ on RDF gives the following reasons why RDF might be interesting, even for enterprise applications:

interoperability of metadata
machine understandable semantics for metadata
better precision in resource discovery than full text search
future-proofing applications as schemas evolve
a uniform query capability for resource discovery
a processing rules language for automated decision-making about Web resources
language for retrieving metadata from third parties

General Conclusion

The overall architecture can be presented with the following chart:

To make business applications interoperable, one could use W3C's technology, which is already present on the intranets of today. XML and RDF allow a smooth integration of databases into intranets and the overall workflow. XML Schema provides a powerful way to define data types. With Namespaces, every XML-based markup language can be identified. As an optional but very useful feature, W3C encourages the developers of XML-based markup languages to provide the appropriate XML Schema at the location indicated by the namespace URI. Once the basic content is in XML, there are many ways to select, transform or tailor this content to specific needs. The use of XML and XML Schema facilitates a smooth integration of the application into existing environments. The use of standardized methods decreases transformation costs. The architecture is very flexible. Perhaps controlling and steering remotely from a PDA isn't that far away anymore.

If there is a feeling that W3C should conduct further standardization in this area, despite W3C's reluctance with regard to XML applications, this could be encouraged by contacting European Members. European Members of W3C can push for a W3C Activity in this area and for the provision of the necessary resources.

Last update $Date: 2000/10/04 20:19:07 $ by $Author: rigo $