Services and Semantics: Web Architecture

Introduction

This document paints a picture of the emerging architecture of Web Services and the Semantic Web, two orthogonal directions of development of Web technology. These two transitions, the one from documents to messages as part of a long-running protocol, and the other from documents for human consumption to documents with machine-processable semantics, are to a certain extent independent concepts, but each needs the other to fulfill its full promise. The combination is what I describe here.

This paper was written for the W3C Advisory Committee and WWW10 conference in Hong Kong April-May 2001. It is based on ideas already presented about Semantic Web architecture; about "conversations" (slides) at previous AC meetings; and the results of a two-day brainstorming at the W3C Web Services workshop. I would like to be able to finish by concluding the work units which comprise the next phase, and the relationships between old and new groups both within and outside the consortium. However, the medium scale structure has not yet fully emerged, and much will be elaborated in practice as the work starts.

Semantic Web - moving to meaning

To quickly review the concepts presented at AC meetings, XML2000, and now in the Activity Statement for the Semantic Web Activity, the new languages are divided into layers according to their expressive power. Built on top of XML as a standard for structured documents and data, RDF provides statements not about documents, but abstract real life objects, and the relationships (Properties) between them. Power is added to the system as RDF Schema (now in Candidate Recommendation) defines abstract Classes and allows properties to be basically categorized, and upcoming Ontology layer concepts (new RDF classes and properties) allow more complex relations such as transitivity, inverse, cardinality, and so on to be expressed.
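
As a minimal sketch of the lower layers, statements can be held as (subject, property, object) triples, with an RDF Schema style subclass relation used to categorize resources. The `ex:` names are hypothetical and chosen only for illustration; `rdf:type` and `rdfs:subClassOf` are the standard terms.

```python
# Statements are (subject, property, object) triples.
# All "ex:" names are hypothetical, for illustration only.
statements = {
    ("ex:Alice", "rdf:type", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
    ("ex:Alice", "ex:worksFor", "ex:Acme"),
}


def superclasses(cls, stmts):
    """Return cls plus every class reachable through rdfs:subClassOf."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for s, p, o in stmts:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(resource, stmts):
    """Return the declared classes of a resource and all their superclasses."""
    result = set()
    for s, p, o in stmts:
        if s == resource and p == "rdf:type":
            result |= superclasses(o, stmts)
    return result
```

Here `types_of("ex:Alice", statements)` includes `ex:Person` even though no statement says so directly; the schema layer supplies it.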

This ontology work, and layers above, were originally considered insufficiently mature at previous meetings to merit standards work. However some work on the boundary of research and standards was deemed appropriate, and MIT/LCS, with funding from the US government rather than W3C membership, participated in a group of knowledge management researchers and practitioners which comprised the DARPA Agent Markup Language (DAML) effort and worked closely with the European OIL (Ontology Interchange Language) group. There was a mutual understanding that the work would transition to W3C when the time was ripe, a space for it being written into the Semantic Web Activity. That time seems to be now, as the DAML+OIL combined document is not a research paper but a standards document. Here the word "ontology" is used for a schema at this level of expressiveness. A DAML+OIL ontology can, for example, define specific concepts of people, males and females, parents and children, and declare that a person can have at most two parents.
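
The "at most two parents" constraint can be sketched as a check over triples. This is an illustration only: the `ex:hasParent` property is hypothetical, and the sketch assumes that different names denote different individuals, which an ontology reasoner would not assume (it might instead conclude that two of the names refer to the same person).

```python
MAX_PARENTS = 2  # the cardinality limit stated in the ontology


def parent_violations(stmts, max_parents=MAX_PARENTS):
    """Return {child: parents} for every child with too many distinct parents.

    Assumes distinct names mean distinct individuals (a simplification).
    """
    parents = {}
    for child, prop, parent in stmts:
        if prop == "ex:hasParent":  # hypothetical property name
            parents.setdefault(child, set()).add(parent)
    return {c: ps for c, ps in parents.items() if len(ps) > max_parents}


family = {
    ("ex:Dan", "ex:hasParent", "ex:Eve"),
    ("ex:Dan", "ex:hasParent", "ex:Fay"),
    ("ex:Dan", "ex:hasParent", "ex:Gus"),
    ("ex:Eve", "ex:hasParent", "ex:Hal"),
}
```

With this data, `parent_violations(family)` reports `ex:Dan` and nothing else.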

Extending the power of languages to another level, rule languages allow patterns of statements to be matched and new patterns generated in a way which, while powerful, remains computationally tractable. Many rule-based systems exist, and while no standard exists, there are many very similar systems. Rules and query languages are essentially the same, the distinction being how they are used. Whilst originally the rules work seemed far-off compared to RDF, two factors suggest work should be started sooner rather than later. One is that XML/RDF rule-based languages such as RuleML and RIL are producing pressure for a common language and a threat of fragmentation earlier than anticipated. The other is the pressure for query, a relevant example being the Web service directory query. The W3C Query workshop provided a background for both tree and graph query languages, and hopefully the significant and relevant experience in XML query will result in an RDF query language which is very closely related to the XML query language. (It is not practical to make an RDF query by querying the XML representation of the RDF because of the 1:many mapping of graph to syntax, but an XML query may be compared with a query on the infoset graph, which could be expressed as an RDF query).
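
The claim that rules and queries are essentially the same can be illustrated with a small sketch. It is not RuleML, RIL, or any proposed standard: the `?x` variable syntax and the `ex:` names are assumptions made for this example. A query returns the variable bindings that satisfy a list of patterns; a rule is the same query whose bindings are then used to generate new statements.

```python
def match(pattern, triple, bindings):
    """Extend bindings so that pattern equals triple, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, stmts):
    """Return every set of variable bindings that satisfies all patterns."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in stmts:
                new = match(pattern, triple, bindings)
                if new is not None:
                    extended.append(new)
        solutions = extended
    return solutions


def apply_rule(body, head, stmts):
    """Run the body as a query and instantiate the head for each answer."""
    return {tuple(b.get(term, term) for term in head) for b in query(body, stmts)}


facts = {
    ("ex:Ann", "ex:hasParent", "ex:Bob"),
    ("ex:Bob", "ex:hasParent", "ex:Cy"),
}
body = [("?x", "ex:hasParent", "?y"), ("?y", "ex:hasParent", "?z")]
head = ("?x", "ex:hasGrandparent", "?z")
```

Used as a query, `body` asks who has a grandparent; used in `apply_rule`, the same patterns produce the new statement that `ex:Ann` has grandparent `ex:Cy`.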

Further still up the ladder of expressiveness, the logic language would allow anything to be expressed from any of the less expressive languages, and RDF would be a subset of that language. This is still at the research phase, with informal discussion on the RDF Logic list. Whilst it looks as though the basic model will be a monotonic but not classic logic, it is not currently clear to me to what extent a complete system of logic will be shared between applications, and to what extent each will refer to its own set of axioms.

Web Services - from document to protocol

In my AC talk of November 2000, I compared the difference between graphic, shared information interfaces and conversational interfaces on the human side with the difference between documents and protocols on the machine side. In each case, we have two useful and powerful models, one of quasi-static information space, and the other of a set of utterances or messages which together form a social or business interaction between two agents. A coordinated architecture makes the best of both models and relates them. One way to do this is to extend the concept of the validity and semantics of a document alone to that of the validity and semantics of messages in the context of all the messages to which they relate. The ensemble of messages forms a transaction in some sense, possibly among multiple agents.

At that AC meeting, there was a call from several members for a W3C Web services architecture. This resulted in the W3C Web services workshop at which many components and plans were proposed. The components called for fall into two groups: extending XML Protocol run-time features for the actual operation of a service, and metadata about the services.

Remote Operations

The XML Protocol work currently deals with the remote operation functionality described in the SOAP/1.1 submission. Extensions suggested included capability for binary attachments, security features, and reliable messaging, to strengthen the runtime operation of a remote operation so as to support serious use in electronic business between parties with or without trust relationships.

The need to be able to attach arbitrary documents to an XML Protocol message was clear. Indeed, the need for XML packaging equivalent to the MIME multipart functionality has been around for a long time, but the momentum to "just do it" has not appeared in the past. This should, it seems, be planned as a necessary support to this work. It should include the packaging of multiple things together, the inclusion of binary objects, and the indication of which is the "cover note" which carries the semantics of the overall package.
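
For comparison, the existing MIME functionality can be sketched with Python's standard `email` package. This shows the MIME analogue, not the XML packaging called for above: a multipart/related package holds an XML part and a binary part, and its `start` parameter points at the part acting as cover note. The Content-ID values and the XML content are made up for the example.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

COVER_ID = "<cover@example.org>"      # hypothetical identifiers
ATTACHMENT_ID = "<scan@example.org>"


def build_package():
    """Package an XML cover note together with one binary attachment."""
    # The "start" parameter of multipart/related names the root part.
    package = MIMEMultipart("related", start=COVER_ID)
    cover = MIMEText("<order attachment='cid:scan@example.org'/>", "xml")
    cover["Content-ID"] = COVER_ID
    scan = MIMEApplication(b"\x00\x01\x02\x03")  # stand-in for binary data
    scan["Content-ID"] = ATTACHMENT_ID
    package.attach(cover)
    package.attach(scan)
    return package


def cover_note(package):
    """Return the part that the package declares as its cover note."""
    start = package.get_param("start").strip("<>")
    for part in package.get_payload():
        if part["Content-ID"].strip("<>") == start:
            return part
    return None
```

A receiver calls `cover_note(package)` first and follows references from it to the other parts, rather than guessing which part carries the meaning of the whole.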

A further need was "reliable messaging". TCP/IP, of course, provides reliability in that an agent can be absolutely sure that its peer has received a given item of information when it has received a response. HTTP is reliable in that sense. However, reliability here is in another sense. No one keeps HTTP requests and responses (or old IP packets) for a long time, to prove that a message was sent -- there is no way of demonstrating that they really were sent anyway. Business applications, however, will have to keep the messages for many years, beyond the life of the hardware or software which originally handled them. This means that the format of such messages must be as clean as possible, and largely independent of as much transient technology as possible. Reliable messaging involves building this on top of XML.

Message routing was a third part of the protocol infrastructure which was called for, and security a fourth. These both are rather vague requirements at this stage.

Ontology of services

At the metadata level, the description and discovery of services were addressed. Here service means the abstraction of the role an agent can play in a protocol. For example, a printer service is the role of server in a client/server printing protocol, and a credit card authentication service is the role intermediated by one of the boxes surrounding the checkout at a supermarket. The description of services is addressed by WSDL, also submitted to W3C. Whilst this defined the functionality well and was clearly requested to become a work item, WSDL in practice defines abstract objects such as services, ports and bindings between them which in the W3C metadata architecture would normally be modelled in RDF. A mapping to an RDF vocabulary for services would enable the activity to make use of generic metadata tools for storage, search, translation and inference, as several attendees pointed out. There was a call for a way of providing, in an XML schema, a mapping to RDF. This requirement has surfaced before, and sounds like a necessary work item.
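
A mapping of this kind might look roughly like the following sketch, which flattens a nested service description into triples. The `ws:` vocabulary is hypothetical (no such RDF vocabulary is defined here), and the input dictionary is a simplified stand-in for a WSDL document, not a parser for one.

```python
def service_to_triples(service):
    """Flatten a nested service description into (subject, property, object) triples.

    The "ws:" property and class names are hypothetical.
    """
    triples = [(service["id"], "rdf:type", "ws:Service")]
    for port in service["ports"]:
        triples.append((service["id"], "ws:hasPort", port["id"]))
        triples.append((port["id"], "rdf:type", "ws:Port"))
        triples.append((port["id"], "ws:binding", port["binding"]))
        triples.append((port["id"], "ws:address", port["address"]))
    return triples


print_service = {
    "id": "ex:PrintService",
    "ports": [
        {
            "id": "ex:PrintPort",
            "binding": "ex:PrintSoapBinding",
            "address": "http://example.org/print",
        }
    ],
}
```

Once the description is in this form, the same storage and query tools that handle any other triples can handle service descriptions without special code.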

Context

The context of a message here refers to information about the agents sending or receiving it and other messages in the transaction. This might include preferences, including information which is commercially or personally sensitive. This information may affect both the operation of the service itself, and also the selection of a service from a directory. The CC/PP framework is clearly relevant here, as is P3P. The fact that they are RDF-based should make their integration into service processing and service description queries easier than otherwise. There may be a need for more work in both domains (profiles and privacy) to apply the concepts and technology to Web Services.

Flow

When moving from a simple remote operation to a protocol of multiple messages and possibly multiple agents, then work flow or process flow becomes an important aspect of service description. Technically flow is the technology which defines the state of a transaction as a function of the messages to date (or of a recent state and the messages since that time). Whilst there seems to be no good name for this relationship, I have used the term "paper trail" for it in previous notes. To explain by example, while one can look at an invoice in XML as being of itself valid from the point of view of being well-formed XML, and having a total which is indeed the sum of the items mentioned, its validity within a commercial protocol requires also that the purchase order it refers to be an order from the same agent for the same goods, with reference to a quote or catalog which gives prices for the items the same as those mentioned on the invoice, and so on. Similarly, the meaning of an invoice can be rather weakly defined in a specification in English. However, the semantics in the protocol context can be defined in a machine processable way so as, for example, to define a function which is true only when a set of messages represents a set of completed transactions, and neither party owes the other anything. This is much more powerful as a tool for avoiding misunderstanding. It provides ways of proving that a protocol accomplishes a given goal. This in turn allows a protocol to be used as a building block in a larger scale protocol. This composability was called for repeatedly at the conference. (The pi-calculus was mentioned as a suitable formalization for flow by several parties. This models an automaton as changing state on message transmission or reception, and builds large automata out of many small ones).
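
The idea of state as a function of the messages to date can be sketched as a fold over a message list. The message kinds, fields, and validity rules below are assumptions made for this example, far simpler than a real commercial protocol: an invoice is accepted only if it matches a known order, and a transaction is settled when ordered, invoiced and paid amounts agree.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Message:
    kind: str      # "order", "invoice" or "payment" (hypothetical kinds)
    order_id: str
    amount: int    # minor currency units, to avoid floating point


def step(state, msg):
    """Return the new state, given the previous state and one message."""
    ordered, invoiced, paid = state.get(msg.order_id, (None, 0, 0))
    if msg.kind == "order":
        ordered = msg.amount
    elif msg.kind == "invoice":
        # Validity in context: the invoice must match an order already seen.
        if ordered is None or msg.amount != ordered:
            raise ValueError(f"invoice does not match order {msg.order_id}")
        invoiced += msg.amount
    elif msg.kind == "payment":
        paid += msg.amount
    else:
        raise ValueError(f"unknown message kind {msg.kind!r}")
    return {**state, msg.order_id: (ordered, invoiced, paid)}


def settled(messages):
    """True only if every transaction is complete and nothing is owed."""
    state = reduce(step, messages, {})
    return all(o == i == p for o, i, p in state.values())


trail = [Message("order", "po-1", 500), Message("invoice", "po-1", 500)]
```

Here `settled(trail)` is false until a matching payment message is appended, and an invoice that refers to no order is rejected by `step` however well-formed it is on its own.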

It is worth noting that reliable messaging, above described as a necessary addition to the XML Protocol infrastructure, is a special case of a two-party protocol. Process flow technology should be able to describe the reliable messaging protocol. It is not clear indeed whether the reliable messaging protocol should be developed as an atomic feature, or whether the atomic feature should be the message, and reliable messaging be distilled as an application of flow technology. It seems that the first is a short term solution which matches existing RPC concepts, but the last is the cleaner long-term solution. The art, of course, is to do both at once, and so make a practical engineering design which in fact is an example of a general system.
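
To make the "special case" concrete, here is a sketch of a reliable messaging exchange written as a small automaton, from the sender's point of view. The states, events, and the send/receipt design are assumptions for illustration; no particular reliable messaging protocol is specified in this document.

```python
# Sender-side transition table: (state, event) -> next state.
# States and events are hypothetical.
TRANSITIONS = {
    ("idle", "send"): "awaiting_receipt",
    ("awaiting_receipt", "resend"): "awaiting_receipt",  # no receipt yet
    ("awaiting_receipt", "receipt"): "delivered",
}


def run(events, state="idle"):
    """Fold a sequence of events into a final state, rejecting illegal ones."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state
```

Because the table is ordinary data, the same machinery that describes a multi-party business flow could describe this two-party exchange, which is the sense in which reliable messaging is an application of flow rather than a separate primitive.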

Directories of services

A Web service is just one type of thing whose concepts can be described with an ontology, and whose descriptions can be exchanged in RDF. Like many other things, from books to aircraft parts to biogenetic compounds, there are needs for directories. One must be able to crawl a Web of data about them, and build directories, submit data to directories explicitly, query directories for examples matching certain criteria, synchronize parts of directories between directory servers, and so on. The clean way to structure this is for one or more generic RDF query engines to be used, with a vocabulary specifically for describing Web services (derived directly from the WSDL work). That is, to have a special query protocol for each vocabulary clearly misses an obvious opportunity for modularity and reuse. It should be noted also that these directories are not likely to be limited to specific Web Services vocabulary: there was a strong call for extensibility of the formats for all kinds of related data. RDF stores naturally can adapt to holding information from many vocabularies, as the individual applications require. For example, a directory of printer services may have ontologies of paper types, colors and ink types, and may overlap with directories of network devices, and inventory. This extensibility, and the ability to distinguish standard and local terms, all come, of course, with RDF.
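
A sketch of the generic approach: one store holds statements from several vocabularies, and one query function serves them all. The `ws:`, `paper:` and `net:` vocabularies and the `ex:` resources are hypothetical, and the matching here is far simpler than a real RDF query language.

```python
# One store, several vocabularies (all names hypothetical).
directory = {
    ("ex:printer7", "rdf:type", "ws:Service"),
    ("ex:printer7", "paper:supports", "paper:A4"),
    ("ex:printer7", "net:location", "ex:floor2"),
    ("ex:printer9", "rdf:type", "ws:Service"),
    ("ex:printer9", "paper:supports", "paper:Letter"),
    ("ex:router1", "rdf:type", "net:Device"),
}


def find(stmts, required):
    """Return, sorted, the subjects having every (property, object) in required."""
    subjects = {s for s, _, _ in stmts}
    return sorted(
        s for s in subjects if all((s, p, o) in stmts for p, o in required)
    )


a4_printers = find(
    directory,
    [("rdf:type", "ws:Service"), ("paper:supports", "paper:A4")],
)
```

The query for A4 printing services mixes the services vocabulary with the paper vocabulary, yet `find` knows about neither; a query for network devices uses the same function unchanged.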

Business applications

The infrastructure described above will provide powerful tools for creating electronic business applications. The actual mapping of business practice to protocols, and the creation of protocol flow descriptions to define them, are tasks for business groups in consort with international and governmental groups. An ongoing work is to define clear boundaries of technical scope between W3C work and other initiatives such as ebXML.

That said, it is always necessary to have a few early adopting applications to test infrastructure.

Conclusion

Some of the dependencies are roughly illustrated in the diagram, where arrows represent dependencies of different types. The Semantic and Service directions complement each other, and really require each other to be done well. For example, a service directory query should be a generic RDF query: a query written in the RDF query language, using a Web services vocabulary, and sent to a Web service. Things could be hacked so that the Services used ad-hoc semantics and the Semantics used ad-hoc services, but that would result in duplication of code and lack of reuse and of all the accompanying spinoffs.

One must of course be careful, when reuse has been maximized, that the whole is still subsettable. There has been a complaint voiced recently that XML Schema is too monolithic, and a fear that soon any cellphone will have to support full XPointer. Meanwhile XHTML has done a retrospective modularization operation. Let us hope that Web Services and the Semantic Web will use flexibility points to provide a consistent architecture, but not through being interlinked inextricably.

Tim Berners-Lee

2001/04

$Id: 30-tbl.html,v 1.28 2001/04/27 06:37:20 janet Exp $

Appendix: dependencies: rdf/xml, n3, ... diagram