Modeling Distributed Authoring
Using Hypertext Links

Abstract

This document describes a model in which distributed authoring and versioning (DAV) is expressed mainly as hypertext links binding together pieces of information. The model describes how, by imposing semantics on hypertext links, we can build a powerful platform supporting DAV with an open-ended set of features.

Introduction

Many DAV models divide information into two categories: documents and metainformation. The documents contain information and the metainformation contains information about the documents. The main advantage of this distinction is that all metainformation about a document is directly associated with the document and can be treated as an intimate part of the document. Many existing text formats reflect this dependency by directly incorporating all metainformation about the document into the same container as the document itself. When applied to the Web, however, the disadvantages of referring to metainformation by value instead of by reference are significantly bigger than the advantages:

Metainformation is always transferred along with the document, causing problems on bandwidth-constrained links.
Metainformation is not cachable by itself.
Metainformation cannot have metainformation and cannot be inherited among documents.
It is not possible to create independent third-party metainformation about a document.

This document takes the opposite standpoint: both documents and metainformation are information, and each is therefore a resource accessible via a URI. We do not distinguish between information and metainformation at the resource level but rather at the link level. The result is that any resource may be a piece of metainformation and any resource may be a document, depending on the link relationships between that resource and all other resources.

The Link protocol header was introduced very early on in the Web model along with the Link HTML element. The Link header and the Link element are semantically identical and they can be used interchangeably. The Link header and element provide a means for describing a relationship between two resources, generally between the requested resource and some other resource. The Link header is described in the HTTP/1.1 specification and the Link HTML element is described in the Webmap draft. HTTP/1.1 also defines two methods: LINK for creating references between already existing resources and UNLINK for deleting existing relationships. In this document we describe how the Link header, the Link HTML element, and the LINK and UNLINK methods can be used in DAV.

Attributes

There are several formats for representing metainformation - some already used on the Web are MIME and PICS. The most prevalent mechanism in HTTP is to use MIME headers describing the content of the HTTP message. This representation is equivalent to "call-by-value" in that the receiver has no means of changing or otherwise accessing the information. The Link HTTP header and the Link HTML element, on the other hand, describe a relationship between two resources where the value is referred to by a URL. That is, the value is not transmitted directly - only a reference to it, which is equivalent to "call-by-reference".

By referring to metainformation using a hypertext link or URL, the metainformation itself becomes a resource. This resource is no different from any other resource and can also have metainformation associated with it. That is, the notions of metainformation and information are indistinguishable, as any document can be pointed to and can point to any other document. When we mention metainformation in this document we mean a resource that contains information about another resource.
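As an illustration of the "call-by-reference" idea, the sketch below splits a Link header value of the form used in this document into its target URL and its parameters. The helper name parse_link and the regular expressions are our own assumptions, not part of the HTTP/1.1 specification, and the sketch handles only the simple syntax shown in the examples here.

```python
import re

# Hypothetical helper, not defined by HTTP/1.1: handles only the simple
# '<url>; name="value"' syntax that appears in this document's examples.
LINK_RE = re.compile(r'\s*<([^>]*)>\s*(.*)')
PARAM_RE = re.compile(r';\s*([A-Za-z]+)\s*=\s*("([^"]*)"|[^;\s]+)')

def parse_link(value):
    """Return (target, params) for one Link header value."""
    match = LINK_RE.match(value)
    if match is None:
        raise ValueError("not a Link header value: %r" % value)
    target, rest = match.groups()
    params = {}
    for name, raw, quoted in PARAM_RE.findall(rest):
        # Parameter names are treated as case-insensitive here (assumption).
        params[name.lower()] = quoted if raw.startswith('"') else raw
    return target, params
```

The receiver gets only the reference ("author.html") and the relation name; the value itself must be fetched separately.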

Attribute Naming

See the Webmap draft for the introductory material. This document defines a small set of link attributes that may be dynamically extended. Attribute names are of type TEXT as defined in the HTTP/1.1 specification.

Open Issues

How can we describe extensions so that clients can pick them up dynamically?
How do attribute names relate to i18n? This could be solved by adding another level of indirection in the attribute name. The first resolution is a mapping from a local name to a standard token, for example "Author", and the next is a mapping to the value URL.
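The two-step resolution suggested for i18n could look roughly like the sketch below. Both tables and the localized names in them are invented for illustration; this document does not define how such mappings would be stored or published.

```python
# Hypothetical tables (assumptions for illustration only).
# Step 1: local attribute name -> standard token.
LOCAL_TO_TOKEN = {"Forfatter": "Author", "Auteur": "Author", "Author": "Author"}
# Step 2: standard token -> URL of the attribute resource.
TOKEN_TO_URL = {"Author": "/statistics/author.html"}

def resolve_attribute(local_name):
    """Resolve a local name to the attribute URL, or None if unknown."""
    token = LOCAL_TO_TOKEN.get(local_name)
    if token is None:
        return None
    return TOKEN_TO_URL.get(token)
```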

Creating, Modifying, and Deleting Attributes

In some situations, it does not make sense to represent an attribute as a resource. For example, the "Last-modified" attribute defined by the HTTP/1.1 specification is often better represented as a MIME header than as a Link attribute which would have to be resolved by following the link. The same is the case with most of the other headers defined in HTTP/1.1. However, if the client is to be able to create or modify an attribute then it must be represented by a URL.

A client can create a new resource and link it to an existing resource using the Link header as shown in this example:

	PUT /statistics/author.html HTTP/1.1
 	Link: <data.html>; rev="Author"

If a forward link attribute is to be made and "data.html" already exists then it can be done using the LINK method defined by the HTTP/1.1 specification:

	LINK /statistics/data.html HTTP/1.1
 	Link: <author.html>; rel="Author"

In the same way, a link relationship between resources can be deleted using the UNLINK method:

	UNLINK /statistics/data.html HTTP/1.1
 	Link: <author.html>; rel="Author"

As attributes are resources, "author.html" may in fact be a content negotiable, version controlled resource. We will later describe how to handle version control and content negotiation.

Note that in these examples we have used only the HTTP Link header, which is independent of the media type of any of the documents involved. The Link header can be used to create relationships between arbitrary resources.
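The three requests above share one shape: a request line followed by a single Link header. A minimal sketch of a client-side helper that builds such a message head is given below. The function name link_request is an assumption, and the sketch leaves out headers that a complete HTTP/1.1 request would need (Host, for instance) as well as any entity body for PUT.

```python
def link_request(method, path, target, rel=None, rev=None):
    """Return the request line and Link header as one HTTP message head.

    Hypothetical helper: it mirrors the examples in this document and
    leaves out other headers (such as Host) and any entity body.
    """
    if method not in ("PUT", "LINK", "UNLINK"):
        raise ValueError("unsupported method: %s" % method)
    if (rel is None) == (rev is None):
        raise ValueError("give exactly one of rel or rev")
    # A forward link uses "rel", a reverse link uses "rev".
    name, value = ("rel", rel) if rel is not None else ("rev", rev)
    head = [
        "%s %s HTTP/1.1" % (method, path),
        'Link: <%s>; %s="%s"' % (target, name, value),
    ]
    return "\r\n".join(head) + "\r\n\r\n"
```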

Open Issues

Do we need an "Unlink" header in order to be able to delete relationships without requiring a separate round trip? I think so! This would take away the need for an UNLINK method, as LINK would then cover all situations, for example

	LINK /statistics/data.html HTTP/1.1
 	Unlink: <author.html>; rel="Author"
 	Link: <new_author.html>; rel="Author"
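One way a server might process such a combined request is sketched below: the relationships named by Unlink are removed and those named by Link are added in a single step. The representation of links as (source, relation, target) triples and the function name are assumptions made for this sketch, since the "Unlink" header is only a proposal here.

```python
def apply_link_request(links, source, unlink=(), link=()):
    """Return a new link set with `unlink` removed and `link` added.

    links  -- set of (source, relation, target) triples (assumed model)
    unlink -- iterable of (relation, target) pairs from Unlink headers
    link   -- iterable of (relation, target) pairs from Link headers
    """
    updated = set(links)
    for rel, target in unlink:
        # Removing a relationship that does not exist is not an error here.
        updated.discard((source, rel, target))
    for rel, target in link:
        updated.add((source, rel, target))
    return updated
```

Because the function returns a new set, the server can replace the old set only once the whole request has been processed.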

Attribute Lookup

In order to inspect, create, modify, and delete attributes, a client must know where existing attributes are located and how to access them. As the set of attributes is open-ended, we define the "Attributes" link attribute as a bootstrap mechanism for finding attributes. The attribute resource may be a searchable resource or a collection that contains links to other resources related to this resource. In the case of a collection, we may have something like this:

	GET /statistics/data.html HTTP/1.1
 
 	200 OK
 	Link: <attributes.html>; REL="Attributes"
 
 	GET /attributes.html HTTP/1.1
 
 	200 OK
 	<document ...>

We can indicate that the attribute resource is searchable using the CLASS attribute of the Link header:

	GET /statistics/data.html HTTP/1.1
 	
 	200 OK
 	Link: <attributes.html>; REL="Attributes"; CLASS=searchable
 
 	GET /attributes.html?Author;ACL;Dependencies HTTP/1.1

Note: The search format should follow the LDAP search format as described in ???

Both of the mechanisms above require a minimum of two round trips before the URL for the attribute has been found, which may result in three round trips for looking up an attribute. Often, this is not required, as the client can already indicate in the initial request which attributes it would like to resolve. This could be done using an Attributes header, for example

	GET /statistics/data.html HTTP/1.1
 	Attributes: Author, ACL, Dependencies
 
 	200 OK
 	Link: <author.html>; rel="Author"
 	Link: <acl.html>; rel="ACL"

As indicated in the example above, the server may leave out attributes that it doesn't know about.
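On the server side, answering such a request amounts to filtering the requested names against the attributes the server knows for the resource. The sketch below assumes the server keeps a per-resource mapping from attribute name to URL; that mapping and the function name are illustrative only.

```python
def attribute_links(known, requested_header):
    """Return Link header lines for the requested attributes.

    known            -- mapping of attribute name -> URL (assumed storage)
    requested_header -- value of the proposed Attributes request header
    """
    names = [n.strip() for n in requested_header.split(",") if n.strip()]
    # Attributes the server does not know about are silently left out.
    return ['Link: <%s>; rel="%s"' % (known[n], n) for n in names if n in known]
```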

Open Issues

Do we want to make the attribute collection a webmap or is there any conflict? I actually don't think so.
Do we want a special flag in the Link header to indicate whether a URL is searchable or not? This would make it a lot easier to indicate how attributes can be looked up. We could also make it searchable by default but this would make it harder to start using as we then have to rely on the server supporting search.
Should a search return URLs or values directly? Do we have to support both?

Collections

Web collections are collections of documents as defined in the document "Web collections: A mechanism for grouping documents". I hope that we can adopt the definition directly or at least work together with its authors.

Collections of Attributes

As collections are resources and attributes are resources, attributes can be collections. This means that an attribute can be a collection of other attributes or of other resources entirely. As collections are a mechanism for grouping together related links, we can use this mechanism to group together related attributes. In the examples below, we show how this can be used in the case of version control and content-negotiated resources.

Hierarchical Attributes

As collections are hierarchical and attributes can be organized in collections, attributes can also be hierarchical.
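To make the hierarchy concrete, the sketch below models a collection as a nested mapping, where a value is either the URL of an attribute resource or another collection, and lists every attribute together with its path. The nested-dictionary representation is an assumption for this sketch; this document does not prescribe a format for collections.

```python
def flatten(collection, prefix=()):
    """Yield (path, url) pairs for every attribute in a nested collection.

    A value is either a URL string or a nested collection (a dict); this
    representation is an assumption made for illustration.
    """
    for name, value in collection.items():
        path = prefix + (name,)
        if isinstance(value, dict):
            # An attribute that is itself a collection: descend into it.
            yield from flatten(value, path)
        else:
            yield path, value
```

The hierarchy here comes from the collection structure, not from the shape of the URLs.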

Open Issues

It is a lot less safe to use the URL hierarchy to indicate attribute hierarchies. This can be used internally by the server, as it knows the URL space that it serves, but clients should probably not do it.

The About Header

This model does not impose any restrictions on which mechanism to use for handling version control, access control, or any other element contained in the model. In a sense the model is strictly a mechanism for locating information about version control, access control, and anything else related to DAV.

In some situations, the server may want to bind additional information to the Link header, for example what type of version control is actually used for a particular resource. This can be done using the About HTML tag as described in the Webmap draft.

Open Issues

I don't know if the About element also has an HTTP header equivalent. This should be the case.

Links and Collections applied to DAV

In the following sections, we show how semantic links and collections can be applied to various distributed authoring scenarios.

Dynamic Documents

The output of a dynamic document or resource is generated as a result of a script or a process using a one-way function. The entity included in the HTTP response message differs from the server's internal representation of the resource in such a way that the server is not capable of transforming edits to the generated entity back to the server's internal representation. Server-Side-Includes (SSI) and CGI-scripts are two popular mechanisms for producing dynamic content, ranging from simple variable substitutions to computationally complex database queries.

As edits applied to the entity sent in an HTTP response cannot be transferred back to the resource, the editor must use some other mechanism for changing the resource. The "Source" link attribute allows the server to export an editable version of the resource with a URL and relate it to the generated resource by a link. The "Source" attribute guarantees that there is a two-way function between the entity included in an HTTP response and the server's internal representation.

Note that, as the "Source" document itself is a resource, it has its own content type, modification date, and other metainformation related to it. Identifying the source as an independent resource has the advantage that all other operations and semantics defined for resources, including caching, are directly applicable.

In the example below, the first resource is dynamically generated from the second resource. The "Source" link attribute defines the relationship between the two resources.

	http://w3.org/statistics/data.html
 	http://w3.org/statistics/generator.cgi
 	Link: <generator.cgi>; rel="Source"
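An editing client could use the "Source" link to decide which URL to fetch and save. The sketch below assumes the client has already collected the Link headers of the response into a mapping from relation name to target; the function name and that mapping are illustrative. The relative reference is resolved against the URL of the generated resource using the standard urljoin function.

```python
from urllib.parse import urljoin

def editable_url(resource_url, links):
    """Return the URL an editor should fetch and save.

    links -- mapping of relation name -> link target as sent by the
             server (assumed to be collected by the client beforehand)
    """
    source = links.get("Source")
    if source is None:
        # No "Source" link: the resource itself is taken to be editable.
        return resource_url
    # The link target may be relative to the generated resource.
    return urljoin(resource_url, source)
```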

Open Issues

A dynamic resource is created using PUT just as any other resource. The server knows that a resource is dynamic from the Content-Type and/or from a reverse "Source" link attribute.
I consider it out of scope to put demands on timing issues between editing the source and when the changes turn up on the server.

Content Negotiated Resources

Resources that are subject to content negotiation can have multiple variants of the same contents. A document can exist in English, in French, and in Danish, for example. The specific variant served in response to an HTTP request is a function of the request profile. The server indicates that a resource has multiple variants by including a "Vary" header in the response describing the dimensions by which the resource varies. In the example above, the dimension would be "Content-Language".

Variants can be generated dynamically from a single resource or maintained as individual resources possibly maintained by independent authors. In the first case, a smart server may generate image formats on the fly as a function of what was requested by the client, for example. In this situation, editing the resource should be done as described under Dynamic Documents. In the second case, each variant is maintained as an individual resource with its own name.

If the same author is to edit multiple varying resources then the client must be able to edit and save any of the variants - not only the resulting variant of any content negotiation applied to the HTTP request. This requires that the client can query which variants are available and how to access them. The "Variants" link attribute points to a collection of all resources which may be used to generate a response on this URL.

Note that the link model does not impose any naming convention on how each variant is named. It is up to the server or servers to decide how to name variants of a resource and the name is completely opaque to the client. The "Variants" link attribute provides the client with a handle for how to find the set of variants for a particular resource.

In the example below, the first URL is the URL for the generic resource on which the server does content negotiation. The second URL identifies a specific variant which may be used to produce the response. The third URL locates a collection of all available resources that may be used to produce the response, and the Link header defines the relationship between either of the first two URLs and the third.

	http://w3.org/statistics/data
 	http://w3.org/statistics/data.fr.html
 	http://w3.org/statistics/variants.html
 	Link: <variants.html>; rel="Variants"
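Once the client has retrieved the "Variants" collection, it can pick the variant it wants to edit without guessing at the server's naming scheme. The sketch below assumes the collection has been parsed into a list of entries, each carrying a URL and a language; the format of the collection is not defined by this document, so both field names are hypothetical.

```python
def pick_variant(variants, language):
    """Return the URL of the variant in `language`, or None if absent.

    variants -- list of {"url": ..., "language": ...} entries; the entry
                layout is an assumption made for this sketch
    """
    for variant in variants:
        if variant["language"] == language:
            # The URL is opaque: it is used as given, never constructed.
            return variant["url"]
    return None
```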

Version Control

The need for version control arises naturally in many situations concerning distributed authoring on the Web. New revisions may be generated automatically by the server when a client saves an updated version. In a fully version controlled system a client must be able to edit and update any of the existing revisions of a document. This requires that the client can query which versions are available and how to access them.

Each available revision of a document has a unique name controlled by the server and is in every way a resource just like any other document. A revision does not have to exist as an individual file on the server at any point in time - it may be generated from a delta storage system upon request, for example. As generating a revision from a delta storage system is a two-way function, revisions are not to be considered dynamic documents. Note that not all revisions have to be available, or even exist, to all users. A good example is a newspaper where the readers have access to the final daily revision and the editors have access to all intermediate draft revisions.

As revisions of a document are documents related over time we define a version collection to be the set of all available revisions. There can be multiple time lines, i.e. branches, of a document in which case the collection will form a tree structure. If the document evolves along a single time line then the collection will form a sequential line.

The "History" link attribute points to a collection of all available revisions of a version controlled resource. As for content negotiated resources, the link model does not impose any naming convention on how each revision is named and no specific version control model is required. The "History" link attribute provides the client with a handle for how to find the set of revisions of a version controlled resource.
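The difference between a tree-shaped and a sequential version collection can be sketched as follows. We assume the client has turned the "History" collection into a mapping from each revision name to the name of its parent revision; that mapping, the revision names, and both function names are assumptions, as no version control model is required by this document.

```python
def branch_tips(parents):
    """Return the revisions that have no successor, in sorted order.

    parents -- mapping of revision name -> parent revision name, with
               None for the first revision (assumed representation)
    """
    has_child = {p for p in parents.values() if p is not None}
    return sorted(r for r in parents if r not in has_child)

def is_single_time_line(parents):
    """True if the collection forms one sequential line of revisions."""
    # With one parent per revision, a single tip means there is no branch.
    return len(branch_tips(parents)) <= 1
```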

Access Control

Editing tools need to update more than just content of web pages; they need to allow the author to express auxiliary information such as access control, pricing, etc. We use the "ACL" attribute to identify the access control information for a given resource. The access control information itself is a resource and can have other metainformation associated with it as well as having its own access control.

The "ACL" attribute allows a client to discover the address of an access control resource; for example, a form that allows the author to set access control policies for a page. It does not provide any information on how the access control information is organized or how to edit the information.

	http://w3.org/statistics/data.html
 	http://w3.org/statistics/ACL.html
 	Link: <ACL.html>; rel="ACL"

Document Dependencies

Document dependencies also arise naturally in distributed authoring. Some of the more classic examples of dependencies are updating an index and the table of contents when adding a new document to a collection. As Web content is often divided into multiple documents, the task of updating dependencies may have to be executed on the server.

The "Dependencies" attribute allows a client to discover the address of a dependency resource; for example, a form that allows the author to set dependencies. It does not provide any information on how the dependency is organized or how to edit the information.

	http://w3.org/statistics/data.html
 	http://w3.org/statistics/Dependencies.html
 	Link: <Dependencies.html>; rel="Dependencies"

Open Issues

Should we have a default set of dependencies, for example always to update the table of contents when editing a page?
How should we indicate when dependencies should be computed? In some situations it can be done immediately and in others we are better off doing lazy evaluation.

Creating Collections on the Fly

In the previous sections we have seen examples of how collections can be used to describe relationships between documents, for example to handle version control and content negotiated resources. In these cases, collections have the same order of lifetime as the documents contained in the collection.

However, collections may also be created in order to perform a single operation and then deleted again. Imagine that we want to delete a resource together with all its variants, previous revisions of the variants, and the access control list associated with the document. This can be done by deleting each of these resources independently, which may be a very cumbersome operation - especially if we want to apply multiple operations to the resources.

Instead we can define a collection containing the resources and then perform the operations on this collection. A collection is created as any other resource by a PUT operation.
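A server-side sketch of such a group operation is given below for the delete case. The server's storage is modelled as a dictionary from URL to content, and a collection is stored as a list of member URLs; both choices, and the function name, are assumptions for illustration. The sketch makes no attempt at transactional behaviour.

```python
def delete_collection(store, collection_url):
    """Delete every member of a collection, then the collection itself.

    store -- dict mapping URL -> content; a collection's content is a
             list of member URLs (assumed model of the server's storage)
    Returns the list of member URLs that were actually deleted.
    """
    deleted = []
    for url in list(store.get(collection_url, [])):
        if url in store:
            del store[url]
            deleted.append(url)
    # The temporary collection is removed once the operation is done.
    store.pop(collection_url, None)
    return deleted
```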

Open Issues

I have deliberately not mentioned transactions as I am not sure that we need them (or can make them reliable).
To support operations equivalent to those of a file system, we may want to include wild cards in the URLs.

Security Concerns

Semantic links are bidirectional, which is reflected in the forward "rel" attribute and the reverse "rev" attribute. Just as with normal hypertext links, anybody can make a forward link indicating that a resource contains information about another resource. This document does not give any mechanism for guaranteeing what is authentic information and what is not.

Henrik Frystyk Nielsen,
@(#) $Id: Attributes.html,v 1.3 1997/08/09 17:40:00 fillault Exp $
