IAQ: Collections, Resourcetype and Hierarchy in WebDAV

An IAQ (Infrequently Asked Questions)

1	Why Did WebDAV Decide that the HTTP URL Namespace is a Hierarchy?
2	Why Did WebDAV Create Resource Types like the Collection Resource?
3	Why Did WebDAV Create the Resourcetype Property?
4	Why Did WebDAV Create MKCOL?
5	Why Did WebDAV Allow for Mixed WebDAV and Non-WebDAV Compliant
Namespaces?
6	Why Does WebDAV Allow for non-WebDAV Compliant Collections?
7	Why Does WebDAV Require Hierarchy in WebDAV Only Namespaces?

1 Why Did WebDAV Decide that the HTTP URL Namespace is a Hierarchy?

A typical HTTP URL looks like http://server.com/name1/name2/name3. The
HTTP/1.1 specification never defined what the "/" really meant. Did the "/"s
have any meaning or were they just decoration to help people remember where
they put their resources? This was one of the very first problems the WebDAV
Working Group (WG) had to face.

Most of the WG had the very definite idea that WebDAV should provide at
least file system level functionality. Even though most of the WG were
document management types who didn't really use file systems, they
understood that file systems were the single most common form of storage on
the planet. Matching file system functionality meant providing at least the
possibility of supporting a hierarchical namespace.

So the WG decided that the "/"s could represent a hierarchical namespace and
that it was WebDAV's job to provide the tools to create and maintain that
hierarchy if the client/server chose to make it hierarchical.

2 Why Did WebDAV Create Resource Types like the Collection Resource?

File systems contain two types of resources, files and directories. The
PUT/GET methods already provided sufficient support to simulate a file. So
this left the WebDAV WG with the job of creating directories.

The term directory was hopelessly overloaded so an alternative word,
collection, was selected in its stead.

But what was a collection?

The basic object in HTTP is the resource. HTTP does not give a very tight
definition of what a resource is. Essentially all HTTP implies is that a
resource is an object that is addressed by one or more URLs and that accepts
methods.

This lack of definition was actually a good thing. It allowed HTTP resources
to be extremely flexible. This was one of the important lessons from HTTP:
don't define anything you don't absolutely have to define. That way you
don't paint yourself into a corner. The key is to only define the parts that
are needed for interoperability.

Keeping this in mind, the WG realized that for the sake of interoperability
it needed to "remove" a little of HTTP's magical vagueness. Specifically, it
needed to create a "type" of resource. The idea behind typing was to create
a profile for an HTTP resource. The profile specified which methods a
resource of that type had to support, how they had to support them and which
methods the resource wasn't allowed to support. This was all new ground.

3 Why Did WebDAV Create the Resourcetype Property?

Now that we had created a new resource type, we needed to create a way for
clients to determine the type of resource they were talking to.

Unfortunately the WG had not yet learned the folly of live properties
(http://lists.w3.org/Archives/Public/w3c-dist-auth/1998OctDec/0302.html), so
the obvious solution was to create a new property.

The idea of creating a collection property was kicked around but eventually
rejected because the WG realized that where there was one new resource type,
there would be many more. Thus the resourcetype property was created. It
would be the central repository for declarative information regarding the
nature/type/profile of the resource.
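
As a rough illustration of how a client might read this property, the sketch
below parses a PROPFIND-style multistatus body and reports the resource types
declared for each URL. The element names (DAV:resourcetype, DAV:collection)
follow the WebDAV specification; the function names, the SAMPLE body and its
URLs are invented for the example.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree spelling of the "DAV:" XML namespace

# Hypothetical response body: one collection and one ordinary resource.
SAMPLE = """\
<D:multistatus xmlns:D="DAV:">
  <D:response>
    <D:href>http://foo/bar/</D:href>
    <D:propstat>
      <D:prop><D:resourcetype><D:collection/></D:resourcetype></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
  <D:response>
    <D:href>http://foo/bar/blah</D:href>
    <D:propstat>
      <D:prop><D:resourcetype/></D:prop>
      <D:status>HTTP/1.1 200 OK</D:status>
    </D:propstat>
  </D:response>
</D:multistatus>
"""


def resource_types(multistatus_xml):
    """Return {href: set of type element names} for every response."""
    root = ET.fromstring(multistatus_xml)
    types_by_href = {}
    for response in root.findall(DAV + "response"):
        href = response.findtext(DAV + "href")
        types = set()
        for resourcetype in response.iter(DAV + "resourcetype"):
            # Each child element names one type. An empty resourcetype
            # element means the resource has no special type.
            types.update(child.tag for child in resourcetype)
        types_by_href[href] = types
    return types_by_href


def is_collection(types):
    """True if the set of type names includes DAV:collection."""
    return DAV + "collection" in types
```

Because the property holds a set of child elements rather than a single
yes/no flag, a later resource type only needs a new element name and does not
require a new property.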

4 Why Did WebDAV Create MKCOL?

The WG faced the problem of how one created a collection resource. The only
way HTTP provides to create a new resource is the PUT method. So the idea
was tossed around that we should add a header to the PUT method which
specified the type of resource to be created.

However there was strong objection to adding this functionality to PUT. The
core of the argument was that PUT was one of the very few extremely well
defined methods. It was used to record a byte stream that could later be
retrieved with a GET. It was considered unwise to add any "magic" to PUT,
which in this case meant allowing PUT to do anything other than blindly
record a byte stream.

So a new method was needed. The choice was either to create a generic method
to create any resource type or to create a method specifically for creating
collections.

The question ended up being phrased as "Was PUT a mistake?" Should the HTTP
WG have created a generic method to create new resources? Had the HTTP WG
simply screwed up?

The conclusion of the WebDAV WG was that the HTTP WG had not made a mistake.
By creating a PUT method it was possible to carefully define exactly what it
meant to record a byte stream. That a resource could be created as a side
effect seemed reasonable enough.

But still, wouldn't the world be a better place if we created a generic
method to create any resource type? This began a long and fairly boring
debate about what the body of this magic method should be. The center of the
issue was, should the magic method be allowed to create the initial value
for the resource? In the case of a collection, this meant initially
populating the members of the collection as well as specifying the values of
those members. This meant gluing together a bunch of HTTP methods and
shoving them inside this new method. This debate had a lot to do with the
same issues touched upon in
http://lists.w3.org/Archives/Public/w3c-dist-auth/1998OctDec/0303.html.

The WG recognized an impossible rat hole when it saw one. As such the WG
decided it wouldn't try to create the universal "make any resource type"
method and instead would take inspiration from PUT and design a method
specifically for creating a collection. More than this, the WG also agreed
that it would not define a body for this method. One could be added later
but one would not be specified in the WebDAV specification. Thus, using base
DAV, the only way to create a collection is with MKCOL and the only way to
populate it is with PUTs and more MKCOLs.
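
To make the "MKCOL plus PUTs" approach concrete, here is a minimal sketch
that works out which requests a client would issue to build a small tree.
It only plans the requests and sends nothing. The function name
plan_requests and the example paths are hypothetical, and the sketch assumes
the root "/" already exists as a collection.

```python
def plan_requests(file_paths):
    """Return the ordered (method, path) pairs needed to store the files.

    Every ancestor collection is created with MKCOL before anything is
    PUT inside it, because MKCOL carries no body to populate members.
    """
    created = set()
    plan = []
    for path in sorted(file_paths):
        segments = path.strip("/").split("/")
        prefix = "/"
        # Walk down the ancestors of this file, creating each one once.
        for name in segments[:-1]:
            prefix += name + "/"
            if prefix not in created:
                created.add(prefix)
                plan.append(("MKCOL", prefix))
        plan.append(("PUT", path))
    return plan


if __name__ == "__main__":
    for method, path in plan_requests(["/docs/a/x.txt", "/docs/b.txt"]):
        print(method, path)
```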

5 Why Did WebDAV Allow for Mixed WebDAV and Non-WebDAV Compliant Namespaces?

The WG realized that a mechanism was needed to determine if a server was
WebDAV compliant or not. After all, since WebDAV was using HTTP on port 80
how could you tell a WebDAV compliant HTTP server from a normal HTTP server?

The obvious solution was to use the OPTIONS method. This HTTP/1.1 method was
specifically designed to provide protocol information.

At first the WG thought it could require performing an OPTIONS method on the
special request-URI "*". This request-URI had been introduced by the HTTP WG
as a means of discovering information about an entire server.

The server community reacted very badly to this proposal. "*" did not just
cover an entire server, it covered an entire HTTP namespace. When a "*"
request-URI is sent, scoping is only provided by the host header. This means
that "*" applies to all resources in that domain. In other words, it meant
that every resource in http://uci.edu would have to be WebDAV compliant if
any wanted to support WebDAV.

The reason the HTTP WG originally introduced "*" is that in the old days it
was common for a single server to handle all the URIs in an entire domain.
However a number of developments invalidated the assumption underlying "*".

The first was the introduction of server extension mechanisms such as
CGI/ISAPI/NSAPI/Modules. These mechanisms allowed one to add a program to a
server so that the program controlled a part of that server's namespace.
Thus one could take an existing server and by adding an extension that
supported WebDAV, make part of the server's namespace WebDAV compliant. If
discovery could only be performed on "*" then one could only make a server
WebDAV compliant by making the entire server compliant. This loss of
flexibility was considered unacceptable.

Second, different servers often controlled parts of an HTTP URL namespace. A
redirector would be used to route requests based on the URL. Thus it was
very likely that one server may want to support WebDAV, thus WebDAV enabling
part of the HTTP URL namespace, but the rest wouldn't be interested.

So the WG came to the conclusion that it couldn't use "*". This meant that
discovery would have to be determined by executing the OPTIONS method on
each resource individually.
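
A per-resource check might look roughly like the sketch below. It assumes
the DAV response header that the published WebDAV specification (RFC 2518)
uses to advertise compliance classes on an OPTIONS response; the function
names dav_classes and probe are invented for the example, and probe needs a
reachable server, so only the header parsing is exercised offline.

```python
import http.client


def dav_classes(headers):
    """Return the compliance classes listed in a DAV header, if any.

    headers is a mapping of header name to value. An empty set means the
    resource did not advertise WebDAV support.
    """
    value = ""
    for name, field in headers.items():
        if name.lower() == "dav":  # header names are case-insensitive
            value = field
    return {token.strip() for token in value.split(",") if token.strip()}


def probe(host, path, timeout=10):
    """Send OPTIONS for one specific resource (not "*") and parse the reply."""
    conn = http.client.HTTPConnection(host, timeout=timeout)
    try:
        conn.request("OPTIONS", path)
        response = conn.getresponse()
        response.read()  # drain any body before closing
        return dav_classes(dict(response.getheaders()))
    finally:
        conn.close()
```

The request-URI is the resource itself, so the answer says nothing about its
parents or siblings.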

At this point the WebDAV client community complained. Imagine someone hands
a WebDAV client the URL http://foo/bar/blah. The WebDAV client then performs
an OPTIONS request on http://foo/bar/blah and discovers that the resource is
WebDAV compliant. Now the WebDAV client wants to display a picture of the
resource's namespace to the user. Something like:

foo
 |-bar
    |-blah

But how does the client know that http://foo and http://foo/bar are WebDAV
compliant? If they aren't and the user tries to click on them, who knows
what would happen. This was known as the mixed namespace problem. WebDAV
compliant and non-compliant resources could end up in the same HTTP URL
namespace.

Many in the client community were very concerned that they would be required
to perform discovery on every resource they worked with. Not an appetizing
thought.

The first suggestion was to ban mixed namespaces. However this suggestion was
quickly rejected for the reasons given previously.

A second suggestion was to allow the root of a WebDAV namespace to be
anywhere in the HTTP namespace but to require that all the children of that
root be WebDAV compliant. Thus http://uci.edu/ may not be WebDAV compliant
but http://uci.edu/users/jwhitehead could be WebDAV compliant so long as all
its children were compliant. However this suggestion was rejected for the
same reason that the suggestion requiring the entire HTTP namespace to be
WebDAV compliant was rejected. Imagine the HTTP namespace rooted at
http://foo/ was owned by server A but the namespace rooted at
http://foo/bar/blah was owned by server B. Now imagine that server A wants
to be WebDAV compliant but server B does not. The second suggestion would
mean that both http://foo/ and http://foo/bar/ could never be WebDAV
compliant because they were parents of http://foo/bar/blah, which is not
WebDAV compliant.

There were then suggestions that the WebDAV WG provide a mechanism to map a
namespace. The idea was that the client could make a single request to the
server and not only find out if a particular URL was WebDAV compliant, but
what other resources on the server were compliant.

The server community quickly rejected the idea as both too expensive and not
implementable. If the namespace is cut up between different machines there
may not be any way for the different machines to discover each other much
less figure out which resources exist and if they support WebDAV.

Eventually the client community accepted that mixed namespaces were here to
stay and that clients were going to have to pay the cost for resource by
resource detection. The overall cost proved not to be too bad because of
the hierarchy manipulation mechanisms WebDAV provided.
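
Resource by resource detection for the http://foo/bar/blah example can be
sketched as follows. The helper names ancestor_urls and classify are
hypothetical, and is_dav stands in for whatever per-resource check the client
uses (for instance an OPTIONS request); here it is supplied by the caller so
the sketch runs without a network.

```python
from urllib.parse import urlsplit, urlunsplit


def ancestor_urls(url):
    """Return the parent URLs of url, from the root downwards."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    ancestors = []
    for depth in range(len(segments)):
        # depth 0 is the root "/", depth 1 is "/first/", and so on.
        path = "/" + "".join(seg + "/" for seg in segments[:depth])
        ancestors.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return ancestors


def classify(url, is_dav):
    """Check the URL and every ancestor individually.

    No result is inferred from a neighbour: in a mixed namespace a
    compliant child says nothing about its parent, and vice versa.
    """
    targets = ancestor_urls(url) + [url]
    return {target: bool(is_dav(target)) for target in targets}


if __name__ == "__main__":
    # Pretend only the part of the namespace under /bar/ supports WebDAV.
    fake_check = lambda u: u.startswith("http://foo/bar/")
    for target, ok in classify("http://foo/bar/blah", fake_check).items():
        print(target, "WebDAV" if ok else "not WebDAV")
```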

6 Why Does WebDAV Allow for non-WebDAV Compliant Collections?

At one point or another the authors of the WebDAV spec realized that a
resource could meet all of WebDAV's requirements for a collection resource
without necessarily supporting all the WebDAV methods. Recognizing that some
people might want to be able to just implement collections without
necessarily supporting all of WebDAV the authors decided to throw in
language that allowed a resource to be a collection without necessarily
being WebDAV compliant. It was one of those "never forbid without a damn
good reason" type decisions.

7 Why Does WebDAV Require Hierarchy in WebDAV Only Namespaces?

Section 5.2 of the WebDAV standard states that:

   For all WebDAV compliant resources A and B, identified by URIs U and
   V, for which U is immediately relative to V, B MUST be a collection
   that has U as an internal member URI.

This requirement is even stronger than consistency. Consistency only
requires that if http://a/b/c exists then http://a/b/ exist. Section 5.2
requires hierarchy. Not only must http://a/b/ exist if http://a/b/c exists
but http://a/b/ must be a collection and must contain http://a/b/c as a
member.
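
The rule can be expressed as a check over a toy model of a namespace. This
is only a sketch: the Resource class, the path-keyed dictionary and the
function names are assumptions made for the example. The dictionary is taken
to hold only WebDAV compliant resources, so a missing parent stands for a
non-WebDAV (or nonexistent) resource, which the quoted rule does not cover.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    is_collection: bool = False
    members: set = field(default_factory=set)  # paths of internal members


def parent_of(path):
    """Return the path one level up, or None for the root "/"."""
    if path == "/":
        return None
    return path.rsplit("/", 1)[0] or "/"


def hierarchy_violations(namespace):
    """List (child, parent) pairs that break the hierarchy requirement.

    A pair is a violation when both resources are in the (WebDAV only)
    namespace but the parent is not a collection listing the child.
    """
    violations = []
    for path in namespace:
        parent = parent_of(path)
        if parent is None or parent not in namespace:
            continue  # the rule only speaks about two compliant resources
        holder = namespace[parent]
        if not holder.is_collection or path not in holder.members:
            violations.append((path, parent))
    return sorted(violations)


if __name__ == "__main__":
    namespace = {
        "/a": Resource(is_collection=True, members={"/a/b"}),
        "/a/b": Resource(),    # not a collection
        "/a/b/c": Resource(),  # yet something lives "under" it
    }
    print(hierarchy_violations(namespace))
```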

This means that if http://a/ and http://a/b/c are both WebDAV compliant but
can't communicate with each other, for example http://a/b/c is a virtual
root or on a different server, then there must be a non-WebDAV buffer
between them. Because http://a/ has no way to communicate depth requests to
http://a/b/c or to even be sure that http://a/b/c currently exists, the
non-WebDAV buffer prevents the resources from being required to communicate
with each other.

In addition to section 5.2 there are requirements throughout WebDAV which
prevent most of its methods from creating inconsistent namespaces as a
result of their execution. When these consistency requirements are combined
with section 5.2 the result is that these methods are essentially required
to never create a non-hierarchical WebDAV namespace.

On the face of it these requirements look a bit unwieldy and, quite
possibly, unnecessary. Below I present the arguments that led to these
requirements. However the presentation is a complete fiction. Most of these
arguments were never explicitly made. The decision to include these
requirements was made over a period of years as part of arguments buried
deep in different contexts. What I have tried to do below is to distill
those arguments to just the parts relevant to WebDAV's hierarchy
requirements.

7.1 Client Hierarchy Requirements

When a company sells client software it usually has to give away a certain
amount of free support. If a user picks up the phone to use their free
support all the profit on that sale is generally negated as a function of
the cost of providing help services. Thus it is very galling to client
makers that when servers screw up (resource not available, protocol not
supported, etc.) it is the client software maker, not the server maker, who
gets the phone call. This is why one sees weird messages in network enabled
client programs like "Operation Successful - Connection Failed". The program
is trying to tell the user that the client didn't do anything wrong, it is
the network or the server that screwed up so please make them pay for the
support call.

Thus a large segment of the client community had a very serious concern
regarding hierarchy. Specifically, the following scenario:
1. the user saves a file to a server,
2. the user subsequently views the contents of the collection the file was
saved in,
3. the file is not listed because the server isn't enforcing a consistent
namespace, even among WebDAV compliant resources,
4. the user picks up the phone and calls client support demanding to know
why the client lost their file.

Of course the previous scenario could still occur even if the namespace was
hierarchical. For example, someone may save a file and before they have a
chance to list the contents of the parent collection someone else may delete
the file. Thus, to the user, it appears as if the file just disappeared. A
possible solution to this problem is to require transactioning/versioning to
let the client know that the file was saved but was subsequently deleted.
However this scenario is sufficiently rare and the costs for dealing with it
in the protocol sufficiently high that the client community accepted that it
was going to have to suffer the costs for dealing with this scenario itself.

The client community wasn't particularly concerned about mixed namespaces.
They expected that in the average case there would be a WebDAV root and all
the resources underneath it would also be WebDAV compliant. Thus the client
community was primarily concerned with optimizing for behavior in WebDAV
namespaces, rather than worrying about all the possible combinations in
mixed namespaces.

Thus the client community was strictly concerned about hierarchy in WebDAV
namespaces. Even so, they were not asking that there be a requirement that
WebDAV namespaces be hierarchical. Rather they were asking that two features
be added to the protocol:
1) A way to guarantee that a method would not result in the creation of a
non-hierarchical namespace.
2) A way to determine if an existing namespace is hierarchical.

Requirement 1 would prevent the previous scenario. Requirement 2 would keep
the client from getting blamed if someone else created the previous
scenario.

Finally, a large segment of the client community made it absolutely clear
that they would refuse to work with any server that did not maintain a
hierarchical namespace, at least amongst WebDAV compliant resources. Their
reasoning was that their UI was hierarchy based so they had no way to
display non-hierarchical namespaces. Furthermore, they had no interest in
being able to display non-hierarchical namespaces. They felt they would only
confuse their users and thus increase support costs. In addition, their
commands (such as copy/move/delete) were all based on hierarchy manipulation
and thus required a hierarchical namespace.

7.2 Server Hierarchy Requirements

The hierarchical server community uses file systems and other hierarchical
stores to record data.

This segment of the server community was not very thrilled about
non-hierarchical namespaces. They could implement them, but at a high cost.
How would a hierarchical store record the fact that http://a/b, http://a/b/c
and http://a/b/c/d exist but are all non-collection resources? How would a
hierarchical store record that http://e/ and http://e/f/g exist but
http://e/f does not? Both problems are solvable with enough record keeping
and redirections on the part of the server.
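
The first of these problems is easy to see on an ordinary file system. The
sketch below, which is only an illustration with an invented function name,
stores http://a/b as a plain file in a temporary directory and then tries to
store http://a/b/c beneath it. A typical file system refuses, which is why
the server would need extra record keeping to fake the non-hierarchical
layout.

```python
import os
import tempfile


def can_store_child_of_file():
    """Try to save a/b/c when a/b is already a regular file."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "a"))
        b_path = os.path.join(root, "a", "b")
        with open(b_path, "w") as handle:
            handle.write("body of http://a/b")
        try:
            # a/b is a file, not a directory, so it cannot hold "c".
            with open(os.path.join(b_path, "c"), "w") as handle:
                handle.write("body of http://a/b/c")
        except OSError:
            return False
        return True


if __name__ == "__main__":
    print("child stored under a file:", can_store_child_of_file())
```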

An even worse problem, from the hierarchical server community's point of
view, was that requiring servers to support non-hierarchical namespaces would
effectively limit access to the underlying store to only HTTP. What happens
if a client using the file system directly tries to access the store? It
would be completely confused, not recognizing that a/b was not really meant
to be a directory or that e/f/g was not meant to be seen as a child of e/f.
The hierarchical server community had a strong requirement that their
underlying stores be accessible and understandable through multiple access
mechanisms. This meant they needed to be able to maintain their hierarchy.

The message from the hierarchical server community was that supporting
non-hierarchical namespaces in the WebDAV protocol was fine, just as long as
their servers were allowed to require that all resources be WebDAV compliant
and hierarchical.

7.3 The Working Group Analysis

Hearing from these two groups of implementers the WG was getting worried
that a serious interoperability issue was being created. Hierarchical
clients refused to work with non-hierarchical servers and hierarchical
servers refused to work with clients requiring a non-hierarchical WebDAV
namespace.

7.3.1 Sizing up the Market

The WG decided to determine just how bad the interoperability issue was
likely to be by examining what clients and servers were doing today. The
WG's investigations turned up two types of client/servers:

Pure Hierarchical - This class accounted for the overwhelming majority of
deployed authoring clients and servers. Typical examples include all known
file systems as well as the file manipulation dialogs of all major clients
(WordPerfect, Word, File Explorer, Finder, etc.). Note, this doesn't include
linking, which the group had already agreed would be dealt with in a
separate draft (which has since been published).

Property Based - The majority of the remaining client/servers belong to
high-end document and data management systems. These systems rarely
presented their users with a hierarchy. Rather they allowed the user to
provide meta-data that was then used to perform a search that would then
present a result space. For example, a user could ask for a particular
configuration or a version. Neither request identifies a resource name but
rather identifies values of properties associated with resources. The system
would search on those values and present the result. The underlying stores
were often flat namespaces. Those that weren't flat were invariably
hierarchical.

Interestingly enough the WG never turned up an authoring client or server
that required a non-hierarchical namespace, only those that could operate in
the absence of hierarchy because they used a flat namespace.

Thus the WG came to the conclusion that the majority of WebDAV clients would
be unwilling to even try to talk to a server that supported non-hierarchical
WebDAV namespaces. The WG also concluded the majority of WebDAV servers
would demand that the namespace was always hierarchical. Thus an
interoperability issue did exist. Clients which, for whatever reason,
required the ability to create non-hierarchical namespaces would have almost
no one to talk to and servers that didn't allow clients to require a
hierarchical WebDAV namespace would find very few clients willing to talk to
them.

7.3.2 The Working Group's Conclusion

The WG had two solutions:

The first solution was to allow non-hierarchical WebDAV namespaces and
augment the protocol to require that clients specify if a method is allowed
to create a non-hierarchical namespace. The protocol would also have to be
augmented to indicate to the client if the existing WebDAV namespace
enforced hierarchy. The upside of this solution was that it allowed for
flexibility. The downside of this proposal was that it complicated the
protocol by adding additional headers and properties to allow for discovery
and enforcement of hierarchy.

The second solution was to mandate that WebDAV namespaces must be
hierarchical. The upside of this solution was that it was extremely simple,
requiring no protocol changes. The downside to this proposal was that it shut
the door on supporting non-hierarchical WebDAV namespaces.

What finally swayed matters was the realization that allowing
non-hierarchical WebDAV namespaces essentially meant creating two separate
protocols: HWebDAV and NHWebDAV. Hierarchical client/servers would use
HWebDAV and non-hierarchical client/servers would use NHWebDAV. Thus adding
support for non-hierarchical WebDAV namespaces would not have served the
cause of interoperability, but it would certainly have made the protocol
more complex.

So the WG decided to mandate that WebDAV namespaces had to be hierarchical.

Received on Sunday, 27 December 1998 17:27:05 UTC