Copyright © 1999 The Internet Society & W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This is a WG XML Signature Design draft. It is likely that this document will not be published as a TR or ietf-draft, but will be used as the basis of some other document.
Please send comments to the editor <reagle@w3.org> and cc: the list <w3c-ietf-xmldsig@w3.org>. Publication as a Working Draft does not imply endorsement by the W3C membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Drafts as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR.
This document addresses the data model with respect to what signatures actually sign. The scenarios below present further evidence as to why we should punt on all trust assertions and focus on signature validity first.
As the first step in designing the data model is understanding what it is we are modeling, I think some definitions and description of resources are in order. (See References and Definitions)
A tricky usage is the question of signing the original content while sending along an encoded/packaged version: do we identify the resource via URI, but provide another URL to the local encoded version? Can we sign the native format's byte stream instead of the encoded versions? I've realized this is actually an application of digital signatures, one of trusted caching, but a useful scenario for clearing out conceptual crud. (The discussion below is largely agnostic on the issue of whether URIs are expected to resolve to content or act merely as names. [Namespaces for XHTML. Jon Bosak and resulting thread on XML Plenary.])
We are not signing resources, because a resource has no digital representation; it is an abstract object. A resource's identifier can be represented as a sequence of characters/bytes with a restricted syntax -- though few people would want to sign a lone URI. Some (digital) resources also have content that is represented as bytes (or hex, etc.), though not every resource has digital content:
Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources. URI Generic Syntax -- RFC 2396.
Consequently, we are signing the digital representation of a resource's content. However, the content of a resource may change over time.
Thus, a resource can remain constant even when its content---the entities to which it currently corresponds---changes over time, provided that the conceptual mapping is not changed in the process. URI Generic Syntax -- RFC 2396.
Remember, we could very easily -- though very inefficiently -- chunk all those content bytes into a signature engine with a key and generate a signature value. But given the inefficiency and the desire to have a signature that is detached from the content, we use a URI and hash in the manifest. This allows us to identify the resource from which the signature is detached. The URI plus the hash of the content gives us an unambiguous representation of the resource's content at the time of signing. If the URI acts as a URN we may have to provide a URL as well. [Should we abandon URIs in the manifest? I think so, though people can still sign statements with URIs/URNs in them of course. But it makes little sense to have a referent which cannot yield content, hence we should use URLs!]
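The URI-plus-hash idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not a proposed syntax: the dictionary layout, the function names `manifest_entry` and `content_matches`, the example URL, and the choice of SHA-256 are all hypothetical.

```python
import hashlib


def manifest_entry(uri: str, content: bytes) -> dict:
    """Pair a locator with a digest of the content bytes seen at signing time.

    The field names and the SHA-256 algorithm are assumptions made for
    this sketch only.
    """
    return {
        "uri": uri,
        "digest_alg": "sha256",
        "digest": hashlib.sha256(content).hexdigest(),
    }


def content_matches(entry: dict, content: bytes) -> bool:
    """Check whether retrieved bytes are the bytes that were hashed."""
    return hashlib.sha256(content).hexdigest() == entry["digest"]


# The resource (and its URI) stays constant while its content changes.
monday = b"content retrieved on Monday"
tuesday = b"content retrieved on Tuesday"

entry = manifest_entry("http://example.org/news", monday)
same_content = content_matches(entry, monday)      # bytes seen at signing time
changed_content = content_matches(entry, tuesday)  # later, different bytes
```

Only the small manifest entry would then need to be fed to the signature engine; the digest ties the signature to one particular representation of the resource's content, while the URI says where that content was found.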
The above explanation is fairly simple, but how does it deal with the following scenarios?
Bob creates a statement "I like the NYT daily dynamic news home page." and signs that statement/content. Bob is encouraged to use a URI to identify the NYT page. As an alternative, Bob could capture this statement in a machine-readable assertion using URIs.
Two key assertions are made, and the semantics of a third assertion must also be made -- or derived independently.
Note that if we trust Alice to make assertions {1,2}, she can sign both (make all the signatures involved!). In this case, we can extend this level of trust because if we trust Alice to make statements of what she likes, then there's little motivation for her to include a copy she doesn't like in a package! However, this cannot be generalized to all trusted caching applications. Whether trust derived from assertion:1 can be extended to assertion:2 is dependent on the semantics of assertion:1. Furthermore, neither of these should be confused with assertion:3. I can see this key scenario pushing designers to develop a semantically ambiguous trust syntax that suits this single application well, but breaks in the properly generalized case.
Finally, if content is encoded in the process of packaging, there is another implicit assertion: "the encoded content in (URI:package) can be transformed into different content." Let us call the decoded content (URI:unencoded). The trustworthiness of this statement is implicit in the encoding/decoding process and easily verified if the encoding has the property that there is a single decoded form and that it is highly unlikely to collide with the decoding of different content. [Not sure on this point, or if this is the right property.]
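To make the packaging case concrete, here is a minimal sketch that assumes the encoding is base64 (the text does not name an encoding, so this choice and the variable names are purely illustrative). It shows that a digest taken over the native byte stream can still be checked by a recipient who only holds the encoded package, provided decoding is deterministic.

```python
import base64
import hashlib

# Native byte stream of the content, i.e. what (URI:unencoded) would yield.
native = b"native byte stream of the original content"

# Digest computed over the native bytes at signing time (SHA-256 assumed).
signed_digest = hashlib.sha256(native).hexdigest()

# The packaged form that travels with the signature, i.e. (URI:package).
packaged = base64.b64encode(native)

# The recipient decodes the package and hashes the result.
decoded = base64.b64decode(packaged, validate=True)
recovered_digest = hashlib.sha256(decoded).hexdigest()

# The encoded bytes themselves hash to something different.
packaged_digest = hashlib.sha256(packaged).hexdigest()
```

In this sketch `recovered_digest` equals `signed_digest` while `packaged_digest` does not, which is why the manifest has to be clear about which byte stream (native or encoded) the hash was computed over.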
Definite description SEMANTICS: a definite noun phrase which is used to refer to exactly one individual. EXAMPLE: the king of France in (i) is a definite description that can only be properly used if France has one and only one king: [http://www.mv.ru/~oz/study/flims/LingGlossary/ll_d.html]
In an anchor address, the URI refers to a resource; it may be used in a variety of information retrieval protocols to obtain an entity HTML document. The fragment identifier, if present, refers to some view on, or portion of the resource.
URI are characterized by the following definitions:
Uniform: Uniformity provides several benefits: it allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources may differ; it allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers; it allows introduction of new types of resource identifiers without interfering with the way that existing identifiers are used; and, it allows the identifiers to be reused in many different contexts, thus permitting new applications or protocols to leverage a pre-existing, large, and widely-used set of resource identifiers.
Resource: A resource can be anything that has identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources.
The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content---the entities to which it currently corresponds---changes over time, provided that the conceptual mapping is not changed in the process.
Identifier: An identifier is an object that can act as a reference to something that has identity. In the case of URI, the object is a sequence of characters with a restricted syntax.
Having identified a resource, a system may perform a variety of operations on the resource, as might be characterized by such words as `access', `update', `replace', or `find attributes'.
1.2. URI, URL, and URN
A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URI that identify resources via a representation of their primary access mechanism (e.g., their network "location"), rather than identifying the resource by name or by some other attribute(s) of that resource. The term "Uniform Resource Name" (URN) refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.
The URI scheme (Section 3.1) defines the namespace of the URI, and thus may further restrict the syntax and semantics of identifiers using that scheme. This specification defines those elements of the URI syntax that are either required of all URI schemes or are common to many URI schemes. It thus defines the syntax and semantics that are needed to implement a scheme-independent parsing mechanism for URI references, such that the scheme-dependent handling of a URI can be postponed until the scheme-dependent semantics are needed. We use the term URL below when describing syntax or semantics that only apply to locators.
Although many URL schemes are named after protocols, this does not imply that the only way to access the URL's resource is via the named protocol. Gateways, proxies, caches, and name resolution services might be used to access some resources, independent of the protocol of their origin, and the resolution of some URL may require the use of more than one protocol (e.g., both DNS and HTTP are typically used to access an "http" URL's resource when it can't be found in a local cache).
A URN differs from a URL in that its primary purpose is persistent labeling of a resource with an identifier. That identifier is drawn from one of a set of defined namespaces, each of which has its own set name structure and assignment procedures. The "urn" scheme has been reserved to establish the requirements for a standardized URN namespace, as defined in "URN Syntax" [RFC2141] and its related specifications.
Most of the examples in this specification demonstrate URL, since they allow the most varied use of the syntax and often have a hierarchical namespace. A parser of the URI syntax is capable of parsing both URL and URN references as a generic URI; once the scheme is determined, the scheme-specific parsing can be performed on the generic URI components. In other words, the URI syntax is a superset of the syntax of all URI schemes.
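The generic-parse-then-scheme-specific idea described above can be illustrated with Python's standard `urllib.parse.urlsplit`. This is a rough sketch only: the `classify` helper and its three labels are hypothetical, and treating every scheme other than "urn" as a locator is a simplifying assumption, not something the quoted text requires.

```python
from urllib.parse import urlsplit


def classify(uri_reference: str) -> str:
    """Parse generically first, then look at the scheme.

    Returns "urn" for the reserved "urn" scheme, "url" for any other
    scheme (an assumption of this sketch), and "relative" when the
    reference carries no scheme at all.
    """
    parts = urlsplit(uri_reference)  # scheme-independent parsing step
    if not parts.scheme:
        return "relative"
    if parts.scheme == "urn":  # urlsplit lower-cases the scheme
        return "urn"
    return "url"


# Generic components are available before any scheme-specific handling.
locator = urlsplit("http://www.w3.org/TR#status")
name = urlsplit("urn:isbn:0451450523")

locator_kind = classify("http://www.w3.org/TR#status")
name_kind = classify("urn:isbn:0451450523")
relative_kind = classify("../Overview.html")
```

For the manifest question raised earlier, a check of this kind is about as far as syntax alone can go: it can tell a "urn" name from other schemes, but whether a given URI actually yields content can only be learned by trying to resolve it.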
Resources: All things being described by RDF expressions are called resources. A resource may be an entire Web page; such as the HTML document "http://www.w3.org/Overview.html" for example. A resource may be a part of a Web page; e.g. a specific HTML or XML element within the document source. A resource may also be a whole collection of pages; e.g. an entire Web site. A resource may also be an object that is not directly accessible via the Web; e.g. a printed book. Resources are always named by URIs plus optional anchor ids (see [URI]). Anything can have a URI; the extensibility of URIs allows the introduction of identifiers for any entity imaginable.