XML-Signature Data Model Design:
Referents and Resources

WG Draft 1999-July-23

This Working Group version: http://www.w3.org/Signature/Drafts/xml-dsig-design-resources-990723.html
Previous version: ...
Author(s): Joseph Reagle Jr. <reagle@w3.org>

W3C Status of this Document

This is a WG XML Signature Design draft. It is likely that this document will not be published as a TR or ietf-draft, but will be used as the basis of some other document.

Please send comments to the editor <reagle@w3.org> and cc: the list <w3c-ietf-xmldsig@w3.org>. Publication as a Working Draft does not imply endorsement by the W3C membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Drafts as other than "work in progress". A list of current W3C working drafts can be found at http://www.w3.org/TR.

Abstract

This document addresses the data model with respect to what signatures actually sign. The scenarios below present further evidence as to why we should punt on all trust assertions and focus on signature validity first.

Referents of Signatures

As the first step in designing the data model is understanding what it is we are modeling, I think some definitions and description of resources are in order. (See References and Definitions)

A tricky case is signing the original content while sending along an encoded/packaged version: do we identify the resource via a URI, but provide another URL to the local encoded version? Can we sign the native format's byte stream instead of the encoded version's? I've realized this is actually an application of digital signatures, namely trusted caching, but it is a useful scenario for clearing out conceptual crud. (The discussion below is largely agnostic on the issue of whether URIs are expected to resolve to content or act merely as names. [Namespaces for XHTML. Jon Bosak and resulting thread on XML Plenary.])

We are not signing resources: a resource is an abstract object and has no digital representation. A resource's identifier can be represented as a sequence of characters/bytes with a restricted syntax -- though few people would want to sign a lone URI. Some (digital) resources also have content that is represented as bytes (or hex, et al.), though not every resource has digital content:

Not all resources are network "retrievable"; e.g., human beings, corporations, and bound books in a library can also be considered resources. URI Generic Syntax -- RFC 2396.

Consequently, we are signing the digital representation of a resource's content. However, the content of a resource may change over time.

Thus, a resource can remain constant even when its content---the entities to which it currently corresponds---changes over time, provided that the conceptual mapping is not changed in the process. URI Generic Syntax -- RFC 2396.

Remember, we could very easily -- though very inefficiently -- feed all those content bytes into a signature engine with a key and generate a signature value. But given the inefficiency and the desire to have a signature that is detached from the content, we use a URI and hash in the manifest. This allows us to identify the resource from which the signature is detached. The URI plus the hash of the content gives us an unambiguous representation of the resource's content at the time of signing. If the URI acts as a URN we may have to provide a URL as well. [Should we abandon URIs in the manifest? I think so, though people can still sign statements with URIs/URNs in them of course. But it makes little sense to have a referent which cannot yield content, hence we should use URLs!]
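The URI-plus-hash idea can be sketched roughly as follows. This is only an illustration under stated assumptions: SHA-1 stands in for whatever digest is eventually chosen, an HMAC stands in for a real signature algorithm, and the function names, key, and example URL are hypothetical rather than part of any proposed syntax.

```python
import hashlib
import hmac

def manifest_entry(uri: str, content: bytes) -> dict:
    """Pair a resource identifier with a digest of its content at signing time."""
    return {"uri": uri, "digest": hashlib.sha1(content).hexdigest()}

def sign_manifest(entries: list, key: bytes) -> str:
    """Sign the small manifest instead of the (possibly large) content.

    HMAC is a placeholder here; a real design would use a signature algorithm.
    """
    serialized = "\n".join(f"{e['uri']} {e['digest']}" for e in entries)
    return hmac.new(key, serialized.encode("utf-8"), hashlib.sha1).hexdigest()

def content_matches(entry: dict, retrieved: bytes) -> bool:
    """Check whether retrieved content is the content that was signed."""
    return hmac.compare_digest(entry["digest"], hashlib.sha1(retrieved).hexdigest())

key = b"hypothetical-shared-key"
original = b"<html>news of 1999-07-23</html>"

entry = manifest_entry("http://example.org/news", original)  # hypothetical URL
signature = sign_manifest([entry], key)

# The resource (the URI) stays the same while its content changes over time;
# only the content captured at signing time matches the manifest digest.
same_day = content_matches(entry, original)
next_day = content_matches(entry, b"<html>news of 1999-07-24</html>")
```

In this sketch the signature covers only the manifest, so the content can live elsewhere, and a verifier who later retrieves different bytes from the same URI can tell that they are not what was signed.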

Scenarios: Signing Statements About Resources and Caches/Packages

The above explanation is fairly simple, but how does it deal with the following scenarios?

Bob wants to state that he likes the NYT daily dynamic news home page.
Bob creates a statement "I like the NYT daily dynamic news home page." and signs that statement/content. Bob is encouraged to use a URI to identify the NYT page. As an alternative, Bob could capture this assertion in a machine readable assertion using URIs.
Alice wishes to sign a statement that she likes this content (enclosed) that she saw at a URI; by "this content" we mean that she also sends an encoded/packaged form of the signed content along with the signature.
Two key assertions are made, and the semantics of a third assertion must also be made -- or derived independently.
1. Alice asserts "I like the content of (URI:remote)"; this assertion is identified as (URI:assertion-like); Alice signs the content of (URI:assertion-like). We assume we can trust Alice to know what she likes. The validity of this signature can be independently validated given the appropriate content and key from Alice.
2. Alice asks her widely trusted friend Terry to retrieve the content at (URI:remote), place it in a package, and assert "the content of (URI:package) is/was the content of (URI:remote)"; this assertion is identified as (URI:cached). Terry signs the join of the content (URI:package) with the content (URI:cached). Alice could make this statement herself if we trusted her to do so. The trustworthiness of this statement is established through on-line assurance, reputation, or derived in some other way. The validity of this signature can be independently validated given the appropriate content and key from Terry.
3. The binding between these two statements can come from:
  1. an assertion by Alice that she thinks both (the join) are true.
  2. the equivalence of the included hash value associated with (URI:remote) within each statement.
Note that if we trust Alice to make assertions {1,2}, she can sign both (make all the signatures involved!). In this case, we can extend this level of trust because if we trust Alice to make statements of what she likes, then there's little motivation for her to include a copy she doesn't like in a package! However, this cannot be generalized to all trusted caching applications. Whether trust derived from assertion:1 can be extended to assertion:2 is dependent on the semantics of assertion:1. Furthermore, neither of these should be confused with assertion:3. I can see this key scenario pushing designers to develop a semantically ambiguous trust syntax that suits this single application well, but breaks in the properly generalized case.
Finally, if content is encoded in the process of packaging, there is another implicit assertion: "the encoded content in (URI:package) can be transformed into different content." Let us call the decoded content (URI:unencoded). The trustworthiness of this statement is implicit to the encoding/decoding process and easily verified if the encoding has the property that there is a single decoded form and it is highly unlikely to collide with the decoding of different content. [Not sure on this point, or if this is the right property.]
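The hash-equivalence binding and the decoding check in Alice's scenario might look something like the sketch below. It makes several assumptions that are not in this draft: the assertions are modeled as plain dictionaries with hypothetical field names, SHA-1 is the digest, base64 is the package encoding, and validation of Alice's and Terry's signatures over the assertions is left out entirely.

```python
import base64
import hashlib

def digest(content: bytes) -> str:
    return hashlib.sha1(content).hexdigest()

remote_content = b"content seen at (URI:remote)"

# Assertion 1, signed by Alice: "I like the content of (URI:remote)".
assertion_like = {
    "signer": "Alice",
    "remote": "URI:remote",
    "remote_digest": digest(remote_content),
}

# Assertion 2, signed by Terry: "the content of (URI:package) is/was the
# content of (URI:remote)". The package carries a base64-encoded copy.
package = base64.b64encode(remote_content)
assertion_cached = {
    "signer": "Terry",
    "remote": "URI:remote",
    "remote_digest": digest(remote_content),
    "package": "URI:package",
    "package_digest": digest(package),
}

def bound_by_hash(a: dict, b: dict) -> bool:
    """Binding option 2: both statements name the same remote content hash."""
    return a["remote"] == b["remote"] and a["remote_digest"] == b["remote_digest"]

def package_decodes_to_remote(package_bytes: bytes, cached: dict) -> bool:
    """Check the implicit assertion: the package decodes to the remote content."""
    if digest(package_bytes) != cached["package_digest"]:
        return False
    unencoded = base64.b64decode(package_bytes, validate=True)  # (URI:unencoded)
    return digest(unencoded) == cached["remote_digest"]

is_bound = bound_by_hash(assertion_like, assertion_cached)
package_ok = package_decodes_to_remote(package, assertion_cached)
tampered_ok = package_decodes_to_remote(base64.b64encode(b"other content"), assertion_cached)
```

Note that nothing here says whether Terry's statement should be trusted; the sketch only checks that the pieces are consistent with each other, which is the validity question rather than the trust question.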

References and Definitions

Russell

Definite description SEMANTICS: a definite noun phrase which is used to refer to exactly one individual. EXAMPLE: the king of France in (i) is a definite description that can only be properly used if France has one and only one king: [http://www.mv.ru/~oz/study/flims/LingGlossary/ll_d.html]

HTML 2.0 Link

In an anchor address, the URI refers to a resource; it may be used in a variety of information retrieval protocols to obtain an entity that represents the resource (such as an HTML document). The fragment identifier, if present, refers to some view on, or portion of the resource.

absolute URI: a URI in absolute form; for example, as per [URL]
anchor: one of two ends of a hyperlink; typically, a phrase marked as an A element.
URI: A Uniform Resource Identifier is a formatted string that serves as an identifier for a resource, typically on the Internet. URIs are used in HTML to identify the anchors of hyperlinks. URIs in common practice include Uniform Resource Locators (URLs)[URL] and Relative URLs [RELURL].

URI Generic Syntax -- RFC 2396.

   URI are characterized by the following definitions:

      Uniform
         Uniformity provides several benefits: it allows different types
         of resource identifiers to be used in the same context, even
         when the mechanisms used to access those resources may differ;
         it allows uniform semantic interpretation of common syntactic
         conventions across different types of resource identifiers; it
         allows introduction of new types of resource identifiers
         without interfering with the way that existing identifiers are
         used; and, it allows the identifiers to be reused in many
         different contexts, thus permitting new applications or
         protocols to leverage a pre-existing, large, and widely-used
         set of resource identifiers.

      Resource
         A resource can be anything that has identity.  Familiar
         examples include an electronic document, an image, a service
         (e.g., "today's weather report for Los Angeles"), and a
         collection of other resources.  Not all resources are network
         "retrievable"; e.g., human beings, corporations, and bound
         books in a library can also be considered resources.

         The resource is the conceptual mapping to an entity or set of
         entities, not necessarily the entity which corresponds to that
         mapping at any particular instance in time.  Thus, a resource
         can remain constant even when its content---the entities to
         which it currently corresponds---changes over time, provided
         that the conceptual mapping is not changed in the process.

      Identifier
         An identifier is an object that can act as a reference to
         something that has identity.  In the case of URI, the object is
         a sequence of characters with a restricted syntax.

   Having identified a resource, a system may perform a variety of
   operations on the resource, as might be characterized by such words
   as `access', `update', `replace', or `find attributes'.

1.2. URI, URL, and URN

   A URI can be further classified as a locator, a name, or both.  The
   term "Uniform Resource Locator" (URL) refers to the subset of URI
   that identify resources via a representation of their primary access
   mechanism (e.g., their network "location"), rather than identifying
   the resource by name or by some other attribute(s) of that resource.
   The term "Uniform Resource Name" (URN) refers to the subset of URI
   that are required to remain globally unique and persistent even when
   the resource ceases to exist or becomes unavailable.

   The URI scheme (Section 3.1) defines the namespace of the URI, and
   thus may further restrict the syntax and semantics of identifiers
   using that scheme.  This specification defines those elements of the
   URI syntax that are either required of all URI schemes or are common
   to many URI schemes.  It thus defines the syntax and semantics that
   are needed to implement a scheme-independent parsing mechanism for
   URI references, such that the scheme-dependent handling of a URI can
   be postponed until the scheme-dependent semantics are needed.  We use
   the term URL below when describing syntax or semantics that only
   apply to locators.

   Although many URL schemes are named after protocols, this does not
   imply that the only way to access the URL's resource is via the named
   protocol.  Gateways, proxies, caches, and name resolution services
   might be used to access some resources, independent of the protocol
   of their origin, and the resolution of some URL may require the use
   of more than one protocol (e.g., both DNS and HTTP are typically used
   to access an "http" URL's resource when it can't be found in a local
   cache).

   A URN differs from a URL in that it's primary purpose is persistent
   labeling of a resource with an identifier.  That identifier is drawn
   from one of a set of defined namespaces, each of which has its own
   set name structure and assignment procedures.  The "urn" scheme has
   been reserved to establish the requirements for a standardized URN
   namespace, as defined in "URN Syntax" [RFC2141] and its related
   specifications.

   Most of the examples in this specification demonstrate URL, since
   they allow the most varied use of the syntax and often have a
   hierarchical namespace.  A parser of the URI syntax is capable of
   parsing both URL and URN references as a generic URI; once the scheme
   is determined, the scheme-specific parsing can be performed on the
   generic URI components.  In other words, the URI syntax is a superset
   of the syntax of all URI schemes.
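As an aside that is not part of the RFC text, the "generic parse first, scheme-specific parse second" idea can be sketched with Python's standard `urllib.parse.urlsplit`. The `classify` helper and its return fields are hypothetical, and treating every non-"urn" scheme as a locator is a deliberate simplification.

```python
from urllib.parse import urlsplit

def classify(uri_reference: str) -> dict:
    """Parse generically, then apply scheme-specific handling."""
    parts = urlsplit(uri_reference)  # scheme-independent split
    scheme = parts.scheme.lower()
    if not scheme:
        # No scheme: a relative reference, meaningful only against a base URI.
        return {"kind": "relative", "path": parts.path, "fragment": parts.fragment}
    if scheme == "urn":
        # Scheme-specific step for URNs (RFC 2141): urn:<NID>:<NSS>
        nid, _, nss = parts.path.partition(":")
        return {"kind": "name", "scheme": scheme, "nid": nid, "nss": nss}
    # Simplification: every other scheme is treated as a locator.
    return {
        "kind": "locator",
        "scheme": scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "fragment": parts.fragment,
    }

as_url = classify("http://www.w3.org/Overview.html#intro")
as_urn = classify("urn:isbn:0451450523")
as_relative = classify("drafts/design.html")
```

For the manifest question raised earlier, a check of this kind is one way an application could tell whether a referent is likely to yield content (a locator) or only names something.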

Web Characterization Terminology & Definitions Sheet.

Web Resource: A resource, identified by a URI, that is a member of the Web Core. Note: The URI identifying the Web Resource does not itself have to be found within the Web Core. That is, a URI written on a bus identifying a resource that is a member of the Web Core identifies a Web Resource.
Web Core: The collection of resources residing on the Internet that can be accessed using any implemented version of HTTP as part of the protocol stack (or its equivalent), either directly or via an intermediary. Notes: By the term "or its equivalent" we consider any version of HTTP that is currently implemented as well as any new standards which may replace HTTP (HTTP-NG, for example). Also, we include any protocol stack including HTTP at any level, for example HTTP running over SSL.
Resource Manifestation: A resource manifestation is a rendition of a resource at a specific point in time and space. A conceptual mapping exists between a resource and a resource manifestation (or set of manifestations), in the sense that the resource has certain properties - e.g., its URI, its intended purpose, etc. - which are inherited by each manifestation, although the specific structure, form, and content of the manifestation may vary according to factors such as the environment in which it is displayed, the time it is accessed, etc. Regardless of the form the manifestation's rendering ultimately takes, the conceptual mapping to the resource is preserved. Note: For historical reasons, HTTP/1.x calls a manifestation for an "entity". Examples: real-time information accessed from a news Web site on a particular day, up-to-the-minute stock quotes, a rendering of a multimedia Web page accessed with a particular client ...

XLink

locator: Data, provided as part of a link, which identifies a resource.
resource: In the abstract sense, an addressable service or unit of information that participates in a link. Examples include files, images, documents, programs, and query results. Concretely, anything reachable by the use of a locator in some linking element. Note that this term and its definition are taken from the basic specifications governing the World Wide Web.
local resource: The content of an inline linking element. Note that the content of the linking element could be explicitly pointed to by means of a regular locator in the same linking element, in which case the resource is considered remote, not local.
remote resource: Any participating resource of a link that is pointed to with a locator.
sub-resource: A portion of a resource, pointed to as the precise destination of a link. As one example, a link might specify that an entire document be retrieved and displayed, but that some specific part(s) of it is the specific linked data, to be treated in an application-appropriate manner such as indication by highlighting, scrolling, etc.

Resource Description Framework (RDF) Model and Syntax Specification

Resources

All things being described by RDF expressions are called resources. A resource may be an entire Web page; such as the HTML document "http://www.w3.org/Overview.html" for example. A resource may be a part of a Web page; e.g. a specific HTML or XML element within the document source. A resource may also be a whole collection of pages; e.g. an entire Web site. A resource may also be an object that is not directly accessible via the Web; e.g. a printed book. Resources are always named by URIs plus optional anchor ids (see [URI]). Anything can have a URI; the extensibility of URIs allows the introduction of identifiers for any entity imaginable.

Other

Namespaces for XHTML. Jon Bosak.
