completion of action: 2001-07-27#2 (long)

[a review of what the M&S says about literals]

see also: <http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Jul/0434.html>

RDF 1.0 syntax: the XML BNF as per section 6.
RDF 1.0 model: the formal model as per section 5.


Quick overview of what the M&S says about literals:

Literals MUST be well formed XML with respect to the RDF 1.0 syntax. Some
constraints SHALL apply to literals containing XML markup with respect to the
RDF 1.0 syntax. Such constraints SHOULD NOT apply to other serializations.
Literals MAY be other than well formed XML with respect to the RDF 1.0 model.
The RDF 1.0 model is NOT dependent on XML in its conception of a literal.
Literals are NOT defined for the RDF model.

(at least that's what I've divined it says...)


Commentary:

My current take is the RDF 1.0 model is not restricted to using well formed
XML for the inscription of literals. I think the wg should decide if this is
the case.

I've been under the happy illusion of late that literals in RDF are structured
XML. This seems to only be the case when RDF is serialized according to RDF 1.0
XML syntax.

Section 2.1 says "In RDF terms, a literal may have content that is XML markup
but is not further evaluated by the RDF processor." Is the intent to say that a
literal may optionally be XML markup (certainly there's more than one parse tree
for that sentence)? If you combine such an interpretation with the reference to
"string literals" in 2.1.1, the pictorial examples (which do not include XML
tags in the literals employed), and the glossary definition of a literal, there's
enough to allow the interpretation that literals may be other than well formed
XML with respect to the RDF model. However, it seems possible to interpret the
document either way.

[Note however that the discussion of literals with respect to the model does not
offer or recommend a character encoding for literals (that is, literals do not
have a canonical form in the M&S). Looking to the MT for help, I see that it
defers to n-triples encoding for the literals. I'm guessing that you can drop
any encoding scheme+mapping function to the set of literals LV into the MT and
it would not affect what are true statements in RDF (help?).]

In sections 2.2.1 and 6 it is clear that literals MUST be well formed XML with
respect to the BNF: but is this the case for any RDF syntax, or indeed for the
model? The M&S doesn't actually say. Contrariwise, it's clear that a URI used in
RDF is the same across syntaxes, that is, it's clear that the RDF 1.0 Model
depends on the URI syntax. All in all the M&S could be more explicit on literal
encoding: I'm honestly not sure what the authors' intent was (help?).

There are some desirable outcomes of not insisting that literals are XML:

-any formal model of RDF will not come to be dependent on XML for the
 serialization of literals; this is consistent with the notion of having a
 syntax independent model.

-other serializations can avoid being dependent on XML for literals. For
 example n-triples 1.8 is not constrained to use well formed XML for literals.

There is at least one undesirable outcome:

-transcoding literals between syntaxes, such as between n-triples and RDF
 1.0 syntax, might have lossy corner cases, possibly involving namespaces.
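
To make the trade-off above concrete, here is a rough Python sketch of one
model-level literal written out for two syntaxes. The function names are mine,
and the n-triples style escapes (backslash, double quote, newline) are an
assumed subset chosen for illustration, not the n-triples grammar. The point is
only that XML quoting lives in the serializer and not in the literal itself.

```python
from xml.sax.saxutils import escape, unescape


def to_xml_content(literal: str) -> str:
    """Quote a model-level literal for use as XML element content."""
    return escape(literal)  # rewrites &, < and > as entity references


def from_xml_content(content: str) -> str:
    """Recover the model-level literal from escaped XML element content."""
    return unescape(content)


def to_ntriples_string(literal: str) -> str:
    """Quote a literal for a line-based syntax (assumed escape rules)."""
    escaped = (
        literal.replace("\\", "\\\\")  # backslash first, so later escapes survive
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return '"' + escaped + '"'


literal = 'if a < b & b < c then "ok"'
xml_form = to_xml_content(literal)
nt_form = to_ntriples_string(literal)

# The same literal comes back out of the XML form unchanged.
assert from_xml_content(xml_form) == literal
print(xml_form)
print(nt_form)
```

A plain string like this one round-trips cleanly. The corner cases I worry
about are literals that carry real markup, where namespace prefixes are
declared by the enclosing document and not by the literal.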



Some proposals follow as a result of all this.


Proposal 1:
add a serialization independent definition of a Literal early
on in the document.

Forces: literals are not clearly defined for the RDF 1.0 model; literals
and their representation in XML are not clearly distinguished in the M&S.

So: edit section 2.1 along the lines of:

"
2.1. Basic RDF Model

[...]

The basic data model consists of four object types:

Resources
[...]

Properties
[...]

Literals

The most primitive value type represented in RDF, typically a string of
characters [XXX: 7-bit US-ASCII? as per n-triple/model theory?]. Literals are
distinguished from Resources in that the RDF model does not permit literals to
be the subject of a statement. For the XML serialization syntax described in
this document, there are syntactic restrictions on how XML markup in literals
can be expressed; see Section 2.2.1.

Statements

A specific resource together with a named property plus the value of that
property for that resource is an RDF statement. These three individual parts of
a statement are called, respectively, the subject, the predicate, and the
object. The object of a statement (the property's value) can be another
resource or it can be a literal."
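
As a sanity check on the wording above, the proposed definitions can be
sketched as a small Python data model. The class names and the example.org URIs
are made up for illustration, and treating a property as just another Resource
is my assumption (following the formal model's "subset of Resources called
Properties"). Note that nothing here mentions XML.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    uri: str


@dataclass(frozen=True)
class Literal:
    value: str  # a string of characters; no XML structure is assumed


@dataclass(frozen=True)
class Statement:
    subject: Resource
    predicate: Resource  # assumption: properties are modelled as resources
    obj: Union[Resource, Literal]

    def __post_init__(self) -> None:
        # The model does not permit a literal to be the subject of a statement.
        if not isinstance(self.subject, Resource):
            raise TypeError("the subject of a statement must be a Resource")
        if not isinstance(self.predicate, Resource):
            raise TypeError("the predicate of a statement must be a Resource")
        if not isinstance(self.obj, (Resource, Literal)):
            raise TypeError("the object must be a Resource or a Literal")


page = Resource("http://example.org/page")
creator = Resource("http://example.org/schema/creator")

ok = Statement(page, creator, Literal("A. N. Author"))

try:
    Statement(Literal("A. N. Author"), creator, page)
except TypeError as err:
    print("rejected:", err)
```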


Proposal 2:
state clearly that the constraints placed on literals for the
serialization syntax SHOULD NOT apply to other or future syntaxes.

Forces: prevent dependencies on XML by other syntaxes as well as the RDF
formal model or future MT.

So: edit section 2.2.1 along the lines of:

"Section 2.2.1
[...]

Within a propertyElt, the resource attribute specifies that some other resource
is the value of this property; that is, the object of the statement is another
resource identified by URI rather than a literal. The resource identifier of the
object is obtained by resolving the resource attribute URI-reference in the same
manner as given above for the about attribute. Strings must be well-formed XML;
the usual XML content quoting and escaping mechanisms may be used if the string
contains character sequences (e.g. "<" and "&") that violate the well-formedness
rules or that otherwise might look like markup. See Section 6. for additional
syntax to specify a property value with well-formed XML content containing
markup such that the markup is not interpreted as RDF. Note that such syntactic
constraints on RDF literals are a result of the use of XML and might not apply
to other or future RDF serializations.
[...]"
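
The quoting requirement in that paragraph is easy to see with any XML parser.
A minimal Python sketch follows, assuming a made-up property element `ex:note`
in an example.org namespace (neither is from the M&S). The same string breaks
well-formedness when written raw and survives intact when escaped.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

NS_DECL = "xmlns:ex='http://example.org/schema/'"  # hypothetical namespace


def property_element(value: str, quote: bool) -> str:
    """Wrap a string value in a (hypothetical) property element."""
    body = escape(value) if quote else value
    return "<ex:note " + NS_DECL + ">" + body + "</ex:note>"


def is_well_formed(xml_text: str) -> bool:
    """Report whether an XML parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


value = "x < y & y < z"

# Written raw, the "<" and "&" violate the well-formedness rules.
assert not is_well_formed(property_element(value, quote=False))

# Written with the usual XML escaping, the element parses and the
# parser hands back the original string.
quoted = property_element(value, quote=True)
assert is_well_formed(quoted)
assert ET.fromstring(quoted).text == value
```

The constraint is entirely a property of the XML syntax; the string the parser
returns is the same one we started with.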


Proposal 3:
parseType attribute values beginning with 'rdf:' are reserved for
use by the RDFCore/W3C.

Proposal 3.1:
move the 'Literal' and 'Resource' parseTypes to 'rdf:Literal' and
'rdf:Resource' respectively.

Forces: parseType is a useful and used extension mechanism for RDF application
developers and modelers;
<http://lists.w3.org/Archives/Public/www-rdf-comments/2001AprJun/0127.html>;

"The RDF Model and Syntax Working Group acknowledges that the
parseType='Literal' mechanism is a minimum-level solution to the requirement to
express an RDF statement with a value that has XML markup. Additional
complexities of XML such as canonicalization of whitespace are not yet well
defined. Future work of the W3C is expected to resolve such issues in a uniform
manner for all applications based on XML. Future versions of RDF will inherit
this work and may extend it as we gain insight from further application
experience.";

"The parseType attribute changes the interpretation of the element content. The
parseType attribute should have one of the values 'Literal' or 'Resource'. The
value is case-sensitive. The value 'Literal' specifies that the element content
is to be treated as an RDF/XML literal; that is, the content must not be
interpreted by an RDF processor. The value 'Resource' specifies that the element
content must be treated as if it were the content of a Description element.
Other values of parseType are reserved for future specification by RDF. With
RDF 1.0 other values must be treated as identical to 'Literal'."

The mandate of reserving future parseTypes and the mandate that parseTypes in
the wild 'should' be either Literal or Resource are not wholly consistent with
each other.

So: amend section 6 BNF and prose to fully reserve parseTypes starting with
'rdf:'. Informatively indicate that alternate parseTypes are a useful extension
mechanism. Deprecate 'Literal' and 'Resource' in favor of 'rdf:Literal' and
'rdf:Resource'. Mandate that unrecognized parseTypes are treated as rdf:Literal.
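
For what it's worth, here is how I imagine a parser applying these rules, as a
Python sketch. The function name and the note strings are hypothetical. The
proposal does not say what to do with an unrecognized value inside the reserved
'rdf:' space, so falling back to rdf:Literal there too is my assumption.

```python
from typing import List, Tuple

RESERVED_PREFIX = "rdf:"
KNOWN = {"rdf:Literal", "rdf:Resource"}
DEPRECATED = {"Literal": "rdf:Literal", "Resource": "rdf:Resource"}


def interpret_parse_type(value: str) -> Tuple[str, List[str]]:
    """Return the effective parseType plus notes on how it was reached.

    Matching is case-sensitive, as in the M&S text quoted above.
    """
    notes: List[str] = []
    if value in DEPRECATED:
        notes.append("deprecated spelling")
        value = DEPRECATED[value]
    if value in KNOWN:
        return value, notes
    if value.startswith(RESERVED_PREFIX):
        # Assumption: reserved but unknown values also fall back to a literal.
        notes.append("reserved but unrecognized")
    else:
        notes.append("unrecognized extension value")
    return "rdf:Literal", notes


for candidate in ["rdf:Resource", "Literal", "literal", "ex:Table", "rdf:Future"]:
    print(candidate, "->", interpret_parse_type(candidate))
```

An application that understands, say, 'ex:Table' can still act on it; a generic
processor simply keeps the content as a literal.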


Bill

--
InterX
bdehora@interx.com
dehora@acm.org
+44(0)20-8817-4039
www.interx.com

Received on Friday, 24 August 2001 22:25:10 UTC