1. use case 1
2. use case 2
3. infoset ramifications
4. encoding preservation
5. dual assertions

use case 1

Is

<rdf:Description>
  <a:title rdf:parseType="Literal"><em>Soul</em> Train</a:title>
</rdf:Description>

equivalent to

<rdf:Description>
  <a:title>&lt;em&gt;Soul&lt;/em&gt; Train</a:title>
</rdf:Description>

?

Consensus seems to be that an infoset is different from the string of bytes (MJD: this should read 'string of characters', not 'string of bytes') that represents it, so no. (related discussion)
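One way to see why the answer is no: parse both fragments and compare what the parser actually produces. This is a minimal Python sketch using the standard-library xml.etree.ElementTree; the namespace URI for the a: prefix is a hypothetical placeholder, since the fragments above do not declare one.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
A = "http://example.org/a#"  # hypothetical namespace for the a: prefix
NS = {"rdf": RDF, "a": A}

def title_element(description: str) -> ET.Element:
    """Parse one rdf:Description and return its a:title element."""
    doc = f'<rdf:RDF xmlns:rdf="{RDF}" xmlns:a="{A}">{description}</rdf:RDF>'
    return ET.fromstring(doc).find("rdf:Description/a:title", NS)

def summarize(el: ET.Element):
    """Return (character content, tags of child elements)."""
    return "".join(el.itertext()), [child.tag for child in el]

literal = title_element(
    '<rdf:Description>'
    '<a:title rdf:parseType="Literal"><em>Soul</em> Train</a:title>'
    '</rdf:Description>')
plain = title_element(
    '<rdf:Description>'
    '<a:title>&lt;em&gt;Soul&lt;/em&gt; Train</a:title>'
    '</rdf:Description>')

print(summarize(literal))  # ('Soul Train', ['em'])
print(summarize(plain))    # ('<em>Soul</em> Train', [])
```

The parser yields an em element plus text for the first title, and a single run of character data that happens to contain angle brackets for the second.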

use case 2

How about

<rdf:Description>
  <a:title rdf:parseType="Literal">Soul Train</a:title>
</rdf:Description>

and

<rdf:Description>
  <a:title>Soul Train</a:title>
</rdf:Description>

? This is a more contentious question.

yes
irc.freenode.net:#rdfig 20030715-15:21Z <MJDuerst> (a sequence of characters / plain text / whatever you call it) should not become something different just because its context is different
no
irc.freenode.net:#rdfig 20030715-15:21Z <ericP> (if you told RDF it's an XML subtree, it's an XML subtree)
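The two positions amount to two different equality tests. In this minimal Python sketch a literal is modeled as a (characters, datatype) pair; the Literal class and the datatype label are hypothetical and not part of any RDF API.

```python
from dataclasses import dataclass
from typing import Optional

XML_LITERAL = "rdf:XMLLiteral"  # hypothetical label for parseType="Literal"

@dataclass(frozen=True)
class Literal:
    chars: str               # the literal's character content
    datatype: Optional[str]  # XML_LITERAL, or None for a plain literal

def equal_as_characters(x: Literal, y: Literal) -> bool:
    """MJDuerst's view: same characters, same literal, whatever the context."""
    return x.chars == y.chars

def equal_as_terms(x: Literal, y: Literal) -> bool:
    """ericP's view: an XML subtree stays an XML subtree, so the datatype counts."""
    return x == y

from_literal = Literal("Soul Train", XML_LITERAL)  # parseType="Literal"
from_plain = Literal("Soul Train", None)           # plain <a:title>
print(equal_as_characters(from_literal, from_plain))  # True
print(equal_as_terms(from_literal, from_plain))       # False
```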

infoset ramifications

The implication of parseType="Literal" is that the content is more than a series of characters: perhaps merely well-formed XML, or perhaps a set of elements from the infoset description for that XML. The difference should not matter outside of the database API.
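Keeping the difference inside the API could look like the following minimal Python sketch (xml.etree.ElementTree): it parses literal content into elements and serializes it back. Wrapping in a throwaway <wrapper> element is an assumption made so that content with bare text or several top-level elements still parses; the sketch also assumes the content uses no namespace prefixes, since ElementTree rejects undeclared ones.

```python
import xml.etree.ElementTree as ET

def to_tree(content: str) -> ET.Element:
    """Parse literal content into elements under a throwaway wrapper.
    Raises ET.ParseError if the content is not well-formed."""
    return ET.fromstring(f"<wrapper>{content}</wrapper>")

def to_string(wrapper: ET.Element) -> str:
    """Serialize the wrapper's contents back to characters.
    ET.tostring() includes each child's tail text."""
    return (wrapper.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in wrapper)

content = "<em>Soul</em> Train"
tree = to_tree(content)
print([child.tag for child in tree])  # ['em']
print(to_string(tree) == content)     # True
```

For this example the round trip gives back the same characters. In general ElementTree may normalize details such as attribute quoting or empty-element syntax (<br/> comes back as <br />), so the string and infoset forms agree only at the infoset level.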

encoding preservation

In creating RDF/XML with a parseType="Literal", one asserts that the data is a portion of an XML document. (MJD: No, as parseType='Literal' says, it merely instructs the parser to parse the element content as a literal.) If agents are not allowed to consider use case 1 to be equivalent, they must preserve the encoding (parseType) when reserializing. (MJD: They must preserve the escaping (better to say 'escaping' than 'encoding', because encoding can mean lots of other things), but the escaping is not the same as the parseType. They must in some cases (if there is actual XML markup) use parseType='Literal', because that's the only way to write such a literal. Compare this to parseType='Resource': it does not say anything about the nature of the content, just about how to parse it, and there is no need to preserve parseType='Resource' in an RDF store.) Therefore, stating that something has parseType="Literal" means that agents like query/rules engines must preserve the difference between a query for the CharData &lt;em&gt;Soul&lt;/em&gt; Train and the XMLLiteral <em>Soul</em> Train. This would imply a query mechanism that allowed one to specify the encoding of a literal:

?what a:title XMLLiteral("<em>Soul</em> Train")
?what a:title xsd:string("&lt;em&gt;Soul&lt;/em&gt; Train")

(MJD: as this example shows, the syntax within the string is enough to distinguish between XML markup and plain strings that might happen to look like XML markup. There is no need for a double distinction. Also, in the new spec in its current state, plain literals and xsd:string typed literals have nothing in common (another example of needless type proliferation for simple text strings).)
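The two kinds of lookup can be contrasted in a toy store. In this Python sketch the store layout, the subjects _:show1 and _:show2, and the helper functions are hypothetical. match_typed is the encoding-preserving lookup that the XMLLiteral(...)/xsd:string(...) syntax implies; match_markup follows MJD's alternative, comparing a single string in which plain text is kept XML-escaped (via the standard-library xml.sax.saxutils.escape).

```python
from xml.sax.saxutils import escape

XML_LITERAL = "XMLLiteral"
XSD_STRING = "xsd:string"

# Hypothetical store of (subject, predicate, (lexical form, datatype)).
# Lexical forms are stored after XML unescaping.
TRIPLES = [
    ("_:show1", "a:title", ("<em>Soul</em> Train", XML_LITERAL)),  # parseType="Literal"
    ("_:show2", "a:title", ("<em>Soul</em> Train", XSD_STRING)),   # escaped character data
]

def match_typed(pred, lexical, datatype):
    """Encoding-preserving lookup: lexical form and datatype must both agree."""
    return [s for s, p, o in TRIPLES if p == pred and o == (lexical, datatype)]

def as_markup(obj):
    """MJD's alternative: one string where real markup stays markup and
    text that merely looks like markup stays escaped."""
    lexical, datatype = obj
    return lexical if datatype == XML_LITERAL else escape(lexical)

def match_markup(pred, markup):
    """Lookup by that single string, with no separate datatype."""
    return [s for s, p, o in TRIPLES if p == pred and as_markup(o) == markup]

print(match_typed("a:title", "<em>Soul</em> Train", XML_LITERAL))  # ['_:show1']
print(match_typed("a:title", "<em>Soul</em> Train", XSD_STRING))   # ['_:show2']
print(match_markup("a:title", "<em>Soul</em> Train"))              # ['_:show1']
print(match_markup("a:title", "&lt;em&gt;Soul&lt;/em&gt; Train"))  # ['_:show2']
```

Both lookups separate the two titles; the second one needs no datatype, because the escaping already carries the distinction.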

dual assertions

If the solution is to preserve the encoding on literals, the question remains whether use case 2 should be considered equivalent. If it should, should an assertion made with parseType="Literal" also imply the analogous simple literal? Should the assertion

<rdf:Description>
  <a:title>Soul Train</a:title>
</rdf:Description>

yield results in the query

?what a:title XMLLiteral("Soul Train")

as well as

?what a:title xsd:string("Soul Train")

? Should the database maintain a preferred serialization? Or should it default to CharData and make sure not to also serialize it as xsd:string?
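One possible answer, sketched in Python: a hypothetical dual-assertion policy that lists every term a literal should match, plus a preferred serialization that uses parseType="Literal" only when there is element markup to preserve and otherwise defaults to CharData. Assumptions: plain literals are matched as xsd:string, as in the queries above; the only escapes are &amp;, &lt; and &gt; (what xml.sax.saxutils.escape and unescape handle by default); and "contains <" is a crude stand-in for "contains element markup".

```python
from xml.sax.saxutils import escape, unescape

XML_LITERAL = "XMLLiteral"
XSD_STRING = "xsd:string"
PLAIN = "plain"  # written without parseType or datatype; matched as xsd:string

def implied_terms(lexical, kind):
    """Every (lexical form, datatype) pair a query should match under a
    hypothetical dual-assertion policy."""
    if kind == XML_LITERAL:
        terms = {(lexical, XML_LITERAL)}
        if "<" not in lexical:  # crude test: no element markup
            terms.add((unescape(lexical), XSD_STRING))
        return terms
    # Simple text can always be written as an XML literal by escaping it.
    return {(lexical, XSD_STRING), (escape(lexical), XML_LITERAL)}

def serialize_title(lexical, kind):
    """Preferred serialization: parseType="Literal" only when markup must be
    kept; everything else defaults to escaped character data."""
    if kind == XML_LITERAL and "<" in lexical:
        return f'<a:title rdf:parseType="Literal">{lexical}</a:title>'
    text = unescape(lexical) if kind == XML_LITERAL else lexical
    return f"<a:title>{escape(text)}</a:title>"

print(("Soul Train", XML_LITERAL) in implied_terms("Soul Train", PLAIN))  # True
print(serialize_title("Soul Train", XML_LITERAL))  # <a:title>Soul Train</a:title>
```

Under this policy the plain assertion answers both queries, and the XMLLiteral "Soul Train" is never written back with parseType="Literal", so the store keeps one preferred form.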