RDF Literals and language tagging

This document is prereading for a discussion between i18n and RDF WG personnel. It is expected that the content will be further developed during that discussion. It is not appropriate to refer to this other than as a basis for discussion.

The text below attempts to reflect a decision-making tree for the handling of RDF Literals and language tagging, as best we currently understand it. This provides a framework for our discussion, and also highlights where i18n and RDF have issues and where they do not. The document lists proposed alternative approaches (numbered items), then for each lists perceived pros and cons (+ or - bullets). There are also responses to the latter.

There is only one technical topic for discussion, and that is...

Labelling of language info for plain and XML literals in RDF

  1. xml literals do not have some way of having lang info [agreed INVALID]
  2. xml literals do have some way of having lang info
    1. use external xml:lang in RDF/XML and a tuple for graphs
      • - RDF: discontinuity in handling of info (only tuple-like datatype)
      • - RDF: likelihood that apps searching for English text in "<x xml:lang='fr'>chat</x>" would not recognise markup
        • i18n: but that could happen anyway, whether or not this is a tuple, therefore @propose not valid
      • + i18n: xml literals and plain literals are handled in a very similar way
    2. use external xml:lang on literal markup in RDF/XML, but add wrapper markup to literal in graph format to carry lang info
      • + RDF: provides uniform handling of lang tag
      • - RDF: appearance of extra wrapper info may surprise user
      • - RDF: constrains all XML literals to have wrapper
        • i18n: any arbitrary literal can be wrapped, so not a constraint @propose not valid
      • - RDF: harder for apps to implement if they hide wrapper
    3. ignore external xml:lang on markup in RDF/XML, but integrate lang info into the markup in both XML and graph format, using span if necessary, and require it for each literal
      • + RDF: cleaner design (esp from typing point of view)
      • + RDF: span element available
        • i18n: only available in HTML; some markup languages may not have one at all, others may use a different one -> how are tools going to know it's equivalent @propose not valid
      • - RDF: redundant specification of lang tags in RDF/XML is time consuming for manual creation
        • RDF: not an issue for tools, by hand rare @propose not valid
        • i18n: slight disagreement @propose possibly valid
      • - RDF: confusion when xml:lang info not inherited in RDF/XML
        • RDF: rdf writers don't use global lang tags, so issue doesn't arise @propose not valid
        • i18n: not sure, but seems a logical thing to do
      • - i18n: VERY IMPORTANT!: alters the literal in a non-recoverable way if lang info integrated into existing tags @propose not valid
      • - i18n: VERY IMPORTANT!: no common span-like tag available where needed @propose not valid
      • - i18n: much harder to compare plain and xml literals if xml literals contain additional markup for language information - especially relevant if creators have started with plain literals and then had to go to xml literals
      • - i18n: creates incompatibilities with existing data based on M&S
      • - i18n: confusion for users, including programmers, because of different behaviour in XML format for XML literals vs plain literals, i.e. xml:lang info ignored or not
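
To make the three alternatives under item 2 easier to compare during the discussion, the following is a minimal Python sketch of how each might carry lang info for an XML literal. It is an illustration only, not a proposed design: the class names, the function names, and the "rdf-wrapper" and "span" element names are all assumptions introduced here, and the sketch ignores namespaces and canonicalization.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import xml.etree.ElementTree as ET

# Clark-notation name that ElementTree uses for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass(frozen=True)
class PlainLiteral:
    text: str
    lang: Optional[str] = None


@dataclass(frozen=True)
class TupleXMLLiteral:
    """Alternative 2.1: the graph holds a (markup, lang) tuple, like a plain literal."""
    markup: str
    lang: Optional[str] = None


def wrap(markup: str, lang: str, wrapper: str = "rdf-wrapper") -> str:
    """Alternative 2.2: add wrapper markup in the graph format to carry lang info.

    The wrapper element name is hypothetical.
    """
    return f'<{wrapper} xml:lang="{lang}">{markup}</{wrapper}>'


def unwrap(wrapped: str) -> Tuple[str, Optional[str]]:
    """Recover the original markup and lang from a wrapped literal."""
    root = ET.fromstring(wrapped)
    # Leading text plus each child serialized together with its trailing text.
    inner = (root.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in root
    )
    return inner, root.get(XML_LANG)


def integrate(markup: str, lang: str, span: str = "span") -> str:
    """Alternative 2.3: integrate lang info into the literal's own markup.

    If the literal is a single element, xml:lang is set on it, which changes
    the literal itself. Otherwise a span-like element is needed; "span" is an
    assumption, since not every markup language has one.
    """
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # Mixed content or several top-level nodes: no single tag to annotate.
        return f'<{span} xml:lang="{lang}">{markup}</{span}>'
    root.set(XML_LANG, lang)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    literal = "<em>chat</em> noir"
    print(TupleXMLLiteral(literal, "fr"))
    print(wrap(literal, "fr"))
    print(unwrap(wrap(literal, "fr")))
    print(integrate(literal, "fr"))
    print(integrate("<em>chat</em>", "fr"))
```

In this sketch the tuple form keeps the literal untouched and parallels the plain literal, the wrapper form can be undone by an application that knows the wrapper, and the integrated form leaves no record of whether the xml:lang attribute was part of the original literal.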