From W3C Wiki

What do we do about non-ASCII characters in names in the SemanticWeb?

Options seem to be:

  • Option A: SemanticWeb names are URIs; use of non-ASCII characters is allowed as a convenience in some data formats (IRIs), but IRIs are understood to be notation for URIs. RDF graph matching (LinkMe: defn simple entailment in the WD-rdf-mt) is based on URIs. This is what cwm supports as of Feb 2002.
  • Option B: SemanticWeb names are IRIs; when the guy writes André, he means André (there's an accent on the "e"), and not Andr%C3%A9, and RDF graph matching shouldn't say that he did. (@@ details about who gets to %xx-lify, and when, need to be elaborated.) RDFCore resolved Issue rdf-charmod-uris this way.
  • Option C: The URI and IRI spaces are isomorphic; for every IRI x, xxlify(x) is its dual in URI space. Any URI which is the value of xxlify(x) for some x is reserved for the purpose of representing the IRI x. RDF graph matching can either normalize to URIs or normalize to IRIs; the results will be the same.
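A rough sketch of how the three options differ for graph matching. The name xxlify comes from Option C above; everything else is an assumption made for illustration, not taken from cwm or any RDF toolkit: the helper names same_name_option_a and same_name_option_b, the example.org URI, and the choice to escape the UTF-8 bytes of each non-ASCII character (which is what the IRI drafts describe).

```python
def xxlify(iri: str) -> str:
    """Map an IRI to a URI by %xx-escaping the UTF-8 bytes of each
    non-ASCII character. ASCII characters, including existing %xx
    escapes, are passed through untouched (an assumption of this sketch)."""
    out = []
    for ch in iri:
        if ord(ch) < 128:
            out.append(ch)
        else:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
    return "".join(out)


def same_name_option_a(x: str, y: str) -> bool:
    """Option A: names are URIs, so normalize both sides to URIs first."""
    return xxlify(x) == xxlify(y)


def same_name_option_b(x: str, y: str) -> bool:
    """Option B: names are IRIs, compared character for character."""
    return x == y


iri = "http://example.org/André"   # hypothetical name
uri = xxlify(iri)                  # "http://example.org/Andr%C3%A9"

print(uri)
print(same_name_option_a(iri, uri))  # True: both denote the same URI
print(same_name_option_b(iri, uri))  # False: different character strings
```

Under Option C the escaped form is reserved as the dual of the IRI, so a matcher that normalizes to URIs (as same_name_option_a does) and one that normalizes to IRIs would be expected to agree; the sketch does not implement the reverse mapping.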

XML namespace software deployment (and layered stuff like XPath) leans toward Option B (FixMe: IntraPageLink).

History, newest first

  • Namespaces in XML 1.1 W3C Candidate Recommendation 18 December 2002. section 2.3 Comparing IRI References shows it very clearly takes position B, though in section 9, "Users defining namespaces are advised to restrict namespace names to URIs until software supporting IRIs is in common use."
  • TAG issue IRIEverywhere-27 (TagIssue:IRIEverywhere-27) raised Oct 2002
  • Issue rdf-charmod-uris: Does the treatment of uri-references conform with charmod? resolved Apr 2002 that RDF graph matching is based on IRIs. (LinkMe: find the tests)
  • SWAP tools grow some Unicode support (e.g. revision 1.100 date: 2002/02/14 23:56:09; author: connolly) LinkMe: relevant tests
  • RFC:2396 reaches IETF Draft Standard; it allows only US-ASCII characters in URIs.
  • definition of external ID in XML 1.0, 1998, carries the tradition forward
  • B.2.1 Non-ASCII characters in URI attribute values HTML 4.0 spec, W3C Recommendation 18-Dec-1997; stipulates that current practice goes beyond US-ASCII characters

see also: IRI spec drafts, Nov. 2001 thru March 2003.

hmm... does the charlie code implement anything relevant?

This is clearly a ProxyTopic; not sure whether that's good or not.

A proxy to which wiki page? guest
 to TagIssue:IRIEverywhere-27