IRIs/RDFConceptsProposal

From RDF Working Group Wiki

The issue before the working group is:

Clarify the usage of IRI references for RDF resources, e.g., per SPARQL Query §1.2.4.

Proposed text changes

Replace “URI reference” and “RDF URI reference” with “IRI” throughout
Replace RDF Concepts Section 6.4, RDF URI References with the following new text:

6.4 IRIs

An IRI (Internationalized Resource Identifier) within an RDF graph is a Unicode string [UNICODE] that conforms to the syntax defined in RFC 3987 [IRI]. IRIs are a generalization of URIs [URI]. Every absolute URI and URL is an IRI.

IRIs in the RDF abstract syntax MUST be absolute, and MAY contain a fragment identifier.

Two IRIs are equal if and only if they are equivalent under Simple String Comparison according to section 5.1 of [IRI]. Further normalization MUST NOT be performed when comparing IRIs for equality.
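
As an illustration of the equality rule above, here is a minimal Python sketch. The function name `iri_equal` is made up for this example; the only assumption is that both IRIs are already held as Unicode strings, so that Python's built-in string equality compares them code point by code point.

```python
def iri_equal(a: str, b: str) -> bool:
    """Simple String Comparison: code point by code point, no normalization."""
    return a == b


# Identical strings are the same IRI.
print(iri_equal("http://example.com/", "http://example.com/"))       # True

# These pairs might dereference to the same resource, but they are
# different IRIs because no further normalization is applied.
print(iri_equal("http://example.com/", "http://EXAMPLE.com/"))       # False
print(iri_equal("http://example.com/", "http://example.com:80/"))    # False
print(iri_equal("http://example.com/%3F", "http://example.com/%3f")) # False
```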

NOTE: When IRIs are used in operations that are only defined for URIs, they must first be converted according to the mapping defined in section 3.1 of [IRI]. A notable example is retrieval over the HTTP protocol. The mapping involves UTF-8 encoding of non-ASCII characters, %-encoding of octets not allowed in URIs, and Punycode-encoding of domain names.
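
The mapping described in the note above can be sketched in Python roughly as follows. This is an approximation under stated assumptions, not a complete implementation of section 3.1 of [IRI]: the helper names `pct_encode_non_ascii` and `iri_to_uri` are hypothetical, the input is assumed to be a valid IRI without an IPv6 literal in the authority, and the host is converted with Python's built-in `idna` codec.

```python
from urllib.parse import urlsplit, urlunsplit


def pct_encode_non_ascii(text: str) -> str:
    """UTF-8 encode each non-ASCII character and %-encode the octets."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)  # ASCII characters pass through unchanged
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def iri_to_uri(iri: str) -> str:
    """Rough IRI-to-URI conversion (sketch; assumes a valid, absolute IRI)."""
    parts = urlsplit(iri)  # note: urlsplit lowercases the scheme
    userinfo, at, hostport = parts.netloc.rpartition("@")
    host, colon, port = hostport.partition(":")
    if not host.isascii():
        # Punycode-encode an internationalized domain name.
        host = host.encode("idna").decode("ascii")
    netloc = pct_encode_non_ascii(userinfo) + at + host + colon + port
    return urlunsplit((
        parts.scheme,
        netloc,
        pct_encode_non_ascii(parts.path),
        pct_encode_non_ascii(parts.query),
        pct_encode_non_ascii(parts.fragment),
    ))


print(iri_to_uri("http://example.com/caf\u00e9"))   # http://example.com/caf%C3%A9
print(iri_to_uri("http://b\u00fccher.example/"))    # http://xn--bcher-kva.example/
```

The conversion is only meant for operations such as HTTP retrieval; the IRI stored in the graph stays as it was.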

NOTE: Some concrete syntaxes permit relative IRIs as a shorthand for absolute IRIs, and define how to resolve the relative IRIs against a base IRI.
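
For illustration, resolution of relative references against a base can be sketched with Python's standard `urllib.parse.urljoin`. The base IRI and the relative references below are invented for this example, and a concrete syntax may define additional rules for establishing the base.

```python
from urllib.parse import urljoin

# Hypothetical base IRI, e.g. the location of the document being parsed.
base = "http://example.com/data/people.ttl"

print(urljoin(base, "alice"))          # http://example.com/data/alice
print(urljoin(base, "#bob"))           # http://example.com/data/people.ttl#bob
print(urljoin(base, "../vocab/name"))  # http://example.com/vocab/name
print(urljoin(base, "/about"))         # http://example.com/about
```

Only the resolved, absolute IRIs become part of the RDF graph.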

NOTE: Previous versions of RDF used the term “RDF URI reference” instead of “IRI” and allowed additional characters: “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, ‘“’ (double quote), and “ ” (space). In IRIs, these characters must be percent-encoded as described in section 2.1 of [URI].
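
A minimal Python sketch of how legacy data might be migrated: it percent-encodes only the ten characters listed in the note above and leaves everything else, including existing %-escapes, untouched. The names `LEGACY_CHARS` and `escape_legacy_chars` are hypothetical.

```python
# Characters that were allowed in RDF URI references but are not allowed in IRIs.
LEGACY_CHARS = '<>{}|\\^`" '


def escape_legacy_chars(uriref: str) -> str:
    """Percent-encode the legacy characters; leave all other characters as-is."""
    return "".join(
        f"%{ord(ch):02X}" if ch in LEGACY_CHARS else ch
        for ch in uriref
    )


print(escape_legacy_chars("http://example.com/a b"))     # http://example.com/a%20b
print(escape_legacy_chars("http://example.com/{id}|x"))  # http://example.com/%7Bid%7D%7Cx
```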

NOTE: Interoperability problems can be avoided by minting only IRIs that are normalized according to Section 5 of [IRI]. Non-normalized forms that should be avoided include:

Uppercase characters in scheme names and domain names
Percent-encoding of characters where it is not required by IRI syntax
Explicitly stated HTTP default port (http://example.com:80/); http://example.com/ is preferable
Completely empty path in HTTP IRIs (http://example.com); http://example.com/ is preferable
/./ or /../ in the path component of an IRI
Lowercase hexadecimal letters within percent-encoding triplets (“%3F” is preferable over “%3f”)
Punycode-encoding of Internationalized Domain Names in IRIs
IRIs that are not in Unicode Normalization Form C [NFC]
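
The list above could be turned into a simple lint check, sketched here in Python. The function name `normalization_warnings` and the warning strings are invented for this example, and the checks are approximations: the test for unnecessary percent-encoding only covers ASCII unreserved characters, and the authority is assumed not to contain an IPv6 literal.

```python
import re
import unicodedata
from typing import List
from urllib.parse import urlsplit

# ASCII characters that never need percent-encoding.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def normalization_warnings(iri: str) -> List[str]:
    """Return a warning for each non-normalized form detected in the IRI."""
    found = []
    parts = urlsplit(iri)
    raw_scheme = iri.split(":", 1)[0]  # urlsplit lowercases the scheme
    # netloc keeps its original case; strip userinfo and port to get the host.
    host = parts.netloc.rpartition("@")[2].partition(":")[0]
    triplets = re.findall(r"%([0-9A-Fa-f]{2})", iri)

    if raw_scheme != raw_scheme.lower():
        found.append("uppercase scheme")
    if host != host.lower():
        found.append("uppercase domain name")
    if any(chr(int(t, 16)) in UNRESERVED for t in triplets):
        found.append("unnecessary percent-encoding")
    if parts.scheme == "http" and parts.netloc.endswith(":80"):
        found.append("explicit default port")
    if parts.scheme == "http" and parts.path == "":
        found.append("empty path")
    if re.search(r"/\.\.?(/|$)", parts.path):
        found.append("dot segment in path")
    if any(t != t.upper() for t in triplets):
        found.append("lowercase hex in percent-encoding")
    if any(label.lower().startswith("xn--") for label in host.split(".")):
        found.append("Punycode domain name")
    if unicodedata.normalize("NFC", iri) != iri:
        found.append("not in NFC")
    return found


print(normalization_warnings("http://example.com/"))    # []
print(normalization_warnings("HTTP://Example.COM:80"))
print(normalization_warnings("http://example.com/a/../b%7e"))
```

Such a check only flags IRIs worth reviewing when minting; it does not change how two IRIs are compared for equality.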

Notable consequences

1. The characters “<”, “>”, “{”, “}”, “|”, “\”, “^”, “`”, ‘“’ (double quote), and “ ” (space) were allowed in URIrefs, and are not allowed in IRIs, so any data containing these characters *unescaped* is now invalid. Data containing these characters in %-encoded form is fine.

2. There was a note stating that URIrefs are compatible with the anyURI datatype. This is no longer the case as anyURI allows the characters above, but IRIs don't, so the note is simply removed.

3. A note said: “The use of %-escaped characters in RDF URI references is strongly discouraged.” This is a problem. There are many completely reasonable URIs that cannot be expressed as IRIs without %-encoding, for example this one: http://google.com/search?q=rdf%20semantics … I removed the note, and subsumed it into another note that discourages the use of %-encoding *iff the unencoded char is allowed in an IRI*.

4. SPARQL 1.1 Query should update Section 4.1.1. Perhaps just drop the second paragraph.

Retrieved from "https://www.w3.org/2011/rdf-wg/wiki/index.php?title=IRIs/RDFConceptsProposal&oldid=1128"
