Internationalized Resource Identifiers (IRIs)
draft-duerst-iri-03.txt
Editors: Martin Dürst and Michel Suignard
Discussion list: www-international@w3.org
Plan:
- Resolve remaining issues
- Submit to IESG as individual draft for Proposed Standard
Changes since Yokohama
Changes from -02 to -03
- Added an issues list.
- Added a paragraph to Section 3.2 prohibiting URI-to-IRI conversions that
are not based on UTF-8.
- Introduced iadditional to combine unwise, delims, and space.
- Tweaked description and added examples for URI-to-IRI conversion.
- Improved syntax rules for hostname part.
- Improved description of equivalences in Section 2.3.
- Improved description of URI-to-IRI-mapping in Section 3.2.
- Changed the preferred case for hex escapes from lowercase to UPPERCASE.
- Fixed various details.
Changes from -01 to -02
- New approach for Bidi section, many examples.
- Created idelims, removed '%' and '#'. Changed userinfo to iuserinfo in
iserver.
- Changed to ABNF defined by [RFC2234].
- Included bug fixes from [RFC2396bis].
- Additions to Acknowledgements.
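The URI-to-IRI rule from Section 3.2 (convert %HH escapes back to characters only when the octets are UTF-8) can be sketched as follows. This is a minimal Python illustration, not the draft's normative algorithm: `uri_to_iri` is a hypothetical name, escaped ASCII octets such as %20 or %2F are conservatively left escaped (unescaping them could change the meaning), and the sketch does not check whether a decoded character is otherwise allowed in an IRI.

```python
import re

# A maximal run of consecutive %HH escapes, e.g. "%E6%98%9F%20".
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _convert_run(run: str) -> str:
    """Decode only the octet sequences in `run` that form valid non-ASCII
    UTF-8; re-emit every other octet as an uppercase %HH escape."""
    data = bytes.fromhex(run.replace("%", ""))
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # Sequence length implied by a UTF-8 lead byte; 0 means the octet is
        # ASCII or cannot start a non-ASCII character.
        if 0xC2 <= lead <= 0xDF:
            n = 2
        elif 0xE0 <= lead <= 0xEF:
            n = 3
        elif 0xF0 <= lead <= 0xF4:
            n = 4
        else:
            n = 0
        if n:
            try:
                # Strict decoding rejects truncated, overlong and surrogate sequences.
                out.append(data[i:i + n].decode("utf-8"))
                i += n
                continue
            except UnicodeDecodeError:
                pass
        out.append(f"%{lead:02X}")  # not convertible: keep it escaped
        i += 1
    return "".join(out)


def uri_to_iri(uri: str) -> str:
    """Hypothetical helper: turn %HH-escaped UTF-8 in a URI back into characters."""
    return _ESCAPE_RUN.sub(lambda m: _convert_run(m.group(0)), uri)
```

For example, `uri_to_iri("http://example.org/%E6%98%9F")` gives `http://example.org/星`, while a Latin-1 escape such as `http://example.org/caf%E9` is returned unchanged because `%E9` alone is not valid UTF-8.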
Usage of IRIs in W3C Specs
[Defined by conversion based on UTF-8 and %HH]
- XML 1.0 (REC) for SYSTEM identifiers
- XML Namespaces: 1.0 in practice, 1.1 formally
- XLink
- XML Schema: anyURI datatype
- HTML 4.0 (as an error provision)
- SVG, XForms,...
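The shared definition used by these specifications, conversion based on UTF-8 and %HH, can be sketched in a few lines. This is a minimal Python illustration that assumes every non-ASCII character is escaped and all ASCII is copied unchanged; `iri_to_uri` is a hypothetical name, and host names (discussed under IDNURI) are not treated specially here.

```python
def iri_to_uri(iri: str) -> str:
    """Hypothetical helper: map an IRI to a URI by encoding each non-ASCII
    character as UTF-8 and escaping every resulting octet as %HH."""
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII characters are copied unchanged
        else:
            # One %HH escape per UTF-8 octet, using the preferred uppercase hex.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
    return "".join(out)
```

For example, `iri_to_uri("http://example.org/résumé")` yields `http://example.org/r%C3%A9sum%C3%A9`.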
Open Issues
- Should characters in iadditional be allowed? Under what conditions?
- Align the description in Section 2.3 with the results of W3C TAG
discussions on issue URIEquivalence [considered solved].
- Adapt depending on how [IDNURI] is integrated into [RFC2396bis].
- IAB considerations (draft-iab-char-rep-00.txt)
Issue: iadditional
- URIs don't allow the following ASCII characters: <, >, ", SPACE, {, },
|, \, ^, `
- IRIs currently allow (but discourage) them
- 50/50 split of opinion
For:
- In practice, already used (e.g. in fragids)
- Convenient, e.g. for XPointers in XML attributes
- Judgement is still needed (also for many other characters)
- Carrying URIs already means you need to consider context (e.g.
& in XML)
Against:
- IRIs are about internationalization, this is not an
internationalization issue
- These characters may be used as separators, which would not work anymore:
  - Formal protocols and formats
  - Heuristics (such as extracting IRIs from plain text)
Variant proposals:
- Stronger discouragement (e.g. for namespace-like identifiers)
- Always allow %-escaping of these characters
- Allow some of these characters, in some places
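The "always allow %-escaping" variant can be illustrated with a small sketch that finds and escapes the iadditional characters listed above. It is a Python illustration under the assumption that the set is exactly those ten ASCII characters; `find_iadditional` and `escape_iadditional` are hypothetical names, not rules from the draft.

```python
# The ASCII characters URIs don't allow, as listed on the iadditional slide
# (assumption: exactly these ten, including SPACE).
IADDITIONAL = set('<>"{}|\\^` ')


def find_iadditional(iri: str) -> list[str]:
    """Hypothetical helper: return the iadditional characters in `iri`, in order."""
    return [ch for ch in iri if ch in IADDITIONAL]


def escape_iadditional(iri: str) -> str:
    """Hypothetical helper: %-escape only the iadditional characters
    (uppercase hex), leaving all other characters unchanged."""
    return "".join(f"%{ord(ch):02X}" if ch in IADDITIONAL else ch for ch in iri)
```

Applied to an XPointer-style fragment, `escape_iadditional('doc#xpointer(id("a"))')` gives `doc#xpointer(id(%22a%22))`, which can be embedded where a raw `"` would act as a separator.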
IDNURI
- IRI->URI general approach: Convert to UTF-8 and then use %HH
- Helps convergence of URI schemes to UTF-8
- Important exception: IDN
- Two alternatives:
  1. http://星.org => http://%E6%98%9F.org -> xn--kiv.org
     (proposed in draft-ietf-idn-uri-03.txt)
  2. http://星.org => http://xn--kiv.org -> xn--kiv.org
     (attention: the authority may not be a domain name)
- Irrelevant for direct resolution of IRIs, but very important for
resolution via URIs
- Example of a potentially more general issue of scheme dependency (IAB
concerns)
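The two alternatives can be compared with a small sketch. It assumes Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490); the function names are hypothetical. Both paths end at the same DNS name, but with the first the URI still carries the %HH form, so a resolver working from the URI must unescape and then apply ToASCII itself.

```python
from urllib.parse import quote, unquote


def host_alt1(host: str) -> str:
    """Alternative 1: treat the host like any other component (UTF-8 + %HH)."""
    return quote(host, safe="")  # '.' is always safe in quote()


def resolve_alt1(uri_host: str) -> str:
    """A resolver for alternative 1 must undo the %HH escaping and then
    apply IDNA ToASCII before looking the name up in DNS."""
    return unquote(uri_host).encode("idna").decode("ascii")


def host_alt2(host: str) -> str:
    """Alternative 2: convert the host directly to its ACE (xn--) form.
    Only correct if the authority really is a domain name."""
    return host.encode("idna").decode("ascii")
```

Here `host_alt1("星.org")` gives `%E6%98%9F.org`, while `host_alt2("星.org")` and `resolve_alt1(host_alt1("星.org"))` both give `xn--kiv.org`. The caveat on the second alternative shows up directly: `host_alt2` would wrongly apply IDNA to an authority that is not a DNS name.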
URI Schemes and components based on UTF-8
- Recommended in RFC 2718
- By design of the scheme: URN, POP, IMAP
- By design of the protocol: FTP
- By individual choice: HTTP
- Query parts: HTML forms increasingly in practice, XForms by design
- Fragment identifiers: XPointer
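Why UTF-8-based components matter can be illustrated with HTML-form-style query encoding. This Python sketch assumes `application/x-www-form-urlencoded` rules as implemented by `urllib.parse.urlencode`; the variable names are illustrative only. With UTF-8, the %HH octets map back to the original characters; with a legacy encoding, they are not UTF-8 and cannot be converted back into IRI characters.

```python
from urllib.parse import parse_qs, urlencode

# UTF-8 (the default) yields octets that map back to the same characters.
utf8_query = urlencode({"q": "é"})
# A legacy encoding yields octets that are not UTF-8, so a UTF-8-based
# URI-to-IRI conversion must leave them escaped.
latin1_query = urlencode({"q": "é"}, encoding="latin-1")

print(utf8_query)            # q=%C3%A9
print(latin1_query)          # q=%E9
print(parse_qs(utf8_query))  # {'q': ['é']}
```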
Schedule (draft)
- Publish new draft in April
- Mailing-list last call in May
- Submit to IESG in June