ISSUE-439: Authors should be able to use both "utf8" and "utf-8" labels, case-insensitively

State:

CLOSED

Product:

encoding

Raised by:

Richard Ishida

Opened on:

2015-03-30

Description:

https://www.w3.org/Bugs/Public/show_bug.cgi?id=24337

This issue tracks the bug listed above and was created as part of the WG CR process.

---

(In reply to Geoffrey Sneddon from comment #0)
> Currently the spec says: 'Authors must use the utf-8 encoding and must use
> the "utf-8" label to identify it.'
>
> Given the label matching is done case-insensitively, it is not entirely
> clear whether authors must use this label case-sensitively or not. This
> should be clarified, preferably to allow either case (there is no practical
> benefit of requiring it to be lowercased).

Agreed.
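
To illustrate what case-insensitive label matching means in practice, here is a minimal Python sketch. It assumes matching works by stripping ASCII whitespace and comparing ASCII-lowercased labels. The names `UTF8_LABELS` and `is_utf8_label` are hypothetical, and the label set is an illustrative subset, not the spec's full table.

```python
# Illustrative subset of labels; an assumption, not the full label table.
UTF8_LABELS = {"utf-8", "utf8"}

# Tab, line feed, form feed, carriage return, space.
ASCII_WHITESPACE = "\t\n\x0c\r "


def ascii_lower(text: str) -> str:
    """Lowercase only A-Z, leaving non-ASCII characters untouched."""
    return "".join(
        chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text
    )


def is_utf8_label(label: str) -> bool:
    """Return True if label matches a UTF-8 label, ignoring ASCII case."""
    return ascii_lower(label.strip(ASCII_WHITESPACE)) in UTF8_LABELS
```

Under this sketch, "UTF-8", "utf-8" and "Utf8" all identify the same encoding, which is why requiring one particular case from authors has no effect on how a document is decoded.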

> We should also make the "utf8" label conforming. Making this non-conforming
> is of no practical benefit and makes a large number of documents
> non-conforming.

This looks innocuous at first. However, in some products (in particular Oracle
databases), the label "utf8" is used for a variant of UTF-8 in which characters
outside the BMP are encoded as two surrogates of 3 bytes each, for a total of
6 bytes. UTF-8 prohibits this encoding for security reasons.
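
The difference between the two byte forms can be sketched in Python. This is an illustration only: `surrogate_pair_bytes` and `is_valid_utf8` are hypothetical helper names, and the sketch reproduces the 6-byte surrogate form with Python's `surrogatepass` error handler rather than any Oracle API.

```python
def surrogate_pair_bytes(ch: str) -> bytes:
    """Encode one non-BMP character as two 3-byte surrogate sequences."""
    utf16 = ch.encode("utf-16-be")  # 4 bytes: high surrogate, low surrogate
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in (0, 2)]
    # "surrogatepass" lets Python emit the otherwise forbidden surrogate bytes.
    return b"".join(chr(u).encode("utf-8", "surrogatepass") for u in units)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if a strict UTF-8 decoder accepts the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ch = "\U0001F600"                    # a character outside the BMP
proper = ch.encode("utf-8")          # 4 bytes in real UTF-8
variant = surrogate_pair_bytes(ch)   # 6 bytes in the surrogate-based form
```

Here `is_valid_utf8(proper)` is true while `is_valid_utf8(variant)` is false, because a strict UTF-8 decoder rejects encoded surrogates.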

Related Action Items:

No related actions

Related emails:

I18N-ISSUE-439 (BUG24337): Authors should be able to use both "utf8" and "utf-8" labels, case-insensitively [encoding] (from sysbot+tracker@w3.org on 2015-03-30)

Related notes:

Bug marked FIXED and RESOLVED.

Richard Ishida, 30 Mar 2015, 13:32:47

This was a very old bug, added to tracker in error. Closed.

Richard Ishida, 30 Mar 2015, 13:44:13

Internationalization Working Group Issue Tracking