Re: [Encoding] false statement [I18N-ACTION-328][I18N-ISSUE-374]

Andrew (and, by the way, John Cowan),

I certainly did not intend to be either brutal or elitist.  I'm
trying to separate what seem to me to be multiple problems in
the hope that it will help us move forward.  Over the history of
the Internet (and a few other technologies with which I've had
to work), institutionalizing incompatibility has rarely turned
out to be a good idea.  Sometimes it happens and we have to work
around it, sometimes those workarounds are successful, but even
that rarely changes the "bad idea" part.  If I had a script that
wasn't supported by Unicode, I'd be unlikely to write a proposal
to get it coded and then sit around for years waiting
for them to do it.  However, I would write the proposal and,
when I created an interim system, I'd try to make sure there was
a migration plan and, ideally, that my interim system didn't
conflict with anyone else's.

I think we have some historically-established ways of doing that
which we know how to handle.  I'd hate to see us go back to
ISO/IEC 2022 and expand that registry, but I can also imagine its
being an interesting (and non-conflicting) solution while waiting
for Unicode and, if ISO/IEC JTC1/SC2 isn't willing to maintain and
update that registry, I can imagine several entities who could
take over.  If the Unicode Consortium understands and is
convinced that this has become a serious problem, perhaps they
could start conditionally reserving some blocks for
as-yet-uncoded scripts so at least there could be unambiguous
migration paths, perhaps via a new subspecies of compatibility
mappings or by providing surrogate-like escapes to other code
points that would parallel the 2022 system.  There may also be
better ideas, but I wish you (and others) would propose them
rather than --it seems to me-- merely complaining in louder and
louder voices (or, in this case, name-calling).
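
To make that concrete, here is a minimal sketch (in Python; every
code point and name in it is invented for illustration, not a real
assignment) of what an interim Private Use Area encoding with a
published migration table might look like:

    # Hypothetical interim assignments in the Private Use Area
    # (U+E000..U+F8FF).  Nothing here is a real assignment.
    INTERIM = {
        "LETTER KA": "\uE000",
        "LETTER KHA": "\uE001",
    }

    # Published once Unicode assigns real code points; these
    # target values are likewise invented.
    MIGRATION = {
        "\uE000": "\U00011F00",
        "\uE001": "\U00011F01",
    }

    def migrate(text):
        """Map interim PUA code points to standard assignments."""
        return "".join(MIGRATION.get(ch, ch) for ch in text)

    print(migrate("\uE000\uE001"))  # the standard, interoperable form

The point is only that a documented, non-conflicting table makes
mechanical migration possible; overlapping ad hoc assignments make
it impossible.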

Any of those approaches (at least the ones I can think of) would
be very ugly, but far preferable to disguising a lot of one-off
font tricks or pseudo-Unicode, with potentially overlapping code
points, as Standard UTF-8 and hoping that the end systems can
sort out what is going on without any in-stream clues.  That
just leads to a very fragmented environment in which people
cannot communicate... or worse.
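
To illustrate the "no in-stream clues" point, the sketch below
(Python again; the second interpretation is invented rather than
being any particular real font hack) decodes the same bytes to the
same code points, so nothing in the stream tells the receiver
which convention the author had in mind:

    # Valid UTF-8 for U+1019 U+103C, two Myanmar-block code points.
    data = b"\xe1\x80\x99\xe1\x80\xbc"
    text = data.decode("utf-8")    # decodes identically everywhere

    # Two incompatible conventions for the same code points; the
    # choice lives only in the font, never in the byte stream.
    standard = {"\u1019": "letter MA", "\u103C": "medial RA"}
    font_hack = {"\u1019": "letter MA", "\u103C": "repurposed shape"}

    for ch in text:
        print(hex(ord(ch)), standard[ch], "vs.", font_hack[ch])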

If the official Unicode Consortium position were really "people
should just wait to use their languages until we get around to
assigning code points and we reserve the right to take as many
years as we like" and the official WHATWG (much less W3C)
position were really "if your language and script don't have
officially assigned Unicode code points, you don't get to be on
the web" then it is probably time for the broader community to
do something about those groups.  Fortunately I haven't heard
anyone who can reasonably claim to speak for any of those bodies
say anything like that.  If you have, references would be
welcome.

More or less the same situation applies to the Encoding spec.
It still seems to me that it should be targeting UTF-8 and
Standard Unicode with other things viewed as transitional.  That
doesn't solve your "pseudo-Unicode" problem, but, AFAICT, it
doesn't make it any worse either.  As I have tried to say
before, I (at least) would be interested in what you do propose
but, so far, you just seem to be complaining about things that
won't work in the contexts you are concerned about.  Certainly
the web browsers and other software that are now supporting font
or [other?] pseudo-Unicode tricks aren't going to stop doing so
because Anne, WHATWG, or W3C say those tricks are bad --
everyone knows they are bad already, even (or especially) those
who think they are necessary, and those people are (correctly,
IMO) unlikely to change their minds until someone offers real
alternatives.

Finally (at least for today) there is a choice in principle
between saying "the browser vendors and page authors who are
using IANA Registry Charset labels but doing something else are
causing interoperability problems with the rest of the Internet
and the rest of the world and should be designing ways to get
out of that hole" and saying "many of the browser vendors are
doing this and, while it differs from what the IANA Registry is
usually believed to specify, it is the standard because they are
doing it and therefore everyone else should get in line".  The
first may be impractical (and probably is unless higher powers
intervene).  The second (or variants on it) would be a whole
lot more attractive if the community could feel some assurance
that we wouldn't have to look forward to another round of the
same thing in the future, e.g., an "Encoding 2018" spec that
said "don't pay any attention to the labels and definitions
established in Encoding 2014 because the browser vendors went
off in another direction".  One possible implication of your
comments is that the risk of that situation is pretty high; if
it is, then we really ought to be discussing a better solution
to it than either making proclamations that will be ignored or
engaging in fervent prayer that the light coming toward us
really isn't a train after all.
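
For anyone who wants to see the label problem in miniature, the
divergence is easy to demonstrate.  Byte 0x93 under the IANA/ISO
definition of ISO-8859-1 is a C1 control character; browsers (and
now the Encoding spec) treat the same label as windows-1252, where
it is a curly quotation mark.  In Python:

    data = b"\x93quoted\x94"

    # The IANA/ISO definition: C1 controls (effectively invisible).
    print(data.decode("iso-8859-1"))    # U+0093 ... U+0094

    # What browsers actually do with a page labeled "iso-8859-1".
    print(data.decode("windows-1252"))  # U+201C ... U+201D

Which of those behaviors counts as "the standard" is exactly the
choice described above.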

best,
   john


--On Monday, 01 September, 2014 08:22 +1000 Andrew Cunningham
<lang.support@gmail.com> wrote:

> Anne and John
> 
> Your comments read as brutal and elitist.
> 
> Do you have any idea of how long it takes to prepare and
> shepherd through a Unicode proposal? How much work and
> resources it can take?
> 
> The communities that need Unicode support don't necessarily
> have the resources or expertise to prepare the proposals.
> 
> Recently a proposal went to UTC to disunify some characters in
> the Myanmar block. The proposal was rightly rejected.
> 
> I had a chat to one of the authors of that proposal. What was
> interesting was the reason for preparing the proposal in the
> first place.
> 
> Essentially the problem was that web browsers were perceived to
> have problems with displaying content in the languages in
> question.
> 
> Essentially they were trying to get changes in Unicode because
> of deficiencies in web browsers.
> 
> Most cases I know of where what you refer to as hacks are used
> did not occur specifically because of a lack of language support
> in Unicode. They came as a specific consequence of a lack of
> support in web browsers.
> 
> Let's be honest here. It is easier to get Unicode to add
> support than it is to get web browsers to add support.
