Re: Factoring out Content-Disposition (i123), was: Content-Disposition (new issue?)

Brian Smith wrote:
> During the IETF meeting, what was the result of the discussions about
> Unicode support in HTTP? Looking at the IRC log, it looked like the
> discussion was leaning towards allowing UTF-8 in an otherwise-unencoded form
> in headers (applications should start accepting unencoded UTF-8 but should
> avoid sending it right now). If that is the way things are going to go, a
> general RFC 2231 profile for HTTP seems counterproductive.

As far as I recall, we played with the idea, but we were unsure whether 
this would be possible to do.

RFC 2231 already is implemented in two of the big four UAs, and it works 
around the whole issue, so I think making it easier to use (by clearly 
stating how it works in HTTP) is a good idea in any case.
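
To make that concrete, here is a minimal sketch of what producing an
RFC 2231 style parameter for HTTP could look like. The helper name
encode_ext_param and the sample filename are made up for illustration;
this is not taken from the draft, and the escaping rules are simplified
(everything outside letters, digits and "_.-~" gets percent-encoded).

```python
from urllib.parse import quote

def encode_ext_param(name, value, charset="UTF-8", language=""):
    """Build name*=charset'language'pct-encoded-value (hypothetical helper)."""
    # Encode the text to octets first, then percent-encode every octet
    # that is not an ASCII letter, digit, or one of "_.-~".
    encoded = quote(value.encode(charset), safe="")
    return f"{name}*={charset}'{language}'{encoded}"

param = encode_ext_param("filename", "€ rates.txt")
print("Content-Disposition: attachment; " + param)
```

The language part between the two single quotes is left empty here; a
recipient that does not understand the "filename*" form would simply
ignore the parameter.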

> RFC 2231 + UTF-8 is an especially bad interchange format for text since it
> requires over 9 bytes per letter for the vast majority of people's native

Making UTF-16 support mandatory could help here, but I'm not sure how 
widespread support for that is (recall I'm trying to document what 
several UAs already do and have been doing for a long time). Will keep 
this in mind when writing test cases.
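
For a rough feel of the sizes involved, here is a small sketch. It
assumes the worst case where every octet of the encoded text has to be
percent-encoded (three characters on the wire per octet); the function
name and the sample string are my own choices for illustration.

```python
def worst_case_length(text, charset):
    """Upper bound on the percent-encoded length of text in charset."""
    # Each octet becomes "%XX", i.e. three characters, if it needs escaping.
    return 3 * len(text.encode(charset))

sample = "日本語"  # three characters from the Basic Multilingual Plane
print(worst_case_length(sample, "utf-8"))      # 3 octets per character
print(worst_case_length(sample, "utf-16-be"))  # 2 octets per character
```

For UTF-8 the bound is exact for non-ASCII text, since every octet is
0x80 or above. For UTF-16 it is only an upper bound, because some octets
happen to fall on unreserved ASCII characters and would not need
escaping.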

> languages. Plus, there are no features for language tagging (needed for CJK
> languages), BIDI (needed for middle-eastern languages), or accessibility
> (for users of screen readers). IMO, the best thing to do is to keep

RFC 2231 *does* include language tagging. WRT BIDI I'm no expert, but I 
thought Unicode has something to say here? And could you clarify the 
accessibility concern please?
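
As an illustration of where the language tag lives, here is a minimal
sketch of decoding the value part of such a parameter. The function
name decode_ext_value and the sample value are assumptions for the
example, and there is no error handling for malformed input or unknown
charsets.

```python
from urllib.parse import unquote_to_bytes

def decode_ext_value(ext_value):
    """Split charset'language'value and decode it (hypothetical helper)."""
    # The first two single quotes delimit charset and language; the
    # language part may be empty.
    charset, language, encoded = ext_value.split("'", 2)
    text = unquote_to_bytes(encoded).decode(charset)
    return text, language or None

text, language = decode_ext_value("UTF-8'de'Stra%C3%9Fe.txt")
print(text, language)
```

The language comes back as None when the sender left that part empty.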

> language-sensitive text out of HTTP as much as possible by recommending that
> applications transfer language-sensitive text in entity bodies as much as
> possible. Really, it is only suitable for short, language-neutral strings
> like (file and IRI) path fragments.

That's something I agree with. For instance, WebDAV doesn't suffer from 
these kinds of problems because anything that is text actually travels 
in entity bodies as XML.

That being said, you can't always avoid it, such as in 
Content-Disposition or Slug.

> Nitpicks:
> 
> The draft references Unicode 4.0 indirectly through RFC3629. It would be
> better to allow implementations to use any later versions, or at least the
> current version, 5.1.

Yes, that's a nit, isn't it :-).

> I don't see the point of requiring ISO-8859-1. ISO-8859-1 can only encode a
> very small number of languages that are used by a small minority of people
> (who just happen to be over-represented in standards committees). Advocating
> ISO-8859-1 also seems to be the opposite of what was discussed at the IETF
> meeting (AFAICT from the logs).

I originally wanted to mandate UTF-8 only, but people pointed out 
(rightly) that any HTTP software already needs to understand 
ISO-8859-1, so it really doesn't make a difference.

BR, Julian

Received on Friday, 15 August 2008 21:01:45 UTC