Re: #227: Encoding advice for new headers and parameters

On 2016-09-28 04:29, Mark Nottingham wrote:
> [ "just me" hat on ]
>
> <https://github.com/httpwg/http-extensions/issues/227>
>
> After some discussion in Berlin and Stockholm, as well as experience with dealing with i18n in parameters for the Link header (see <https://github.com/mnot/I-D/issues/180>), I think we should give more definite advice about when RFC5987(bis) encoding should and should not be used.
>
> In particular, flagging encoding by using a parameter name complicates extension processing (see the issue referenced above), and causes a lot of uncertainty about precedence, etc.

It complicates processing *slightly*.

The issue of parameters potentially repeating, and the fact that you 
need to define what to do in that case, exists either way. It is 
inherent in any format that supports name/value pairs.
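
To illustrate how small that extra step is, here is a minimal sketch in Python. The function name `pick_parameter` and the policy it applies (first occurrence wins, and the `name*` form is preferred over plain `name`) are assumptions chosen for illustration; each field definition has to state its own rule.

```python
from typing import Optional


def pick_parameter(params: list[tuple[str, str]], name: str) -> Optional[str]:
    """Return the value to use for `name` from (name, value) pairs in field order.

    Assumed policy (hypothetical, for illustration only):
      - parameter names compare case-insensitively,
      - only the first occurrence of each form is considered,
      - the extended form "name*" takes precedence over plain "name".
    Values of "name*" parameters are assumed to be decoded already.
    """
    plain: Optional[str] = None
    extended: Optional[str] = None
    for key, value in params:
        key = key.lower()
        if key == name + "*" and extended is None:
            extended = value
        elif key == name and plain is None:
            plain = value
    return extended if extended is not None else plain
```

The same "which occurrence wins" decision would be needed for a field with only plain parameters; the `name*` form adds a single extra comparison.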

> I think it would be much simpler and more reliable to advise people minting new HTTP headers to *not* use RFC5987(bis) encoding, but instead advise that they mandate use of an encoding on the field (or a specified portion thereof).

RFC 5987 defines a way to deal with non-ASCII. It's not pretty, has a 
slightly bizarre syntax, but at least it's there, and it has been 
implemented successfully in all widely deployed user agents.
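
For readers who have not looked at the format in a while, here is a rough sketch of producing and consuming an RFC 5987 ext-value using Python's standard `urllib.parse`. The helper names `encode_ext_value` and `decode_ext_value` are made up for this example, and the decoder does no validation beyond what `unquote` does; treat it as an illustration, not a conforming parser.

```python
from urllib.parse import quote, unquote

# Non-alphanumeric characters that RFC 5987 "attr-char" allows unescaped.
ATTR_CHAR_PUNCTUATION = "!#$&+-.^_`|~"


def encode_ext_value(text: str) -> str:
    """Build an ext-value with the UTF-8 charset and an empty language tag."""
    # Everything outside attr-char (including "%", "'" and "*") is escaped.
    return "UTF-8''" + quote(text, safe=ATTR_CHAR_PUNCTUATION, encoding="utf-8")


def decode_ext_value(ext_value: str) -> str:
    """Split charset'language'value-chars and percent-decode the last part."""
    charset, _language, value_chars = ext_value.split("'", 2)
    return unquote(value_chars, encoding=charset, errors="strict")


# Usage: a parameter would then be sent as title*=<ext-value>
ext = encode_ext_value("\u20ac rates")  # "€ rates"
```

The `charset'language'` preamble and the trailing `*` on the parameter name are the parts usually called ugly; the value itself is plain percent-encoding.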

Defining *another* way to achieve this seems like a bad idea to me 
(insert XKCD reference here...).

(And yes, I'm all for working on a new common field syntax, which, as 
a side effect, addresses non-ASCII, but that's a separate discussion.)

> E.g., if the "foo" parameter on the "bar" header field might need to accept non-ascii content, it MUST be generated with those characters encoded, and MUST be parsed by first decoding that portion of the header.

...which essentially *is* the format used by RFC 5987, minus parameter 
naming and preamble.

Requiring its use sounds attractive, but I have my doubts that the 
typical "producer" of field values will get it right; thus we might see 
"%" characters which are not meant to be percent-escapes in the wild.
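
A small Python sketch of the failure mode I have in mind. The value and variable names are invented for the example; the point is only that a recipient cannot tell an unescaped "%" from an intended escape.

```python
from urllib.parse import quote, unquote

# Text the producer wants the recipient to see, literally.
literal_text = "rate%20limit"

# A producer that forgets the field is always percent-encoded sends it raw...
naive_wire_value = literal_text
# ...while a careful producer escapes the "%" itself.
correct_wire_value = quote(literal_text, safe="")

# The recipient always decodes, as the proposed rule would require.
seen_from_naive = unquote(naive_wire_value)      # the literal "%20" became a space
seen_from_correct = unquote(correct_wire_value)  # round-trips intact
```

With the always-encoded rule, the naive producer's value is silently changed and nothing on the wire distinguishes the two cases.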

The RFC 5987 format, as ugly as it might be, at least has the property that 
the producer needs to make a conscious decision to choose the format, 
and thus hopefully will get it right according to spec.

> The actual encoding to be used need not be specified, ...

Here I disagree even more. Telling people not to use a standard format, 
but *not* telling them what to use instead, is strange.

> ...but the simplest approach would probably be to use RFC3986 %-encoding over a UTF-8 string.

> A more aggressive approach would be to also recommend that new parameters on existing fields (even if they specify use of 5987) SHOULD use such encoding.

-1 to mixing different escaping rules in the same field.

> Thoughts? I'm not going to lie down in the road for this, in that I suspect that most people will gravitate towards this kind of solution naturally, rather than use 5987, but it'd be nice to put clear advice out there.

I'm opposed to discouraging use of RFC 5987 encoding until we have 
something better to recommend (and that includes a specification for it).

I'll also point out that in the meantime, at least one more 
specification uses this format 
(<https://tools.ietf.org/html/draft-ietf-httpauth-extension-09#section-4>), 
so if you are serious about discouraging its use, you really should 
comment on that spec right now.

Best regards, Julian

Received on Thursday, 29 September 2016 13:49:17 UTC