ISSUE-19: How are non-ASCII characters handled in CSP

Interaction of CSP and IRIs

CSP Level 1
Raised by: Brad Hill
Opened on:

Last Call comment from Boris Zbarsky <bzbarsky@MIT.EDU>:

Dear all,

I was just reading through the CSP draft, and I'm very concerned by the handling of non-ASCII characters in CSP. Specifically, I'm concerned about four things:

A) Lack of description for how one goes from an IRI or partial IRI to a host-source expression.
B) Lack of description for how one compares a source expression to an IRI.
C) Lack of description for how one goes from a Unicode string to a policy.
D) The fact that the current setup is likely to cause interop problems.

As far as I can tell, the current setup is as follows:

1) All CSP policies are made up of bytes in the ASCII range (and in particular, a subset of that range). Non-ASCII hostnames are expected to be encoded as Punycode, I guess (though this is not actually stated anywhere; see concern A above). Non-ASCII characters in paths are presumably expected to be %-encoded, but the specification doesn't say what encoding should be used for this (concern A again). In practice, by the way, at least one implementation allows non-ASCII bytes in paths, though I think the spec is pretty clear that as things stand this is not allowed.

2) When comparing a source expression to an IRI, the IRI needs to first be converted to a URI, presumably per RFC 3987. If the presumption is correct, this should probably be explicitly called out (concern B above).

3) When converting a Unicode string to a policy, presumably one does it by taking the numeric value of each codepoint and treating it as an ASCII character index? If so, this should be explicitly called out (concern C above).
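
As a rough illustration of items 1 and 2, the following Python sketch turns an IRI into an ASCII-only URI. The hostname is converted with Python's built-in `idna` codec (which implements IDNA2003), and non-ASCII characters in the path and query are %-encoded as UTF-8, which is the mapping RFC 3987 describes. The function name `iri_to_uri`, the `SAFE` character set, and the example hostname are assumptions made for illustration; CSP 1.0 does not define this procedure, which is the point of concerns A and B.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched when %-encoding: URI delimiters plus "%",
# so that escapes already present in the input are not double-encoded.
SAFE = "!$&'()*+,;=:@/?%"


def iri_to_uri(iri: str) -> str:
    """Illustrative IRI-to-URI mapping (hypothetical helper, not from the spec)."""
    parts = urlsplit(iri)
    if parts.hostname is None:
        raise ValueError("IRI has no host component")
    # Non-ASCII hostname labels become Punycode ("xn--...") labels.
    host = parts.hostname.encode("idna").decode("ascii")
    # Userinfo is deliberately dropped to keep the sketch short.
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Non-ASCII path and query characters are %-encoded as UTF-8.
    path = quote(parts.path, safe=SAFE, encoding="utf-8")
    query = quote(parts.query, safe=SAFE, encoding="utf-8")
    fragment = quote(parts.fragment, safe=SAFE, encoding="utf-8")
    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("https://bücher.example/café?q=ü"))
# https://xn--bcher-kva.example/caf%C3%A9?q=%C3%BC
```

Under these assumptions, a source expression written in the same ASCII form could be compared against the converted URI. Choosing a different encoding for the %-escapes would produce a different string for the same IRI.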

In practice, I expect people to just call their favorite escape() method on their strings if they have to shoehorn them into an ASCII format, which means that we'll get a mix of %-encoding as ISO-8859-1 and UTF-8 at the very least, and very possibly others. The result will be lack of interop (concern D).
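
The mismatch is easy to reproduce. The sketch below, using Python's standard `urllib.parse.quote` as a stand-in for "your favorite escape() method", %-encodes the same path under the two encodings mentioned above. The path `/café` is a made-up example.

```python
from urllib.parse import quote, unquote

path = "/café"

# The same path, %-encoded under two different character encodings.
as_utf8 = quote(path, encoding="utf-8")
as_latin1 = quote(path, encoding="iso-8859-1")

print(as_utf8)    # /caf%C3%A9
print(as_latin1)  # /caf%E9

# A byte-for-byte comparison of the two escaped forms fails ...
print(as_utf8 == as_latin1)  # False

# ... and decoding the ISO-8859-1 escape as UTF-8 does not recover the path.
print(unquote(as_latin1, encoding="utf-8") == path)  # False
```

A policy author who escapes with one encoding and a user agent that assumes the other would therefore disagree about whether a given resource matches.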

It seems to me that a lot of these problems would be alleviated if CSP policies were defined as sequences of Unicode codepoints, with a comparison function to IRIs. The spec would also need to define how to construct such a sequence of Unicode codepoints from a Content-Security-Policy HTTP header or a Content-Security-Policy-Report-Only HTTP header, but the result would be to allow authors to use strings that actually make sense to them in CSP policies instead of shoehorning them into an ASCII-only format in likely-broken ways.

Thank you for taking the time to read all that, Boris

Related Action Items:
No related actions
Related emails:
No related emails

Related notes:

On 9/6/12 11:48 PM, Adam Barth wrote:
> HTTP operates in terms of URIs.

Yes, but very few authors actually write HTTP servers.

> I'm not sure I understand your question. Authors deal with
> host-expressions the same way they deal with the HTTP Host header.

Authors generally don't have to author Host headers; the UA sends those.
They will, however, need to author host-expressions to actually use CSP.

>> Why not? Everything else a browser has lying around (e.g. document
>> locations) is IRIs. Are host-source expressions never compared to
>> document locations?
> In the end, the browser needs to translate IRIs into URIs for use in
> HTTP. Everything in CSP 1.0 is defined in terms of networking
> operations

OK, fair.

> Indeed, but that's outside the scope of CSP 1.0.

Yes, I understand that's your position. I just wish there were a way to make this stuff less of a footgun for authors...

> Actually, if your issue is with the WebKit implementation, you can
> just file a bug and I'll write a test in the course of fixing it.

Note that I haven't looked through the Gecko version carefully (because regexps); it may have similar problems.

> The short version is that the IETF insists that folks use IDNA2008,
> but most browsers implement something closer to IDNA2003. IDNA2008 is
> not backwards compatible with IDNA2003 and so will never actually be
> deployed. Any attempts to hammer out a browser-consensus spec get
> shouted down by folks who are pushing IDNA2008.

I see. <sigh>.
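
To make the incompatibility described above concrete: Python's built-in `idna` codec implements IDNA2003, which maps "ß" to "ss" before encoding, whereas IDNA2008 treats "ß" as a valid character and Punycode-encodes the label. The sketch below uses only the standard library. The IDNA2008 form is approximated by applying the `punycode` codec to the label directly, which is an assumption made for illustration and not a full IDNA2008 implementation. The label is a made-up example.

```python
label = "straße"

# IDNA2003 (Python's built-in codec): "ß" is mapped to "ss", so the
# result is plain ASCII and needs no "xn--" prefix.
idna2003 = label.encode("idna")

# Approximation of the IDNA2008 result: the label is kept as-is and
# Punycode-encoded behind the "xn--" prefix.
idna2008_like = b"xn--" + label.encode("punycode")

print(idna2003)       # b'strasse'
print(idna2008_like)  # b'xn--strae-oqa'

# The two rule sets yield different ASCII hostnames for the same input.
print(idna2003 == idna2008_like)  # False
```

A host-source expression written for one form would not match a request that a user agent resolved using the other.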


Brad Hill, 11 Sep 2012, 03:38:10

Responses to this issue can be found in the following threads (there are often several replies, so it is suggested to view "Contemporary messages sorted by thread"):

The group's decision to close this issue without changing spec behavior was recorded in the minutes to the following teleconferences:

Brad Hill, 26 Oct 2012, 20:42:12

