This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
http://dev.w3.org/csswg/css3-syntax/#the-input-byte-stream step 3 currently reads

> Check the byte stream. If the first several bytes match the hex sequence
>
> 40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B
>
> then ...

This has two problems, one semantic and one cosmetic.

1) It does not allow for ASCII case insensitivity and is therefore inconsistent with the rest of CSS. It also doesn't allow use of a single-quoted string (this latter omission is actually _more_ likely to cause problems IMO).

2) For clarity, the interpretation of this hex sequence as ASCII characters should be shown.

Suggested edit: replace the quote above with

> Check the byte stream. If the first several bytes match the hex sequence
>
> 40 (43|63) (48|68) (41|61) (52|72) (53|73) (45|65) (54|74) 20 (22|27) (XX)* QQ 3B
>
> where QQ must be the same as the earlier 22 or 27 byte, (these bytes encode
> the text '@charset "...";', matched _ASCII case-insensitively_ and
> allowing both single and double-quoted string literals) then ...

For even greater consistency with normal CSS parsing, it might also be nice to allow arbitrary whitespace before and after the string (not just a single space before and none after), and leading whitespace before the initial @-sign. To do that, change the 20 to WW*, add another WW* at the beginning and a third before the final 3B, and refer to the encoding standard for the definition of ASCII whitespace.
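For illustration only, the suggested byte pattern (ASCII case-insensitive keyword, exactly one space, matching single or double quotes, then a semicolon) could be sketched as a Python bytes regex. The names `PROPOSED` and `sniff_proposed` are hypothetical, and treating `(XX)*` as "any bytes up to the first matching quote followed by `;`" is an assumption, not something the spec text above states.

```python
import re

# Hypothetical sketch of the *suggested* pattern, not of any shipped spec text.
# "@", then "charset" in either ASCII case, one space, an opening quote
# (0x22 or 0x27), arbitrary bytes, the same quote again, then ";".
PROPOSED = re.compile(
    rb'@[Cc][Hh][Aa][Rr][Ss][Ee][Tt] (["\'])(.*?)\1;',
    re.DOTALL,
)

def sniff_proposed(data: bytes):
    """Return the declared encoding name as bytes, or None if there is no match."""
    # match() anchors at offset 0, so the "@" must be the very first byte.
    m = PROPOSED.match(data)
    return m.group(2) if m else None
```

The backreference `\1` plays the role of QQ: the closing quote has to be the same byte as the opening one, so a mixed pair such as `"utf-8'` is rejected.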
Does that match browser behavior? I'm not interested in aligning @charset with other CSS parsing, because it's a very, very special thing. However, I'll match whatever browsers do.
The relevant code in Firefox is css::Loader::GetCharsetFromData, here: https://mxr.mozilla.org/mozilla-central/source/layout/style/Loader.cpp#611

It only allows lowercase "charset", it doesn't allow single-quoted strings, the @ must be the first character in the document, and there must be exactly one space before the initial " character and none between the closing " and the semicolon. I would consider the first two of these flat-out bugs in Firefox tbh.

I am not as familiar with WebKit, but the relevant code _appears_ to be TextResourceDecoder::checkForCSSCharset, here: https://trac.webkit.org/browser/trunk/Source/WebCore/loader/TextResourceDecoder.cpp#L455

This also does case-sensitive matching as far as I can tell, but it allows both single- and double-quoted strings, and it allows arbitrary whitespace before and after the string. The @ still has to be the first character in the document.

Later today I will put together a comprehensive test case and find out what IE and Opera do.
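To make the difference between the two descriptions concrete, here is a rough Python sketch of each behavior as characterized in prose above. These are reconstructions from that description, not ports of `GetCharsetFromData` or `checkForCSSCharset`; the names `gecko_like` and `webkit_like` are hypothetical, and limiting "whitespace" to space and tab in the WebKit-like pattern is an assumption.

```python
import re

# Illustrative reconstructions of the prose descriptions, not browser code.
# Gecko-like: lowercase keyword, exactly one space, double quotes only,
# nothing between the closing quote and the semicolon.
GECKO_LIKE = re.compile(rb'@charset "([^"]*)";')

# WebKit-like: lowercase keyword, optional whitespace (assumed: space/tab)
# around the string, and either quote style as long as both quotes match.
WEBKIT_LIKE = re.compile(rb'@charset[ \t]*(["\'])([^"\']*)\1[ \t]*;')

def gecko_like(data: bytes):
    """Strict match; returns the encoding name as bytes or None."""
    m = GECKO_LIKE.match(data)  # match() requires "@" at offset 0
    return m.group(1) if m else None

def webkit_like(data: bytes):
    """Lenient match; returns the encoding name as bytes or None."""
    m = WEBKIT_LIKE.match(data)
    return m.group(2) if m else None
```

Under these assumptions both functions reject an uppercase `@CHARSET` and any input where the `@` is not the first byte; they differ only on quoting and on whitespace around the string.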
Created attachment 1243
test case

Here's a test case. It will show a list of variations on @charset and report whether they worked or not. If you get all "yes" or all "no" results you should suspect that the browser might actually be *ignoring* @charset within a data: URL. This appears to be the behavior of, at least, the version of Chromium I have on here.

Results with all the browsers I can conveniently test with will follow.
@charset "utf-8"; yes
@CHARSET "utf-8"; no
@ChArSeT "utf-8"; no
@cHaRsEt "utf-8"; no
@charset"utf-8"; yes
@charset "utf-8"; no
@charset\9"utf-8"; no
@charset\A"utf-8"; no
@charset\C"utf-8"; no
@charset\D"utf-8"; no
@charset "utf-8" ; no
@charset "utf-8"; no
@charset 'utf-8'; no

Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0 Iceweasel/16.0.2
@charset "utf-8"; no
@CHARSET "utf-8"; no
@ChArSeT "utf-8"; no
@cHaRsEt "utf-8"; no
@charset"utf-8"; no
@charset "utf-8"; no
@charset\9"utf-8"; no
@charset\A"utf-8"; no
@charset\C"utf-8"; no
@charset\D"utf-8"; no
@charset "utf-8" ; no
@charset "utf-8"; no
@charset 'utf-8'; no

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4

(this is the browser that appears to ignore @charset in a data: URL altogether)
@charset "utf-8"; yes
@CHARSET "utf-8"; no
@ChArSeT "utf-8"; no
@cHaRsEt "utf-8"; no
@charset"utf-8"; yes
@charset "utf-8"; yes
@charset\9"utf-8"; yes
@charset\A"utf-8"; no
@charset\C"utf-8"; no
@charset\D"utf-8"; no
@charset "utf-8" ; yes
@charset "utf-8"; no
@charset 'utf-8'; yes

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.22+ (KHTML, like Gecko) Chromium/17.0.963.56 Chrome/17.0.963.56 Safari/535.22+ Debian/unstable (3.4.2-2) Epiphany/3.4.2

(this is probably representative of *uncustomized* WebKit)
@charset "utf-8"; yes
@CHARSET "utf-8"; yes
@ChArSeT "utf-8"; yes
@cHaRsEt "utf-8"; yes
@charset"utf-8"; yes
@charset "utf-8"; yes
@charset\9"utf-8"; yes
@charset\A"utf-8"; yes
@charset\C"utf-8"; yes
@charset\D"utf-8"; yes
@charset "utf-8" ; yes
@charset "utf-8"; yes
@charset 'utf-8'; yes

Opera/9.80 (Windows NT 6.1; WOW64; U; en) Presto/2.10.289 Version/12.00

... and unfortunately, IE9 appears not to support this use of data: URLs, so I'm going to have to redo the test case with a whole bunch of tiny supporting files to get that to work. :-(
Created attachment 1244
corrected test case
(In reply to comment #4)
> @charset"utf-8"; yes

I was puzzled by this as the code is pretty clear. Then I figured "b1"'s data URL isn't correct. It now gives me:

@charset "utf-8"; yes
@CHARSET "utf-8"; no
@ChArSeT "utf-8"; no
@cHaRsEt "utf-8"; no
@charset"utf-8"; no
@charset "utf-8"; no
@charset\9"utf-8"; no
@charset\A"utf-8"; no
@charset\C"utf-8"; no
@charset\D"utf-8"; no
@charset "utf-8" ; no
@charset "utf-8"; no
@charset 'utf-8'; no

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0

(In reply to comment #7)
> ... and unfortunately, IE9 appears not to support this use of data: URLs, so
> I'm going to have to redo the test case with a whole bunch of tiny
> supporting files to get that to work. :-(

So I did that for you ;)

@charset "utf-8"; yes
@CHARSET "utf-8"; no
@ChArSeT "utf-8"; no
@cHaRsEt "utf-8"; no
@charset"utf-8"; no
@charset "utf-8"; no
@charset\9"utf-8"; no
@charset\A"utf-8"; no
@charset\C"utf-8"; no
@charset\D"utf-8"; no
@charset "utf-8" ; no
@charset "utf-8"; no
@charset 'utf-8'; no

Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; eSobiSubscriber 2.0.4.16; .NET4.0C)

It's not surprising to me, as CSS 2.1 is quite clear here:

# @charset must be written literally, i.e., the 10 characters '@charset "'
# (lowercase, no backslash escapes), followed by the encoding name, followed
# by '";'.
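The CSS 2.1 wording quoted above is literal enough that it can be checked without a regex at all. A minimal Python sketch, assuming a hypothetical helper name `charset_css21` and assuming that an encoding name containing a double quote should be rejected (the quoted text does not say so explicitly):

```python
PREFIX = b'@charset "'  # the 10 literal characters from the CSS 2.1 text
SUFFIX = b'";'          # closing quote immediately followed by the semicolon

def charset_css21(data: bytes):
    """Return the encoding name (as str) if data opens with a literal
    @charset rule in the CSS 2.1 sense, otherwise None."""
    if not data.startswith(PREFIX):
        return None  # wrong case, wrong quote, extra or missing whitespace
    end = data.find(SUFFIX, len(PREFIX))
    if end == -1:
        return None  # no closing '";' anywhere after the prefix
    name = data[len(PREFIX):end]
    if b'"' in name:
        # Assumption: a stray quote means the first '";' found belongs to
        # some later rule, e.g. after '@charset "utf-8" ;'.
        return None
    return name.decode("ascii", errors="replace")
```

With this reading, only the first variation in the result lists above (`@charset "utf-8";`) is accepted, which is consistent with the Firefox 17 and IE9 results.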
Yay! With Firefox and IE9 both being very strict, it looks like I can justify patching WebKit to be strict, and keeping the spec strict as well.
(In reply to comment #10)
> Yay! With Firefox and IE9 both being very strict, it looks like I can
> justify patching WebKit to be strict, and keeping the spec strict as well.

Cool.

(Aside: Allowing multiple spaces between "@charset" and the quote would have been particularly annoying, since it would have introduced the problem XML has: There can be an arbitrary amount of whitespace padding in XML after "<?xml" but before "encoding=", so implementations end up setting an arbitrary limit.)
Yeah, given mostly-consistent browser behavior I am OK keeping the spec strict as well. Please add an explanation of the ASCII interpretation of the byte stream, though.
(In reply to comment #12)
> Yeah, given mostly-consistent browser behavior I am OK keeping the spec
> strict as well. Please add an explanation of the ASCII interpretation of
> the byte stream, though.

Done. I'll mark this as fixed, and go patch WebKit.