This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 19882 - @charset rule logic is not case insensitive & should state ASCII interpretation for clarity
Status: RESOLVED FIXED
Alias: None
Product: CSS
Classification: Unclassified
Component: Syntax
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Tab Atkins Jr.
QA Contact: public-css-bugzilla
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-11-06 16:01 UTC by Zack Weinberg
Modified: 2012-11-07 23:56 UTC

See Also:


Attachments
test case (3.42 KB, text/html)
2012-11-06 19:20 UTC, Zack Weinberg
corrected test case (3.42 KB, text/html)
2012-11-06 23:58 UTC, Kang-Hao (Kenny) Lu

Description Zack Weinberg 2012-11-06 16:01:28 UTC
http://dev.w3.org/csswg/css3-syntax/#the-input-byte-stream step 3 currently reads

> Check the byte stream. If the first several bytes match the hex sequence
>
>     40 63 68 61 72 73 65 74 20 22 (XX)* 22 3B
>
> then ...

This has two problems, one semantic and one cosmetic.

1) It does not allow for ASCII case insensitivity and is therefore inconsistent with the rest of CSS.  It also doesn't allow use of a single-quoted string (this latter omission is actually _more_ likely to cause problems IMO).

2) For clarity, the interpretation of this hex sequence as ASCII characters should be shown.
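
As an illustration of point 2, here is a minimal Python sketch that decodes the fixed parts of the quoted hex sequence into ASCII. The names `PREFIX_HEX`, `SUFFIX_HEX` and `hex_to_ascii` are hypothetical and exist only for this example; the `(XX)*` part is the encoding label and is left out.

```python
# Fixed prefix and suffix of the byte pattern quoted from the draft.
# The (XX)* run between them is the encoding label itself.
PREFIX_HEX = "40 63 68 61 72 73 65 74 20 22"
SUFFIX_HEX = "22 3B"


def hex_to_ascii(hex_bytes: str) -> str:
    """Decode a space-separated string of hex byte values as ASCII text."""
    return bytes.fromhex(hex_bytes).decode("ascii")


print(repr(hex_to_ascii(PREFIX_HEX)))  # the text before the encoding label
print(repr(hex_to_ascii(SUFFIX_HEX)))  # the text after the encoding label
```

The prefix decodes to the ten characters `@charset "` and the suffix to `";`.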

Suggested edit: replace the quote above with 

> Check the byte stream. If the first several bytes match the hex sequence
>
>   40 (43|63) (48|68) (41|61) (52|72) (53|73) (45|65) (54|74) 20 (22|27) (XX)* QQ 3B
>
> where QQ must be the same as the earlier 22 or 27 byte (these bytes encode
> the text '@charset "...";', matched _ASCII case-insensitively_ and
> allowing both single- and double-quoted string literals), then ...

For even greater consistency with normal CSS parsing, it might also be nice to allow arbitrary whitespace before and after the string (not just a single space before and none after), and leading whitespace before the initial @-sign.  To do that, change the 20 to WW*, add another WW* at the beginning and a third before the final 3B, and refer to the encoding standard for the definition of ASCII whitespace.
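
The relaxed matching proposed above could be sketched in Python roughly as follows. This is only an illustration of the suggestion, not spec text or any browser's code. The names `WS_RUN`, `RELAXED` and `sniff_charset_relaxed` are hypothetical, the whitespace set (TAB, LF, FF, CR, SPACE) is an assumption about what "ASCII whitespace" would mean here, and for simplicity the label may not contain either quote character.

```python
import re

# Assumed ASCII whitespace set: TAB, LF, FF, CR, SPACE. "WW*" allows zero or more.
WS_RUN = rb"[\t\n\x0c\r ]*"

# WW* @charset WW* (22|27) (XX)* QQ WW* 3B, matched ASCII case-insensitively.
# The backreference \1 forces the closing quote (QQ) to equal the opening one.
RELAXED = re.compile(
    WS_RUN + rb"@charset" + WS_RUN + rb"(['\"])([^'\"]*)\1" + WS_RUN + rb";",
    re.IGNORECASE,
)


def sniff_charset_relaxed(data: bytes):
    """Return the declared encoding label, or None if the prefix does not match."""
    m = RELAXED.match(data)  # match() anchors at the start of the byte stream
    if m is None:
        return None
    return m.group(2).decode("ascii", errors="replace")


print(sniff_charset_relaxed(b'@CHARSET "utf-8";'))
print(sniff_charset_relaxed(b"  @charset   'utf-8' ;"))
print(sniff_charset_relaxed(b"@charset \"utf-8';"))
```

Under this sketch the first two inputs yield the label `utf-8`, while the third is rejected because the quotes do not match.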
Comment 1 Tab Atkins Jr. 2012-11-06 16:03:34 UTC
Does that match browser behavior?  I'm not interested in aligning @charset with other CSS parsing, because it's a very, very special thing.  However, I'll match whatever browsers do.
Comment 2 Zack Weinberg 2012-11-06 16:30:13 UTC
The relevant code in Firefox is css::Loader::GetCharsetFromData, here: https://mxr.mozilla.org/mozilla-central/source/layout/style/Loader.cpp#611
It only allows lowercase "charset", it doesn't allow single-quoted strings, the @ must be the first character in the document, and there must be exactly one space before the initial " character and none between the closing " and the semicolon.  I would consider the first two of these flat-out bugs in Firefox tbh.

I am not as familiar with Webkit, but the relevant code _appears_ to be TextResourceDecoder::checkForCSSCharset, here: https://trac.webkit.org/browser/trunk/Source/WebCore/loader/TextResourceDecoder.cpp#L455  This also does case-sensitive matching as far as I can tell, but it allows both single- and double-quoted strings, and it allows arbitrary whitespace before and after the string.  The @ still has to be the first character in the document.

Later today I will put together a comprehensive test case and find out what IE and Opera do.
Comment 3 Zack Weinberg 2012-11-06 19:20:27 UTC
Created attachment 1243
test case

Here's a test case.  It will show a list of variations on @charset and report whether they worked or not.

If you get all "yes" or all "no" results you should suspect that the browser might actually be *ignoring* @charset within a data: URL.  This appears to be the behavior of, at least, the version of Chromium I have on here.  Results with all the browsers I can conveniently test with will follow.
Comment 4 Zack Weinberg 2012-11-06 19:20:56 UTC
@charset "utf-8";    yes
@CHARSET "utf-8";    no
@ChArSeT "utf-8";    no
@cHaRsEt "utf-8";    no

@charset"utf-8";     yes
@charset  "utf-8";   no
@charset\9"utf-8";   no
@charset\A"utf-8";   no
@charset\C"utf-8";   no
@charset\D"utf-8";   no
@charset "utf-8" ;   no
 @charset "utf-8";   no

@charset 'utf-8';    no

Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0 Iceweasel/16.0.2
Comment 5 Zack Weinberg 2012-11-06 19:21:28 UTC
@charset "utf-8";    no
@CHARSET "utf-8";    no
@ChArSeT "utf-8";    no
@cHaRsEt "utf-8";    no

@charset"utf-8";     no
@charset  "utf-8";   no
@charset\9"utf-8";   no
@charset\A"utf-8";   no
@charset\C"utf-8";   no
@charset\D"utf-8";   no
@charset "utf-8" ;   no
 @charset "utf-8";   no

@charset 'utf-8';    no

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4

(this is the browser that appears to ignore @charset in a data: URL altogether)
Comment 6 Zack Weinberg 2012-11-06 19:22:05 UTC
@charset "utf-8";    yes
@CHARSET "utf-8";    no
@ChArSeT "utf-8";    no
@cHaRsEt "utf-8";    no

@charset"utf-8";     yes
@charset  "utf-8";   yes
@charset\9"utf-8";   yes
@charset\A"utf-8";   no
@charset\C"utf-8";   no
@charset\D"utf-8";   no
@charset "utf-8" ;   yes
 @charset "utf-8";   no

@charset 'utf-8';    yes

Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/535.22+ (KHTML, like Gecko) Chromium/17.0.963.56 Chrome/17.0.963.56 Safari/535.22+ Debian/unstable (3.4.2-2) Epiphany/3.4.2

(this is probably representative of *uncustomized* WebKit)
Comment 7 Zack Weinberg 2012-11-06 19:40:39 UTC
@charset "utf-8";    yes
@CHARSET "utf-8";    yes
@ChArSeT "utf-8";    yes
@cHaRsEt "utf-8";    yes

@charset"utf-8";     yes
@charset  "utf-8";   yes
@charset\9"utf-8";   yes
@charset\A"utf-8";   yes
@charset\C"utf-8";   yes
@charset\D"utf-8";   yes
@charset "utf-8" ;   yes
 @charset "utf-8";   yes

@charset 'utf-8';    yes


Opera/9.80 (Windows NT 6.1; WOW64; U; en) Presto/2.10.289 Version/12.00

... and unfortunately, IE9 appears not to support this use of data: URLs, so I'm going to have to redo the test case with a whole bunch of tiny supporting files to get that to work. :-(
Comment 8 Kang-Hao (Kenny) Lu 2012-11-06 23:58:53 UTC
Created attachment 1244
corrected test case
Comment 9 Kang-Hao (Kenny) Lu 2012-11-07 00:06:51 UTC
(In reply to comment #4)
> @charset"utf-8";     yes

I was puzzled by this as the code is pretty clear. Then I figured "b1"'s data URL isn't correct.

It now gives me:

@charset "utf-8";    yes
@CHARSET "utf-8";    no
@ChArSeT "utf-8";    no
@cHaRsEt "utf-8";    no

@charset"utf-8";     no
@charset  "utf-8";   no
@charset\9"utf-8";   no
@charset\A"utf-8";   no
@charset\C"utf-8";   no
@charset\D"utf-8";   no
@charset "utf-8" ;   no
 @charset "utf-8";   no

@charset 'utf-8';    no

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/17.0 Firefox/17.0

(In reply to comment #7)
> ... and unfortunately, IE9 appears not to support this use of data: URLs, so
> I'm going to have to redo the test case with a whole bunch of tiny
> supporting files to get that to work. :-(

So I did that for you ;)

@charset "utf-8";    yes
@CHARSET "utf-8";    no
@ChArSeT "utf-8";    no
@cHaRsEt "utf-8";    no

@charset"utf-8";     no
@charset  "utf-8";   no
@charset\9"utf-8";   no
@charset\A"utf-8";   no
@charset\C"utf-8";   no
@charset\D"utf-8";   no
@charset "utf-8" ;   no
 @charset "utf-8";   no

@charset 'utf-8';    no

 
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; eSobiSubscriber 2.0.4.16; .NET4.0C)


It's not surprising to me, as CSS 2.1 is quite clear here:

  # @charset must be written literally, i.e., the 10 characters '@charset "'
  # (lowercase, no backslash escapes), followed by the encoding name, followed
  # by '";'.
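
A minimal Python sketch of the strict rule quoted above, for comparison with the result tables in this bug. It is an assumption-laden illustration, not Firefox's or IE's actual code: the names `STRICT_PREFIX` and `sniff_charset_strict` are hypothetical, and the `\9` row of the tables is assumed to mean a literal TAB byte.

```python
# The ten literal characters required by the CSS 2.1 text: lowercase, one space.
STRICT_PREFIX = b'@charset "'


def sniff_charset_strict(data: bytes):
    """Return the encoding label if data starts with a literal @charset rule."""
    if not data.startswith(STRICT_PREFIX):
        return None
    start = len(STRICT_PREFIX)
    close = data.find(b'"', start)  # first closing double quote
    # The closing quote must be followed immediately by a semicolon.
    if close == -1 or data[close + 1:close + 2] != b";":
        return None
    return data[start:close].decode("ascii", errors="replace")


variants = [
    b'@charset "utf-8";',
    b'@CHARSET "utf-8";',
    b'@charset"utf-8";',
    b'@charset  "utf-8";',
    b'@charset\t"utf-8";',
    b'@charset "utf-8" ;',
    b' @charset "utf-8";',
    b"@charset 'utf-8';",
]
for v in variants:
    print(v, "yes" if sniff_charset_strict(v) else "no")
```

Only the first variant is accepted by this sketch, which agrees with the Firefox 17 and IE9 results listed in this comment.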
Comment 10 Tab Atkins Jr. 2012-11-07 01:28:33 UTC
Yay!  With Firefox and IE9 both being very strict, it looks like I can justify patching WebKit to be strict, and keeping the spec strict as well.
Comment 11 Henri Sivonen 2012-11-07 07:17:44 UTC
(In reply to comment #10)
> Yay!  With Firefox and IE9 both being very strict, it looks like I can
> justify patching WebKit to be strict, and keeping the spec strict as well.

Cool. 

(Aside: Allowing multiple spaces between "@charset" and the quote would have been particularly annoying, since it would have introduced the problem XML has: There can be an arbitrary amount of whitespace padding in XML after "<?xml" but before "encoding=", so implementations end up setting an arbitrary limit.)
Comment 12 Zack Weinberg 2012-11-07 23:35:08 UTC
Yeah, given mostly-consistent browser behavior I am OK keeping the spec strict as well.  Please add an explanation of the ASCII interpretation of the byte stream, though.
Comment 13 Tab Atkins Jr. 2012-11-07 23:56:37 UTC
(In reply to comment #12)
> Yeah, given mostly-consistent browser behavior I am OK keeping the spec
> strict as well.  Please add an explanation of the ASCII interpretation of
> the byte stream, though.

Done.  I'll mark this as fixed, and go patch webkit.