This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 17514 - URI token should be agnostic to escaping the characters 'u', 'r', 'l' (reopening of Issue 23)
Status: NEW
Alias: None
Product: CSS
Classification: Unclassified
Component: CSS Level 2
Version: unspecified
Hardware: All Windows 3.1
Importance: P2 normal
Target Milestone: ---
Assignee: Bert Bos
QA Contact: public-css-bugzilla
Reported: 2012-06-16 14:28 UTC by Anton P
Modified: 2013-01-14 18:21 UTC

Description Anton P 2012-06-16 14:28:21 UTC
4.1.1 (Tokenization) defines the URI token as:

  # URI    url\({w}{string}{w}\)
  #       |url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}\)

(and similarly for BAD_URI), whilst G.2 gives:

  # "url("{w}{string}{w}")"   {return URI;}
  # "url("{w}{url}{w}")"    {return URI;}

(and similarly for BAD_URI).

This means that if you escape any of the characters 'u', 'r', 'l' in a property value intended to match the <uri> value type, then what you might have expected to have tokenized as URI is actually tokenized as FUNCTION.
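
The effect can be illustrated with a small Python sketch. It transcribes a simplified subset of the 4.1.1 macros into regular expressions (the {string} alternative of URI is left out), and the names `first_token`, `URI` and `FUNCTION` are illustrative only, not taken from any implementation. Matching is case-insensitive, on the assumption that the core grammar is read case-insensitively within ASCII.

```python
import re

# Macros transcribed (in simplified form) from CSS 2.1 section 4.1.1.
NONASCII = r"[^\x00-\x7f]"
UNICODE = r"\\[0-9a-f]{1,6}(?:\r\n|[ \t\r\n\f])?"
ESCAPE = rf"(?:{UNICODE}|\\[^\r\n\f0-9a-f])"
NMSTART = rf"(?:[_a-z]|{NONASCII}|{ESCAPE})"
NMCHAR = rf"(?:[_a-z0-9-]|{NONASCII}|{ESCAPE})"
IDENT = rf"-?{NMSTART}{NMCHAR}*"
W = r"[ \t\r\n\f]*"

# Unquoted form of URI only; note the literal, unescaped "url(" prefix.
URI = re.compile(
    rf"url\({W}(?:[!#$%&*-\[\]-~]|{NONASCII}|{ESCAPE})*{W}\)", re.I
)
FUNCTION = re.compile(rf"{IDENT}\(", re.I)


def first_token(css: str) -> str:
    """Classify the token at the start of `css` (illustrative helper)."""
    if URI.match(css):
        return "URI"
    if FUNCTION.match(css):
        return "FUNCTION"
    return "OTHER"


print(first_token("url(image.png)"))         # URI
print(first_token(r"\75 rl(image.png)"))     # FUNCTION: 'u' written as \75
print(first_token(r"u\72l(image.png)"))      # FUNCTION: 'r' written as \72
```

Because the URI pattern begins with the literal characters `url(`, any escaped letter makes the input fall through to the general {ident}"(" pattern.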

However, this doesn't match UAs; Trident, Gecko and Presto all allow the characters to be escaped and still invoke the normal URI token parsing.

The spec should be changed to match reality.

Conversation begins:
Bug report:
http://lists.w3.org/Archives/Public/www-style/2012May/0327.html

Comment 1 Anton P 2012-06-16 15:15:32 UTC
This issue has an interesting history.


It began life as Issue 23 [http://wiki.csswg.org/spec/css2.1#issue-23]:

URL
    http://lists.w3.org/Archives/Public/www-style/2007Dec/0215.html
Summary
    "url(" needs to be {u}{r}{l}"(" in grammar
Resolution
    Assumed editorial.
Status
    Closed.
Testcases
    uri-015


It was implemented in the 2009-04-23 CR as Change 5.89 [http://www.w3.org/TR/CSS21/changes.html#q376]:

C.5.89 Section G.2 Lexical scanner

[2008-03-05] Change the tokenizer rules

from

"url("{w}{string}{w}")" {return URI;}
"url("{w}{url}{w}")"    {return URI;}

to

{U}{R}{L}"("{w}{string}{w}")"	{return URI;}
{U}{R}{L}"("{w}{url}{w}")"	{return URI;}


Unfortunately, this change only happened in G.2, rather than uniformly across both G.2 and 4.1.1, as observed by Bjoern Hoehrmann in http://lists.w3.org/Archives/Public/www-style/2010Jul/0499 .  This led to the change being reversed in the 2011-04-12 PR as Change 8.52 [http://www.w3.org/TR/CSS21/changes.html#q546]:

C.8.52 G.2 Lexical scanner

The tokenizer in the appendix allowed backslashes in the URI token, in contradiction with the same token in the core grammar and the error recovery token {baduri}:

Change from

    {U}{R}{L}"("{w}{string}{w}")"      {return URI;}
    {U}{R}{L}"("{w}{url}{w}")"         {return URI;}

to

    "url("{w}{string}{w}")"            {return URI;}
    "url("{w}{url}{w}")"               {return URI;}

(Note that the Change description mentions an incompatibility with the BAD_URI token, which is rather disingenuous given that this token was first introduced at the same time as the Change.)


This led to a discussion (starting at http://lists.w3.org/Archives/Public/www-style/2011Apr/0680.html ) concerning the validity of a particular test case in the test suite.


The issue was noted in http://lists.w3.org/Archives/Public/www-style/2012Apr/0152 to have an impact on the attr() function from css3-values.

Comment 2 Anton P 2013-01-14 18:21:12 UTC
The WG resolved[1] to permit escaping of the letters 'u', 'r' and 'l' at the start of the URI token (and implicitly resolved to do the same for BAD_URI).

[1] http://lists.w3.org/Archives/Public/www-style/2013Jan/0080.html
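
One way to picture the resolved behaviour is a prefix matcher in which each of the three letters may appear literally, as a hexadecimal escape, or as a backslash followed by the letter itself. The Python sketch below is only an illustration under that assumption: `letter`, `URL_PREFIX` and `starts_uri` are hypothetical names, the hex-escape pattern is an approximation rather than a quotation of any spec macro, and only the start of the token is checked, not the remainder of URI or BAD_URI.

```python
import re

# Optional single whitespace that may terminate a hexadecimal escape.
WS_AFTER_HEX = r"(?:\r\n|[ \t\r\n\f])?"


def letter(ch: str) -> str:
    """Regex fragment matching `ch` literally or escaped (hypothetical helper)."""
    lower, upper = ord(ch.lower()), ord(ch.upper())
    # Hex escape of either case, with up to four leading zeros (six digits max).
    hex_escape = rf"\\0{{0,4}}(?:{upper:x}|{lower:x}){WS_AFTER_HEX}"
    # Literal letter, hex escape, or backslash followed by the letter.
    return rf"(?:{ch}|{hex_escape}|\\{ch})"


# Escape-agnostic "url(" prefix, matched case-insensitively.
URL_PREFIX = re.compile("".join(letter(c) for c in "url") + r"\(", re.I)


def starts_uri(css: str) -> bool:
    """True if `css` begins with "url(" in any escaped spelling."""
    return URL_PREFIX.match(css) is not None


print(starts_uri("url(image.png)"))        # True
print(starts_uri(r"\75 rl(image.png)"))    # True: 'u' written as \75
print(starts_uri(r"u\72l(image.png)"))     # True: 'r' written as \72
print(starts_uri("uri(image.png)"))        # False
```

A tokenizer that unescapes the identifier first and then compares it with "url" would be an alternative design with the same observable result for these inputs.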