17514 – URI token should be agnostic to escaping the characters 'u', 'r', 'l' (reopening of Issue 23)

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 17514 - URI token should be agnostic to escaping the characters 'u', 'r', 'l' (reopening of Issue 23)

Status:	NEW

Alias:	None

Product:	CSS
Classification:	Unclassified
Component:	CSS Level 2
Version:	unspecified
Hardware:	All Windows 3.1

Importance:	P2 normal
Target Milestone:	---
Assignee:	Bert Bos
QA Contact:	public-css-bugzilla

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-06-16 14:28 UTC by Anton P
Modified:	2013-01-14 18:21 UTC
CC List:	0 users

See Also:

Attachments

Description Anton P 2012-06-16 14:28:21 UTC

4.1.1 (Tokenization) defines the URI token as:

  # URI    url\({w}{string}{w}\)
  #       |url\({w}([!#$%&*-\[\]-~]|{nonascii}|{escape})*{w}\)

(and similarly for BAD_URI), whilst G.2 (Lexical scanner) gives:

  # "url("{w}{string}{w}")"   {return URI;}
  # "url("{w}{url}{w}")"    {return URI;}

(and similarly for BAD_URI).

This means that if you escape any of the characters 'u', 'r', 'l' in a property value intended to match the <uri> value type, then what you might have expected to have tokenized as URI is actually tokenized as FUNCTION.

However, this doesn't match UAs; Trident, Gecko and Presto all allow the characters to be escaped and still invoke the normal URI token parsing.

The spec should be changed to match reality.
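
To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is not taken from the spec or from any UA: the names ESCAPE, unescape and classify are hypothetical, the escape pattern is a simplified stand-in for the {escape} macro, and splitting on the first "(" ignores corner cases such as an escaped parenthesis in the name.

```python
import re

# Simplified stand-in for the spec's {escape} macro (an assumption, not the
# full grammar): a backslash followed by 1-6 hex digits and one optional
# whitespace sequence, or a backslash followed by a non-hex, non-newline char.
ESCAPE = re.compile(
    r"\\([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|\\([^\r\n\f0-9a-fA-F])"
)


def unescape(text):
    """Resolve CSS escapes in an identifier-like string."""
    def repl(match):
        if match.group(1) is not None:
            code_point = int(match.group(1), 16)
            # Out-of-range code points become U+FFFD in this sketch.
            return chr(code_point) if code_point <= 0x10FFFF else "\ufffd"
        return match.group(2)
    return ESCAPE.sub(repl, text)


def classify(value, allow_escapes):
    """Return 'URI' or 'FUNCTION' for a value of the form name(...)."""
    name, paren, _rest = value.partition("(")
    if not paren:
        raise ValueError("expected a value of the form name(...)")
    if allow_escapes:
        # Behaviour reported for UAs: escapes in the name are resolved first.
        name = unescape(name)
    # Without allow_escapes, only the literal letters match, as in "url(".
    return "URI" if name.lower() == "url" else "FUNCTION"


for sample in ["url(foo.png)", r"\75rl(foo.png)", r"u\72 l(foo.png)"]:
    print(sample, classify(sample, False), classify(sample, True))
```

With allow_escapes=False the sketch follows the grammar as currently written, so only the unescaped spelling is classified as URI; with allow_escapes=True all three samples are.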

Conversation begins:
Bug report:
http://lists.w3.org/Archives/Public/www-style/2012May/0327.html

Comment 1 Anton P 2012-06-16 15:15:32 UTC

This issue has an interesting history.


It began life as Issue 23 [http://wiki.csswg.org/spec/css2.1#issue-23]:

URL
    http://lists.w3.org/Archives/Public/www-style/2007Dec/0215.html
Summary
    “url(” needs to be {u}{r}{l}“(” in grammar
Resolution
    Assumed editorial.
Status
    Closed.
Testcases
    uri-015


It was implemented in the 2009-04-23 CR as Change 5.89 [http://www.w3.org/TR/CSS21/changes.html#q376]:

C.5.89 Section G.2 Lexical scanner

[2008-03-05] Change the tokenizer rules

from

"url("{w}{string}{w}")" {return URI;}
"url("{w}{url}{w}")"    {return URI;}

to

{U}{R}{L}"("{w}{string}{w}")"	{return URI;}
{U}{R}{L}"("{w}{url}{w}")"	{return URI;}
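
For readers unfamiliar with the letter macros, the following Python sketch shows roughly what {U}{R}{L}"(" accepts. The pattern built by letter_macro is modelled on my recollection of the G.2 definitions (for example, U as u|\\0{0,4}(55|75)(\r\n|[ \t\r\n\f])?|\\u), so treat its exact shape as an assumption; letter_macro, URL_PREFIX and starts_uri are hypothetical names, and re.IGNORECASE stands in for the scanner's case-insensitive option.

```python
import re


def letter_macro(ch):
    """Build a regex for one letter in the style of the G.2 letter macros.

    Accepts the literal letter, a hex escape of its upper- or lower-case
    code point with up to four leading zeros and one optional trailing
    whitespace sequence, or a backslash followed by the letter itself.
    """
    upper = format(ord(ch.upper()), "x")
    lower = format(ord(ch.lower()), "x")
    return (
        rf"(?:{ch}"
        rf"|\\0{{0,4}}(?:{upper}|{lower})(?:\r\n|[ \t\r\n\f])?"
        rf"|\\{ch})"
    )


# Equivalent of {U}{R}{L}"(" under the assumptions stated above.
URL_PREFIX = re.compile(
    letter_macro("u") + letter_macro("r") + letter_macro("l") + r"\(",
    re.IGNORECASE,
)


def starts_uri(text):
    """True if text begins with an (optionally escaped) url( prefix."""
    return URL_PREFIX.match(text) is not None


for sample in ["url(", r"\75 \72 \6c(", r"\000075rl(", r"\u\r\l(", "uri("]:
    print(sample, starts_uri(sample))
```

The plain "url(" rule corresponds to matching only the first alternative of each macro, which is why the escaped spellings fall through to FUNCTION under that rule.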


Unfortunately, this change only happened in G.2, rather than uniformly across both G.2 and 4.1.1, as observed by Bjoern Hoehrmann in http://lists.w3.org/Archives/Public/www-style/2010Jul/0499 .  This led to the change being reversed in the 2011-04-12 PR as Change 8.52 [http://www.w3.org/TR/CSS21/changes.html#q546]:

C.8.52 G.2 Lexical scanner

The tokenizer in the appendix allowed backslashes in the URI token, in contradiction with the same token in the core grammar and the error recovery token {baduri}:

Change from

    {U}{R}{L}"("{w}{string}{w}")"      {return URI;}
    {U}{R}{L}"("{w}{url}{w}")"         {return URI;}

to

    "url("{w}{string}{w}")"            {return URI;}
    "url("{w}{url}{w}")"               {return URI;}

(Note that the Change description mentions an incompatibility with the BAD_URI token, which is rather disingenuous given that this token was first introduced at the same time as the Change.)


This led to a discussion (starting at http://lists.w3.org/Archives/Public/www-style/2011Apr/0680.html ) concerning the validity of a particular test case in the test suite.


The issue was noted in http://lists.w3.org/Archives/Public/www-style/2012Apr/0152 to have an impact on the attr() function from css3-values.

Comment 2 Anton P 2013-01-14 18:21:12 UTC

The WG resolved[1] to permit escaping of the letters 'u', 'r' and 'l' at the start of the URI token (and implicitly resolved to do the same for BAD_URI).

[1] http://lists.w3.org/Archives/Public/www-style/2013Jan/0080.html