<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>17514</bug_id>
          
          <creation_ts>2012-06-16 14:28:21 +0000</creation_ts>
          <short_desc>URI token should be agnostic to escaping the characters &apos;u&apos;, &apos;r&apos;, &apos;l&apos; (reopening of Issue 23)</short_desc>
          <delta_ts>2013-01-14 18:21:12 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>CSS</product>
          <component>CSS Level 2</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>Windows 3.1</op_sys>
          <bug_status>NEW</bug_status>
          <resolution></resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Anton P">antonsforums</reporter>
          <assigned_to name="Bert Bos">bert</assigned_to>
          
          
          <qa_contact>public-css-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>69147</commentid>
    <comment_count>0</comment_count>
    <who name="Anton P">antonsforums</who>
    <bug_when>2012-06-16 14:28:21 +0000</bug_when>
    <thetext>4.1.1 (Tokenization) defines the URI token as:

  # URI    url\({w}{string}{w}\)
  #       |url\({w}([!#$%&amp;*-\[\]-~]|{nonascii}|{escape})*{w}\)

(and similarly for BAD_URI), whilst G.2 (Lexical scanner) gives:

  # &quot;url(&quot;{w}{string}{w}&quot;)&quot;   {return URI;}
  # &quot;url(&quot;{w}{url}{w}&quot;)&quot;    {return URI;}

(and similarly for BAD_URI).

This means that if any of the characters &apos;u&apos;, &apos;r&apos;, &apos;l&apos; is escaped in a property value intended to match the &lt;uri&gt; value type, then what you might have expected to be tokenized as URI is actually tokenized as FUNCTION.
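
As a rough illustration of that strict reading, here is a minimal Python sketch of the two competing tokens. It is an assumption-laden simplification, not the normative grammar: the macro names mirror 4.1.1, but the string macro is reduced to its bare bones and first_token() is a hypothetical helper invented for this example.

```python
import re

# Simplified macros modelled on CSS 2.1 section 4.1.1 (sketch only).
W = r"[ \t\r\n\f]*"
UNICODE = r"\\[0-9a-fA-F]{1,6}(?:\r\n|[ \t\r\n\f])?"
ESCAPE = "(?:" + UNICODE + r"|\\[^\r\n\f0-9a-fA-F])"
NONASCII = r"[^\x00-\x9f]"
NMSTART = "(?:[_a-zA-Z]|" + NONASCII + "|" + ESCAPE + ")"
NMCHAR = "(?:[_a-zA-Z0-9-]|" + NONASCII + "|" + ESCAPE + ")"
IDENT = "-?" + NMSTART + NMCHAR + "*"

# Assumption: a cut-down {string} that accepts backslash pairs but not
# the backslash-newline continuation of the real macro.
STRING = (r'(?:"(?:[^\n\r\f\\"]|\\.)*"' + "|" + r"'(?:[^\n\r\f\\']|\\.)*')")

# \x26 is the ampersand, written as a hex escape to keep the class markup-safe.
URL_BODY = r"(?:[!#$%\x26*-\[\]-~]|" + NONASCII + "|" + ESCAPE + ")*"

# The URI token begins with the literal letters u, r, l: no escapes allowed.
URI = re.compile(r"url\(" + W + "(?:" + STRING + "|" + URL_BODY + ")" + W + r"\)",
                 re.IGNORECASE)
FUNCTION = re.compile(IDENT + r"\(")


def first_token(css):
    """Name the token that starts the input under the strict reading."""
    if URI.match(css):
        return "URI"
    if FUNCTION.match(css):
        return "FUNCTION"
    return "OTHER"
```

With these definitions, first_token(r"url(foo.png)") gives "URI", whereas first_token(r"\75rl(foo.png)") and first_token(r"u\72 l(foo.png)") both give "FUNCTION", because the escaped letters can only be consumed by the ident macro.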

However, this doesn&apos;t match UAs; Trident, Gecko and Presto all allow the characters to be escaped and still invoke the normal URI token parsing.

The spec should be changed to match reality.

The conversation begins with this bug report:
http://lists.w3.org/Archives/Public/www-style/2012May/0327.html</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>69148</commentid>
    <comment_count>1</comment_count>
    <who name="Anton P">antonsforums</who>
    <bug_when>2012-06-16 15:15:32 +0000</bug_when>
    <thetext>This issue has an interesting history.


It began life as Issue 23 [http://wiki.csswg.org/spec/css2.1#issue-23]:

URL
    http://lists.w3.org/Archives/Public/www-style/2007Dec/0215.html
Summary
    &quot;url(&quot; needs to be {u}{r}{l}&quot;(&quot; in grammar
Resolution
    Assumed editorial.
Status
    Closed.
Testcases
    uri-015


It was implemented in the 2009-04-23 CR as Change 5.89 [http://www.w3.org/TR/CSS21/changes.html#q376]:

C.5.89 Section G.2 Lexical scanner

[2008-03-05] Change the tokenizer rules

from

&quot;url(&quot;{w}{string}{w}&quot;)&quot; {return URI;}
&quot;url(&quot;{w}{url}{w}&quot;)&quot;    {return URI;}

to

{U}{R}{L}&quot;(&quot;{w}{string}{w}&quot;)&quot;	{return URI;}
{U}{R}{L}&quot;(&quot;{w}{url}{w}&quot;)&quot;	{return URI;}


Unfortunately, this change only happened in G.2, rather than uniformly across both G.2 and 4.1.1, as observed by Bjoern Hoehrmann in http://lists.w3.org/Archives/Public/www-style/2010Jul/0499 .  This led to the change being reversed in the 2011-04-12 PR as Change 8.52 [http://www.w3.org/TR/CSS21/changes.html#q546]:

C.8.52 G.2 Lexical scanner

The tokenizer in the appendix allowed backslashes in the URI token, in contradiction with the same token in the core grammar and the error recovery token {baduri}:

Change from

    {U}{R}{L}&quot;(&quot;{w}{string}{w}&quot;)&quot;      {return URI;}
    {U}{R}{L}&quot;(&quot;{w}{url}{w}&quot;)&quot;         {return URI;}

to

    &quot;url(&quot;{w}{string}{w}&quot;)&quot;            {return URI;}
    &quot;url(&quot;{w}{url}{w}&quot;)&quot;               {return URI;}

(Note that the Change description mentions an incompatibility with the BAD_URI token, which is rather disingenuous given that this token was first introduced at the same time as the Change.)


This led to a discussion (starting at http://lists.w3.org/Archives/Public/www-style/2011Apr/0680.html ) concerning the validity of a particular test case in the test suite.


The issue was noted in http://lists.w3.org/Archives/Public/www-style/2012Apr/0152 to have an impact on the attr() function from css3-values.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>81347</commentid>
    <comment_count>2</comment_count>
    <who name="Anton P">antonsforums</who>
    <bug_when>2013-01-14 18:21:12 +0000</bug_when>
    <thetext>The WG resolved[1] to permit escaping of the letters &apos;u&apos;, &apos;r&apos; and &apos;l&apos; at the start of the URI token (and implicitly resolved to do the same for BAD_URI).

[1] http://lists.w3.org/Archives/Public/www-style/2013Jan/0080.html</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>