<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>10167</bug_id>
          
          <creation_ts>2010-07-14 16:01:34 +0000</creation_ts>
          <short_desc>HTML5 Polyglot spec breaks RDFa case sensitivity</short_desc>
          <delta_ts>2010-10-29 20:35:48 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>HTML WG</product>
          <component>pre-LC1 HTML/XHTML Compat. Authoring Guide (ed: Eliot Graff)</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc>http://www.w3.org/TR/2010/WD-html-polyglot-20100624/#attribute-values</bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>critical</bug_severity>
          <target_milestone>FPWD</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Manu Sporny">msporny</reporter>
          <assigned_to name="Eliot Graff">eliotgra</assigned_to>
          <cc>eliotgra</cc>
    
    <cc>hsivonen</cc>
    
    <cc>julian.reschke</cc>
    
    <cc>mail</cc>
    
    <cc>mike</cc>
    
    <cc>public-html-admin</cc>
    
    <cc>public-html-wg-issue-tracking</cc>
    
    <cc>shane</cc>
          
          <qa_contact name="HTML WG Bugzilla archive list">public-html-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>36840</commentid>
    <comment_count>0</comment_count>
    <who name="Manu Sporny">msporny</who>
    <bug_when>2010-07-14 16:01:34 +0000</bug_when>
    <thetext>The polyglot spec currently states the following:

http://www.w3.org/TR/2010/WD-html-polyglot-20100624/#attribute-values

[[[
Polyglot markup uses lowercase letters for the values of the attributes in the following list when they exist on HTML elements. More specifically, where required, polyglot markup must use lower case letters for all ASCII letters in these attribute values; however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters. Attributes for HTML elements other than those in the following list may have values made of mixed case letters. All attributes on non-HTML elements may have values made of mixed case letters.
]]]

This means that authors won&apos;t be able to use case-sensitive vocabulary terms in RDFa 1.1, which is a bad thing. Take the following code snippet as an example:

This document conforms to the &lt;a vocab=&quot;http://purl.org/dc/terms/&quot; rel=&quot;http://www.w3.org/TR/html5/conformsTo&quot; href=&quot;http://www.w3.org/TR/html5/&quot;&gt;HTML5&lt;/a&gt; standard.

If that&apos;s not convincing, the same would apply to URLs:

This document conforms to the &lt;a rel=&quot;http://purl.org/dc/terms/conformsTo&quot; href=&quot;http://www.w3.org/TR/html5/&quot;&gt;HTML5&lt;/a&gt; standard.

Based on the rules above, the author would be forced to lower-case the URL, which would create the following triple:

&lt;&gt; &lt;http://purl.org/dc/terms/conformsto&gt; &lt;http://www.w3.org/TR/html5/&gt; .
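
As a rough illustration (a sketch only; the helper name is hypothetical and not taken from any spec or library), applying the lower-casing rule quoted above to the rel value yields a different predicate string, and the two no longer compare equal:

```python
# Hypothetical helper: models an author applying the draft's lower-casing
# rule to an attribute value. It is not part of any RDFa or HTML API.
def lowercase_ascii(value):
    # Lower-case ASCII letters only; non-ASCII letters are left untouched,
    # as the quoted polyglot text describes.
    return "".join(c.lower() if c.isascii() and c.isalpha() else c for c in value)

intended = "http://purl.org/dc/terms/conformsTo"
emitted = lowercase_ascii(intended)

print(emitted)              # http://purl.org/dc/terms/conformsto
print(emitted == intended)  # False
```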

Note that the predicate URL is lower-cased, which is a meaningless predicate - it won&apos;t dereference to the correct machine-readable URL. This issue can be resolved by adding text with something to this effect:

&quot;However, attribute values that are designed to be case sensitive, like certain RDFa predicate values or URLs placed in @rel and @rev, MUST be specified in a case sensitive manner.&quot;

You could also resolve the issue by stating that only enumerated attribute values MUST be lowercased; all other attribute values MUST preserve case.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>36841</commentid>
    <comment_count>1</comment_count>
    <who name="Manu Sporny">msporny</who>
    <bug_when>2010-07-14 16:08:03 +0000</bug_when>
    <thetext>This markup is wrong:

&gt; This document conforms to the &lt;a vocab=&quot;http://purl.org/dc/terms/&quot;
&gt; rel=&quot;http://www.w3.org/TR/html5/conformsTo&quot;
&gt; href=&quot;http://www.w3.org/TR/html5/&quot;&gt;HTML5&lt;/a&gt; standard.

it should be this:

This document conforms to the &lt;a vocab=&quot;http://purl.org/dc/terms/&quot; rel=&quot;conformsTo&quot; href=&quot;http://www.w3.org/TR/html5/&quot;&gt;HTML5&lt;/a&gt; standard.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>38716</commentid>
    <comment_count>2</comment_count>
    <who name="Eliot Graff">eliotgra</who>
    <bug_when>2010-09-09 00:26:16 +0000</bug_when>
    <thetext>Manu,

Do you have someplace specific for me to point to to reference the last sentence of this change?

I&apos;ve updated the spec to read as such:

6.2.3 Attribute Values

Polyglot markup uses lowercase letters for the values of the attributes in the following list when they exist on HTML elements. More specifically, where required, polyglot markup must use lower case letters for all ASCII letters in these attribute values; however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters. For attribute values on HTML elements other than those in the following list, polyglot markup may use mixed case letters.

Because XML is case sensitive, polyglot markup also requires case to be consistent for values between markup, DOM APIs, and CSS. In addition, polyglot markup respects the case sensitivity of all other attribute values. Although polyglot markup must always have lowercase values of the attributes in the following list when they exist on HTML elements, attributes not in this list and attributes on non-HTML elements may have values made of mixed case letters. Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes. 


Thanks,

Eliot</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>39613</commentid>
    <comment_count>3</comment_count>
    <who name="Eliot Graff">eliotgra</who>
    <bug_when>2010-09-27 21:38:41 +0000</bug_when>
    <thetext>Manu,

Unless you have other objections, I believe that the changes made satisfy your concerns and I&apos;m resolving this bug. Thanks so very much.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40527</commentid>
    <comment_count>4</comment_count>
    <who name="Manu Sporny">msporny</who>
    <bug_when>2010-10-04 02:47:57 +0000</bug_when>
    <thetext>Sorry it took so long to get back to you...

(In reply to comment #2)
&gt; Manu,
&gt; 
&gt; Do you have someplace specific for me to point to to reference the last
&gt; sentence of this change?

No need to point to anything specific there, imho.

&gt; I&apos;ve updated the spec to read as such:
&gt; 
&gt; 6.2.3 Attribute Values
&gt; ...
&gt; Because XML is case sensitive, polyglot markup also requires case to be
&gt; consistent for values between markup, DOM APIs, and CSS. In addition, polyglot
&gt; markup respects the case sensitivity of all other attribute values. Although
&gt; polyglot markup must always have lowercase values of the attributes in the
&gt; following list when they exist on HTML elements, attributes not in this list
&gt; and attributes on non-HTML elements may have values made of mixed case letters.
&gt; Note that other specifications, such as RDFa, may place additional restrictions
&gt; on the allowed values of certain attributes. 

Hmm... so @rel is in the attribute values list of attributes that must have lower-cased attribute values and it is also an RDFa attribute that requires case to be preserved. The text that you have states that for @rel: &quot;polyglot markup must always have lowercase values of the attributes in the following list when they exist&quot; - nothing in that paragraph seems to indicate that case must be preserved for attribute values in @rel for Polyglot documents.

It almost seems as if you&apos;re saying - you must lower-case attribute values for @rel. In other words, the following markup:

This document conforms to the &lt;a vocab=&quot;http://purl.org/dc/terms/&quot;
rel=&quot;conformsTo&quot; href=&quot;http://www.w3.org/TR/html5/&quot;&gt;HTML5&lt;/a&gt; standard.

should express rel=&quot;conformsTo&quot; as rel=&quot;conformsto&quot; per Polyglot markup. What bit of the text that you added prevents that from happening, as it&apos;s not that clear to me?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40792</commentid>
    <comment_count>5</comment_count>
    <who name="Eliot Graff">eliotgra</who>
    <bug_when>2010-10-07 18:49:10 +0000</bug_when>
    <thetext>(In reply to comment #4)
&gt; What bit of the text that you added prevents that from happening, as it&apos;s not that clear to me?

Would this note suffice?

Note that polyglot markup is case-consistent for values on the &lt;code&gt;rel&lt;/code&gt; attribute. This is because XML treats the following as two different values for the &lt;code&gt;rel&lt;/code&gt; attribute:

&lt;a rel=friend href=&quot;http://www.friendlysite.com/&quot;&gt;My buddy&lt;/a&gt;
&lt;a rel=FRIEND href=&quot;http://www.friendlysite.com/&quot;&gt;My buddy&lt;/a&gt;

Thanks,

Eliot</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40827</commentid>
    <comment_count>6</comment_count>
    <who name="Manu Sporny">msporny</who>
    <bug_when>2010-10-08 00:13:01 +0000</bug_when>
    <thetext>(In reply to comment #5)
&gt; (In reply to comment #4)
&gt; &gt; What bit of the text that you added prevents that from happening, as it&apos;s not that clear to me?
&gt; 
&gt; Would this note suffice?
&gt; 
&gt; Note that polyglot markup is case-consistent for values on the &lt;code&gt;rel&lt;/code&gt;
&gt; attribute. This is because XML treats the following as two different values for
&gt; the &lt;code&gt;rel&lt;/code&gt; attribute:
&gt; 
&gt; &lt;a rel=friend href=&quot;http://www.friendlysite.com/&quot;&gt;My buddy&lt;/a&gt;
&gt; &lt;a rel=FRIEND href=&quot;http://www.friendlysite.com/&quot;&gt;My buddy&lt;/a&gt;

That&apos;s great, Eliot - works for me.

I&apos;m going to ask the RDFa WG to look at this issue and make sure that they agree with the text. If you haven&apos;t heard back from us in 7 days, please RESOLVE this bug and assume that we&apos;re fine with the text above.

Thanks for all of the hard work on the Polyglot spec - your time and energy are very much appreciated. :)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40835</commentid>
    <comment_count>7</comment_count>
    <who name="Toby Inkster">mail</who>
    <bug_when>2010-10-08 09:53:15 +0000</bug_when>
    <thetext>Given that neither HTML 5 nor XHTML 5 requires rel values to be lowercased, I can&apos;t see why the polyglot spec (which aims to help authors write documents that conform to both) should require it.

i.e. rel=&quot;FRIEND&quot; and rel=&quot;friend&quot; are considered equivalent under HTML 5 and XHTML 5.

A generalised XML processor with no special knowledge of XHTML will have problems of course, but merely lowercasing the attribute won&apos;t help such a processor. Consider rel=&quot;friend met&quot; versus rel=&quot;met friend&quot;, which are equivalent under HTML and XHTML rules but which a generalised XML processor won&apos;t treat as equivalent.
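
A minimal sketch of that difference, assuming rel is treated as an unordered set of space-separated tokens compared without regard to ASCII case (the function names are made up for illustration):

```python
def rel_tokens(value):
    # Split on whitespace and lower-case each token, modelling the
    # HTML-style comparison assumed above (tokens here are ASCII only).
    return frozenset(token.lower() for token in value.split())

def html_style_equal(a, b):
    # Order and case of the tokens do not matter.
    return rel_tokens(a) == rel_tokens(b)

def generic_xml_equal(a, b):
    # A processor with no XHTML knowledge just compares the attribute strings.
    return a == b

print(html_style_equal("friend met", "met friend"))   # True
print(generic_xml_equal("friend met", "met friend"))  # False
print(html_style_equal("FRIEND", "friend"))           # True
print(generic_xml_equal("FRIEND", "friend"))          # False
```

Lower-casing fixes only the last of these comparisons; the reordered tokens still differ as plain strings.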

Note also that this bug should also cover rev.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>40845</commentid>
    <comment_count>8</comment_count>
    <who name="Shane McCarron">shane</who>
    <bug_when>2010-10-08 14:53:06 +0000</bug_when>
    <thetext>I agree with Toby.  But, if the editor feels strongly that these attributes need to be mentioned at all, then please ensure that the case of the input is preserved in the DOM.  If &apos;case-consistent&apos; means that, then I am happy.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41088</commentid>
    <comment_count>9</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2010-10-12 11:58:15 +0000</bug_when>
    <thetext>(In reply to comment #7)
&gt; Given that neither of HTML 5 nor XHTML 5 require rel values to be lowercased, I
&gt; can&apos;t see why the polyglot spec (which aims to help authors write documents
&gt; that conform to both) should require it.

Indeed. The polyglot doc should just document inferences from normative documents. If the inferences are inconvenient, the documents from which the inferences are drawn should be changed if anything is changed.

In this case, the appropriate change would be changing RDFa not to expect case-sensitivity in rel.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41133</commentid>
    <comment_count>10</comment_count>
    <who name="Toby Inkster">mail</who>
    <bug_when>2010-10-12 16:11:21 +0000</bug_when>
    <thetext>For what it&apos;s worth, it&apos;s not just RDFa that is broken by this recommendation. The HTML5 and Microdata draft specs both make use of case-sensitive tokens in @rel in some places.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41164</commentid>
    <comment_count>11</comment_count>
    <who name="Eliot Graff">eliotgra</who>
    <bug_when>2010-10-12 20:00:53 +0000</bug_when>
    <thetext>&gt; Indeed. The polyglot doc should just document inferences from normative
&gt; documents. If the inferences are inconvenient, the documents from which the
&gt; inferences are drawn should be changed if anything is changed.
&gt; 
&gt; In this case, the appropriate change would be changing RDFa not to expect
&gt; case-sensitivity in rel.

Manu, Henri, Toby, Shane, et al.

I think I hear a couple of different things from the last volley of comments. I do not have a strong opinion one way or another, but I would like to have some consensus on this. Before we run down a rabbit hole on these specific instances, though, can we start by looking at the current spec? Section 6.3.3 opens with this statement:

[[
Polyglot markup uses lowercase letters for the values of the attributes in the following list when they exist on HTML elements.
]]

And has this statement in Section 6.3.3, right before the list of attributes whose values must be lowercase when used in HTML:

[[
Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes. 
]]

Do these satisfy the need to respect case-sensitivity from other places? Are there other sentences that you would like to see rewritten to strengthen that notion?

I am open to suggestion here.

Thanks for your help.

Eliot</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41173</commentid>
    <comment_count>12</comment_count>
    <who name="Toby Inkster">mail</who>
    <bug_when>2010-10-12 21:20:01 +0000</bug_when>
    <thetext>&quot;additional restrictions&quot; implies that the polyglot restrictions still apply; what is needed is language stating that other specifications can relax or remove the polyglot restrictions.

But a bigger issue is over why this restriction is in the polyglot spec at all. Polyglot is supposed to be a set of rules derived from looking at the intersection of HTML and XHTML syntax. Neither HTML nor XHTML requires rel or rev values to be lower-cased.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41234</commentid>
    <comment_count>13</comment_count>
    <who name="Shane McCarron">shane</who>
    <bug_when>2010-10-13 15:43:07 +0000</bug_when>
    <thetext>I agree with Toby - certainly my preferred approach would be to remove mention of rel and rev from this section altogether.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>41848</commentid>
    <comment_count>14</comment_count>
    <who name="Eliot Graff">eliotgra</who>
    <bug_when>2010-10-29 20:35:48 +0000</bug_when>
    <thetext>After careful consideration, I am making changes to section 6.3.3 of the polyglot spec. I believe that these edits will satisfy both Manu&apos;s original concerns and those that arose later in this thread. I am therefore going to close this bug after I publish the following:

[[
Polyglot markup requires the case used for characters in the values of the following attributes to be consistent between markup, DOM APIs, and CSS when these attributes are used on HTML elements. This is because XML is case sensitive, but the values of these attributes are treated as case insensitive in HTML when matched via CSS selectors (see &lt;a href=&quot;http://dev.w3.org/html5/spec/links.html#selectors&quot;&gt;4.14.1 Case-sensitivity&lt;/a&gt;, in the HTML5 specification). [[!HTML5]] In addition, polyglot markup respects the case sensitivity of all other attribute values and for non-ASCII characters in the values of the attributes listed. Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes.
]]

I think that this satisfies all of the requests, and so I am going to resolve this bug.

Thanks, everyone, for all of your help and feedback.

Eliot</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>