This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 10167 - HTML5 Polyglot spec breaks RDFa case sensitivity
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML/XHTML Compat. Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: All All
Importance: P2 critical
Target Milestone: FPWD
Assignee: Eliot Graff
QA Contact: HTML WG Bugzilla archive list
URL: http://www.w3.org/TR/2010/WD-html-pol...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-07-14 16:01 UTC by Manu Sporny
Modified: 2010-10-29 20:35 UTC

Description Manu Sporny 2010-07-14 16:01:34 UTC
The polyglot spec currently states the following:

http://www.w3.org/TR/2010/WD-html-polyglot-20100624/#attribute-values

[[[
Polyglot markup uses lowercase letters for the values of the attributes in the following list when they exist on HTML elements. More specifically, where required, polyglot markup must use lower case letters for all ASCII letters in these attribute values; however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters. Attributes for HTML elements other than those in the following list may have values made of mixed case letters. All attributes on non-HTML elements may have values made of mixed case letters.
]]]

This means that authors won't be able to use case-sensitive vocabulary terms in RDFa 1.1, which is a bad thing. Take the following code snippet as an example:

This document conforms to the <a vocab="http://purl.org/dc/terms/" rel="http://www.w3.org/TR/html5/conformsTo" href="http://www.w3.org/TR/html5/">HTML5</a> standard.

If that's not convincing, the same would apply to URLs:

This document conforms to the <a rel="http://purl.org/dc/terms/conformsTo" href="http://www.w3.org/TR/html5/">HTML5</a> standard.

Based on the rules above, the author would be forced to lower-case the URL, which would create the following triple:

<> <http://purl.org/dc/terms/conformsto> <http://www.w3.org/TR/html5> .

Note that the predicate URL is lower-cased, which is a meaningless predicate - it won't dereference to the correct machine-readable URL. This issue can be resolved by adding text with something to this effect:

"However, attribute values that are designed to be case sensitive, like certain RDFa predicate values or URLs placed in @rel and @rev MUST be specified in a case sensitive manner."

You could also resolve the issue by stating that only enumerated attribute values MUST be lowercased, all other attribute values MUST preserve case.
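
To make the effect concrete, here is a minimal Python sketch of what lower-casing does to the predicate. The `ascii_lower` helper is hypothetical (it is not part of any spec or library); it only mirrors the "lower case letters for all ASCII letters" wording quoted above.

```python
import string

# ASCII-only lowercasing, as the quoted polyglot text describes:
# non-ASCII letters are left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(value: str) -> str:
    """Hypothetical helper: lowercase only the ASCII letters in value."""
    return value.translate(_ASCII_LOWER)


# The @rel value from the second example above, which RDFa uses as the predicate.
predicate = "http://purl.org/dc/terms/conformsTo"
lowered = ascii_lower(predicate)

print(lowered)               # http://purl.org/dc/terms/conformsto
print(predicate == lowered)  # False
```

Because the two strings differ, a consumer that compares predicates character by character sees two unrelated predicates.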
Comment 1 Manu Sporny 2010-07-14 16:08:03 UTC
This markup is wrong:

> This document conforms to the <a vocab="http://purl.org/dc/terms/"
> rel="http://www.w3.org/TR/html5/conformsTo"
> href="http://www.w3.org/TR/html5/">HTML5</a> standard.

it should be this:

This document conforms to the <a vocab="http://purl.org/dc/terms/" rel="conformsTo" href="http://www.w3.org/TR/html5/">HTML5</a> standard.
Comment 2 Eliot Graff 2010-09-09 00:26:16 UTC
Manu,

Do you have someplace specific for me to point to to reference the last sentence of this change?

I've updated the spec to read as such:

6.2.3 Attribute Values

Polyglot markup uses lowercase letters for the values of the attributes in the following list when they exist on HTML elements. More specifically, where required, polyglot markup must use lower case letters for all ASCII letters in these attribute values; however, case requirements do not apply to non-ASCII letters such as Greek, Cyrillic, or non-ASCII Latin letters. For attribute values on HTML elements other than those in the following list, polyglot markup may use mixed case letters.

Because XML is case sensitive, polyglot markup also requires case to be consistent for values between markup, DOM APIs, and CSS. In addition, polyglot markup respects the case sensitivity of all other attribute values. Although polyglot markup must always have lowercase values of the attributes in the following list when they exist on HTML elements, attributes not in this list and attributes on non-HTML elements may have values made of mixed case letters. Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes. 


Thanks,

Eliot
Comment 3 Eliot Graff 2010-09-27 21:38:41 UTC
Manu,

Unless you have other objections, I believe that the changes made satisfy your concerns and I'm resolving this bug. Thanks so very much.
Comment 4 Manu Sporny 2010-10-04 02:47:57 UTC
Sorry it took so long to get back to you...

(In reply to comment #2)
> Manu,
> 
> Do you have someplace specific for me to point to to reference the last
> sentence of this change?

No need to point to anything specific there, imho.

> I've updated the spec to read as such:
> 
> 6.2.3 Attribute Values
> ...
> Because XML is case sensitive, polyglot markup also requires case to be
> consistent for values between markup, DOM APIs, and CSS. In addition, polyglot
> markup respects the case sensitivity of all other attribute values. Although
> polyglot markup must always have lowercase values of the attributes in the
> following list when they exist on HTML elements, attributes not in this list
> and attributes on non-HTML elements may have values made of mixed case letters.
> Note that other specifications, such as RDFa, may place additional restrictions
> on the allowed values of certain attributes. 

Hmm... so @rel is in the list of attributes whose values must be lower-cased, and it is also an RDFa attribute that requires case to be preserved. The text that you have states, for @rel: "polyglot markup must always have lowercase values of the attributes in the following list when they exist". Nothing in that paragraph seems to indicate that case must be preserved for attribute values in @rel in Polyglot documents.

It almost seems as if you're saying - you must lower-case attribute values for @rel. In other words, the following markup:

This document conforms to the <a vocab="http://purl.org/dc/terms/"
rel="conformsTo" href="http://www.w3.org/TR/html5/">HTML5</a> standard.

should express rel="conformsTo" as rel="conformsto" per Polyglot markup. What bit of the text that you added prevents that from happening, as it's not that clear to me?
Comment 5 Eliot Graff 2010-10-07 18:49:10 UTC
(In reply to comment #4)
> What bit of the text that you added prevents that from happening, as it's not that clear to me?

Would this note suffice?

Note that polyglot markup is case-consistent for values on the <code>rel</code> attribute. This is because XML treats the following as two different values for the <code>rel</code> attribute:

<a rel=friend href="http://www.friendlysite.com/">My buddy</a>
<a rel=FRIEND href="http://www.friendlysite.com/">My buddy</a>
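
As a rough illustration of the XML side of this note, the following Python sketch parses both elements with the standard library's `xml.etree.ElementTree` and compares the raw attribute values. It assumes the values are quoted, since unquoted attribute values are not well-formed XML; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Attribute values are quoted here because unquoted values are not
# well-formed XML; the markup is otherwise the pair from the note.
lower = ET.fromstring('<a rel="friend" href="http://www.friendlysite.com/">My buddy</a>')
upper = ET.fromstring('<a rel="FRIEND" href="http://www.friendlysite.com/">My buddy</a>')

rel_lower = lower.get("rel")
rel_upper = upper.get("rel")

# A generic XML parser hands back each value exactly as written.
print(rel_lower, rel_upper)    # friend FRIEND
print(rel_lower == rel_upper)  # False
```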

Thanks,

Eliot
Comment 6 Manu Sporny 2010-10-08 00:13:01 UTC
(In reply to comment #5)
> (In reply to comment #4)
> > What bit of the text that you added prevents that from happening, as it's not that clear to me?
> 
> Would this note suffice?
> 
> Note that polyglot markup is case-consistent for values on the <code>rel</code>
> attribute. This is because XML treats the following as two different values for
> the <code>rel</code> attribute:
> 
> <a rel=friend href="http://www.friendlysite.com/">My buddy</a>
> <a rel=FRIEND href="http://www.friendlysite.com/">My buddy</a>

That's great, Eliot - works for me.

I'm going to ask the RDFa WG to look at this issue and make sure that they agree with the text. If you haven't heard back from us in 7 days, please RESOLVE this bug and assume that we're fine with the text above.

Thanks for all of the hard work on the Polyglot spec - your time and energy are very much appreciated. :)
Comment 7 Toby Inkster 2010-10-08 09:53:15 UTC
Given that neither of HTML 5 nor XHTML 5 require rel values to be lowercased, I can't see why the polyglot spec (which aims to help authors write documents that conform to both) should require it.

i.e. rel="FRIEND" and rel="friend" are considered equivalent under HTML 5 and XHTML 5.

A generalised XML processor with no special knowledge of XHTML will have problems of course, but merely lowercasing the attribute won't help such a processor. Consider rel="friend met" versus rel="met friend" which are equivalent under HTML and XHTML rules, but a generalised XML processor won't treat as equivalent.
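
A small Python sketch of that difference, under the assumption that HTML-aware processing treats rel as an unordered set of space-separated, ASCII case-insensitive tokens. The `link_types` helper is hypothetical and only models that comparison; it is not taken from any spec text.

```python
import re
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def link_types(value: str) -> frozenset:
    """Hypothetical helper: split on HTML space characters, ASCII-lowercase
    each token, and discard order and duplicates."""
    tokens = [t for t in re.split(r"[ \t\n\f\r]+", value) if t]
    return frozenset(t.translate(_ASCII_LOWER) for t in tokens)


# A generic XML processor only has the raw strings to compare.
print("friend met" == "met friend")                          # False

# Token-set comparison treats both orderings as the same value.
print(link_types("friend met") == link_types("met friend"))  # True

# Lowercasing alone fixes case differences but not ordering.
print(link_types("FRIEND") == link_types("friend"))          # True
```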

Note also that this bug should also cover rev.
Comment 8 Shane McCarron 2010-10-08 14:53:06 UTC
I agree with Toby.  But, if the editor feels strongly that these attributes need to be mentioned at all, then please ensure that the case of the input is preserved in the DOM.  If 'case-consistent' means that, then I am happy.
Comment 9 Henri Sivonen 2010-10-12 11:58:15 UTC
(In reply to comment #7)
> Given that neither of HTML 5 nor XHTML 5 require rel values to be lowercased, I
> can't see why the polyglot spec (which aims to help authors write documents
> that conform to both) should require it.

Indeed. The polyglot doc should just document inferences from normative documents. If the inferences are inconvenient, the documents from which the inferences are drawn should be changed if anything is changed.

In this case, the appropriate change would be changing RDFa not to expect case-sensitivity in rel.
Comment 10 Toby Inkster 2010-10-12 16:11:21 UTC
For what it's worth, it's not just RDFa that is broken by this recommendation. The HTML5 and Microdata draft specs both make use of case-sensitive tokens in @rel in some places.
Comment 11 Eliot Graff 2010-10-12 20:00:53 UTC
> Indeed. The polyglot doc should just document inferences from normative
> documents. If the inferences are inconvenient, the documents from which the
> inferences are drawn should be changed if anything is changed.
> 
> In this case, the appropriate change would be changing RDFa not to expect
> case-sensitivity in rel.

Manu, Henri, Toby, Shane, et al.

I think I hear a couple of different things from the last volley of comments. I do not have a strong opinion one way or another, but I would like to have some consensus on this. Before we run down a rabbit hole on these specific instances, though, can we start by looking at the current spec? Section 6.3.3 opens with this statement:

[[
Polyglot markup uses lowercase letters for the values of the attributes in the following list when they exist on HTML elements.
]]

And has this statement in Section 6.3.3, right before the list of attributes whose values must be lowercase when used in HTML:

[[
Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes. 
]]

Do these satisfy the need to respect case-sensitivity from other places? Are there other sentences that you would like to see rewritten to strengthen that notion?

I am open to suggestion here.

Thanks for your help.

Eliot
Comment 12 Toby Inkster 2010-10-12 21:20:01 UTC
"additional restrictions" implies that the polyglot restrictions still apply; what is needed is language that states that other specifications can relax or remove the polyglot restrictions.

But a bigger issue is over why this restriction is in the polyglot spec at all. Polyglot is supposed to be a set of rules derived from looking at the intersection of HTML and XHTML syntax. Neither HTML nor XHTML requires rel or rev values to be lower-cased.
Comment 13 Shane McCarron 2010-10-13 15:43:07 UTC
I agree with Toby - certainly my preferred approach would be to remove mention of rel and rev from this section altogether.
Comment 14 Eliot Graff 2010-10-29 20:35:48 UTC
After careful consideration, I am making changes to section 6.3.3 of the polyglot spec. I believe that these edits will satisfy both Manu's original concerns and those that arose later in this thread. I am therefore going to close this bug after I publish the following:

[[
Polyglot markup requires the case used for characters in the values of the following attributes to be consistent between markup, DOM APIs, and CSS when these attributes are used on HTML elements. This is because XML is case sensitive, but the values of these attributes are treated as case insensitive in HTML when matched via CSS selectors (See <a href="http://dev.w3.org/html5/spec/links.html#selectors">4.14.1 Case-sensitivity</a>, in the HTML5 specification). [[!HTML5]] In addition, polyglot markup respects the case sensitivity of all other attribute values and for non-ASCII characters in the values of the attributes listed. Note that other specifications, such as RDFa, may place additional restrictions on the allowed values of certain attributes.
]]

I think that this satisfies all of the requests, and so I am going to resolve this bug.

Thanks, everyone, for all of your help and feedback.

Eliot