13469 – Enable Web page authors to override text/IRI content

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	LC1 HTML Microdata (editor: Ian Hickson)
Version:	unspecified
Hardware:	PC Linux

Importance:	P3 normal
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

Reported:	2011-07-30 16:07 UTC by Manu Sporny
Modified:	2011-08-21 18:45 UTC
CC List:	6 users

Description Manu Sporny 2011-07-30 16:07:33 UTC

This feedback is filed as a personal comment and is not intended to be any sort
of official feedback from any standards working group.

Currently, Microdata does not allow authors to override text content. Overriding IRI content is less important, but supporting that would be nice as well. 

http://manu.sporny.org/2011/uber-comparison-rdfa-md-uf/#text-iri-override

Microdata does provide a mechanism to express "hidden" content via <meta> and <link>, but neither mechanism indicates whether the data expressed overrides something on the page. For example, you can do this in RDFa:

<span property="weight" content="15" datatype="kilograms">fifteen kilos</span>

This is good for at least three reasons:

1. It allows the author to specify what part of the page is being overridden.

2. It enables us to automatically build reasoning maps - for example, a machine could know that one way to interpret the string "fifteen kilos" is as a unit of measurement - weight, with a specific datatype of "kilograms".

3. It enables search engines, and other crawlers, to detect if the data that is being expressed in Microdata/RDFa on the page matches up with what is visible to the people looking at the page. You cannot do that with <meta> and <link> - which leaves <meta> and <link> more open to abuse by overzealous SEO folks.

I suggest that Microdata stop allowing <meta> and <link> in the places where it expanded their scope, and instead fall back to an attribute that carries the value, such as @itemvalue.

Comment 1 Philip Jägenstedt 2011-07-30 22:43:37 UTC

This would be resolved by bug 13240 if you want to comment over there.

Comment 2 Ian 'Hixie' Hickson 2011-08-02 07:16:21 UTC

This wouldn't actually be solved by bug 13240, because <data> would still not associate the DOM contents with the value.

There's no difference in microdata between these two:

   <div itemscope>
     <h1 itemprop=a>bla<em itemprop=b>foo</em>bla</h1>
   </div>

   <div itemscope>
     <meta itemprop=a content="blafoobla">
     <meta itemprop=b content=foo>
   </div>

These are _identical_ from the microdata perspective. The fact that in one some text is shared between properties, that some text is emphasised, that there are elements, etc, is irrelevant to the microdata model.
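
As a rough illustration of that equivalence, here is a minimal Python sketch. It is not the microdata extraction algorithm from the spec: PropertyCollector and properties() are made-up names, only <meta> and text-valued elements are handled, and nested items, itemref, and URL-valued elements are ignored. Under those assumptions, both snippets above yield the same name/value pairs.

```python
from html.parser import HTMLParser

# Elements that never get an end tag; only <meta> carries a value here.
VOID = {"meta", "link", "br", "hr", "img", "input"}


class PropertyCollector(HTMLParser):
    """Collect (name, value) pairs from itemprop elements (simplified)."""

    def __init__(self):
        super().__init__()
        # One entry per open element: None, or [name, text_parts].
        self.open_elements = []
        self.props = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if tag == "meta":
            if name is not None:
                # <meta> takes its value from content="".
                self.props.append((name, attrs.get("content", "")))
            return
        if tag in VOID:
            return
        self.open_elements.append([name, []] if name is not None else None)

    def handle_endtag(self, tag):
        if tag in VOID or not self.open_elements:
            return
        entry = self.open_elements.pop()
        if entry is not None:
            # Other elements take their value from their text content.
            self.props.append((entry[0], "".join(entry[1])))

    def handle_data(self, data):
        # Text belongs to every enclosing itemprop element.
        for entry in self.open_elements:
            if entry is not None:
                entry[1].append(data)


def properties(html):
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.props)


INLINE = """
<div itemscope>
  <h1 itemprop=a>bla<em itemprop=b>foo</em>bla</h1>
</div>
"""

HIDDEN = """
<div itemscope>
  <meta itemprop=a content="blafoobla">
  <meta itemprop=b content=foo>
</div>
"""

print(properties(INLINE) == properties(HIDDEN))
```

The sketch assumes well-formed nesting; the shared text, the emphasis, and the element structure all disappear from the result, which is the point being made.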

There would similarly be absolutely no difference between these three:

   <data itemprop=a value=foo>bar</data>
   <data itemprop=a value=foo>quux</data>
   <data itemprop=a value=foo></data>

They all convey _exactly_ the same microdata semantics.
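
A small Python sketch of the same point, assuming the <data> element works as proposed in bug 13240 (the property value comes from value="" and the element's contents are ignored). DataValueCollector and data_properties() are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class DataValueCollector(HTMLParser):
    """Collect (name, value) pairs from <data itemprop=... value=...>."""

    def __init__(self):
        super().__init__()
        self.props = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "data" and "itemprop" in attrs:
            # Assumption: the value is taken from value="" only;
            # the text inside the element is never consulted.
            self.props.append((attrs["itemprop"], attrs.get("value")))


def data_properties(html):
    collector = DataValueCollector()
    collector.feed(html)
    collector.close()
    return collector.props


VARIANTS = [
    "<data itemprop=a value=foo>bar</data>",
    "<data itemprop=a value=foo>quux</data>",
    "<data itemprop=a value=foo></data>",
]

results = [data_properties(variant) for variant in VARIANTS]
print(all(result == results[0] for result in results))
```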


Regarding the original request's reasons:

1. What is the use case for this? It doesn't seem necessary. There is nothing in microdata like this currently (other than top-level microdata items being associated with a particular element); why would adding this help?

2. This is already possible, just make sure the vocabulary defines the units. (We learnt from scheme="" on <meta> that having this associated with the data rather than the definition of the property is bad design.)

3. The assertion in this reason seems false. It does not permit such checking as far as I can tell. If it did, the whole feature wouldn't be necessary at all.


EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: No compelling use cases.

Comment 3 Michael[tm] Smith 2011-08-04 05:05:44 UTC

mass-move component to LC1

Comment 4 Manu Sporny 2011-08-09 00:37:29 UTC

The first part of your response misses the point of the use cases that I described. The goal isn't to remove markup tags, the goal is to override one set of text with another and be able to specify the DOM element whose contents are overridden.

"<data itemprop=a value=foo>quux</data>"

Is that valid HTML5 markup? I couldn't find the data element nor could I find the rules for processing the DATA element in the Microdata specification. If links to those two things exist, that element and respective Microdata processing rules address the use case I was concerned about.

"1. What is the use case for this? It doesn't seem necessary. There is nothing
in microdata like this currently (other than top-level microdata items being
associated with a particular element); why would adding this help?"

It would enable two use cases:

1. Allow search engine companies to determine if the data from the text override on the page matches the text that was overridden, thus providing an additional tool to combat nasty SEO practices. Hidden data is bad, so at least allowing a spider to examine if the hidden data and the non-hidden data match up is a good thing. For example if the value "14" is the override and "fourteen" is the overridden text, then a spider can reason that a) the data matches the text and b) the text was shown on the page at some point so is less likely to be data spam.

2. It would allow debuggers, like the one built into Google Chrome, to highlight the sections of a page that particular pieces of data came from down to the exact span of text on the page.
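
To make the first use case concrete, here is a minimal Python sketch of the kind of check a spider might run. Everything in it is an assumption for illustration: @itemvalue is the attribute suggested in the original description and does not exist in Microdata, the "age" property is invented, NUMBER_WORDS is a tiny hand-written table, and OverrideCollector assumes the overriding element contains only text.

```python
from html.parser import HTMLParser

# Tiny illustrative lookup; a real checker would need far more than this.
NUMBER_WORDS = {"fourteen": "14", "fifteen": "15"}


class OverrideCollector(HTMLParser):
    """Record (property, override, visible text) for elements that carry
    the hypothetical itemvalue attribute. Assumes no child elements."""

    def __init__(self):
        super().__init__()
        self.current = None  # [name, override, text_parts] while inside one
        self.found = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs and "itemvalue" in attrs:
            self.current = [attrs["itemprop"], attrs["itemvalue"], []]

    def handle_data(self, data):
        if self.current is not None:
            self.current[2].append(data)

    def handle_endtag(self, tag):
        if self.current is not None:
            name, override, parts = self.current
            self.found.append((name, override, "".join(parts).strip()))
            self.current = None


def override_matches(visible, override):
    """True if the visible text plausibly expresses the override value."""
    visible = visible.strip().lower()
    override = override.strip().lower()
    return visible == override or NUMBER_WORDS.get(visible) == override


def check(html):
    collector = OverrideCollector()
    collector.feed(html)
    collector.close()
    return [
        (name, override, visible, override_matches(visible, override))
        for name, override, visible in collector.found
    ]


HONEST = '<p>Age: <span itemprop="age" itemvalue="14">fourteen</span></p>'
SUSPECT = '<p>Age: <span itemprop="age" itemvalue="99">fourteen</span></p>'

print(check(HONEST))
print(check(SUSPECT))
```

The check is only as good as its lookup table, so this shows what associating the override with an element makes available to a tool, not that the comparison is easy in general.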

"2. This is already possible, just make sure the vocabulary defines the units.
(We learnt from scheme="" on <meta> that having this associated with the data
rather than the definition of the property is bad design.)"

There are issues with this approach, see:

http://www.w3.org/Bugs/Public/show_bug.cgi?id=13465#c4

"3. The assertion in this reason seems false. It does not permit such checking
as far as I can tell. If it did, the whole feature wouldn't be necessary at
all."

I explain this a bit more above - do you understand the intent now?

Comment 5 Ian 'Hixie' Hickson 2011-08-15 04:44:55 UTC

<data> is a proposal, see bug 13240.

> 1. Allow search engine companies to determine if the data from the text
> override on the page matches the text that was overridden, thus providing an
> additional tool to combat nasty SEO practices. Hidden data is bad, so at least
> allowing a spider to examine if the hidden data and the non-hidden data match
> up is a good thing. For example if the value "14" is the override and
> "fourteen" is the overridden text, then a spider can reason that a) the data
> matches the text and b) the text was shown on the page at some point so is less
> likely to be data spam.

This is false. If it was possible for spiders to do this kind of reasoning in the first place, then spiders wouldn't need microdata. Indeed if tools were to decide whether or not to trust the microdata in a page based on non-microdata on the page, that tool would be in violation of the microdata processing rules.


> 2. It would allow debuggers, like the one built into Google Chrome, to
> highlight the sections of a page that particular pieces of data came from down
> to the exact span of text on the page.

The piece of data in the proposal would come from the attribute, not the element's contents. It certainly would allow a development tool to highlight the (possibly unrelated) contents of the element that happened to have the attribute, but that's not especially more useful than highlighting a <meta> element immediately before that same content, which is possible today.
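
As a sketch of what a development tool could do today, the following Python pairs each <meta itemprop> with the first non-blank text that follows it. MetaNeighbourFinder, adjacent_text(), and the "age" property are hypothetical names; the pairing relies purely on the authoring convention of placing the <meta> immediately before the related content, which nothing enforces.

```python
from html.parser import HTMLParser


class MetaNeighbourFinder(HTMLParser):
    """Pair each <meta itemprop> with the next non-blank text node."""

    def __init__(self):
        super().__init__()
        self.pending = None  # (name, value) of a <meta> awaiting text
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "itemprop" in attrs:
            self.pending = (attrs["itemprop"], attrs.get("content", ""))

    def handle_data(self, data):
        text = data.strip()
        if self.pending is not None and text:
            # The text to highlight is whatever happens to come next.
            self.pairs.append((*self.pending, text))
            self.pending = None


def adjacent_text(html):
    finder = MetaNeighbourFinder()
    finder.feed(html)
    finder.close()
    return finder.pairs


PAGE = '<p itemscope>Age: <meta itemprop="age" content="14">fourteen years old</p>'

print(adjacent_text(PAGE))
```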


Status: Rejected
Change Description: no spec change
Rationale: Use cases are still not compelling.

Comment 6 Manu Sporny 2011-08-21 18:45:28 UTC

> This is false. If it was possible for spiders to do this kind of reasoning in
> the first place, then spiders wouldn't need microdata. Indeed if tools were to
> decide whether or not to trust the microdata in a page based on non-microdata
> on the page, that tool would be in violation of the microdata processing rules.

I don't understand why you wrote the first and second sentences above - please elaborate on what you mean.

The third sentence seems to be a layer violation - what a higher-level application does with Microdata and non-Microdata on the page has nothing to do with the Microdata processing rules. I don't remember reading anything in the Microdata spec that limits what a higher-level application may do with the Microdata in a page. Do you have a link to that language?

>> 2. It would allow debuggers, like the one built into Google Chrome, to
>> highlight the sections of a page that particular pieces of data came from 
>> down to the exact span of text on the page.
>
> The piece of data in the proposal would come from the attribute, not the
> element's contents. It certainly would allow a development tool to highlight
> the (possibly unrelated) contents of the element that happened to have the
> attribute, but that's not especially more useful than highlighting a <meta>
> element immediately before that same content, which is possible today.

Having a WYSIWYG editor that highlighted the exact text that is associated with the word "fourteen" and the data "14" retrieved from the page would be helpful. Having a spider understand that the on-page data doesn't deviate from the "hidden" data in the page would also be useful. The fact remains that <meta> does not allow you to do that.

The resolution does not address my concerns, but I will not be pursuing the matter further as this shortcoming has been raised repeatedly and the answer has always been the same.