This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 18766 - I'm not a fan of this spec. It lacks expressive power where needed. Specifically, no effort was made to establish parity with the expressiveness of JSON, the format that Microdata would be an ideal successor of. It would seem logical to position Microdata
Status: RESOLVED NEEDSINFO
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard: *
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-01 07:33 UTC by contributor
Modified: 2013-03-20 20:14 UTC

Description contributor 2012-09-01 07:33:25 UTC
Specification: http://www.w3.org/TR/2011/WD-microdata-20110525/
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

Comment:
I'm not a fan of this spec. It lacks expressive power where needed.
Specifically, no effort was made to establish parity with the expressiveness
of JSON, the format that Microdata would be an ideal successor to. It would
seem logical to position Microdata as the ideal payload format for AJAX
requests in rich internet applications that are based on progressive
enhancement techniques (e.g. using your site's own Microdata-annotated HTML web
pages as your data interchange layer for a JavaScript application instead of a
separate non-accessible JSON-based set of dedicated web services).

Areas needing additional expressive capability:

You can't tell it to pull the value out of an attribute instead of out of the
content of an element, for example.

<div itemscope><a href="http://example.com" itemprop="name">Martha</a></div>

There is no way to get this to set name to "Martha" instead of the URL
"http://example.com" without changing the markup (e.g. wrapping the name in an
otherwise superfluous span element).
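
The behaviour described above can be reproduced with a small Python sketch. It is an illustration only, not the spec's algorithm: the URL_ATTR table covers just a few of the elements the spec treats specially, nested items are ignored, and URLs are not resolved to absolute form. PropertyValues and extract are names made up for this example.

```python
from html.parser import HTMLParser

# Elements whose property value comes from an attribute rather than from
# text content (a partial list, chosen for this sketch).
URL_ATTR = {"a": "href", "area": "href", "link": "href", "img": "src"}


class PropertyValues(HTMLParser):
    """Collects (name, value) pairs for flat, non-nested itemprop elements."""

    def __init__(self):
        super().__init__()
        self.props = []        # collected (name, value) pairs
        self._pending = None   # name of a text-valued property being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("itemprop")
        if name is None:
            return
        if tag in URL_ATTR:
            # Special element: the attribute wins, the content is ignored.
            self.props.append((name, attrs.get(URL_ATTR[tag], "")))
        else:
            self._pending, self._text = name, []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            self.props.append((self._pending, "".join(self._text)))
            self._pending = None


def extract(html):
    parser = PropertyValues()
    parser.feed(html)
    parser.close()
    return parser.props
```

Feeding the markup above to extract() yields the href for "name"; the text "Martha" is only picked up once itemprop is moved onto a span inside the link.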

It would be very powerful to be able to specify the attribute to retrieve the
value from:

<div itemscope><abbr itemprop="name" itemattr="title" title="World Health
Organization">WHO</abbr></div>

This would set the "name" property to "World Health Organization" instead of
"WHO", but there is no attribute for "itemattr", so this syntax is illegal.
Instead you must repeat the text "World Health Organization" elsewhere on the
page, perhaps styled to be hidden, which is non-ideal.

It would then make sense for <a>, <img> and other "special" elements to
behave consistently with the rest of the elements, having their content used
by default, unless an "itemattr" attribute is specified which explicitly
references a target attribute:

<div itemscope><a href="http://example.com/" itemattr="href"
itemprop="location">Example</a></div>

In the above proposed markup, it is the "itemattr" attribute that causes the
URL "http://example.com/" to be used as the value for the "location" property
instead of the string "Example", which would have been used by default.

For elements lacking content (e.g. an img or link element), the "itemattr"
attribute would be required. For example:

<div itemscope><img itemprop="image" src="http://example.com"
alt="Example"></div>

Would be illegal because the img element doesn't have content to be assigned
to the "image" property. An "itemattr" attribute with either the value "alt" or
the value "src" would be required to make the above markup valid.
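
The proposed rule can be summarised as a short Python sketch. Note that "itemattr" is the hypothetical attribute proposed in this report and does not exist in the spec; the function name proposed_value, and the representation of an element as an attribute dict plus optional text, are assumptions made for illustration.

```python
def proposed_value(attrs, text=None):
    """Return an itemprop element's value under the proposed itemattr rule.

    attrs: the element's attributes as a dict.
    text:  the element's text content, or None for elements without content.
    """
    target = attrs.get("itemattr")
    if target is None:
        # Default for every element, including <a> and <img>: use the content.
        if text is None:
            raise ValueError("itemattr is required on elements without content")
        return text
    if target not in attrs:
        raise ValueError("itemattr names a missing attribute: " + target)
    return attrs[target]
```

Under this sketch the abbr example yields "World Health Organization", a link without itemattr yields its text, and the img example raises an error until itemattr="src" or itemattr="alt" is added.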

The way arrays are implemented is just awkward. Depending on how you look at
it, one of two things is true:

1. Every property is an array. There is no mechanism with which to express
either a scalar or an empty array.

2. Every property is either a scalar, or an array with at least two elements.
There is no mechanism with which to express an array with only one element or
an empty array. Additionally, the semantics of a property can suddenly change
when another property with the same name is inserted (i.e. scalars suddenly
become arrays, which is unintuitive and a likely source of errors).
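
The two readings can be made concrete with a small Python sketch of how a consumer might map extracted (name, value) pairs onto JSON-like structures. The function names as_lists and collapse_singletons are invented for this illustration; neither mapping is defined by the spec.

```python
from collections import defaultdict


def as_lists(pairs):
    """Reading 1: every property is an array, even with a single value."""
    out = defaultdict(list)
    for name, value in pairs:
        out[name].append(value)
    return dict(out)


def collapse_singletons(pairs):
    """Reading 2: a property is a scalar unless it has two or more values."""
    return {
        name: values[0] if len(values) == 1 else values
        for name, values in as_lists(pairs).items()
    }


one = [("name", "George")]
two = [("name", "George"), ("name", "Martha")]

print(as_lists(one))             # {'name': ['George']}
print(collapse_singletons(one))  # {'name': 'George'}
print(collapse_singletons(two))  # {'name': ['George', 'Martha']}
```

The last two calls show the type of "name" changing from string to list purely because a second element with the same itemprop was added.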

If you declare a schema, you can type scalar properties as Numeric or
Boolean. Ad hoc payloads with no schema can't define the types of scalars
because the itemtype attribute cannot be used without an itemscope attribute.
For example:

<div itemscope><span itemprop="weight"
itemtype="http://schema.org/Number">1.23e-4</span></div>

The above is illegal because the "weight" property is a scalar and an itemtype
attribute is provided. Removing itemtype causes the value "1.23e-4" to be
treated as a string rather than a parsed numeric value. This is true for other
scalar types as well, such as boolean.
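
In practice this means the consumer has to carry the type information out of band. A minimal Python sketch of that workaround follows; the TYPES table, the "active" and "label" properties, and the helper names are all hypothetical and exist only to illustrate the point.

```python
def parse_bool(text):
    """Interpret the strings 'true' / 'false' (any case) as a boolean."""
    return text.strip().lower() == "true"


def coerce(props, types):
    """Apply consumer-side conversions to string-valued properties.

    Properties without an entry in `types` stay strings.
    """
    return {name: types.get(name, str)(value) for name, value in props.items()}


# Out-of-band knowledge the markup itself cannot express without a schema.
TYPES = {"weight": float, "active": parse_bool}

typed = coerce({"weight": "1.23e-4", "active": "true", "label": "x"}, TYPES)
```

Here "weight" only becomes a number because the consuming code already knows it should be one; nothing in the ad hoc markup says so.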

This is especially odd since dates and URLs are typed scalars, although by
context rather than by an explicitly specified itemtype. Consider this
example:

<div itemscope><a itemprop="a" href="http://example.com">Hi</a><span
itemprop="b">Hi</span></div>

The markup structure for the properties "a" and "b" is identical. But because
the spec defines different semantics for the a element versus the span
element, the two properties are parsed and typed differently. The property "a"
will be taken from the "href" attribute of the a element, and the property "b"
will be taken from the textual content of the span element. This is not
intuitive and requires document authors and maintainers to remember which tags
and attributes have special semantics.

There is no mechanism by which to express the value "null", although a hack
would be to set an itemtype of "http://example.com/Null" on an empty item, and
leave it up to the consuming software to do the substitution. This is
obviously non-ideal, but could also be used to express the boolean values
"true" and "false", or other enumerated types' values.

A mechanism is provided by which a value can be expressed once, then
referenced by multiple items, avoiding repetition. For example:

<div itemscope itemref="a"></div><div itemscope><span itemprop="name"
id="a">George</span></div>

This markup defines two top level items, both of which have a property named
name with the value "George".

Unfortunately there is no way for these two items to use a different property
name for this value. For example, there is no way for the first item to define
the property "author" with the value "George", while the second item defines
the property "name" with the value "George", without introducing repetition:

<div itemscope><span itemprop="author">George</span></div><div itemscope><span
itemprop="name">George</span></div>

Ideally, itemref would allow property names to be specified for each
referenced property. For example:

<div itemscope itemref="a=author"></div><div itemscope><span itemprop="name"
id="a">George</span></div>

In the above illegal markup, the itemref defines not only the reference, but
also a property name (overriding the itemprop on the referenced item). This
proposed additional syntax provides additional expressiveness without
introducing a significant amount of syntactic complexity.
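
To show how little parsing the proposed syntax would need, here is a Python sketch. The "id=name" form is the hypothetical extension proposed in this report, not valid Microdata, and the sketch simplifies by assuming each referenced ID maps to exactly one (itemprop, value) pair. parse_itemref and resolve are names invented for the example.

```python
def parse_itemref(value):
    """Split a proposed itemref value into (id, override_name) pairs.

    Tokens are space-separated; "a" keeps the referenced property's own
    name, while "a=author" overrides it.
    """
    refs = []
    for token in value.split():
        ref_id, _, name = token.partition("=")
        refs.append((ref_id, name or None))
    return refs


def resolve(itemref, by_id):
    """Return the properties an item gains through its itemref.

    by_id maps an element ID to its (itemprop, value) pair.
    """
    props = []
    for ref_id, override in parse_itemref(itemref):
        name, value = by_id[ref_id]
        props.append((override or name, value))
    return props


by_id = {"a": ("name", "George")}
print(resolve("a", by_id))         # [('name', 'George')]
print(resolve("a=author", by_id))  # [('author', 'George')]
```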

Posted from: 75.92.57.94
User agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:15.0) Gecko/20100101 Firefox/15.0
Comment 1 Ian 'Hixie' Hickson 2013-01-25 23:19:19 UTC
> no effort was made to establish parity with the expressiveness of JSON

Some effort was put into this, but it wasn't a requirement so the effort was quickly abandoned.


> the format that Microdata would be an ideal successor to

I don't think they're really related. Why would either replace the other?


> It would seem logical to position Microdata as the ideal payload format for
> AJAX requests in rich internet applications that are based on progressive
> enhancement techniques 

I don't think that would be logical at all. Microdata's use cases have nothing to do with this kind of thing. It wasn't designed to address that, and doesn't do a good job of it; I don't see any value in trying to coerce it into doing so.


> You can't tell it to pull the value out of an attribute instead of out of the
> content of an element, for example.

This is intentional. Simplicity was a key aspect of microdata's design; it leads to it being easier to use. Simplicity is more useful than power in many ways.


> <div itemscope><a href="http://example.com" itemprop="name">Martha</a></div>
> 
> There is no way to get this to set name to "Martha" instead of the URL
> "http://example.com" without changing the markup (e.g. wrapping the name in
> an otherwise superfluous span element).

I assume by "changing the markup" you mean adding or removing elements. (If by changing the markup you mean any change at all, then it's hard to see how any in-band solution could succeed at these requirements.)


> It would be very powerful to be able to specify the attribute to retrieve the
> value from:
> 
> <div itemscope><abbr itemprop="name" itemattr="title" title="World Health
> Organization">WHO</abbr></div>

It would be powerful in a sense, but in another, what would it achieve that you can't achieve today by adding new elements? I don't see much value in this.


> The way arrays are implemented is just awkward. Depending on how you look at
> it, one of two things is true:

The data model is explicitly defined:

# Each group is known as an item. Each item can have item types, a global 
# identifier (if the vocabulary specified by the item types support global 
# identifiers for items), and a list of name-value pairs. Each name in the name-
# value pair is known as a property, and each property has one or more values. 
# Each value is either a string or itself a group of name-value pairs (an item). 
# The names are unordered relative to each other, but if a particular name has 
# multiple values, they do have a relative order.
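
Read literally, that paragraph describes something like the following Python structure. This is only a sketch of the quoted text; the class and field names are chosen for illustration and are not taken from the spec or from any API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

# A value is either a string or a nested item.
Value = Union[str, "Item"]


@dataclass
class Item:
    types: List[str] = field(default_factory=list)   # item types, if any
    id: Optional[str] = None                         # global identifier, if any
    # Names are unordered relative to each other; the values under one
    # name keep their relative order.
    properties: Dict[str, List[Value]] = field(default_factory=dict)

    def add(self, name: str, value: Value) -> None:
        """Append a value to a property, creating the property if needed."""
        self.properties.setdefault(name, []).append(value)


person = Item()
person.add("name", "George")
person.add("friend", Item(properties={"name": ["Martha"]}))
```

In this reading every property holds a list with at least one value, which corresponds to the first of the two interpretations in the report.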


> If you declare a schema, you can type scalar properties as Numeric or
> Boolean. Ad hoc payloads with no schema can't define the types of scalars
> because the itemtype attribute cannot be used without an itemscope attribute.

Correct. If this is a problem, why is it a problem?


> This is especially odd since dates and URLs are typed scalars, although by
> context rather than by an explicitly specified itemtype. Consider this
> example:
> 
> <div itemscope><a itemprop="a" href="http://example.com">Hi</a><span
> itemprop="b">Hi</span></div>
> 
> The markup structure for the properties "a" and "b" is identical.

No, one uses <a> and the other uses <span>. Quite different.



> This is not intuitive and requires document authors and maintainers to
> remember which tags and attributes have special semantics.

I don't think in practice this is a big deal. The list is pretty obvious.


> There is no mechanism by which to express the value "null", although a hack
> would be to set an itemtype of "http://example.com/Null" on an empty item, and
> leave it up to the consuming software to do the substitution. This is
> obviously non-ideal, but could also be used to express the boolean values
> "true" and "false", or other enumerated types' values.

In what situations are these real problems?


> A mechanism is provided by which a value can be expressed once, then
> referenced by multiple items, avoiding repetition. [...]
> Unfortunately there is no way for these two items to use a different property
> name for this value.

Yeah, this is a feature with definite limitations. It was the part that we had the most difficulty finding a way to make mostly usable. I'm open to adding more syntax in this area, but before doing so we should collect real-world examples of people being forced into writing suboptimal markup because of this limitation, so that we can make sure our solution addresses those cases. If you're interested in doing that, please file a separate bug with that data.
Comment 2 Ian 'Hixie' Hickson 2013-03-20 20:14:16 UTC
Please respond to the questions in comment 1 and reopen this bug. Thanks!