User:Tantekelik/data element

From W3C Wiki


<data> element change proposal

Summary

Add the <data> element to HTML5 as it was defined in the recent edit made by the editor.

The data element was discussed in the HTML Working Group on 2011-11-03, and there was a rough consensus in support of adding a data element.

This is a change proposal to address:

Rationale

Use-cases: The data element is useful for publishers who want to express the numerous kinds of information whose machine-readable format differs from the human-readable form of the same information.

Note that machine-readable data kept separate from human-readable data should be the exception and not the rule (see below about DRY violation risk). It is therefore advantageous to provide an explicit element, <data>, for this purpose, making it obvious when such an exception is being made.

Advantages over alternatives:

  • <data> over <meta/>
    • Authors prefer containment over adjacency. Currently it is possible to use the meta element to provide machine-readable information adjacent to a human-readable equivalent, e.g.:

      <meta class="p-cost" itemprop="cost" content="99USD" /> $99

      However, web authors/developers have either resisted this method or found it uncomfortable enough not to adopt it. This makes sense, as their existing experience with microformats (and typical microdata or RDFa) re-uses existing elements or introduces span elements to contain the human-readable data. Thus with this proposal they would write the following instead:

      <data class="p-cost" itemprop="cost" value="99USD">$99</data>

  • <data> over a global content/value/itempropvalue attribute
    • More explicit. As stated above, an explicit way of marking up the case where the machine-readable content lives in a particular attribute is easier to recognize (maintainability) and more obvious. Both qualities tend to help data quality, which is at risk due to the DRY violation of duplicating machine data and human data.
    • Similar to other microformats/microdata element-specific patterns. Similar to how microformats and microdata have special parsing rules for specific elements like abbr (prefer title attribute if present), a (use href), img (using src and/or alt attributes), a new data element with the special parsing rule to use the value attribute fits in with this existing design that authors are used to. A global attribute operates differently and is potentially a third way that authors would have to keep in mind as a potential source of data.
    • Less possibility of confusion with multiple sources of data. Right now the value of a Microdata property can come from three different sources, depending on context - the nested Microdata item if the element has itemscope as well, a special attribute on some elements, or the textual content of the element. They are resolved in that order, which is somewhat logical. Adding a fourth source of values, and one without an intuitively obvious location in the existing hierarchy, will end up confusing authors. As an example, what's the value of the "foo" property on <img itemprop=foo src=bar content=baz>? What about <span itemprop=foo content=baz itemscope>? Pulling it all together, what about <a itemprop=foo href=bar content=baz itemscope>?
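
The three existing sources and their order can be sketched as below. This is a simplified illustration, not the Microdata algorithm itself: elements are modeled as plain dicts rather than a DOM, URL resolution is omitted, and the attribute table and the name property_value are assumptions made for this sketch.

```python
# Assumed element-specific attributes, modeled on the Microdata draft.
# Treat the exact table as illustrative rather than normative.
SPECIAL_ATTRIBUTE = {
    "meta": "content",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "a": "href", "area": "href", "link": "href",
    "object": "data",
    "time": "datetime",
}


def property_value(tag, attrs, text):
    """Return (source, value) for an element that carries itemprop.

    tag   -- lowercase element name
    attrs -- dict of attribute name to value
    text  -- the element's textual content
    """
    # 1. A nested item wins whenever itemscope is present.
    if "itemscope" in attrs:
        return ("item", None)
    # 2. Otherwise an element-specific attribute is used, if there is one.
    special = SPECIAL_ATTRIBUTE.get(tag)
    if special is not None and special in attrs:
        return (special, attrs[special])
    # 3. Otherwise the textual content is the value.
    return ("text", text)
```

Under these three rules a global content attribute is never consulted: the img example resolves to its src, and both itemscope examples resolve to the nested item. Where a fourth source would slot in is exactly the open question raised above.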

Details

  • Introduce a <data> element with a 'value' attribute.
  • Provide generic language directing semantic markup extensions to utilize the value attribute (if present) of the data element when extracting semantic information from the element (perhaps similar to the related generic language in the Drag and Drop section). Specific wording left up to the editor's discretion.
  • Provide one or more examples that show use with microformats, microdata, and RDFa, without preferring one over another. Prefer use of openly developed vocabularies/URLs (e.g. microformats.org, whatwg.org, w3.org) rather than those developed by one company (or just a few companies) like schema.org.

Additional details as defined by the editor.
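
A minimal sketch of the extraction rule described in the second bullet, assuming a consumer built on Python's standard html.parser module. The names DataValueExtractor and extract_data_values are hypothetical, and falling back to the element's text when no value attribute is present is an assumption of this sketch; the actual wording is left to the editor.

```python
from html.parser import HTMLParser


class DataValueExtractor(HTMLParser):
    """Collect one value per <data> element.

    The value attribute is used if present; otherwise the element's
    text content is used (an assumption of this sketch).
    """

    def __init__(self):
        super().__init__()
        self.values = []
        self._stack = []  # one [value_attr, text_parts] entry per open <data>

    def handle_starttag(self, tag, attrs):
        if tag == "data":
            self._stack.append([dict(attrs).get("value"), []])

    def handle_data(self, data):
        # Text belongs to every <data> element that is currently open.
        for entry in self._stack:
            entry[1].append(data)

    def handle_endtag(self, tag):
        if tag == "data" and self._stack:
            value_attr, text_parts = self._stack.pop()
            text = "".join(text_parts).strip()
            self.values.append(value_attr if value_attr is not None else text)


def extract_data_values(html):
    """Return the extracted values of all <data> elements in the markup."""
    parser = DataValueExtractor()
    parser.feed(html)
    parser.close()
    return parser.values
```

For the cost example from the Rationale, extract_data_values('<data class="p-cost" value="99USD">$99</data>') yields the machine-readable "99USD" rather than the displayed "$99".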

Impact

Positive Effects

  • Provides an element for better use with microformats that express things like measured quantities, currency amounts, ratings, etc.

Negative Effects

  • One more element. Each additional element adds complexity and incremental cognitive load to learning and using the language.

Conformance Classes

As defined by the editor.

Risks

  • Potential DRY violation/rot. In cases where the author publishes the machine-readable information in addition to the human-readable element content, there is a chance of drift between the two and of the machine data rotting, since the human-readable element content is seen by more people and is more likely to get fixed when necessary.

Other Proposals

Rebuttals to other proposal(s):

  • Change Proposal for ISSUE-184: Enhance the <data> element with a type system AKA counter-proposal
    • Counter-proposal is unimplementable. As noted by the chairs ([1], [2]), the counter-proposal as of 2012-05-15 is unimplementable:
      • "Details are still not clear enough to unambiguously apply"
      • "Trying to imagine myself in the role of editor, I do not believe I would be able to determine the correct edits to make to the spec"
      • "no parsing rules or format definitions for the different types are specified"
      • "doesn't seem to identify a clear use case for adding a type attribute to <output>"
    • Counter-proposal 'type' attribute is unnecessary. All the use-cases/examples mentioned are already handled by the existing <time> element, which has broad consensus, is already defined in the spec, is growing in actual use on the web, and is supported by tools.
    • Counter-proposal 'type' attribute requires more typing. Using the <time> element requires less typing than using a <data type="????"> element. Never mind the added authoring problem of having to figure out the 'type' attribute, which is the next point:
    • Counter-proposal 'type' attribute hurts authorability of date/time information as compared to the existing <time> element. For example compare:

      1. <time>2012-05</time>

      and

      2. <data type="month">2012-05</data>

      The first is much easier to author: the time element implementation automatically knows how to parse and provide the year and month without being explicitly told. The second requires the author to do the extra cognitive work of thinking about what kind of date they are providing, and then to look up a list (undefined) of possible 'type' values and try to match one to what they need. This extra step is unnecessary and an undesirable burden on authors.
    • Counter-proposal 'type' attribute is error prone due to excessively precise typing. Given the two examples above again, note that the second example depends on getting the type exactly right. Web authors will easily forget and perhaps instead put something like:

      3. <data type="date">2012-05</data>

      thus unnecessarily invalidating what they mean (a year and month vs a precise year, month, and day). Or the data itself may change over time (as web pages often do). For example, when the author has more precise information they may add the day to the visible data but forget to update any invisible metadata/types, because doing so doesn't seem to affect anything on the page, and end up with (again iterated from the above examples):

      1a. <time>2012-05-29</time>

      and

      2a. <data type="month">2012-05-29</data>

      In this case the <time> element works great: all the smarts to handle this increasing precision of data are in the browser's parsing of the time element. However, 2a breaks because the author means to specify a precise day, yet the now out-of-date (and prematurely, excessively precise) 'type' attribute gets in the way and inaccurately continues to convey only month granularity. The opposite error can occur as well. That is, an author could start with:

      4. <data type="date">2012-06-30</data>

      and then later decide they were unsure about the day, and remove it:

      4a. <data type="date">2012-06</data>

      resulting in an error: from the 'type' the parser expects a precise date but doesn't see one. Whatever error happens, it can't be what the author expected, which is that it "just works" when they update the date precision, as it does with the equivalent time elements:

      5. <time>2012-06-30</time>

      and later:

      5a. <time>2012-06</time>

    • DRY violation. Any time a system ends up requiring a "parallel change" of the kind the above examples show, it's a sign of a DRY violation. The proposed 'type' attribute would in practice have to be changed in parallel whenever the information itself changes in precision; that is a form of DRY violation, a known anti-pattern in terms of data quality.
    • Lack of support for counter-proposal. When asked at the recent 2012 May f2f in Mountain View, *no one* there expressed any support at all for the counter-proposal. Even online, very little interest (if any) has been shown in the counter-proposal beyond the author.
    • Broadening consensus of original proposal among participants. There was discussion of, support for, and no objection to the original proposal (this page), and thus consensus among all those who attended and participated at the recent f2f.
    • No follow-up on above criticisms. Most of the above arguments were communicated over a month ago in the #html-wg IRC channel during and immediately following an HTMLWG teleconference. No responses/follow-ups/rebuttals to any of the arguments have been made. The rebuttals against the counter-proposal are archived here to help document why adding a 'type' attribute to the data element is a bad idea, in the hope of both resolving the current issue and providing reasoning to discourage similar proposals in the future.
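
The precision argument in examples 1 through 5a above can be sketched as follows. This is only an illustration under stated assumptions: it covers just the year-month and year-month-day forms used in those examples, the 'type' names "month" and "date" are taken from the examples (the counter-proposal does not define a list), and the function names are hypothetical. Real <time> parsing accepts more formats than this.

```python
import re

# Assumed subset of formats: only the two forms used in the examples above.
FORMATS = {
    "month": re.compile(r"\d{4}-\d{2}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def infer_precision(text):
    """Infer the precision from the string itself, with no declared type."""
    for name, pattern in FORMATS.items():
        if pattern.fullmatch(text):
            return name
    return None


def check_declared_type(declared, text):
    """A declared 'type' is consistent only if it matches the inferred precision."""
    return infer_precision(text) == declared
```

With inference alone, "2012-05" and "2012-05-29" are each handled without any parallel edit. With a declared type, check_declared_type("month", "2012-05-29") and check_declared_type("date", "2012-06") both fail, which corresponds to the breakage in examples 2a and 4a.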

References

References are inline.

Contributors

Other collaborators welcome!

Related