This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 17933 - I'd like to provide feedback on the section on "Encoding Microdata", specifically the subsection on "Values". I believe it is crucial for the usefulness of microdata that one is able to mark up form fields as itemprops, and have the itemvalue correctly iden
Summary: I'd like to provide feedback on the section on "Encoding Microdata", specific...
Status: RESOLVED WONTFIX
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Duplicates: 20380
Depends on:
Blocks:
 
Reported: 2012-07-18 07:19 UTC by contributor
Modified: 2013-06-13 01:35 UTC

See Also:


Attachments
Google using microdata (68.56 KB, image/png)
2012-12-13 19:53 UTC, Brian Hazzard

Description contributor 2012-07-18 07:19:28 UTC
This was cloned from bug 16795 as part of operation convergence.
Originally filed: 2012-04-19 03:07:00 +0000

================================================================================
 #0   contributor@whatwg.org                          2012-04-19 03:07:00 +0000 
--------------------------------------------------------------------------------
Specification: http://www.w3.org/TR/microdata/
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

Comment:
I'd like to provide feedback on the section on "Encoding Microdata",
specifically the subsection on "Values".

I believe it is crucial for the usefulness of microdata that one is able to
mark up form fields as itemprops, and have the itemvalue correctly identified.

For example, an itemprop on an input element should have its itemvalue defined
as the element's value attribute.

For itemprops on select lists, the itemvalue should be the value attribute(s)
of the selected option(s).

This would make microdata useful for interactive application semantics as well
as static markup.
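
A minimal sketch of the rule proposed above, assuming a simplified reading of it. This is not behaviour defined by the microdata spec. The class name, the helper function, the sample markup and the property names ("last-name", "language") are all hypothetical, each itemprop is treated as a single token, and checkbox or radio state is ignored.

```python
from html.parser import HTMLParser


class FormItemValueParser(HTMLParser):
    """Collect itemprop values from form controls using the proposed rule."""

    def __init__(self):
        super().__init__()
        self.properties = {}      # itemprop name -> list of values
        self._select_prop = None  # itemprop of the <select> being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and "itemprop" in attrs:
            # Proposed: an input's item value is its value attribute.
            values = self.properties.setdefault(attrs["itemprop"], [])
            values.append(attrs.get("value") or "")
        elif tag == "select":
            self._select_prop = attrs.get("itemprop")
            if self._select_prop is not None:
                self.properties.setdefault(self._select_prop, [])
        elif tag == "option" and self._select_prop is not None:
            # Proposed: a select's item value(s) come from its selected options.
            if "selected" in attrs:
                self.properties[self._select_prop].append(attrs.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._select_prop = None


def extract_form_properties(html):
    """Return {itemprop: [values]} for the inputs and selects in html."""
    parser = FormItemValueParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical sample markup, for illustration only.
SAMPLE_FORM = """
<form itemscope>
  <input name="lname" itemprop="last-name" value="Hazzard">
  <select name="lang" itemprop="language" multiple>
    <option value="en" selected>English</option>
    <option value="de">German</option>
    <option value="fr" selected>French</option>
  </select>
</form>
"""

if __name__ == "__main__":
    print(extract_form_properties(SAMPLE_FORM))
```

Note that the sketch reads the value and selected attributes from the markup, so it reflects the default state of the controls as served, not anything a user has typed.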

Posted from: 108.15.203.180
User agent: Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9B179 Safari/7534.48.3
================================================================================
Comment 1 Ian 'Hixie' Hickson 2012-07-26 23:49:41 UTC
Can you elaborate on the use case for this feature?
Comment 2 Brian Hazzard 2012-12-13 19:24:55 UTC
(In reply to comment #1)
> Can you elaborate on the use case for this feature?

Sure!

It would allow developers to define semantics for applications that present data as HTML forms (rather than other HTML elements).

This would open the door for semantic web clients to intelligently consume these applications without the need for duplicating the data already available in forms as other HTML elements.

In the far future, it could also make HTML an option as the primary media type for hypermedia APIs.
Comment 3 Brian Hazzard 2012-12-13 19:28:22 UTC
*** Bug 20380 has been marked as a duplicate of this bug. ***
Comment 4 Ian 'Hixie' Hickson 2012-12-13 19:40:35 UTC
I don't understand. What software is going to consume the data here? Can you elaborate on what specific concrete problem you're trying to solve?
Comment 5 Brian Hazzard 2012-12-13 19:51:29 UTC
(In reply to comment #4)
> I don't understand. What software is going to consume the data here? Can you
> elaborate on what specific concrete problem you're trying to solve?

No problem.

My understanding of the purpose of microdata (and other technologies, like microformats or RDFa) is to allow machines to consume HTML documents intelligently. "Intelligently" in this case means that they need only understand the semantics of the data being represented, not the format of the data.

So some examples of software that would consume the data:
- Sophisticated search engines. Google does this with microdata and microformats to intelligently display search results in a format that is useful. Why not allow them to understand the semantics of data in forms as well? (See Attachment "Google")
- API client software. HTML is a valid format for the representation of data in a hypermedia API. API clients could use the semantics from microdata on forms to understand that a form represents a person, and if they want to update that person's last-name, they can simply modify the input element that represents the itemprop="last-name" and submit the form. (See: http://codeartisan.blogspot.de/2012/07/using-html-as-media-type-for-your-api.html)
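
The API-client idea in the second bullet could be sketched roughly as follows, assuming the client locates a field by its itemprop and re-serialises the form. The class, the function and the sample field names are hypothetical, only input elements are handled, and the HTTP request that would carry the encoded body is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class FormFieldCollector(HTMLParser):
    """Record (name, itemprop, value) for each named <input>, in document order."""

    def __init__(self):
        super().__init__()
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "input" and name:
            self.fields.append((name, attrs.get("itemprop"), attrs.get("value") or ""))


def build_update(form_html, itemprop, new_value):
    """Return a form-encoded body with the field marked `itemprop` changed."""
    collector = FormFieldCollector()
    collector.feed(form_html)
    collector.close()
    pairs = [
        (name, new_value if prop == itemprop else value)
        for name, prop, value in collector.fields
    ]
    return urlencode(pairs)


# Hypothetical form describing a person, for illustration only.
PERSON_FORM = """
<form method="post" itemscope>
  <input name="fname" itemprop="first-name" value="Brian">
  <input name="lname" itemprop="last-name" value="Hazzard">
</form>
"""

if __name__ == "__main__":
    print(build_update(PERSON_FORM, "last-name", "Smith"))
```

The point of the design is that the client depends on the property name "last-name" rather than on the form's own field name ("lname" here), which the server would be free to change.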
Comment 6 Brian Hazzard 2012-12-13 19:53:01 UTC
Created attachment 1287
Google using microdata
Comment 7 Brian Hazzard 2012-12-13 20:10:14 UTC
I may have chosen a poor way to answer your question. I gave you futuristic applications, and you asked for the specific concrete problem I am trying to solve.

I am building a website that represents all of its data in HTML forms, and ONLY in HTML forms.

I'd like to mark up the website semantically (via microdata) without duplicating the data that is already available in HTML forms as non-form elements. This way, machine consumers of the website (like Google) can understand the content's meaning and act accordingly. Unfortunately, marking up form elements is not part of the microdata spec, so current microdata parsers will not be able to understand my data.

This leaves me with a choice: either do not support these machine consumers, or duplicate my data just to support them.

I don't think that supporting this use case will have any ill effects, and it certainly would be useful for my case (and I am guessing for other folks as well).
Comment 8 Ian 'Hixie' Hickson 2012-12-14 23:32:26 UTC
> - Sophisticated search engines. Google does this with microdata and
> microformats to intelligently display search results in a format that is
> useful. Why not allow them to understand the semantics of data in forms as
> well? (See Attachment "Google")

Because Google doesn't have anyone typing in the form field when it crawls the Web, so you don't get anything out of making form fields support this rather than, say, just putting those default values into <span> elements.


> - API client software. HTML is a valid format for the representation of data
> in a hypermedia API. API clients could use the semantics from microdata on
> forms to understand that a form represents a person, and if they want to
> update that person's last-name, they can simply modify the input element
> that represents the itemprop="last-name" and submit the form.

That seems like huge overengineering. Why not just document a simple API? It's not like software is going to write itself, you still need someone at some point to write to the API.


> I am building a website that represents all of its data in HTML forms, and
> ONLY in HTML forms.

I think you may be missing the point of forms. :-) They're for getting data from the user, not for outputting data.

Do you have a sample page I could look at?


> I don't think that supporting this use case will have any ill effects

All features have costs, even the good ones:

   http://wiki.whatwg.org/wiki/FAQ#Where.27s_the_harm_in_adding.E2.80.94

That's why it's important to only add the really valuable features.
Comment 9 Brian Hazzard 2012-12-19 16:59:07 UTC
(In reply to comment #8)
> Because Google doesn't have anyone typing in the form field when it crawls
> the Web, so you don't get anything out of making form fields support this
> rather than, say, just putting those default values into <span> elements.

Of course I understand that Google doesn't type into form fields, but that doesn't mean it isn't crawling pages that display values in forms instead of spans. Also, who is to say that in the future we won't have applications that ARE crawling the web and intelligently filling in form fields? The sign of good design is that it can be useful in ways unforeseen by the designers.

> That seems like huge overengineering. Why not just document a simple API?
> It's not like software is going to write itself, you still need someone at
> some point to write to the API.
A good API should be self-documenting. Microdata has the opportunity to be the standard for self-documenting HTML-based APIs.


Boss just asked for my time, so I'll respond with more later :)
Comment 10 Ian 'Hixie' Hickson 2012-12-31 04:46:47 UTC
(In reply to comment #9)
> (In reply to comment #8)
> > Because Google doesn't have anyone typing in the form field when it crawls
> > the Web, so you don't get anything out of making form fields support this
> > rather than, say, just putting those default values into <span> elements.
> 
> Of course I understand that google doesn't type into form fields, but that
> doesn't mean it isn't crawling pages that are displaying values in forms
> instead of spans.

What pages are putting data in editable form fields that are accessible to Google and where it would make sense to mark the data up using microdata? Do you have any URLs I could look at?


> Also, who is to say that in the future we won't have
> applications that ARE crawling the web and intelligently filling in form
> fields.

If an application is clever enough to know the information it's typing into form fields, then it doesn't need the microdata annotations to read the data back from the form fields. It already has the information: it entered it.


> The sign of good design is that it can be useful in ways unforeseen
> by the designers.

That sounds good, but I'm not sure it's necessarily a good metric. The sign of a _bad_ design is that it is more complicated than it needs to be to do all the things that people do with it.


> > That seems like huge overengineering. Why not just document a simple API?
> > It's not like software is going to write itself, you still need someone at
> > some point to write to the API.
>
> A good API should be self documenting.

I don't know what this means. What good APIs do you know of that are self-documenting? All the good APIs I know of have extensive documentation, but not as part of the API itself — in books and online tutorials and specifications and so forth. How would self-documenting APIs even work?


> Microdata has the opportunity to be
> the standard for self-documenting HTML based APIs.

I'm pretty sure that's not something I want it to be. :-)
Comment 11 Brian Hazzard 2013-01-14 03:43:13 UTC
I'm sure it's clear that I'd like you to reconsider your position. It's also clear that I've taken the wrong approach at explaining the value of this feature.

Forget what I've said about APIs. Let's focus on why this is generally useful...

Microdata is meant to make it easier for other developers to scrape the data from your site to reuse it in other applications, right?

A form with pre-filled values is a valid response to a GET request. I view this as just as valid a way to represent information as span tags. It is just one more way of displaying data (it also happens to provide controls for changing the data, but never mind that for now).

So, it would be useful to be able to mark up the semantics of this information, without needing to duplicate it in non-form elements.
Comment 12 Brian Hazzard 2013-01-14 04:07:25 UTC
(In reply to comment #10)
> > Also, who is to say that in the future we won't have
> > applications that ARE crawling the web and intelligently filling in form
> > fields.
> 
> If an application is clever enough to know the information it's typing into
> form fields, then it doesn't need the microdata annotations to read the data
> back from the form fields. It already has the information: it entered it.

Imagine a scenario like this: a drug company keeps a canonical representation of its drug catalog, marked up as: http://schema.org/Drug

Then there are clients that crawl the web for public databases of drugs, such as WebMD, looking for forms that are marked up as http://schema.org/Drug

Because the client knows the canonical source and understands how to interact with the forms, it can respond to changes at the canonical representation and make sure other references are accurate.

So when a new adverseOutcome is discovered for some drug, microdata on forms makes it possible for an automated client to make sure everyone knows about it.
Comment 13 Ian 'Hixie' Hickson 2013-01-15 00:02:40 UTC
(In reply to comment #11)
> I'm sure it's clear that I'd like you to reconsider your position.

That I haven't closed the bug is an indication that I'm still trying to be convinced by your arguments. :-)

(Even if I close a bug, I'm still happy to consider new information; just reopen such a bug and leave a comment explaining what new information has not yet been considered that would change the conclusion.)


> Microdata is meant to make it easier for other developers to scrape the data
> from your site to reuse it in other applications, right?

That's a simplistic description of one of several use cases that microdata is designed for, yes.


> A form with pre-filled values is a valid response to a GET request. I view
> this as just as valid a way to represent information as in span tags. It is
> just one more way of displaying data (it also happens to provide controls
> for changing the data, but nevermind that for now).
>
> So, it would be useful to be able to mark up the semantics of this
> information, without needing to duplicate it in non-form elements.

I think in theory that may be true, but is it true in practice? What sites mark up data that is useful enough to scrape, but mutable enough that the non-authenticated GET response actually includes that data in form controls only? If you have any examples of such sites that would be interested in using microdata, but either cannot (or rather, do not) because their data is only in form fields, or examples of sites that currently mark up their data in microdata but wish to remove all the ways they expose that data except form fields but currently can't for the aforementioned reason, that would be very helpful.

Note that this differs a bit from what was proposed earlier. Before, we were talking about giving the semantics of the _current value_ of the control; now we're more talking about giving the semantics of the _default value_ of the control. These are subtly different.


(In reply to comment #12)
Such a system would at a minimum need to be given site-specific credentials. At that point, it seems trivial to also give the system site-specific information on how to update the data as well — and I'd be surprised if the simplest way to do that was to reuse the form controls. You could just define an industry-wide format that includes both the credentials and the mechanism by which to send updates, and each site could provide the update system with that one data file.
Comment 14 Brian Hazzard 2013-01-24 14:49:12 UTC
> (In reply to comment #12)
> Such a system would at a minimum need to be given site-specific credentials.
> At that point, it seems trivial to also give the system site-specific
> information on how to update the data as well — and I'd be surprised if the
> simplest way to do that was to reuse the form controls. You could just
> define an industry-wide format that includes both the credentials and the
> mechanism by which to send updates, and each site could provide the update
> system with that one data file.

I'm not sure why you would want to limit microdata's usefulness to non-authenticated GETs. Sure, that system would need site-specific credentials, but why reinvent the wheel and create a new industry-wide format when there might already be a microdata schema defined for the information being shared?

My opinion is that microdata should be generally useful, both for non-authenticated scrapers like Google, and for authenticated Machine-to-Machine integrations.

Think about the software principle known as DRY (Don't Repeat Yourself). I think it also applies to schemas and specifications. We should be able to reuse microdata, and the associated schemas from sites like data-vocabulary.org or schema.org, for data representing common things, regardless of the application type.
Comment 15 Ian 'Hixie' Hickson 2013-01-25 03:09:55 UTC
> Sure that system would need site-specific credentials,
> but why reinvent the wheel and create a new industry-wide
> format, when there might already be defined a microdata schema for the
> information being shared.

Why not? It would be far easier to come up with a new syntax than it would to parse HTML and take the microdata out of it. That's even assuming that the data the systems want is the same as the data they output, which isn't a given at all; for example, one could easily imagine applications where the systems want raw data that is then processed and output in a more useful form (e.g. temperatures being stored in Kelvin but output in Fahrenheit for a local audience).

I am not at all a believer in reuse for the sake of reuse. Reuse of the microdata vocabulary, in the case where the vocabulary really is reusable, is easy without having to reuse the syntax.
Comment 16 Brian Hazzard 2013-02-05 20:47:53 UTC
I'm not sure I'm going to be able to convince you. I believe that there is no good reason for the proliferation of formats that exist on the web when HTML is applicable for most use cases. A way to convey the semantics of user input controls, and the default values they contain, I believe is critical to the long-term usefulness of this technology.
Comment 17 Ian 'Hixie' Hickson 2013-02-06 21:19:18 UTC
Critical how? The only concrete example we've discussed here is how a drug company could update drug information Web sites by automatically crawling a form marked up with microdata and uploading information for new drugs. But I think that's a highly unlikely scenario — drug information sites don't have forms, in my experience; uploading information like this would almost certainly need human supervision and prefilling Web forms seems like it wouldn't make human supervision as easy as other solutions; the microdata vocabularies for output don't seem like they'd match the format for input anyway; the software would have to be hand-coded to support this vocabulary anyway so why not make it use a simpler format than HTML and not require any parsing, just use well-defined end points; you'd have to have credentials anyway to make any of this work, the drug information site would almost certainly want to review any new input regardless, so the format it's sent in seems of minor importance; the whole area is likely to be highly regulated; the industry is one of the richest industries and can easily afford to have custom systems rather than having to re-use what may not be a perfect fit, etc. etc. etc.

I just don't see how one could say that microdata in this scenario would be critical.


Can you elaborate on your actual use case (per comment 7)? Is this a site I could look at? Maybe I'm missing something that would become obvious when I saw how this site works. As mentioned in comment 8, every feature has a cost, so I don't want to add this unless it's worth it. But equally, I don't want to reject something without understanding it sufficiently to be sure that it isn't worth it.
Comment 18 Brian Hazzard 2013-02-07 12:51:04 UTC
See http://vimeo.com/m/20781278 for an explanation of the power of what I'm describing. In his example, he is forced to use the microformat style, relying on class names. This works, but it's more of a kludge when microdata is available.

HTML has these advantages for machine to machine processing:
- Well-defined hypermedia support, with templated links via forms
- Structure decoupled from semantics via microdata
- Powerful tooling, not to mention browser support

I don't have a production use case that uses microdata in this way, as it isn't supported.
Comment 19 Ian 'Hixie' Hickson 2013-02-08 02:56:03 UTC
You wrote in comment #7:

> I am building a website that represents all of its data in HTML forms, and
> ONLY in HTML forms.

This is the Web site that I was talking about when I wrote:

> Can you elaborate on your actual use case (per comment 7)? Is this a site I
> could look at? Maybe I'm missing something that would become obvious when I
> saw how this site.

(Looking at video now.)
Comment 20 Ian 'Hixie' Hickson 2013-02-08 03:08:09 UTC
I don't really see the value of what's described in that video. Determining the API for a product is not the difficult part of writing code. There's no value to making the API "self-describing", it just introduces more potential sources of bugs. Instead of having beautiful UI with a hard-coded implementation, you end up having a beautiful UI that breaks as soon as the thing it's talking to decides it can change its API to be something the UI designer didn't expect.

Mostly I put self-describing REST APIs in the same bucket as I put RDF. They are generic over-engineered solutions to non-problems.
Comment 21 Ian 'Hixie' Hickson 2013-02-08 04:29:39 UTC
(For example, around minute 42 he talks about how the server could change from HTTP to HTTPS just by changing the URL. But that assumes that the client supports HTTP. Or he says you can use Basic or Digest auth. But that assumes the client has any auth support. And what if you want to use a one-time password device? The client has no UI to support that, so it doesn't matter what you do in the markup. In fact the whole architecture he describes at minute 44 is flawed: it assumes that the pages have the forms that the UI/application state expects. He says "all the stuff on the right is reusable", but it's also completely pointless. It doesn't actually add anything that you couldn't have just by hardcoding the whole API in a few lines of code, as far as I can tell.)
Comment 22 Judson 2013-02-19 18:51:33 UTC
I come at this from the same perspective that Brian does.  If nothing else, it's very powerful to be able to decouple as much as possible the API implementation and the client implementation.  Witness, for instance, the lag time in distributing new versions of iOS and Android apps, and needing to maintain versioned APIs as a result.  (Or the apps that break until new distributions are available.)

If nothing else, the fact that input elements will always represent a property with an empty string as its value seems like an inconsistent design decision.  Where itemref and the property crawling algorithm seem to provide for the markup of HTML with machine readable data in a very flexible way, excluding input elements adds a complete restriction on what I would have thought would be a straightforward usage.  "Yes, you can look up values from the meta tags in the head of your document, but no you cannot use the key values you were already using in forms."
Comment 23 Ian 'Hixie' Hickson 2013-03-26 00:04:59 UTC
(In reply to comment #22)
> Witness, for instance, the lag time in distributing new versions of iOS
> and Android apps, and needing to maintain versioned APIs as a result.

Can you walk me through how the proposal here would affect this? I mean, at a concrete level, with a real app.


> If nothing else, the fact that input elements will always represent a
> property with an empty string as its value seems like an inconsistent design
> decision.

It's not really a decision, it's just that <input> is handled the same way as <span> and <div> and <p> and <textarea> and everything else. Supporting <input> in a special way is additional complexity, so we have to make sure we have a very strong reason for doing it.
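
A simplified sketch of the behaviour described here, where an element without a special case contributes its text content, so an input (which has no children) yields an empty string. It assumes a small illustrative subset of the attribute-based special cases and does not handle nested elements. The names and the sample markup are hypothetical, and this is not a conforming microdata parser.

```python
from html.parser import HTMLParser

# Illustrative subset of elements whose value comes from an attribute.
ATTRIBUTE_VALUED = {"meta": "content", "a": "href", "img": "src"}
# Elements that cannot have children, so their text content is always empty.
VOID_ELEMENTS = {"input", "meta", "img", "br", "hr", "link"}


class TextContentValueParser(HTMLParser):
    """Map each itemprop to its value using the generic text-content fallback."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._open = None  # [itemprop, tag, text chunks] of the element being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        if prop is None or self._open is not None:
            return
        if tag in ATTRIBUTE_VALUED:
            self.properties[prop] = attrs.get(ATTRIBUTE_VALUED[tag]) or ""
        elif tag in VOID_ELEMENTS:
            # <input> lands here: no special case, no children, empty value.
            self.properties[prop] = ""
        else:
            self._open = [prop, tag, []]

    def handle_data(self, data):
        if self._open is not None:
            self._open[2].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[1]:
            prop, _, chunks = self._open
            self.properties[prop] = "".join(chunks)
            self._open = None


def text_content_values(html):
    """Return {itemprop: value} under the simplified rules above."""
    parser = TextContentValueParser()
    parser.feed(html)
    parser.close()
    return parser.properties


# Hypothetical markup: the same kind of data in a span and in an input.
SAMPLE = (
    '<div itemscope>'
    '<span itemprop="name">Brian Hazzard</span>'
    '<input itemprop="nickname" value="Brian">'
    '</div>'
)

if __name__ == "__main__":
    print(text_content_values(SAMPLE))
```

Supporting the request in this bug would mean adding input (and select) as further special cases alongside the attribute-valued elements, which is the extra complexity being weighed.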
Comment 24 Brian Hazzard 2013-03-26 19:46:09 UTC
Ian, I don't suppose you are going to http://fluentconf.com/fluent2013

If so, I'd love to talk to you about this in person. Long shot, I know...
Comment 25 Ian 'Hixie' Hickson 2013-05-04 00:17:56 UTC
Sorry, I don't go to many conferences. :-(

Any chance you can help me with the questions in comment 23? I'm not against this proposal per se, I just don't want to add something to the spec if we don't have compelling use cases, and so far I'm not seeing how this proposal would translate to real user or author benefits, in a concrete fashion.
Comment 26 Brian Hazzard 2013-06-05 15:15:49 UTC
The compelling use case is being able to support a REST/Hypermedia architectural style. An example is in the video I sent earlier. Whether you like the style or not, it is something that people are trying to do, and for now they have to overload the meaning of HTML/CSS classes to achieve what they are after.

FWIW RDFa Lite supports this feature. You can see this for yourself by going to http://rdfa.info/play/ and pasting the following example into the test area.

<form vocab="http://schema.org/" typeof="Person">
  <input name="name" property="name" type="text" value="Brian Hazzard" />
</form>
Comment 27 Ian 'Hixie' Hickson 2013-06-12 22:27:20 UTC
I don't think the REST/Hypermedia architectural style is something we should encourage. IMHO it's a rather poor approach. It doesn't lead to good UIs. Compelling user interfaces are not going to be generically generated from microdata descriptions of forms, they're going to be carefully thought out, hand crafted to solve specific problems.

That RDFa supports this isn't really a reason to add it to microdata — part of the reason microdata exists in the first place is that RDFa was poorly designed and ended up not being a good approach to solving the problems that people raised which led to microdata. RDFa lite hasn't fixed these underlying problems. We could easily overengineer microdata in the same way, but that would merely mean microdata didn't solve the problem well either.

I'm marking this WONTFIX because the presented use cases aren't compelling, as described above and in comment 21. If there are other use cases, please feel free to reopen the bug.
Comment 28 Brian Hazzard 2013-06-13 01:35:27 UTC
(In reply to comment #27)
> I don't think the REST/Hypermedia architectural style is something we should
> encourage. IMHO it's a rather poor approach. It doesn't lead to good UIs.
> Compelling user interfaces are not going to be generically generated from
> microdata descriptions of forms, they're going to be carefully thought out,
> hand crafted to solve specific problems.
REST does not imply user experiences that are generic with regard to your problem set. Rather, it implies a user experience that is intimately aware of the problem domain, but not tied to the specific implementation of the solution.

> That RDFa supports this isn't really a reason to add it to microdata — part
> of the reason microdata exists in the first place is that RDFa was poorly
> designed and ended up not being a good approach to solving the problems that
> people raised which led to microdata. RDFa lite hasn't fixed these
> underlying problems. We could easily overengineer microdata in the same way,
> but that would merely mean microdata didn't solve the problem well either.
I agree that this isn't an argument for adding it to microdata, I was just pointing out that I had found a technology that meets my needs.

> I'm marking this WONTFIX because the presented use cases aren't compelling,
> as described above and in comment 21. If there are other use cases, please
> feel free to reopen the bug.
I sincerely applaud you for being a part of driving web standards forward and striving to keep them simple.