13465 – Implement datatyping in Microdata

This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	LC1 HTML Microdata (editor: Ian Hickson)
Version:	unspecified
Hardware:	PC Linux

Importance:	P3 normal
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

Reported:	2011-07-30 14:37 UTC by Manu Sporny
Modified:	2011-08-21 23:00 UTC
CC List:	7 users

Description Manu Sporny 2011-07-30 14:37:01 UTC

This feedback is filed as a personal comment and is not intended to be any sort of official feedback from any standards working group.

Microdata currently has no way of expressing data types using the syntax. This is an issue for at least two reasons:

1. It's impossible to determine whether or not a value is an IRI, or a text string that looks like an IRI. This is an issue for forward-compatibility reasons, as a string intended to be a colon-separated value today, e.g. dc:title, could be interpreted as an IRI tomorrow if a new scheme 'dc' were created. There is no deterministic way to understand the author's intent without datatyping in this case.

2. It is useful to applications to understand the datatype of a particular item in a machine-readable way. Asking application developers to create a universal way of figuring out what "45" means in a generic data processor is eased by tagging the value with a datatype. In one example, 'delta' could be expressed in "degrees Celsius, a unit of temperature measurement", in another "degrees, a unit of angle measurement".
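
The ambiguity in point 1 can be illustrated with a minimal sketch. It uses Python's standard urllib.parse.urlsplit; the helper name has_scheme and the sample values other than dc:title are assumptions made for illustration, not part of Microdata or of any proposal in this bug.

```python
from urllib.parse import urlsplit


def has_scheme(value: str) -> bool:
    """Return True if the value parses with a non-empty scheme component."""
    return bool(urlsplit(value).scheme)


# "dc:title" is meant as a colon-separated string, yet a generic parser
# reports a scheme for it just as it does for a real absolute URL.
samples = ["dc:title", "http://example.org/title", "plain title"]
results = {sample: has_scheme(sample) for sample in samples}
```

Because both dc:title and the http URL report a scheme, a purely syntactic check like this one cannot recover what the author intended.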

Ensure that the data-typing carries through in the JSON representation in some way. JSON-LD demonstrates how this can happen via the @context without making the data values used in the JSON more complex.

One suggestion is to create something in the Microdata spec that ensures that Web vocabularies are machine-readable and then set a "default type" for Microdata Web vocabularies. This would make it so that the complexity of datatyping is handled in the Web Vocabulary and does not have to be expressed by the Web author. Ideally, you would be able to specify a default type for Web Vocabulary terms as well as override that default type using something like @itemproptype.
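
A rough sketch of how such a lookup might behave, assuming a machine-readable vocabulary that maps each term to a default datatype IRI. The vocabulary table, the example.org IRIs, and the function name resolve_datatype are all hypothetical; only the idea of an @itemproptype override comes from the suggestion above.

```python
from typing import Optional

# Hypothetical machine-readable vocabulary: term IRI -> default datatype IRI.
VOCAB_DEFAULT_TYPES = {
    "http://example.org/vocab#weight": "http://example.org/measurement/weight#kilogram",
    "http://example.org/vocab#title": "http://example.org/types#text",
}


def resolve_datatype(term: str, itemproptype: Optional[str] = None) -> Optional[str]:
    """Pick the datatype for a property value.

    An author-supplied override (the proposed @itemproptype) wins;
    otherwise the vocabulary's default type is used, if it declares one.
    """
    if itemproptype is not None:
        return itemproptype
    return VOCAB_DEFAULT_TYPES.get(term)
```

With this arrangement an author who writes nothing extra gets the vocabulary default, and only an author who deviates from it has to state a type.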

Comment 1 Philip Jägenstedt 2011-07-30 22:17:32 UTC

The general idea with microdata is that the property type and its syntax is defined by the vocabulary.

1. If the property is given using a URL property element the type is implicitly a URL. Is that enough?

2. If one doesn't understand the item's vocabulary itself, why would one be inspecting individual property values?

Comment 2 Aryeh Gregor 2011-08-01 17:48:51 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:

   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Additional Information Needed
Change Description: no spec change
Rationale:

I don't understand the use-cases.  In what circumstances would a microdata consumer be able to make use of data type information, without any hardwired information about the vocabulary?  If it does have hardwired information about the vocabulary, why does data typing need to be part of the general syntax instead of the vocabulary-specific semantics?  I looked at the use-cases that microdata was written to cover and didn't see any where it would be useful:

http://wiki.whatwg.org/wiki/Microdata_Problem_Descriptions

(In reply to comment #0)
> 1. It's impossible to determine whether or not a value is an IRI, or a text
> string that looks like an IRI. This is an issue for forward-compatibility
> reasons, as a string intended to be a colon-separated value today, e.g.
> dc:title, could be interpreted as an IRI tomorrow if a new scheme 'dc' were
> created. There is no deterministic way to understand the author's intent without
> datatyping in this case.

There is: specify it in the vocabulary, and assume that tools that consume the data will be hardwired with vocabulary-specific information.

> 2. It is useful to applications to understand the datatype of a particular item
> in a machine-readable way. Asking application developers to create a universal
> way of figuring out what "45" means in a generic data processor is eased by
> tagging the value with a datatype. In one example, 'delta' could be expressed
> in "degrees Celsius, a unit of temperature measurement", in another "degrees, a
> unit of angle measurement".

What's a concrete example where this would be useful, as opposed to hardwiring the units into the vocabulary?

> One suggestion is to create something in the Microdata spec that ensures that
> Web vocabularies are machine-readable and then set a "default type" for
> Microdata Web vocabularies. This would make it so that the complexity of
> datatyping is handled in the Web Vocabulary and does not have to be expressed
> by the Web author. Ideally, you would be able to specify a default type for Web
> Vocabulary terms as well as override that default type using something like
> @itemproptype.

Why can't the vocabulary require that a particular type always be used, possibly a type that includes units in-band and therefore can accommodate different units without extra metadata (e.g. "5km", "5mi", "5m")?

Comment 3 Michael[tm] Smith 2011-08-04 05:05:45 UTC

mass-move component to LC1

Comment 4 Manu Sporny 2011-08-09 00:08:12 UTC

"If the property is given using a URL property element the type is implicitly a URL. Is that enough?"

How does this happen? Is it machine readable? You are sort of talking about type coercion here:

http://json-ld.org/spec/ED/20110808#type-coercion

Type coercion is just one aspect of this issue. Yes, adding type coercion to microdata would go toward addressing this issue. However, you can't always depend on type coercion, for instance when a vocabulary term can be given in several different units. "Weight", for example, can be defined in a variety of units of measurement.

"If one doesn't understand the item's vocabulary itself, why would one be inspecting individual property values?"

That's not the issue. The question should be "If a vocabulary term can express something like 'weight', what is the best way to associate the unit of measurement with the number?"

"In what circumstances would a microdata consumer be able to make use of data type information, without any hardwired information about the vocabulary?"

Not "microdata consumer" - "data consumer". You can have a software module that is capable of translating weights and measures into any unit that you want, and include that module in your software to do the translation accurately. For example, I have an application that works with recipes, and one of the ways to declare an ingredient in a recipe is by weight. There are two major schools of thought on expressing weights: imperial and metric. Hundreds of millions of people use each, so asking everyone to use one or the other exclusively unnecessarily excludes hundreds of millions of people. So, you can't:

* Tell people to just use metric measurements
* Create a library that is capable of reading any type of short-hand for weights/measurements because of overlaps in short-hand for expressing weights/measures (m for minutes vs. metres, deg for degrees (angle) vs. degrees (temperature)).

However, if you tag the value with an IRI datatype, the datatype is unique and discoverable (you can provide other things, like the name of the datatype in multiple languages). A consuming application wouldn't have to know anything about the vocabulary to know that a datatype of http://example.org/measurement/temperature#fahrenheit is a measurement of temperature in degrees Fahrenheit. It would also know how to express the value in a person's native language, because dereferencing the IRI will provide a fair amount of other information about the measurement (like the language-native glyphs for the concept).
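
As a minimal sketch of that idea, a generic conversion module could be keyed purely by datatype IRI, with no knowledge of the vocabulary the value came from. The fahrenheit IRI is the one given above; the celsius IRI, the kelvin pivot, and the function name convert are assumptions for illustration, and a local table stands in for dereferencing the IRI.

```python
from typing import Callable, Dict

FAHRENHEIT = "http://example.org/measurement/temperature#fahrenheit"  # from the comment
CELSIUS = "http://example.org/measurement/temperature#celsius"  # hypothetical sibling IRI

# Each datatype IRI maps to a conversion to and from kelvin, used as a pivot unit.
TO_KELVIN: Dict[str, Callable[[float], float]] = {
    FAHRENHEIT: lambda f: (f - 32.0) * 5.0 / 9.0 + 273.15,
    CELSIUS: lambda c: c + 273.15,
}
FROM_KELVIN: Dict[str, Callable[[float], float]] = {
    FAHRENHEIT: lambda k: (k - 273.15) * 9.0 / 5.0 + 32.0,
    CELSIUS: lambda k: k - 273.15,
}


def convert(value: float, source_type: str, target_type: str) -> float:
    """Convert a value between two datatype IRIs known to this module."""
    if source_type not in TO_KELVIN or target_type not in FROM_KELVIN:
        raise KeyError(f"unknown datatype: {source_type} or {target_type}")
    return FROM_KELVIN[target_type](TO_KELVIN[source_type](value))
```

A recipe application could call convert(350.0, FAHRENHEIT, CELSIUS) on an oven temperature regardless of which vocabulary the property belonged to.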

"If it does have hardwired information about the vocabulary, why does data typing need to be part of the general syntax instead of the vocabulary-specific semantics?"

You shouldn't hardwire this type of information in the vocabulary. See the use case above.

"I looked at the use-cases that microdata was written to cover and didn't see any where it would be useful."

Then add the recipe use case to Microdata. Being able to specify units of measurement is important when baking a cake, giving somebody a dose of medication, and when sending scientific instruments to Mars.

"Why can't the vocabulary require that a particular type always be used,
possibly a type that includes units in-band and therefore can accommodate
different units without extra metadata (e.g. "5km", "5mi", "5m")?"

I answer this above. Doing this is a hack that requires people to express their units of measurement in a specific way. People are not very good at doing this.

"What's a concrete example where this would be useful, as opposed to hardwiring
the units into the vocabulary?"

See above.

Comment 5 Ian 'Hixie' Hickson 2011-08-15 04:49:36 UTC

> That's not the issue. The question should be "If a vocabulary term can express
> something like 'weight', what is the best way to associate the unit of
> measurement with the number?"

That is indeed the question. And the answer is, "with the definition of the 'weight' property in the specification that describes the vocabulary that contains this property".


Status: Rejected
Change Description: no spec change
Rationale: What you expose to the user (imperial or metric, fahrenheit or centigrade or kelvin, lightyears or kilometers or inches, etc) is irrelevant to how you encode the data. Just make sure your tool serialises the data in the units that the vocabulary you are using defines the properties to have, and parse the values in the same way, and then you can display them to the user in whatever form you like without having to do anything new at all at the syntax layer.
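
The rationale can be sketched as follows, assuming a hypothetical vocabulary whose 'weight' property is defined to be a number of grams. The function names and the choice of grams are illustrative only; the ounce factor is the exact definition of the international avoirdupois ounce.

```python
GRAMS_PER_OUNCE = 28.349523125  # international avoirdupois ounce, exact by definition


def serialise_weight(value: float, unit: str) -> str:
    """Turn author input into grams, the unit the hypothetical vocabulary mandates."""
    if unit == "g":
        grams = value
    elif unit == "oz":
        grams = value * GRAMS_PER_OUNCE
    else:
        raise ValueError(f"unsupported input unit: {unit!r}")
    return repr(grams)


def display_weight(serialised: str, unit: str) -> str:
    """Parse the stored grams and show them in whatever unit the user prefers."""
    grams = float(serialised)
    if unit == "g":
        return f"{grams:.1f} g"
    if unit == "oz":
        return f"{grams / GRAMS_PER_OUNCE:.1f} oz"
    raise ValueError(f"unsupported display unit: {unit!r}")
```

Here the authoring tool accepts ounces and the reading tool can show grams or ounces, while the markup itself only ever carries grams.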

Comment 6 Manu Sporny 2011-08-21 19:08:18 UTC

This requirement places an unnecessary burden on Web authors, e.g. forcing those that author in imperial or metric units to the opposite method, as well as forcing how they express units of measurement counter to common cultural practices. For example, requiring all recipes to be expressed in imperial units or all recipes to be expressed in metric units is unnecessary when a computer could do a simple conversion from one to the other. While the ideal world would have everyone standardized on a single system of units, we don't live in an ideal world. Forcing the users of a particular vocabulary to use a system of measurement that is unfamiliar to them is an unnecessary requirement.

The resolution does not address my concern. The editor has responded in the same way in the past and there is no reason to believe that he will change his mind now. I do not plan to escalate this issue, but note that this is a bad design decision, IMHO.

Comment 7 Aryeh Gregor 2011-08-21 23:00:41 UTC

(In reply to comment #6)
> This requirement places an unnecessary burden on Web authors, e.g. forcing
> those that author in imperial or metric units to the opposite method, as well
> as forcing how they express units of measurement counter to common cultural
> practices.

The vocabulary can define units itself by requiring that particular pieces of information adhere to some format that includes a unit.  For instance, a piece of information whose meaning is a length could be required to be a valid floating-point number followed by optional whitespace followed by either "m" or "ft", and be interpreted accordingly.

Since this is a common need, it would make sense for different vocabularies to share parsing infrastructure.  For instance, someone might write a spec that would define a syntax for lengths including units, with a specific set of supported units, and an algorithm for converting any of these lengths to a number of (say) meters.  Then different vocabularies that need a length unit could refer to that spec for syntax and parsing requirements.
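
A minimal sketch of such shared parsing infrastructure, following the length format described above (a number, optional whitespace, then "m" or "ft") and converting to metres. The function name is hypothetical, and the number pattern is a simplification that accepts only unsigned decimals rather than every valid floating-point number.

```python
import re

# Supported units and their size in metres (the international foot is exactly 0.3048 m).
METRES_PER_UNIT = {"m": 1.0, "ft": 0.3048}

# Simplified grammar: unsigned decimal, optional whitespace, then a supported unit.
_LENGTH_RE = re.compile(r"([0-9]+(?:\.[0-9]+)?)\s*(m|ft)")


def parse_length_metres(text: str) -> float:
    """Parse a length with an in-band unit and return its value in metres."""
    match = _LENGTH_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid length: {text!r}")
    number, unit = match.groups()
    return float(number) * METRES_PER_UNIT[unit]
```

Any vocabulary that needs a length could refer to one such definition, so "10 ft" and "3.048m" are read the same way everywhere, and unsupported input such as "5 km" is rejected rather than guessed at.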

None of this requires support for units at the syntax level.  Syntactic support just means that providing units is more cumbersome: instead of making them part of the string where you give the quantity, as people are accustomed to, you force authors to specify them separately.  You can solve the same problem more simply by just allowing the vocabulary to define how to interpret the values.

> The resolution does not address my concern. The editor has responded in the
> same way in the past and there is no reason to believe that he will change his
> mind now. I do not plan to escalate this issue, but note that this is a bad
> design decision, IMHO.

You can add the keyword Disagree if you don't agree with the resolution but don't want to escalate.