User:Cjones/ISSUE-184

From W3C Wiki

Change Proposal for ISSUE-184: Enhance the <data> element with a type system

Summary

This is a supplementary proposal for the resolution of ISSUE-184, which defines the addition of the <data> element.

This proposal enhances the <data> element with a "type" attribute for discriminating the class of data represented by the value attribute or the contained element contents.

The valid values for the "type" attribute are an enumerated set of keywords, initially a subset of the values currently accepted for the "type" attribute of the <input> element.

This proposal also adds the "type" attribute to the <output> element, so that typed values represented by this element can be formatted in a predictable manner that is consistent across elements.

Rationale

The addition of the <data> element allows for the semantic markup of machine-readable data. The definition of this element contains only the "value" attribute in addition to the set of global attributes. This leaves it unclear what the <data> element represents and how it can be interpreted or should be used.

Type Definition

The contents of the <data> element are a text representation of the machine-readable data. While this text may be parsable, it is highly ambiguous and cannot be used to determine definitively the value or type of the data.

The "value" attribute of the <data> element is the text-encoded representation of the data. However, the <data> element can represent numerous types of data, and there is significant scope for overlap between their text encodings. This leaves open the question of what the value represents and how to use it.

The "type" attribute solves this problem by stating how the value is to be interpreted, so that it can be converted into a binary representation for machine processing.
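
As a rough illustration of this ambiguity, the sketch below interprets the same text under different "type" keywords. It is only a sketch: the names TypedValue and parseTypedValue are hypothetical and are not part of the proposal, and the patterns are simplified approximations of the HTML microsyntaxes rather than their full definitions.

```typescript
// Hypothetical sketch: these names are illustrative and are not defined by
// the proposal or by any HTML API.
type TypedValue =
  | { kind: "text"; text: string }
  | { kind: "number"; value: number }
  | { kind: "month"; year: number; month: number };

function parseTypedValue(type: string, raw: string): TypedValue | null {
  switch (type) {
    case "number": {
      // Simplified approximation of the HTML floating-point number syntax.
      if (!/^-?\d+(\.\d+)?([eE][-+]?\d+)?$/.test(raw)) return null;
      return { kind: "number", value: Number(raw) };
    }
    case "month": {
      // Simplified approximation of the HTML month syntax (YYYY-MM).
      const m = /^(\d{4})-(\d{2})$/.exec(raw);
      if (m === null) return null;
      const month = Number(m[2]);
      if (month < 1 || month > 12) return null;
      return { kind: "month", year: Number(m[1]), month };
    }
    default:
      // Every other keyword is treated as plain text in this sketch.
      return { kind: "text", text: raw };
  }
}
```

Under these assumptions, the text "2011-11" yields November 2011 when the type is "month", the unchanged string when the type is "text", and no value at all (null) when the type is "number", whereas "1e3" yields the number 1000 when the type is "number".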

Value Integrity

The addition of a type attribute for data values introduces a set of constraints on the values that the element may represent. The value can be restricted to a known data type, and that restriction can be enforced within the HTML client. As a result, data values can be used programmatically in a safe manner: consumers can access the data without having to handle invalid values or manipulate encodings.
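
The kind of constraint a client might enforce can be sketched as follows. The function names isValidValue and validValues are hypothetical, the checks are simplified approximations of the HTML microsyntaxes, and only three of the proposed keywords are covered.

```typescript
// Hypothetical sketch: isValidValue and validValues are illustrative names,
// not APIs defined by the proposal.
function isValidValue(type: string, raw: string): boolean {
  switch (type) {
    case "date": {
      const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(raw);
      if (m === null) return false;
      const year = Number(m[1]);
      const month = Number(m[2]);
      const day = Number(m[3]);
      // Round-trip through Date.UTC to reject days such as 2011-02-30.
      // Date.UTC maps years 0-99 to 1900-1999, so this sketch rejects them.
      const d = new Date(Date.UTC(year, month - 1, day));
      return (
        d.getUTCFullYear() === year &&
        d.getUTCMonth() === month - 1 &&
        d.getUTCDate() === day
      );
    }
    case "time":
      // HH:MM with an optional :SS part; fractional seconds are omitted here.
      return /^([01]\d|2[0-3]):[0-5]\d(:[0-5]\d)?$/.test(raw);
    case "color":
      // A "#" followed by exactly six hexadecimal digits.
      return /^#[0-9a-fA-F]{6}$/.test(raw);
    default:
      // Other keywords are left unconstrained in this sketch.
      return true;
  }
}

// A consumer can discard invalid values once, up front, instead of guarding
// every later use of the data.
function validValues(type: string, raws: string[]): string[] {
  return raws.filter((raw) => isValidValue(type, raw));
}
```

If the client performed such checks itself, as the proposal suggests, a consumer would not need a helper like validValues at all.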

Value Formatting

The value of the <data> element must be displayed to the user as part of the rendering of the HTML document. The <data> element supports customization by allowing an author to declare the text content of the element, which is then the text displayed to the user. This process is orchestrated by the HTML client which, in the absence of text content, produces the label by direct substitution of the data value.

With the addition of a type attribute, this process can be enhanced so that the client renders the value in a format determined by the type discriminator and the element's resolved locale. This allows HTML documents to be authored for both machine readability and client-side localization, with values automatically translated into formats that follow the cultural conventions of the user. This support is currently afforded only to <input> elements; elsewhere the result is a partially translated HTML document, which is confusing and error-prone for users.
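
A minimal sketch of such locale-aware rendering is shown below, using the standard ECMAScript Intl API as a stand-in for whatever formatting a client would actually apply. The function name formatForLocale is hypothetical, only the "number" and "date" keywords are handled, and the value is assumed to have been validated already.

```typescript
// Hypothetical sketch: formatForLocale is an illustrative name, not an API
// defined by the proposal. Intl is the standard ECMAScript
// internationalization API; in common implementations its locale data is
// derived from CLDR.
function formatForLocale(type: string, value: string, locale: string): string {
  switch (type) {
    case "number":
      return new Intl.NumberFormat(locale).format(Number(value));
    case "date": {
      // Assumes an already validated YYYY-MM-DD value.
      const parts = value.split("-");
      const utc = Date.UTC(Number(parts[0]), Number(parts[1]) - 1, Number(parts[2]));
      return new Intl.DateTimeFormat(locale, {
        timeZone: "UTC",
        year: "numeric",
        month: "long",
        day: "numeric",
      }).format(new Date(utc));
    }
    default:
      // Other keywords are substituted unchanged in this sketch.
      return value;
  }
}
```

On a runtime with full locale data, formatForLocale("number", "1234.5", "en-US") gives "1,234.5" while the same call with "de-DE" gives "1.234,5", so a single machine-readable value can be shown in the reader's own conventions.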

Input & Output Elements

With a set of type classes introduced into the HTML vocabulary, their application extends naturally to the other data-related elements, <input> and <output>. These elements represent the means of user interaction with data and are subject to the same requirements of data integrity and format representation.

The <output> element additionally benefits from this extension, as it can be explicitly tied to the values of <input> elements and their modification events. The type system enhances this process by allowing the values to be restricted to, or converted into, specific types, with this behavior constrained and controlled within the HTML client.

Details

  1. Add the "type" attribute to the <data> and <output> elements
  2. Define the valid states for the "type" attribute for these elements as a set of enumerated values based on a subset of the valid states for the <input> element: text, tel, url, email, datetime, date, month, week, time, datetime-local, number, range, color
  3. Define the default type for elements with an unset or unknown type attribute as the 'text' state
  4. Define the value of the element as the resolution of its value through the application of the corresponding HTML microsyntax
  5. Define the rendering algorithm for elements with text content as the direct substitution of the state of the IDL 'textContent' attribute.
  6. Define the rendering algorithm for elements without text content as the substitution of the state of the 'value' attribute after conversion through the application of the element's resolved locale to the formatting rules defined in the Unicode Common Locale Data Repository (CLDR).
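
Steps 3, 5 and 6 above can be sketched roughly as follows. This is not a normative algorithm: TypedElement, Formatter, resolveType and renderedText are hypothetical names, the element is modeled as a plain object rather than a DOM node, and the locale-aware formatter is passed in as a parameter instead of being specified here. Matching keywords case-insensitively is an assumption based on how HTML treats other enumerated attributes.

```typescript
// Hypothetical sketch of steps 3, 5 and 6; none of these names are defined
// by the proposal.
const TYPE_KEYWORDS: string[] = [
  "text", "tel", "url", "email", "datetime", "date", "month", "week",
  "time", "datetime-local", "number", "range", "color",
];

// Stand-in for the relevant parts of a <data> or <output> element.
interface TypedElement {
  type: string | null; // raw "type" attribute, null when unset
  value: string;       // raw "value" attribute
  textContent: string; // text content, "" when absent
  locale: string;      // the element's resolved locale
}

type Formatter = (type: string, value: string, locale: string) => string;

// Step 3: an unset or unknown type falls back to the 'text' state.
// Assumption: keywords match ASCII case-insensitively.
function resolveType(attr: string | null): string {
  if (attr === null) return "text";
  const keyword = attr.toLowerCase();
  return TYPE_KEYWORDS.indexOf(keyword) !== -1 ? keyword : "text";
}

// Step 5: text content, when present, is rendered as-is.
// Step 6: otherwise the value is rendered through a locale-aware formatter.
function renderedText(el: TypedElement, format: Formatter): string {
  if (el.textContent !== "") return el.textContent;
  return format(resolveType(el.type), el.value, el.locale);
}
```

Keeping the formatter as a parameter separates the choice between the two rendering paths from the CLDR-based formatting rules, which step 6 leaves to the locale data.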

Impact

Positive Effects

  • Data is annotated with specific known types
  • Data can be accessed using simple DOM functions
  • Consumers are protected from invalid values or needing to manipulate text encodings
  • Allows the user agent to automatically substitute localized formatting of data values on the client side
  • The <output> element may perform automatic transliteration of data type values from <input> values with restriction and validation
  • Allows for greater specificity and control over the denominations of date and time information currently afforded to the <time> element and its proposed extensions

Negative Effects

  • Introduces a value type which is independent of the external declaration of data markup technologies; however, since all values are currently interpreted as text, this offers an additional level of validation and integrity prior to external data interpretation

Conformance Classes Changes

  • Additional conformance requirements as specified in the changes described in the "Details" section

Risks

  • Because this proposal adds new features, and applies the type system as a restriction on values rather than as a new form of value expression, the risks of introducing this feature set are limited to future development and deployments

References

  • ISSUE-184: Add a data element [1]
  • Tantekelik's proposal for ISSUE-184 [2]
  • Unicode Common Locale Data Repository (CLDR) [3]