IdAndTypeID

From HTML WG Wiki

The ID type, and the attributes: id, xml:id

Problem statement / use cases

  • Authors may want to uniquely identify an element without desiring the strictness of the ID data type
  • The xml:id attribute already provides an attribute taking a value of type ID and only one such attribute is permitted in the XML serialization of HTML5
  • Authors want consistency between the text/html serialization and the XML serialization of HTML5
  • Authors and authoring tools already produce documents without carefully checking that ID values are unique within the document (even in XHTML and XML documents) or that each element carries only one attribute of type ID, so HTML5 should clearly define interoperable norms for matching such IDs, including:
    • the CSS id/ID selector.
    • DOM id related methods (getElementById).
  • Authors want a way to aggregate content, such as articles from different sites or articles from the same site, in a way that causes neither ID collisions nor id attribute collisions.
  • Authors want to mix arbitrary vocabularies in compound documents that make use of xml:id as a cross-vocabulary/cross-namespace ID-valued attribute.
  • Including ids in hand-coded HTML can be cumbersome, so authors are likely to include them only for some direct and immediate need, whereas authoring tools can easily add auto-generated id values to elements or to elements of a particular type. However, to ensure document-wide uniqueness, such id values may end up being long, difficult to read and type accurately, and difficult to distinguish.
  • The usage of xml:id is still in its infancy and now is the time to shape its usage and treatment by common UAs.
  • Violations of id attribute value uniqueness are commonplace, so clear interoperable processing guidance needs to be provided for when id attribute value collisions do occur (consider this example).
  • The goals of maintaining document-wide uniqueness for IDs and also achieving ID persistence are at odds with one another (for example consider pasting content, and aggregation of content)

Summary

HTML5 needs to clearly define the data type and usage of id and other identifying attributes.

For the XML serialization, an element can have only a single attribute that takes the type ID. Also, the general discussion of issues surrounding HTML5 and the id attribute leans toward treating the value type of the attribute as something other than ID. Combined with the common disregard for the uniqueness of id attributes (and the ID data type), this raises interoperability issues. This proposal suggests defining a new IDENT data type for the traditional HTML id attribute. It also suggests that HTML5 guide authors to use xml:id, with a data type of ID, when they need a genuine ID, and to use UUIDs for such xml:id (ID) values. This would allow for consistency across the text/html and XML serializations. Other benefits include:

  • easier for authors to aggregate documents without identifier data type collisions (ID and others)
  • clearly defined interoperable guidance for UAs on how to match on IDENT and ID data types

Proposed solutions

Introduce a new IDENT data type for the id attribute

This proposal introduces a new data type, IDENT, for the id attribute. In contrast to the data type ID, whose values must be unique within an entire document, an attribute value of type IDENT need only be unique among the element's siblings. This implies that for the root element and for any element without siblings the IDENT uniqueness requirement is automatically satisfied.
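The sibling-scoped constraint can be checked mechanically. The following is a minimal sketch, assuming a well-formed XML serialization parsed with Python's standard xml.etree.ElementTree; the function name ident_violations is hypothetical and not part of the proposal.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def ident_violations(root):
    """Return (parent tag, id value) pairs where two or more siblings share an id."""
    violations = []
    for parent in root.iter():
        # Count id values among the direct children of this parent only.
        counts = Counter(
            child.get("id") for child in parent if child.get("id") is not None
        )
        violations.extend(
            (parent.tag, value) for value, n in counts.items() if n > 1
        )
    return violations


doc = ET.fromstring(
    "<body>"
    "<article><section id='introduction'/><section id='section2'/></article>"
    "<article><section id='introduction'/></article>"
    "</body>"
)
print(ident_violations(doc))  # prints []
```

The value 'introduction' appears twice in the document, which would violate ID uniqueness, but the two elements are not siblings, so no IDENT violation is reported.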

This uniqueness constraint also lets authors use the id attribute more freely, without the concern for collisions with id attributes on other elements that document-wide ID uniqueness imposes. For elements likely to be combined with other elements (such as the ARTICLE element), HTML5 can recommend that authors use universally unique IDs (UUIDs) to avoid collisions. For example:

<body>

<article xml:id='73D852A2-65B4-4BF9-B2F0-5F31B67AFA99' >
<section id='introduction' >
...
</section>
<section id='section2'>
...
<section id='section2a' >
...
</section>
</section>
</article>

<article xml:id='524AFB3A-1547-4524-9F21-519A004716F7' >
<section id='introduction' >
...
<section id='abriefhistory' >
...
</section>
</section>
<section id='needsassessment'>
...
</section>
</article>

</body>

Consider this example of an ordered list, where authors can use duplicate id attribute values on the child elements of each list item (though the values are still unique among siblings):


<ol >
<li><span id='name'>Brandon</span><span id='post' >90210</span></li>
<li><span id='name'>Mark Green</span><span id='post' >60607</span></li>
<li><span id='post' >19102</span> <span id='name'>Cliff Huxtable</span> </li>
<li><span id='name'>Rico Tubbs</span><span id='post' >33122</span></li>
<li><span id='name'>Chandler Bing</span><span id='post' >10203</span></li>
</ol>

Regardless of the order of child elements of the list item elements, a processor could provide a simple method to sort the list on a particular id/key.
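As a rough illustration of such a sort, the sketch below parses a shortened copy of the list with Python's standard xml.etree.ElementTree and orders the items by the text of whichever child carries the requested id. The helper name sort_items_by_ident is hypothetical, and treating items that lack the key as sorting first is an assumption.

```python
import xml.etree.ElementTree as ET

LIST = """
<ol>
<li><span id='name'>Brandon</span><span id='post'>90210</span></li>
<li><span id='name'>Mark Green</span><span id='post'>60607</span></li>
<li><span id='post'>19102</span> <span id='name'>Cliff Huxtable</span></li>
</ol>
"""


def sort_items_by_ident(ol, key):
    """Return the li elements ordered by the text of the child whose id is `key`."""
    def value_of(li):
        for child in li:
            if child.get("id") == key:
                return child.text or ""
        return ""  # assumption: items lacking the key sort first

    return sorted(ol.findall("li"), key=value_of)


ol = ET.fromstring(LIST)
by_post = [
    li.find("span[@id='post']").text for li in sort_items_by_ident(ol, "post")
]
print(by_post)  # prints ['19102', '60607', '90210']
```

The position of the span within each list item does not matter, because the lookup is by id value rather than by child index.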

xml:id for authors looking for a genuine ID data type

In both the XML and text/html serializations, HTML5 should recommend that UAs process xml:id attributes, and that authors use xml:id only when they want a genuine ID and take on the responsibility of ensuring document-wide uniqueness.

Add an idScope DOM attribute to the document interface

By setting the idScope of a DOMDocument (or probably DOMHTMLDocument), authors can fine-tune the DOM matching of id attributes for the getElementById method.

Define a precise processing hierarchy for ID and IDENT data types

Some research will be required to determine the approach HTML5 should recommend. This is merely a first guess.

CSS id selector

For a CSS id selector such as '#identity'

  • select the first element in tree order where the value of xml:id matches 'identity' (we want to make sure the error of matching multiple elements whose xml:id attribute matches 'identity' is not reinforced)
  • if no xml:id attribute matches 'identity', match every element whose id attribute has the value 'identity'
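The two rules above can be sketched as follows. This is only an illustration over an XML document parsed with Python's standard xml.etree.ElementTree, not a description of how any CSS engine works; the function name select_by_id_selector is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def select_by_id_selector(root, ident):
    """Elements matched by the selector '#ident' under the proposed rules."""
    for el in root.iter():  # iter() walks the tree in document order
        if el.get(XML_ID) == ident:
            return [el]  # only the first xml:id match is selected
    # No xml:id matched, so fall back to every matching id attribute.
    return [el for el in root.iter() if el.get("id") == ident]


doc = ET.fromstring(
    "<body><p id='note'/><div xml:id='main'><p id='note'/></div>"
    "<p id='main'/></body>"
)
print([el.tag for el in select_by_id_selector(doc, "main")])  # prints ['div']
print(len(select_by_id_selector(doc, "note")))  # prints 2
```

Here '#main' selects only the div, because its xml:id takes precedence over the later p whose id is also 'main'.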

DOM getElementById from a linked script or a script element in the document head

Consider a DOM method such as mydocument.getElementById('identity').

  • return the first element in tree order where the value of xml:id matches 'identity' (we want to make sure the error of matching multiple elements whose xml:id attribute matches 'identity' is not reinforced)
  • if no xml:id attribute matches 'identity', return the first element in tree order whose id attribute has the value 'identity'

DOM getElementById from a script element in the document body

Consider a DOM method such as mydocument.getElementById('identity').

  • return the first element in tree order where the value of xml:id matches 'identity' (we want to make sure the error of matching multiple elements whose xml:id attribute matches 'identity' is not reinforced)
  • if no xml:id attribute matches 'identity', return the next element in tree order, starting from the script element itself, whose id attribute has the value 'identity', ascending the tree in sibling order until a match is found or the root element is reached

IDREFs and IDREF hashes

In the case of for/id associations and other attributes referencing IDs, HTML5 should completely specify how an IDREF is associated with a referenced ID or IDENT.

  • If xml:id is present whose value matches the IDREF, then this should be the associated element
  • If the document is nonconforming and the same ID value appears more than once in the document, then match the first element encountered in depth-first tree order, starting from the root element, whose xml:id attribute value matches the IDREF
  • If no such xml:id is present then:
    • an IDREF from a referrer outside the document should match the first IDENT in depth-first tree order
    • an IDREF from a referrer within the document should match the first element found whose id attribute matches the IDREF in the following order:
      • the first descendant element in depth-first tree order whose id matches the IDREF
      • the first element whose id matches the IDREF when recursively traversing the tree moving through each sibling element of the initial element’s parent element in depth-first tree order, from the first sibling to the last (i.e., skipping over the initial element)
      • the root HTML element (if it has a matching id attribute)
      • no match implies null for the IDREF’s referenced element so no element is associated (in the case of an activation UAs should maintain the same scroll position in the viewport)
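The lookup order above can be sketched as follows, using Python's standard xml.etree.ElementTree. The function names are hypothetical, passing referrer=None is an assumed way to model a referrer outside the document, and the sibling step is read literally as a search of each sibling's subtree at the referrer's own level only. Other readings of "recursively" are possible.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def first_with(elements, attr, value):
    """First element, in the given order, whose attribute equals value."""
    return next((el for el in elements if el.get(attr) == value), None)


def resolve_idref(root, idref, referrer=None):
    """Resolve idref; referrer=None models a referrer outside the document."""
    # An xml:id match always wins; the first in depth-first order is used.
    match = first_with(root.iter(), XML_ID, idref)
    if match is not None:
        return match
    if referrer is None:
        return first_with(root.iter(), "id", idref)

    # 1. Descendants of the referrer (iter() yields the element itself first).
    match = first_with(list(referrer.iter())[1:], "id", idref)
    if match is not None:
        return match

    # 2. Subtrees of the referrer's siblings, first sibling to last.
    parents = {child: parent for parent in root.iter() for child in parent}
    parent = parents.get(referrer)
    if parent is not None:
        for sibling in parent:
            if sibling is referrer:
                continue  # skip the initial element
            match = first_with(sibling.iter(), "id", idref)
            if match is not None:
                return match

    # 3. The root element if its id matches; otherwise nothing is associated.
    return root if root.get("id") == idref else None


doc = ET.fromstring(
    "<html id='top'><body>"
    "<form><label for='name'/><input id='name' title='first'/></form>"
    "<form><label for='name'/><input id='name' title='second'/></form>"
    "</body></html>"
)
second_label = doc.findall(".//label")[1]
print(resolve_idref(doc, "name", second_label).get("title"))  # prints second
print(resolve_idref(doc, "name").get("title"))  # prints first
```

Under this reading, each label is associated with the input in its own form even though both inputs share the id 'name', while a referrer outside the document gets the first match in tree order.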

isId DOM method

Since adding an IDENT data type so closely related to the ID data type complicates the processing of ID values, it is important to clearly define how the id attribute in HTML responds to the DOM method isId. Since IDENT is not an entirely new data type, it should still be considered an ID for purposes of the isId method under many or even most circumstances. However, if an xml:id with the same value appears in the document, isId should return 'false' for the IDENT value. Also, if an xml:id attribute is on the same element, isId should return 'false' for the id attribute. There may be other circumstances where HTML5 should specify a 'false' return value for the isId method, but in general an IDENT should be considered isId 'true' for the purposes of the DOM.
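The two 'false' cases described above can be sketched as a predicate. This is an illustration over a tree parsed with Python's standard xml.etree.ElementTree, not the DOM interface itself; the function name ident_is_id is hypothetical, and returning False for an element with no id attribute is an assumption.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def ident_is_id(root, element):
    """Proposed isId answer for the id attribute on `element`."""
    value = element.get("id")
    if value is None:
        return False  # assumption: nothing to report without an id attribute
    if element.get(XML_ID) is not None:
        return False  # an xml:id on the same element takes over
    # An xml:id with the same value anywhere in the document also takes over.
    return all(el.get(XML_ID) != value for el in root.iter())


doc = ET.fromstring(
    "<body><p id='a'/><p id='b' xml:id='x'/><p id='c'/><div xml:id='c'/></body>"
)
print([ident_is_id(doc, p) for p in doc.findall("p")])  # prints [True, False, False]
```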

XPath / XPointer processing

[similar to IDREF processing]

Also, liaise with the XPath and XPointer WGs to establish a new IDENT syntax to express an XPath using IDENT values. Also see the HTML5 proposal on -DocFragPointer: enhanced document fragment pointers

Imputed table cell IDENTs

By imputing table cell IDENTs from columns and rows (and column groups and row groups), authors can reference any arbitrary cell in a table without needing to provide ID or IDENT data types on every cell. In this way the reference could be made as "#tableidref/rowident/columnident" or "#tableidref/columnident/rowident" to select a specific cell.
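One way such a reference might be resolved is sketched below with Python's standard xml.etree.ElementTree. The proposal does not say where row and column IDENTs are declared, so this sketch assumes they are id attributes on tr and col elements, that the table is found by its id attribute, and that cells map to columns by position (ignoring colspan, rowspan and nested tables). The function name resolve_cell is hypothetical.

```python
import xml.etree.ElementTree as ET


def resolve_cell(root, fragment):
    """Resolve '#table/row/column' or '#table/column/row' to a cell, or None."""
    parts = fragment.lstrip("#").split("/")
    if len(parts) != 3:
        return None
    table_id, first, second = parts
    table = next((t for t in root.iter("table") if t.get("id") == table_id), None)
    if table is None:
        return None

    cols = [col.get("id") for col in table.iter("col")]
    rows = {tr.get("id"): tr for tr in table.iter("tr") if tr.get("id")}

    # Accept either order of the row and column IDENTs.
    if first in rows and second in cols:
        row_id, col_id = first, second
    elif second in rows and first in cols:
        row_id, col_id = second, first
    else:
        return None

    cells = [cell for cell in rows[row_id] if cell.tag in ("td", "th")]
    index = cols.index(col_id)  # assumption: the nth col maps to the nth cell
    return cells[index] if index < len(cells) else None


table = ET.fromstring(
    "<table id='prices'>"
    "<colgroup><col id='item'/><col id='cost'/></colgroup>"
    "<tr id='apples'><td>Apples</td><td>1.20</td></tr>"
    "<tr id='pears'><td>Pears</td><td>0.90</td></tr>"
    "</table>"
)
print(resolve_cell(table, "#prices/pears/cost").text)  # prints 0.90
print(resolve_cell(table, "#prices/cost/pears").text)  # prints 0.90
```

Only the table, the rows and the columns carry identifiers in this example; no cell needs an id of its own.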

Conclusion

Note that the results of this proposal are entirely compatible with content where the author follows the current practice of treating the id attribute's data type as an ID and where each ID value in a document is unique. However, the proposal also improves interoperability for authors who already fail to ensure the uniqueness of their ID values. Finally, by adding the xml:id attribute to both serializations, the proposal further supports the needs of authors by continuing to provide a unique ID-valued attribute.

Discussion and evaluation

Email

WG members should post feedback and other discussion to the WG’s mailing list (the URI for the links below provides date information). Search on this [ email subject].

See also