MetadataContainerConfusion

From HTML WG Wiki

Metadata container elements

Problem statement / use cases

  • several elements in the HTML5 draft (aside, nav, header, footer) attempt to serve two separate but related roles: as containers positioned spatially relative to one another, and as containers for certain metadata or supporting data; the elements appear to be named for spatial components of a page but are then defined for semantic, presentation-independent purposes
  • the names of elements matter, and this naming will undermine the reliable use of these elements for their intended purpose
  • with the facilities available from CSS, providing generic containers for metadata is less useful than allowing authors to add semantic attributes to markup or use dedicated and precise semantic elements (e.g., ADDRESS, though again the name creates issues)
  • some justification for these names is drawn from their frequent use as class or id values by authors; however, those uses are for spatial containers, not for reliably marked-up metadata
  • authors often confuse the terms heading and header, and our use of header as another heading element contributes to that confusion
  • if spatial containers or components are necessary for authors, we should simply provide them without overloading their meaning with specific metadata
  • if further metadata markup is necessary for authors, we should simply provide that markup rather than trying to force such metadata to fit into elements with spatially oriented names

Further discussion

ASIDE

From the draft:

   “The aside element represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content. Such sections are often represented as sidebars in printed typography.”

The aside element is for tangential or subordinate text, but the draft implies that it is spatially intended for a sidebar. Why not simply have a “sidebar” element for sidebars, and a “subtext” (or “aside”) element for subordinate text?

NAV

From the draft: “The nav element represents a section of a page that links to other pages or to parts within the page: a section with navigation links. Not all groups of links on a page need to be in a nav element — only sections that consist of primary navigation blocks are appropriate for the nav element. In particular, it is common for footers to have a list of links to various key parts of a site, but the footer element is more appropriate in such cases.”

Note the advice at the end, which suggests that if the navigation is at the bottom of the page, the footer element is more appropriate. Together with the other advice in the draft, these elements appear to define four peripheral regions of an HTML document: a header at the top; a footer at the bottom; a navigation component on one side; and a sidebar component on the other side. There is nothing wrong with defining these regions, or even with providing dedicated component elements for authors to use. The problem, however, is providing them and then blurring the lines by describing these components as though they were semantic-web-style metadata elements rather than spatially oriented components.

HEADER

From the draft: “The header element represents the header of a section. The element is typically used to group a set of h1–h6 elements to mark up a page's title with its subtitle or tagline. However, header elements may contain more than just the section's headings and subheadings — for example it would be reasonable for the header to include version history information.

“For the purposes of document summaries, outlines, and the like, header elements are equivalent to the highest ranked h1–h6 element descendant of the header element (the first such element if there are multiple elements with that rank).

“Other heading elements in the header element indicate subheadings or subtitles.

“The rank of a header element is the same as for an h1 element (the highest rank).

“The section on headings and sections defines how header elements are assigned to individual sections.”
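The outline rule quoted above (a header counts as its highest-ranked h1–h6 descendant, the first one winning ties) can be sketched in a few lines. This is a minimal Python illustration, assuming the input is the markup of a single header element as a string; it ignores nested sectioning content and the rest of the draft's outline algorithm, and the names `HeadingCollector` and `header_equivalent_heading` are hypothetical. Note that, per the draft, the header itself still has h1 rank; the function only identifies which descendant heading stands in for it.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Records (rank, text) for every h1-h6 element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []    # list of [rank, text]; rank 1 is the highest
        self._current = None  # index of the heading being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.headings.append([int(tag[1]), ""])
            self._current = len(self.headings) - 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.headings[self._current][1] += data


def header_equivalent_heading(header_html):
    """Return (rank, text) of the heading a header element is equivalent to,
    or None if the header has no h1-h6 descendant."""
    collector = HeadingCollector()
    collector.feed(header_html)
    collector.close()
    if not collector.headings:
        return None
    top_rank = min(rank for rank, _ in collector.headings)
    # Among headings of the highest rank, the first one wins.
    for rank, text in collector.headings:
        if rank == top_rank:
            return rank, text.strip()
```

For example, `header_equivalent_heading("<header><h2>Tagline</h2><h1>Site title</h1><h1>Another h1</h1></header>")` returns `(1, "Site title")`: the tagline comes first in source order but has a lower rank, so it is treated as a subtitle.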

Here, we should try to keep the concept of a header (a running head at the top of a page, or a component at the top of a document) separate from a heading, which is the title of an article or section. It might often make sense to place the headings in a header, but that is better left to CSS. We could define a header as a component/container for the beginning of the body or of a section without saying anything about what belongs inside it, leaving that to the page designer.

FOOTER

From the draft: “The footer element represents the footer for the section it applies to. A footer typically contains information about its section such as who wrote it, links to related documents, copyright data, and the like.

“Contact information for the section given in a footer should be marked up using the address element.

“Footers don't necessarily have to appear at the end of a section, though they usually do.”

Again, what is this element for if it is not meant to appear at the end of a section or document? Requiring authors to place specific metadata in the footer is probably not advisable, so the element becomes solely an element named footer that should usually go at the end. Why not simply make it a footer element that, when used, must end a section or body element? This makes its meaning more reliable.

Proposed solutions

To achieve better separation of concerns, divide the function of these elements in two: 1) spatial component elements and 2) metadata supporting markup.

Spatial components

With the introduction of a broader content model for paragraphs and of the article and section elements, the DIV element is needed almost exclusively to divide an HTML document into logical spatial components or divisions. We could further augment this by providing specifically named divisions such as “header”, “footer”, “sidebar” and “navbar”. Alternatively, we could continue to recommend that authors assign id values or class names to such DIV elements.

In addition, new CSS modules are being developed to address these spatial concerns by giving authors greater control over the spatial presentation of a document.

Metadata

For the other aspect of these components, we should try to provide more complete metadata capabilities within HTML. Some facilities already exist in HTML, but we could round them out and provide a more thorough base for document metadata. Using the Dublin Core recommendations as a point of departure, we can consider what is needed to round out the metadata capabilities of HTML. Using Dublin Core here does not mean we need to adopt anything from Dublin Core; it is simply a good, authoritative standard for commonly valued metadata properties. Ideally, the metadata expressiveness of HTML should be complete enough to map abstractly to any of many metadata standards.

Dublin Core metadata already available

Each entry gives the metadata property, then its document-wide facility, its facility within sections, and a note.

  • Titles. Document: TITLE. Sections: H1–H6 or H. Note: titles are handled inherently in HTML through both the TITLE element and the H1–H6 elements. We might introduce a structural H element to allow authors to get away from the numbered heading elements. Similarly, nested H elements would give authors a way to mark up subheadings structurally, separate from headings.
  • Identifiers. Document: document URL. Sections: fragment identifiers. Note: the use of URIs, and promoting the practice of persistent universal URLs, provides a valuable built-in identifier facility for HTML.
  • Languages. Document: HTML@lang or HTML@xml:lang. Sections: lang or xml:lang. Note: HTML already includes language markup through the lang and xml:lang attributes.
  • Relations. Document: LINK@href. Sections: @href, @cite, @attributeto. Note: the LINK facilities of HTML provide a way of encoding relation metadata in documents.
  • Formats. Document: n/a. Sections: n/a. Note: typically “HTML” itself constitutes the format, or the format can be determined from the DOCTYPE, the HTTP content headers, or the operating environment's type information, so additional metadata is typically unnecessary.

Dublin Core metadata gaps

In addition to the Dublin Core properties already handled by existing HTML facilities, there is a need to provide clear, interoperable conformance norms for other common metadata properties. For example, it is already common practice to include a meta element with the name “keywords” whose content attribute holds the keywords as comma-separated values. This could be enhanced by making the metadata attributes global (see ConsiderGlobalAttributesModules) and perhaps by using a separate T (term) element, with metadata attributes attached, for each keyword. In this way, metadata can be provided either for the entire document or for specific sections or articles within it.
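The difference between the two keyword styles described above can be sketched with Python's standard `html.parser`. This is a minimal illustration, not a conformance definition: the T element is only a proposal on this page, and the class name `KeywordCollector` and its attribute names are hypothetical. The current META practice yields one element holding several comma-separated values; the proposed T style yields one keyword per element, which can sit inside any section or article.

```python
from html.parser import HTMLParser


class KeywordCollector(HTMLParser):
    """Gathers keywords from <meta name="keywords"> and from proposed <t> elements."""

    def __init__(self):
        super().__init__()
        self.document_keywords = []  # from META: one element, comma-separated values
        self.term_keywords = []      # from the proposed T element: one keyword each
        self._term_text = None       # text of the <t> element being read, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            # Current practice: split the content attribute on commas.
            self.document_keywords.extend(
                part.strip() for part in content.split(",") if part.strip()
            )
        elif tag == "t":
            self._term_text = ""

    def handle_endtag(self, tag):
        if tag == "t" and self._term_text is not None:
            self.term_keywords.append(self._term_text.strip())
            self._term_text = None

    def handle_data(self, data):
        if self._term_text is not None:
            self._term_text += data
```

Feeding it `<meta name="keywords" content="HTML5, metadata , aside,">` followed by `<p>About <t>footer</t> and <t>nav</t>.</p>` leaves `["HTML5", "metadata", "aside"]` in `document_keywords` and `["footer", "nav"]` in `term_keywords`. The comma-splitting step is exactly what the per-element T style would make unnecessary.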

The following table lists each Dublin Core property, followed by a potential document-wide HTML element for specifying the property and a potential in-body method for specifying it. The “keyword” column gives a recommended property keyword for use in the META@name, @property or @rel attributes. The “one property per element” column indicates how authors express multiple values of the same property: 1) “Y” means one value per element; 2) “N” means all values are comma-delimited within the same element; and 3) “1” means the property should typically be included only once in a document or for a particular section (in exceptional cases, multiple instances should use separate elements).

  • Sources. Document: none. Sections: @cite. Keyword: n/a. One property per element: Y. Note: the proposed attribution and citation features (see AttrtibuCitaQuotationReferencing) provide thorough support for marking up Dublin Core sources.
  • Subjects (keywords). Document: META. Sections: METAD, ADDRESS@m, T, AnyElement. Keyword: keywords. One property per element: N.
  • Descriptions. Document: LINK, META. Sections: A, ADDRESS@m, METAD. Keyword: description. One property per element: 1.
  • Types. Document: META. Sections: METAD, ADDRESS@m, T, AnyElement. Keyword: types. One property per element: N.
  • Rights. Document: LINK, META. Sections: A, ADDRESS@m, METAD. Keywords: copyright, license. One property per element: 1. Note: the 'license' keyword is also used for linking to relevant use-grant licenses.
  • Scopes. Document: LINK, META. Sections: A, ADDRESS@m, METAD. Keyword: scope. One property per element: Y.
  • Identifiers. Document: META. Sections: fragment identifiers. Keyword: identifier. One property per element: 1. Note: although URIs, and the practice of persistent universal URLs, provide a valuable built-in facility for universally unique identifiers in HTML, another practice, dependent on search-indexing UAs, might be to use UUIDs or other universally unique and persistent IDs. With persistent URLs, care must be taken to keep the URL identifier unchanged; with UUIDs it is the opposite: care must be taken to change the UUID when copies are made or when an identified document serves as a template for a newly identified document. Used together, the two techniques might therefore serve as a check against one another.
  • Dates
      • Published. Document: META. Sections: METAD, TIME, D (datum). Keyword: date.published. One property per element: Y.
      • Created. Keyword: date.created. One property per element: 1. Note: the content creation date may often differ from the filesystem creation date.
      • Modified. Keyword: date.lastmodified. One property per element: 1. Note: the content last-modified date may often differ from the filesystem modification date.
  • Addressable entities
      • Creators. Document: META, LINK. Sections: ADDRESS, METAD, PN. Keyword: creator. One property per element: Y.
      • Contributors. Keyword: contributor. One property per element: Y.
      • Publishers. Keyword: publisher. One property per element: 1.
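The three conventions in the “one property per element” column (Y, N and 1) can be made concrete with a short Python sketch of how a consumer might gather the values of a single property. The function name `property_values` and its input (a list of raw values, one per element, already extracted from the markup) are hypothetical; the table itself does not prescribe any processing model.

```python
def property_values(convention, element_values):
    """Collect the values of one metadata property (illustrative sketch).

    convention: "Y" (one value per element), "N" (comma-delimited values in
    one element) or "1" (typically a single instance per document or section).
    element_values: the raw value carried by each element, in document order.
    Returns (values, typical): typical is False when a "1" property occurs
    more than once, which the convention treats as exceptional.
    """
    if convention == "N":
        # All values share one element, separated by commas.
        values = [
            part.strip()
            for raw in element_values
            for part in raw.split(",")
            if part.strip()
        ]
    elif convention in ("Y", "1"):
        # Each element carries exactly one value; exceptional repeats of a
        # "1" property are still expressed as separate elements.
        values = [raw.strip() for raw in element_values if raw.strip()]
    else:
        raise ValueError(f"unknown convention: {convention!r}")
    typical = not (convention == "1" and len(values) > 1)
    return values, typical
```

For example, `property_values("N", ["html, metadata", "aside"])` returns `(["html", "metadata", "aside"], True)`, while two date.created values under the “1” convention are both kept but flagged as atypical.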

Discussion and evaluation

Email

WG members should post feedback and other discussion to the WG’s mailing list (the URIs of the links below provide date information). Search for this email subject.

Original thread

A thread begun by Jens Meiert originally raised this issue.

See also