MetadataContainerConfusion
Metadata container elements
Problem statement / use cases
- several elements in the HTML5 draft (aside, nav, header, footer) attempt to serve two separate but related roles: as containers positioned spatially relative to one another, and as containers for certain metadata or supporting data; the elements appear to be named for spatial components of a page, but are then defined for semantic, presentation-independent purposes
- the names of elements matter, and this naming will undermine the reliable use of these elements for their intended purpose
- with the facilities available from CSS, providing generic containers for metadata is less useful than allowing authors to add semantic attributes to markup or use dedicated and precise semantic elements (e.g., ADDRESS, though again the name creates issues)
- some justification for providing these names is drawn from their frequent use as class or id values by authors; however, those uses are for spatial containers, not for reliably marked-up metadata
- the terms heading and header are often confused by authors, and our use of header as another heading element adds to that confusion
- if spatial containers or components are necessary for authors, we should simply provide them without overloading their meaning with specific metadata
- if further metadata markup is necessary for authors, we should simply provide that markup rather than trying to force such metadata to fit into elements with spatially oriented names
Further discussion
ASIDE
From the draft:
“The aside element represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content. Such sections are often represented as sidebars in printed typography.”
The aside element is for tangential or subordinate text, but the draft implies it is spatially intended for a sidebar. Why not simply have a “sidebar” element for a sidebar, and provide a “subtext” or “aside” element for subordinate text?
NAV
From the draft: “The nav element represents a section of a page that links to other pages or to parts within the page: a section with navigation links. Not all groups of links on a page need to be in a nav element — only sections that consist of primary navigation blocks are appropriate for the nav element. In particular, it is common for footers to have a list of links to various key parts of a site, but the footer element is more appropriate in such cases.”
Note the advice at the end, which suggests that if the navigation is at the bottom of the page, it is better to use the footer element. Together with the other advice in the draft, these elements appear to define four peripheral regions of an HTML document: a header at the top, a footer at the bottom, a navigation component on one side, and a sidebar component on the other. There is nothing wrong with defining these regions, or even with providing dedicated component elements for authors to use. The problem is that the draft provides them and then blurs the lines by making these components sound like semantic-web-style elements rather than spatially oriented components.
HEADER
From the draft: “The header element represents the header of a section. The element is typically used to group a set of h1–h6 elements to mark up a page's title with its subtitle or tagline. However, header elements may contain more than just the section's headings and subheadings — for example it would be reasonable for the header to include version history information.
“For the purposes of document summaries, outlines, and the like, header elements are equivalent to the highest ranked h1–h6 element descendant of the header element (the first such element if there are multiple elements with that rank).
“Other heading elements in the header element indicate subheadings or subtitles.
“The rank of a header element is the same as for an h1 element (the highest rank).
“The section on headings and sections defines how header elements are assigned to individual sections.”
Here, we should try to keep the concept of a header (a runner appearing at the top of a page, or a component appearing at the top of a document) separate from a heading, which is the title of an article or section. It might often make sense to place the headings in a header, but this is something better left to CSS. We could define a header as a component/container for the beginning of the body or the beginning of a section without saying anything about what belongs inside it (leaving that to the page designer).
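To make the distinction concrete, the following is a minimal sketch of the usage the draft describes: a header element grouping a heading with its subtitle. The text content is invented for illustration, and h1/h2 are only one possible choice of heading ranks.

```html
<!-- The header is the container; the h1 and h2 inside it are the headings. -->
<header>
  <h1>Annual Report</h1>        <!-- heading: the title of the page -->
  <h2>Fiscal year summary</h2>  <!-- subheading: subtitle or tagline -->
</header>
<p>Body text of the document begins here.</p>
```

Under the draft, the header is treated as equivalent to its highest-ranked heading for outline purposes, which is where the container and the heading become entangled.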
FOOTER
From the draft: “The footer element represents the footer for the section it applies to. A footer typically contains information about its section such as who wrote it, links to related documents, copyright data, and the like.
“Contact information for the section given in a footer should be marked up using the address element.
“Footers don't necessarily have to appear at the end of a section, though they usually do.”
Again, what is this element for if it is not to appear at the end of a section or document? Requiring authors to place specific metadata in the footer is probably not advisable, so this becomes solely an element whose name is footer and that should usually go at the end. Why not just make it a footer element that, if it is used, must be used to end a section or body element? This makes its meaning more reliable.
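As a rough sketch of the draft's description, a footer carrying section metadata with contact information marked up using address might look like the following. The section content, author note, and e-mail address are placeholders invented for the example.

```html
<section>
  <h1>Release notes</h1>
  <p>Section content goes here.</p>
  <!-- Under the draft, nothing requires the footer to come last;
       it is shown at the end here because that is the usual case. -->
  <footer>
    <p>Written by the documentation team.</p>
    <address><a href="mailto:docs@example.org">docs@example.org</a></address>
  </footer>
</section>
```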
Proposed solutions
To achieve better separation of concerns, divide the function of these elements in two: 1) spatial component elements and 2) metadata supporting markup.
Spatial components
With the introduction of a broader content model for paragraphs and also the introduction of article and section elements, the need for the DIV element is almost entirely reduced to an element to divide an HTML document into logical spatial components or divisions. We could further augment this by providing specific named divisions such as “header”, “footer”, “sidebar” and “navbar”, etc. Or we could continue to recommend authors follow the practice of assigning id values or class names to such DIV elements.
In addition, CSS is also working to address these spatial concerns through new CSS modules that give authors greater control of the spatial presentation of a document.
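The two options for spatial components described above can be sketched side by side. This is illustrative only: “sidebar” and “navbar” are names floated in this proposal and are not existing HTML elements, and header and footer in the second option are meant as purely spatial containers rather than as the draft defines them.

```html
<!-- Option 1: existing practice, generic DIV elements named via id (or class) -->
<div id="header">...</div>
<div id="navbar">...</div>
<div id="sidebar">...</div>
<div id="footer">...</div>

<!-- Option 2 (hypothetical): dedicated spatial elements that carry
     no metadata meaning of their own -->
<header>...</header>
<navbar>...</navbar>
<sidebar>...</sidebar>
<footer>...</footer>
```

In both cases the positioning itself would be left to CSS; the markup only names the regions.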
Metadata
For the other aspect of these components, we should try to provide more complete metadata capabilities within HTML. Some facilities are already in HTML, but we could round out those facilities and provide a more thorough base for document metadata. Using the Dublin-Core recommendations as a point of departure, we can consider what might be needed to round out the metadata capacities of HTML. Note that the use of Dublin-Core here is not to say we need to adopt anything from Dublin-Core, just that it is a good, authoritative standard for commonly valued metadata properties. Ideally, the metadata expressiveness of HTML should be complete enough to map abstractly to any of many metadata standards.
Dublin-core metadata already available
| Metadata property | Document | Sections | Note |
|---|---|---|---|
| Titles | TITLE | H1–H6 or H | Titles are handled inherently in HTML through both the TITLE element and the H1–H6 elements. We might introduce a structural H element to allow authors to get away from the numbered heading elements. Similarly, nested H elements would give authors a way to structurally mark up subheadings separately from headings. |
| Identifiers | Document URL | fragment identifiers | The use of URIs and promoting the practice of persistent universal URLs provides a valuable built-in facility for identifiers for HTML. |
| Languages | HTML@lang or HTML@xml:lang | lang or xml:lang | HTML already includes language markup through the lang and xml:lang attributes. |
| Relations | LINK@href | @href, @cite, @attributeto | The LINK facilities for HTML provide a way of encoding relation metadata in documents. |
| Formats | n/a | n/a | Typically “HTML” would constitute the format itself, or it could be further determined by the DocType, the HTTP content headers, or the operating environment type, so providing additional metadata is typically unnecessary. |
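A minimal sketch of the facilities listed in the table, using only markup that already exists in HTML. The file names, URLs, and text are placeholders invented for the example.

```html
<!DOCTYPE html>
<html lang="en">                              <!-- Languages: document-wide -->
<head>
  <title>Field guide to garden birds</title>  <!-- Titles: document -->
  <!-- Relations: LINK@href, here pointing at a translation -->
  <link rel="alternate" hreflang="fr" href="guide.fr.html">
</head>
<body>
  <!-- Titles: section heading; the id provides a fragment identifier -->
  <h1 id="robins">Robins</h1>
  <p lang="la">Erithacus rubecula</p>         <!-- Languages: section-level -->
  <!-- Relations: @cite pointing at the source of a quotation -->
  <blockquote cite="http://example.org/source">
    <p>Quoted text.</p>
  </blockquote>
</body>
</html>
```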
Dublin-core metadata gaps
In addition to those Dublin-Core properties already handled by existing HTML facilities, there is a need to provide clear, interoperable conformance norms for other common metadata properties. For example, there is already a common practice of including a meta element with the name “keywords”, with the keywords given as comma-separated values in its content attribute. This could be enhanced by globalizing the metadata attributes (see ConsiderGlobalAttributesModules) and perhaps using a separate T (term) element for each keyword, with metadata attributes attached. In this way the metadata can be provided either for the entire document or for specific sections or articles within the document.
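The contrast between the existing keywords practice and the suggested enhancement might look like the sketch below. The first element reflects current practice; the second part is hypothetical, since neither a T element nor this use of a property attribute exists in HTML, and the exact syntax shown is an assumption made for illustration.

```html
<!-- Existing practice: one META element, comma-separated keywords,
     applying to the whole document -->
<meta name="keywords" content="metadata, HTML5, Dublin Core">

<!-- Hypothetical: a T (term) element marking a keyword inside a section,
     with a metadata attribute attached (assumed syntax) -->
<section>
  <p>This section discusses <t property="keywords">sidebars</t> in page layout.</p>
</section>
```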
The following table lists each Dublin-Core property in the first column, followed by a potential document-wide HTML element for specifying the property and then a potential in-body method for specifying it. The fourth column gives a recommended property keyword for use in the META@name, @property or @rel attributes. The fifth column indicates how authors express multiple value-instances of the same property: “Y” means one value per element; “N” means all values are comma-delimited within the same element; and “1” means the property should typically be included only once in a document or for a particular section (in exceptional cases, multiple instances should use separate elements).
| Metadata property | Document | Sections | Keyword | One property per element | Note |
|---|---|---|---|---|---|
| Sources | @cite | @cite | n/a | Y | The proposed attribution and citation features (see AttrtibuCitaQuotationReferencing) provide thorough support for marking up Dublin-Core sources. |
| Subjects (keywords) | META | METAD, ADDRESS@m, T, AnyElement | keywords | N | |
| Descriptions | LINK, META | A, ADDRESS@m, METAD | description | 1 | |
| Types | META | METAD, ADDRESS@m, T, AnyElement | types | N | |
| Rights | LINK, META | A, ADDRESS@m, METAD | copyright, license | 1 | Also the 'license' keyword for linking to relevant use grant licenses. |
| Scopes | LINK, META | A, ADDRESS@m, METAD | scope | Y | |
| Identifiers | META | fragment identifiers | identifier | 1 | Though the use of URIs and promoting the practice of persistent universal URLs provides a valuable built-in facility for universally unique identifiers for HTML, another practice (dependent on search-indexing UAs) might be to use specific UUIDs or other universally unique and persistent IDs. With persistent URLs, care must be taken to keep the URL identifier unchanged; with UUID identifiers it is the opposite: care must be taken to ensure the UUID is changed when copies are made or when an identified document serves as a template for a newly identified document. Together, the two techniques might therefore serve as a check against one another. |
| Dates | | | | | |
| Published | META | METAD, TIME, D (datum) | date.published | Y | |
| Created | META | METAD, TIME, D (datum) | date.created | 1 | The content creation date may often differ from the filesystem metadata creation date. |
| Modified | META | METAD, TIME, D (datum) | date.lastmodified | 1 | The content last-modified date may often differ from the filesystem metadata modified date. |
| Addressable Entities | | | | | |
| Creators | META, LINK | ADDRESS, METAD, PN | creator | Y | |
| Contributors | META, LINK | ADDRESS, METAD, PN | contributor | Y | |
| Publishers | META, LINK | ADDRESS, METAD, PN | publisher | 1 | |
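As a sketch of how the document-wide column might be used, the following head section applies several of the proposed keywords. The keyword names come from the table above; the values, the date format, and the URL are assumptions made for the example, and most of these keywords are proposals rather than established META names.

```html
<head>
  <title>Example document</title>
  <!-- "1": normally given once per document -->
  <meta name="description" content="A short summary of the document.">
  <meta name="date.created" content="2007-06-20">
  <meta name="publisher" content="Example Press">
  <!-- "N": all values comma-delimited within one element -->
  <meta name="keywords" content="metadata, containers">
  <!-- "Y": one element per value -->
  <meta name="creator" content="A. Author">
  <meta name="creator" content="B. Author">
  <!-- Rights expressed as a link using the 'license' keyword -->
  <link rel="license" href="http://example.org/license">
</head>
```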
Discussion and evaluation
WG members should post feedback and other discussion to the WG’s mailing list (the URIs of the links below provide date information). Search on this email subject.
Original thread
A thread begun by Jens Meiert originally raised this issue.
- Initial Message regarding semantic problems (Jens Meiert)
- Supporting removal (Philip Taylor)
- Suggests presentational semantic (Rob Burns)
- No more presentational than thead or tfoot or h1 (Maciej Stachowiak)
- Suggests the names are presentational (Andrew Fedoniouk)
- 188 (Robert Burns)
- 196 (Jens Meiert)
- 197 (James Graham)
- 198 (Jens Meiert)
- Presentational names OK (James Graham)
- 203 (Robert Burns)
- 235 (Karl Dubost)
- 247 (Jens Meiert)
- 248 (Karl Dubost)
- 254 (Robert Burns)
- 407 (Jens Meiert)
- 410 (Robert Burns)
- 433 (Jens Meiert)
See also