Bug 15304 - Nested <META> tags in <HEAD>
Status: RESOLVED WONTFIX
Product: HTML WG
Classification: Unclassified
Component: HTML Microdata (editor: Ian Hickson)
Assigned To: Ian 'Hixie' Hickson
Reported: 2011-12-21 21:57 UTC by Evan Sandhaus
Modified: 2012-01-03 09:34 UTC
Description Evan Sandhaus 2011-12-21 21:57:36 UTC
I would like to suggest that the WG consider adding support for nested <META> tags in the <HEAD> element to facilitate the embedding of complex microdata objects.  

I am currently advising a number of teams on the implementation of HTML Microdata Schema.org markup, and the inability to nest <META> tags in the <HEAD> has led to some challenges.

An example:

Suppose you have a document in which you have added an 'itemscope' attribute to the <HTML> element.  Let's further suppose that you want to add some non-visible metadata to the <head> element, say the word count.  To do this you'd code up something like:

<html itemscope itemtype="http://schema.org/NewsArticle">
<head>
	<meta itemprop="wordCount" content="1138"/>
	...

So far so good, but now suppose we want to add another bit of non-visible metadata to the <head>, but this time we want to add a 'Person' object.  

This is where the problem comes in.  Inserting a 'Person' object requires that we nest elements, and it isn't legal to nest <META> tags.  Nor can you nest any of the other elements that are legal in the scope of <HEAD> (i.e. <TITLE>, <BASE>, <LINK> and <STYLE>).

If it were legal to nest <META> tags, I could write something like this:

<html itemscope itemtype="http://schema.org/NewsArticle">
<head>
	<meta itemprop="wordCount" content="1138"/>
	<meta itemscope itemtype="http://schema.org/Person">
		<meta itemprop="name" content="Evan Sandhaus"/>
	</meta>
       ....
</head>
	...
</html>

So this is why I'd love to see support for nested <META> tags in the <HEAD> element.

I've posted this idea to the <public-vocabs@w3.org> list, and a couple of contributors raised questions that I'd like to quickly address.

1)  Why not place these complex objects inside of an invisible element (such as a <DIV style='display:none'> or simply an empty <DIV>) in the <BODY> element?

My concern with this approach is twofold.  

First, the HTML 5 Spec defines the <HEAD> element as "a collection of metadata for the Document" whereas the <BODY> element is defined as "the main content of the document."  Based on these definitions, a bit of descriptive metadata, such as the person a document is about, seems far more appropriate for the <HEAD> than the <BODY>.

Secondly, I am concerned that placing non-visible content in the <BODY> may be interpreted as "cloaking" by search providers.  In fact, the documentation at schema.org says the following of this approach:  "This technique should be used sparingly. Only use meta with content for information that cannot otherwise be marked up." [http://schema.org/docs/gs.html]

2)  Why not use a flat <META> tag structure and use the itemref attribute to link the appropriate <META> tags to one another?

My concern with this approach is that it will likely be confusing to many potential implementors.  Nested <META> tags, however, offer what seems to me a far more intuitive approach.

Thank you for considering this feature request,

Evan
--
Evan Sandhaus
Lead Architect, Semantic Platforms
The New York Times Company
Comment 1 Tab Atkins Jr. 2011-12-22 02:03:13 UTC
Unfortunately, this cannot work.  <meta> elements are already void elements, which means they don't have an end tag.  In other words, "<meta>" is just fine; you don't need the closing slash like XML requires.  Thus, if a parser sees "<meta><meta/></meta>", it already produces two sibling <meta> elements; the stray "</meta>" end tag is a parse error and is ignored, so nothing ends up nested.

The correct solution to this is the two possibilities already suggested - either use some <div>s in the body with no content, or use @itemref to establish the scope/prop relationship manually.
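
The void-element behaviour described above can be sketched with Python's standard html.parser module. This is only an illustration: html.parser is a tokenizer rather than a full HTML5 tree builder, and the VOID set, the DepthTracker class, and the element_depths helper are names assumed here, not part of any specification. Under those assumptions, both <meta> elements come out at the same depth, i.e. as siblings rather than parent and child.

```python
from html.parser import HTMLParser

# Illustrative subset of the HTML void elements (the real list is longer).
VOID = {"meta", "link", "base", "br", "img", "input", "hr"}


class DepthTracker(HTMLParser):
    """Records (tag, depth) per start tag; never nests inside void elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, self.depth))
        # A void element cannot have children, so it does not open a level.
        if tag not in VOID:
            self.depth += 1

    def handle_endtag(self, tag):
        # A stray end tag for a void element closes nothing.
        if tag not in VOID and self.depth > 0:
            self.depth -= 1


def element_depths(markup):
    """Return the list of (tag, depth) pairs seen while scanning markup."""
    tracker = DepthTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.seen


nested_attempt = (
    "<head>"
    '<meta itemscope itemtype="http://schema.org/Person">'
    '<meta itemprop="name" content="Evan Sandhaus"/>'
    "</meta>"
    "</head>"
)

print(element_depths(nested_attempt))
# [('head', 0), ('meta', 1), ('meta', 1)]
```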
Comment 2 Evan Sandhaus 2011-12-22 02:55:43 UTC
Tab, I 100% understand the void element issue.  Nested <META> tags are a non-starter.

There is one other alternative that I see to the two that I mentioned.  

In HTML 4 it was legal to have an <OBJECT> element in the scope of <HEAD>.  This has changed as of HTML 5: such a nesting is no longer permitted.  Is the door on that decision completely closed?  Were <OBJECT> still allowed in this context, one could write the following:

<html itemscope itemtype="http://schema.org/NewsArticle">
<head>
    <meta itemprop="wordCount" content="1138"/>
    <object itemscope itemtype="http://schema.org/Person" type='text/html'>
        <meta itemprop="name" content="Evan Sandhaus">
    </object>
       ....
</head>
    ...
</html>

A little less elegant than nested <META> tags, but still legible and cloak-free.
Comment 3 Anne 2011-12-22 09:42:37 UTC
<object> implies <body>, so we cannot do that either. Microdata is inspired by microformats, though, and the main idea behind it is that the actual data is visible. If you hide it all, it is likely to go out of date fast because nobody notices it is wrong.
Comment 4 Leonard Rosenthol 2011-12-22 12:10:52 UTC
Your example is one of what I would traditionally call "document level metadata": semantic information about the document as a whole that is not directly connected to any specific content.  Examples include the "Document Information" in PDF, OOXML or ODF, and the <metadata> tag of EPUB.

Looking at those, I think this gives you two additional options - though either would require changes to the HTML5 language.

1 - Make the metadata tag of EPUB part of HTML5.  That would align the two standards even more closely, and mean that EPUB has one less "special addition" from the core HTML5 specification.

2 - Define a way to embed and/or reference XMP (ISO 16684-1), the industry standard XML/RDF-based metadata scheme used in PDF, JPEG, PNG, etc.
Comment 5 Marat Tanalin | tanalin.com 2011-12-22 12:35:46 UTC
100% backwards-compatible way:

<script type="meta" itemscope itemtype="http://schema.org/Person">
	<meta name="name" content="Some name" />
	<meta name="foo"  content="bar" />
</script>
Comment 6 Tab Atkins Jr. 2011-12-22 16:04:04 UTC
(In reply to comment #5)
> 100% backwards-compatible way:
> 
> <script type="meta" itemscope itemtype="http://schema.org/Person">
>     <meta name="name" content="Some name" />
>     <meta name="foo"  content="bar" />
> </script>

No, that will not work at all.  In conforming UAs, there will be a Microdata item with no properties, because the contents of a <script> are *not* parsed as elements.
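
As a rough illustration of why the <script> wrapper yields an item with no properties, the sketch below feeds the markup from comment #5 to Python's standard html.parser, which, like an HTML parser, treats the contents of <script> as raw text. The TagLogger class and log_tags helper are hypothetical names used only for this example; this is not how any particular Microdata consumer is implemented.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Logs every start tag the parser recognises, plus all raw text."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def log_tags(markup):
    """Return (start tags seen, concatenated text) for the given markup."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.start_tags, "".join(logger.text)


snippet = (
    '<script type="meta" itemscope itemtype="http://schema.org/Person">'
    '<meta name="name" content="Some name" />'
    "</script>"
)

tags, raw = log_tags(snippet)
print(tags)  # ['script']
print(raw)   # <meta name="name" content="Some name" />
```

Only the <script> start tag is reported; the inner <meta> arrives as plain text, so there is no element on which a property could be found.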
Comment 7 Marat Tanalin | tanalin.com 2011-12-22 16:27:29 UTC
(In reply to comment #6)

Since the HTML5 spec development is still in progress, Microdata parsing rules can be changed if needed. Backward compatibility is related solely to pre-HTML5 things here.

My point here is just that SCRIPT and STYLE are the only containers we can use (in fact, are forced to use) inside HEAD whose contents will reliably be hidden and not "transparently" moved by the browser to the BODY element (a quite insane and harmful algorithm, by the way), regardless of Microdata or anything else.
Comment 8 Evan Sandhaus 2011-12-22 22:40:18 UTC
Thanks everybody for the great feedback!  I'd like to address a few of your points, and then suggest a possible approach for moving forwards.

---
Response to feedback
---

> Microdata is inspired by microformats though and the main idea behind it is that the actual data is visible.

Although I realize Microformats and Microdata were conceived primarily as a mechanism for attaching semantic meaning to visible information, my experience in the publishing industry has convinced me that it is often desirable and perhaps even necessary to embed non-visible metadata.  

For example, 'wordCount' is a very useful property.  With it, one could limit one's search to articles that are longer or shorter than a specified number of words.  Including this as visible metadata, however, presents a user interface design challenge.  I think it would be unnatural for every article on a news website to include something along the lines of "this article contains 1,138 words."  And because many publishers paginate articles across multiple HTML documents, this value cannot be reliably inferred from analysis of the body.

So, sadly, it is sometimes necessary for online publishers to include non-visible metadata.

> If you hide it all it is likely to go out of date fast because nobody notices it is wrong.

Certainly a reasonable concern, but in my experience, publishers tend to render most page components from a production database that is reliably updated.  And most of the fields in such databases do get used for some production purpose. For instance, every article on nytimes.com is catalogued as being about certain people, places, organizations and descriptors.  Although this data is not used for much on the individual article pages, it is used to power numerous features on nytimes.com and is assiduously maintained by our production staff.  We currently include this data in our meta tags on every article page, and although it is not visible it is carefully maintained. 

> Make the metadata tag of EPUB part of HTML5.

This is a fascinating idea, and certainly worth exploring, but I'm not clear on how this would solve this problem.  My (very) cursory exploration of this tag suggests that it comes with its own data model and child attributes.  Am I right about this, or would it be possible to use this tag to include arbitrary metadata?

> Define a way to embed and/or reference XMP (ISO 16684-1), the industry standard XML/RDF-based metadata scheme used in PDF, JPEG, PNG, etc.

The International Press Telecommunications Council (IPTC), to which I am The NYT's delegate, is very invested in XMP and supports its ongoing development.  However, I am concerned that, since XMP already comes with pre-existing schemata for expressing various types of metadata, it may lack the flexibility to be a general framework for nesting complex HTML 5 Microdata objects in the <HEAD> element.

---
Proposed approach to moving forward:
---

In my work as an NYT Software Architect and as delegate to the IPTC, I have come to believe that it is necessary for organizations implementing HTML 5 Microdata to have the ability to nest complex objects in the <HEAD> element.  The HTML 5 proposal as it stands does not seem to allow for this.

A solution to this problem would be to create or alter a tag such that it can fill the role defined below as [nestable].

<html>
	<head>
		...
		<[nestable] itemscope>
			<meta itemprop='foo' content='bar'/>
		</[nestable]>
		...
	</head>
	...
</html>

We have ruled out <META> for this role because it is a void element.

There have been objections to <SCRIPT> and <STYLE> because the contents of these elements are generally not treated as parseable HTML markup.

There have been objections to the <OBJECT> element because it implies <BODY>. 

So where does that leave us?

There seem to be two ways of moving forward: either (1) we introduce a new element that satisfies the above definition of [nestable] or (2) we reexamine the limitations on the <OBJECT> element and once again allow it in the <HEAD> (as it was in HTML 4).

My specific preference is for the second approach, but my much stronger, more general preference would be for the HTML 5 specification to allow for the embedding of arbitrarily complex Microdata in the <HEAD> element.
Comment 9 Tab Atkins Jr. 2011-12-22 22:48:05 UTC
(In reply to comment #8)
> There seem to be two ways of moving forward, either (1) we introduce a new element
> that satisfies the above definition of [nestable] 

Unfortunately, it's *extremely* painful to introduce new elements into <head>, due to backwards compatibility issues.  Unknown elements automatically close the <head> and imply <body>.  Unless there's an *extremely* good reason why we need a new element in <head>, this isn't a fruitful direction.

> or (2) we reexamine the
> limitations on the <OBJECT> element and once again allow it in the <HEAD> (as it
> was in HTML 4).  
> 
> My specific preference is for the second approach, but my much stronger more
> general preference would be for the HTML 5 specification to allow for the
> embedding of arbitrarily complex Microdata in the <HEAD> element.

The decision to make <object> imply <body> was made based on backwards compat as well, and by now every modern browser has adjusted to match this (if they weren't already doing so).  Again, unless there's a very good reason to allow this, it's not a fruitful direction.

You omitted the other two ways of moving forward:

(3) Use @itemref to manually establish the scope/prop linkage across sibling <meta> elements in the <head>

(4) Use <div>s without any content in the <body> to carry the Microdata.

Both of these work today and will continue to work in the future.  Why are neither of these acceptable?
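
Option (3) can be sketched as follows. This is a deliberately simplified illustration, not the full Microdata property-crawling algorithm: it only follows itemref from one flat <meta> element to sibling <meta> elements by id, and it ignores nesting, any enclosing itemscope on <html>, and conformance requirements on <meta>. The MetaCollector class, the resolve_itemref helper, and the "person-name" id are assumptions made for this example.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects the attributes of every <meta> element as a dict."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            self.metas.append(dict(attrs))


def resolve_itemref(markup):
    """Return one item per <meta itemscope>, with properties found via itemref."""
    collector = MetaCollector()
    collector.feed(markup)
    collector.close()

    # Index the flat <meta> elements by id so itemref tokens can be looked up.
    by_id = {m["id"]: m for m in collector.metas if "id" in m}

    items = []
    for meta in collector.metas:
        if "itemscope" not in meta:
            continue
        properties = {}
        # itemref is a space-separated list of element ids.
        for ref in (meta.get("itemref") or "").split():
            target = by_id.get(ref)
            if target is not None and "itemprop" in target:
                properties[target["itemprop"]] = target.get("content", "")
        items.append({"type": meta.get("itemtype"), "properties": properties})
    return items


flat_head = (
    "<head>"
    '<meta itemscope itemtype="http://schema.org/Person" itemref="person-name">'
    '<meta id="person-name" itemprop="name" content="Evan Sandhaus">'
    "</head>"
)

print(resolve_itemref(flat_head))
# [{'type': 'http://schema.org/Person', 'properties': {'name': 'Evan Sandhaus'}}]
```

The scope/prop relationship comes entirely from the id lookup rather than from nesting, which is why the <meta> elements can remain siblings.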
Comment 10 Marat Tanalin | tanalin.com 2011-12-22 22:59:04 UTC
(In reply to comment #9)

> (3) Use @itemref to manually establish the scope/prop linkage across sibling
> <meta> elements in the <head>

Such code would probably be too dirty/littered.

> (4) Use <div>s without any content in the <body> to carry the Microdata.

Empty meta-DIV could undesirably consume styles that are actually for another (nonmeta) DIV. This could result in unneeded complication of styles.
Comment 11 Leonard Rosenthol 2011-12-22 23:06:12 UTC
What about the work being done to enable XML embedding inside of HTML, with access from JavaScript - <http://www.w3.org/2010/html-xml/snapshot/#uc04>? 

That would enable rich, nestable metadata, though not necessarily using Microdata syntax.
Comment 12 Evan Sandhaus 2011-12-22 23:08:57 UTC
Tab, excellent questions and I should have reiterated the concerns I raised about these two approaches in my initial comment.

> (3) Use @itemref to manually establish the scope/prop linkage across sibling <meta> elements in the <head>

My concern with this approach is that it will be confusing to potential implementors.  I am lucky to work with a bunch of super smart engineers who make me feel humble every day, but even they wrestle (albeit successfully) with the nuances of itemref.  Moreover, several of the Microdata parsers currently available do not support or properly parse the @itemref attribute.  This further complicates development.

> (4) Use <div>s without any content in the <body> to carry the Microdata.

My concern with this approach is twofold.  

First, the HTML 5 Spec defines the <HEAD> element as "a collection of metadata
for the Document" whereas the <BODY> element is defined as "the main content of the
document."   Based on these definitions, a bit of descriptive metadata, such as
the person a document is about, seems far more appropriate for the <HEAD> than
the <BODY>.   

Secondly, I am concerned that placing non-visible content in the <BODY> may be interpreted as "cloaking" by search providers.  Should this happen, publishers might actually opt not to implement microdata for fear that it would harm their search standings.  After all, the documentation at schema.org says the following of this approach:  "This technique should be used sparingly. Only use meta with content for information that cannot otherwise be marked up." [http://schema.org/docs/gs.html]

...

I think microdata is one of the most exciting aspects of the entire HTML 5 effort and I have done my best to be a driving force for its adoption in the news industry. However, the inability to (legally) nest complex metadata in the <HEAD> really is an obstacle to adoption.
Comment 13 Tab Atkins Jr. 2011-12-22 23:40:16 UTC
(In reply to comment #12)
> Tab, excellent questions and I should have reiterated the concerns I raised
> about these two approaches in my initial comment.
> 
> > (3) Use @itemref to manually establish the scope/prop linkage across sibling
> > <meta> elements in the <head>
> 
> My concern with this approach is that it will be confusing to potential
> implementors.  I am lucky to work with a bunch of super smart engineers who
> make me feel humble every day, but even they wrestle (albeit successfully) with
> the nuances of itemref.  Moreover, several of the Microdata parsers currently
> available do not support / properly parse the @itemref attribute.  This further
> complicates development.

I agree that it's potentially confusing.  If we had a nestable element in <head>, it would be much better.  Them's the breaks.


> > (4) Use <div>s without any content in the <body> to carry the Microdata.
> 
> My concern with this approach is twofold.  
> 
> First, the HTML 5 Spec defines the <HEAD> element as "a collection of metadata
> for the Document" whereas the <BODY> element is defined as "the main content of the
> document."   Based on these definitions, a bit of descriptive metadata, such as
> the person a document is about, seems far more appropriate for the <HEAD> than
> the <BODY>.   

Don't worry about that.  Yes, that's the general principle behind the head/body division.  But it's not something you need to actually care about when writing your page.


> Secondly, I am concerned that placing non-visible content in the <BODY> may be
> interpreted as "cloaking" by search providers.  Should this happen, publishers
> might actually opt not to implement microdata for fear that it would harm their
> search standings.  After all, the documentation at schema.org says the
> following of this approach:  "This technique should be used sparingly. Only use
> meta with content for information that cannot otherwise be marked up."
> [http://schema.org/docs/gs.html]
> 
> ...
> 
> I think microdata is one of the most exciting aspects of the entire HTML 5
> effort and I have done my best to be a driving force for its adoption in the
> news industry. However, the inability to (legally) nest complex metadata in the
> <HEAD> really is an obstacle to adoption.

There's no invisible content here - you're not doing things like embedding a <div style="display:none;">ALL THE SPAM KEYWORDS</div> in your page.  You've just got some empty <div>s carrying some Microdata - the effect is identical to if you'd done the thing with <meta>.
Comment 14 Leonard Rosenthol 2011-12-23 00:29:36 UTC
But putting the metadata in the <body> means that a "spider", that is only concerned with document-level metadata (such as the WordCount example) will have to parse the ENTIRE PAGE to find it - rather than being able to just look in the <head>.
Comment 15 Tab Atkins Jr. 2011-12-23 00:33:09 UTC
(In reply to comment #14)
> But putting the metadata in the <body> means that a "spider", that is only
> concerned with document-level metadata (such as the WordCount example) will
> have to parse the ENTIRE PAGE to find it - rather than being able to just look
> in the <head>.

Yes.  That's not a big deal in general, because such data might very well be in the body.  For example, there may be an @itemref on a <meta> that points at some element in the body.

If this is a specialized spider, such that you control both the spider and the documents, you could always put in a requirement that the Microdata-hosting <div>s be the first thing in the <body>, so the spider can stop as soon as it sees any *other* body-level element.
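
The specialized-spider idea above can be sketched like this, assuming the site convention that Microdata-hosting <div>s (each containing only <meta itemprop> children) come first in the <body>. The LeadingItemsSpider class, the StopScanning exception, and the scan_leading_items helper are hypothetical names for this example; a production crawler would use a real HTML5 parser rather than Python's html.parser tokenizer.

```python
from html.parser import HTMLParser


class StopScanning(Exception):
    """Raised to abandon parsing at the first ordinary body-level element."""


class LeadingItemsSpider(HTMLParser):
    """Collects Microdata from the leading <div itemscope> blocks of <body>."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.items = []      # one dict per leading Microdata <div>
        self.current = None  # item for the <div> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "body":
            self.in_body = True
        elif not self.in_body:
            return  # still in <head>; this sketch ignores it
        elif tag == "div" and "itemscope" in attrs and self.current is None:
            self.current = {"type": attrs.get("itemtype"), "properties": {}}
            self.items.append(self.current)
        elif tag == "meta" and "itemprop" in attrs and self.current is not None:
            self.current["properties"][attrs["itemprop"]] = attrs.get("content", "")
        else:
            # Any other element means the Microdata prelude is over.
            raise StopScanning

    def handle_endtag(self, tag):
        if tag == "div":
            self.current = None


def scan_leading_items(markup):
    """Return the items found before the first non-Microdata body element."""
    spider = LeadingItemsSpider()
    try:
        spider.feed(markup)
    except StopScanning:
        pass
    return spider.items


page = (
    "<html><head><title>t</title></head><body>"
    '<div itemscope itemtype="http://schema.org/Person">'
    '<meta itemprop="name" content="Evan Sandhaus">'
    "</div>"
    "<p>Article text...</p>"
    '<div itemscope itemtype="http://schema.org/Person">'
    '<meta itemprop="name" content="Ignored">'
    "</div>"
    "</body></html>"
)

print(scan_leading_items(page))
# [{'type': 'http://schema.org/Person', 'properties': {'name': 'Evan Sandhaus'}}]
```

The second <div> is never reached because scanning stops at the <p>, which is the early exit the convention is meant to allow.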
Comment 16 Henri Sivonen 2012-01-03 09:34:34 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the tracker issue; or you may create a tracker issue
yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: Making meta elements non-void would be incompatible with existing content.