Bug 15936 - HTML+RDFa promotes DTD-based validation
Status: RESOLVED FIXED
Product: HTML WG
Classification: Unclassified
Component: HTML+RDFa (editor: Manu Sporny)
Importance: P2 normal
Assigned To: Manu Sporny
public-rdfa-wg
Reported: 2012-02-08 13:03 UTC by Henri Sivonen
Modified: 2013-01-26 18:30 UTC
Description Henri Sivonen 2012-02-08 13:03:04 UTC
http://dev.w3.org/html5/rdfa/#validation says:

- -
Documents written using the markup language defined in this specification may be validated using the DTDs defined in this section. If a document author wants to facilitate such validation, they may include the following declaration at the top of their document for HTML 4.01 + RDFa 1.1:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01+RDFa 1.1//EN"
   "http://www.w3.org/MarkUp/DTD/html401-rdfa11-1.dtd">

The following declaration may be included at the top of their document for HTML 4.01 + RDFa Lite 1.1:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01+RDFa Lite 1.1//EN"
   "http://www.w3.org/MarkUp/DTD/html401-rdfalite11-1.dtd">
- -

And the doctypes are styled to be labeled as examples.

This is all sorts of wrong.

 1) A very creative interpretation of the HTML WG charter is required to consider HTML 4.01+something to be in scope.

 2) This encourages DTD-based validation when DTDs are known to be obsolete and inadequate technology for the purpose of validating HTML or RDFa. It's irresponsible and counter-productive to promote known-obsolete and known-unsuitable-for-purpose technology.

 3) It assumes the use of an SGML parser when the group's charter says explicitly: "the Group will not assume that an SGML parser is used for 'classic HTML'"

 4) It encourages the use of novel doctypes when the WG has been working to retire doctypes as anything except for the purpose of triggering the standards mode in browsers.

 5) Normative "may" statements say that doctypes can be used, but there's no normative text saying what the doctypes are, since the doctypes are marked as examples and http://dev.w3.org/html5/rdfa/#conformance says examples are non-normative.

Please remove this section "A. Validation", its subsection and all references to DTDs from HTML+RDFa.
Comment 1 Manu Sporny 2012-03-11 01:01:02 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the tracker issue; or you may create a tracker issue
yourself, if you are able to do so. For more details, see this document:

http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Not a Bug. Possible Need to Examine HTML WG Charter and 
HTML+RDFa interplay by HTML WG Chairs.

Technical Description: 

The submitter has asked for the removal of section "A. Validation", its 
subsection, and all references to DTDs from HTML+RDFa. This is due to the 
move away from DTD-based validation in HTML5 and because the submitter 
believes that any work on extending HTML4 is not in scope in the 
HTML WG charter.

Rationale:

The submitter made a number of assertions in the bug report, a response 
is provided for each one below:

> And the doctypes are styled to be labeled as examples.

I assume that styling the markup as an example implies that the markup is 
non-normative. This is a spec bug and I will fix it in the next release.

> 1) A very creative interpretation of the HTML WG charter is required to consider 
> HTML 4.01+something to be in scope.

The HTML WG charter says the following: 

"This group will maintain and produce incremental revisions to the HTML 
specification, which includes the series of specifications previously 
published as XHTML version 1. Both XML and 'classic HTML' syntaxes will 
be produced."

"The Group will define conformance and parsing requirements for 
'classic HTML', taking into account legacy implementations"

"The HTML WG is encouraged to provide a mechanism to permit independently 
developed vocabularies such as Internationalization Tag Set (ITS), Ruby, 
and RDFa to be mixed into HTML documents. Whether this occurs through the 
extensibility mechanism of XML, whether it is also allowed in the classic 
HTML serialization, and whether it uses the DTD and Schema modularization 
techniques, is for the HTML WG to determine."

Nothing in there bans "an incremental revision of the HTML specification" that 
defines "conformance and parsing requirements for 'classic HTML'", that is
"allowed in the classic HTML serialization" and that "uses the DTD technique".

I assert that HTML4+RDFa is in scope. I do this not because I think HTML4+RDFa
is the future of the HTML language, but because there is evidence that some
Web authors do not intend to migrate to HTML5 at any point in the near future
and the RDF Web Apps WG would like to offer an RDFa solution to these authors.

>  2) This encourages DTD-based validation when DTDs are known to be obsolete and
> inadequate technology for the purpose of validating HTML or RDFa. It's
> irresponsible and counter-productive to promote known-obsolete and
> known-unsuitable-for-purpose technology.

Take a look at this thread:

http://www.reddit.com/r/web_design/comments/kh5mi/so_im_still_designing_sites_with_html_4_and_css_2/

In it, you will find a number of Web Authors working against their best 
interests by scoffing at HTML5 and just sticking with HTML4. That thread is
only 5 months old. While all of us would like to see everyone move as
quickly to HTML5 as possible... there are going to be IT departments that
continue to publish HTML4 for a very long time to come. Many of these in
government, which moves at a glacial pace.  Some of these authors 
might want to use RDFa to achieve a better SERP in the search engines.

So, while I do agree that DTD-based validation is inadequate, these authors
are being offered no solution other than upgrading to HTML5, which, as the
thread above demonstrates, a non-trivial portion of them do not intend to
do at any point in the near future.

I don't think merely providing a DTD encourages DTD-based validation. We are
moving away from it as a Web community. It will eventually die, but until
that happens, the RDF Web Apps WG would like to provide /something/ that
folks that want to stick with HTML4 can use to validate the RDFa in their
HTML4 documents.

>  3) It assumes the use of an SGML parser when the group's charter says
> explicitly: "the Group will not assume that an SGML parser is used for 'classic
> HTML'"

RDFa in HTML doesn't assume an SGML parser as the sole solution as HTML5+RDFa and 
HTML5+RDFa Lite use the new validation techniques. We could even add something
for HTML4+RDFa to validator.nu if that is what you would prefer. In fact,
I would be happy to see an HTML4+RDFa validator in validator.nu if that is
what you would prefer... at that point we could drop the HTML4+RDFa 1.1 DTDs.
However, what is not acceptable is dropping HTML4+RDFa 1.1 entirely. Thoughts?

>  4) It encourages the use of novel doctypes when the WG has been working to
> retire doctypes as anything except for the purpose of triggering the standards
> mode in browsers.

Again, happy to retire the DTD if HTML4+RDFa 1.1 continues to exist as something
that can be validated using a W3C tool.

>  5) Normative "may" statements says that doctypes can be used, but there's no
> normative text saying what the doctypes are, since the doctypes are marked as
> examples and http://dev.w3.org/html5/rdfa/#conformance says examples are
> non-normative.

An oversight, this will be fixed in the next revision.

> Please remove this section "A. Validation", its subsection and all references
> to DTDs from HTML+RDFa.

Only if a separate mechanism exists for validating HTML4+RDFa 1.1 documents.
Henri, since you're the expert here - would we be able to add this validation
mechanism to the validator.nu validator at W3C? (I'm not asking you to do 
the work as I think that Mike Smith has already done the majority of this 
work for HTML5+RDFa).

So, the current proposal is this: HTML4+RDFa 1.1 remains in the HTML+RDFa 
specification as it is in scope per the charter. We will need a ruling by the
Chairs of the HTML WG to assert anything to the contrary. We can remove the
HTML4+RDFa 1.1 DTD only if there is another currently valid mechanism
for validating HTML4+RDFa 1.1 documents. An addition to the validator.nu
validator at W3C would be acceptable.

What are your thoughts on this proposal, Henri?
Comment 2 Kang-Hao (Kenny) Lu 2012-03-11 02:43:31 UTC
(In reply to comment #1)
> > Please remove this section "A. Validation", its subsection and all references
> > to DTDs from HTML+RDFa.
> 
> Only if a separate mechanism exists for validating HTML4+RDFa 1.1 documents.

Such a mechanism exists: the validation infrastructure in HTML5 (a.k.a. content models, content attributes, global attributes, etc.). My guess is that nobody ever had the interest or the bandwidth to rewrite the HTML4 DTD in terms of these new concepts, but saying that it doesn't exist doesn't feel right.

In any case, my guess is that nobody would end up doing this, so perhaps that section can stay as it is.

> Henri, since you're the expert here - would we be able to add this validation
> mechanism to the validator.nu validator at W3C? (I'm not asking you to do 
> the work as I think that Mike Smith has already done the majority of this 
> work for HTML5+RDFa).

(Not quite sure how feedback on a spec would turn into an implementation request that is traded for the removal of a section...)

> >  5) Normative "may" statements says that doctypes can be used, but there's no
> > normative text saying what the doctypes are, since the doctypes are marked as
> > examples and http://dev.w3.org/html5/rdfa/#conformance says examples are
> > non-normative.
> 
> An oversight, this will be fixed in the next revision.

To avoid confusion and to partially address Henri's concern, I think this section could say that "HTML5 + RDFa 1.1" documents MUST NOT include these declarations. Or add "HTML 4.01 + RDFa 1.1" before

  # Documents written using the markup language defined in this specification
  # MAY be validated using the DTDs defined in this section.

to make it clear.
Comment 3 Henri Sivonen 2012-03-15 12:25:40 UTC
It looks like you used the "EDITOR'S RESPONSE" template without setting the bug to RESOLVED with a resolution.
Comment 4 Henri Sivonen 2012-03-15 13:03:15 UTC
(In reply to comment #1)
> I assert that HTML4+RDFa is in scope.

I'd be interested in the Chairs' opinion on that.

> >  2) This encourages DTD-based validation when DTDs are known to be obsolete and
> > inadequate technology for the purpose of validating HTML or RDFa. It's
> > irresponsible and counter-productive to promote known-obsolete and
> > known-unsuitable-for-purpose technology.
> 
> Take a look at this thread:
> 
> http://www.reddit.com/r/web_design/comments/kh5mi/so_im_still_designing_sites_with_html_4_and_css_2/
> 
> In it, you will find a number of Web Authors working against their best 
> interests by scoffing at HTML5 and just sticking with HTML4.

I see many people saying they use HTML5 or telling the OP to do so. I'm trying to limit my http://xkcd.com/386/ activity to W3C specs--not Reddit.

> there are going to be IT departments that
> continue to publish HTML4 for a very long time to come. Many of these in
> government, which moves at a glacial pace.

No RDFa for them, then. It seems illogical to be OK with adding RDFa but not with adding the delta from HTML4 to 5.

> I don't think merely providing a DTD encourages DTD-based validation.

I disagree. And it obviously at least fails to properly discourage it.

> >  3) It assumes the use of an SGML parser when the group's charter says
> > explicitly: "the Group will not assume that an SGML parser is used for 'classic
> > HTML'"
> 
> RDFa in HTML doesn't assume an SGML parser as the sole solution as HTML5+RDFa
> and 
> HTML5+RDFa Lite use the new validation techniques. 

But it assumes an SGML parser be used as *a* validation solution. The charter says not to assume an SGML parser.

> In fact,
> I would be happy to see an HTML4+RDFa validator in validator.nu if that is
> what you would prefer... at that point we could drop the HTML4+RDFa 1.1 DTDs.
> However, what is not acceptable is dropping HTML4+RDFa 1.1 entirely. Thoughts?

I didn't expect spec comment handling to turn into Validator.nu feature bargaining.
 
> Henri, since you're the expert here - would we be able to add this validation
> mechanism to the validator.nu validator at W3C? (I'm not asking you to do 
> the work as I think that Mike Smith has already done the majority of this 
> work for HTML5+RDFa).

Probably you would.

> So, the current proposal is this: HTML4+RDFa 1.1 remains in the HTML+RDFa 
> specification as it is in scope per the charter. We will need a ruling by the
> Chairs of the HTML WG to assert anything to the contrary. We can remove the
> HTML4+RDFa 1.1 DTD only if there is another currently valid mechanism
> for validating HTML4+RDFa 1.1 documents. An addition to the validator.nu
> validator at W3C would be acceptable.
> 
> What are your thoughts on this proposal, Henri?

I'm not particularly happy about making an alternative implementation a prerequisite to removing the DTD from the spec.
Comment 5 Michael[tm] Smith 2012-03-15 13:37:04 UTC
I agree unequivocally with everything that Henri said in comment 4.
Comment 6 Michael[tm] Smith 2012-03-20 05:06:14 UTC
(In reply to comment #4)
> (In reply to comment #1)
> > In fact,
> > I would be happy to see an HTML4+RDFa validator in validator.nu if that is
> > what you would prefer... at that point we could drop the HTML4+RDFa 1.1 DTDs.
> > However, what is not acceptable is dropping HTML4+RDFa 1.1 entirely. Thoughts?
> > [...]
> > Henri, since you're the expert here - would we be able to add this validation
> > mechanism to the validator.nu validator at W3C? (I'm not asking you to do 
> > the work as I think that Mike Smith has already done the majority of this 
> > work for HTML5+RDFa).
> 
> Probably you would.

Actually I'm not interested in adding HTML4+RDFa support to the validator, any more than HTML4+SVG or HTML4 plus the <video> and <audio> elements, or backporting whatever other features to HTML4 validation. The HTML4 validation feature is there for existing legacy documents. Those existing legacy documents do not use RDFa attributes. We should be encouraging authors who are creating new documents or updating existing ones to use HTML5 instead, for many reasons, not encouraging them to label the new/updated documents with a legacy HTML4 doctype and expect that they can validate them as HTML4.
Comment 7 Karl Dubost 2012-03-26 20:52:07 UTC
(In reply to comment #6)
> We should be encouraging authors who are creating
> new documents or updating existing ones to use HTML5 instead, …

Agreed.
This triggered a (maybe bad) idea: if we see markup that uses HTML5 and/or RDFa features, maybe the validator could flag it and encourage people to move to HTML5.
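
A rough sketch of what such a check might look like, in Python with only the standard library. It is an illustration under stated assumptions, not how any W3C validator works: the attribute and element sets are small hypothetical subsets, and the names `LegacyDoctypeChecker` and `suggest_upgrade` are made up for this example. The warning text mirrors the wording proposed in this comment.

```python
from html.parser import HTMLParser

# Assumption: small illustrative subsets, not complete lists.
RDFA_ATTRS = {"vocab", "typeof", "property", "resource", "prefix", "about"}
HTML5_ELEMENTS = {"video", "audio", "section", "article", "nav", "canvas"}


class LegacyDoctypeChecker(HTMLParser):
    """Collects HTML5/RDFa features and notes whether the doctype is HTML 4.01."""

    def __init__(self):
        super().__init__()
        self.legacy_doctype = False
        self.findings = []

    def handle_decl(self, decl):
        # decl is the declaration text without the leading "<!" and trailing ">".
        text = decl.lower()
        if text.startswith("doctype") and "html 4.01" in text:
            self.legacy_doctype = True

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in HTML5_ELEMENTS:
            self.findings.append(f"<{tag}> element")
        for name, _value in attrs:
            if name in RDFA_ATTRS:
                self.findings.append(f"{name} attribute on <{tag}>")


def suggest_upgrade(markup):
    """Return a hint string for legacy-doctype documents using newer features, else None."""
    checker = LegacyDoctypeChecker()
    checker.feed(markup)
    checker.close()
    if checker.legacy_doctype and checker.findings:
        return (
            "This document contains HTML5 features ("
            + ", ".join(checker.findings)
            + "); it would be better to use <!DOCTYPE html>"
        )
    return None
```

The hint is only produced when both conditions hold (an HTML 4.01 doctype and at least one newer feature), so documents that already use <!DOCTYPE html> are left alone.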

"This document contains <a href="http://www.w3.org/TR/html5-author/">HTML5 features</a>, it would be better to use <!DOCTYPE html>"
Comment 8 Manu Sporny 2012-08-19 20:42:24 UTC
DTD-based validation has been removed from the HTML+RDFa spec:

http://lists.w3.org/Archives/Public/public-rdfa-wg/2012Aug/0007.html

Not closing the bug yet, as there might be backlash in the RDFa WG that we will need to work through.
Comment 9 Manu Sporny 2013-01-26 18:30:52 UTC
Status: Fixed. DTD-based validation has been removed from the HTML+RDFa 1.1 specification. There has been no negative backlash in the RDFa WG.