Schematron for testing HTML 5 stricter content model

Hi,
For those of you interested in HTML 5 and/or HTML validation, here is
a short (beta) Schematron schema that tests the stricter HTML 5 content
model: 'some elements can now contain either "block level" or "inline
level" content, but not both'.
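
To make the rule concrete, here is a minimal sketch of two XHTML
fragments. They are hypothetical examples written for illustration,
not taken from any real page. The first mixes inline-level content
(text and em) with a block-level p inside the same div, so it should be
flagged; the second contains only block-level children and should pass.

<!-- Flagged: text and <em> (inline level) sit next to <p> (block level) -->
<div xmlns="http://www.w3.org/1999/xhtml">Some <em>inline</em> text <p>and a paragraph</p></div>

<!-- Not flagged: the div contains block-level children only -->
<div xmlns="http://www.w3.org/1999/xhtml"><p>Some <em>inline</em> text</p><p>and a paragraph</p></div>

For the first fragment, the warning should read roughly "There is both
a p and a #text in the same div element".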

If you are curious about what proportion of XHTML 1.x documents
already comply with this stricter content model, you can give it a try
(see also attachment).

If you need a crawler to test many pages, this schema is integrated
into my XHTML/XML validator
[http://alexandre.alapetite.net/distribution/weblide/]; it is applied
when you choose the "expert" mode.

<?xml version="1.0" encoding="UTF-8"?>
<schema xmlns="http://www.ascc.net/xml/schematron">
	<ns prefix="html" uri="http://www.w3.org/1999/xhtml"/>
	<title>Schematron Schema for HTML 5 additional constraints</title>
	<p>
		This module is used to test some (X)HTML 5
		constraints on (X)HTML 4 documents.
		See [http://dev.w3.org/cvsweb/~checkout~/html5/html4-differences/Overview.html#stricter-content-models].
	</p>
	<pattern name="HTML 5 restrictions">
		<rule context="html:applet|html:blockquote|html:body|html:button|
		               html:center|html:dd|html:del|html:div|html:fieldset|
		               html:form|html:iframe|html:ins|html:li|html:noframes|
		               html:noscript|html:td|html:th"
		               role="WARNING">
			<report test="(html:address or html:blockquote or html:center
			               or html:dir or html:div or html:dl or html:fieldset
			               or html:form or html:h1 or html:h2 or html:h3 or html:h4
			               or html:h5 or html:h6 or html:hr or html:isindex
			               or html:menu or html:noframes or html:noscript or html:ol
			               or html:p or html:pre or html:table or html:ul)
			              and (text()[normalize-space()] or html:a or html:abbr
			               or html:acronym or html:applet or html:b or html:basefont
			               or html:bdo or html:big or html:br or html:button
			               or html:cite or html:code or html:dfn or html:em
			               or html:font or html:i or html:iframe or html:img
			               or html:input or html:kbd or html:label or html:map
			               or html:object or html:q or html:s or html:samp
			               or html:script or html:select or html:small or html:span
			               or html:strike or html:strong or html:sub or html:sup
			               or html:textarea or html:tt or html:u or html:var)">
				Compatibility HTML 5:
				There is both a
				<value-of select="name(html:*[self::html:address
				                  or self::html:blockquote
				                  or self::html:center or self::html:dir
				                  or self::html:div or self::html:dl
				                  or self::html:fieldset or self::html:form or self::html:h1
				                  or self::html:h2 or self::html:h3 or self::html:h4
				                  or self::html:h5 or self::html:h6 or self::html:hr
				                  or self::html:isindex or self::html:menu
				                  or self::html:noframes or self::html:noscript
				                  or self::html:ol or self::html:p or self::html:pre
				                  or self::html:table or self::html:ul][1])"/>
				and a
				<value-of select="substring(concat('#text',
				                   name(html:*[self::html:a or self::html:abbr
				                   or self::html:acronym or self::html:applet
				                   or self::html:b or self::html:basefont
				                   or self::html:bdo or self::html:big or self::html:br
				                   or self::html:button or self::html:cite
				                   or self::html:code or self::html:dfn
				                   or self::html:em or self::html:font or self::html:i
				                   or self::html:iframe or self::html:img
				                   or self::html:input or self::html:kbd
				                   or self::html:label or self::html:map
				                   or self::html:object or self::html:q or self::html:s
				                   or self::html:samp or self::html:script
				                   or self::html:select or self::html:small
				                   or self::html:span or self::html:strike
				                   or self::html:strong or self::html:sub
				                   or self::html:sup or self::html:textarea
				                   or self::html:tt or self::html:u
				                   or self::html:var][1])),
				                  1+5*number(string-length(text())=0),
				                  5+9*number(string-length(text())=0))"/>
				in the same <name/> element.
				A <name/> element can contain either "block level" (e.g. div, p)
				or "inline level" (e.g. #text, span, em) content, but not both.
			</report>
		</rule>
	</pattern>
</schema>
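
A note on the second value-of, since the arithmetic is not obvious.
XPath 1.0 has no if/else, so the schema concatenates '#text' with the
name of the first inline child and then uses substring() to keep only
one of the two parts. The start and length arguments depend on
number(string-length(text())=0), which is 0 when the element has a
child text node and 1 when it has none. Here is a worked sketch, using
'span' as an assumed inline child name:

<!-- Child text node present: number(false()) is 0, so start = 1 and length = 5 -->
<value-of select="substring(concat('#text', 'span'), 1, 5)"/>   <!-- '#text' -->

<!-- No child text node: number(true()) is 1, so start = 6 and length = 14 -->
<value-of select="substring(concat('#text', 'span'), 6, 14)"/>  <!-- 'span' -->

A length of 14 is more than enough, as the longest inline element names
in the list (basefont, textarea) have 8 characters.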

This schema can certainly be improved and extended. Comments are welcome.

Cordially,
Alexandre
http://alexandre.alapetite.fr

Received on Saturday, 16 June 2007 18:57:25 UTC