This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25104 - The RelaxNG schema should recognize more encoding values for the <annotation-xml> element
Status: NEW
Alias: None
Product: HTML Checker
Classification: Unclassified
Component: General
Version: unspecified
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Michael[tm] Smith
QA Contact: qa-dev tracking
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-20 10:56 UTC by fred.wang
Modified: 2015-08-23 06:58 UTC
CC List: 2 users

See Also:


Attachments

Description fred.wang 2014-03-20 10:56:56 UTC
The HTML5 RelaxNG schema currently allows the following values for the encoding attribute of the <semantics> element:

HTML => string "application/xhtml+xml" | string "text/html"
SVG => string "SVG1.1"
MathML => string "MathML" | string "MathML-Content" | string "MathML-Presentation"

MathML3 suggests using the MathML/SVG MIME types as encoding values while keeping the old values for backwards compatibility:

http://www.w3.org/TR/MathML3/chapter6.html
http://www.w3.org/TR/MathML3/chapter6.html#encoding-names
http://www.w3.org/TR/MathML3/chapter6.html#interf.graphics

Gecko and WebKit recognize these MathML/SVG MIME types, and I think the RelaxNG schema should allow them too.
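To make the request concrete, here is a small Python sketch of the allow-list check. It is not the validator's code: the current values are copied from the list above, while the exact MIME types and the names CURRENT_ALLOWED, PROPOSED_ALLOWED and schema_accepts are illustrative assumptions.

```python
# Values the validator's schema accepted for the encoding attribute,
# as listed in the bug description (HTML, SVG and MathML groups).
CURRENT_ALLOWED = frozenset({
    "application/xhtml+xml", "text/html",               # HTML
    "SVG1.1",                                           # SVG
    "MathML", "MathML-Content", "MathML-Presentation",  # MathML
})

# Assumption: the MathML/SVG MIME types recommended by MathML3 chapter 6.
MIME_TYPES = frozenset({
    "application/mathml+xml",
    "application/mathml-presentation+xml",
    "application/mathml-content+xml",
    "image/svg+xml",
})

# Hypothetical proposal: keep the old names for backwards compatibility
# and add the MIME types on top of them.
PROPOSED_ALLOWED = CURRENT_ALLOWED | MIME_TYPES


def schema_accepts(value: str, allowed: frozenset) -> bool:
    # A RELAX NG `string "..."` value pattern matches only an identical
    # string: no case folding and no whitespace normalization.
    return value in allowed


for v in ["MathML-Presentation", "image/svg+xml", "IMAGE/SVG+XML"]:
    print(v, schema_accepts(v, CURRENT_ALLOWED), schema_accepts(v, PROPOSED_ALLOWED))
```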
Comment 1 Michael[tm] Smith 2014-03-21 02:56:33 UTC
I'm happy to change this as long as it actually conforms to the MathML spec. I'll read up from the links you provided.
Comment 2 Michael[tm] Smith 2014-03-21 03:01:56 UTC
Cc'ing David Carlisle.

David, as far as I know we're using this part of upstream MathML schema as-is–without changes–so if what Fred says is correct, this seems like a change that you should also make to the upstream schema.
Comment 3 fred.wang 2014-03-21 05:33:37 UTC
(In reply to Michael[tm] Smith from comment #2)
> Cc'ing David Carlisle.
> 
> David, as far as I know we're using this part of upstream MathML schema
> as-is–without changes–so if what Fred says is correct, this seems like a
> change that you should also make to the upstream schema.

Sorry, I forgot to mention the RelaxNG links. As I understand, the upstream schema accepts arbitrary content for semantics:

http://www.w3.org/Math/RelaxNG/mathml3/mathml3-common.rnc

while the HTML5 one has been restricted to accept only SVG/MathML/HTML:

https://bitbucket.org/validator/validator/src/5ee4172d2929787d5b78519c2035e62b503eee6c/schema/mml3/mathml3-common.rnc?at=default#cl-64
https://bitbucket.org/validator/validator/src/5ee4172d2929787d5b78519c2035e62b503eee6c/schema/xhtml5-svg-mathml.rnc?at=default#cl-31
Comment 4 Michael[tm] Smith 2014-03-21 05:48:11 UTC
(In reply to fred.wang from comment #3)
> Sorry, I forgot to mention the RelaxNG links. As I understand, the upstream
> schema accepts arbitrary content for semantics:
> 
> http://www.w3.org/Math/RelaxNG/mathml3/mathml3-common.rnc

The schema we use for the validator allows exactly the same content and attributes for the <semantics> element which the above file from the upstream schema does. I made no change at all to <semantics> as far as I can tell.

> while the HTML5 one has been restricted to accept only SVG/MathML/HTML:
> 
> https://bitbucket.org/validator/validator/src/5ee4172d2929787d5b78519c2035e62b503eee6c/schema/mml3/mathml3-common.rnc?at=default#cl-64
> https://bitbucket.org/validator/validator/src/5ee4172d2929787d5b78519c2035e62b503eee6c/schema/xhtml5-svg-mathml.rnc?at=default#cl-31

Those links point to changes made not to "the encoding attribute of the <semantics> element" (what you mention in the Description for this issue) but to the <annotation-xml> element.

Is this bug about <annotation-xml> or about <semantics>?

At this point it's unclear to me what change you're asking for here.
Comment 5 fred.wang 2014-03-21 06:10:21 UTC
> Is this bug about <annotation-xml> or about <semantics>?
> At this point it's unclear to me what change you're asking for here.

It's about <annotation-xml>, sorry (<semantics> does not have an encoding attribute AFAIK). Of course, <annotation-xml> elements are used as children of <semantics>, hence the confusion.
Comment 6 Michael[tm] Smith 2014-03-21 06:17:47 UTC
(In reply to fred.wang from comment #5)
> > Is this bug about <annotation-xml> or about <semantics>?
> > At this point it's unclear to me what change you're asking for here.
> 
> It's about <annotation-xml>, sorry (<semantics> does not have encoding
> attribute AFAIK). Of course <annotation-xml>'s are used as children of
> <semantics>, thus the confusion.

OK, got it. I'll wait for David to weigh in: I still don't understand the use cases for <annotation-xml> well enough to decide this myself, and I would like to get his take on the best thing to do here.
Comment 7 David Carlisle 2014-03-21 09:36:33 UTC
If I'd ruled the world things would have been different:-)

The situation is that in the core MathML RelaxNG schema annotation-xml allows arbitrary content and the encoding attribute takes arbitrary values.

attribute encoding {xsd:string}?

In application/xhtml+xml parsing there is an argument that this should be as above; however, I think there is also an argument that the HTML+MathML+SVG schema should try to steer people towards HTML/XHTML compatibility by default.

In text/html parsing things are as usual rather more murky.

annotation-xml doesn't really take arbitrary content: it is parsed either as HTML (encoding=text/html or application/xhtml+xml) or as MathML (any other encoding).
Namespaces in the content are as usual mangled/ignored.


This means for example that if you want to put SVG in the annotation-xml you need to use encoding application/xhtml+xml because then the html parser sees <svg> and automatically puts things back in foreign content in svg namespace and things work. If you put any SVG-related value for the encoding the elements will be parsed as unknown elements in the MathML namespace. 

So... In an ideal world I'd make the HTML parsing more sensible, but until or unless there is a proposal to make things work in HTML parsing I wouldn't say that the validator is wrong to warn about any encoding other than the HTML or MathML ones. That doesn't mean that the browsers shouldn't accept whatever makes sense of course.
Comment 8 David Carlisle 2014-03-21 09:51:04 UTC
(In reply to David Carlisle from comment #7)

oops.
> 
> In text/html parsing things are as usual rather more murky.

I got that part right:-)

> 
> This means for example that if you want to put SVG in the annotation-xml you
> need to use encoding application/xhtml+xml because then the html parser sees
> <svg> and automatically puts things back in foreign content in svg namespace
> and things work. If you put any SVG-related value for the encoding the
> elements will be parsed as unknown elements in the MathML namespace. 


I got that part wrong: <svg> is also recognised as SVG inside MathML (foreign content parsing).

However it is still basically true that in text/html there are only really two useful values for encoding, text/html and application/xhtml+xml. Any other value has the same effect as not having the encoding attribute at all.
Comment 9 Michael[tm] Smith 2014-03-24 17:56:08 UTC
(In reply to fred.wang from comment #0)
> The HTML5 RelaxNG schema currently allows the following values for the
> encoding attribute of the <semantics> element:
> 
> HTML => string "application/xhtml+xml" | string "text/html"

(In reply to David Carlisle from comment #8)
> However it is still basically true that in text/html there are only really
> two useful values for encoding, text/html and application/xhtml+xml. Any
> other value has the same effect as not having the encoding attribute at all.

So yeah the HTML parser depends on looking for those specific values for the "encoding" attribute.

http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#html-integration-point

That part of the HTML spec says:

— A node is an HTML integration point if it is one of the following elements:

— An annotation-xml element in the MathML namespace whose start tag token had an attribute with the name "encoding" whose value was an ASCII case-insensitive match for the string "text/html"
— An annotation-xml element in the MathML namespace whose start tag token had an attribute with the name "encoding" whose value was an ASCII case-insensitive match for the string "application/xhtml+xml"
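The two conditions quoted above can be sketched in Python as follows. This is an illustrative sketch, not code taken from any parser; the function name and its parameters (the element's namespace, local name, and the raw encoding value, or None when the attribute is absent) are assumptions, and it covers only the two annotation-xml cases quoted here.

```python
import string
from typing import Optional

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Fold only A-Z to a-z, as an "ASCII case-insensitive match" requires.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# The only two encoding values that make annotation-xml an HTML integration point.
_HTML_ENCODINGS = frozenset({"text/html", "application/xhtml+xml"})


def is_html_integration_point(namespace: str, local_name: str,
                              encoding: Optional[str]) -> bool:
    """Hypothetical helper covering only the annotation-xml cases.

    `encoding` is the raw value of the start tag's encoding attribute,
    or None when the attribute is absent.
    """
    if namespace != MATHML_NS or local_name != "annotation-xml":
        return False
    if encoding is None:
        return False
    # No whitespace trimming: the value must match the string exactly,
    # apart from ASCII case.
    return encoding.translate(_ASCII_LOWER) in _HTML_ENCODINGS


for enc in ["text/html", "Application/XHTML+XML", "image/svg+xml", None]:
    print(repr(enc), is_html_integration_point(MATHML_NS, "annotation-xml", enc))
```

As comment 8 points out, "image/svg+xml" (or any other value) gives the same result here as omitting the attribute entirely.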

Then there are other parts of the spec that say what to do when an "HTML integration point" is encountered, e.g.:

  http://www.whatwg.org/specs/web-apps/current-work/multipage/tree-construction.html#parsing-main-inforeign

—Pop an element from the stack of open elements, and then keep popping more elements from the stack of open elements until the current node is a MathML text integration point, an HTML integration point, or an element in the HTML namespace.
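The popping step quoted above can be sketched in Python like this. The Element class, its html_integration_point flag (assumed to be recorded when the element is created, because it depends on the start tag's attributes) and the function names are hypothetical; the last item of the list plays the role of the current node. The MathML text integration points (mi, mo, mn, ms, mtext) are taken from the same section of the HTML spec.

```python
from dataclasses import dataclass

HTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"
SVG_NS = "http://www.w3.org/2000/svg"

MATHML_TEXT_INTEGRATION_POINTS = frozenset({"mi", "mo", "mn", "ms", "mtext"})


@dataclass
class Element:
    namespace: str
    local_name: str
    # Hypothetical flag, set at creation time (e.g. for an annotation-xml
    # whose encoding was text/html or application/xhtml+xml).
    html_integration_point: bool = False


def stops_popping(node: Element) -> bool:
    # True for an HTML-namespace element, an HTML integration point,
    # or a MathML text integration point.
    return (
        node.namespace == HTML_NS
        or node.html_integration_point
        or (node.namespace == MATHML_NS
            and node.local_name in MATHML_TEXT_INTEGRATION_POINTS)
    )


def pop_until_integration_point(stack: list) -> list:
    """Pop one element unconditionally, then keep popping until the current
    node (the last item of `stack`) stops popping. Returns the popped elements."""
    popped = [stack.pop()]
    while stack and not stops_popping(stack[-1]):
        popped.append(stack.pop())
    return popped


stack = [
    Element(HTML_NS, "html"),
    Element(HTML_NS, "body"),
    Element(MATHML_NS, "math"),
    Element(MATHML_NS, "semantics"),
    Element(MATHML_NS, "annotation-xml", html_integration_point=True),
    Element(SVG_NS, "svg"),
    Element(SVG_NS, "g"),
]
popped = pop_until_integration_point(stack)
print([e.local_name for e in popped])  # ['g', 'svg']
print(stack[-1].local_name)            # annotation-xml
```

Here popping stops at the annotation-xml element only because its flag is set; with any other encoding value it would continue up to the HTML body element.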

So as far as I can see, the validator is aligned with the HTML spec here. If you want something other than encoding=application/xhtml+xml or encoding=text/html to be supported then you need to file a bug against the HTML spec.