25225 – No way to convert between XML and HTML documents

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: No way to convert between XML and HTML documents

Status:	ASSIGNED

Alias:	None

Product:	WebAppsWG
Classification:	Unclassified
Component:	DOM Parsing and Serialization
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	Travis Leithead [MSFT]
QA Contact:	public-webapps-bugzilla

Reported:	2014-04-02 02:47 UTC by C. Scott Ananian
Modified:	2014-12-01 18:32 UTC
CC List:	6 users

Description C. Scott Ananian 2014-04-02 02:47:11 UTC

outerHTML on an XML document is defined as an XML serialization, and outerHTML on an HTML document is defined as an HTML serialization.

You can use the XMLSerializer interface to get an XML serialization of an HTML document, but there is no equivalent way to get an HTML serialization of an XML document.

Additionally/alternatively, it would be nice if you could just clone an XML document from an HTML document and vice-versa, without having to go through a string serialization.

Comment 1 Simon Pieters 2014-04-02 09:48:43 UTC

What's the use case?

Comment 2 C. Scott Ananian 2014-04-02 15:59:20 UTC

See https://www.w3.org/Bugs/Public/show_bug.cgi?id=13410#c15

In general, there are XML and HTML serializations defined, and a way to get an XML serialization of an HTML document. The fact that there's no way to get an HTML serialization of an XML document seems to be a gap in the specification, and prevents good interoperability between the HTML and XML syntax.

Comment 3 Simon Pieters 2014-04-02 17:52:23 UTC

(In reply to C. Scott Ananian from comment #2)
> The fact that there's no way to
> get an HTML serialization of an [XML] document seems to be a gap in the
> specification,

Yeah, I agree.

> and prevents good interoperability between the HTML and XML
> syntax.

Not sure I follow this part.

Also, what's the use case for cloning documents (as suggested in comment 0)?

Comment 4 Simon Pieters 2014-04-02 17:59:05 UTC

As a workaround you can do something like this:
http://software.hixie.ch/utilities/js/live-dom-viewer/saved/2915

Comment 5 Travis Leithead [MSFT] 2014-10-13 23:43:33 UTC

(In reply to Simon Pieters from comment #4)
> As a workaround you can do something like this:
> http://software.hixie.ch/utilities/js/live-dom-viewer/saved/2915

I can't view the above site today, but I assume this workaround involves creating an HTML document (from within an XML document); something like: calling document.implementation.createHTMLDocument(), cloning the nodes to be serialized as HTML into that document, and then invoking the innerHTML getter on an HTML container element...?
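
For illustration only, the steps guessed at above might be sketched as follows. This is not the content of the linked live-dom-viewer page, which is not reproduced in this thread; the function name serializeViaHtmlDocument and its parameters are hypothetical. It relies only on the standard createHTMLDocument, importNode, createElement, appendChild and innerHTML members.

```javascript
// Hypothetical helper: serialize `node` (for example, from an XML document)
// using the HTML serialization rules, by way of a scratch HTML document.
// `domImplementation` is assumed to be a DOMImplementation, such as
// `document.implementation` in a browser.
function serializeViaHtmlDocument(node, domImplementation) {
  // Scratch HTML document; the title argument is left empty.
  const htmlDoc = domImplementation.createHTMLDocument('');
  // A container that belongs to the HTML document, so that its innerHTML
  // getter produces an HTML serialization.
  const container = htmlDoc.createElement('div');
  // Deep-copy the node into the HTML document, then attach the copy.
  container.appendChild(htmlDoc.importNode(node, true));
  return container.innerHTML;
}
```

In a browser this could be called as serializeViaHtmlDocument(xmlDoc.documentElement, document.implementation), where xmlDoc is an assumed XML document. Imported nodes keep their original namespaces, so the output may differ from what equivalent markup parsed as text/html would serialize to.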

Regardless of the use cases, I wouldn't want to extend XMLSerializer to handle this case, as that's an abuse of the name :)

I suppose a new HTMLSerializer object would be the obvious corollary.

Any browser implementations interested in an HTMLSerializer interface? Seems kinda redundant on the web (dominated by text/html documents); though I don't know the use cases for this in an XML data pipeline.
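
As a rough illustration of what an HTML serializer has to do differently from an XML one, here is a minimal sketch over a plain-object tree rather than real DOM nodes. Everything in it is an assumption for illustration: the node shape ({type, name, attrs, children, value}), the function names, and the simplified escaping rules. It is not the proposed HTMLSerializer interface, which this bug does not define, and it only approximates the HTML fragment serialization algorithm.

```javascript
// Elements serialized with no end tag and no children (HTML "void" elements).
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr',
]);

// Elements whose text children are emitted without escaping.
const RAW_TEXT_PARENTS = new Set([
  'style', 'script', 'xmp', 'iframe', 'noembed', 'noframes', 'plaintext',
]);

// Simplified escaping for text nodes.
function escapeText(s) {
  return s.replace(/&/g, '&amp;').replace(/\u00A0/g, '&nbsp;')
          .replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Simplified escaping for attribute values.
function escapeAttr(s) {
  return s.replace(/&/g, '&amp;').replace(/\u00A0/g, '&nbsp;')
          .replace(/"/g, '&quot;');
}

// Hypothetical node shape: {type: 'element', name, attrs, children},
// {type: 'text', value} or {type: 'comment', value}.
function serializeNode(node, parentName) {
  switch (node.type) {
    case 'text':
      return RAW_TEXT_PARENTS.has(parentName) ? node.value : escapeText(node.value);
    case 'comment':
      return '<!--' + node.value + '-->';
    case 'element': {
      const attrs = Object.entries(node.attrs || {})
        .map(([k, v]) => ` ${k}="${escapeAttr(v)}"`)
        .join('');
      const open = `<${node.name}${attrs}>`;
      // Unlike XML, void elements get neither "/>" nor an end tag.
      if (VOID_ELEMENTS.has(node.name)) return open;
      const inner = (node.children || [])
        .map((child) => serializeNode(child, node.name))
        .join('');
      return `${open}${inner}</${node.name}>`;
    }
    default:
      throw new TypeError('Unsupported node type: ' + node.type);
  }
}

const tree = {
  type: 'element', name: 'p', attrs: { title: 'a "b"' },
  children: [{ type: 'text', value: '1 < 2' }, { type: 'element', name: 'br' }],
};
const html = serializeNode(tree);
// html is '<p title="a &quot;b&quot;">1 &lt; 2<br></p>'
```

An XML serialization of the same tree would write the br element as a self-closing tag and would escape text inside script or style, which is why reusing XMLSerializer for this purpose does not work.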

Comment 6 C. Scott Ananian 2014-10-20 16:27:02 UTC

(In reply to Travis Leithead [MSFT] from comment #5)
> Any browser implementations interested in an HTMLSerializer interface? Seems
> kinda redundant on the web (dominated by text/html documents); though I
> don't know the use cases for this in an XML data pipeline.

As maintainer of a non-browser DOM implementation (domino), I'm certainly interested in a standard HTMLSerializer class. Domino already implements this in an unofficial, ad-hoc way as document.outerHTML, but there are issues with whitespace preservation in the outer nodes. jsdom implements a serializer as well, I believe, though probably not in the same way. The Wikimedia Parsoid and Visual Editor projects also implement their own versions of HTMLSerializer (the former in Node, the latter in the browser).