This document attempts to clearly delineate the issues raised by embedding SVG in other documents and referencing SVG from other documents. Many of these issues are not unique to SVG, and suggest a need for a better mixed namespace document embedding and referencing framework.
This document is a Note submitted to the W3C with the intention that it be used as a basis to further the work of embedding and referencing SVG from other documents.
This Note has been produced by SVG Working Group representatives from SchemaSoft and represents the opinions of Philip Mansfield, Darryl Fuller, and Yuri Khramov, as well as information from e-mail discussion related to this topic with SVG Working Group members.
SVG is a document format that can be embedded or referenced by documents in other XML formats, and vice-versa. This can be done recursively, leading to complex multi-format content. Those other document formats are typically dialects such as SMIL, XHTML, or SVG itself. How SVG behaves when it is embedded within other SVG is fully defined, but how SVG behaves when it is referenced from SVG using the <image> element is not. How to embed XHTML with the <foreignObject> element is partially defined, but very little is specified about the resulting behavior. How XHTML is supposed to reference SVG with the <img> and <object> elements is defined, but how this behaves is incompletely specified. Furthermore, the existing specifications make no mention of using SVG as a background image format for XHTML.
There are a number of issues that have not been methodically addressed. How are events handled? What is time zero for animations for a referenced or embedded SVG? How do the different DOM trees interact? How does CSS cascading work when the same property may have different interpretations in different namespaces? When is referenced or embedded content intended for rendering, and when is it not? While some of this is specified for the case of SVG embedding or referencing SVG, interacting with other document types such as XHTML and MathML needs to be looked at. We are already seeing mixed-namespace documents coming into being with work on modularizing XHTML and SVG, and the ability of browsers such as Mozilla and Amaya to handle XHTML and MathML mixed-namespace documents.
Before we can come up with solutions, we need to clearly delineate the problems. Some of these issues have been addressed in different contexts, sometimes in conflicting ways. This also needs to be clearly documented.
There are two main techniques for combining several documents: direct inclusion and referencing. Direct inclusion of one <svg> element into another does not generate different documents, and is covered by the SVG 1.0 W3C Recommendation. Direct support for SVG and MathML in some browsers allows for mixed-namespace situations.
We will use the terms "hosted" and "hosting" documents to cover both situations, "referenced" and "referencing" documents for the referencing case, and "embedded" and "enclosing" documents for the inclusion case.
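To make the terminology concrete, the following is a minimal sketch in Python that walks a small mixed-namespace document and reports which hosted content is referenced and which is embedded. The sample document, the file name logo.svg, and the function names are our own illustrative assumptions, and only the <image> element is treated as a reference here.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical sample: SVG that references another SVG file and embeds XHTML.
SAMPLE = """\
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink" width="200" height="100">
  <image xlink:href="logo.svg" width="50" height="50"/>
  <foreignObject width="100" height="50">
    <p xmlns="http://www.w3.org/1999/xhtml">Hello</p>
  </foreignObject>
</svg>
"""


def namespace_of(element):
    """Return the namespace URI of an ElementTree element, or '' if none."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def classify_hosted(root):
    """Return (kind, detail) pairs for content hosted by the root document."""
    hosted = []

    def walk(element, parent_ns):
        ns = namespace_of(element)
        href = element.get("{%s}href" % XLINK_NS)
        # Assumption: only <image> counts as a reference in this sketch.
        if element.tag == "{%s}image" % SVG_NS and href:
            hosted.append(("referenced", href))
        # A change of namespace marks the start of embedded content.
        if ns != parent_ns:
            hosted.append(("embedded", ns))
        for child in element:
            walk(child, ns)

    for child in root:
        walk(child, namespace_of(root))
    return hosted


result = classify_hosted(ET.fromstring(SAMPLE))
print(result)
```

Both kinds of entry describe "hosted" content in the sense used above; the sketch only distinguishes how the hosted content reached the hosting document.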
It is possible to differentiate between these two situations and define different behavior for them with regard to the issues described below. Moreover, the existing applications (cf. Adobe's SVG Viewer) do have different behavior depending on the way one document is embedded into another.
Some mixed-in namespaces are not intended to be directly rendered; for example, they might describe metadata, schemas or processes. Even formats that are normally rendered might be mixed with the intent that they are not rendered. For example, the intent of a mixed X3D/SVG file might be that the SVG is the current 2D view of the 3D model encoded in X3D, and that 2D view may change via the DOM based on user interaction. In that case, it would be a mistake to render both the X3D and the SVG. Rather, one or the other should be rendered.
Which one? If the intent is to do the rendering through DOM-driven SVG, then the SVG should always be rendered. However, if the intent is that the SVG is just a fallback for those who do not have an X3D-savvy user agent, then which gets rendered is dependent on the particular software setup.
In general, when encountering two mixed namespaces, how does a User Agent decide whether to render both, one, or the other? Some languages have some support for this choice. For example, a hierarchy of fallback renderings can be specified with XHTML's <object> tag or with SVG's <switch> tag. However, there is no universal mechanism, and the models for how this works are not necessarily consistent from language to language.
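The fallback idea can be sketched as a first-match search over an ordered list of alternatives, loosely in the spirit of a hierarchy of fallback renderings. This Python sketch is only an illustration of one possible model, not the behavior defined by any specification; the function name and the X3D namespace URI are placeholders we have made up.

```python
SVG_NS = "http://www.w3.org/2000/svg"
X3D_NS = "urn:example:x3d"  # placeholder URI, not a real namespace name


def choose_rendering(alternatives, supported):
    """Return the first namespace in 'alternatives' that the user agent
    supports, or None if none of them can be rendered."""
    for namespace in alternatives:
        if namespace in supported:
            return namespace
    return None


# The author prefers X3D and offers SVG as the fallback.
preferred_order = [X3D_NS, SVG_NS]

svg_only_agent = {SVG_NS}
x3d_savvy_agent = {SVG_NS, X3D_NS}

print(choose_rendering(preferred_order, svg_only_agent))
print(choose_rendering(preferred_order, x3d_savvy_agent))
```

Note that this model always renders exactly one alternative. It cannot express the case in which both namespaces are meant to be rendered together, which is part of the problem raised above.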
When rendering a document hosted by SVG, the hosted document's viewport may be constrained or transformed in a number of ways. Consider the case of the <image> element. Height and width may be specified by the hosted document format, or not. The aspect ratio may be preserved by the hosting document, or not. In general, the software that renders the SVG needs access to height, width and any other constraints on the hosted content, which it might not have if a separate piece of software takes over the parsing and rendering of that format, and does not have this information in its API.
Next, consider XHTML hosted by SVG. Must it occur within a <foreignObject> element to be rendered? The stated aim of this element is to pass processing on to the hosted language processor, as with in-place activation. However, the XHTML may be in the context of a transform that makes the viewport not rectangular and upright. It may be rotated and/or skewed, for example. What happens when in-place activation requires the hosted content to be strictly upright and rectangular, as with XHTML, because its processor cannot handle the rotation or skew? Does it figure out the largest upright rectangle that will fit into the rotated and skewed viewport and render into that? Or must there be an API to pass pixels to the hosting SVG processor, so that the skew or rotate can be performed on pixels? If so, then all interaction with the HTML is presumably lost (hyperlinks and imagemaps, select/copy/paste text, scripted dynamic behavior, etc.).
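A hosting SVG processor could at least detect when the problem arises. The sketch below, in Python, tests whether the linear part of an SVG matrix(a b c d e f) transform keeps an upright rectangle upright. The function names and the tolerance are our own assumptions; the matrix forms used for rotate and skewX follow the SVG transform definitions.

```python
import math


def is_upright(a, b, c, d, tolerance=1e-9):
    """True if matrix(a b c d e f) has no rotation or skew terms and does
    not flip the content, so an upright rectangle stays upright."""
    return abs(b) <= tolerance and abs(c) <= tolerance and a > 0 and d > 0


def rotation(degrees):
    """Return (a, b, c, d) for an SVG rotate(degrees) transform."""
    r = math.radians(degrees)
    return (math.cos(r), math.sin(r), -math.sin(r), math.cos(r))


scaled = (2.0, 0.0, 0.0, 0.5)                         # scale(2, 0.5)
rotated = rotation(30)                                # rotate(30)
skewed = (1.0, 0.0, math.tan(math.radians(20)), 1.0)  # skewX(20)

for name, matrix in [("scaled", scaled), ("rotated", rotated), ("skewed", skewed)]:
    print(name, is_upright(*matrix))
```

The translation terms e and f are ignored because they move the viewport without changing its shape or orientation.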
Consider the case of SVG being referenced by the <img> element of XHTML as specified. If the height and width are not specified by the <img> element, the SVG must communicate its size to the XHTML rendering engine. If the width of the SVG is "100%", then there has to be communication back and forth between the two engines. If the height of the SVG is "100%", what does that mean? In general, section 7.2 of the SVG 1.0 specification describes a negotiation process between hosted SVG and the hosting language, and is only specific about how that negotiation proceeds in isolated cases like CSS2 applied to HTML. There remain issues of generalizing this and issues of how to implement the two-way information exchange in independently written software modules.
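The two-way exchange can be illustrated with a small Python sketch of a single dimension being resolved. This is a simplification under our own assumptions, not the negotiation described in the SVG specification: lengths are plain numbers or percentages, the host's explicit value always wins, and the function name is hypothetical.

```python
def resolve_length(host_value, hosted_value, available):
    """Resolve one dimension of hosted SVG, in user units.

    host_value   -- value given by the hosting element, or None
    hosted_value -- value given by the hosted SVG ("300" or "100%"), or None
    available    -- space the host can offer for percentages, or None
    Returns a float, or None when the dimension cannot be resolved.
    """
    if host_value is not None:
        return float(host_value)          # the host's explicit value wins
    if hosted_value is None:
        return None                       # neither side specified a size
    if hosted_value.endswith("%"):
        if available is None:
            return None                   # percentage of an unknown extent
        return available * float(hosted_value[:-1]) / 100.0
    return float(hosted_value)


print(resolve_length(None, "100%", 640.0))   # width against a known extent
print(resolve_length("200", "100%", 640.0))  # host overrides hosted value
print(resolve_length(None, "100%", None))    # height with no known extent
```

The last call corresponds to the "100%" height question above: when the host has no definite extent to offer, the percentage has nothing to resolve against.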
SVG has its own rules for Z-order, composition, and opacity. It applies them when it references things with the <image> tag. A referenced PNG may have filters applied to it, be made translucent, and have other SVG from the hosting document both "in front of" and "behind" it. XHTML has completely different Z-order and compositing rules. Images (perhaps SVG) referenced by XHTML may be behind things (background images) or on top of things (on top of background image), but there isn't any more layering than that. There is no concept of filters or opacity. So what happens with SVG hosting XHTML or MathML? Can the HTML rendering engine be expected to apply an SVG filter effect? Whether or not this is possible may depend upon the underlying engine. Is it a plug-in like the Adobe viewer or is it an all-in-one engine like Mozilla? Certainly putting an SVG image on top of an XForms button seems to be a natural application, as does using an SVG image as background for XHTML or XHTML as a bit of wrapped text on top of an SVG rectangle or circle. Different XML dialects have different rules for Z-order, capabilities for compositing, etc., so how do you mix them or resolve conflicts?
The XML Events specification (http://www.w3.org/TR/2001/WD-xml-events-20011026) does not define a mechanism for event capture and bubbling between two different documents. This is clearly an issue for embedded documents. It is relevant for referenced documents as well; for example, in the case of an image map on an XHTML <object> that refers to an SVG file. There are several possible approaches for setting up this mechanism. Some options are described below.
Apparently, the SVG Working Group inclines toward option 2. But should Z-order be taken into account if the SVG document is used as a background image, say as the background for XHTML? The Z-order for hosting and hosted documents does not always have the hosted element on top. Consider the case of SVG referencing SVG using the <image> element. The hosting document could have things both on top of and behind the hosted SVG image. Z-order seems to be more natural in this case.
There are several possibilities in setting up DOM interoperability options. One is not to allow any interoperability at all; a second is to glue the hosted document to the node of the hosting document as described in option 1 of the previous section. Another option is to have DOM interoperability for included documents but not referenced documents. Note that there are already people doing inspired hacks with the Adobe plugin to call into the DOM of an HTML document from script within an SVG document that it references via the nonstandard <embed> tag (the reverse is more straightforward due to the nature of browser plug-ins). See Kurt Cagle's Interactive SVG presentation for examples of this. Client-side script would be much more reliable if this sort of mechanism were pre-planned, uniform and vendor-neutral.
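One way to picture the difference between gluing the trees together and keeping them separate is to compute the path an event would bubble along. The Python sketch below uses a toy node class of our own invention, not the W3C DOM API, and assumes that "gluing" simply means the hosted root's parent is the hosting node.

```python
class Node:
    """Toy tree node; 'document' names the document the node belongs to."""

    def __init__(self, name, document, parent=None):
        self.name = name
        self.document = document
        self.parent = parent


def bubble_path(target, cross_documents):
    """Return the node names an event visits while bubbling from 'target'.

    With cross_documents=False the event stops at the root of the
    document in which it was raised."""
    path = []
    node = target
    while node is not None:
        path.append(node.name)
        parent = node.parent
        leaving = parent is not None and parent.document != node.document
        if leaving and not cross_documents:
            break
        node = parent
    return path


# XHTML document hosting an SVG document through an <object> element.
body = Node("body", "page.xhtml")
obj = Node("object", "page.xhtml", parent=body)
svg = Node("svg", "chart.svg", parent=obj)      # hosted root glued to <object>
circle = Node("circle", "chart.svg", parent=svg)

print(bubble_path(circle, cross_documents=True))
print(bubble_path(circle, cross_documents=False))
```

The file names page.xhtml and chart.svg are placeholders. A real mechanism would also have to address capture, event retargeting and script security, none of which this sketch attempts.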
If your hosting and/or hosted document have metadata, then what is the scope of that metadata? There are a couple of different options here. If you treat metadata scope like CSS Inheritance, and the trees are just "glued together", then the metadata of the hosting document applies to the hosted document. This may be less than desirable. Consider the case of SVG hosting SVG. Your hosting SVG could be a map, with metadata describing the co-ordinate system as being geographic. Your hosted SVG could be a company logo for a gas station that is to be put on the map. Surely the logo's co-ordinate system is not geographic! Furthermore, what if your map also hosts some MathML? The metadata about geographic co-ordinates does not seem to apply. Worse yet, you could have metadata that has the same name but conflicting meanings in the hosted and hosting document. The alternative is to have metadata scope not cross the hosted/hosting barrier, or to only cross this barrier in the case of embedded content. But surely there are cases where it would be appropriate for the hosting document's metadata to apply to the hosted document.
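The three scoping options just described can be compared side by side with a small lookup routine. This Python sketch is our own illustration; the class, the policy names "glued", "isolated" and "embedded-only", and the metadata key are all hypothetical.

```python
class Doc:
    """Toy document with its own metadata and a link to its host."""

    def __init__(self, name, metadata, host=None, link=None):
        self.name = name
        self.metadata = metadata  # this document's own metadata
        self.host = host          # hosting Doc, or None at the top level
        self.link = link          # "embedded" or "referenced"; None at top


def lookup(doc, key, policy):
    """Find 'key' for 'doc' under one of three scoping policies."""
    current = doc
    while current is not None:
        if key in current.metadata:
            return current.metadata[key]
        if policy == "isolated":
            return None   # scope never crosses the hosted/hosting barrier
        if policy == "embedded-only" and current.link != "embedded":
            return None   # scope crosses only for embedded content
        current = current.host  # "glued": always inherit from the host
    return None


map_doc = Doc("map", {"coordinates": "geographic"})
logo = Doc("logo", {}, host=map_doc, link="referenced")
formula = Doc("formula", {}, host=map_doc, link="embedded")

for policy in ("glued", "isolated", "embedded-only"):
    print(policy, lookup(logo, "coordinates", policy),
          lookup(formula, "coordinates", policy))
```

Under the "glued" policy the logo inherits the map's geographic co-ordinate system, which is exactly the undesirable result described above. None of the three policies is right for every case.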
At first glance, it would seem sensible that CSS inheritance passes seamlessly through included documents, but not through referenced documents; but things are not so simple. CSS presentation attributes may have different interpretations or even different syntax in different namespaces, so what does it mean when you pass through the namespace barrier? A trivial case of this could be font. In an SVG document, you could define your own SVG font, and even name it the same as some system font. If you then inherit into embedded XHTML, what does that font name mean? If there is further SVG hosted by that XHTML, does it pick up the outermost meaning of the font name, or the meaning that it had in the XHTML name space? How do you resolve clashes?
Even the notion of what stylesheets apply to what content is a problem for software. A generic CSS processor would recognize use of the xml-stylesheet processing instruction, and would have to apply such stylesheets to the whole document, no matter how many embedded namespaces it has. On the other hand, there are grammar-specific ways to reference CSS, such as the <style> element, style attribute and presentation attributes of SVG, or the <link rel="stylesheet"> construct in XHTML. This sets up a situation in which a CSS processor would have to know about all grammars that might use it. It also raises the question of whether some stylesheets apply to the whole document and others only to the parts that directly reference the stylesheets.
Different XML dialects have different mechanisms for timeline synchronization. SMIL has time containers for specifying "time zero" for encapsulated items. SVG defines a single "time zero" for the entire document. XHTML, to the best of our knowledge, has no concept of "time zero" or synchronization, and has a tradition of progressive layout, so that each element's "time zero" is effectively its load time. So what happens when you have different hosting and hosted dialects? Furthermore, is the behavior different when hosted documents are referenced rather than embedded?
There are two main options here. The first is that the hosted document honors the hosting document's concept of time zero. So if a SMIL document is hosting an SVG document within a time container, the SVG document's time zero is whatever the SMIL time container says it is. If XHTML is hosting SVG, time zero is as soon as the SVG document loads. If an SVG document is hosting an SVG document by referencing it with the <image> element, then the hosted SVG document's time zero is when the hosting document finishes loading. But what if the hosted document "isn't ready yet"?
The other main option is that the hosted document determines its own time zero, and informs the hosting document of it. This seems very natural for SVG hosting SVG by reference, but seems backwards in the case of SMIL hosting.
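The practical difference between the two options shows up when the hosted document becomes ready later than the time the host assigns to it. The following Python sketch compares them; the policy names, the function names and the numbers are illustrative assumptions only, with all times in seconds on a shared clock.

```python
def hosted_time_zero(policy, host_begin, hosted_ready):
    """Return the hosted document's time zero.

    host_begin   -- the time zero the hosting document would assign
    hosted_ready -- the time at which the hosted document can be rendered
    """
    if policy == "host-decides":
        return host_begin
    if policy == "hosted-decides":
        return hosted_ready
    raise ValueError("unknown policy: %r" % policy)


def missed_animation(policy, host_begin, hosted_ready):
    """Seconds of the hosted timeline that elapse before it can be shown."""
    zero = hosted_time_zero(policy, host_begin, hosted_ready)
    return max(0.0, hosted_ready - zero)


# The host's time container begins at 2 s; the hosted SVG is ready at 5 s.
for policy in ("host-decides", "hosted-decides"):
    print(policy,
          hosted_time_zero(policy, 2.0, 5.0),
          missed_animation(policy, 2.0, 5.0))
```

With the first option the host's timeline stays authoritative but part of the hosted animation may never be seen; with the second nothing is missed, but the host loses control over synchronization.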
Many XML dialects have ways of hosting other documents by reference right now. XHTML has the <img> and <object> tags, SVG has the <image> and <foreignObject> tags. But how do you embed and how do you validate the results? Currently, you wind up having to write custom DTDs every time you want to host a new kind of XML by embedding if you want to validate. You could use "ANY" for the content model of embedded types, but then you couldn't validate embedded documents.
Even so, for pragmatic reasons, we often have to define how an XML dialect should be hosted by some other. Consider the case of SVG and XHTML. It was SVG that had to specify what was the "right" way for it to be hosted by XHTML (using the <img> or <object> tag). Had this not been specified, then every browser would do it differently. Even so, this is not a consistent definition. If <img> works, why not the background image attribute? Worse yet, popular implementations currently only support one of these methods (the <object> tag). This seems rather ad hoc and painful. It would be much nicer if there were one cross-language way of referencing and embedding, just like XLink.
Also, do you allow hosting document fragments or just complete documents? If you allow hosting document fragments, you could wind up with "tag soup". But there are very natural cases where you would just want to host a fragment. SVG already allows hosting just fragments of other SVG documents. Many SVG elements can use a URL to reference just a part of another document. It seems very natural to have libraries of markers, patterns, symbols, etc. packaged up in single files.
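Referencing a fragment of a library file can be sketched in Python as splitting the reference into a document part and a fragment identifier, then looking up the element with that id. The file name symbols.svg, its contents and the function name are hypothetical, and the in-memory dictionary stands in for fetching the file.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Hypothetical library file holding reusable markers and patterns.
FILES = {
    "symbols.svg": """\
<svg xmlns="http://www.w3.org/2000/svg">
  <defs>
    <marker id="arrow"/>
    <pattern id="dots"/>
  </defs>
</svg>
""",
}


def resolve_reference(reference, files):
    """Return the element a reference such as 'symbols.svg#arrow' points to.

    Without a fragment the whole document is returned; an unknown
    fragment yields None."""
    url, fragment = urldefrag(reference)
    root = ET.fromstring(files[url])
    if not fragment:
        return root
    for element in root.iter():
        if element.get("id") == fragment:
            return element
    return None


print(resolve_reference("symbols.svg#arrow", FILES).tag)
print(resolve_reference("symbols.svg#missing", FILES))
```

The sketch matches on the id attribute by name rather than by DTD-declared ID type, which is a simplification.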
This document is intended to raise issues and requirements for further SVG language design and for the more general design of cross-language features. We do not attempt to propose detailed solutions to every problem raised. However, we will suggest a general direction for the SVG Working Group to take.
To handle the issues we have raised obviously requires a closer degree of cooperation between the software modules processing each of the documents involved (the hosting document and the referenced or embedded document). Modules have to advise each other of their capabilities and restrictions, preferably through a standard API. Software can communicate capabilities while running, but XML can only be used to encode requested behavior - it is up to the software to resolve how that information is used. For instance, enclosing SVG may advise embedded XHTML of a skewed viewport, but it is up to the XHTML processor to decide whether or not it can skew. The reality is that each grammar will have its own unique requirements, so one must allow a range of behaviors to be requested and have defined fallbacks, rather than mandating a single behavior in each case.
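As an illustration of requested behavior with defined fallbacks, the Python sketch below lets a hosted processor advertise two capabilities and has the host pick a rendering plan for a viewport that may be skewed or rotated. The capability flags, the plan names and the order of preference are our own assumptions, not part of any existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    """What a hosted processor says it can do (hypothetical flags)."""
    can_transform: bool  # can render into a rotated or skewed viewport
    can_rasterize: bool  # can hand rendered pixels to the host


def plan_rendering(viewport_is_upright, capabilities):
    """Choose how hosted content is rendered, preferring live content."""
    if viewport_is_upright or capabilities.can_transform:
        return "in-place"          # hosted processor renders directly
    if capabilities.can_rasterize:
        return "rasterized"        # host transforms pixels; interaction
                                   # with hosted content is presumably lost
    return "upright-fallback"      # render into an upright rectangle
                                   # placed inside the viewport


xhtml_engine = Capabilities(can_transform=False, can_rasterize=True)
basic_engine = Capabilities(can_transform=False, can_rasterize=False)

print(plan_rendering(True, xhtml_engine))
print(plan_rendering(False, xhtml_engine))
print(plan_rendering(False, basic_engine))
```

The point of the sketch is that the hosting document only requests a skewed viewport; which plan is actually used is resolved by the software from the capabilities reported at run time.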
The problem of software co-operation for document embedding is not a new one. It has been solved before in various component oriented document models such as OpenDoc and Bonobo. It is our opinion that we should follow these examples in coming up with our model for document embedding and co-operation.
We believe that there is a need for a separate cross-language XML specification for interaction of documents and document fragments in different namespaces, much as there are other cross-language specifications such as XLink, XPointer, DOM and XML Namespaces themselves. Right now, the SVG Working Group is facing these problems, and SVG may have some mechanisms to address these issues. Therefore the SVG Working Group may be appropriate initiators of this effort.
"Document Object Model Level 2 Events Specification", Tom
Pixley. W3C Recommendation, 13 November 2000. Available at http://www.w3.org/TR/DOM-Level-2-Events/