This partial draft discusses the architecture of the relation between instances and schemas, and outlines mechanisms for instance-to-schema association.
This is a partial draft of the expected TF report.
This note outlines key points of agreement regarding the use of XML schemas to validate namespace-qualified elements (including document elements), the corresponding rules for validation, and additional instance document constructs which can be used to approximate the capabilities of system and public identifiers when referring to XML schemas.
Key features of the proposed design are:
<schema xmlns='http://www.w3.org/1999/09/23-xmlschema/' targetNS='myNamespaceURI'> ... </schema>

Note that this XML schema itself may be stored in a document of arbitrary name and accessed via an arbitrary URI on the Web. During validation of an instance element qualified with myNamespaceURI, the processor locates (using processor-dependent means) the XML schema document illustrated above. By inspection of the targetNS attribute, the processor determines the namespace for elements and attributes being declared by the XML schema.
Also note that this allows for an XML schema to be defined post hoc for XML documents with no namespace declaration at all: by convention these could be schema-validated by an XML schema with standalone='no' in the XML 1.0 declaration.
This does not, of course, preclude processing such instances without any schema processing: e.g. editors remain free to edit such instances without reference to schemas at all.
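The targetNS inspection described above can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. This is a minimal sketch, not a processor design: the function name target_namespace is hypothetical, and how the schema document is located is left out, since the draft leaves that processor-dependent.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema language itself, as used in this draft.
XSD_NS = 'http://www.w3.org/1999/09/23-xmlschema/'

def target_namespace(schema_text):
    """Return the namespace a schema document declares via targetNS.

    Returns None when the root is a schema element without a targetNS
    attribute; raises ValueError if the root is not a schema element.
    """
    root = ET.fromstring(schema_text)
    # ElementTree reports qualified names as '{namespace}local'.
    if root.tag != '{%s}schema' % XSD_NS:
        raise ValueError('not an XML schema document: %s' % root.tag)
    # targetNS is an unprefixed attribute, so it is keyed by its bare name.
    return root.get('targetNS')

schema = ("<schema xmlns='http://www.w3.org/1999/09/23-xmlschema/' "
          "targetNS='myNamespaceURI'> ... </schema>")
print(target_namespace(schema))  # myNamespaceURI
```

Note that nothing here depends on the name or URI under which the schema document was stored; only its content matters.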
<schemadoc xmlns='http://www.w3.org/1999/09/23-xmlschema/'>
  <schema targetNS='myNamespaceURI1'> ... </schema>
  <schema targetNS='myNamespaceURI2'> ... </schema>
</schemadoc>

The sample above illustrates a single XML schema document which contributes definitions to two XML namespaces. Nothing in the rest of this design requires such an extension to the XML schema language, but it is a fundamental philosophical change, and a point which must be resolved before we can fully settle the means by which instances refer to XML schema documents (see below).
Note that this settles the long-standing dc:creator problem, i.e. how an XML schema can allow elements from a namespace whose namespace URI does not point to an XML schema. Now we can simply write:
<schemadoc xmlns='http://www.w3.org/1999/09/23-xmlschema/'>
  <schema id='dc' targetNS='http://purl.org/metadata/dublin_core'>
    <element name='creator'>...</element>
  </schema>
  <schema targetNS='myNamespaceURI2'>
    <element name='mybook'>
      <archetype order='all'>
        <element ref='creator' schemaName='#dc'/>
        ...
      </archetype>
    </element>
  </schema>
</schemadoc>
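A processor reading such a package has two jobs: indexing the packaged schemas by targetNS, and resolving a cross-schema reference such as schemaName='#dc' to a qualified element name. The Python sketch below does both for the example above. The function names are hypothetical, only same-document '#id' references are handled, and keeping a list of schemas per namespace is a deliberate non-decision, since whether several schemas may contribute to one namespace is still open.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema language, as used in this draft.
XSD_NS = 'http://www.w3.org/1999/09/23-xmlschema/'
SCHEMA = '{%s}schema' % XSD_NS
ELEMENT = '{%s}element' % XSD_NS

def schemas_by_namespace(package):
    """Map each targetNS in a <schemadoc> package to its <schema> elements."""
    index = {}
    for schema in package.findall(SCHEMA):
        # A list per namespace: this sketch takes no position on whether
        # several schemas may contribute to one namespace.
        index.setdefault(schema.get('targetNS'), []).append(schema)
    return index

def resolve_refs(package):
    """Return (ref, '{namespace}local') for each ref with a '#id' schemaName."""
    by_id = {s.get('id'): s for s in package.findall(SCHEMA) if s.get('id')}
    resolved = []
    for el in package.iter(ELEMENT):
        ref, target = el.get('ref'), el.get('schemaName')
        if ref is None or target is None:
            continue  # a declaration, or a ref without a schemaName
        if not target.startswith('#'):
            raise ValueError('only same-document references are handled: %s' % target)
        schema = by_id[target[1:]]
        # Simplification: accept a declaration of that name at any depth.
        if not any(d.get('name') == ref for d in schema.iter(ELEMENT)):
            raise KeyError('%s is not declared in %s' % (ref, target))
        resolved.append((ref, '{%s}%s' % (schema.get('targetNS'), ref)))
    return resolved

package = ET.fromstring("""<schemadoc xmlns='http://www.w3.org/1999/09/23-xmlschema/'>
  <schema id='dc' targetNS='http://purl.org/metadata/dublin_core'>
    <element name='creator'>...</element>
  </schema>
  <schema targetNS='myNamespaceURI2'>
    <element name='mybook'>
      <archetype order='all'>
        <element ref='creator' schemaName='#dc'/>
        ...
      </archetype>
    </element>
  </schema>
</schemadoc>""")

print(sorted(schemas_by_namespace(package)))
# ['http://purl.org/metadata/dublin_core', 'myNamespaceURI2']
print(resolve_refs(package))
# [('creator', '{http://purl.org/metadata/dublin_core}creator')]
```

The Dublin Core namespace URI itself is never dereferenced: its definitions come entirely from the schema with id='dc' inside the package.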
As noted above, the core XML schema language provides no single means for locating an XML schema document during processing. As described in point (3), however, we wish to propose one or two standard mechanisms to promote interoperability in the common cases. Here we consider two such mechanisms.
The first such mechanism we discussed in our call is the seemingly obvious one in which the namespace URI is dereferenced in an attempt to discover either an XML schema document itself, or some sort of external package or directory which might be used to find an XML schema. The design above is consistent with such a convention, and we are considering recommending its use. Detailed means by which the retrieved resource would be inspected, its type determined (MIME type?), etc., have not yet been resolved.
NOTE: We have discussed a problem with this approach: requiring the dereferencing of arbitrary namespace URIs may be beyond the purview of the XML Schema WG. Why? The current Namespaces recommendation places no burdens at all on the inventor of a namespace beyond URI syntactic correctness. In particular, there is NO requirement that a resource associated with a namespace URI exists, or, if one does, that retrieving it produces anything meaningful or up to date. The proposed design implicitly treats anything retrieved which appears to be a valid XML schema or package as valid and trustworthy. We should consider whether coordination with other XML working groups is required to make this assumption safe, and whether in fact we can do this retroactively, given that the Namespaces recommendation has already been issued.
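Because the inspection details are unresolved, the Python sketch below shows only the simplest reading of the dereferencing convention: fetch whatever the namespace URI names and accept it only if it is a schema whose targetNS names that same URI. The fetch function, the example.org URIs, and the matching rule are all assumptions; package or directory resources and MIME-type checks are not handled, and the sketch does nothing about the trust problem raised in the note.

```python
import xml.etree.ElementTree as ET

XSD_NS = 'http://www.w3.org/1999/09/23-xmlschema/'

def schema_from_namespace_uri(ns_uri, fetch):
    """Dereference ns_uri with fetch and return the root <schema> element,
    or None if nothing usable comes back.

    fetch is a caller-supplied (hypothetical) function returning the
    resource text, or None when nothing can be retrieved.
    """
    text = fetch(ns_uri)
    if text is None:
        return None  # No resource need exist behind a namespace URI.
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None  # Not XML at all.
    # Accept only a schema that claims to define this very namespace.
    if root.tag != '{%s}schema' % XSD_NS or root.get('targetNS') != ns_uri:
        return None
    return root

# A stand-in for the Web, so the sketch runs offline.
web = {
    'http://example.org/books': "<schema xmlns='http://www.w3.org/1999/09/23-xmlschema/' "
                                "targetNS='http://example.org/books'/>",
    'http://example.org/prose': '<html><body>A page about this namespace</body></html>',
}
print(schema_from_namespace_uri('http://example.org/books', web.get) is not None)  # True
print(schema_from_namespace_uri('http://example.org/prose', web.get))  # None
print(schema_from_namespace_uri('http://example.org/missing', web.get))  # None
```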
A variation on this theme is one that Microsoft is currently using. Specifically, Microsoft dereferences only those URIs which contain a reserved URI scheme prefix. Consider an instance document containing an element E:
The x-schema: URI scheme grants the processor permission to dereference the URL, and asserts that the processor will find an XML schema at the other end.
NOTE: We don't know whether Microsoft has registered this URI scheme, or whether we would have to register one to make it part of the XML Schema REC. An introduction to URI schemes and their registration is available from the W3C, including pointers to the official registration authorities.
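The decision the reserved prefix enables, whether a given namespace URI may be dereferenced at all, can be sketched as below. Treating everything after the x-schema: prefix as the URL of the schema is an assumption of this sketch rather than something stated above, and the function name is hypothetical.

```python
# Reserved scheme prefix described above; treating the remainder of the
# URI as the schema's URL is an assumption of this sketch.
X_SCHEMA = 'x-schema:'

def schema_url_from_namespace(ns_uri):
    """Return the URL to dereference for an x-schema: namespace URI, else None.

    Any other namespace URI is treated as an opaque name and never fetched.
    """
    # URI scheme names are case-insensitive, so compare the prefix that way.
    if ns_uri[:len(X_SCHEMA)].lower() == X_SCHEMA:
        return ns_uri[len(X_SCHEMA):]
    return None

print(schema_url_from_namespace('x-schema:http://example.com/books.xml'))
# http://example.com/books.xml
print(schema_url_from_namespace('http://example.com/books'))
# None
```

The design point is that permission to dereference travels with the URI itself, so ordinary namespace URIs keep the "name only" status the Namespaces recommendation gives them.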
To handle the common case in which the author of an instance document knows the exact absolute or relative URL of a DTD, XML provides for system identifiers in the instance document. In the case of documents using multiple namespaces, to be validated against multiple possible XML schema documents, something more elaborate and robust is required.
Two related proposals have been offered to provide similar flexibility for locating an XML schema.
Consider first the simple case where either no namespace declaration is involved at all, or a single namespace declaration appears on the top element, which is itself qualified with that namespace. In these cases an attribute from the XML Schema namespace can be used to provide the URL of an XML schema, e.g. as follows:
<myDoc xmlns='myNamespaceURI' xmlns:xsd='http://www.w3.org/1999/09/23-xmlschema/' xsd:schemaLoc='myschema.xsd'> ... </myDoc>
This approach depends on the proposal above that an XML schema contains the URI of the namespace for which it provides definitions.
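A Python sketch of reading the xsd:schemaLoc attribute from the document element is below. The function name is hypothetical, as is the choice to resolve a relative schemaLoc against the instance's own URL, by analogy with how XML resolves relative system identifiers; the base URL used in the example is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSD_NS = 'http://www.w3.org/1999/09/23-xmlschema/'
SCHEMA_LOC = '{%s}schemaLoc' % XSD_NS

def schema_location(instance_text, base_uri):
    """Return (namespace of the document element, absolute schema URL or None)."""
    root = ET.fromstring(instance_text)
    # ElementTree writes qualified names as '{namespace}local'.
    ns = root.tag[1:].split('}', 1)[0] if root.tag.startswith('{') else None
    loc = root.get(SCHEMA_LOC)
    # Assumption: a relative URL is resolved against the instance's own URI.
    return ns, (urljoin(base_uri, loc) if loc is not None else None)

doc = ("<myDoc xmlns='myNamespaceURI' "
       "xmlns:xsd='http://www.w3.org/1999/09/23-xmlschema/' "
       "xsd:schemaLoc='myschema.xsd'> ... </myDoc>")
print(schema_location(doc, 'http://example.com/docs/instance.xml'))
# ('myNamespaceURI', 'http://example.com/docs/myschema.xsd')
```

A processor would then fetch that URL and check that the schema's targetNS equals the namespace returned alongside it, which is where the dependency on targetNS comes in.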
Extending this approach to cases where more than one namespace is declared or used on a single element requires allowing xsd:schemaLoc to contain a space-separated list of URLs.
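With a list, the attribute no longer says which URL serves which namespace; under the targetNS proposal the processor can recover the pairing from the schemas themselves. A minimal Python sketch, assuming a caller-supplied fetch function (here an offline dictionary) and made-up example.com URLs:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XSD_NS = 'http://www.w3.org/1999/09/23-xmlschema/'
SCHEMA_LOC = '{%s}schemaLoc' % XSD_NS

def schemas_from_loc_list(element, base_uri, fetch):
    """Map namespace -> schema URL for a space-separated xsd:schemaLoc list.

    The list names no namespaces; each URL is paired with a namespace by
    reading the targetNS of the schema it points to.
    """
    mapping = {}
    for loc in element.get(SCHEMA_LOC, '').split():
        url = urljoin(base_uri, loc)
        schema = ET.fromstring(fetch(url))
        mapping[schema.get('targetNS')] = url
    return mapping

# Offline stand-in for retrieval; URLs and file names are made up.
web = {
    'http://example.com/a.xsd':
        "<schema xmlns='http://www.w3.org/1999/09/23-xmlschema/' targetNS='URIForA'/>",
    'http://example.com/b.xsd':
        "<schema xmlns='http://www.w3.org/1999/09/23-xmlschema/' targetNS='URIForB'/>",
}
instance = ET.fromstring(
    "<a:doc xmlns:a='URIForA' xmlns:b='URIForB' b:attr='1' "
    "xmlns:xsd='http://www.w3.org/1999/09/23-xmlschema/' "
    "xsd:schemaLoc='a.xsd b.xsd'/>")
print(schemas_from_loc_list(instance, 'http://example.com/doc.xml', web.__getitem__))
# {'URIForA': 'http://example.com/a.xsd', 'URIForB': 'http://example.com/b.xsd'}
```

The order of URLs in the list carries no meaning here, which is one reason this variant depends on schemas declaring their own namespace.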
Alternatively, to give greater flexibility than a purely attribute-based approach allows, we could define an element, again in the XML Schema namespace, whose content specifies either one or more XML schema URLs or, if we decide to pull back from the proposal above that an XML schema identifies the namespace it is about, one or more mappings from namespaces to XML schema URLs. This element would in turn be pointed to by an attribute from the XML Schema namespace:
<a:rootel xmlns:a='URIForA' xmlns:b='URIForB' b:someattr="1"
          xmlns:xsd='http://www.w3.org/1999/09/23-xmlschema/' xsd:schemaLoc='#sds'>
  <xsd:schemaBindings id='sds'>
    <xsd:namespaceBinding ns="URIForA" systemFile="http://myorg.com/somefileA.xsd"/>
    <xsd:namespaceBinding ns="URIForB" systemFile="http://myorg.com/somefileB.xsd"/>
  </xsd:schemaBindings>
  ...real content...
</a:rootel>
The namespaceBinding elements provide a URL for an XML schema, in the spirit of the XML system identifier, for each of the namespaces used in the document. Note that these associations must also apply to the root element in which they are contained.
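Following the xsd:schemaLoc='#sds' pointer to the bindings element can be sketched in Python as below. The instance is a well-formed variant of the schemaBindings example: the a and b prefixes are bound to URIForA and URIForB (an assumption), the closing tag matches the opening one, and the binding elements are spelled namespaceBinding as in the prose. Function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = 'http://www.w3.org/1999/09/23-xmlschema/'

def q(local):
    """Qualify a local name with the XML Schema namespace, ElementTree style."""
    return '{%s}%s' % (XSD_NS, local)

def namespace_of(tag):
    """Extract the namespace URI from an ElementTree '{ns}local' tag."""
    return tag[1:].split('}', 1)[0] if tag.startswith('{') else None

def namespace_bindings(root):
    """Follow xsd:schemaLoc='#id' to an xsd:schemaBindings element and
    return its namespace -> schema URL map."""
    ref = root.get(q('schemaLoc'), '')
    if not ref.startswith('#'):
        raise ValueError('expected a same-document reference, got %r' % ref)
    # Find the element whose id matches the fragment, anywhere in the document.
    target = next((e for e in root.iter() if e.get('id') == ref[1:]), None)
    if target is None or target.tag != q('schemaBindings'):
        raise ValueError('no schemaBindings element with id %r' % ref[1:])
    return {b.get('ns'): b.get('systemFile')
            for b in target.findall(q('namespaceBinding'))}

instance = """<a:rootel xmlns:a='URIForA' xmlns:b='URIForB' b:someattr="1"
    xmlns:xsd='http://www.w3.org/1999/09/23-xmlschema/' xsd:schemaLoc='#sds'>
  <xsd:schemaBindings id='sds'>
    <xsd:namespaceBinding ns="URIForA" systemFile="http://myorg.com/somefileA.xsd"/>
    <xsd:namespaceBinding ns="URIForB" systemFile="http://myorg.com/somefileB.xsd"/>
  </xsd:schemaBindings>
  ...real content...
</a:rootel>"""

root = ET.fromstring(instance)
bindings = namespace_bindings(root)
# The bindings also cover the root element that contains them.
print(bindings[namespace_of(root.tag)])  # http://myorg.com/somefileA.xsd
```

Because the root element a:rootel is itself in URIForA, its schema comes from the very bindings it contains, which is the self-application the paragraph above requires.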
We have a fundamental choice to make regarding how firm a stand we take on the connection between elements/documents and XML schemas. We've opted for a layered story above, in which what you do with an XML schema once you've got one is fixed, but how you get one is left flexible.
If we agree with that story, there are further questions to be resolved. Most importantly, we must consider the various options as to whether an individual XML schema remains co-extensive with a namespace. For example, we should consider the <schemadoc> packaging of multiple XML schema declarations into a single document. We must also decide how many of the 4 options canvassed above (any namespace URI may point to an XML schema, directly or indirectly; special URI scheme prefix; xsd:schemaLoc attribute; xsd:schemaBindings element) we explicitly describe, and whether we indicate a prioritisation of the list.
$Log: composition-tf.html,v $
Revision 1.2 1999/09/24 19:55:48 connolly fixing HTML validation and links
Revision 1.1 1999/09/24 18:18:56 hugo Initial version
Revision 220.127.116.11 1999/09/23 13:08:02 ht typo in $id :-(
Revision 18.104.22.168 1999/09/23 12:56:14 ht fixed URI for schema per dan connolly, added ID and LOG