This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3817 - Processor profiles for following/not following schemaLocation
Status: CLOSED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Structures: XSD Part 1
Version: 1.1 only
Hardware: PC Windows 2000
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
Keywords: resolved
Reported: 2006-10-11 22:58 UTC by Noah Mendelsohn
Modified: 2007-03-21 22:26 UTC (History)

Description Noah Mendelsohn 2006-10-11 22:58:42 UTC
I had always assumed that when we specified processor conformance profiles, i.e. what is now captured as "Checklist of implementation-defined features" and "Terminology for implementation-defined features" (§D), we would include an axis for schemaLocation hints.  These might be captured in something like:

Appendix D.2.5: xsi:schemaLocation policies

Unconditionally follow xsi:schemaLocation
Applies to a processor that dereferences every supplied xsi:schemaLocation, and which reflects a (fatal) processor-specific error if any one or more such references fail to resolve to schema documents for the appropriate namespace.

Conditionally follow xsi:schemaLocation
Same as above, but no error is reflected if any one or more such references fail to resolve, resolve to something other than a schema document, or to a schema document for the wrong namespace.  If any of those conditions occur, then that schemaLocation is treated as if it were not supplied.

Unconditionally ignore xsi:schemaLocation
Applies to a processor which in all cases ignores xsi:schemaLocation attributes in instance documents.
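
As a rough illustration only, the three policies above could be modelled as follows.  This is a minimal sketch, not text from the spec or from any processor: the names HintPolicy, SchemaLocationError, apply_policy and the fetch callback are all hypothetical.  The fetch callback is assumed to return the target namespace of the schema document found at a location, or None if the reference does not resolve to a schema document.

```python
from enum import Enum
from typing import Callable, Optional


class HintPolicy(Enum):
    """Hypothetical names for the three proposed xsi:schemaLocation policies."""
    FOLLOW_UNCONDITIONALLY = "unconditionally follow"
    FOLLOW_CONDITIONALLY = "conditionally follow"
    IGNORE = "unconditionally ignore"


class SchemaLocationError(Exception):
    """Stands in for the fatal, processor-specific error."""


def apply_policy(
    policy: HintPolicy,
    namespace: str,
    location: str,
    fetch: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the location if the hint is usable, else None.

    None means the hint is treated as if it had not been supplied.
    """
    if policy is HintPolicy.IGNORE:
        # Never dereference the hint at all.
        return None

    found_namespace = fetch(location)
    if found_namespace == namespace:
        return location

    # The reference failed to resolve, did not yield a schema document,
    # or yielded a schema document for the wrong namespace.
    if policy is HintPolicy.FOLLOW_UNCONDITIONALLY:
        raise SchemaLocationError(
            f"hint {location!r} did not yield a schema for {namespace!r}"
        )
    return None
```

Note that under the IGNORE policy the sketch never calls fetch, which is the property a user who wants hints reliably ignored would care about.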


We might or might not also have a similar:

Appendix D.2.6: Policies for schemaLocation attributes on xsd:import

I think we should briefly discuss the possibility of including such terminology.  We obviously have users who wish to have schemaLocation hints reliably followed or reliably ignored, and I think that providing this terminology will help in a) the documentation of processors providing such features and b) the specification of systems that use XML Schema and that depend on particular policies for schemaLocation handling.

Noah
Comment 1 C. M. Sperberg-McQueen 2006-10-14 13:36:31 UTC
This information (unconditionally or conditionally follow, or
unconditionally ignore, schemaLocation hints) looks at first
glance as if it were mostly covered by the current appendix D.2
(both in the most recent public working draft, and in the status
quo documents).  The list of possible component sources in D.2.1
includes schemaLocation hints, and the introductory prose in D.2
says "General-purpose processors SHOULD ... provide user control
over which methods are used and how to fall back in case of
failure."  That covers the distinction drawn in the sample text
in the description, does it not?

But since the originator of the comment is clearly familiar with
appendix D, I assume the comment is asking for something that is
not in fact already there.  I don't know what, though.  Could you
elaborate?

D.2 does not provide terminology for "how to fall back in case of
failure"; is the proposal in essence that appendix D should
define standard terminology to distinguish fatal errors,
non-fatal errors, warnings, or silence on the part of processors?

I wonder: is something like that really needed?  The topic
worries me; I fear that discussion on that topic would prove to
be a tar-pit; members of the WG and readers of WG minutes will
recall that even the distinctions made in 5.2 between strict
wildcard validation and lax wildcard validation struck some WG
members as saying too much about the context within which
validators operate.  Trying to provide standard terminology for
behavior when a URI does or doesn't resolve, or resolves to
something unexpected, seems like a very high-cost, and relatively
low-benefit, errand.  If serious readers believe that D.2 as
currently drafted requires either that references always succeed
or that failures of reference never be errors, then it might be
worth adding a sentence or two to dispel that confusion.

If on the other hand the proposal is that we should provide such
terminology, but ONLY for use in describing behavior vis-a-vis
schemaLocation hints, and not for use when other locations are
consulted, then I don't understand the motive for the lack of
orthogonality.

(In reviewing the relevant text just now, I note two points that
need correction: in the intro to D, for "and to provide user
control" read "and provide user control".  And in the list of
component sources, either edit the entry for schemaLocation hints
to cover schemaLocation hints in schema documents [hints in the
case of import, at least], or add a separate entry for them.)
Comment 2 C. M. Sperberg-McQueen 2007-03-19 22:34:16 UTC
During the WG telcon on 16 March, the WG adopted a wording proposal
which addresses this issue by adding new terms to the appendix
on terminology for process-variable behavior in schema construction.

So I'm marking this as FIXED.

Noah, as the originator, please change the status from RESOLVED
to CLOSED to indicate your assent to the decision; if you don't do 
so, in a couple of weeks someone else will on your behalf. 
Comment 3 Noah Mendelsohn 2007-03-19 22:57:52 UTC
I believe that there is in principle an opportunity to do better if we had the time and were so inclined, but I think that what the workgroup has agreed is an acceptable compromise.  As a signal of the sorts of things I think one could do, one could have properties such as:

schemaLocationHintsIgnored:  if true, the processor guarantees not to dereference a URI as a result of its appearance in an instance schemaLocation.

There are some other variants possible in principle.  Given my feelings, I would have some temptation to resolve the issue with status LATER, but for several reasons I will mark it CLOSED:

1) I think this is in the spirit of what the workgroup has agreed, and I have no objection at all to that agreement.  While this is not the only possible or most aggressive resolution, it is a reasonable one IMO.

2) The terminology in Appendix D in no way restricts the conformance profiles that others may choose to document.  Accordingly, if any other conventions prove important, no changes to the recommendation will be necessary.

3) There is always the opportunity to open a new issue if new information obtained as a result of experience with Schema 1.1 suggests that further work would be beneficial.

Noah
Comment 4 C. M. Sperberg-McQueen 2007-03-19 23:52:50 UTC
For the record, the reason the editors did not propose, and the WG did
not consider, adding a keyword like schemaLocationHintsIgnored with the
semantics described in comment #3 is that the proposition "this
process ignores schema location hints in the instance" is already
expressible with the existing vocabulary defined in appendix D
(specifically, as noted in comment #1, in appendix D.2.1).

Comment 5 Noah Mendelsohn 2007-03-21 14:03:56 UTC
I'm feeling dense.  I've looked at comment #1 and Appendix D.2.1 of the editor's draft, both before posting comment #3 and again.  I am seeing how to say "Try to dereference hints" (schemaLocation hints in D.2.1); I am seeing how to say "There exist non-schemaLocation sources of some schema documents to be used" (hard-coded schema locations, named pairs, schema documents, etc.); I am seeing that some "schemas" (I presume for things like HTML) can be built into a validator (hard-coded schemas).  What I'm still missing is how to say:  if you see a schemaLocation in an instance, perhaps for some namespace not addressed by the other mechanisms, or perhaps for a namespace for which some declarations already were brought in by those other mechanisms, you MUST NOT dereference the schemaLocation URI and in any case MUST NOT use it as inspiration to change the schema you would have otherwise constructed.  What am I missing?
Comment 6 C. M. Sperberg-McQueen 2007-03-21 17:06:19 UTC
Perhaps we are imagining different uses for the terms defined here.  I
expect them to be used in describing processors, and the requirement
to be allowing the behaviors described in the initial description, in
comment #3, and in comment #5 to be described in English prose using
the terms defined in the spec.  

So to use the existing terminology to say that schema location hints
in the instance are not followed, it would suffice to say something
like

   Schema location hints in the document instance are not
   followed.

or

   The --nohints option means that the processor should not
   follow schemaLocation hints in the document instance, even if
   no components for the namespace in question are available.

If you want something more elaborate, I can offer the following sample
documentation for an imaginary processor named Figment which can be
invoked with run-time options directing the various behaviors
described.

... Figment schema construction options ...

Figment assembles a schema by looking for schema components in
different places.  In general, for each namespace used in the input
document as the namespace of any element or attribute, Figment looks
for schema components.

The user may control where Figment looks for components by means of
the --where and --how options:

  --where=LOCATION  

    LOCATION may be any of:  
      cache:  look in Figment's local schema cache
      cli:  look for a location passed on the command line, using
        the --load option
      ask:  ask the user by means of a prompt on stderr
      ns:  dereference the namespace name
      hints:  look in the locations indicated in xsi:schemaLocation
        attributes in the input
      
    The --where option can be given more than once on the command 
    line.  If no --where options are specified, the default behavior 
    of Figment is equivalent to --where=cache --where=hints --where=ns.
    If any --where options are specified, Figment will look only 
    in the indicated locations.
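
The "explicit options replace the default" rule for --where can be sketched with Python's argparse.  Figment is imaginary, so this is only one way such a parser might be written; the function name parse_where and the constants are hypothetical.

```python
import argparse
from typing import List, Optional

# Location keywords and default order as given in the option description.
WHERE_CHOICES = ["cache", "cli", "ask", "ns", "hints"]
DEFAULT_WHERE = ["cache", "hints", "ns"]


def parse_where(argv: Optional[List[str]] = None) -> List[str]:
    """Return the ordered list of locations to search."""
    parser = argparse.ArgumentParser(prog="figment")
    # default=None (rather than DEFAULT_WHERE) so that explicit --where
    # options replace the default instead of being appended to it.
    parser.add_argument(
        "--where",
        action="append",
        choices=WHERE_CHOICES,
        default=None,
        metavar="LOCATION",
    )
    args = parser.parse_args(argv)
    return args.where if args.where else list(DEFAULT_WHERE)
```

With this sketch, parse_where(["--where=cache"]) yields only the cache, so hints are not consulted, while parse_where([]) falls back to cache, hints and ns.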
      
  --how=METHOD

    METHOD may be:
      literal: Figment will attempt to dereference the URI given 
        as a location.  If that produces a schema document, Figment
        will read the schema document and load the components it
        defines.
      catalog: means that Figment will look up the URI in the
        OASIS XML catalog at /usr/local/Figment/catalog and
        attempt to dereference the location given by the catalog,
        if any.
      rddl: means that if dereferencing a URI produces a RDDL
        document, Figment will look for the well-known purpose
        Figment-validation, and follow the link given.

    The --how option can be given more than once.  The order of
    options determines the order in which the methods are tried.
    The default is --how=catalog --how=literal --how=rddl.

    If --eager=yes is specified, then all methods will be tried
    for each namespace; if --eager=no is specified, then later
    methods will be tried only if earlier methods don't succeed
    in finding a schema document.

    The --how option does not affect searching in the Figment cache.

The --eager option controls what Figment does when it succeeds in
finding a schema document which defines components for the namespace
in question.

  --eager=yes means Figment will read and process the schema 
    document it has found, and then continue looking for more
    components in the namespace, using other methods or in other 
    locations, until there are no more places to look.  

  --eager=no means Figment will read and process the schema document
    it has found, and stop looking for components for the namespace.

The --onfailure option controls what Figment does when searching for
components in a given location fails to produce any schema documents
for the namespace being sought.

  --onfailure=continue means that Figment will try the next location
    on the list.
  --onfailure=halt means that Figment will stop looking for components
    for this namespace and move on to the next namespace.
  --onfailure=error means that Figment will stop looking for
    components, issue an error message, and move on to the next
    namespace.
  --onfailure=fatal means that Figment will stop looking for 
    components, issue an error message, and exit.  No validation will
    be performed.
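
One possible reading of how --eager and --onfailure interact for a single namespace is sketched below.  It is deliberately simplified: the --how methods are left out, and search_namespace, FatalSearchError and the lookup callback are hypothetical names.  The lookup callback is assumed to return a schema document for (location, namespace), or None on failure.

```python
from typing import Callable, List, Optional


class FatalSearchError(Exception):
    """Stands in for --onfailure=fatal: report and exit, no validation."""


def search_namespace(
    namespace: str,
    locations: List[str],
    lookup: Callable[[str, str], Optional[str]],
    *,
    eager: bool,
    onfailure: str,
    errors: List[str],
) -> List[str]:
    """Return the schema documents found for one namespace."""
    found: List[str] = []
    for location in locations:
        document = lookup(location, namespace)
        if document is not None:
            found.append(document)
            if not eager:
                break  # --eager=no: stop after the first success
            continue  # --eager=yes: keep looking in other locations

        # This location produced nothing for the namespace.
        if onfailure == "continue":
            continue
        if onfailure == "halt":
            break
        message = f"no schema document for {namespace!r} via {location!r}"
        if onfailure == "error":
            errors.append(message)
            break
        if onfailure == "fatal":
            raise FatalSearchError(message)
        raise ValueError(f"unknown --onfailure value: {onfailure!r}")
    return found
```

In this sketch a caller would invoke search_namespace once per namespace, so "move on to the next namespace" corresponds to returning from the function.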

These options can be used to produce a variety of behaviors.  The
following examples are drawn from discussions of schema construction
in public records of the XML Schema Working Group.

1) Unconditionally follow xsi:schemaLocation

   Applies to a processor that dereferences every supplied
   xsi:schemaLocation, and which reflects a (fatal) processor-specific
   error if any one or more such references fail to resolve to schema
   documents for the appropriate namespace.

   --where=hints --how=literal --onfailure=fatal

2) Conditionally follow xsi:schemaLocation

    Same as above, but no error is reflected if any one or more such
    references fail to resolve, resolve to something other than a
    schema document, or to a schema document for the wrong namespace.
    If any of those conditions occur, then that schemaLocation is
    treated as if it were not supplied.

    --where=hints --how=literal --onfailure=continue

3) Unconditionally ignore xsi:schemaLocation

    Applies to a processor which in all cases ignores
    xsi:schemaLocation attributes in instance documents.

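This one can be achieved using any set of options that specifies
at least one --where option and does not include --where=hints
(with no --where option at all, the default applies, and the
default includes hints).  For example:

    --where=cache

    --where=cache --where=ns --how=catalog --how=literal
    --eager=yes --onfailure=continue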

... End of Figment schema construction options ...

I think example 3) illustrates that what is requested in comment #3
and comment #5 is possible.  Or am I missing something?
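
The claim about example 3) can be checked mechanically.  The sketch below assumes the option syntax of the sample Figment documentation; PROFILES and follows_hints are hypothetical names.  It also takes into account that Figment's documented default for --where includes hints, so an option set with no --where at all would still consult them.

```python
from typing import Dict, List

# Option sets taken from the three examples in the sample documentation.
PROFILES: Dict[str, List[str]] = {
    "unconditionally follow": ["--where=hints", "--how=literal", "--onfailure=fatal"],
    "conditionally follow": ["--where=hints", "--how=literal", "--onfailure=continue"],
    "unconditionally ignore": ["--where=cache"],
}

# Documented default when no --where option is given.
DEFAULT_WHERE = ["cache", "hints", "ns"]


def follows_hints(options: List[str]) -> bool:
    """Report whether an option set would consult xsi:schemaLocation hints."""
    wheres = [
        option.split("=", 1)[1]
        for option in options
        if option.startswith("--where=")
    ]
    if not wheres:
        wheres = DEFAULT_WHERE
    return "hints" in wheres
```

For instance, follows_hints(PROFILES["unconditionally ignore"]) is False, whereas follows_hints([]) is True because of the default.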

Comment 7 Noah Mendelsohn 2007-03-21 22:26:19 UTC
Michael Sperberg-McQueen writes:

> Perhaps we are imagining different uses for the
> terms defined here.  I expect them to be used in
> describing processors, and the requirement to be
> allowing the behaviors described in the initial
> description, in comment #3, and in comment #5 to
> be described in English prose using the terms
> defined in the spec.

Oops, yes we are.  Were it earlier in the process I would suggest that would be a reason for reopening the issue.  Given where we are in going to Last Call, I can let it go.  FWIW, my intended use of the D.2 terminology would be more along the lines of saying in my processor documentation:

"This processor implements schema location strategies XXX, YYY and ZZZ", where XXX, YYY and ZZZ are terms from D.2.x.

Interestingly, taken together, the text seems mildly contradictory on this point.  D.2 says: "Conforming processors may implement any combination of the following strategies for locating schema components, in any order. They may also implement other strategies.", suggesting that terms like "hard coded schemas" are not just handy noun phrases for use in constructing sentences in conformance prose, they are something you can conform to, as I had expected.  Then D.2.1 says: "Some terms describe how a processor identifies locations from which schema components can be sought:" which says, as you suggest, "these are just terms with definitions."

Anyway, I think it's late for rototilling this.  I'm a little disappointed not to have noticed what appears to be a deeper misunderstanding than my original question about schemaLocation hints.  Still, unless this exchange leads you or others to want to discuss again and clarify the draft, I'm willing to let it go in the interest of moving on.

Noah