This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 28011 - Redefining RFC 2119 may and must
Summary: Redefining RFC 2119 may and must
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Functions and Operators 3.1
Version: Candidate Recommendation
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Michael Kay
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
Reported: 2015-02-13 20:48 UTC by Patrick Durusau
Modified: 2016-12-16 19:55 UTC


Attachments
The history of RFC2119 usage at W3C and the redefinition mistake. (19.89 KB, text/plain)
2015-02-13 20:48 UTC, Patrick Durusau

Description Patrick Durusau 2015-02-13 20:48:23 UTC
Created attachment 1574
The history of RFC2119 usage at W3C and the redefinition mistake.

Section 1.6.3, Conformance terminology, redefines "may" and "must" and does not follow RFC 2119.

I have traced the error back to Extensible Markup Language (XML) 1.0 (Second Edition), http://www.w3.org/TR/2000/REC-xml-20001006, which used the definitions of may and must found in FO31.

Those definitions were abandoned in Extensible Markup Language (XML) 1.0 (Third Edition), http://www.w3.org/TR/2004/REC-xml-20040204, which cites RFC 2119 instead, and that practice of citing RFC 2119 has continued to date.

Unfortunately, XML Schema Part 2: Datatypes Second Edition, http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/, fell between XML 1.0 2nd edition and XML 1.0 3rd edition.

Redefining the RFC 2119 terms, a practice abandoned for ten (10) years now, should be avoided here for the sake of interoperability.

I have attached a longer document that treats the history of this error more extensively.
Comment 1 Michael Kay 2015-02-15 10:57:05 UTC
For anyone like me who was totally bewildered by the title of the attached document, "Will Robinson" apparently refers to a popular US TV programme: https://en.wikipedia.org/wiki/Danger,_Will_Robinson
Comment 2 Michael Kay 2015-02-15 11:40:37 UTC
A couple of additional points:

(a) The defined term "for compatibility" is never referenced as such; the definition should probably be removed.

(b) Section 1.6.3 seems misplaced; it would make more sense to move it to 1.1

(c) I think it's inconsistent to say on the one hand that conformance rules are defined by the host language, and on the other hand to use RFC (or RFC-like) "must" language. I'm not sure we've ever really had a clear view on how much a host language is allowed to modify the spec. One could argue that any spec can choose to use parts of another spec selectively; if a spec A wants to cherry-pick arbitrarily from a spec B, there is nothing that B can say to prevent this. My feeling is that F+O DOES define conformance requirements (functions must behave as described), but it is up to other specs whether to adopt those requirements or not. 

A significant problem in switching to the RFC 2119 definition of "must" in F+O is that we often use the word in sentences like "The primary format token is always present and must not be zero-length." Here the requirement is not on the implementor of the spec, but on the user. The correct reading of the RFC definition is open to debate, but my interpretation is that it is always a requirement on the implementation, and because it is defined as an "absolute requirement" there is no room for discussion about what an implementation should do ("raise an error") if the requirement is not satisfied.
Comment 3 Patrick Durusau 2015-02-15 14:15:18 UTC
Michael,

When you say:

****
A significant problem in switching to the RFC 2119 definition of "must" in F+O is that we often use the word in sentences like "The primary format token is always present and must not be zero-length." Here the requirement is not on the implementor of the spec, but on the user.
****

Although phrased as a requirement on a user, isn't the requirement on an implementation to not accept input that fails to conform to the requirement?

In this case, an implementation must fail/reject input where the primary format token is absent or is of zero length.

Yes?
Comment 4 Patrick Durusau 2015-02-15 15:00:07 UTC
Michael,

Lest we forget, the IETF covers the "user" case (if that is the case here) with this language from the RFC Editors Guidelines, http://www.rfc-editor.org/policy.html: 

****
Some standards-track documents use certain capitalized words ("MUST", "SHOULD", etc.) to specify precise requirement-levels for technical points. RFC 2119 (BCP 14) [BCP14] defines a default interpretation of these capitalized words in IETF documents. If this interpretation is used, RFC 2119 must be cited (as specified in RFC 2119) and included as a normative reference. Otherwise, the correct interpretation must be specified in the document.

    Avoid abuse of requirement-level words. They are intended to provide guidance to implementors about specific technical features, generally governed by considerations of interoperability. RFC 2119 says, "Imperatives of the type defined in this memo must be used with care and sparingly. In particular, they MUST only be used where it is actually required for interoperation or to limit behavior which has potential for causing harm (e.g., limiting retransmissions). For example, they must not be used to try to impose a particular method on implementors where the method is not required for interoperability." To simply specify a necessary logical relationship; the normal lower-case words should be used. On the other hand, if the capitalized words are used in a document, they must be used consistently throughout the document. 
****

So, uppercase MUST for implementation requirements and lowercase must for "necessary logical relationship."

I persist in thinking your example is a requirement on the implementation but offer this as another aspect of the issue.
Comment 5 Michael Kay 2015-02-19 17:56:49 UTC
We can certainly regard the sentence "The primary format token is always present and must not be zero-length." as a paraphrase for "The implementation must raise a dynamic error if the primary format token is absent or is zero-length."

The question is whether we need to say explicitly that we are adopting this convention, and whether this would itself be a variation on the RFC definition of "must".
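
The paraphrase convention can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: `DynamicError` and `check_primary_format_token` are hypothetical names invented here, not part of F+O or of any implementation. The point is that a sentence worded as a constraint on the user ("must not be zero-length") is realized as a duty on the implementation to raise an error.

```python
from typing import Optional


class DynamicError(Exception):
    """Hypothetical stand-in for an XPath dynamic error."""


def check_primary_format_token(token: Optional[str]) -> str:
    """Read 'the primary format token is always present and must not be
    zero-length' as a requirement on the implementation: raise an error
    when the condition is not satisfied, otherwise return the token."""
    if token is None or len(token) == 0:
        raise DynamicError("primary format token is absent or zero-length")
    return token
```

Under this reading, a caller that supplies an empty token gets a `DynamicError` rather than undefined behavior; whether that reading needs to be stated explicitly in the spec is the open question.
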
Comment 6 C. M. Sperberg-McQueen 2015-03-03 23:59:04 UTC
For the record, it is not quite true that XML 1.0 2e was the first XML spec not to cite RFC 2119; the first edition also did not cite RFC 2119, but defined 'may' and 'must' in words modeled on those in ISO specs the editors knew and trusted.

One reason to be cautious about the definitions in RFC 2119 is the casual circularity of the definition of 'MUST', which may perhaps be best illustrated if we replace the crucial words with words we do not in fact already know:

  1. MAUN   This word, or the terms "FARBLED" or "GRANFALLOON", mean that the
     definition is an absolute farblement of the specification.

OK.  So we use the word FARBLED if the definition is a FARBLEMENT.  And a FARBLEMENT, we may infer, is something that is FARBLED.  How that relates to anything else in the world is not clear to this reader.  It's not hard to do better; though I say it myself, I think the XML spec did.

Another reason is that the definitions in RFC 2119 are made awkward by the text's strenuous effort not to say what kinds of things it is talking about -- an effort which however fails in the course of the paragraph on MAY, which finally gives in and mentions 'implementations'.  It is hard for this reader to see how to apply the definitions given in cases where a spec governs something other than software. 

The only reason that the flaws in RFC 2119's definitions are not catastrophic is that as far as I can tell no one ever, ever pays any attention whatsoever to the definition of these terms, but merely uses them the way other specifications do.

That said, the XML spec refers to documents and processors because the spec defines conformance requirements both for documents and processors.  Copying the definition into a spec which does not define a class of documents may not have been the best way to start.
Comment 7 Michael Kay 2015-04-20 23:24:44 UTC
It's certainly true (a) that the use of "must" within F+O is a mess, (b) that it's not easy to sort out, and (c) that the mess does very little practical harm.

The messiness arises in a number of separate ways.

(a) The definitions of the terms are not directly linked to RFC2119; indeed, RFC2119 is not cited, normatively or otherwise.

(b) There are three different renditions used for the word "must": normal text, bold text, and hyperlinked text. The use of bold and hyperlinked rendition appears to be interchangeable.

(c) The word "must" sometimes refers to the implementation/processor, and sometimes to the caller of a function (as in, "a '$' sign must be escaped as '\$'".) The latter usage "A must be B" is shorthand for "if A is not B, the function raises a dynamic error". This is a very convenient shorthand, but it's not closely related to the usage described in either the RFC definition or the F+O definition of "must".

(d) The use of "must" isn't directly linked to conformance criteria for the spec. This relates to the fact that F+O doesn't actually have any conformance criteria, on the theory that it is designed to be referenced from other specs rather than to be free-standing.

I will propose a way forward in the next message.
Comment 8 Michael Kay 2015-04-21 08:28:43 UTC
I propose to resolve these issues as follows:

(a)/(c) I propose to retain local definitions of the terms "must" and "may" rather than simply deferring to RFC2119. The RFC definitions, as pointed out by MSMcQ, are too imprecise to be useful in the context of this specification without elaboration. However, I intend to add a non-normative reference to the RFC. Specifically, I propose using the following definitions:

* The auxiliary verb MUST, when rendered in small capitals, indicates a precondition for conformance.

** When the sentence relates to an implementation of a function (for example "All implementations MUST recognize URIs of the form ...") then an implementation is not conformant unless it behaves as stated.

** When the sentence relates to the result of a function (for example "The result MUST have the same type as $arg") then the implementation is not conformant unless it delivers a result as stated.

** When the sentence relates to the arguments to a function (for example "The value of $arg MUST be a valid regular expression") then the implementation is not conformant unless it enforces the condition by raising a dynamic error whenever the condition is not satisfied. 

* The auxiliary verb MAY, when rendered in small capitals, indicates optional or discretionary behavior. The statement "An implementation MAY do X" implies that it is implementation-dependent whether or not it does X.

Note: These definitions of the terms MUST and MAY are consistent with the definitions in RFC2119, but expressed in language appropriate to the subject matter of this particular specification.
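
As a rough illustration of the argument and result readings of MUST proposed above, here is a small Python sketch. Everything named in it is an assumption made for the example: `DynamicError` and `matches` are hypothetical, and Python's `re` syntax stands in for the regular expression syntax that F+O actually defines, which differs.

```python
import re


class DynamicError(Exception):
    """Hypothetical stand-in for an XPath dynamic error."""


def matches(value: str, pattern: str) -> bool:
    """Toy function showing two of the proposed readings of MUST."""
    # Argument reading: "The value of $arg MUST be a valid regular
    # expression" means the implementation enforces the condition by
    # raising a dynamic error whenever it is not satisfied.
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        raise DynamicError(f"invalid regular expression: {exc}") from exc
    # Result reading: the implementation MUST deliver a result as
    # stated, here a boolean.
    return compiled.search(value) is not None
```

An implementation that silently returned `False` for an unparsable pattern would, on the proposed definition, be non-conformant, even though the sentence in the spec is phrased as a constraint on the argument.
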

(b) I propose to use small caps for rendering the formal instances of MUST and MAY, as is done in the XSLT spec, replacing use of bold and hyperlink rendition.

(d) Since MUST and MAY rely for their meaning on some notion of "conformance", I propose to rephrase section 1.1 to include such a notion. In particular I propose to be more precise about the freedoms available to a host language:

* This recommendation contains a set of function specifications. It defines conformance at the level of individual functions. An implementation of a function conforms to a function specification in this recommendation if all the following conditions are satisfied:

** For all valid inputs to the function (both explicit arguments and implicit context dependencies), the result of the function meets the mandatory requirements of this specification.

** For all invalid inputs to the function, the implementation signals (in some way appropriate to the calling environment) that a dynamic error has occurred.

** For a sequence of calls within the same EXECUTION SCOPE, the requirements of this recommendation regarding the STABILITY of results are satisfied (see section XXX).

Other recommendations ("host languages") that reference this document may dictate:

** subsets or supersets of this set of functions to be available in particular environments

** mechanisms for invoking functions, supplying arguments, initializing the static and dynamic context, receiving results, and handling dynamic errors

** a concrete realization of concepts such as EXECUTION SCOPE

** which versions of other specifications referenced herein (for example, XML, XSD, or Unicode) are to be used

Any behavior that is discretionary (implementation-defined or implementation-dependent) in this specification may be constrained by a host language.

Note: Adding such constraints in a host language, however, is discouraged because it makes it difficult to re-use implementations of the function library across host languages.
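
The three function-level conformance conditions proposed above can be sketched as a sample-based check in Python. This is a simplification under stated assumptions: `DynamicError` and `conforms` are hypothetical names, the check covers only the finite samples it is given rather than "all" inputs, and stability is reduced to "a repeated call within one run returns an equal result", which is narrower than the notion the recommendation defines.

```python
from typing import Any, Callable, Iterable


class DynamicError(Exception):
    """Hypothetical stand-in for a dynamic error signal."""


def conforms(
    impl: Callable[[Any], Any],
    valid_inputs: Iterable[Any],
    invalid_inputs: Iterable[Any],
    meets_requirements: Callable[[Any, Any], bool],
) -> bool:
    """Check the three proposed conditions over sample inputs only."""
    for arg in valid_inputs:
        try:
            first = impl(arg)
            second = impl(arg)
        except DynamicError:
            # A valid input must not be rejected.
            return False
        # Condition 1: the result meets the mandatory requirements.
        if not meets_requirements(arg, first):
            return False
        # Condition 3 (simplified): repeated calls agree.
        if second != first:
            return False
    for arg in invalid_inputs:
        # Condition 2: invalid input is signalled as a dynamic error.
        try:
            impl(arg)
        except DynamicError:
            continue
        return False
    return True
```

A host language, in the terms of the proposal, would supply the concrete mechanism that `DynamicError` stands in for here, along with its own realization of execution scope.
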
Comment 9 Michael Kay 2015-05-13 08:14:40 UTC
These changes were accepted and have been applied.