When defining a specification, it seems important to define how an agent is supposed to deal with a non-conformant document or message, so as to ensure better interoperability. What are the different approaches? How should one choose between them?

This relates strongly to one of the TAG issues about error handling, which ties this topic to the TAG mailing list and the Web architecture document. This is not a ProxyTopic as long as this page serves to document the pros and cons of the various approaches.

Defining error handling for a language means:

defining how the semantics of an instance of this language are affected by a syntax error
binding it to the way a "processor" of this language should cope with it (see also MeaningVsBehavior)

Social considerations

Error handling is a very delicate part of a technology, since it crosses the boundary between the well-defined and limited domain of discourse of a given specification (e.g. for SOAP, exchanging messages) and the real world.

Having well-defined error handling semantics is key to enabling trust in a technology.

mustUnderstand vs mustIgnore

One of the typical cases where one needs to define error handling is when an agent interpreting a document stumbles upon some content it does not know how to interpret. There are typically 2 known approaches (cf. webarch): mustUnderstand (the agent rejects content it cannot handle) or mustIgnore (the agent simply ignores any content it doesn't understand). The approach chosen by the specification is strongly tied to its extensibility model; according to the TAG, a good way to accommodate both approaches is to have a way in the syntax to indicate which behavior is expected from the agent (see for instance the mustUnderstand attribute in SOAP 1.2).
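
As an illustration of the two behaviors, here is a minimal sketch in Python of an agent that reads a per-item flag to decide between them. It is only loosely modeled on the idea of a mustUnderstand marker; the names HeaderBlock, KNOWN_EXTENSIONS, NotUnderstoodError and process_headers are hypothetical and do not come from SOAP or any other specification.

```python
from dataclasses import dataclass

# Hypothetical set of extensions this agent knows how to process.
KNOWN_EXTENSIONS = {"routing", "security"}


class NotUnderstoodError(Exception):
    """Raised when a mandatory extension cannot be processed."""


@dataclass
class HeaderBlock:
    name: str
    must_understand: bool = False  # the syntax-level flag set by the sender


def process_headers(blocks):
    """Return the names of the blocks that were actually handled."""
    handled = []
    for block in blocks:
        if block.name in KNOWN_EXTENSIONS:
            handled.append(block.name)
        elif block.must_understand:
            # mustUnderstand: reject the whole message.
            raise NotUnderstoodError(f"cannot process mandatory block {block.name!r}")
        # mustIgnore: unknown and not flagged, so skip it silently.
    return handled
```

With this sketch, an unknown block without the flag is dropped and processing continues, while the same unknown block with must_understand=True stops processing altogether. The sender, not the receiver, decides which of the two applies.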

@@@ Other approaches? more examples?

Strict vs loose

XML has a strict error handling model for parsers (reject any content that is not well-formed), while other languages have a very loose one, e.g. CSS, where an agent is expected to ignore just the parts that cannot be parsed. This relates to Postel's law ("be liberal in what you accept, be strict in what you produce").
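
The difference between the two models can be sketched with a toy parser in Python. The "name: value;" syntax and the function parse_declarations are made up for this example; this is neither a real XML parser nor a real CSS parser, only a sketch of the two recovery policies.

```python
def parse_declarations(text, strict):
    """Parse 'name: value; name: value' into a dict.

    strict=True  rejects the whole input on the first malformed item.
    strict=False drops malformed items and keeps the rest.
    """
    result = {}
    for raw in text.split(";"):
        item = raw.strip()
        if not item:
            continue  # empty segment, e.g. after a trailing ';'
        name, sep, value = item.partition(":")
        name, value = name.strip(), value.strip()
        if not sep or not name or not value:
            if strict:
                raise ValueError(f"malformed declaration: {item!r}")
            continue  # loose model: ignore just this part
        result[name] = value
    return result
```

Given the input "color: red; oops; width: 10px", the loose mode returns the two valid declarations and discards "oops", while the strict mode raises an error and returns nothing. Two loose agents only behave identically if the specification also defines exactly what gets skipped.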

@@@ Pro/cons of approaches

Consistency of error handling

How much consistency is desirable in error handling:

between 2 agents
between 2 similar situations (i.e., error stimuli) for one given agent

How to reflect that in a specification? What are the pros and cons of each approach?

QA