Arch/Extensibility Design Choices

Document To Do: add some concrete examples!

Contents

Background and Terminology
Managing Overlap
1. Choice: How Do Extension-Creators Discover Overlap?
2. Random Questions
Fallback Procedures
Strawman
Example Extensions
References

1. Background and Terminology

1.1. Basic Terms

A RIF Document is an XML document with a root element called "Document" in the RIF namespace (http://www.w3.org/2007/rif#). In general, RIF documents are expected to convey machine-processible rules, data for use with rules, and metadata about rules.

A RIF System is anything which might produce or consume a RIF Document. Typical RIF systems include rule authoring tools and rule engines. These systems may consist of a non-RIF subsystem and a RIF translation subsystem; taken as a whole, they form a RIF system.

A RIF Dialect is an XML language for RIF Documents. Each RIF Dialect defines semantics for the set of RIF Documents which conform to its syntax definition. Dialects may overlap other dialects; that is, a given document may be an expression in multiple dialects at the same time.

A Language Conflict occurs when multiple dialects specify different meanings for the same document. That is, if there can exist a RIF Document which syntactically conforms to two dialects, and a system can be conformant to one of the dialects without also being conformant to the other, then there is a language conflict between the dialects.

A RIF Extension is a set of changes to one dialect (the "base" dialect) which produce another dialect (the "extended" dialect), where the extended dialect is a superset of the base dialect.

A RIF Profile is the complement of a RIF Extension; it is a set of changes to one dialect (the "base" dialect) which produce another dialect (the "profile" dialect), where the profile dialect is a subset of the base dialect.

A system is Backward Compatible if it accepts old versions of its input language. All systems are backward compatible for languages which change only by incorporating extensions (that is, by growing). In a large, decentralized system (like the Web), backward compatibility is extremely important because new systems will almost certainly have to read old-version data (either old documents, or documents recently written by old software).

A system is Forward Compatible if it behaves well when given input in future or unknown languages. In a large, decentralized system (like the Web), if the systems are not all forward compatible, new language versions are extremely difficult to deploy. Systems which are not forward compatible will behave badly when they encounter new-version data, so the users of these systems will tend to push back on the people trying to publish the new-version data. If a large enough fraction of the user base is using such systems, the push back becomes too great and migration to new versions is prevented. In small, controlled environments, the software for all the users can be upgraded at once, but that is not practical on the Web.

A Fallback mechanism provides forward compatibility by defining a transformation by which any RIF document in an unknown (or simply unimplemented) dialect can be converted into a RIF document in an implemented dialect. In many cases, fallback transformations will have to be defined to be lossy (changing the semantics of the document). Fallback mechanisms can be simple, like saying that certain kinds of unrecognized language constructs are to be ignored (as in CSS), or they can be complex, invoking a Turing complete processor (as in XForms-Tiny).

Impact is information about the type and degree of change performed by a fallback transformation. For instance, a fallback transformation which affects performance might be handled differently from one which will cause different results to be produced. This difference is considered impact information.

1.2. Extensibility

An Invisible Extension defines a dialect which has exactly the same syntax as some other dialect but different semantics. This is sometimes desirable when the different semantics are related in a practical and useful way, such as reflecting the different capabilities of competing implementation technologies. Deployment, testing, and the definition of conformance for invisible extensions require out-of-band information, which may be problematic. For example, there is a subset of OWL-Full which has the same syntax as OWL-DL, but which has more entailments (ie different semantics). This subset of OWL-Full is an invisible extension of OWL-DL; its presence (and thus the different intended semantics) cannot be determined by inspection and must be conveyed out-of-band in any applications where the semantic difference might matter.

Extensible systems may support User Extensions (Vendor Extensions), Official Extensions, or both. A user extension is one which can be defined and widely (and legitimately) deployed without coordinating with a central authority (such as W3C or IANA). Official extensions are those produced under the auspices of the organization which produced the base dialect (in this case W3C). Some people consider user extensibility to be required for a system to truly be extensible. The RIF Charter extensibility requirement concerns user extensions.

Some partially-formed ideas about dependencies:

An extension A requires extension B if the base dialect of A is a superset of the extended dialect of B.
Extensions A and B are independent if, for all dialects D which can be a base dialect for both A and B, the dialects D+A+B and D+B+A are well-defined and identical.
An extension A is compatible with extension B if A requires B, B requires A, or A and B are independent.
Dialects are incompatible if either there is no dialect which can be a base for both of them or if they are not compatible.
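The independence definition above can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions: a dialect is reduced to a mapping from construct name to a label for its semantics, an extension to the constructs it needs and the ones it defines, and all names (Extension, apply, independent_over, the construct labels) are hypothetical. It checks D+A+B against D+B+A for a single candidate base D, whereas the definition quantifies over all such bases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extension:
    needs: frozenset  # constructs the base dialect must already define
    adds: tuple       # (construct, semantics-label) pairs the extension defines


def apply(base, ext):
    """Return the extended dialect, or None if base cannot serve as a base."""
    if base is None or not ext.needs <= set(base):
        return None
    extended = dict(base)
    extended.update(ext.adds)  # a later extension overrides an earlier one
    return extended


def independent_over(base, a, b):
    """Check that D+A+B and D+B+A are both well-defined and identical for one D."""
    ab = apply(apply(base, a), b)
    ba = apply(apply(base, b), a)
    return ab is not None and ab == ba


# Toy base dialect and extensions; the labels are illustrative only.
bld = {"Rule": "bld", "Frame": "bld"}
lists = Extension(frozenset({"Rule"}), (("List", "ordered"),))
neg = Extension(frozenset({"Rule"}), (("Neg", "classical"),))
naf = Extension(frozenset({"Rule"}), (("Neg", "negation-as-failure"),))

lists_and_neg = independent_over(bld, lists, neg)  # disjoint constructs
neg_and_naf = independent_over(bld, neg, naf)      # both claim "Neg"
```

In this model, two extensions that claim the same construct with different semantics fail the check, which is the language conflict case from section 1.1.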

1.3. Motivation Scenario

Acme Widget Co. has a complex pricing structure, with bulk discounts, high-volume customer discounts, periodic sales, overstock sales, and multiple shipping options. They encode their pricing structure in a RIF ruleset which they want to give to customers so that the customers can computationally determine their best timing and grouping of orders. This provides a mutual advantage, as long as Acme designs their pricing structure to accurately reflect their costs and business goals.

Unfortunately, at the time of this effort, Acme finds there is no standard RIF dialect which supports pricing structures varying over time. They can publish a simplified version of their rules, without time-varying parts, but that version would be missing some important information. So they meet with their two biggest customers, who they know are rules-savvy, and design an extension to a RIF dialect which gives them this functionality.

Some questions arise:

Do they need permission from anyone to do this? (That is, is RIF user extensible or not?)
What should the syntax look like? What namespace? (How do they avoid language conflict?)
What happens if RIF-WG wants to make it a standard later? Will the namespace have to change?
Can they still publish just one ruleset, and have it work for both the users understanding the extension and those not? (That is, is there an effective fallback mechanism?)

1.4. Goals

These are some of the less obvious goals. Maybe they should all be enumerated here.

User Extensibility: user communities should be able to deploy dialects without any sort of approval from anybody (eg W3C)

Let user extensions function as prototypes for standards: it should be possible to prototype possible standard dialects using the user extensibility mechanism, so that (for instance) some community can gain practical experience with a feature before W3C incorporates it into a standard.

2. Managing Overlap

The straightforward way to provide for backward compatibility is to ensure that there are no language conflicts. This requires that designers of extensions never accidentally use the same syntax, and that they are careful to use the same semantics when they do use the same syntax.

2.1. Choice: How Do Extension-Creators Discover Overlap?

Because RIF uses an XML syntax with XML namespaces (which are URIs), there are several options here.

2.1.1. One Namespace + Best-Effort Coordination

The "do-nothing" solution is for all extensions to stay in the RIF namespaces and to ask people to try to coordinate their efforts as best they can, such as by publishing web pages and/or papers about their work, and doing a thorough search for any work that might use the same syntax.

Con:

Accidental re-use may occur, with no one at fault, eg due to language differences
Unclear how to resolve disputes

2.1.2. Central Authority

Another approach is to require that all extensions be coordinated through the W3C (such as by the RIF-WG).

Con:

Expensive for W3C and RIF-WG participants
Sets a high barrier on creating extensions
Keeps W3C in the critical path

2.1.3. One Namespace + Informal Central Registry

Between those two options is a third: have W3C run a mailing list and/or wiki page and require that all extension creators discuss their work there in advance, as part of claiming some syntax.

Con:

Potential for problems if disputes arise. (Might end up being expensive for W3C and participants in any relevant WG.)
Keeps W3C in the critical path

2.1.4. Independent Namespace Authorities

(Also called URI-based extensibility)

The last option is to use the namespace URIs as a point of coordination. This is equivalent to options 2 or 3 for W3C namespaces, but allows anyone else to mint a namespace URI and use it for their extension. Then they would not need to coordinate with W3C on their extension.

They should, however, be required to provide for people extending their work. That is, available via the namespace URI there should be some sort of registry, informal or not, which allows users wishing to build beyond their extension to coordinate their syntaxes.

Con:

More complicated

2.2. Random Questions

Does every syntactic element blessed by RIF-WG go in one w3.org namespace, multiple w3.org namespaces, or can some of them be in non-w3.org namespaces?

implies: Do user extensions have to change namespaces if they become part of a W3C recommendation?

Do we ever allow invisible extensions?

How do you know when extensions are compatible?

Do people ever need to specify dialects, or can they just specify extensions from a null core? Maybe BLD is a package of 6 extensions? Or maybe it's just one extension? Is that a useful concept?

Do we recommend/allow/forbid document-level flags to change the meaning of syntactic elements? e-mail.

3. Fallback Procedures

3.1. Choice: What Fallback Mechanisms Are Mandated?

If a system receives a document which does not conform to the syntax of any dialect it implements, what should it do?

3.1.1. Trim To Fit

The simplest fallback procedure is to ignore XML subtrees which fall outside the grammar of some implemented dialect.

Pro:

Very simple to implement
Needs no information about the extensions or dialects used to do the fallback (but still does for Impact)

Con:

May fall back very far, making huge changes to document
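A minimal sketch of Trim To Fit, assuming the implemented dialect's grammar can be approximated by a set of known element names (a real grammar also constrains nesting and order). The function name trim_to_fit, the "Rule" and "During" elements, and the example.com namespace are hypothetical; only the RIF namespace URI comes from this document.

```python
import xml.etree.ElementTree as ET

RIF = "http://www.w3.org/2007/rif#"
ACME = "http://example.com/acme#"  # hypothetical user-extension namespace


def trim_to_fit(root, known_tags):
    """Remove, in place, every subtree whose root element is not a known tag."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag not in known_tags:
                parent.remove(child)
    return root


doc = ET.fromstring(
    f'<Document xmlns="{RIF}" xmlns:acme="{ACME}">'
    "<Rule/><acme:During><Rule/></acme:During>"
    "</Document>"
)
known = {f"{{{RIF}}}Document", f"{{{RIF}}}Rule"}
trim_to_fit(doc, known)
remaining = [child.tag for child in doc]
```

Here the rule nested inside the unrecognized element is lost along with its wrapper, which illustrates the "may fall back very far" concern.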

3.1.2. Incremental Trim To Fit

A variation on Trim-to-Fit is to recursively replace the roots of out-of-grammar subtrees with their content, potentially encountering some usable in-grammar elements. This is the approach famously taken by HTML.

Same Pros/Cons as Trim-to-Fit. Determining which approach would be better for RIF will require analysis of actual extension syntaxes.
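As a rough illustration of the incremental variant, the sketch below promotes the children of each unrecognized element into its parent and then re-examines them. It again assumes the grammar is just a set of known element names and that the document root itself is known; text content of unwrapped elements is dropped for brevity. The function name and the extension elements are hypothetical.

```python
import xml.etree.ElementTree as ET

RIF = "http://www.w3.org/2007/rif#"
ACME = "http://example.com/acme#"  # hypothetical user-extension namespace


def unwrap_to_fit(parent, known_tags):
    """Replace each out-of-grammar child with its own children, recursively."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if child.tag in known_tags:
            unwrap_to_fit(child, known_tags)
            i += 1
        else:
            parent.remove(child)
            for offset, grandchild in enumerate(list(child)):
                parent.insert(i + offset, grandchild)
            # i is not advanced, so the promoted elements are examined next.
    return parent


doc = ET.fromstring(
    f'<Document xmlns="{RIF}" xmlns:acme="{ACME}">'
    "<Rule/><acme:During><Rule/><acme:Window><Rule/></acme:Window></acme:During>"
    "</Document>"
)
known = {f"{{{RIF}}}Document", f"{{{RIF}}}Rule"}
unwrap_to_fit(doc, known)
remaining = [child.tag for child in doc]
```

In this example all three rules survive, but the two that were scoped by the extension elements now apply unconditionally, which is exactly the kind of change the Impact information has to describe.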

3.1.3. Fixed Substitution in Fallback Data

There can be some fixed text to be substituted for particular branches of the extended syntax.

3.1.4. Fixed Substitution in Instance Data

The document itself, at the point of using an extension, might include alternative text to use if the extension is not known. This is somewhat like HTML's <object> specification (http://www.w3.org/TR/html401/struct/objects.html#h-13.3), which says that if you can't render the object, you should try to render the provided content in its place.

For RIF, this might be provided as an "Extension" element, explicitly giving the text you should use if you implement the extension and the text you should use if you do not.

This approach segregates extended dialects from other dialects. If an extension were incorporated as part of a standard dialect, its syntax would have to change.

See XMLSchema WG proposal for FallbackElement.

Pro:

Very straightforward

Con:

Clutters up instance documents, possibly making them very large

3.1.5. Template Substitution in Fallback Data

The fallback might be a set of single-pass substitution rules, where the new text is given with variables which are bound with values from the old text.
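One way to picture single-pass template substitution is a list of (pattern, template) pairs applied once each, in order, where named groups in the pattern play the role of the variables. The sketch below works on serialized text with regular expressions purely for brevity; that choice, the rule format, and the acme:Discount element are assumptions for illustration, and a real mechanism would more plausibly match on the XML tree.

```python
import re

# Each rule binds variables (named groups) from the old text and reuses them
# in the new text. The extension element shown here is hypothetical.
RULES = [
    (
        re.compile(
            r'<acme:Discount percent="(?P<pct>\d+)">(?P<body>.*?)</acme:Discount>',
            re.S,
        ),
        r"<!-- discount of \g<pct>% omitted -->\g<body>",
    ),
]


def apply_templates(text, rules):
    """Apply every rule exactly once, in order (a single pass)."""
    for pattern, template in rules:
        text = pattern.sub(template, text)
    return text


source = '<Rule/><acme:Discount percent="10"><Rule/></acme:Discount>'
rewritten = apply_templates(source, RULES)
```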

3.1.6. XSLT

For a full, Turing complete, fallback mechanism, one could use XSLT. That is, for each extension, there can be an XSLT "stylesheet" (transformation ruleset) which rewrites documents using the extension into documents not using the extension. Having proper independence may be difficult -- it may be hard for XSLT to do its job properly in the presence of other extensions.

Pro:

Powerful enough to handle rewriting away syntactic sugar (so such extensions become automatic)
A W3C Recommendation with considerable support and uptake; fairly mature technology

Con:

Unknown how hard it will be to write these transforms

3.1.7. BLDX

Another option for fallback would be to use a RIF ruleset in some fixed dialect (called "BLDX" as a placeholder here, and assumed to be similar to BLD). This might be Turing complete (like BLD) or not (eg the datalog subset of BLD). While it could operate on the documents at the XML tree level here, it's probably better to take advantage of the frame/object model of the syntax and operate at that level. That is, rules map from frame facts about the syntactic content of the input document to frame facts about the output document. (Probably needs NAF; might like access to modular databases or something.)

Pro:

Powerful enough to handle rewriting away syntactic sugar (so such extensions become automatic)
Closer to the problem domain than XSLT
Operating at the frame/object level should make independence of extensions easier to support
A cool demonstration of RIF (eating our own dogfood)

Con:

Less flexible about arbitrary XML than XSLT
Unproven technology
A fairly high bar on implementation. (Hopefully, BLDX would be not far beyond Core, so it doesn't raise the bar much.)

3.2. Choice: Where Is The Extension Metadata?

Again, if a system receives a document which does not conform to the syntax of any dialect it implements, what should it do? Specifically, how can it learn which elements belong to which extensions? How can it learn what Impact is associated with each fallback option it has? How can it learn which fallbacks to perform?

Some of these options parallel the Approaches to Discovering Overlap, but there are some others available here, too.

All the net-access approaches involve some possible security risks. Perhaps the links could optionally include a secure-hash (if you want to make sure no one changes the extension metadata) or a public key (if you want to let the extension metadata change but are worried about imposters changing it).
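The secure-hash idea could look roughly like the following sketch, which assumes the link to the extension metadata carries a SHA-256 digest in hexadecimal. The choice of SHA-256, the function name, and the sample bytes are assumptions; the document does not say how the hash would be expressed.

```python
import hashlib
import hmac


def metadata_matches(metadata_bytes, expected_sha256_hex):
    """Return True only if the retrieved metadata has the digest named in the link."""
    actual = hashlib.sha256(metadata_bytes).hexdigest()
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


# Hypothetical metadata as retrieved from the network.
published = b"<FallbackMetadata/>"
pinned_digest = hashlib.sha256(published).hexdigest()

unchanged_ok = metadata_matches(published, pinned_digest)
tampered_ok = metadata_matches(published + b" ", pinned_digest)
```

A pinned hash only suits metadata that is never meant to change; the public-key variant would instead verify a signature over whatever the current metadata is.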

3.2.1. Best Effort (Publish+Search)

Basically, use Google to try to find the on-line version of the specification of the extension. Hard to automate reliably.

3.2.2. Central Authority

Look in some w3.org registration database. Keeps W3C in the loop.

3.2.3. Independent Namespace Authorities

Dereference the namespace URIs (maybe with the element names added) to get the metadata.

This is a superset of "Central Authority", since W3C would be the namespace authority for the elements in the RIF namespace.
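For illustration, the URI to dereference could be formed by simply appending the element's local name to its namespace URI. That concatenation convention is an assumption here, not something this document fixes, and the function name is hypothetical. The sketch only computes the URI and performs no network access.

```python
import xml.etree.ElementTree as ET


def metadata_uri(element):
    """Build namespace URI + local name from ElementTree's '{ns}local' tag form."""
    tag = element.tag
    if not tag.startswith("{"):
        return None  # element is in no namespace, so there is nothing to dereference
    namespace, local = tag[1:].split("}", 1)
    return namespace + local


doc = ET.fromstring('<Document xmlns="http://www.w3.org/2007/rif#"/>')
uri = metadata_uri(doc)
```

Because the RIF namespace ends in "#", the result is a fragment of the namespace document, so one retrieval of that document could serve every element in the namespace.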

3.2.4. Inline

All the fallback metadata is in the document itself.

Pro:

works off-line
allows localized semantics (nice for local patching of bugs, experimentation)

Con:

might make RIF files prohibitively large
allows localized semantics (fallback code more likely to diverge from standards)

3.2.5. Imports

The fallback metadata is on the web, in a location named in the document.

Allows for localized semantics like "Inline".

3.2.6. All Of The Above

It's possible to have all of these at the same time. Systems gather data from inside the document, from following import statements in the document, and by dereferencing namespace URIs. Some approach to conflicting information is needed -- maybe the document can say explicitly whether it means to override or be overridden by the namespace documents. (If you're providing the data so that applications still work offline, then you probably want the namespace document to be used if available. If you're providing the data in the document because you want localized semantics, then you don't care about the namespace document.) Maybe there can just be a way to turn off dereferencing of particular namespaces.

3.3. Choice: How complex an impact structure?

The design of the impact information structure depends on what you might do differently, based on the information.

Many prior extensible languages just use 1 bit of impact information. Sometimes (eg in SOAP) it is a "Must Understand" flag. Other times anything that is not "must understand" is considered to be metadata.

The rest of this section is from the Arch/Extensibility2 strawman.

Each fallback substitution has zero or more two-part impact flags. Each flag consists of a type and a severity, indicating what kind of effect performing that substitution will have. Based on the type and severity, different types of RIF-consuming systems can behave differently.

Repeats of the same impact SHOULD NOT have greater impact.

soundness : performing this substitution will make the resulting ruleset produce incorrect answers and/or behaviors.
completeness : performing this substitution will make the resulting ruleset produce fewer distinct answers and/or behaviors than it otherwise would. If the results would have been complete before, they no longer will be.
performance : performing this substitution may cause rule processing systems to handle this ruleset with significantly degraded performance
presentation : this substitution only affects aspects of the ruleset intended for human readers.

Under certain conditions, impact flags may be interrelated. For instance, if a negation-as-failure component is being used, a completeness impact flag being set should automatically raise the soundness impact flag.

Systems MUST NOT silently perform any fallback substitution which has even a slight chance of producing incorrect answers or behavior. Instead, software SHOULD inform users of fallback substitutions which have minimal effects and SHOULD require confirmation from users before performing fallback substitutions which may have greater effects. RIF Software MUST NOT lead a reasonable user to think that errors stemming from fallback substitutions are due to faulty input.

Suggested Response

This table indicates suggested handling of impact information by a system which answers queries for users, using reasoning on a rulebase provided in RIF.

	Soundness	Completeness	Performance	Presentation
Major	Serious Error	Warning	Warning	Ignore
Intermediate	Serious Error	Warning	Warning	Ignore
Minor	Warning	Warning	Ignore	Ignore
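The table translates directly into a lookup. In the sketch below the table contents are taken from above, while the function name and the rule that the strongest response wins when a substitution carries several flags are assumptions.

```python
# Suggested responses, keyed by severity and then by impact type (from the table).
SUGGESTED_RESPONSE = {
    "major": {
        "soundness": "serious error", "completeness": "warning",
        "performance": "warning", "presentation": "ignore",
    },
    "intermediate": {
        "soundness": "serious error", "completeness": "warning",
        "performance": "warning", "presentation": "ignore",
    },
    "minor": {
        "soundness": "warning", "completeness": "warning",
        "performance": "ignore", "presentation": "ignore",
    },
}

# Assumed ordering used to combine several flags into one response.
RANK = {"ignore": 0, "warning": 1, "serious error": 2}


def response_for(flags):
    """flags is an iterable of (type, severity) pairs; return the strongest response."""
    responses = [SUGGESTED_RESPONSE[severity][kind] for kind, severity in flags]
    return max(responses, key=RANK.__getitem__, default="ignore")


combined = response_for([("performance", "minor"), ("completeness", "major")])
```

A different kind of RIF-consuming system, such as an authoring tool, would presumably plug in a different table while keeping the same flag structure.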

3.4. Random Questions

Can we use the namespace+element URI to look up the fallback information?

Must we support off-line fallback?

Do we have a simple flag indicating non-semantic (metadata) extensions? (Do we consider metadata an extension?)

Is there a way to make the fallback processing extensible? Could we make only Trim-to-Fit mandatory, but have XSLT or BLDX be encouraged and motivated?

4. Strawman

4.1. Overview

All official dialect elements go into the rif: namespace. That namespace will dereference, just as everyone else's SHOULD, to:

pointers to fallback/impact information
pointers to documentation
pointers to community resources

User extensions go in their own separate namespaces (which might happen to be on w3.org, or purl.org, or whatever). If they become official, the namespace has to change. But fallback substitution between the two should mean implementations and data don't have to change.

extensible fallback options, up to fixed-replacement.

no invisible extensions

having in-line data or in-line imports is an extension.

no specific notion of metadata -- it's just extensions for elements which can be ignored with minimal impact.

4.2. Input Processing Procedure

Proposed, that all systems which consume RIF MUST do this:

You have a RIF document to process
You try to parse it (or schema validate it) according to each of the dialects you know. If you succeed with any of them, you're done. Otherwise...
You parse the document using the all-RIF schema to obtain the dialect metadata.
The metadata (or its absence) will indicate whether you should do web-based fallback processing. If so, you must either do it, or give the user an error message. If it fails, the user must be given an error message. If it succeeds, you have more dialect metadata.
The dialect metadata allows you to construct additional XML schemas. The document must be schema valid with respect to at least one of them.
The dialect metadata also includes fallback/impact information, back to a base dialect you probably implement. You must do this transformation or warn the user. Based on the impact, you may have to warn the user.
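The control flow of the procedure above can be sketched as follows. This is a simplification under stated assumptions: dialect validation, metadata retrieval (the all-RIF parse and any web-based lookup, collapsed into one step), and the fallback transformation are passed in as hypothetical callables, and every impact is reported through a warn callback rather than being filtered by severity.

```python
class FallbackError(Exception):
    """Raised when the user must be given an error message."""


def consume(document, dialects, read_metadata, fall_back, warn):
    """dialects maps a dialect name to a validator returning True or False."""
    # Try each implemented dialect; if any accepts the document, we are done.
    for name, is_valid in dialects.items():
        if is_valid(document):
            return name, document
    # Otherwise obtain the dialect metadata.
    metadata = read_metadata(document)
    if metadata is None:
        raise FallbackError("no usable dialect metadata")
    # Transform back toward an implemented dialect, reporting the impact.
    rewritten, impacts = fall_back(document, metadata)
    for impact in impacts:
        warn(impact)
    for name, is_valid in dialects.items():
        if is_valid(rewritten):
            return name, rewritten
    raise FallbackError("fallback did not reach an implemented dialect")
```

A caller would supply its own schema validators and metadata loader; the sketch only fixes the order of the steps and the points at which the user has to be told something.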

4.3. Dialect Metadata

Whole Dialect includes the grammar for the dialect, and a set of zero or more fallbacks to other dialects.

Extension includes a partial grammar to match against and additional branches to add to it. Also, zero or more fallbacks which transform from the modified grammar back to the original.

Names any element/attribute URIs which are not to be followed.

Names any additional URIs which are to be followed.

Fallback procedures are named by URIs, which are to be followed unless also given in dialect metadata.

Namespace documents, fallback data, etc, are all in RDF/XML. (Alternatively, in some RIF dialect.)

@@ Can we add content hashes, somehow, later?

4.4. Fallback Functionality

xml tag/attr substitution [ with this impact ]
omit this subtree [ with this impact ]
replace this subtree with its content [ with this impact ]
replace this subtree with this value [ with this impact ]

Additional fallback mechanisms may be specified later; esp XSLT and BLDX. This is done by simply saying that if you don't understand some of the Extension data, you ignore it. (@@ Is that okay? Elsewhere, we seem to want schema validity.)

Impact information is as in previous strawman.

5. Example Extensions

See RIF Dialect Structure, and other versions of that....

5.1. Add Neg (classical negation) to BLD

5.2. Add SMNAF (Stable Model Negation-As-Failure) to BLD

5.3. Add WFNAF (Well-Founded Negation-As-Failure) to BLD

5.4. Add Lists to BLD

http://www.w3.org/2005/rules/wg/wiki/Core/List_Constructor

5.5. Add Object-Oriented Non-Monotonic Inheritance to BLD

email ,

6. References

David Orchard has edited a series of drafts on "versioning" for the TAG and XML Schema Working Group: