Arch/Extensibility2 - W3C RIF-WG Wiki

This is proposed text for the Extensibility section of Arch

It currently has some new sections and some old sections that need to be smooooshed together.

The design of RIF is intended to provide general extensibility, including support for both backward and forward compatibility. Software which is backward compatible can handle old formats, and RIF is designed to make this easy to do. Software which is forward compatible can, to some extent, handle future formats, defined after the software is written. Together, they make it practical for an interchange format to change and grow, to better serve the evolving needs of users, even when it is already widely deployed. In the case of RIF, the same design also allows language dialects which are not directly compatible to co-exist and to partially interoperate.

The RIF model of extensibility is based on the idea of having multiple RIF dialects, each with its own XML syntax. Whenever new features are needed, a new dialect can be published which supports them. Dialects may overlap in syntax, and when they do, they are required to have identical semantics for documents in the area of overlap. To the extent that new dialects reuse the syntax of previous dialects, maximizing the overlap, backward compatibility is provided. A Web-based mechanism is provided by RIF to help coordinate overlap.

RIF supports forward compatibility by allowing each dialect to have a set of on-line "fallback" procedures, along with associated "impact" information. When RIF software encounters a document in an unknown syntax, it can attempt to retrieve this information and use it to transform the document into a dialect it supports. The effects of the transformation (which may change the meaning of the document) are described in the impact information, allowing users and higher-level software to be properly informed about the change. This forward compatibility design allows features to become used by a segment of the overall RIF user base while minimizing impact on the other users.

The on-line elements of the extensibility mechanisms, mentioned above and detailed below, rely on XML namespaces and use the Web as a decentralized information store. This approach leverages Web Architecture and Semantic Web concepts to provide high functionality at a low implementation cost and with no need for a central registration authority.

Linked Dialects (NEWER)

In order to coordinate between dialects (to manage overlap) and to allow implementations to download fallback information on-demand, we define "linked dialects" as dialects which are coordinated with Web content as defined in this section.

Not all RIF dialects need to be linked, but unlinked dialects do not support RIF's extensibility features. It is expected that dialects in wide use will be linked, while dialects in early development and those used privately by individuals and small organizations will typically be unlinked.

Each linked dialect must have a single IRI, used to refer to the dialect. Dereference of that IRI, with the content type "application/rif+xml", MUST return a RIF document in [@@some dialect] which includes [@@some information]. The dereference MAY include HTTP redirection steps.

Every IRI used in the dialect's abstract syntax (as identifiers for syntactic classes and properties) must similarly return RIF content. The RIF content, and the IRIs (after fragment truncation) MAY be the same for all syntactic elements and for one or more dialects.

The content for the syntactic elements must include a list of all the linked dialects which use the syntactic element. The person or organization responsible for maintaining that content MUST, on a request in good faith, include the reference IRI of any dialect which claims to use that syntactic element.

Syntactic Overlap (NEWER)

The key principle to providing scalable backward compatibility is that the semantics of a given linguistic expression must not change between dialects. As long as this principle is followed, implementations can process input documents without any concern about what dialect they might be written in. Systems can simply try to parse each document as being written in one or more of the dialects they do implement, and if it does parse, then it can be correctly treated as if it had been written with that dialect in mind. The meaning is, by definition, the same.

This principle of non-conflict between dialects where there is syntactic overlap has been practiced for many years in data formats and computer languages as they have evolved. It is relatively obvious and easy to manage if the dialects are all developed by the same organization, or if the dialects have very little overlap. (Most XML-based languages, if they use namespaces, can be seen as non-conflicting. They only share syntactic elements in the "xml" pseudo namespace, and the semantics of those parts of the syntax are managed by W3C.)

Non-conflict is ensured like this: every dialect definition MUST include normative references to every prior dialect which uses the same syntactic elements. It must state, prominently, that the semantics are the same in the areas of syntactic overlap. By doing this, any contradiction between the semantics of the dialect stated elsewhere in its definition and prior dialects becomes an internal inconsistency in the specification. This is more readily apparent, and less subject to debate, than possible inconsistency with other specifications. Of course, internal inconsistency may be debated, and internal inconsistencies may be discovered years after a dialect is deployed. This is an unfortunate fact of life. [ heh ]

More practically, each dialect SHOULD include an extensive test suite. An implementation SHOULD be tested against all the tests provided with all linked dialects reachable from the implemented dialects. The vocabulary for linking to tests and describing types of tests is specified in our test suite document [???].

Dialect Identification and Overlap (OLDER)

Conceptually, each RIF document is written in some RIF dialect. If it is syntactically valid, it conforms to the syntax of that dialect. To the extent the document conveys knowledge, it does so according to the semantics of that dialect.

Because there may be overlaps between dialects, however, the intended dialect for a document may not be apparent. For example, consider two dialects D1 and D2. Each includes all the Core components; D1 also includes components C1 and C2, while D2 includes components C1 and C3. On receipt of a document which uses the Core components and C1, a system cannot determine whether the dialect in use is D1, D2, or some other dialect which includes C1.

This apparent dilemma is solved by constraining dialects to have identical semantics in areas of overlap. To the extent the semantics of a dialect are simply the aggregate semantics of its components, this is automatic. When specification of the dialect semantics requires additional work, managing areas of overlap may also require additional work. It is the responsibility of the publisher of a component (who owns the IRI namespace of the component's syntactic elements) to coordinate these areas of overlap by at least maintaining an ordered list of normative dialect specifications. (In the case of W3C standard RIF extensions and dialects, this responsibility rests with W3C.)

Because of this overlap rule, systems are free to interpret a document using any known dialect which contains the components present in the document. In the example above, a receiving system would be equally conformant in using a subsystem which implements either D1 or D2, and it should always get the same results.
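The D1/D2 example can be sketched as a subset test over component sets. This is only an illustration: the dictionary layout and the function name `candidate_dialects` are assumptions made for this sketch, not part of RIF.

```python
# Hypothetical component sets for the example dialects D1 and D2.
CORE = frozenset({"Core"})
DIALECTS = {
    "D1": CORE | {"C1", "C2"},
    "D2": CORE | {"C1", "C3"},
}


def candidate_dialects(used_components, dialects=DIALECTS):
    """Return the sorted names of dialects whose components cover the document."""
    used = frozenset(used_components)
    return sorted(name for name, comps in dialects.items() if used <= comps)


# A document using Core and C1 is covered by both dialects, so either
# implementation may be used, and both must give the same results.
print(candidate_dialects({"Core", "C1"}))        # ['D1', 'D2']
print(candidate_dialects({"Core", "C1", "C3"}))  # ['D2']
print(candidate_dialects({"Core", "C4"}))        # []
```

An empty result corresponds to a document using a component that no known dialect includes, which is the case the fallback mechanism is meant to handle.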

This somewhat non-traditional design is expected to allow gradual evolution, graceful fallback, and third-party extensions better than simply identifying dialects, because the implementation attention can be given to the most essential components.

Identifying Components (OLDER)

RIF components each include one or more new elements added to the syntax. These elements must each, in the abstract syntax, be a class, property, datatype, or evaluable function. As such, each one is identified by an IRI. Each of these elements is considered as belonging to exactly one component.

The set of components in use in an XML RIF document can be determined by traversing the XML tree, forming an IRI out of the tag for each element (following the pattern specified in the abstract-syntax-to-XML mapping), and gathering the resulting IRIs. These IRIs can then be used (as below) to determine which component owns each one, resulting in a set of in-use components.
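A minimal sketch of the traversal, using Python's standard XML library. The real tag-to-IRI pattern is the one given in the abstract-syntax-to-XML mapping; this sketch simply assumes the IRI is the namespace name concatenated with the local name. The example.org namespaces and element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def element_iris(xml_text):
    """Collect one IRI for each distinct namespaced element tag in the document."""
    iris = set()
    for elem in ET.fromstring(xml_text).iter():
        tag = elem.tag
        # ElementTree writes namespaced tags as "{namespace}local".
        if isinstance(tag, str) and tag.startswith("{"):
            namespace, local = tag[1:].split("}", 1)
            # ASSUMPTION: IRI = namespace name + local name.
            iris.add(namespace + local)
        # Elements without a namespace yield no IRI in this sketch.
    return iris


# Hypothetical document mixing two namespaces.
SAMPLE = """<Ruleset xmlns="http://example.org/rif/core#"
         xmlns:x="http://example.org/rif/ext#">
  <Rule><x:Unless/></Rule>
</Ruleset>"""

for iri in sorted(element_iris(SAMPLE)):
    print(iri)
```

Each gathered IRI would then be looked up to find the component that owns it.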

Locating Component Declarations (OLDER)

Each component used in a document MUST either be declared in the document or have a declaration available on-line. The declarations can be extracted without in-depth parsing of the RIF document, and can be used to construct an XML schema against which to validate the RIF document.

If an internal declaration is used, it must occur prior to any use of that component in the document.

If no internal declaration for a component has been read by the point in the document where an element belonging to that component is encountered, consuming systems SHOULD attempt to obtain an external (on-line) declaration by performing a web dereference operation on the syntactic element IRI. Any redirects should be followed, including "303 See Other" redirects, and the most-preferred media type of the request MUST be "application/rif+xml". If content of that media type is returned, it SHOULD be parsed as RIF which includes declarations of that component. Other contents of the document MAY be ignored and MUST NOT be directly merged with the ruleset. The web retrieval operations SHOULD use web caching.

If the consuming system is unable to obtain a valid declaration by this method, the syntactic element is not part of a known component, and the RIF document is not valid.
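The dereference step might look roughly like the following, using Python's standard `urllib`. The function names, the `q=0.1` weight for other media types, and the example IRI are assumptions of this sketch. `urllib.request.urlopen` follows HTTP redirects, including 303, by default. Caching is not shown.

```python
import urllib.request
from urllib.parse import urldefrag

RIF_MEDIA_TYPE = "application/rif+xml"


def build_declaration_request(element_iri):
    """Build the GET request for the document expected to declare element_iri."""
    # The fragment identifies the element within the document and is not
    # sent to the server.
    url, _fragment = urldefrag(element_iri)
    # RIF is the most-preferred media type; anything else ranks lower.
    accept = f"{RIF_MEDIA_TYPE}, */*;q=0.1"
    return urllib.request.Request(url, headers={"Accept": accept})


def fetch_declaration(element_iri, opener=urllib.request.urlopen):
    """Return the response body if it is served as RIF, otherwise None."""
    with opener(build_declaration_request(element_iri)) as response:
        if response.headers.get_content_type() == RIF_MEDIA_TYPE:
            return response.read()
    return None


# Hypothetical syntactic element IRI; no network access happens here.
request = build_declaration_request("http://example.org/rif/ext#Unless")
print(request.full_url)
print(request.get_header("Accept"))
```

A `None` result from `fetch_declaration` corresponds to the case where no valid declaration can be obtained, so the document is not valid.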

Declaring Components (OLDER)

The abstract syntax for declaring each type of component is as follows:

@@@ this section needs more work...

class Component
   property syntacticElement: SyntacticElement*

class SyntacticElement
   property fallback: Fallback*

   subclass SyntacticClass
      property subClassOf: SyntacticClass

   subclass SyntacticProperty
      property use: SyntacticPropertyUse*

   subclass Datatype

   subclass EvaluableFunction

class SyntacticPropertyUse
   property onClass: SyntacticClass
   property allValuesFrom:  @@@ # SyntacticClass or Datatype
   property minCardinality: int?
   property maxCardinality: int?
   # property ordered: boolean

class Fallback
   property impactType: ImpactType
   property impactSeverity: ImpactSeverity
   property replacementContent: @@@  # instance or property/value pair?

Maybe-Obsolete Note: when a syntactic-class is declared, all its direct superclasses must be stated, but not its properties. This allows subclasses and properties to be introduced later, as more components.

Maybe-Obsolete Note: when a syntactic-property is declared, all its uses must be declared. It is not possible to have a component which consists of using an existing property in a new way. This restriction allows us to maintain the one-component / one-IRI / one-XML-element-tag parallel structure, which allows easy scanning for components, as well as external declarations.

Maybe: a new class may introduce a new use for an existing property. As long as the syntactic element is present, that forces one to find the declarations for the associated component.

Extension Handling (NEWER)

Extension handling is optional in two ways:

- If a dialect is unlinked, documents which use it cannot be handled gracefully by systems which do not support the dialect. Such implementations MUST give a fatal error message indicating that such a document is written in a RIF dialect which is not fully supported by its publisher.
- If the RIF input handler does not fully implement RIF extension handling, it MUST, upon receipt of a message not in the syntax of any dialect it implements, issue a fatal error message indicating that the system cannot process the document because the system does not fully implement RIF's extension handling and that some extension handling is necessary to process this document.

There is a default "trim" procedure, which cuts until the document is syntactically valid.

Input Processing (OLDER)

To be a valid RIF document, the document must have all components declared (internally or externally) and the component declarations must provide some path of fallback substitutions which results in a RIF Core document.

There may, however, be multiple paths from the document to RIF Core, based on which substitutions are performed and in which order. This flexibility is needed to allow for falling back to other components.

Each RIF system MUST perform fallback substitutions until it reaches a set of components it implements. It SHOULD perform substitutions which have the least impact for the type of software it is.
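One way to picture the search for a least-impact path is a shortest-path search over sets of components. The sketch below rests on several assumptions that are not part of this text: a document is reduced to the set of components it uses, each fallback replaces one component with a (possibly empty) set of others, and each fallback carries a single numeric cost standing in for its impact. All names are hypothetical.

```python
import heapq
from itertools import count


def least_impact_fallback(used, implemented, fallbacks):
    """Find the cheapest substitution sequence that reaches implemented components.

    used:        components present in the document
    implemented: components this system supports
    fallbacks:   {component: [(replacement_components, cost), ...]}
    Returns (total_cost, steps), or None if no fallback path exists.
    """
    implemented = frozenset(implemented)
    tie = count()  # unique tie-breaker so the heap never compares sets
    queue = [(0, next(tie), frozenset(used), [])]
    seen = set()
    while queue:
        cost, _, state, steps = heapq.heappop(queue)
        if state <= implemented:
            return cost, steps
        if state in seen:
            continue
        seen.add(state)
        for comp in sorted(state - implemented):
            for replacement, step_cost in fallbacks.get(comp, []):
                nxt = (state - {comp}) | frozenset(replacement)
                step = (comp, tuple(sorted(replacement)))
                heapq.heappush(
                    queue, (cost + step_cost, next(tie), nxt, steps + [step])
                )
    return None


# Hypothetical components: X can fall back to Y cheaply, or be dropped at
# higher cost; Y can be dropped cheaply.
FALLBACKS = {"X": [({"Y"}, 1), (set(), 5)], "Y": [(set(), 1)]}

print(least_impact_fallback({"Core", "X"}, {"Core"}, FALLBACKS))
print(least_impact_fallback({"Core", "X"}, {"Core", "Y"}, FALLBACKS))
```

Adding costs along a path is a simplification. A real system would weigh impact types and severities according to the kind of software it is.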

Handling Impact

Each fallback substitution has zero or more two-part impact flags. Each flag consists of a type and a severity, indicating what kind of effect performing that substitution will have. Based on the type and severity, different types of RIF-consuming systems can behave differently.

Repeated occurrences of the same impact flag SHOULD NOT be treated as having greater impact than a single occurrence.

soundness : performing this substitution will make the resulting ruleset produce incorrect answers and/or behaviors.
completeness : performing this substitution will make the resulting ruleset produce fewer distinct answers and/or behaviors than it otherwise would. If the results would have been complete before, they no longer will be.
performance : performing this substitution may cause rule processing systems to handle this ruleset with significantly degraded performance.
presentation : this substitution only affects aspects of the ruleset intended for human readers.

Under certain conditions, impact flags may be interrelated. For instance, if a negation-as-failure component is being used, a completeness impact flag being set should automatically raise the soundness impact flag.
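The negation-as-failure case could be sketched as follows. The representation of flags as (type, severity) pairs and the choice to copy the severity of the completeness flag onto the raised soundness flag are assumptions of this sketch. The text above does not say which severity the raised flag should carry.

```python
def effective_flags(flags, uses_negation_as_failure):
    """Return the flags with any soundness flags implied by completeness added.

    flags: iterable of (impact_type, severity) pairs.
    """
    result = set(flags)
    if uses_negation_as_failure:
        for impact_type, severity in list(result):
            if impact_type == "completeness":
                # ASSUMPTION: the raised flag inherits the same severity.
                result.add(("soundness", severity))
    return result


flags = {("completeness", "minor")}
print(effective_flags(flags, uses_negation_as_failure=False))
print(sorted(effective_flags(flags, uses_negation_as_failure=True)))
```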

Systems MUST NOT silently perform any fallback substitution which has even a slight chance of producing incorrect answers or behavior. Instead, software SHOULD inform users of fallback substitutions which have minimal effects and SHOULD require confirmation from users before performing fallback substitutions which may have greater effects. RIF software MUST NOT lead a reasonable user to think that errors stemming from fallback substitutions are due to faulty input.

Suggested Responses

This table indicates suggested handling of impact information by a system which answers queries for users, using reasoning on a rulebase provided in RIF.

	Soundness	Completeness	Performance	Presentation
Major	Serious Error	Warning	Warning	Ignore
Intermediate	Serious Error	Warning	Warning	Ignore
Minor	Warning	Warning	Ignore	Ignore
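The table above can be encoded as a simple lookup. The lowercase keys, the function names, and the rule of taking the most serious response across several flags are assumptions of this sketch. Taking the maximum also means that repeats of the same flag do not escalate the response.

```python
# Suggested responses, keyed by severity and then by impact type.
RESPONSES = {
    "major": {
        "soundness": "serious error", "completeness": "warning",
        "performance": "warning", "presentation": "ignore",
    },
    "intermediate": {
        "soundness": "serious error", "completeness": "warning",
        "performance": "warning", "presentation": "ignore",
    },
    "minor": {
        "soundness": "warning", "completeness": "warning",
        "performance": "ignore", "presentation": "ignore",
    },
}

# Responses ordered from least to most serious.
RESPONSE_ORDER = ["ignore", "warning", "serious error"]


def suggested_response(impact_type, severity):
    """Look up the suggested response for a single impact flag."""
    return RESPONSES[severity][impact_type]


def overall_response(flags):
    """Return the most serious response over (impact_type, severity) flags."""
    responses = {suggested_response(t, s) for t, s in flags}
    return max(responses, key=RESPONSE_ORDER.index, default="ignore")


print(suggested_response("soundness", "minor"))  # warning
print(overall_response([("performance", "major"), ("soundness", "major")]))  # serious error
```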