Rule Language Standardization: Report from the W3C Workshop on Rule Languages for Interoperability

Summary:

In April 2005, the W3C held a two-day workshop to gather data and explore options for establishing a standard web-based language for expressing rules. Over eighty representatives from various vendors, user communities, and research groups attended and reported on their views, experience, and ideas.

More than a dozen use cases were presented for rule language standardization, and about a half-dozen candidate technologies were presented and discussed. The workshop confirmed the differences among types of rules, such as "if condition then action" rules and "if condition then condition" ones. It also reviewed some of the difficulties in uptake that rules technologies have had in the past, but there was a general sense of new opportunity.

The workshop gave many indications that a W3C Recommendation here would be useful, but it was less clear what sort of standard would satisfy a sufficient base of users. In any Activity Proposal following from this workshop, a Working Group should be given a clear and narrow scope, making it easy to determine its relevance to various parts of the greater rules community.

1. Introduction

On 27 and 28 April, 2005, the W3C held a workshop in Washington, DC, to gather information about standards work related to rule languages. The Call for Participation was released on 15 February. It covered the background and goals of the workshop, and explained that attendance was limited and position papers were required. (Multiple people associated with a position paper were allowed to attend, space permitting.)

After review, the program committee accepted 71 papers and selected a subset of them for presentation, as listed in the program. Eighty-two people attended, representing a wide range of interests. W3C is grateful to ILOG for hosting the workshop and DARPA for additional financial assistance.

2. The Program

The workshop was organized into nine sessions. The first two sessions were devoted to introducing the various communities to each other and to collecting general requirements. The next two sessions were concerned with the presentation and discussion of existing or developing technologies, either candidate technologies for a standard rule language for the Web, or rule standards developed for different purposes in other organizations. Most of the second day was devoted to use cases, with the exception of an additional session on candidate technologies and a panel on next steps which concluded the workshop.

Each session (except the closing panel) consisted of three to six presentations of 15 or 20 minutes, based on selected position papers, followed by about half an hour of discussion.

2.1. Introductory Sessions

The first session introduced everyone to some of the range of backgrounds and goals in the room. There were presentations from three people coming from three different backgrounds: business rules, logic programming, and the Semantic Web. It is tempting to divide the workshop participants into these three camps, or along some other lines, but it would be a mistake: behind the presentations and people's immediate concerns, there were often much deeper similarities than differences. It often seemed as if each participant had a deep interest in end users and domain applicability, in understanding and improving the mechanics of rule systems, and in making open distributed information environments. More than anything else, everybody seemed eager to learn about each other's use of rules and their requirements. The different areas of professional emphasis therefore suggest compatibility and synergy, rather than an underlying conflict.

The second session had two presentations proposing scopes for a standard, and one on the W3C approach to standardization. Both scope/requirements presentations suggested that no single rule language would cover all the requirements but that there could be a common core to a family of languages. They came from different perspectives and the workshop reactions were mixed. Much of the discussion was focused on the 95% issue.

2.2. Candidate Technologies

Seven candidate technologies were presented in two sessions: WSML, RuleML, SWSL, N3, SWRL, Common Logic, TRIPLE. These are primarily academic efforts; all but RuleML are concerned explicitly with knowledge representation, mainly or only for the Semantic Web (except CL); none deals directly with production rules (if condition then action).

The discussion revolved largely around formal issues and semantic features, especially with respect to handling defaults, Negation-As-Failure (NAF), and the Closed World Assumption (see issues). What are the requirements, what constitutes a candidate technology for standardization, what kind of specification is needed for a rule language? Perhaps a model theory; perhaps something less formal, with test cases. Recent W3C Recommendations in related fields (RDF and OWL) do both.

The RuleML presentation claimed a slightly different ground, focusing more on the exchange format and interoperability. It also contained a proposal for the scope of the initial nine months of a Working Group (LP expressiveness including Datalog Horn + NAF + logical functions and some additional features). The other main line of discussion was thus about what is feasible in a short time and what the scope of the standard should be. Some of the participants argued for starting with a simple set of features instead of a very rich and complex language (the 80/20 argument, see issues). However, the audience warned that a simple language (e.g. Datalog + NAF) must be designed with extensibility in mind (learning from the MathML experience and others).

A point was raised and came up again several times about testing with business cases. In general, the candidate technologies have not been tested on commercial rule bases. The common use case in the business rules community (EU-Rent) was proposed, as well as a collection of production rules used as expressiveness tests in PRR.

2.3. Related Standards

Three standards (at different stages of development) were presented: the Production Rule Representation (PRR) meta-model under development at OMG, a standard Java API for rule engines (JSR 94), and the Semantics of Business Vocabulary and Business Rules meta-model (SBVR, aka the "semantic beaver"), which was developed as an answer to OMG's Business Semantics of Business Rules RFP. They were or are being developed by groups consisting mostly of commercial organizations, and they are more concerned with business applications than with formal knowledge representation.

The current work on PRR is limited to forward-chaining and sequential rule processing, most typically found in business rules environments/engines (ECA rules can be considered a subclass). It is meant to define behaviors, not for generic knowledge representation. The motivation is to make production rules a first-class citizen in UML models, e.g. enabling rule modeling in tools such as Rational Rose. The focus is on modeling. A compatible standard rule language — possibly a concrete syntax for the meta-model — is required for run-time rule exchange.

The lightweight JSR 94 API does not specify the behavior of the engine: it relies on the underlying rule language, which is explicitly out of scope of the standard, to determine unambiguously the result of executing a rule set. The message is that JSR 94 really needs a standard rule language.

SBVR is for business modeling by business users, in their own terms, independent of implicit or explicit information technology (IT) consideration or design decision. It provides structured English for business rules from which the meaning can be extracted as formal logic.

Although not presented at the workshop, the SPARQL query language for RDF, a work in progress at W3C, is obviously related.

2.4. Use Cases

The three use case sessions drew a lot of interest, positive feedback, and many questions. Real world scenarios helped illustrate the need for rules and ontologies (including anatomical knowledge to label brain parts, situation awareness using OWL and Rules, and others such as RDF in the automotive industry, in access control, and rules for geospatial applications). The regulatory compliance and mortgage scenario talks went even further to demonstrate a clear need for rules interoperability.

Almost all the applications presented in the use case sessions were using existing standards (RDF and OWL) and/or proposed languages (SWRL). This was encouraging and will help validate the direction of this effort. However, most of them have additional requirements or comments about the usability of these standards/languages for their purpose and in the context of their applications. That raised some fundamental questions regarding the adequacy of the languages/standards for the purpose of developing certain kinds of rule-based applications. This was another sign of the demand for a rule language.

In the first use case session, we learned about applications using rules and ontologies, and discussed the level of expressiveness that is needed in such cases. The use case about content rating of large web sites was a very good illustration of rule-based applications affecting all of us. HP's presentation on Jena showed a promising set of features that seemed important to many participants.

The second session showed the need for functionality beyond what most vendors offer (fuzzy logic was cited a few times). In this session, we also learned to take into consideration the need for non-declarative or procedural functionality (see issue description). We also learned that rules are actively used in the automotive industry, and that they are a key component in compliance and regulatory systems where interoperability is a must-have. IBM's use case showed the need to deal with the 80/20 rule (highly expressive languages are not always needed). The mortgage industry is one of the most important industry verticals using rule-based technology.

The final use case session provided both a Business Rules and a Semantic Web perspective. Oracle presented their work on rules technology, defining it as a strategic component to lower project risks and cut costs. Other use cases confirmed the need for expressive rules for privacy systems, and the integration of rules and ontologies (again) in government applications.

More abstract use cases, like the general Semantic Web use case where rules and data are made available on the internal or public web and then re-used without additional overhead, were discussed briefly in various sessions, including a special unofficial evening session on the Semantic Web.

3. Issues

Throughout the workshop, a variety of recurrent topics appeared. Most of them hinge on the relative importance of features and use cases; what one person views as a core requirement, another may see as no more than a desirable feature. Some of the hard decisions raised by these issues will need to be addressed in the chartering process, some can be addressed later, and some may give way to technical solutions or an evolving understanding of the use cases and technologies.

3.1. Negation-As-Failure (NAF), Defaults, and the Closed World Assumption (CWA)

This cluster of issues, mostly called "naf" at the workshop, appears to result from a broad uncertainty about how rules on the Web are different from what people are used to. Many features of the Web (including search engines) report failure for inscrutable and unpredictable reasons. Contrast this with most traditional database systems or rule-based business applications, where the results of a query are expected to be complete. In those systems, if a fact is not returned, then we can safely assume it is not true. If a book is not listed in the inventory database, then we conclude we don't have it. On the Web, however, if a book isn't found by a search engine, it might just mean the search engine failed to crawl the appropriate part of the Web.

Rule systems often provide for negation, defaults, or rule priorities, founded on this "closed world" assumption of complete information. In the developing future, a rule like "if book1 is not in stock then recommend book2" may have to be parametrized by exactly what mechanism and what document or knowledge base scope is used to find book1 in the stock listings. The term Scoped Negation As Failure (SNAF) was proposed to indicate NAF where the scope of the search failure is well defined.
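
As a rough illustration of scoped negation as failure, the Python sketch below makes the search scope an explicit argument of the rule. The function names and the inventory sets are hypothetical and only mirror the book example above; no particular rule engine or proposed syntax is implied.

```python
from typing import Sequence, Set


def not_found_in(item: str, scope: Sequence[Set[str]]) -> bool:
    """Scoped NAF: the negation holds only relative to the listed sources."""
    return all(item not in source for source in scope)


def recommend(wanted: str, fallback: str, scope: Sequence[Set[str]]) -> str:
    """If `wanted` is not in stock within `scope`, recommend `fallback`."""
    return fallback if not_found_in(wanted, scope) else wanted


# Hypothetical stock listings, each assumed complete only for its own warehouse.
warehouse_a = {"book1", "book3"}
warehouse_b = {"book2"}

# Same rule, two different scopes.
narrow = recommend("book1", "book2", scope=[warehouse_b])
wide = recommend("book1", "book2", scope=[warehouse_a, warehouse_b])
```

The same rule gives a different recommendation depending on which sources are consulted, which is why the scope would need to travel with the rule rather than be left implicit.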

3.2. Knowledge Representation (KR) vs. Applications Programming

Another set of issues is raised by the use of rules for specifying and controlling behavior as opposed to representing knowledge.

The commercial rule vendors and some business rules users are primarily interested in forward-chaining if-condition-then-action rules (production rules). Formal logic is much happier with if-condition-then-condition. The notion was expressed that these may be duals, of a sort, with the help of assert/retract/modify actions, or procedural attachments in the condition. How will this play with users and with implementers?
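
One way to picture the suggested duality is to write a condition-then-condition rule as a production rule whose only action is to assert the conclusion. The sketch below is a deliberately naive forward chainer in Python; the fact encoding, the `ProductionRule` type, and the example predicates are all assumptions made for illustration, not a description of any vendor's engine.

```python
from typing import Callable, List, NamedTuple, Set, Tuple

Fact = Tuple[str, str]  # (predicate, subject), e.g. ("gold", "alice")


class ProductionRule(NamedTuple):
    condition: Callable[[Set[Fact]], List[str]]  # subjects that match
    action: Callable[[Set[Fact], str], None]     # arbitrary side effect


def run(rules: List[ProductionRule], facts: Set[Fact]) -> Set[Fact]:
    """Naive forward chaining: fire every rule until the fact set stops changing."""
    while True:
        before = set(facts)
        for rule in rules:
            # The condition returns a finished list, so the action may
            # safely modify `facts` while we iterate over the matches.
            for subject in rule.condition(facts):
                rule.action(facts, subject)
        if facts == before:
            return facts


# Logic reading:      if big_spender(x) then gold(x).
# Production reading: if big_spender(x) then assert gold(x).
gold_rule = ProductionRule(
    condition=lambda facts: sorted(s for (p, s) in facts if p == "big_spender"),
    action=lambda facts, s: facts.add(("gold", s)),
)

result = run([gold_rule], {("big_spender", "alice"), ("customer", "bob")})
```

An action that only asserts facts keeps the rule readable as logic; once actions retract facts or call out to external code, that reading is no longer automatic, which is where the questions above begin.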

One of the issues was the expressiveness of the action part (the programming capabilities). Most commercial rule vendors are very flexible with respect to what goes in the action part of the rules. No one suggested that a standard rule language should include the full power of a programming language, but that requirement must be addressed one way or another.

Another issue was declarativeness: how can one guarantee that the rules will produce the same behavior in different execution environments when the result depends on the engine's rule ordering and conflict resolution strategy? Can users be expected to explicitly declare the complete execution context and expected behavior of their rules, without relying on the idiosyncrasies of the underlying engine? Can implementers be expected to translate the engine's strategy into additional context in the rules? Or, at the other end of the spectrum, should the standard (or subsets of the standard, modules, etc.) attempt to specify the behavior of engines, so that the same results are guaranteed by the same execution in different environments? Should it provide tagging for different control strategies or classes of engines?
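
To make the portability concern concrete, the following Python sketch fires the same two rules under two conflict resolution strategies and gets two different results. The rule names, priorities, and the two strategies are invented for illustration; real engines use more elaborate strategies.

```python
from typing import Callable, List, NamedTuple


class Rule(NamedTuple):
    name: str
    priority: int
    action: Callable[[int], int]  # takes and returns a price in whole cents


def fire_all(rules: List[Rule], price: int) -> int:
    """Fire each rule once, in the order given."""
    for rule in rules:
        price = rule.action(price)
    return price


def textual_order(rules: List[Rule]) -> List[Rule]:
    """Strategy 1: fire rules in the order they were written."""
    return list(rules)


def priority_order(rules: List[Rule]) -> List[Rule]:
    """Strategy 2: fire rules with the highest priority first."""
    return sorted(rules, key=lambda r: r.priority, reverse=True)


rules = [
    Rule("half_price", priority=1, action=lambda p: p // 2),
    Rule("ten_off", priority=2, action=lambda p: p - 10),
]

by_text = fire_all(textual_order(rules), 100)       # (100 // 2) - 10
by_priority = fire_all(priority_order(rules), 100)  # (100 - 10) // 2
```

Unless the intended strategy is stated in the rule set or fixed by the standard, both outcomes are defensible, and an exchanged rule set may silently change meaning.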

3.3. Relationship to Description Logics (OWL)

A new Semantic Web stack was proposed with 'Rules' sitting next to OWL, on top of RDF. In some versions, the two boxes overlapped. Is this appropriate? How would a rule language relate to OWL? Mix-and-match RDF triples, or as in SWRL? Users want a language where they can represent both rules and ontologies. This topic came up in nearly every session.

3.4. Uncertainty Reasoning and Fuzzy Logic

Several use case presentations (situation awareness, DoD applications, Telecom applications, geospatial scenarios) described an interest in uncertainty reasoning and fuzzy logic. It was asserted that engines which support these can work with more traditional data and rules quite effectively, as well as with rules written to use these features. (This assertion was greeted with some skepticism, however.) One workshop paper directly addressed this topic.

No one suggested that these features were necessary, but rather that they could be quite useful and fairly easy to add.

3.5. Tagged Co-Existing Rule Languages

A few participants proposed that instead of recommending one language for expressing rules, W3C could recommend a way in which rules could be packaged and tagged with identifiers for their syntax and intended semantics. It was unclear whether this approach, proposed mostly as an interim measure, would provide significant benefit, and how the packaging format would be different from XML.
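
The packaging idea can be sketched as a thin envelope that carries the rules unchanged, plus two tags that a receiver uses to decide whether it can process them. Everything in the Python sketch below is hypothetical: the `RulePackage` fields, the example.org identifiers, and the toy loader are assumptions, not part of any proposal made at the workshop.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class RulePackage:
    syntax: str     # identifier for the concrete syntax of `body`
    semantics: str  # identifier for the intended semantics
    body: str       # the rules themselves, opaque to the packaging layer


Loader = Callable[[str], List[str]]
LOADERS: Dict[Tuple[str, str], Loader] = {}


def register(syntax: str, semantics: str, loader: Loader) -> None:
    """Declare that this receiver understands one syntax/semantics pair."""
    LOADERS[(syntax, semantics)] = loader


def load(package: RulePackage) -> List[str]:
    """Dispatch on the tags; refuse packages whose tags are not understood."""
    key = (package.syntax, package.semantics)
    if key not in LOADERS:
        raise LookupError(f"no loader for syntax/semantics {key}")
    return LOADERS[key](package.body)


# Hypothetical identifiers and a toy loader that reads one rule per line.
LINES = "http://example.org/syntax/one-rule-per-line"
HORN = "http://example.org/semantics/horn"
register(LINES, HORN, lambda body: [ln.strip() for ln in body.splitlines() if ln.strip()])

package = RulePackage(LINES, HORN, "gold(X) :- big_spender(X).\n")
rules = load(package)
```

The envelope only tells a receiver whether it understands the contents; it does nothing to translate between languages, which is one reason the benefit of this approach was questioned.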

3.6. Syntax Options

People want rules in many different styles of syntax, largely driven by who (or what) they expect will be reading and writing rules.

There were some indications that consensus is possible around the use of XML as the sole or primary normative interchange syntax, perhaps based on an abstract syntax.

3.7. The 95% Solution

The terms "95%" and "80/20" were used in somewhat different ways at the workshop, but generally refer to the overall scope of a possible standard language.

While no one suggested the language should be more complex than necessary, the term "95%" first came up in arguing that a larger, more complete language actually makes the application programs and rule sets much simpler. It was suggested that perhaps Prolog failed in the marketplace because its standard built-ins and libraries were relatively meager. So while 95% of each application may be simple, the last 5% needs sophisticated language features.

In contrast, the term "95%" came up again in the claim that most users need a very simple language. Not only do they not need complexity, but anything complex (such as Prolog!) will drive them off.

In more concrete terms: to provide cross-vendor portability, a standard needs to cover at least the essential features present in the intersection of the vendor feature sets. Which features in that intersection are essential?

4. Conclusions

The most obvious conclusion from the workshop is that there was significant interest in establishing a standard language for expressing rules. Each use case presented clear and immediate requirements for this work, and the overall attendance was impressive. It was less obvious what the scope of a first Recommendation should be. People have been thinking about rules in many different forms for many years, with certain commonalities but also important differences. Unfortunately, there is not even consistent terminology for the differences, so the use cases could not be mapped directly onto them. Can we bridge the gaps and make a unifying core which can be cleanly extended to address nearly all needs? Or will we have to pick a manageable subset of use cases and set the others aside?

The concluding panel at the workshop made some observations about our situation and how to proceed from here:

From here, the W3C team should work with members and prospective members to find a Working Group scope which is broad enough to address a significant set of use cases yet narrow enough to keep the Working Group oriented on timely delivery of a practical Recommendation.

See the Workshop Website.