data type extensibility (ISSUE-34)

[ This completes ACTION-460, to propose a solution to ISSUE-34,
extensibility of datatype support. ]

In many languages, such as RDF and OWL, datatypes are an extensibility
point.  Data values are transmitted by being serialized according to
some datatype definition, and then that serialization (a character
string) is transmitted along with a URI which identifies the datatype
used.  In this way, the language definition is separated from the
definitions of any particular datatypes.  In practice, the language
reuses datatype definitions, by reference, from other standards.  The
vision is, I think, that new datatypes will be defined and used
without any need to change the language definition.
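
To make the separation concrete, here is a minimal sketch of the
"serialized string plus datatype URI" model.  The class name, the
registry, and the example.org URI are all made up for illustration;
nothing here is taken from the RIF, RDF, or OWL specs beyond the XML
Schema datatype URIs.

```python
from dataclasses import dataclass
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

@dataclass(frozen=True)
class TypedLiteral:
    lexical: str    # the serialized character string
    datatype: str   # the URI identifying the datatype used

# Hypothetical registry: datatype URI -> function that parses the lexical form.
PARSERS = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "string": str,
}

def value_of(lit):
    """Parse a literal if its datatype is implemented; otherwise hand it back as-is."""
    parser = PARSERS.get(lit.datatype)
    return parser(lit.lexical) if parser else lit
```

In this sketch, supporting a new datatype means adding one entry to
PARSERS; TypedLiteral, which stands in for the language definition,
does not change.  What to do with a literal that comes back unparsed
is exactly the question the options below try to answer.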

The use case is this: two parties want to exchange rules which use
data values not included in the standard set for RIF BLD.  Of course,
like any private parties they can make their own versions of a
language.  But (1) they want to use standard RIF software with only
minimal changes to support the new datatype, and (2) they want to be
able to include third-party consumers who will not have this datatype
implemented.

* OPTION 1: Unrestricted Datatypes

This is a "do nothing" solution.  RIF could specify which datatypes
are mandated and say you may implement more.  It doesn't say how to
respond when receiving a RIF document which contains unimplemented
datatypes.  Implementors can best decide for themselves whether to
ignore the rules, replace the data values with empty strings, give
warnings, whatever.

The danger here is in getting uncontrolled, unpredictable behavior.

* OPTION 2: Require Errors

RIF could specify exactly what you have to do: if any unimplemented
datatypes are present in a ruleset, you MUST NOT continue without
explicit confirmation from a higher level (like the user, saying how
to handle it).  You MAY simply abort with a fatal error.
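
As a rough sketch of what this requirement could look like in a
consumer (the function, the exception, and the confirm callback are
hypothetical names, not anything defined by RIF):

```python
class UnimplementedDatatypeError(Exception):
    """Raised when a ruleset uses datatypes this consumer does not implement."""

def check_ruleset(datatype_uris, implemented, confirm=None):
    """Return the set of unimplemented datatypes the higher level accepted.

    datatype_uris: URIs of all datatypes used in the ruleset
    implemented:   set of URIs this consumer supports
    confirm:       optional callback (assumption) standing in for the
                   "higher level"; it gets the unknown URIs and returns
                   True to allow processing to continue
    """
    unknown = set(datatype_uris) - set(implemented)
    if not unknown:
        return set()
    if confirm is not None and confirm(unknown):
        return unknown
    # No confirmation available, or it was refused: abort with a fatal error.
    raise UnimplementedDatatypeError(", ".join(sorted(unknown)))
```

With no confirm callback the consumer simply aborts, which is the MAY
case above.  A real confirmation would also have to say how the
unknown values should be handled, which this sketch leaves out.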

The danger here is that introducing a new datatype becomes prohibitively
difficult in public/open systems.  You have to know all the consumers
have implemented it before the producer can include it, but in
Web-like systems that's very difficult.   Imagine if web browsers
gave a fatal error on receiving a page which didn't exactly conform to
some frozen version of HTML.

* OPTION 3: Require Ignoring

RIF could specify exactly what you have to do: if any unimplemented
datatypes are present in a ruleset, you MUST ignore the ... ruleset?
rule?  I don't really know how to flesh this out, exactly.
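
Just to show the two granularities side by side, here is a small
sketch.  The representation of a rule as an (id, datatypes-used) pair
and the function name are assumptions made for the example.

```python
def filter_rules(rules, implemented, granularity="rule"):
    """Drop what cannot be handled, at the chosen granularity.

    rules:       list of (rule_id, set of datatype URIs used in that rule)
    implemented: set of datatype URIs this consumer supports
    granularity: "rule" ignores only the offending rules;
                 "ruleset" ignores everything if any rule offends
    """
    usable = [r for r in rules if r[1] <= implemented]
    if granularity == "rule":
        return usable
    if granularity == "ruleset":
        return usable if len(usable) == len(rules) else []
    raise ValueError("unknown granularity: " + granularity)
```

Neither choice is obviously right.  Ignoring a single rule may change
what the remaining rules derive without anyone noticing, while
ignoring the whole ruleset is close to Option 2 minus the error
message.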

* OPTION 4: XTAN Fallback

Use XML Transform-as-Needed (what I demonstrated in Paris at F2F9) to
identify places where a datatype is used which is not in BLD and not
implemented.  If one is found, use the fallback/impact mechanism to
rewrite it and/or determine the seriousness of the error.
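
The general shape of a fallback/impact lookup might be sketched like
this.  To be clear, this is not XTAN's actual syntax or processing
model; the table, the impact labels ("none", "minor", "fatal"), and
the example.org datatype are all invented for illustration.

```python
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical fallback table:
#   datatype URI -> (rewrite function, impact label)
# The rewrite takes the lexical form and returns a (lexical, datatype) pair
# that uses only datatypes the consumer is assumed to implement.
FALLBACKS = {
    "http://example.org/dt#mass": (lambda lex: (lex, XSD_STRING), "minor"),
}

def apply_fallback(lexical, datatype, implemented):
    """Return ((lexical, datatype) or None, impact label) for one data value."""
    if datatype in implemented:
        return (lexical, datatype), "none"
    if datatype in FALLBACKS:
        rewrite, impact = FALLBACKS[datatype]
        return rewrite(lexical), impact
    # No fallback declared: nothing to rewrite to, so treat it as serious.
    return None, "fatal"
```

Here a consumer without the made-up mass datatype would see "1.5 kg"
as a plain string and be told the impact is minor, and a datatype with
no declared fallback would be reported as fatal.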

If someone wants to actually pick a candidate datatype here, and the
WG is encouraging, I'll do an implementation.

* Some concerns/issues recorded at F2F7:

> whereas defining new dialects using an extensibility mechanism is
> (potentially) hard to do and requires a lot of work, extensibility
> of datatype support is easy; it would certainly be overkill to
> define a new dialect for each data type one additionally wants to
> support  

It's pretty easy, I think, to use XTAN to define a new datatype and
specify what the fallback procedures and impact will be.

It's a little hard to be sure without having some good, realistic
ideas about what new datatypes are likely to be needed.

> having extensibility of datatypes support does not change the syntax
> or semantics of the language 

Nor does XTAN (option 4).

> RDF and OWL users do not seem to have a problem with this kind of
> extensibility (which is present in both languages) 

I don't think datatype extensibility has been used successfully on the
open web in RDF or OWL (or anything else) yet.  I could ask around.
My sense is that the Semantic Web community is still small enough that
consumers and producers can talk to each other and coordinate which
datatypes will be used.  But of course that doesn't scale, so it can't
enable a real Web-based system.

       -- Sandro

Received on Tuesday, 29 April 2008 01:50:38 UTC