Re: Can't live with TDL and untidy literals

Patrick Stickler wrote:
> 
> On 2002-01-25 19:22, "ext Sergey Melnik" <melnik@db.stanford.edu> wrote:
> 
> > In other words, the literals would have to carry along the
> > properties they are used with and/or the schema class(es) used as the
> > range of such properties.
> >
> > That is, developers would have to make literals complex objects.
> 
> Ummm... well, you have three choices:
> 
> 1. Define type locally.
> 2. Define type globally, by some schema.
> 3. Define type globally, by application environment.
> 
> You seem to be taking the third choice, and then demanding
> that the interpretation imposed by the environment be
> supported by all logical queries on an RDF graph. I don't
> find that the least bit reasonable.
> 
> If you wish to reliably interchange knowledge between
> disparate applications and environments, then all knowledge
> must be explicitly defined, either globally or locally,
> and thus while choice 3 may work for a single, tightly
> controlled application environment, it cannot be the basis
> for global interchange of knowledge.
> 
> What you may assert some literal means for your application
> may not be known by some other application which attempts
> to interpret your knowledge (possibly unbeknownst to you)
> therefore typing external to RDF is inherently non-portable.
> 
> And, yes, local typing does produce complex objects. What else
> would you expect?

As someone on the periphery of this argument, I'd say Patrick's summary
here puts it all together for me! Literals are complex objects.
Sometimes their complexity can be hidden behind an implicit definition,
but it does not go away *just* because we declare them all to be strings
with the same equivalence rules.

> 
> > Example 2: Storage
> > ------------------
> >
> > Currently, the storage backends for RDF graphs can benefit substantially
> > from the fact that RDF graphs are tidy on literals.
> 
> But are current storage backends presently based on tidy literal graphs? 

Speaking as one application writer, I'd say there is no argument for
dogmatically tidying literals in this context. Serving up metadata in an
industrial context consists largely of sharpening the distinctions
between textually similar data as extra information is received.

A string passed down a news feed may (on agreement) be elevated to date
type and be queried against the equivalence classes implied by a date
type. At the lowest level, storage may be saved by keeping one copy of
the string and referring to it by a pointer or an id, BUT the query
engine draws no semantic implication from that sharing -- or at least it
shouldn't, unless supported by explicit data type information or by the
implicit design of the application. In this last case it is acknowledged
that the application is closed to further transfer of information.
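
To make the date case concrete, here is a minimal Python sketch. It is
only an illustration under my own assumptions: the names `Literal`,
`parse_date` and `same_value`, the datatype label "date" and the two
accepted date formats are all hypothetical and not taken from any RDF
API. It shows two textually different strings that become equivalent
only once both have been elevated to a date type by agreement.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Optional


@dataclass(frozen=True)
class Literal:
    lexical: str                    # the string as it arrived on the feed
    datatype: Optional[str] = None  # hypothetical label, e.g. "date"


def parse_date(text: str) -> date:
    # Assumed formats agreed between the parties: 2002-01-30 and 20020130.
    for fmt in ("%Y-%m-%d", "%Y%m%d"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"not a recognised date: {text!r}")


def same_value(a: Literal, b: Literal) -> bool:
    # Date equivalence applies only when both literals are typed as dates.
    if a.datatype == "date" and b.datatype == "date":
        return parse_date(a.lexical) == parse_date(b.lexical)
    # With no type information, all we can compare is the text itself.
    if a.datatype is None and b.datatype is None:
        return a.lexical == b.lexical
    # Mixed or unknown typing: draw no conclusion (a design choice here).
    return False
```

With this sketch, `same_value(Literal("2002-01-30", "date"),
Literal("20020130", "date"))` holds, while the same two strings without
the datatype do not compare equal.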

There is a distinction between casual storage saving and semantic
tidying of the graph. It's up to the application writer to be very
careful not to confuse the two.
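
A rough Python sketch of that distinction follows. Everything in it is
an assumption made for illustration: the class `LiteralStore`, its
method names and the datatype labels are hypothetical, and the
`same_meaning` check is deliberately conservative. The string table
shares storage between identical lexical forms, while each occurrence
keeps its own datatype, so shared storage never implies shared meaning.

```python
from typing import Dict, List, Optional, Tuple


class LiteralStore:
    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}   # lexical form -> string id
        self._strings: List[str] = []    # string id -> lexical form
        # One entry per occurrence in the graph: (string id, datatype).
        self.occurrences: List[Tuple[int, Optional[str]]] = []

    def intern(self, text: str) -> int:
        # Casual storage saving: each distinct string is stored once.
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def add_occurrence(self, text: str, datatype: Optional[str] = None) -> int:
        self.occurrences.append((self.intern(text), datatype))
        return len(self.occurrences) - 1

    def share_storage(self, i: int, j: int) -> bool:
        # True when two occurrences point at the same stored string.
        return self.occurrences[i][0] == self.occurrences[j][0]

    def same_meaning(self, i: int, j: int) -> bool:
        # Conservative: equal only with explicit, matching datatypes
        # and identical lexical forms. Untyped occurrences never match.
        (sid_i, dt_i), (sid_j, dt_j) = self.occurrences[i], self.occurrences[j]
        return dt_i is not None and dt_i == dt_j and sid_i == sid_j
```

For example, "10" used as an integer and "10" used as a plain label
would share one stored string, yet `same_meaning` reports them as
different because their datatypes differ.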

Anyway, the storage-space argument alone is certainly not sufficient to
argue for semantic literal tidying.

> > In contrast, having untidy literals would imply in a general case that
> > each occurrence of a literal needs to be stored using a different
> > integer ID. As a consequence, the database size explodes, and the
> > queries become prohibitively expensive.
> 
> I agree. And this is one argument in favor of tidy literals, but no
> problem for TDL.
> 

-- 
Martyn Horner <martyn.horner@profium.com>
Profium, Les Espaces de Sophia,
Immeuble Delta, B.P. 037, F-06901 Sophia-Antipolis, France
Tel. +33 (0)4.93.95.31.44 Fax. +33 (0)4.93.95.52.58
Mob. +33 (0)6.21.01.54.56 Internet: http://www.profium.com

Received on Wednesday, 30 January 2002 05:19:45 UTC