Semantics for the Rest of Us (Keynote)

Sandro Hawke (sandro@w3.org), W3C / MIT
ISWC, 26 October 2009, SemRUS Workshop
http://www.w3.org/2009/Talks/1026-semrus/

Outline

Four Parts:

What do you mean, "Semantics"?
- Practical, intuitive definition
Why do you want (Standard) Semantics?
- Three general use cases
Non-Standard Semantics
- Subsets and supersets
- Staying safe
- Key design principle
Ideas for the Future
- Web + Inference

Time for Discussion

Semantic Web

We did "Semantic Web"

Linked Data

Let's revisit "Semantic"

Semantic Web

We communicate via RDF Graphs, published on the Web
We use a common vocabulary of IRI terms; some terms will be recognized by a given program, others won't be.
What clever things can we make computers do?
Let's observe a few things about graphs.... (very simple)

(In)Consistency

Some graphs could never be true...

"I was born in both 1965 and in 1975"

:y foaf:birthday "07-10". 
:y foaf:birthday "07-11".
foaf:birthday rdf:type owl:FunctionalProperty.

```
:x rdf:type "Hello World".
```
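
A minimal Python sketch of how a program might spot the birthday clash above. It assumes triples are plain tuples and that two different literal strings denote different values (a simplification); `functional_clashes` is a hypothetical helper, not part of any real reasoner:

```python
from collections import defaultdict

# Triples as (subject, predicate, object) tuples; literals are plain strings.
graph = {
    (":y", "foaf:birthday", "07-10"),
    (":y", "foaf:birthday", "07-11"),
    ("foaf:birthday", "rdf:type", "owl:FunctionalProperty"),
}

def functional_clashes(triples):
    """Return (subject, property, values) wherever a functional property
    has more than one distinct literal value for the same subject."""
    functional = {s for (s, p, o) in triples
                  if p == "rdf:type" and o == "owl:FunctionalProperty"}
    values = defaultdict(set)
    for s, p, o in triples:
        if p in functional:
            values[(s, p)].add(o)
    return [(s, p, sorted(vs)) for (s, p), vs in values.items() if len(vs) > 1]

clashes = functional_clashes(graph)
print(clashes)
```

Any non-empty result means the graph could never be true under these assumptions.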

What about this:

:z foaf:name "Sandro Hawke".
:z foaf:firstName "John".

Equivalence

Some graphs say the same thing:

Graph A	Graph B
:Aubrey :age "7"^^xs:int	:Aubrey :age "07"^^xs:int
It does not matter how old Aubrey is; they still say the same thing.
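
One way to see the equivalence, as a rough Python sketch: map each lexical form to the value it denotes before comparing graphs. Using Python's `int` as a stand-in for the datatype mapping is an assumption made for illustration:

```python
def int_value(lexical):
    """Map an integer lexical form such as "07" to the value it denotes."""
    return int(lexical)

# The two graphs differ as text but not once literals are mapped to values.
graph_a = {(":Aubrey", ":age", int_value("7"))}
graph_b = {(":Aubrey", ":age", int_value("07"))}

same = graph_a == graph_b
print(same)
```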


Entailment

Sometimes "logic" tells us whenever A is true, B must also be:

Graph A (Given)	Graph B (Entailed)
Everything which has an "age" is a Person. The "age" of Aubrey is 7.	Aubrey is a Person.
:age rdfs:domain :Person. :Aubrey :age "7"^^xs:int.	:Aubrey rdf:type :Person.

Given Graph A, we can answer:

Is Aubrey a Person?
What is Aubrey?
What things are of type Person?
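
The three questions can be answered mechanically once the domain rule has been applied. A minimal sketch, assuming triples as Python tuples; `apply_domain` is a hypothetical helper, not a real reasoner API:

```python
graph_a = {
    (":age", "rdfs:domain", ":Person"),
    (":Aubrey", ":age", 7),
}

def apply_domain(triples):
    """RDFS domain rule: if p has domain C and (s p o) holds, then s is a C."""
    domains = {(s, o) for (s, p, o) in triples if p == "rdfs:domain"}
    inferred = {(s, "rdf:type", cls)
                for (prop, cls) in domains
                for (s, p, o) in triples if p == prop}
    return triples | inferred

closed = apply_domain(graph_a)

# Is Aubrey a Person?
is_person = (":Aubrey", "rdf:type", ":Person") in closed
# What is Aubrey?
types_of_aubrey = {o for (s, p, o) in closed
                   if s == ":Aubrey" and p == "rdf:type"}
# What things are of type Person?
all_persons = {s for (s, p, o) in closed
               if p == "rdf:type" and o == ":Person"}

print(is_person, types_of_aubrey, all_persons)
```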

So what are these terms like rdfs:domain?

RDF Logic Languages

Same Syntax (RDF), Different Semantics

Thus these are equivalent:

Which RDF logic language are you using?
Which semantics are you using?
Which entailment regime are you using?
Which inference rules are you using?
What must a conformant reasoner do?

Reasoner = Data Source

A reasoner takes what you know, and tells you more

Entirely predictable, unchanging

All correct reasoners for the same semantics give the same answers

Intuition: using a semantics = using another data source, with just another URI

... except it's rules.

... or it's just triples but:

It's Telepathic (it knows what you know)
It never has an original thought (it knows ONLY what you know)
It often tells you the obvious.

Remind you of anyone?

Goofy Example. See what wisdom the counselor offers?


Part 2: Why Use a (Standard) Reasoner?

What Reasoners Do:

Check Consistency
Check Entailments, Query Answering with Entailment

In other words:

Find inconsistencies (mistakes and disagreements)
Find entailments (more "true" facts)

They give us more useful knowledge! (Yay!)

Usage Scenarios

Private Use
Single Source
Multi Source

Think about:

When do we need a standard language?
What else might be standardized?

Private Use

One person / organization

Apple, managing their iPod product line...

apple:iPodNanoGen1 rdfs:subClassOf pdx:MP3Player.
apple:iPodNanoGen1 rdfs:subClassOf [
    rdf:type owl:Restriction;
    owl:onProperty apple:memorySizeMeg;
    owl:hasValue "1024"
].

... but privately.

They decide, internally, which reasoner(s) to use

Standard language promotes healthy market

Single Source

Alice published a graph:

:Alice :selling :item22.
:item22 rdf:type apple:iPodNanoGen1.

Apple (vocabulary provider) says:

apple:iPodNanoGen1 rdfs:subClassOf pdx:MP3Player.

This entails:

:item22 rdf:type pdx:MP3Player.
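
A small sketch of the subclass rule that produces this entailment, whoever ends up running it. Triples are Python tuples and `apply_subclass` is a hypothetical name; a real RDFS reasoner does considerably more:

```python
def apply_subclass(triples):
    """Repeat the RDFS subclass rule until no new type triples appear."""
    closed = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in closed if p == "rdfs:subClassOf"}
        new = {(x, "rdf:type", sup)
               for (x, p, cls) in closed if p == "rdf:type"
               for (c, sup) in sub if c == cls} - closed
        if not new:
            return closed
        closed |= new

alice = {(":Alice", ":selling", ":item22"),
         (":item22", "rdf:type", "apple:iPodNanoGen1")}
apple = {("apple:iPodNanoGen1", "rdfs:subClassOf", "pdx:MP3Player")}

# Only the triples that were not stated by either party.
entailed = apply_subclass(alice | apple) - (alice | apple)
print(entailed)
```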

Charlie is looking for an MP3 Player!

Who does the Reasoning?

If Alice does it, no standard needed
- More buyers ++
- Bigger data set -
- More work -
If Charlie does it, requires standard
- More Products ++
- More work -
- Unpredictable work -
ONE of them has to do the reasoning
Maybe negotiate?

Multi Source

Actually, Charlie is looking for an MP3 Player that can run RockBox.

Bob runs a RockBox website....

apple:iPodNanoGen1 rdfs:subClassOf rockbox:SupportedDevice.

Alice+Bob Entails:

?x :selling [ a rockbox:SupportedDevice ]

Who does the Reasoning?

Alice doesn't know (or care) about Bob

Bob doesn't know (or care) about Alice

[Figure: tree diagram showing the conclusion drawn from what Alice and Bob said]

So, it has to be Charlie

... and they need a standard (RDFS here)
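
To illustrate why the work falls to Charlie, here is a rough sketch in which neither source answers his query alone but the merged graph does. The helper names are hypothetical and only the subclass rule is applied:

```python
def type_closure(triples):
    """One pass of the RDFS subclass rule (enough for this one-level example)."""
    sub = {(s, o) for (s, p, o) in triples if p == "rdfs:subClassOf"}
    return triples | {(x, "rdf:type", sup)
                      for (x, p, cls) in triples if p == "rdf:type"
                      for (c, sup) in sub if c == cls}

def sellers_of(triples, cls):
    """Answer the pattern: ?x :selling [ a cls ]."""
    closed = type_closure(triples)
    return {s for (s, p, item) in closed
            if p == ":selling" and (item, "rdf:type", cls) in closed}

alice = {(":Alice", ":selling", ":item22"),
         (":item22", "rdf:type", "apple:iPodNanoGen1")}
bob = {("apple:iPodNanoGen1", "rdfs:subClassOf", "rockbox:SupportedDevice")}

alone_alice = sellers_of(alice, "rockbox:SupportedDevice")
alone_bob = sellers_of(bob, "rockbox:SupportedDevice")
merged = sellers_of(alice | bob, "rockbox:SupportedDevice")
print(alone_alice, alone_bob, merged)
```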

Part 3: Non-Standard Reasoning

Did they really need RDFS?
What else might they need?
What happens if Charlie uses a non-standard reasoner?

Why Deviate

Easier implementation
Faster reasoning
No sufficiently expressive standard (more information!)
No elegant solution using standard
Local conventions (firstname is givenName)

Subsets

Why?

Easier, Smaller, Faster, Simpler
Charlie can use some fast bit of code that does only subClassOf

What?

Fewer entailments and inconsistencies found
Charlie will miss some entailments, won't notice some inconsistencies

Result:

Charlie might miss Alice's item
But his faster reasoner might find others!

It's like deciding whether to consider other search results

When to Subset

Okay if data is already incomplete

Good if resources better used elsewhere

Not okay to present as complete

Not okay if negated

If no one is selling an MP3Player, then...

Supersets

What:

More entailments and more graphs defined to be inconsistent
Charlie will get more information, catch more "errors"

More information is good, right?

Only if it's right.

Beware unwarranted assumptions

Superset Examples

Some possible supersets:

Items described in Apple namespace are inferred to be of type apple:Product
Because apple:memorySize has range apple:iPod, Alice's data is considered inconsistent for not providing a memorySize
Because Alice lists between 100 and 10,000 items for sale, Alice is classified as MidSizeVendor
Because Alice is selling an iPod, and she probably won't remember to erase her music from it, and copying music is illegal, Alice is classified as a Criminal.

So when are superset semantics okay?

Safe Supersets

Did the source say that?

Did the source mean to imply that?

Or was it just something you inferred, on your own?

Safety is in how you present results:

Alice is a Criminal.
According to the Evil-Music-Consortium definitions, Alice is a Criminal.

Sorry, I didn't mean to imply you were a criminal...
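
One possible way to keep presentation safe is to carry, with every conclusion, the name of the semantics that produced it. A sketch only; the `Statement` class and `present` function are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Statement:
    triple: tuple
    stated_by: str       # who published the triple, or whose reasoner inferred it
    semantics: str = ""  # non-empty if inferred, naming the rules that were used

def present(st):
    """Render a statement so inferred conclusions never appear as bare facts."""
    s, p, o = st.triple
    if st.semantics:
        return f"According to {st.semantics}, {s} {p} {o}."
    return f"{st.stated_by} says: {s} {p} {o}."

stated = Statement((":Alice", ":selling", ":item22"), stated_by="Alice")
inferred = Statement((":Alice", "rdf:type", ":Criminal"),
                     stated_by="Charlie's reasoner",
                     semantics="the Evil-Music-Consortium definitions")

print(present(stated))
print(present(inferred))
```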

The hard part: what was or was not implied?

Does this...	Imply this?
:x rdf:type :A. :A rdfs:subClassOf :B.	:x rdf:type :B.

Does this...	Imply this?
:z foaf:name "Sandro Hawke".	:z foaf:firstName "Sandro".

If the shoe fits

PROPOSED: If you use the URI (in the right graph pattern), you're endorsing use of the published semantics.

Use "extensible semantics"; never make that ambiguous or contradictory.

Any additional semantics must be triggered by use, by the author, of additional syntax

Example: OWL 2

If DL graph pattern, then use either semantics (they're the same)
If not DL graph, then use RDF-Based Semantics

Okay for W3C Recommendations, but what about for all semantics with a web page?

Part 4: Some Ideas for Standards

class and property dereferencing

importing RIF

reasoner negotiation

downloading

PCID

Property and Class Identifier Dereferencing (PCID)

Recursively dereference every RDF class and property IRI, and merge the results.
More limited than dereferencing all term IRIs; typically small
Looks like another entailment regime to me
TimBL's been suggesting this for years
I think it'll work well....
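
A rough sketch of what PCID could look like, with an in-memory dictionary standing in for the Web so that nothing is fetched. `FAKE_WEB`, `dereference`, `pcid`, and the `pdx:Device` triple are all invented for illustration, and the rule for deciding which IRIs are classes or properties is a deliberate approximation:

```python
# Hypothetical stand-in for the Web: maps an IRI to the triples found there.
FAKE_WEB = {
    "apple:iPodNanoGen1": {
        ("apple:iPodNanoGen1", "rdfs:subClassOf", "pdx:MP3Player")},
    "pdx:MP3Player": {
        ("pdx:MP3Player", "rdfs:subClassOf", "pdx:Device")},  # invented triple
}

def dereference(iri):
    """Pretend HTTP GET; unknown IRIs yield no triples."""
    return FAKE_WEB.get(iri, set())

def class_and_property_iris(triples):
    """Predicates are properties; rdf:type objects and both ends of
    rdfs:subClassOf are classes (an approximation)."""
    iris = set()
    for s, p, o in triples:
        iris.add(p)
        if p == "rdf:type":
            iris.add(o)
        elif p == "rdfs:subClassOf":
            iris.update((s, o))
    return iris

def pcid(graph):
    """Dereference class and property IRIs recursively and merge the results."""
    merged, seen = set(graph), set()
    pending = class_and_property_iris(merged)
    while pending:
        iri = pending.pop()
        seen.add(iri)
        merged |= dereference(iri)
        pending = class_and_property_iris(merged) - seen
    return merged

alice = {(":Alice", ":selling", ":item22"),
         (":item22", "rdf:type", "apple:iPodNanoGen1")}
merged = pcid(alice)
print(len(merged))
```

Note that `:Alice` and `:item22` are never dereferenced, which is what keeps the result small compared with dereferencing every term.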

RIF

RIF is meant to be used with RDF, but how should rules be imported from RDF?

use owl:imports? use PCID?

Additional RIF dialects, FOPL, LP, ....

Reasoner Negotiation

Charlie tells Alice the entailment regimes he'll be using, so she can skip doing those.

Skip-Inference: OWL-DIRECT

Alice tells Charlie which entailment regimes she has (completely) used, so he can skip doing those

Did-Inference: OWL-RL

... or put it in the graph, or in a metadata graph.
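
A sketch of how either side might use such a header to decide what work remains. The comma-separated value syntax and the function names are assumptions; the slides only propose the header names:

```python
def parse_regimes(header_value):
    """Split a header value such as "OWL-RL, RDFS" into a set of regime names."""
    return {name.strip() for name in header_value.split(",") if name.strip()}

def still_to_do(wanted, header_value):
    """Regimes from `wanted` that the other party has not taken care of."""
    handled = parse_regimes(header_value)
    return [r for r in wanted if r not in handled]

# Alice sent "Did-Inference: OWL-RL"; Charlie wanted RDFS and OWL-RL.
charlie_todo = still_to_do(["RDFS", "OWL-RL"], "OWL-RL")

# Charlie sent "Skip-Inference: OWL-DIRECT"; Alice offers RDFS and OWL-DIRECT.
alice_todo = still_to_do(["RDFS", "OWL-DIRECT"], "OWL-DIRECT")

print(charlie_todo, alice_todo)
```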

Downloadable Semantics

Dave wants a lessThan predicate:
{ _:x owl:sameAs 7. _:x dave:lessThan 3. } is inconsistent, in Dave's semantics
At the dave:lessThan IRI, Dave puts a RIF Core ruleset which "implements" his semantics
That ruleset, combined with a graph, is inconsistent when the lessThan relation is violated, and entails all the graphs for which it holds.
Reasoners doing PCID+RIF_Core will now automatically implement dave:lessThan
Such reasoners get OWL-RL for free?
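
To make the intended behaviour concrete, here is a sketch of the inconsistency check in Python rather than in the RIF Core ruleset the slide proposes. It assumes numbers appear directly as Python values and that `owl:sameAs` links a blank node to a number; `less_than_violations` is a hypothetical name:

```python
def less_than_violations(triples):
    """Find dave:lessThan triples that are false once owl:sameAs numbers
    have been substituted for the terms they name."""
    value = {s: o for (s, p, o) in triples
             if p == "owl:sameAs" and isinstance(o, (int, float))}

    def num(term):
        # A term is either a number itself or a name with a known number.
        return term if isinstance(term, (int, float)) else value.get(term)

    bad = []
    for s, p, o in triples:
        if p == "dave:lessThan":
            a, b = num(s), num(o)
            if a is not None and b is not None and not a < b:
                bad.append((s, p, o))
    return bad

graph = {("_:x", "owl:sameAs", 7), ("_:x", "dave:lessThan", 3)}
violations = less_than_violations(graph)
print(violations)
```

A non-empty result corresponds to the graph being inconsistent under Dave's semantics.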

Downloadable Plugins

"That's just using RIF Core as a programming language", you say.
Okay, then: let's also do it for Java bytecode and JavaScript
At dave:lessThan would be links to the appropriate code
A small number of RDF reasoner plugin APIs would be needed
Maybe also try to download from your reasoner supplier.

Advice to Users

Data Source Providers SHOULD include all useful entailments

Data Source Providers MUST publish only consistent data

Data Consumers SHOULD check for consistency across all included data sources.

Data Consumers SHOULD compute all entailments they'll find useful.

Reasoners MUST report the semantics they are using (including whether algorithm is incomplete/unsound)

Entailment Regimes MUST use Extensible Semantics.

Summary

What do you mean, "Semantics"?
- A semantics (or a logic, or an ER) is a spec for a set of equivalent reasoners
- Intuitively: a (published) collection of inference rules
Why do you want (Standard) Semantics?
- Private Use: Better tools for ontologists
- Single Source: Save on bandwidth
- Multi Source: Emergent, synthesized knowledge
Non-Standard Semantics
- Subsets are okay if you're incomplete anyway
- Use supersets at your own risk
- Alice States, Alice Implies, Charlie Infers.
- Never define semantics which conflict with someone else's
Ideas for the Future
- Promote deployment of PCID
- Downloading Semantics: Standardize the Glue
- Document Best Practices, Transition Strategies
- More Workshops!