Artificial Intelligence and the Semantic Web

Tim Berners-Lee, <timbl@w3.org>

MIT CSAIL

http://www.w3.org/2006/Talks/0718-aaai-tbl/

Decentralized Information Group
MIT Computer Science and Artificial Intelligence Laboratory

AAAI, 18 July 2006


AI and SW

  1. SW is not AI and AI is not SW
  2. AI is a field; SW is a project
  3. SW owes a debt: used much from AI
  4. SW should be a great playground for AI
  5. AI projects should use SW to interoperate

What is AI anyway?

What is SW?

The RDF data bus

The RDF bus connects data sources and applications

db to sw

Sem Web architecture 101

Define symbols:

Chris Welty/IBM: "In the Semantic Web, it is not the Semantic which is new, it is the Web which is new".

Web architecture 101


Example


http://www.w3.org/People/Berners-Lee/card#i

http://www.w3.org/People/Berners-Lee/card  (in N3, summarized):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <#>.

:i a foaf:Person;
foaf:family_name "Berners-Lee";
foaf:givenname "Timothy";
foaf:homepage <http://www.w3.org/People/Berners-Lee>;
foaf:mbox <mailto:timbl@w3.org>.
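
As a rough illustration of what the N3 above denotes, the sketch below expands each prefixed name into a full URI and lists the resulting triples. The helper name expand and the PREFIXES table are hypothetical, introduced only for this example; "a" in N3 is shorthand for rdf:type, and the empty prefix is assumed to resolve against the document URI given above.

```python
# Document URI from the slide; "@prefix : <#>" is resolved against it.
DOC = "http://www.w3.org/People/Berners-Lee/card"

# Hypothetical prefix table mirroring the two @prefix lines.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "": DOC + "#",
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # N3 keyword "a"


def expand(qname):
    """Turn a prefixed name such as 'foaf:Person' or ':i' into a full URI."""
    prefix, _, local = qname.partition(":")
    return PREFIXES[prefix] + local


ME = expand(":i")

# The five statements in the card, as (subject, predicate, object) triples.
TRIPLES = [
    (ME, RDF_TYPE, expand("foaf:Person")),
    (ME, expand("foaf:family_name"), "Berners-Lee"),
    (ME, expand("foaf:givenname"), "Timothy"),
    (ME, expand("foaf:homepage"), "http://www.w3.org/People/Berners-Lee"),
    (ME, expand("foaf:mbox"), "mailto:timbl@w3.org"),
]

for s, p, o in TRIPLES:
    print(s, p, o)
```

Every subject and predicate ends up as a global URI, which is what lets other documents say more about the same person.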

URI + HTTP architecture 1

The hash is an operator which joins a local identifier to a document URI to give a global identifier.

http://example.com/foo#bar

  1. Strip off #bar
  2. Look up http://example.com/foo using HTTP:
    1. Look up example.com  giving 128.0.0.1
    2. Request foo from 128.0.0.1
    3. 200 OK is returned
    4. Parse the result according to the Internet content type
  3. This gives you information about bar
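
A minimal sketch of the first step, using only the Python standard library. The function names split_hash_uri and join_hash_uri are hypothetical; the HTTP lookup itself is left out so the example runs without a network.

```python
from urllib.parse import urldefrag


def split_hash_uri(uri):
    """Split a global identifier into (document URI, local identifier)."""
    document_uri, local_id = urldefrag(uri)
    return document_uri, local_id


def join_hash_uri(document_uri, local_id):
    """The hash as an operator: document URI + local identifier."""
    return f"{document_uri}#{local_id}"


doc, local = split_hash_uri("http://example.com/foo#bar")
print(doc)    # http://example.com/foo  (the document to fetch over HTTP)
print(local)  # bar                     (the thing the document describes)
```

Only the document URI ever goes over the wire; the local identifier is interpreted by the client once the document has been parsed.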

URI + HTTP architecture 2

Post TAG resolution of HTTPRange-14, an optional possible operation is:
given http://example.com/foo/bar

  1. There is no #fragment to strip off
  2. Look up http://example.com/foo/bar using HTTP as amended by TAG:
    1. Look up example.com  giving 128.0.0.1
    2. Request foo/bar from 128.0.0.1
    3. You get a 303 See Other redirection response, indicating that the URI did not denote an information resource, but mentioning a new resource http://example.com/foo-schema.rdf
    4. Request http://example.com/foo-schema.rdf
    5. Get a 200 OK response
    6. Parse the  result according to the Internet content type
  3. This gives you information about <http://example.com/foo/bar>
Not recommended by me.
Other issues:  Content negotiation between HTML and RDF.  LSIDs
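
The 303 lookup above can be sketched as a small loop. This is an illustration under stated assumptions, not a real HTTP client: fetch is a hypothetical callable returning (status, headers, body), and FAKE_WEB stands in for the network so the example runs offline.

```python
def describe(uri, fetch, max_hops=5):
    """Follow 303 See Other responses until a 200 OK document is found.

    `fetch` is a hypothetical callable: fetch(uri) -> (status, headers, body).
    Returns (uri of the document that answered, body of that document).
    """
    current = uri
    for _ in range(max_hops):
        status, headers, body = fetch(current)
        if status == 200:
            # An information resource: parse body by its content type.
            return current, body
        if status == 303:
            # Not an information resource; go to the document about it.
            current = headers["Location"]
            continue
        raise ValueError(f"unexpected status {status} for {current}")
    raise ValueError("too many redirects")


# Stand-in for the network, using the URIs from the slide.
FAKE_WEB = {
    "http://example.com/foo/bar": (
        303, {"Location": "http://example.com/foo-schema.rdf"}, ""),
    "http://example.com/foo-schema.rdf": (
        200, {"Content-Type": "application/rdf+xml"}, "<rdf:RDF/>"),
}


def fake_fetch(uri):
    return FAKE_WEB[uri]


print(describe("http://example.com/foo/bar", fake_fetch))
```

Compared with the hash case, this costs an extra round trip for every term looked up.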

Breadcrumbs ethos



Mythbusting

Myth: "The Semantic Web technology is Description Logic"

No, OWL is one semantic web language.

It is important that applications which need different expressiveness can use it.

But other languages must interoperate to the greatest extent possible.

They should use URIs

They should not reinvent functionality already provided by standards.

SW Arch: Same symbols, multiple languages


architectural layers

Mythbusting: Not just public data


It's like a metro, the way the lines of common concepts connect the stations of different applications


For example in biopax

When will the patterns all connect?

[Diagram: Venn diagram showing ontologies overlapping by certain common terms]

[Diagram: Joanne Luciano, Predictive Medicine; Drug discovery demo using RDF, Siderean Seamark and Oracle 10g]

Other myths


"Ontology": two patterns

Schema                                             | Taxonomy
E.g. Bank statement: account, date, amount         | E.g. Human anatomy and diseases
Simple ontology. Often documents existing practice | Complex ontology, difficult to make
Changes rarely                                     | Changes continually
Domain knowledge is in the data not the ontology.  | Domain knowledge is in the ontology itself
Use now in enterprise and science IT               | Specific fields e.g. life sciences

Both patterns are important. Both patterns use OWL.

NLP vs Semantic Web

NLP | Semantic Web
Words | Terms of logic
Meaning is use | Meaning is defined in words or code or specific use.
Word reused by everyone, no ownership | URI ownership - go get your own
"Hydrogen" | http://example.com/foo.rdf#Hydrogen
Defining words in terms of ontology complex, unsatisfying, never complete and a waste of time. | Defining terms using words is never perfect but useful.
Natural language constantly changing | Ontologies basically static
Can't benefit from injected logic | Can't benefit from cloudy stats from corpora data
Machine can find stuff | Machine can make widespread inference

Distractions: Meaning of meaning

Meaning as definition | Community Standards | Meaning as Use
Everyone uses the same precise definition of each term. | Specific communities agree to share good-enough definitions | Language changes with time
Works in small closed systems | If communities overlap, can be global | Works for poetry, makes rich natural language.
Very hard work to set up | Finite work to set up | No effort up front, much afterward.

When we build a system (the SW), this is a choice, not an observation

The fractal tangle


Total Cost of Ontologies (TCO)

Assume :-) ontologies evenly spread across orders of magnitude; committee  size as log(community), time as committee^2, cost shared across community.
Scale | E.g.            | Committee size | Cost per ontology (weeks) | My share of cost
1     | Me              | 1              | 1                         | 1
10    | My team         | 4              | 16                        | 1.6
100   | Group           | 7              | 49                        | 0.49
1000  |                 | 10             | 100                       | 0.10
10k   | Enterprise      | 13             | 169                       | 0.017
100k  | Business area   | 16             | 256                       | 0.0026
1M    |                 | 19             | 361                       | 0.00036
10M   |                 | 22             | 484                       | 0.000048
100M  | National, State | 25             | 625                       | 0.000006
1G    | EU, US          | 28             | 784                       | 0.000001
10G   | Planet          | 31             | 961                       | 0.000000

Total cost of 10 ontologies: 3.2 weeks. Serious project: 30 ontologies, TCO = 10 weeks.
Lesson: Do your bit. Others will do theirs.
Thank those who do working groups!
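
The arithmetic behind the table can be reproduced in a few lines. This is a sketch under assumptions: the slide only says committee size grows as log(community), and the specific form 1 + 3 * log10(scale) is inferred here from the table's numbers, not stated in the talk. The function name tco_rows is hypothetical.

```python
def tco_rows(max_exponent=10):
    """One row per order of magnitude of community size, 10**0 .. 10**max_exponent."""
    rows = []
    for k in range(max_exponent + 1):
        scale = 10 ** k                # community size
        committee = 1 + 3 * k          # assumed fit: 1 + 3 * log10(scale)
        cost_weeks = committee ** 2    # time as committee^2
        my_share = cost_weeks / scale  # cost shared across the community
        rows.append((scale, committee, cost_weeks, my_share))
    return rows


rows = tco_rows()
for scale, committee, cost_weeks, my_share in rows:
    print(f"{scale:>12} {committee:>3} {cost_weeks:>4} {my_share:.6f}")

# One ontology at each scale: my total share of the work.
total = sum(my_share for _, _, _, my_share in rows)
print(f"total: {total:.1f} weeks")
```

The sum is dominated by the first three rows, which is the point of the slide: the large, widely shared ontologies cost each participant almost nothing.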

User Interface challenges

Domain-specific user interfaces are blossoming... but what about generic ones?

Goals for Rules

Web attitude




Artificial Intelligence as powerful systems

When you get home

Thank you

http://www.w3.org/2006/Talks/0718-aaai-tbl/

Tim Berners-Lee

CSAIL, MIT




END

You have gone too far.

Components: Adapting random files

Keep your existing systems running - adapt them

db to sw

Components: Triple store

Virtual servers actually figure stuff out as well as look up data

db to sw

Adapting SQL Databases

Keep your existing systems running - adapt them

db to sw

Adapting XML

Remember: RDF on an HTTP server can always be virtual

db to sw

Adapting XML: GRDDL

Remember: RDF on an HTTP server can always be virtual

db to sw

Components: Smart servers

Virtual servers actually figure stuff out as well as look up data

db to sw



Evolution: Urges and Capacities

Evolution selects for survival and reproduction, which in turn require various capacities

A selection

The bits computers find hard

Emergent Systems?

FAQ: Will the WWW produce emergent phenomena?

Example: The corporation

We are not necessarily in control of things we create.