Cover page images (keys)

Questions (and Answers) on the Semantic Web

$Date: 2007/06/05 14:57:53 $

Ivan Herman, W3C

We all know that, right?

WRONG!!!!

Goal of this presentation…

 

Is the Semantic Web AI on the Web?

No!

Picture of a hype article with a text on it saying 'Beware of the Hype'

So what is the Semantic Web?

Example: Automatic Airline Reservation

Example: data(base) integration

Example: data integration in life sciences

Left side: data silos, each with its own representation on a screen, with a scientist interpreting them; right side: the same silos, converted to RDF and co., with the scientist working on the data right away.

And the problem is real

screen dump of three different Life Science databases with mutually different interfaces

So what is the Semantic Web?

The Semantic Web is… the Web of Data

And what is the relationship to AI?

A possible comparison

Smarter machines
  • teach computers to infer the meaning of Web data
    • natural language, image recognition, etc.
  • …this is the Artificial Intelligence approach
Smarter data
  • Make data easier for machines to find, access and process
    • express data and meaning in standard machine-readable format
    • support decentralized definition and management, across the network
  • …this is the Semantic Web approach

(I know, all comparisons are wrong, but it may still help…)

 

All right, but what is RDF then?

RDF

RDF (cont.)

 

But isn’t RDF simply an (ugly) XML application?

RDF is a graph!

A Simple RDF Example

A Simple RDF Graph with full URI-s
<rdf:Description rdf:about="http://www.ivan-herman.net">
    <foaf:firstName>Ivan</foaf:firstName>
    <abc:myCalendar rdf:resource="http://…/myCalendar"/>
    <foaf:surname>Herman</foaf:surname>
</rdf:Description>

Yes, RDF/XML has its Problems

Use, e.g., Turtle if you prefer…

<http://www.ivan-herman.net>
  foaf:firstName "Ivan";
  abc:myCalendar <http://.../myCalendar>;
  foaf:surname "Herman".

 

But what has RDF to do with data integration?

Consider this (simplified) bookstore data set

ID | Author | Title | Publisher | Year
ISBN 0-00-651409-X | id_xyz | The Glass Palace | id_qpr | 2000

 

ID | Name | Home page
id_xyz | Amitav Ghosh | http://www.amitavghosh.com/

 

ID | Publisher Name | City
id_qpr | Harper Collins | London

Export your data as a set of relations…

The previous table in an RDF format

Add the data from another publisher…

The French and English data side by side

Start merging…

The merged data with nodes with identical URI-s pointed out

Simple integration…

The merged data with one of the nodes merged with common URI

Note the role of URI-s!
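
A rough way to see why URI-s matter: if each data set is a set of triples, merging is just a set union, and any node that carries the same URI in both sets becomes a single node. The Python sketch below illustrates this under stated assumptions: the `A` and `F` vocabularies, the `urn:example:` identifiers, the French record, and the `urn:isbn:` form of the book's URI are all hypothetical and only stand in for the data shown on the slides.

```python
A = "http://example.org/a#"   # hypothetical vocabulary of the English data set
F = "http://example.org/f#"   # hypothetical vocabulary of the French data set
BOOK = "urn:isbn:000651409X"  # assumed URI form of ISBN 0-00-651409-X

english = {
    (BOOK, A + "title", "The Glass Palace"),
    (BOOK, A + "year", "2000"),
    (BOOK, A + "author", "urn:example:id_xyz"),
    ("urn:example:id_xyz", A + "name", "Amitav Ghosh"),
}

# Illustrative French record that points at the same book URI.
french = {
    ("urn:example:translation", F + "original", BOOK),
    ("urn:example:translation", F + "auteur", "urn:example:auteur_1"),
    ("urn:example:auteur_1", F + "nom", "Amitav Ghosh"),
}

# Merging two RDF graphs: a plain union of their triples.
merged = english | french

def about(graph, node):
    """All triples in which the node appears as subject or object."""
    return {t for t in graph if node in (t[0], t[2])}

# The book node now carries edges coming from both data sets.
for triple in sorted(about(merged, BOOK)):
    print(triple)
```

Note that the two author nodes stay separate here, because they do not share a URI; nothing in a plain union says they are the same person.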

 

So what is then the role of ontologies and/or rules?

A possible short answer

This is where we are…

The merged data with one of the nodes merged with common URI

Our merge is not complete yet…

Better merge: richer queries are possible!

The merged data with extra nodes merged as a result of identifying "same as" properties

What we did: we used ontologies…

And then the merge may go on…

The merged data with a reference to a Wikipedia entry on the author

…and on…

The merged data with a reference to a Wikipedia entry on the author plus other books he wrote

…and on…

The merged data with a reference to a Wikipedia entry on the author plus other books he wrote plus a reference to Calcutta referring to the Google Maps entry

Is that surprising?

It could become even more powerful

You remember this statement?

And this?

Tradeoffs

concentric arcs with RDF, RDFS, OWL Lite, DL, and Full

Also…

 

So what does “inference” mean on the Semantic Web? How do you “deduce” things?

Remember the “same as”?
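
One possible reading of "deducing" in this setting: once somebody states that two properties mean the same thing, a program can add triples that nobody typed in. The Python sketch below applies one naive rule of that kind. All names (`A`, `F`, the `urn:example:` nodes) are hypothetical, and a real OWL reasoner does considerably more than this single pass.

```python
A = "http://example.org/a#"  # hypothetical English vocabulary
F = "http://example.org/f#"  # hypothetical French vocabulary

# Ontology-level statements: pairs of properties declared to mean the same.
equivalent = {(A + "author", F + "auteur"), (A + "name", F + "nom")}

data = {
    ("urn:example:book", A + "author", "urn:example:person"),
    ("urn:example:person", F + "nom", "Amitav Ghosh"),
}

def infer(graph, pairs):
    """Add, for every triple, its copy under each equivalent property."""
    both_ways = pairs | {(q, p) for (p, q) in pairs}
    inferred = set(graph)
    for (s, p, o) in graph:
        for (p1, p2) in both_ways:
            if p == p1:
                inferred.add((s, p2, o))
    return inferred

closed = infer(data, equivalent)

# A query phrased purely in the English vocabulary now finds the name,
# although the name was only ever stated with the French property.
names = [o for (s, p, o) in closed if p == A + "name"]
print(names)  # ['Amitav Ghosh']
```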

 

Where do the data and ontologies come from?

(Should we really expect the author to type in all this data?)

Pure RDF data: not always a solution…

Data may be around already…

Data may be extracted (a.k.a. “scraped”)

Formalizing the scraper approach: GRDDL

<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="http:…/dc-extract.xsl"/>
    <meta name="DC.Subject" content="Some subject"/>      
    ...
  </head>
  ...
  <span class="date">2006-01-02</span>
  ...
</html>

<rdf:Description rdf:about="…">
  <dc:subject>Some subject</dc:subject>
  <dc:date>2006-01-02</dc:date>
</rdf:Description>
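
The first step a GRDDL-aware agent takes can be sketched in a few lines of Python: check that the document opts in through the `profile` attribute, then collect the `transformation` links. This is only a sketch under assumptions: the class name `TransformationFinder` is made up, the page is a shortened copy of the example above with its elided href left as-is, and actually running the XSLT is not shown (the Python standard library has no XSLT processor).

```python
from html.parser import HTMLParser

GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

class TransformationFinder(HTMLParser):
    """Collect transformation links from a document that opts in to GRDDL."""

    def __init__(self):
        super().__init__()
        self.is_grddl = False
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "head":
            # The profile attribute may list several space-separated URIs.
            profiles = (attrs.get("profile") or "").split()
            self.is_grddl = GRDDL_PROFILE in profiles
        elif tag == "link" and self.is_grddl:
            if "transformation" in (attrs.get("rel") or "").split():
                self.transformations.append(attrs.get("href"))

page = """<html>
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Some Document</title>
    <link rel="transformation" href="http:…/dc-extract.xsl"/>
  </head>
</html>"""

finder = TransformationFinder()
finder.feed(page)
print(finder.transformations)
```

The agent would then fetch each listed stylesheet and apply it to the page to obtain RDF/XML such as the `rdf:Description` shown above.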

GRDDL (cont)

Another Future Solution: RDFa

RDFa example

<div about="http://uri.to.newsitem">
  <span property="dc:date">March 23, 2004</span>
  <span property="dc:title">Rollers hit casino for £1.3m</span>
  By <span property="dc:creator">Steve Bird</span>. See
  <a href="http://www.a.b.c/d.avi" rel="dcmtype:MovingImage">
  also video footage</a>…
</div>

<http://uri.to.newsitem>
  dc:date             "March 23, 2004";
  dc:title            "Rollers hit casino for £1.3m";
  dc:creator          "Steve Bird";
  dcmtype:MovingImage <http://www.a.b.c/d.avi>.
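
How the attributes map to triples can be sketched with a toy extractor in Python. It is far from a conforming RDFa processor: the class name `TinyRDFa` is made up, it only understands `about`, `property` on elements without nested markup, and `rel` together with `href`, and it leaves prefixes such as `dc:` unexpanded. The snippet is a shortened copy of the example above.

```python
from html.parser import HTMLParser

class TinyRDFa(HTMLParser):
    """Handles only about=, property= on leaf elements, and rel= with href=."""

    def __init__(self):
        super().__init__()
        self.subject = None
        self.prop = None   # property whose text content is being collected
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "property" in attrs:
            self.prop, self.text = attrs["property"], []
        if "rel" in attrs and "href" in attrs:
            self.triples.append((self.subject, attrs["rel"], attrs["href"]))

    def handle_data(self, data):
        if self.prop is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        # Closing the element that carried property= completes a literal triple.
        if self.prop is not None:
            self.triples.append((self.subject, self.prop, "".join(self.text)))
            self.prop = None

snippet = """<div about="http://uri.to.newsitem">
  <span property="dc:date">March 23, 2004</span>
  By <span property="dc:creator">Steve Bird</span>. See
  <a href="http://www.a.b.c/d.avi" rel="dcmtype:MovingImage">also video footage</a>
</div>"""

parser = TinyRDFa()
parser.feed(snippet)
for triple in parser.triples:
    print(triple)
```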

Linking to SQL

Common in RDFa and GRDDL

And for Ontologies?

There are already ontologies around…

“Core” vocabularies

A mix of ontologies/vocabularies (life sciences)…

diagram showing a large number of HC-related ontologies bound via an RDF-like graph

 

How do I extract triples from an RDF graph? I.e., how do I query an RDF graph?

Querying RDF graphs

Simple SPARQL Example

SELECT ?cat ?val # note: not ?x!
WHERE { ?x rdf:value ?val. ?x category ?cat }
Figure (animation): a simple graph with two tree-like subgraphs; successive frames highlight the left subgraph, the selected nodes in the left subgraph, and then the selected nodes in the right subgraph.
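
The pattern matching behind the query above can be mimicked in plain Python. This is a sketch under assumptions, not how a SPARQL engine is implemented: the graph data, the `urn:example:` node names, and the `CATEGORY` property URI are hypothetical. As in the query, `?x` is bound during matching but is not part of the result.

```python
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
CATEGORY = "http://example.org/category"  # hypothetical property URI

# Small hypothetical graph: n1 and n2 match the pattern, n3 does not.
graph = {
    ("urn:example:n1", RDF_VALUE, 50),
    ("urn:example:n1", CATEGORY, "A"),
    ("urn:example:n2", RDF_VALUE, 20),
    ("urn:example:n2", CATEGORY, "B"),
    ("urn:example:n3", RDF_VALUE, 99),  # no category edge
}

def select_cat_val(triples):
    """Mimic: SELECT ?cat ?val WHERE { ?x rdf:value ?val. ?x category ?cat }"""
    rows = []
    for (x, p1, val) in triples:
        if p1 != RDF_VALUE:
            continue
        # Second pattern: the same ?x must also have a category edge.
        for (x2, p2, cat) in triples:
            if x2 == x and p2 == CATEGORY:
                rows.append((cat, val))
    return sorted(rows)

print(select_cat_val(graph))  # [('A', 50), ('B', 20)]
```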

Other SPARQL features

SPARQL usage in practice

SPARQL as a federating tool

diagram showing SPARQL connected to an RDF data file, to a document via GRDDL, and to a database via a SPARQL/SQL bridge

 

Isn't This Research Only?

(or: does this have any industrial relevance whatsoever?)

Not any more…

Not any more… (cont)

Network effect

Metcalfe’s Law:

the value of one node is proportional to the number of other nodes

Small community: niche applications

Some RDF deployment areas

 | Library metadata | Defence | Life sciences
Problem to solve? | single-domain integration | yes, serious data integration needs | yes, connections among genetics, proteomics, clinical trials, regulatory, …
Willingness to adopt? | yes: OCLC push and Dublin Core initiative | yes: funded early DAML (OWL) work | yes: intellectual level high, much modeling done already.
Motivation | light | strong | very strong
Links to other | library data | phone calls records, etc | chemistry, regulatory, medical, etc
Showcase? | limited | not at all | yes, model for other industries.

Some RDF deployment areas (cont)

The “corporate” landscape is moving

Applications are not always very complex…

The Active Semantic Doc picture: a doctor's file with annotations

Data integration

Example: antibodies demo

Antibodies' demo screen dump

There has been lots of R&D

MuseoSuomi application dump; Traditional Chinese medicine example dump

Portals

Vodafone screen dump

Improved Search via Ontology: GoPubMed

GoPubMed Application dump

Adobe's XMP

XMP Application dump

Baby CareLink

Baby care link application dump

Other Application Areas Come to the Fore

Summary

 

Thank you for your attention!

These slides are publicly available on:

http://www.w3.org/People/Ivan/CorePresentations/SW_QA/

in XHTML and PDF formats; the XHTML version has active links that you can follow

 
