Semantic Web: Building on what exists
http://www.w3.org/2006/Talks/0404-mit-tbl
Tim Berners-Lee
MIT Computer Science & Artificial Intelligence Laboratory (CSAIL)
Decentralized Information Group
Director, World Wide Web Consortium (W3C)
This talk
- The Semantic Web Introduction
- Integrating existing systems
- The structure of data
Semantic Web motivation
- All the data not on the web
- Spreadsheets, relational DB, XMLDB, application files
- An end to scraping
- For data, we are pre-web
SW: Everything has a URI
Don't say "colour", say <http://example.com/2002/std6#col>
The relational database
The element of the Semantic Web
- Can be encoded in XML
- Simplicity and mathematical consistency
- This is called Resource Description Framework (RDF)
Semantic web includes tables,...
...trees
... everything
RDF data...
...merges just like that.
Subject and object node using same URIs
RDF: Semantic links - "Joining the Web"
Verb/predicate/Property using same URIs
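A minimal sketch of why RDF data "merges just like that", assuming triples are modelled as plain Python tuples held in sets. The `example.org` URIs, the two source names, and the `describe` helper are hypothetical and chosen only for illustration.

```python
# Two independent sources describe things as (subject, predicate, object) triples.
# Every URI below is made up for illustration.
CAR = "http://example.org/inventory/car42"
COL = "http://example.com/2002/std6#col"
OWNER = "http://example.org/terms#owner"
NAME = "http://example.org/terms#name"
ALICE = "http://example.org/people/alice"

inventory = {(CAR, COL, "red")}
registry = {(CAR, OWNER, ALICE), (ALICE, NAME, "Alice")}

# Merging is a set union: no schema alignment step is needed,
# because both sources already use the same URI for the car.
merged = inventory | registry

def describe(graph, subject):
    """Return every (predicate, object) pair known about a subject."""
    return sorted((p, o) for s, p, o in graph if s == subject)

car_facts = describe(merged, CAR)
```

After the union, `car_facts` holds the colour from one source and the owner from the other, joined only by the shared subject URI.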
What will it be like?
- A very general technology, used in many, many interlocking ways
- Imagine being pre-web and trying to imagine post-web: how will it be used?
- Initial motivators will be different from later mainstream use.
Why does it take time?
- Paradigm shift all over again
- Data is trickier, especially when it comes to designing logic languages
- Need for a smaller incubator community, as HEP (high-energy physics) was for the web
- Data is less exciting with no browser
- Fear of having to make ontologies
Roadmap: Stack of expressive power
The Semantic Web Wave
Practical Semantic Web
- A web of data.
- Don't change existing practices
- Instrument and augment
- Use standards: (RDF, OWL, SPARQL*, RIF**)
Practical Semantic Web
- Take inventory of your data to see what you have
- Model your data to see how it connects
- Map each thing into URI space
- Connect on RDF views, SPARQL services
- Agree on ontologies
Bottom-up ontology design
- Start with existing SQL databases
- Add information about how keys and foreign keys connect
- Remove other artefacts of the DB schema
- Note relationships to other people's concepts
RDF views of data
RDF is to data what HTML is to documents
- Technique: PHP scripts accessing relational DB
- XSLT or XQuery scripts accessing XML DB
- Looking up a URI for something gives you info about it
- Relations with other things expressed using their URIs
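The slide names PHP scripts as the technique; the sketch below swaps in Python with the standard-library `sqlite3` module to show the same idea. The table layout, the `BASE` URI space, and the `rdf_view` function are assumptions for illustration, not part of the talk.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical URI space for this database

def rdf_view(conn):
    """Yield (subject, predicate, object) triples for every row of two tables."""
    for emp_id, name, dept_id in conn.execute("SELECT id, name, dept_id FROM employee"):
        subject = f"{BASE}employee/{emp_id}"
        yield (subject, f"{BASE}schema#name", name)
        # The foreign key becomes a link to another thing's URI, not a bare number.
        yield (subject, f"{BASE}schema#worksIn", f"{BASE}department/{dept_id}")
    for dept_id, title in conn.execute("SELECT id, title FROM department"):
        yield (f"{BASE}department/{dept_id}", f"{BASE}schema#title", title)

# A throwaway in-memory database standing in for an existing system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT,
                           dept_id INTEGER REFERENCES department(id));
    INSERT INTO department VALUES (7, 'Research');
    INSERT INTO employee VALUES (1, 'Alice', 7);
""")
triples = list(rdf_view(conn))
```

The database itself is untouched; the view is computed on request, which is what lets the existing system keep running.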
SPARQL access to data
Query interface
- Use the same mapping as the RDF views
- XSLT or XQuery scripts accessing XML DB
- Looking up a URI for something gives you info about it
- Relations with other things expressed using their URIs
SPARQL - the universal query service
- How many Web Services ask for info?
- Each can be SPARQL
- Extensible - without re-architecting
- Independent of database schema/XML schema
- Combinable
- Optimizable - mapping, caching, federating
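A rough sketch of the idea underneath a SPARQL query: matching a list of triple patterns against a graph and returning variable bindings. This is not a SPARQL engine and not SPARQL syntax; the `match` function, the `?`-prefixed variable convention as used here, and the `example.org` data are assumptions for illustration. In SPARQL the same question would be written roughly as `SELECT ?person ?title WHERE { ?person ex:worksIn ?dept . ?dept ex:title ?title }`.

```python
def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(graph, rest, candidate)

EX = "http://example.org/"  # hypothetical namespace
graph = {
    (EX + "alice", EX + "worksIn", EX + "research"),
    (EX + "bob", EX + "worksIn", EX + "sales"),
    (EX + "research", EX + "title", "Research"),
}
query = [("?person", EX + "worksIn", "?dept"), ("?dept", EX + "title", "?title")]
results = list(match(graph, query))
```

The query mentions only URIs and patterns, never table or element names, which is the sense in which such a service is independent of the database or XML schema behind it.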
Clients of the RDF bus
New data applications can be built on top of the RDF bus, for example:
Components: Adapting random files
Keep your existing systems running - adapt them
Components: Triple store
Virtual servers actually figure stuff out as well as look up data
Adapting SQL Databases
Keep your existing systems running - adapt them
Adapting XML
Remember: RDF on an HTTP server can always be virtual
Adapting XML: GRDDL
Remember: RDF on an HTTP server can always be virtual
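GRDDL associates an XML document with a transformation, typically XSLT, that extracts RDF from it. The sketch below swaps in Python's standard-library `xml.etree.ElementTree` to show the same "virtual RDF" idea; the catalog format, the `BASE` URI space, and `xml_to_triples` are hypothetical.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/catalog/"  # hypothetical URI space

XML_DOC = """
<catalog>
  <book id="b1"><title>Weaving the Web</title><author ref="p1"/></book>
  <person id="p1"><name>Tim Berners-Lee</name></person>
</catalog>
"""

def xml_to_triples(xml_text):
    """Translate the hypothetical catalog format into triples on request."""
    root = ET.fromstring(xml_text)
    triples = []
    for book in root.findall("book"):
        subject = BASE + book.get("id")
        triples.append((subject, BASE + "terms#title", book.findtext("title")))
        # An ID reference inside the XML becomes a link to the person's URI.
        author_uri = BASE + book.find("author").get("ref")
        triples.append((subject, BASE + "terms#author", author_uri))
    for person in root.findall("person"):
        subject = BASE + person.get("id")
        triples.append((subject, BASE + "terms#name", person.findtext("name")))
    return triples

triples = xml_to_triples(XML_DOC)
```

The XML stays the stored form; the triples exist only when someone asks for them.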
Components: Smart servers
Virtual servers actually figure stuff out as well as look up data
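One way to read "figure stuff out": a lookup that returns stored triples plus triples derived by a rule. The sketch below assumes a single transitivity rule over a hypothetical `partOf` property; the URIs and the `lookup` function are illustrative, not from the talk.

```python
EX = "http://example.org/"  # hypothetical namespace
PART_OF = EX + "partOf"

stored = {
    (EX + "csail", PART_OF, EX + "mit"),
    (EX + "dig", PART_OF, EX + "csail"),
}

def lookup(subject):
    """Answer a lookup with stored triples plus ones derived by transitivity."""
    found = {t for t in stored if t[0] == subject}
    frontier = [o for _, p, o in found if p == PART_OF]
    while frontier:
        node = frontier.pop()
        for s, p, o in stored:
            # Rule: if subject partOf node and node partOf o, then subject partOf o.
            if s == node and p == PART_OF and (subject, PART_OF, o) not in found:
                found.add((subject, PART_OF, o))
                frontier.append(o)
    return found

answer = lookup(EX + "dig")
```

The client sees one set of triples and cannot tell which were stored and which were computed when the request arrived.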
Communities and Vocabularies
Universal WWW must include communities on many scales
- Communities communicate with languages
- Languages form barriers
- Barriers are essential to the community
- Communicating with other communities is expensive
- Developing wider languages is expensive
- For data web, communities map to ontologies
Applications connected by concepts
Fractal Web of concepts
- Across boundaries of scale -- personal, group, global
- Varying access levels
- Tension between local and global standards
- Society is a fractal tangle, so the SW must be too.
- Personal interactions on multiple scales
The semantic web is about allowing data systems to change by evolution, not revolution
Total Cost of Ontologies (TCO)
Assume :-) ontologies evenly spread across orders of magnitude;
committee size as log(community), time as committee^2,
cost shared across community.
Scale | Eg | Committee size | Cost per ontology (weeks) | My share of cost
0 | Me | 1 | 1 | 1
10 | My team | 4 | 16 | 1.6
100 | Group | 7 | 49 | 0.49
1000 | | 10 | 100 | 0.10
10k | Enterprise | 13 | 169 | 0.017
100k | Business area | 16 | 256 | 0.0026
1M | | 19 | 361 | 0.00036
10M | | 22 | 484 | 0.000048
100M | National, State | 25 | 625 | 0.000006
1G | EU, US | 28 | 784 | 0.000001
10G | Planet | 31 | 961 | 0.000000
Total cost of 10 ontologies: 3.2 weeks. Serious project: 30 ontologies,
TCO = 10 weeks.
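The arithmetic behind the table can be reproduced in a few lines. This sketch assumes that "committee size as log(community)" means 1 + 3 * log10(community), which is inferred from the committee sizes listed and is not stated in the talk, and it treats the first row as a community of one person. The function names are hypothetical.

```python
import math

def committee_size(community):
    # Assumed formula: 1 + 3 * log10(n), inferred from the table (1, 4, 7, ... 31).
    return 1 + 3 * round(math.log10(community))

def cost_weeks(community):
    # "time as committee^2"
    return committee_size(community) ** 2

def my_share(community):
    # "cost shared across community"
    return cost_weeks(community) / community

scales = [10 ** k for k in range(11)]  # 1 person up to 10G
total = sum(my_share(n) for n in scales)

for n in scales:
    print(f"{n:>12} {committee_size(n):>3} {cost_weeks(n):>4} {my_share(n):.6f}")
print(f"total share: {total:.2f} weeks")
```

Summing one ontology per row gives a little over 3.2 weeks, in line with the slide's total, and almost all of it comes from the three smallest scales.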
Lesson:
Do your bit.
Others will do theirs.
Thank those who do working groups!
Adopting standards
 | Costs | Benefit (standard fails) | Benefit (standard succeeds)
Plan A | Standards group participation; product transition; standard promotion (?) | Conformance to a sidelined standard | Market size jump; market share jump
Plan B | Normal product development; normal product promotion | | Market share loss; catch-up cost
Often, participation carries the least risk
Timing strawman
- 2006. Have your data modelled. Build RDF and SPARQL access for analytics and CEO questions as a start.
- 2007. Build added value on top of your data web:
  - Analysis - using rules, programs on RDF API
  - Visualization
  - Sanity checks - OWL, rule-based, etc.
  - Offer filtered RDF data to partners
- 2008. Require your partners to give you RDF for data which is important to the relationship
- 2009. Build new applications on top of semantic web base
- 2011. Start to replace legacy systems with semantic-web native systems
Good news
- Logic discussions are getting done (OWL, SPARQL, RIF,...)
- Life sciences is an incubator community
- TCO is finite
- Startups
- Major vendors are moving it into products
- We have some ideas about actually making a user interface!
Future: Policy aware, Transparent web of data
- Much more data integration power
- much more policy awareness required
- systems with transparency ("Why? How do you know that?")
- Computer analysis of data much more powerful
- work needed on emergent properties, stability, and in general the relationship of micro-rules to macro-phenomena
- New dreams, new systems, new rules
- New field: Web Science
- MIT CSAIL Decentralized Information Group
Thank You
More: w3.org