Questions (and Answers) on the Semantic Web
W3C and the Semantic Web, June 20, 2005, Wien, Austria
Ivan Herman, W3C
Usually, one gives an introduction to SW…
…and then, questions are asked
But this audience already knows the introduction…
…so let us move to questions right away!
So…
Questions?
Is the Semantic Web AI on the Web?
NO!!!
RDF and OWL are relatively simple things (compared to AI, that is…)
They offer:
a simple way to express and store metadata
a way to “structure” and characterize the terms
means to make some inference within a restricted framework
and that is it!
One goal in SW is to keep things relatively simple and not necessarily seek absolute completeness (the famous 80/20 rule…)
RDF (Resource Description Framework)
Remember: RDF is a set of statements that can be modeled (mathematically) with:
Resources: an element, a URI, a literal, …
Properties: directed relations between two resources
Statements: “triples” of two resources bound by a property
usual terminology: (s,p,o) for subject, property, object
RDF is a general model for such statements
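A minimal sketch of this model in Python, assuming plain tuples stand in for statements. The example.org URIs, the property names and the `objects` helper are illustrative assumptions, not part of any RDF library.

```python
# A statement is an (s, p, o) triple; an RDF graph is a set of such triples.
# All URIs below are made-up examples.
EX = "http://example.org/"

graph = {
    (EX + "ivan", EX + "worksFor", EX + "w3c"),
    (EX + "ivan", EX + "name", "Ivan Herman"),  # the object is a literal here
    (EX + "w3c", EX + "name", "W3C"),
}

def objects(graph, subject, prop):
    """Return every object o such that (subject, prop, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == prop}

print(objects(graph, EX + "ivan", EX + "name"))
```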
OWL (Web Ontology Language)
OWL refines the usage of RDF by:
defining the terminology used in a specific context (ontologies)
imposing constraints on properties (e.g., cardinality constraints)
characterizing the logical characteristics of properties (e.g., transitivity, functionality)
defining the equivalence of terms across ontologies
etc.
(to be precise: these are done by RDFS+OWL)
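As a rough illustration of what a logical characteristic such as transitivity buys, the sketch below derives new statements from a property declared transitive. It is a hand-written toy, not an OWL reasoner; the `locatedIn` property and the function name are assumptions made for the example.

```python
def transitive_closure(triples, prop):
    """Add (a, prop, c) whenever (a, prop, b) and (b, prop, c) both hold."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new statement can be derived
        changed = False
        pairs = [(s, o) for (s, p, o) in inferred if p == prop]
        for a, b in pairs:
            for b2, c in pairs:
                if b == b2 and (a, prop, c) not in inferred:
                    inferred.add((a, prop, c))
                    changed = True
    return inferred

facts = {
    ("Vienna", "locatedIn", "Austria"),
    ("Austria", "locatedIn", "Europe"),
}
closed = transitive_closure(facts, "locatedIn")
print(("Vienna", "locatedIn", "Europe") in closed)
```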
OWL and Logic
OWL expresses a small subset of First Order Logic
it has a “structure” (class hierarchies, properties, datatypes…),
and “axioms” can be stated within that structure only
i.e., OWL uses FOL to describe “traditional” ontology concepts … but it is not a general logic system per se!
Inference based on OWL is within this framework only
it seems modest, but has proved to be remarkably useful…
Some things are missing
There are lots of things RDF/OWL cannot express, e.g.:
the “uncle” relationship: ∀x,z: ((∃y: (y parent x) ∧ (y brother z)) ⇒ (z uncle x))
temporal and spatial reasoning
fuzzy logic
…
Some of these may find their way, eventually, to SW (see later)
But AI is more…
(Some would say: if something is not yet solved in Computer Science, it is AI…)
More seriously, there are things that are not part of the SW:
associative thinking
recognition of images, text content, gestures, …
complex decision procedures (like Deep Blue…)
etc.
Just as Prolog is not AI but merely a useful tool for it, SW might be a good tool for AI
Where is the “Web” in SW?
The “Web” is in the URI-s!
On the SW, resources are identified by URI-s, e.g.:
URL-s
http://www.ivan-herman.net
ftp://ftp.cwi.nl
URN-s
urn:ISBN:0-395-36341-1
urn:lsid:ensembl.org:homosapiens_gene:ensg00000002016
Anybody can create metadata on any resource on the Web
It becomes easy to merge and share metadata and ontologies
some would say: “if it is not shared, it is not on the Semantic Web…”
URI-s ground RDF into the Web
Related Question…
Q: People have misused HTML’s meta elements… why would that be different?
A: The meta elements are in the HTML source
i.e., only the authors can set them
on the SW, anybody can define metadata
so one can get around misuse…
Isn’t the RDF Model way too complex?
(look how complex RDF/XML is …)
RDF is a graph!
An (s,p,o) triple can be viewed as a labelled edge in a graph
i.e., a set of RDF statements is a directed, labelled graph
both “objects” and “subjects” are the graph nodes
“properties” are the edges
the formal semantics of RDF is also described using graphs
One should “think” in terms of graphs, and…
…RDF/XML is only a tool for practical usage!
RDF authoring tools often work with graphs, too (XML is done “behind the scenes”)
If one thinks in graphs, things become simple!
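The graph view can be sketched in a few lines of Python, assuming triples are plain tuples. The short node and property names are placeholders, and `to_graph` and `nodes` are hypothetical helpers written for this illustration.

```python
from collections import defaultdict

triples = [
    ("doc", "creator", "ivan"),
    ("doc", "title", "Questions on the Semantic Web"),
    ("ivan", "worksFor", "w3c"),
]

def to_graph(triples):
    """Map each subject node to its outgoing (edge label, target node) pairs."""
    edges = defaultdict(list)
    for s, p, o in triples:
        edges[s].append((p, o))
    return dict(edges)

def nodes(triples):
    """Subjects and objects are both nodes; properties are only edge labels."""
    return {s for s, _, _ in triples} | {o for _, _, o in triples}

g = to_graph(triples)
print(g["doc"])
```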
RDF/XML has its Problems
RDF/XML was developed in the “prehistory” of XML
e.g., even namespaces did not exist!
Coordination was not perfect, leading to problems
the syntax cannot be checked with XML DTD-s
XML schemas are also a problem
encoding is verbose and complex (simplifications lead to confusions…)
but there is too much legacy code
Don’t be influenced (and set back…) by the XML format
the important point is the model; XML is just syntax
other “serialization” methods may come to the fore
Other Encoding Examples…
Turtle, N3, N-Triples (variants of one another):
:object :pred [:pred2 :val1; :pred3 :val2; ]
<triple>
<subject uri="..."/>
<predicate uri="..."/>
<object>A Literal</object>
</triple>
Class(animate)
Class(animateMotion)
Class(animationEntity complete
unionOf(animate animateMotion …)
)
Again: these are all just syntactic sugar!
Why should I use RDF?
(Couldn’t I simply use XML with XML Schema instead?)
(or: Couldn’t I simply use a relational database instead?)
It Depends…
XML’s model is
a tree, i.e., a strong hierarchy
applications may rely on hierarchy position (e.g., li in HTML)
relatively simple syntax and structure
not easy to combine trees
RDF’s model is
a loose collection of relations
applications may do “database”-like search
not easy to recover hierarchy
easy to combine relations in one big collection (great for the integration of heterogeneous information)
RDF’s Force is its Flexibility
If you want to modify your XML structure:
you have to modify your DTD or Schema…
you may not have access and/or permission to those…
tools depending on the hierarchy (e.g., XSLT) might go wrong…
Similar problems with a DBMS:
you have to modify the database record definition
you may not have the right to do so…
In the triple store model you just merge…
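To make the "just merge" point concrete, here is a small sketch that assumes a triple store is nothing more than a set of tuples. The `ex:` properties and their values are invented for the example; only the ISBN URN comes from the slides.

```python
BOOK = "urn:ISBN:0-395-36341-1"

# Two sources describe the same resource independently (all values are made up).
library = {
    (BOOK, "rdf:type", "ex:Book"),
    (BOOK, "ex:shelf", "A-12"),
}
shop = {
    (BOOK, "rdf:type", "ex:Book"),
    (BOOK, "ex:price", "20 EUR"),
}

# Merging is a set union: there is no schema to change, and duplicates collapse.
merged = library | shop

def describe(graph, resource):
    """Return the (property, value) pairs known about one resource, sorted."""
    return sorted((p, o) for (s, p, o) in graph if s == resource)

print(describe(merged, BOOK))
```

Neither source had to know about the other's properties; the shared URI is what ties the statements together.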
Extra Bonus: OWL
You may not use OWL reasoning yet…
…but you may in future, RDF leaves the door open!
Finding New Relationships
RDF(+OWL) helps in finding new relationships
e.g., in Life Sciences:
most of the drug experiments are unsuccessful
but the information from each experiment may be valuable
by “binding” this information new insights can be gained
(currently, life sciences are very excited by the prospects of the Semantic Web!)
Sharing and aggregation of data becomes easier
may be decisive for future R&D, for example
great tool for general community building
But... RDF Does Not Make XML Obsolete!
Do not try to describe an HTML page in terms of triples:
it is technically doable…
but things would be much more complicated!
I.e.: the choice depends on what you want to do!
With huge ontologies on the Web, does this scale?
It May Be a Problem, But…
Yes, reasoning over huge ontologies may be a problem
combination of ontologies may lead to this
DL systems have been shown to work for ≈100k concepts already
albeit with a simple structure
there are already applications with large ontologies (see later)
lots of R&D is happening here… but it is indeed still a challenge
But… “a little semantics can take you far” (Jim Hendler)
i.e., small OWL ontologies may lead to useful applications
applications can also be developed with “ontology islands”
loosely connected ontologies bound by an application…
… via, e.g., a P2P architecture
(e.g., M.-C. Rousset’s paper at ISWC2004)
You Can Also Choose
OWL provides layers, namely Lite, DL, Full:
increasing expressiveness, but also increasing complexity
choose what is right for you!
(new layers might come to the fore in future)
Applications may add their own modules to a general reasoner:
the extra module “knows” about the application’s specificities
can complement the general reasoner
But: you are not obliged to use OWL to be a good SW citizen!
see CC/PP, RSS, thesauri with SKOS, …
Where do the metadata and ontologies come from?
(Should we really expect the author to type in all this metadata?)
It May Be Around Already…
Part of the metadata information is present in tools … but thrown away at output
e.g., a business chart can be generated by a tool…
…it “knows” the structure, the classification, etc. of the chart
…usually, this information is lost
Storing it in metadata would be easy!
“SW-aware” authoring tools will be of a great help (e.g., Adobe’s XMP)
RDF Can Also Be Generated
There might be conventions to use in XHTML…
e.g., by using class names
… and then generate RDF automatically (e.g., via an XSLT script)
there are tools and developments in this direction, like GRDDL
An interesting direction is in XHTML2 :
it has two “metadata” modules
the metadata can then be extracted via a tool, e.g., to add Dublin Core metadata to a document:
<span property="dc:date">March 23, 2004</span>
<span property="dc:title">High-tech rollers hit casino for £1.3m</span>
By <span property="dc:creator">Steve Bird</span> …
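The extraction step could look roughly like the following sketch, which uses Python's standard `html.parser` rather than XSLT or GRDDL. It assumes flat, non-nested elements carrying a `property` attribute, as in the snippet above; the `PropertyExtractor` class and the `this-document` subject are hypothetical names chosen for the example.

```python
from html.parser import HTMLParser

class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property of the element being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self._text = []

    def handle_data(self, data):
        if self._current is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Nested elements are not handled: the first end tag closes the property.
        if self._current is not None:
            self.pairs.append((self._current, "".join(self._text).strip()))
            self._current = None

markup = """
<span property="dc:date">March 23, 2004</span>
<span property="dc:title">High-tech rollers hit casino for £1.3m</span>
By <span property="dc:creator">Steve Bird</span>
"""

extractor = PropertyExtractor()
extractor.feed(markup)
extractor.close()

# Turn the pairs into triples about a placeholder subject.
triples = [("this-document", prop, value) for prop, value in extractor.pairs]
for triple in triples:
    print(triple)
```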
Ontology Development
The hard work is to create the ontologies in general
requires a good knowledge of the area to be described
some communities have good expertise already (e.g., librarians)
OWL is just a tool to formalize ontologies
Large scale ontologies are often developed in a community process
leading to versioning issues, too
OWL includes predicates for versioning, deprecation, “same-ness”, …
Sharing ontologies may be vital in the process
saves the energy of re-inventing the wheel…
There is also R&D in generating them from a corpus of data
still mostly a research subject
Isn't This Research Only?
(or: does this have any industrial relevance whatsoever?)
Not Any More…
SW has indeed a strong foundation in research results…
…but we see more and more companies embracing it!
Remember:
the Web was born at CERN…
…was first picked up by high energy physicists…
…then by academia at large…
…then by small businesses and start-ups…
“big business” came only later!
network effect kicked in early…
Semantic Web is now at #4, and moving to #5!
Lots of Tools
(Graphical) Editors:
Programming Environments:
Jena (for Java, includes OWL reasoning),
RDFLib (for Python),
Redland (in C, with interfaces to Tcl, Java,
PHP, Perl, Python, …), SWI-Prolog, IBM’s Semantic Toolkit, …
Triple based database systems:
RDF and OWL validators:
You can always start looking at W3C’s RDF developer site
“You can take stuff from the shelf and put a prototype out fast!”
SW Applications
Large number of applications emerge:
first applications were RDF only…
…but recent ones use ontologies, too
huge number of ontologies exist already, with proprietary formats
converting them to RDF/OWL is a significant task
(but there are converters)
Most applications are still “centralized”, not many decentralized applications yet
For further examples, see, e.g., the SW Technology Conference
not a scientific conference, but commercial people making real money!
Data integration
Semantic integration of corporate resources or different databases
RDF/RDFS/OWL based vocabularies as an “interlingua” among system components
(early experimentation at Boeing, see, e.g., a WWW11 paper )
Similar approaches: Sculpteur project, MITRE Corp., MuseoSuomi, …
There are companies specializing in the area
Oracle's Network Data Model
An RDF data model to store RDF statements
Java Ntriple2NDM converter for loading existing RDF data
An RDF_MATCH
function which can be used in SQL to find graph patterns (similar to SPARQL)
Will be released as part of Oracle Database 10.2 later this year
Vodafone's Live Mobile Portal
Search application (e.g. ringtone, game, picture) using RDF
better search: page views per download decreased 50%
increased revenue: ringtone up 20% in 2 months
RDF was key factor in making this possible
Sun's SwordFish
Sun provides assisted support for its products, handbooks, etc
Public queries go through an internal RDF engine for, e.g.:
Nokia has a somewhat similar support portal
IBM – Life Sciences and Semantic Web
IBM Internet Technology Group
focusing on general infrastructure for Semantic Web applications
Develop user-centered tools
power of Semantic Web technologies, but hide the underlying complexity
Integrated tool kit (storage, query, editing, annotation, visualization)
Common representation (RDF), unique ID-s (LSID), collaboration, …
Focus on Life Sciences (for now)
but a potential for transforming the scientific research process
Adobe's XMP
Adobe’s tool to add RDF-based metadata to all their file formats
used for more effective organization
supported in Adobe Creative Suite (over 700K desktops!)
support from 30+ major asset management vendors
The tool is available for all!
Does the SW Replace Web Services?
SW and WS are Complementary
Two facets of machine-to-machine communication
service based (“Web of applications”)
metadata based (“Web of data”)
A widely deployed Web Services infrastructure may be the most compelling business case for the Semantic Web
The synergy of Semantic Web and Web Services will hugely benefit the wide deployment of both!
Examples for Potential Synergies
Semantic Web based search engines for Web Services
search based on complex constraints
e.g., “find the most elegant Schrödinger equation solver”
“Match-making”
i.e., combining various services based on their semantics
Examples for Potential Synergies (cont)
RDF Database services with complex Queries
queries and query results transmitted in, e.g., SOAP
query facilities described in WSDL
Ontology services
“provide a Web Service to make logical deductions on my behalf”
(e.g., on complex metadata with an ontology)
find and manage equivalences
make logical deduction of terms
check SW description for validity
etc
SW-WS Synergy Example
Baby CareLink
centre of information for the treatment of premature babies
provides an OWL service as a Web Service
combines disparate vocabularies like medical, insurance, etc
users can add new entries to ontologies
complex questions can be asked through the service
Convergence (at W3C and Elsewhere)
Lots of discussions on convergence at W3C
Both areas are represented at W3C, too
mapping of WSDL2.0 to RDF
Web Choreography development in terms of RDF
initiatives already exist, e.g., the OWL-S Member Submission
discussions on “WS Features and Properties”
there is a “Semantic Web Services” Interest Group
Workshop on “Frameworks for Semantics in Web Services” in June 2005
Discussions on UDDI being expressed in RDF
…
Are we done?
Not Yet…
The “core” infrastructure is around
New technical issues come up:
querying RDF data
specialized vocabularies (e.g., SKOS)
rules
…
There is also a need for a very strong outreach:
outreach to user communities (life sciences, geospatial information systems, libraries and digital repositories, …)
intersection of SW with other technologies (Web Services, Privacy issues, …)
There is a separate Working Group on “Deployment and Best Practices” (see Thomas Baker’s presentation later today, including SKOS)
Querying RDF Graphs
In practice, complex queries into the RDF data are necessary
The fundamental idea: use graph patterns to define a subgraph:
a pattern contains unbound symbols
by binding the symbols, subgraphs of the RDF graph may be matched
if there is such a match, the query returns the bound resources or a subgraph
This is how SPARQL (Query Language for RDF) is defined
based on similar systems that already exist, e.g., in Jena
is a programming-language-independent query language
still in a working draft phase (Recommendation in 2006?)
Simple SPARQL Example
SELECT ?cat ?val
WHERE { ?x rdf:value ?val. ?x category ?cat }
Returns:
[["Total Members",100],["Total Members",200],…,["Full Members",10],…]
Note the role of ?x: it helps define the pattern, but is not returned
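The matching idea behind this query can be sketched in plain Python. This is not a SPARQL engine, only a toy pattern matcher under the assumption that triples are tuples and that terms starting with `?` are unbound symbols. The `_:a` and `_:b` node names and the `match` and `query` helpers are made up for the illustration.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; None on failure."""
    binding = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # unbound symbol (variable)
            if binding.setdefault(term, value) != value:
                return None       # already bound to something else
        elif term != value:       # a constant must match exactly
            return None
    return binding

def query(graph, patterns, select):
    """Match every pattern against the graph, then project the selected variables."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for b in bindings:
            for t in graph:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    return [[b[v] for v in select] for b in bindings]

graph = [
    ("_:a", "rdf:value", 100), ("_:a", "category", "Total Members"),
    ("_:b", "rdf:value", 10), ("_:b", "category", "Full Members"),
]

result = query(
    graph,
    [("?x", "rdf:value", "?val"), ("?x", "category", "?cat")],
    ["?cat", "?val"],
)
print(result)
```

As in the SPARQL version, `?x` only ties the two patterns to the same node and is dropped from the result.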
Other SPARQL Features
Add functional constraints to pattern matching
Define optional patterns
Return a full subgraph (instead of a list of bound variables)
Construct a graph combining a separate pattern and the query results
Use datatypes and/or language tags when matching a pattern
…
Remember: SPARQL is still evolving!
SPARQL Usage in Practice
Locally, i.e., bound to a programming environment like RDFLib or Jena
details are language dependent
Remotely, i.e., over the network, possibly connecting to a database
very important: there are a growing number of RDF repositories…
separate documents define the protocol and the result format
return is in XML: can be fed, e.g., into XSLT for direct display
An application pattern evolves: use (XHTML) forms to create a SPARQL Query to a database and display the result in HTML (e.g., W3C’s Talk database)
There are lots of SPARQL implementations already!
Rules
OWL can be used for simple inferences
Applications may want to express domain-specific knowledge, e.g.:
(prem-1 ∧ prem-2 ∧ …) ⇒ (concl-1 ∧ concl-2 ∧ …)
e.g.: for any «X», «Y» and «Z»: “if «Y» is a parent of «X», and «Z» is a brother of «Y» then «Z» is the uncle of «X»”
using a logic formalism (Horn clauses): ∀x,z: ((∃y: (y parent x) ∧ (y brother z)) ⇒ (z uncle x))
Lots of research has gone into extending RDF/OWL (Metalog, RuleML, SWRL, cwm, …)
note: cwm, for example, defines Horn predicates in terms of graph patterns
there is a connection to SPARQL here…
Interest from rule system vendors, financial services, business rules, …
the community seems to need some sort of a rule language
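The “uncle” rule above can be applied to triples with a few lines of forward chaining. The sketch follows the formula as written, i.e., (y parent x) ∧ (y brother z) ⇒ (z uncle x), and does not use any of the rule languages listed; the names and the `apply_uncle_rule` function are assumptions for the example.

```python
def apply_uncle_rule(triples):
    """Forward-chain once: (y parent x) and (y brother z) imply (z uncle x)."""
    inferred = set()
    for (y1, p1, x) in triples:
        if p1 != "parent":
            continue
        for (y2, p2, z) in triples:
            if p2 == "brother" and y2 == y1:  # same y in both premises
                inferred.add((z, "uncle", x))
    return triples | inferred

facts = {
    ("anna", "parent", "ben"),    # premise 1: (y parent x)
    ("anna", "brother", "carl"),  # premise 2: (y brother z)
}
result = apply_uncle_rule(facts)
print(("carl", "uncle", "ben") in result)
```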
W3C’s Rules Workshop
W3C held a Workshop in April 2005
Lots of issues identified:
relationship to RDFS+OWL: Side by side? On the top? Replace? Independent?
open World vs. closed World assumption
“if something cannot be proved, we do not know” vs.
“if something cannot be proved, it is false”
OWL relies on the former, rule languages usually on the latter…
uncertainty/probabilistic reasoning, fuzzy logic
syntax issues (XML? RDF? Abstract Syntax?)
declarative rules, vs. rules with functional/programmable actions
There is a public archive on further discussions
W3C may work on an “Activity Proposal”
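The open World vs. closed World difference listed among the issues can be shown with a deliberately simplistic sketch, in which “proved” just means “present in the set of facts”. The function names and the use of `None` for “we do not know” are assumptions for this illustration.

```python
facts = {("ivan", "worksFor", "w3c")}

def closed_world(facts, statement):
    """Closed world: what cannot be proved is taken to be false."""
    return statement in facts

def open_world(facts, statement):
    """Open world: what cannot be proved is unknown, signalled here by None."""
    return True if statement in facts else None

unknown = ("ivan", "worksFor", "cwi")
print(closed_world(facts, unknown))  # treated as false
print(open_world(facts, unknown))    # treated as unknown
```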
Trust
Can I trust metadata on the Web?
is the author who he/she claims to be? Can I check the credentials?
can I trust the inference engine?
etc.
Some of the basic building blocks are available (e.g., XML Signature/Encryption) but much is missing, e.g.:
how to “express” trust? (e.g., trust in context.)
how to “name” a full graph
a “canonical” form of triples (in RDF/XML or other) (necessary for unambiguous signatures)
exhaustive tests for inference engines
protocols to check, for example, a signature
It is on the “future” stack of W3C and the SW Community …
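Why a canonical form matters for signatures can be seen in a small sketch: the same set of triples, stated in a different order, must yield the same bytes before it can be hashed or signed. The code below handles only the easy case and is not a standard algorithm; it assumes ground triples (no blank nodes, which are the hard part), and `canonical_digest` is a hypothetical helper.

```python
import hashlib

def canonical_digest(triples):
    """Hash a set of ground triples independently of the order they were stated in."""
    # One line per triple, sorted, so that statement order cannot change the bytes.
    lines = sorted(f"{s} {p} {o} ." for s, p, o in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

a = [("ex:doc", "dc:creator", "Steve Bird"), ("ex:doc", "dc:date", "March 23, 2004")]
b = list(reversed(a))  # same statements, different order

digest_a = canonical_digest(a)
digest_b = canonical_digest(b)
print(digest_a == digest_b)
```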
A Number of Other Issues…
Lots of R&D is going on:
improve the inference algorithms and implementations
improve scalability, reasoning with OWL Full
temporal & spatial reasoning, fuzzy logic
better modularization (import or refer to part of ontologies)
procedural attachments
…
This mostly happens outside of W3C, though
W3C is not a research entity…
Now For Real…
Other Questions?