What will I talk about?
- The history of the Semantic Web goes back several years now
- It is worth looking at what has been achieved, where we are, and where we might be going…
Let us look at some results first!
The basics: RDF(S)
- We have had a solid specification since 2004: well-defined (formal) semantics, a clear RDF/XML syntax
- Lots of tools are available; they are listed on W3C’s wiki:
- RDF programming environments for 14+ languages, including C, C++, Python, Java, Javascript, Ruby, PHP,… (no Cobol or Ada yet!)
- 13+ Triple Stores, ie, database systems to store (sometimes huge!) datasets
- converters to and from RDF
- etc
- Some of the tools are Open Source, some are not; some are very mature, some are not: it is the usual picture of software tools, nothing special any more!
- Anybody can start developing RDF-based applications today
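To make "anybody can start" concrete: at its core, an RDF graph is just a set of (subject, predicate, object) triples. The minimal Python sketch below represents URIs and literals as plain strings; a real toolkit distinguishes URIs, literals and blank nodes. The `Graph` class is hypothetical and is not one of the listed tools. The sample data comes from the FOAF output shown later in this talk.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class Graph:
    """Hypothetical minimal RDF graph: a set of (subject, predicate, object) triples."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield every triple that fits the pattern; None acts as a wildcard."""
        for t in self.triples:
            if ((s is None or t[0] == s) and (p is None or t[1] == p)
                    and (o is None or t[2] == o)):
                yield t


FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.w3.org/People/Ivan/#me"

g = Graph()
g.add(ME, FOAF + "name", "Ivan Herman")
g.add(ME, FOAF + "mbox", "mailto:ivan@w3.org")

names = [o for (_, _, o) in g.match(p=FOAF + "name")]
print(names)  # ['Ivan Herman']
```

Pattern matching with wildcards is the primitive that triple stores and query languages build on.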
The basics: RDF(S) (cont.)
- There are lots of tutorials, overviews, and books around
- again, some of them good, some of them bad, just as in any other area…
- Active developers’ communities
- Large datasets are accumulating. E.g.:
- Some measures claim that there are over 10⁷ Semantic Web documents… (ready to be integrated…)
Ontologies: OWL
- This is also a stable specification since 2004
- Separate layers have been defined, balancing expressiveness vs. implementability (OWL-Lite, OWL-DL, OWL-Full)
- Looking at the tool list on W3C’s wiki again:
- a number of programming environments (in Java, Prolog, …) include OWL reasoners
- there are also stand-alone reasoners (downloadable or on the Web)
- ontology editors come to the fore
- OWL-DL and OWL-Lite rely on Description Logic, i.e., they can use a large body of accumulated research knowledge
Ontologies
- Large ontologies are being developed (converted from other formats or defined in OWL)
- eClassOwl: eBusiness ontology for products and services, 75,000 classes and 5,500 properties
- the Gene Ontology: to describe gene and gene product attributes in any organism
- BioPAX, for biological pathway data
- UniProt: protein sequence and annotation terminology and data
Vocabularies
- There are also a number of “core vocabularies” (not necessarily OWL based)
- Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management
- FOAF: about people and their organizations
- DOAP: on the descriptions of software projects
- MusicBrainz: on the description of CDs, music tracks, …
- SIOC: Semantically-Interlinked Online Communities
- vCard in RDF
- …
- One should never forget: ontologies/vocabularies must be shared and reused!
A mix of vocabularies/ontologies (from life sciences)…
Querying RDF: SPARQL
- Querying RDF graphs becomes essential
- SPARQL is almost here
- query language based on graph patterns
- there is also a protocol layer to use SPARQL over, eg, HTTP
- hopefully a Recommendation by the end of 2007
- There are a number of implementations already
- There are also SPARQL “endpoints” on the Web:
- send a query and a reference to data over HTTP GET, receive the result in XML or JSON
- applications may not need any direct RDF programming any more, just a SPARQL endpoint
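Here is a sketch of what such a request can look like. It assumes the SPARQL Protocol's `query` and `default-graph-uri` parameters and content negotiation through the `Accept` header. The endpoint and data URLs are hypothetical. The function only builds the request; `urllib.request.urlopen(req)` would actually send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def sparql_get_request(endpoint, query, default_graph=None, want_json=True):
    """Build (but do not send) an HTTP GET request for a SPARQL endpoint."""
    params = [("query", query)]
    if default_graph is not None:
        # the "reference to data": the graph the query should run against
        params.append(("default-graph-uri", default_graph))
    accept = ("application/sparql-results+json" if want_json
              else "application/sparql-results+xml")
    return Request(endpoint + "?" + urlencode(params),
                   headers={"Accept": accept}, method="GET")


QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10"
req = sparql_get_request("http://example.org/sparql", QUERY,
                         default_graph="http://example.org/data.rdf")
print(req.full_url)
```

Everything the client needs fits in one URL plus one header, which is why a plain HTTP library can stand in for RDF programming.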
SPARQL as the only interface to RDF data?
SELECT ?translator ?translationTitle ?originalTitle ?originalDate
FROM <http://…/Translations.rdf>
FROM <http://…/tr.rdf>
…
WHERE {
?trans rdf:type trans:Translation;
trans:translationFrom ?orig;
trans:translator [ contact:fullName ?translator ];
dc:language "fr";
dc:title ?translationTitle.
?orig rdf:type rec:REC;
dc:date ?originalDate;
dc:title ?originalTitle.
}
ORDER BY ?translator ?originalDate
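The `WHERE` clause above is a basic graph pattern: each triple pattern is matched against the data, and the resulting variable bindings are joined. The Python sketch below makes several simplifying assumptions:

- the dataset is a toy in-memory set with hypothetical `ex:` resources;
- prefixed names are kept as plain strings (no prefix expansion, no `FROM` handling);
- the blank node `[ contact:fullName ?translator ]` becomes an ordinary variable `?t`, which matches how blank nodes behave inside query patterns.

```python
def is_var(term):
    return term.startswith("?")


def extend(binding, pattern, triple):
    """Return binding extended so that pattern matches triple, or None on conflict."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate_bgp(patterns, triples):
    """Join the solutions of each triple pattern, one pattern at a time."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            for triple in triples:
                extended = extend(binding, pattern, triple)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


DATA = {  # hypothetical data
    ("ex:tr1", "rdf:type", "trans:Translation"),
    ("ex:tr1", "trans:translationFrom", "ex:rec1"),
    ("ex:tr1", "trans:translator", "ex:person1"),
    ("ex:tr1", "dc:language", "fr"),
    ("ex:tr1", "dc:title", "Titre (hypothetical)"),
    ("ex:tr2", "rdf:type", "trans:Translation"),
    ("ex:tr2", "trans:translationFrom", "ex:rec1"),
    ("ex:tr2", "trans:translator", "ex:person1"),
    ("ex:tr2", "dc:language", "de"),
    ("ex:tr2", "dc:title", "Titel (hypothetical)"),
    ("ex:person1", "contact:fullName", "Translator A"),
    ("ex:rec1", "rdf:type", "rec:REC"),
    ("ex:rec1", "dc:date", "2004-02-10"),
    ("ex:rec1", "dc:title", "Original title"),
}

PATTERNS = [
    ("?trans", "rdf:type", "trans:Translation"),
    ("?trans", "trans:translationFrom", "?orig"),
    ("?trans", "trans:translator", "?t"),
    ("?t", "contact:fullName", "?translator"),
    ("?trans", "dc:language", "fr"),
    ("?trans", "dc:title", "?translationTitle"),
    ("?orig", "rdf:type", "rec:REC"),
    ("?orig", "dc:date", "?originalDate"),
    ("?orig", "dc:title", "?originalTitle"),
]

SELECT = ["?translator", "?translationTitle", "?originalTitle", "?originalDate"]
rows = sorted(evaluate_bgp(PATTERNS, DATA),
              key=lambda b: (b["?translator"], b["?originalDate"]))  # ORDER BY
for row in rows:
    print([row[v] for v in SELECT])  # projection
```

The German translation `ex:tr2` is dropped by the `dc:language "fr"` pattern, just as in the original query.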
A word of warning on SPARQL…
- It is not a Recommendation yet
- New issues may pop up at the last moment via reviews
- a query language needs very precise semantics and that is not that easy
- Some features are missing
- control and/or description of the entailment regimes of the triple store (RDFS? OWL-DL? OWL-Lite?…)
- modify the triple store
- …
these are postponed to a next version…
Of course, not everything is so rosy…
- There are a number of open issues, problems to solve
- how to bind to different communities (e.g., the “digital library world”)
- how to get RDF data
- missing functionalities: rules, “light” ontologies, fuzzy reasoning, necessity to review RDF and OWL,…
- misconceptions, messaging problems
- need for more applications, deployment, acceptance
- etc
Simple Knowledge Organization System (SKOS)
- Goal: porting (“Webifying”) thesauri: representing and sharing classifications, glossaries, thesauri, etc, as developed in the “Print World”. For example:
- The system must be simple to allow for a quick port of traditional data
- This is where SKOS comes in
Example: Entries in a Glossary (1)
- “Assertion”
- “(i) Any expression which is claimed to be true. (ii) The act of claiming something to be true.”
- “Class”
- “A general concept, category or classification. Something used primarily to classify or categorize other things.”
- “Resource”
- “(i) An entity; anything in the universe. (ii) As a class name: the class of everything; the most inclusive category possible.”
(from the RDF Semantics Glossary)
Example: Entries in a Glossary (2)
Example: Taxonomy (1)
Illustrates “broader” and “narrower”
- General
- SemWeb
(From MortenF’s weblog categories. Note that the categorization is arbitrary!)
Example: Thesaurus (1)
- Term
- Economic cooperation
- Used For
- Economic co-operation
- Broader terms
- Economic policy
- Narrower terms
- Economic integration, European economic cooperation, …
- Related terms
- Interdependence
- Scope Note
- Includes cooperative measures in banking, trade, …
(from UK Archival Thesaurus)
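The thesaurus fields above map quite naturally onto SKOS properties. The sketch below assumes one plausible mapping:

- Term → `skos:prefLabel`
- Used For → `skos:altLabel`
- Broader/Narrower/Related terms → `skos:broader`/`skos:narrower`/`skos:related`
- Scope Note → `skos:scopeNote`

Concept URIs follow a hypothetical `ex:` scheme, and triples use prefixed names as plain strings.

```python
import re


def concept_uri(label):
    """Hypothetical URI scheme: 'Economic policy' -> 'ex:Economic_policy'."""
    return "ex:" + re.sub(r"\W+", "_", label.strip()).strip("_")


def entry_to_skos(entry):
    """Turn one thesaurus entry (a dict) into a list of SKOS triples."""
    c = concept_uri(entry["term"])
    triples = [(c, "rdf:type", "skos:Concept"),
               (c, "skos:prefLabel", entry["term"])]
    for alt in entry.get("used_for", []):
        triples.append((c, "skos:altLabel", alt))
    for key, prop in (("broader", "skos:broader"),
                      ("narrower", "skos:narrower"),
                      ("related", "skos:related")):
        for other in entry.get(key, []):
            triples.append((c, prop, concept_uri(other)))
    if "scope_note" in entry:
        triples.append((c, "skos:scopeNote", entry["scope_note"]))
    return triples


entry = {
    "term": "Economic cooperation",
    "used_for": ["Economic co-operation"],
    "broader": ["Economic policy"],
    "narrower": ["Economic integration", "European economic cooperation"],
    "related": ["Interdependence"],
    "scope_note": "Includes cooperative measures in banking, trade, …",
}
for t in entry_to_skos(entry):
    print(t)
```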
SKOS Core Overview
- Classes and Predicates:
- Basic description (Concept, ConceptScheme, …)
- Labelling (prefLabel, altLabel, prefSymbol, altSymbol, …)
- Documentation (definition, scopeNote, changeNote, …)
- Semantic relations (broader, narrower, related)
- Subject indexing (subject, isSubjectOf, …)
- Grouping (Collection, OrderedCollection, …)
- Subject Indicator (subjectIndicator)
- Some simple inference rules (a bit like the RDFS inference rules) to define some semantics
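As an illustration of such simple inference rules, the sketch below assumes two of them: `skos:broader` and `skos:narrower` are inverses, and `skos:related` is symmetric. Check the SKOS Core specification for the exact rule set. The concept names are hypothetical and reuse the weblog taxonomy example.

```python
INVERSES = {"skos:broader": "skos:narrower", "skos:narrower": "skos:broader"}
SYMMETRIC = {"skos:related"}


def skos_closure(triples):
    """Add the triples implied by the inverse and symmetry rules.

    One pass suffices: applying a rule twice only gives back the original triple.
    """
    inferred = set(triples)
    for s, p, o in triples:
        if p in INVERSES:
            inferred.add((o, INVERSES[p], s))
        elif p in SYMMETRIC:
            inferred.add((o, p, s))
    return inferred


data = {("ex:SemWeb", "skos:broader", "ex:General"),
        ("ex:SemWeb", "skos:related", "ex:RDF")}
closed = skos_closure(data)
print(sorted(closed))
```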
Why Have SKOS and OWL?
- OWL’s precision not always necessary or even appropriate
- “OWL a sledge hammer/SKOS a nutcracker”, or “OWL a Harley/SKOS a bike”
- complement each other, can be used in combination to optimize cost/benefit
- Role of SKOS is
- to bring the worlds of library classification and Web technology together
- to be simple and undemanding enough in terms of cost and required expertise
- A typical example: the W3C Glossary project stores all terms in SKOS (extracted from W3C documents)
- But we have heard about other usage at this conference already!
How to get RDF data?
- Of course, one could create RDF data manually…
- … but that is unrealistic on a large scale
- Goal is to generate RDF data automatically when possible and “fill in” by hand only when necessary
Data may be around already…
- Part of the (meta)data information is present in tools … but thrown away at output
- e.g., a business chart can be generated by a tool: it “knows” the structure, the
classification, etc. of the chart, but, usually, this information is lost
- storing it in web data would be easy!
- “SW-aware” tools are around (even if you do not know it…), though more would be good:
- Photoshop CS stores metadata in RDF in, say, jpg files (using XMP)
- RSS1.0 feeds are generated by (almost) all blogging systems (a huge amount of RDF data!)
- …
- There are a number of projects “harvesting” and linking data to RDF (e.g., “Linking Open Data on the Semantic Web” community project)
Data may be extracted (a.k.a. “scraped”)
- Different tools, services, etc., appear every day:
- get RDF data associated with images, for example:
- XSLT scripts to retrieve microformat data from XHTML files
- scripts to convert spreadsheets to RDF
- etc
- Most of these tools are still individual “hacks”, but show a general tendency
- Hopefully more tools will emerge
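A spreadsheet-to-RDF conversion of this kind can be very small. The sketch below assumes a CSV export with a header row and a key column, and uses a hypothetical `http://example.org/sheet/` base for the generated URIs. Every non-empty cell becomes one N-Triples statement with a plain literal.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value):
    """Escape characters that are special inside an N-Triples string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text, base, key_column):
    """One subject per row, one predicate per column, one literal per cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{base}row/{quote(row[key_column], safe='')}>"
        for column, value in row.items():
            if column == key_column or not value:
                continue  # the key is already in the subject; skip empty cells
            predicate = f"<{base}column/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


SHEET = "id,product,price\nP1,Widget,9.99\nP2,Gadget,\n"
for line in csv_to_ntriples(SHEET, "http://example.org/sheet/", "id"):
    print(line)
```

Like most such "hacks", it knows nothing about the meaning of the columns. Mapping them to shared vocabularies is the step that makes the output useful beyond one application.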
Getting structured data to RDF: GRDDL
- GRDDL is a way to access structured data in XML/XHTML and turn it into RDF:
- defines XML attributes to bind a suitable script to transform (part of) the data into RDF
- script is usually XSLT but not necessarily
- has a variant for XHTML
- a “GRDDL Processor” runs the script and produces RDF on-the-fly
- A way to access existing structured data and “bring” it to RDF
- a possible link to microformats
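The discovery half of a GRDDL processor is easy to sketch. In XHTML, a `profile="http://www.w3.org/2003/g/data-view"` attribute on `head` signals GRDDL, and `link` elements with `rel="transformation"` name the scripts. The sketch below only collects the transformation URLs; the Python standard library has no XSLT engine, so running them is left out. The page and script names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"


class GrddlLinkFinder(HTMLParser):
    """Collect rel="transformation" links and note whether the GRDDL profile is set."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.has_profile = False
        self.transformations = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "head":
            self.has_profile = GRDDL_PROFILE in (a.get("profile") or "").split()
        elif tag == "link":
            rels = (a.get("rel") or "").split()
            if "transformation" in rels and a.get("href"):
                self.transformations.append(urljoin(self.base, a["href"]))


def grddl_transformations(xhtml, base):
    finder = GrddlLinkFinder(base)
    finder.feed(xhtml)
    finder.close()
    # without the profile, rel="transformation" carries no GRDDL meaning
    return finder.transformations if finder.has_profile else []


PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://www.w3.org/2003/g/data-view">
  <title>Home page</title>
  <link rel="transformation" href="hcard2rdf.xsl"/>
</head>
<body><p>...</p></body>
</html>"""

print(grddl_transformations(PAGE, "http://example.org/home/"))
```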
Getting structured data to RDF: RDFa
- RDFa (formerly RDF/A) extends XHTML with a set of attributes to include structured data in XHTML
- an XHTML1 module is being defined
- Makes it easy to “bring” existing RDF vocabularies into XHTML
- Uses namespaces for an easy mix of terminologies
- It can be used with GRDDL but RDFa aware systems can manage it directly, too
- no need to implement a separate transformation per vocabulary
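To give a feel for how RDFa attributes turn into triples, here is a heavily simplified reader. It is a hypothetical sketch, not a conforming RDFa processor. It assumes well-formed XHTML (empty elements self-closed) and handles only:

- `xmlns:` prefixes;
- `about` (subject);
- `property` with either `content` or element text (literal);
- a single CURIE in `rel` together with `href` (resource).

The markup imitates the FOAF data in the example that follows.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TinyRdfaReader(HTMLParser):
    """Hypothetical, heavily simplified RDFa reader (not spec-conforming)."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.prefixes = {}   # prefix -> namespace URI, from xmlns:prefix attributes
        self.stack = []      # one [subject, pending_property, text_parts] per open element
        self.triples = []

    def expand(self, curie):
        """Expand a CURIE such as 'foaf:name' using the declared prefixes."""
        prefix, _, local = curie.partition(":")
        return self.prefixes.get(prefix, prefix + ":") + local

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        for name, value in attrs:
            if name.startswith("xmlns:"):
                self.prefixes[name[len("xmlns:"):]] = value
        # the subject is inherited from the parent unless @about sets a new one
        subject = self.stack[-1][0] if self.stack else self.base
        if a.get("about") is not None:
            subject = urljoin(self.base, a["about"])
        pending = None
        if a.get("rel") and a.get("href"):
            self.triples.append((subject, self.expand(a["rel"]),
                                 urljoin(self.base, a["href"])))
        if a.get("property"):
            if a.get("content") is not None:
                self.triples.append((subject, self.expand(a["property"]), a["content"]))
            else:
                pending = self.expand(a["property"])  # literal = element text
        self.stack.append([subject, pending, []])

    def handle_data(self, data):
        for entry in self.stack:
            if entry[1] is not None:
                entry[2].append(data)

    def handle_endtag(self, tag):
        if self.stack:
            subject, pending, parts = self.stack.pop()
            if pending is not None:
                self.triples.append((subject, pending, "".join(parts).strip()))


PAGE = """<div xmlns:foaf="http://xmlns.com/foaf/0.1/" about="#me">
  <span property="foaf:name">Ivan Herman</span>
  <a rel="foaf:mbox" href="mailto:ivan@w3.org">e-mail</a>
</div>"""

reader = TinyRdfaReader("http://www.w3.org/People/Ivan/")
reader.feed(PAGE)
reader.close()
for t in reader.triples:
    print(t)
```

Because the vocabulary is named by the prefix, the same reader works for FOAF, Dublin Core or any other vocabulary. There is no per-vocabulary transformation.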
GRDDL & RDFa example: Ivan’s home page…
…marked up with GRDDL headers…
…and hCard microformat tags…
…yielding: …
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dataview="http://www.w3.org/2003/g/data-view#"
xml:base="http://www.w3.org/People/Ivan/">
<c:Vcalendar xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:c="http://www.w3.org/2002/12/cal/icaltzd#"
xmlns:h="http://www.w3.org/1999/xhtml">
<c:prodid>-//connolly.w3.org//palmagent 0.6 (BETA)//EN</c:prodid>
<c:version>2.0</c:version>
<c:component>
<c:Vevent r:about="#ac06">
<summary xmlns="http://www.w3.org/2002/12/cal/icaltzd#" xml:lang="en">W3C@10,
W3C AC Meeting and W3C Team day</summary>
<dtstart xmlns="http://www.w3.org/2002/12/cal/icaltzd#"
r:datatype="http://www.w3.org/2001/XMLSchema#date">2006-11-28</dtstart>
<dtend xmlns="http://www.w3.org/2002/12/cal/icaltzd#"
r:datatype="http://www.w3.org/2001/XMLSchema#date">2006-12-03</dtend>
<url xmlns="http://www.w3.org/2002/12/cal/icaltzd#"
r:resource="http://www.w3.org/Member/Meeting/2006ac/November/"/>
<location xmlns="http://www.w3.org/2002/12/cal/icaltzd#" xml:lang="en">Tokyo, Japan</location>
<geo xmlns="http://www.w3.org/2002/12/cal/icaltzd#" r:parseType="Resource">
<r:first r:datatype="http://www.w3.org/2001/XMLSchema#double">35.670685</r:first>
<r:rest r:parseType="Resource">
<r:first r:datatype="http://www.w3.org/2001/XMLSchema#double">139.770813</r:first>
<r:rest r:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#nil"/>
</r:rest>
</geo>
</c:Vevent>
</c:component>
…
(see the full file if interested…)
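The `geo` value above is an RDF collection: a chain of `r:first`/`r:rest` nodes ending in `rdf:nil`. The sketch below reads such a chain back into a list. It assumes the graph is a set of triples, that each list node has exactly one `first` and one `rest`, and hypothetical blank node labels `_:geo` and `_:r1`. The numbers are the latitude and longitude from the file.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def read_collection(triples, head):
    """Follow rdf:first / rdf:rest links from head until rdf:nil."""
    links = {(s, p): o for s, p, o in triples
             if p in (RDF + "first", RDF + "rest")}
    items, node, seen = [], head, set()
    while node != RDF + "nil":
        if node in seen:
            raise ValueError("cyclic RDF collection")
        seen.add(node)
        items.append(links[(node, RDF + "first")])
        node = links[(node, RDF + "rest")]
    return items


geo = {
    ("_:geo", RDF + "first", 35.670685),
    ("_:geo", RDF + "rest", "_:r1"),
    ("_:r1", RDF + "first", 139.770813),
    ("_:r1", RDF + "rest", RDF + "nil"),
}
print(read_collection(geo, "_:geo"))  # [35.670685, 139.770813]
```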
…marked up with RDFa tags…
…yielding: …
<rdf:RDF xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
<foaf:Person rdf:about="http://www.w3.org/People/Ivan/#me">
<foaf:mbox rdf:resource="mailto:ivan@w3.org"/>
<foaf:workInfoHomepage rdf:resource="http://www.w3.org/Consortium/Offices"/>
<foaf:workInfoHomepage rdf:resource="http://www.iw3c2.org"/>
<foaf:workInfoHomepage rdf:resource="http://www.w3.org/2001/sw"/>
<foaf:name>Ivan Herman</foaf:name>
<foaf:workplaceHomepage rdf:resource="http://www.w3.org"/>
<foaf:schoolHomepage rdf:resource="http://www.elte.hu/"/>
…
(see the full file if interested…)
Linking to SQL
- A huge amount of data in Relational Databases
- Although tools exist, it is not feasible to convert that data into RDF
- Instead: SQL ⇋ RDF “bridges” are being developed:
- a query to RDF data is transformed into SQL on-the-fly
- the modalities are governed by small, local ontologies or rules
- An active area of development, on the radar screen of W3C!
- (see again the “Linking Open Data on the Semantic Web” community project)
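Here is a minimal sketch of the bridge idea, using Python's built-in `sqlite3`. A small, local mapping (hypothetical here) says which table column backs each RDF predicate. A triple pattern is then rewritten into SQL on the fly instead of converting the data. The table, column and URI template names are assumptions for illustration; real bridges handle joins, datatypes and full SPARQL.

```python
import sqlite3

FOAF = "http://xmlns.com/foaf/0.1/"
# Hypothetical local mapping: RDF predicate -> (table, column) that stores its values.
MAPPING = {FOAF + "name": ("people", "name"),
           FOAF + "mbox": ("people", "mbox")}
SUBJECT_TEMPLATE = "http://example.org/people/{}"  # hypothetical row-to-URI rule


def pattern_to_sql(predicate, obj=None):
    """Rewrite the triple pattern (?s, predicate, obj-or-?o) into SQL."""
    table, column = MAPPING[predicate]   # names come from the trusted mapping only
    sql = f"SELECT id, {column} FROM {table} WHERE {column} IS NOT NULL"
    params = []
    if obj is not None:
        sql += f" AND {column} = ?"
        params.append(obj)                # values are bound, never spliced in
    return sql, params


def query_triples(conn, predicate, obj=None):
    sql, params = pattern_to_sql(predicate, obj)
    return [(SUBJECT_TEMPLATE.format(row_id), predicate, value)
            for row_id, value in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, mbox TEXT)")
conn.executemany("INSERT INTO people (name, mbox) VALUES (?, ?)",
                 [("Alice", "mailto:alice@example.org"), ("Bob", None)])
print(query_triples(conn, FOAF + "mbox"))
```

The relational data never leaves the database; only the rows needed to answer the pattern are turned into triples.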
SPARQL as a unifying point?
Missing features, functionalities…
- Everybody has a favorite item, i.e., the list tends to infinity…
- W3C is a standardization body, and has to look at where a consensus can be found
Rules
- OWL-DL and OWL-Lite are based on Description Logic; there are things that DL cannot express
- a well-known example is Horn rules (e.g., the “uncle” relationship):
- (P₁ ∧ P₂ ∧ …) → C
- e.g.: for any «X», «Y» and «Z»: “if «Y» is a parent of «X», and «Z» is a brother of «Y» then «Z» is the uncle of «X»”
- there are a number of attempts to combine these: RuleML, SWRL, cwm, …
- There is also an increasing number of rule-based systems that want to interchange rules
- a new type of data (potentially) on the Web to be interchanged…
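Here is a naive forward-chaining sketch of the uncle rule above. The `ex:parentOf`, `ex:brotherOf` and `ex:uncleOf` predicates and the people are hypothetical. A rule engine would accept arbitrary rules, while this sketch hard-codes one.

```python
def apply_uncle_rule(triples):
    """One forward-chaining pass of: parentOf(Y, X) ∧ brotherOf(Z, Y) → uncleOf(Z, X).

    A single pass is enough here because the rule never produces parentOf or
    brotherOf triples that could trigger it again.
    """
    parents = [(y, x) for (y, p, x) in triples if p == "ex:parentOf"]
    brothers = [(z, y) for (z, p, y) in triples if p == "ex:brotherOf"]
    inferred = {(z, "ex:uncleOf", x)
                for (y, x) in parents
                for (z, y2) in brothers if y2 == y}
    return set(triples) | inferred


facts = {("ex:Anna", "ex:parentOf", "ex:Tom"),
         ("ex:Ben", "ex:brotherOf", "ex:Anna")}
result = apply_uncle_rule(facts)
print(sorted(result))
```

The join on the shared variable «Y» is exactly what the Description Logic underlying OWL-DL cannot state directly.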
Some typical use cases
- Negotiate eBusiness contracts across platforms: supply vendor-neutral representation of your business rules so that others may find you
- Describe privacy requirements and policies, and let clients “merge” those (e.g., when paying with a credit card)
- Medical decision support, combining rules on diagnoses, drug prescription conditions, etc.
- Extend RDFS (or OWL) with rule-based statements (e.g., the uncle example)
In the real World…
- Rule based systems can be very different
- different rule semantics (based on various types of model theories, on proof systems, etc.)
- production rule systems, with procedural references, state transitions, etc
RIF “core”: only partial interchange
- Specification of the “core” is the first step
- It also forms a logic language to be used, eg, with OWL, RDF, XML data, …
RIF “variants”
Possible variants: F-logic, production rules, fuzzy logic systems, …; none of these have been finalized yet
“Light” ontologies
- For a number of applications RDFS is not enough, but even OWL Lite is too much
- There may be a need for a “light” version of OWL, with just a few extra possibilities vis-à-vis RDFS
- There are a number of proposals, papers, prototypes around: RDFS++, OWL Feather, pD*,…
- pD*, for example, has property characterization (symmetric, transitive, inverse),
class and property equivalence, and property restrictions with some or all values
- This might consolidate in the coming years
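To illustrate the kind of reasoning such a "light" profile asks for, here is a naive fixpoint sketch of three of the property characteristics listed for pD*: symmetric, transitive and inverse. It is not pD* itself, which has more rules. The `ex:` properties and the data are hypothetical.

```python
def property_closure(triples, symmetric=(), transitive=(), inverse=()):
    """Repeat the rules until no new triple appears.

    inverse is a sequence of (p, q) pairs meaning q is the inverse of p.
    """
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            if p in symmetric:
                new.add((o, p, s))
            for p1, p2 in inverse:
                if p == p1:
                    new.add((o, p2, s))
                if p == p2:
                    new.add((o, p1, s))
        for p in transitive:
            edges = [(s, o) for s, q, o in graph if q == p]
            for s, o in edges:
                for o2, o3 in edges:
                    if o == o2:
                        new.add((s, p, o3))
        if new <= graph:  # fixpoint reached
            return graph
        graph |= new


data = {("ex:wheel", "ex:partOf", "ex:axle"),
        ("ex:axle", "ex:partOf", "ex:car"),
        ("ex:car", "ex:nextTo", "ex:bike")}
closed = property_closure(data, symmetric={"ex:nextTo"},
                          transitive={"ex:partOf"},
                          inverse=[("ex:partOf", "ex:hasPart")])
print(sorted(closed))
```

The loop terminates because no rule invents new resources, so the number of possible triples is finite.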
Other items…
- Fuzzy logic
- look at alternatives to Description Logic based on fuzzy logic
- alternatively, extend RDF(S) with fuzzy notions
- Probabilistic statements
- have an OWL class membership with a specific probability
- combine reasoners with Bayesian networks
- Security, trust, provenance
- combining cryptographic techniques with the RDF model, sign a portion of the graph, etc
- Ontology merging, alignment, term equivalences, versioning, development, …
- etc
(Need a new PhD topic?)
A major problem: messaging
- Some of the messaging on the Semantic Web has gone terribly wrong. See these statements:
- “the Semantic Web is a reincarnation of Artificial Intelligence on the Web”
- “it relies on giant, centrally controlled ontologies for "meaning" (as opposed to
a democratic, bottom–up control of terms)”
- “one has to add metadata to all Web pages, convert all relational databases, and XML data to
use the Semantic Web”
- “it is just an ugly application of XML”
- “one has to learn formal logic, knowledge representation techniques, description logic, etc,
to use it”
- “it is, essentially, an academic project, of no interest for industry”
- …
- Some simple messages should come to the fore!
RDF ≠ RDF/XML!
- RDF is a model, and RDF/XML is only one possible serialization thereof
- lots of people prefer, for example, Turtle
- a good percentage of the tools have Turtle parsers, too!
- The model is, after all, simple: an interchange format for Web resources. That is it!
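To illustrate that RDF/XML is only one serialization, here is a sketch that writes triples as Turtle. It assumes a hypothetical `Literal` marker class to tell literals from URIs and handles only plain literals. A URI is abbreviated with a prefix only when its local part is a simple name.

```python
import re


class Literal(str):
    """Marks a plain literal; any other string is treated as a URI."""


def turtle_term(value, prefixes):
    if isinstance(value, Literal):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    for prefix, ns in prefixes.items():
        local = value[len(ns):]
        if value.startswith(ns) and re.fullmatch(r"[A-Za-z_][\w-]*", local):
            return f"{prefix}:{local}"
    return f"<{value}>"


def to_turtle(triples, prefixes):
    """Serialize triples as Turtle, grouping statements about one subject with ';'."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in sorted(prefixes.items())]
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s in sorted(by_subject):
        pairs = sorted(by_subject[s], key=lambda po: (po[0], str(po[1])))
        body = " ;\n    ".join(f"{turtle_term(p, prefixes)} {turtle_term(o, prefixes)}"
                               for p, o in pairs)
        lines.append(f"{turtle_term(s, prefixes)} {body} .")
    return "\n".join(lines)


FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://www.w3.org/People/Ivan/#me"
triples = [(ME, FOAF + "name", Literal("Ivan Herman")),
           (ME, FOAF + "mbox", "mailto:ivan@w3.org")]
print(to_turtle(triples, {"foaf": FOAF}))
```

The graph is unchanged; only the syntax differs from the RDF/XML version.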
RDF ≠ RDF/XML! (cont.)
- RDF/XML is indeed a very complex serialization format
- Certainly not the nicest possible XML application
- good to know that it was created when XML was not yet final…
- Again: it is only syntactic sugar!
- One has to emphasize: RDF is not an XML application!
RDF is not that complex…
- Of course, the formal semantics of RDF is complex
- But the average user should not care, it is all “under the hood”
- how many users of SQL have ever read its formal semantics?
- it is not much simpler than RDF…
- People should “think” in terms of graphs, the rest is syntactic sugar!
Semantic Web ≠ Ontologies on the Web!
- Formal ontologies (like OWL) are important, but use them only when necessary
- you can be a perfectly decent citizen of the Semantic Web if you do not use ontologies, not even RDFS…
- remember the “light ontologies” issue?
SW Ontologies ≠ some central, big ontology!
- The “ethos” of the Semantic Web is on sharing, ie, sharing ontologies (small or large)
- A huge, central ontology would be unmanageable
- OWL includes statements for versioning, for equivalence and disjointness of terms
- a revision of those may be necessary, but the goal is clear
- The practice:
- SW applications using ontologies always mix a large number of ontologies and vocabularies (FOAF, DC, and others)
- the real advantage comes from this mix: that is also how new relationships may be discovered
Semantic Web ≠ academic research only!
- SW has indeed a strong foundation in research results
- But remember:
- (1) the Web was born at CERN…
- (2) …was first picked up by high energy physicists…
- (3) …then by academia at large…
- (4) …then by small businesses and start-ups…
- (5) “big business” came only later!
- network effect kicked in early…
- Semantic Web is now at #4, and moving to #5!
Some RDF deployment areas
- Some communities that are coming to the fore: defense sector, health care, bioinformatics, eGovernment, energy sector (oil industry), financial services, digital libraries…
- Health care and life science sector is now very active
- also at W3C, in the form of an Interest Group
The “corporate” landscape is moving
- Major companies offer (or will offer) Semantic Web tools or systems using Semantic Web: Adobe, Oracle, IBM, HP, Software AG, webMethods, Northrop Grumman, Altova,…
- Some of the names of active participants in W3C SW related groups: ILOG, HP, Agfa, SRI International, Fair Isaac Corp., Oracle, Boeing, IBM, Chevron, Siemens, Nokia, Merck, Pfizer, AstraZeneca, Sun, Citigroup,…
- “Corporate Semantic Web” listed as a major technology by Gartner in 2006
- The Semantic Technology Conference series also attracts lots of participants
- speakers in 2006: from IBM, Cisco, BellSouth, GE, Walt Disney, Nokia, Oracle, …
- not all referring to Semantic Web (eg, RDF, OWL,…) but semantics in general
- but they might come around!
Data integration
- Data integration comes to the fore as one of the SW Application areas
- Very important for large application areas (life sciences, energy sector, eGovernment, financial institutions),
as well as everyday applications (eg, reconciliation of calendar data)
- Life sciences example:
- data in different labs…
- data aimed at scientists, managers, clinical trial participants…
- large scale public ontologies (genes, proteins, antibodies, …)
- different formats (databases, spreadsheets, XML data, XHTML pages)
- etc
- We already heard yesterday: “libraries realize they are not alone…”: similar issues arise in that area
Example: antibodies demo
- Scenario: find the known antibodies for a protein in a specific species
- Combine (“scrape”…) three different data sources
- Use SPARQL as an integration tool (see also demo online)
There has been lots of R&D
Portals
Improved Search via Ontology: GoPubMed
- Improved search on top of pubmed.org
- search results are ranked using the specialized ontologies
- extra search terms are generated and terms are highlighted
- Importance of domain specific ontologies for search improvement
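One ingredient mentioned above, generating extra search terms from an ontology, can be sketched as query expansion over narrower terms. This is not how GoPubMed itself works; it uses its own ontologies and ranking. The tiny term hierarchy is borrowed from the UK Archival Thesaurus example earlier, and the ` OR ` query syntax is an assumption.

```python
from collections import deque

# Hypothetical term hierarchy: term -> narrower terms.
NARROWER = {
    "Economic cooperation": ["Economic integration", "European economic cooperation"],
}


def expand_query(term, narrower):
    """Breadth-first walk over narrower terms; each term appears once."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in narrower.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return " OR ".join(f'"{t}"' for t in seen)


print(expand_query("Economic cooperation", NARROWER))
```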
Other Application Areas Come to the Fore
- Knowledge management
- Business intelligence
- Linking virtual communities
- Management of multimedia data (e.g., video and image depositories)
- Content adaptation and labeling (e.g., for mobile usage)
- etc
One last word…
- The Semantic Web is not done by W3C…
- … it is a community project developed by everybody, including you, we only coordinate
- Think about joining the various fora, possibly join W3C and then various W3C groups
- It is important to have your voice heard!
Thank you for your attention!
These slides will be publicly available on:
http://www.w3.org/2007/Talks/0223-Bangalore_IH/
in XHTML and PDF formats; the XHTML version has active links that you can follow