10 Years of Success

10th International Conference on Semantic Systems (SEMANTiCS 2014), September 4 - 5, Leipzig

In my keynote talk at the tenth SEMANTiCS conference, I wanted to make three points:

we have 10 years of success to be proud of;
the technology stack is mature and well implemented, and W3C will continue to work with the community to fill in the gaps;
looking to the future, I believe that we need to make the Web of Data a utility, a service, a product that can be used by regular Web developers who we should regard as part of our customer base so that they become our advocates, not our detractors.

The Breadth of Success

2014 is a terrific year …

25 years of the Web;
20 years of W3C;
SEMANTiCS 10.

Wow.

What does success look like? I don't think we have to look very far to see it.

Semantic Web Technologies, including Linked Data, are proving successful in a wide range of areas.

A very simple example is the adoption of the DCAT vocabulary, which allows portals like datos.gob.es and publicdata.eu to harvest metadata from distributed portals. For more, see the relevant section of the report from the recent Share-PSI 2.0 workshop on uses of Open Data for Efficiency in Government, and the EC's DCAT Application Profile.
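The harvesting idea can be sketched in a few lines. This is a minimal illustration only: the two portals, their URLs and their records are invented, the catalogues are inline dictionaries rather than RDF fetched over HTTP, and the "harvested_from" key is a made-up bookkeeping field, not a DCAT term. The other keys borrow DCAT and Dublin Core names (dcat:dataset, dct:title).

```python
# Hypothetical catalogues, as a harvester might see them after parsing
# each portal's DCAT export. All names and URLs are invented.
PORTAL_A = {
    "@type": "dcat:Catalog",
    "dct:title": "Portal A",
    "dcat:dataset": [
        {"@id": "http://a.example/dataset/air-quality", "dct:title": "Air quality"},
        {"@id": "http://a.example/dataset/budget", "dct:title": "City budget"},
    ],
}
PORTAL_B = {
    "@type": "dcat:Catalog",
    "dct:title": "Portal B",
    "dcat:dataset": [
        {"@id": "http://b.example/dataset/bus-stops", "dct:title": "Bus stops"},
        # Same dataset URI as in Portal A: it should appear only once.
        {"@id": "http://a.example/dataset/budget", "dct:title": "City budget"},
    ],
}


def harvest(catalogues):
    """Merge dataset records from several catalogues, keyed by dataset URI."""
    merged = {}
    for catalogue in catalogues:
        for dataset in catalogue.get("dcat:dataset", []):
            record = dict(dataset)
            # Illustrative bookkeeping key, not part of DCAT.
            record["harvested_from"] = catalogue["dct:title"]
            # The first catalogue to list a URI wins.
            merged.setdefault(record["@id"], record)
    return merged


if __name__ == "__main__":
    for uri, record in sorted(harvest([PORTAL_A, PORTAL_B]).items()):
        print(uri, "|", record["dct:title"], "|", record["harvested_from"])
```

Because every portal describes its datasets with the same vocabulary, the aggregator needs no per-portal mapping code; that shared vocabulary is the point of the example.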

Cultural Heritage

We see other metadata provision and aggregation efforts in cultural heritage, such as
Europeana
Deutsche Digitale Bibliothek
The British Library.

Health Care & Life Sciences

At the other end of the scale from simple data publication and harvesting are the life sciences.

OpenPHACTS, originally a research project, now a non-profit, has 470 million triples describing pharmacological and chemical data (drawn from several open sources).

The Monarch Initiative has a wealth of information about model organisms, in vitro models, genes, pathways, gene expression, protein and genetic interactions, orthology, disease, phenotypes, publications, and authors with semantic similarity calculations that suggest novel relationships based on the information content of data descriptions.

Efforts like DNA Digest are setting out to make vast amounts of genomic data available in a privacy-aware environment, perhaps using the eXframe platform developed largely by the (commercial) company Partners Healthcare.

And, for the OWL specialists here in the room, you may be interested in the bioinformatics analyst post at Craig Venter's lab in La Jolla.

Linked Open Government Data

Linked Data is perfect for publishing and sharing data between government agencies, and for improving services based on that data.

Government users of Linked Data include the UN's FAO, which publishes its long-established AGROVOC thesaurus in 20 languages, with links/mappings to many other datasets such as the Chinese Agricultural Thesaurus, the US National Agricultural Library Thesaurus, Library of Congress Subject Headings and more.

Made possible by SKOS.

The European Union's Publications Office publishes a growing set of authoritative lists as SKOS concept schemes. Those URIs will soon be dereferenceable and the schemes already offer all labels in 23 languages.
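To show why multilingual SKOS labels are convenient for applications, here is a minimal sketch of label selection by language preference. The concept, its URIs and its labels are invented for illustration and are not taken from AGROVOC or the Publications Office lists; only the property names skos:prefLabel and skos:exactMatch come from SKOS.

```python
# A hypothetical concept with skos:prefLabel values keyed by language tag.
CONCEPT = {
    "@id": "http://example.org/concept/rice",
    "skos:prefLabel": {"en": "rice", "fr": "riz", "es": "arroz", "de": "Reis"},
    # A mapping to an equally hypothetical concept in another thesaurus.
    "skos:exactMatch": ["http://other.example/thesaurus/1234"],
}


def pref_label(concept, languages, fallback="en"):
    """Return the preferred label in the first available requested language."""
    labels = concept["skos:prefLabel"]
    for lang in languages:
        if lang in labels:
            return labels[lang]
    # None of the requested languages is available: use the fallback, if any.
    return labels.get(fallback)


if __name__ == "__main__":
    print(pref_label(CONCEPT, ["it", "fr"]))
    print(pref_label(CONCEPT, ["it"]))
```

The concept keeps a single URI however many labels it carries, so links between thesauri stay stable while each user sees their own language.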

Efficiency is a big driver within government, as shown by services like legislation.gov.uk and The Gazette.

The study I contributed to on the use of Linked Open Data in the Public Sector is published by the European Commission.

Linked Statistics

The key standard is the RDF Data Cube vocabulary. See examples in the Share-PSI 2.0 report from Samos.
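The Data Cube model treats each statistical value as an observation located by its dimensions. The sketch below mimics that shape with plain dictionaries. The figures are invented, and the keys refArea, refPeriod and obsValue are borrowed from the SDMX-derived terms commonly used alongside the vocabulary; real data would be RDF, not Python literals.

```python
# Invented observations: each one has two dimensions and one measure.
OBSERVATIONS = [
    {"refArea": "AT", "refPeriod": "2012", "obsValue": 8.4},
    {"refArea": "AT", "refPeriod": "2013", "obsValue": 8.5},
    {"refArea": "CH", "refPeriod": "2012", "obsValue": 8.0},
    {"refArea": "CH", "refPeriod": "2013", "obsValue": 8.1},
]


def slice_cube(observations, **fixed):
    """Keep the observations whose dimensions match every fixed value."""
    return [
        obs
        for obs in observations
        if all(obs[dimension] == value for dimension, value in fixed.items())
    ]


if __name__ == "__main__":
    for obs in slice_cube(OBSERVATIONS, refPeriod="2013"):
        print(obs["refArea"], obs["obsValue"])
```

Fixing one dimension and letting the others vary is roughly what the vocabulary calls a slice; when datasets share dimension definitions, the same filter can run across several of them.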

The most ambitious project is Sarven Capadisli's work on 270a Linked Dataspaces. That allows you to query across multiple datasets and see visualisations and analyses with all the SPARQL hidden away. Data includes that from:

UNESCO
IMF
Swiss Federal Stats
Federal Reserve Board
OSCE
World Bank

More areas

Media (BBC, NY Times etc.)
Log analysis
Malware detection
Bart van Leeuwen's work on Emergency Response data
The Snowden stuff
Multilingual Web
Financial industry

Geospatial Data

See, for example, the EC's INSPIRE Registry and their call for comments on the RDF and PIDs study.

OGC's interest goes well beyond GeoSPARQL. See, for example, their Geospatial Semantic Web Interoperability Experiment.

Hence the planned joint OGC/W3C Working Group on spatial data.

eCommerce

I referred to Daimler's use of Linked Data and the GS1 Digital project.

schema.org

Maturity

Sincere thanks to the men and women who worked on updating RDF and SPARQL, and on (finally) creating standards for N-Triples, N-Quads, TriG and Turtle. Thanks also to those working on LDP (almost complete).

And thanks to the people behind the master stroke of JSON-LD.
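Part of JSON-LD's appeal is that a document is ordinary JSON, so a developer can use it without any RDF tooling. The sketch below illustrates this with an invented document. The expand_keys helper is a deliberately naive stand-in that handles only a flat @context of term-to-IRI strings; it is not the JSON-LD expansion algorithm, which a real application would get from a JSON-LD library.

```python
import json

# An invented JSON-LD document: plain JSON plus an @context.
DOC = """
{
  "@context": {
    "name": "http://schema.org/name",
    "homepage": "http://schema.org/url"
  },
  "@id": "http://example.org/people/alice",
  "name": "Alice",
  "homepage": "http://example.org/~alice"
}
"""


def expand_keys(doc):
    """Replace plain keys with the IRIs given in a simple, flat @context."""
    context = doc.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in doc.items()
        if key != "@context"
    }


data = json.loads(DOC)

if __name__ == "__main__":
    # A regular Web developer can stop here: it is just JSON.
    print(data["name"])
    # Anyone who wants the graph can map the keys to IRIs.
    print(expand_keys(data))
```

The same bytes serve both audiences: the first print needs no knowledge of Linked Data, while the second shows the globally identified properties hiding behind the short keys.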

We already are, or soon will be, working on some of the gaps:

The CSV on the Web Working Group
The Data on the Web Best Practices Working Group
RDF Validation (data shapes)
Graph Normalization
Access control

The Future

We have some problems around skills and tooling.

The McKinsey report I referred to, which highlights the predicted skills gap, is Big data: The next frontier for competition. I also talked about the LOD2 technology stack.

Web Developments

I referred to a number of W3C initiatives outside the Data Activity that I think are worth noting:

Automotive Business Group (Web APIs for your car)
Second Screen Community Group
Web of Things workshop report, now setting up a WoT Interest Group
Annotations WG
Web Components

Just as tools like Callimachus are sold as application servers, not as Linked Data tools, I believe we need to make our services available to regular Web developers in their terms. SPARQL is always there, yes, but we should aim for smart servers that allow regular Web developers to do smart things easily without needing to know that what they're doing is traversing a graph.

I referred to Markus Lanthaler and Ruben Verborgh's work in the HYDRA Community Group in this regard, as well as their work with Pieter Colpaert on Linked Data Fragments. Sören Auer talked about 'Open Watson' as a possible future project that matches these ideas around Smart Servers and Smart Clients.
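One way to picture the split between server and client is a server that answers only simple triple patterns, one page at a time, and a client that pages through the results itself. The sketch below is an in-memory illustration of that idea under stated assumptions: the data, the function names and the response keys are all invented, and it does not follow the Linked Data Fragments or Hydra specifications.

```python
# Invented data standing in for a server-side dataset.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
]


def fragment(subject=None, predicate=None, obj=None, page=1, page_size=2):
    """Server side: one page of triples matching a pattern (None = wildcard)."""
    pattern = (subject, predicate, obj)
    matches = [
        triple
        for triple in TRIPLES
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]
    start = (page - 1) * page_size
    return {
        "total": len(matches),
        "triples": matches[start:start + page_size],
        "has_next": start + page_size < len(matches),
    }


def fetch_all(**pattern):
    """Client side: follow the pages until the server reports no more."""
    results, page = [], 1
    while True:
        response = fragment(page=page, **pattern)
        results.extend(response["triples"])
        if not response["has_next"]:
            return results
        page += 1


if __name__ == "__main__":
    for s, p, o in fetch_all(predicate="foaf:knows"):
        print(s, p, o)
```

Each request is cheap for the server to answer and easy to cache, and the work of combining results moves to the client.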

I'm looking forward to celebrating 20 years of success in 2024!

Thanks

I am grateful to the following people who have made suggestions for this talk: Bernadette Hyland, Sören Auer, Tom Baker, Dan Brickley, Ivan Herman, Sandro Hawke, Eric Prud'hommeaux, Kerstin Forsberg and Fiona Nielsen.

http://www.w3.org/2014/Talks/0904_phila_semantics/

Phil Archer, Data Activity Lead <phila@w3.org>

@philarcher1