W3C

Semantic Web

In addition to the classic “Web of documents”, W3C is helping to build a technology stack to support a “Web of data,” the sort of data you find in databases. The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.

Linked Data

The Semantic Web is a Web of data — of dates and titles and part numbers and chemical properties and any other data one might conceive of. RDF provides the foundation for publishing and linking your data. Various technologies allow you to embed data in documents (RDFa, GRDDL) or expose what you have in SQL databases, or make it available as RDF files.
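As a minimal sketch of what publishing data as RDF amounts to, the Python snippet below models a few statements as subject-predicate-object triples and writes them out in N-Triples syntax. The `example.org` URIs, the part data, and the `partOf` property are hypothetical; the Dublin Core `title` and `date` properties are used only for illustration. A real application would normally rely on an RDF library rather than this hand-rolled serializer.

```python
from typing import NamedTuple, Union


class Uri(NamedTuple):
    """Marks a term as a URI; plain strings are treated as literals."""
    value: str


Term = Union[Uri, str]

DC = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # hypothetical namespace

# Each RDF statement is a (subject, predicate, object) triple.
triples = [
    (Uri(EX + "part/42"), Uri(DC + "title"), "Hex bolt"),
    (Uri(EX + "part/42"), Uri(DC + "date"), "2012-01-31"),
    (Uri(EX + "part/42"), Uri(EX + "partOf"), Uri(EX + "assembly/7")),
]


def to_ntriples_term(term: Term) -> str:
    if isinstance(term, Uri):
        return f"<{term.value}>"
    # Only backslashes and quotes are escaped, which is enough for this sketch.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(statements) -> str:
    return "\n".join(
        " ".join(to_ntriples_term(t) for t in triple) + " ."
        for triple in statements
    )


ntriples = to_ntriples(triples)
print(ntriples)
```

The third triple has a URI as its object, which is what links one resource to another.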

Vocabularies

At times it may be important or valuable to organize data. Using OWL (to build vocabularies, or “ontologies”) and SKOS (for designing knowledge organization systems) it is possible to enrich data with additional meaning, which allows more people (and more machines) to do more with the data.

Query

Query languages go hand-in-hand with databases. If the Semantic Web is viewed as a global database, then it is easy to understand why one would need a query language for that data. SPARQL is the query language for the Semantic Web.
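To give a feel for what querying such data involves, here is a toy sketch in Python that evaluates a single triple pattern against a small in-memory list of triples. It is not a SPARQL engine; the data, the prefixed names, and the `match` helper are all hypothetical, and the roughly equivalent SPARQL query appears only as a comment.

```python
# Toy data: (subject, predicate, object) triples with hypothetical names.
TRIPLES = [
    ("ex:part42", "dc:title", "Hex bolt"),
    ("ex:part43", "dc:title", "Washer"),
    ("ex:part42", "ex:partOf", "ex:assembly7"),
]

# Roughly equivalent SPARQL (prefix declarations omitted):
#   SELECT ?part ?title WHERE { ?part dc:title ?title }
PATTERN = ("?part", "dc:title", "?title")


def match(pattern, triples):
    """Return one variable binding (a dict) per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if want.startswith("?"):
                # A variable: bind it, or check consistency if already bound.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                # A constant that does not match: skip this triple.
                break
        else:
            results.append(binding)
    return results


rows = match(PATTERN, TRIPLES)
for row in rows:
    print(row["?part"], row["?title"])
```

A real query language adds joins across several patterns, filters, and optional parts on top of this basic matching step.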

Inference

Near the top of the Semantic Web stack one finds inference — reasoning over data through rules. W3C work on rules, primarily through RIF and OWL, is focused on translating between rule languages and exchanging rules among different systems.

Vertical Applications

W3C is working with different industries — for example in Health Care and Life Sciences, eGovernment, and Energy — to improve collaboration, research and development, and innovation adoption through Semantic Web technology. For instance, by aiding decision-making in clinical research, Semantic Web technologies will bridge many forms of biological and medical information across institutions.

News

The Provenance Working Group began its activities with a charter naming some 17 concepts relevant to provenance, such as resource, process execution, use, derivation, version, etc.

For the first 3 months leading to our first face to face meeting, we debated definitions for these concepts. Importantly, for the social cohesion of the group, we developed a common vocabulary shared by members to communicate.

Following the first face to face meeting, editors were tasked to produce a concrete document, against which the group could formally raise issues and make concrete proposals. In October, this document was released as a first public working draft. We were aware of its limitations, but it served an important purpose: it was setting the direction and scope of the model we were proposing to standardize.

Since then, the group has worked really hard at rationalising concepts of the PROV data model. Key highlights include:

  • introduction of the notion of responsibility, which may be assigned to agents, for the activities they participated in
  • a better characterisation of derivation, which represents, for example, the transformation of a raw data set into linked data
  • ability for the model to track how collections of data evolved
  • a relation which expresses that two different descriptions relate in some way to a same thing in the world
  • definition of a set of constraints, which allow humans and reasoners to determine whether a set of provenance assertions makes sense

The third working draft includes these changes, and we feel that the data model has reached some level of stability, and that from now on any release should be synchronised with PROV ontology definition and the PROV primer.
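To illustrate how a reasoner might check whether a set of provenance assertions makes sense, here is a minimal Python sketch. The specific rule it checks (no entity may be derived, directly or transitively, from itself) is an assumption chosen for illustration and is not quoted from the working draft; the entity names and the `has_derivation_cycle` helper are likewise hypothetical.

```python
# Hypothetical provenance assertions: (derived_entity, source_entity).
derivations = [
    ("ex:linked-data", "ex:cleaned-csv"),
    ("ex:cleaned-csv", "ex:raw-dataset"),
]


def has_derivation_cycle(pairs):
    """Return True if some entity is (transitively) derived from itself."""
    sources = {}
    for derived, source in pairs:
        sources.setdefault(derived, set()).add(source)

    def reaches(start, target, seen):
        # Depth-first walk along "was derived from" links.
        for nxt in sources.get(start, ()):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                if reaches(nxt, target, seen):
                    return True
        return False

    return any(reaches(entity, entity, set()) for entity in sources)


consistent = has_derivation_cycle(derivations)
# Adding an assertion that closes the loop makes the set inconsistent.
broken = has_derivation_cycle(
    derivations + [("ex:raw-dataset", "ex:linked-data")]
)
print(consistent, broken)
```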

At our second face to face meeting, we debated intensively what identifiers of the model denote. A challenge one faces with provenance (as well as any form of metadata) is that provenance may no longer be valid if the subject of provenance changes. To make provenance assertions robust, a partial state of the subject has to be characterised in terms of time and attributes, and its provenance expressed.

However, a lot of current practice simply identifies the subject of provenance with a URI where nothing is said about the state of the identified resource. Thus, the prov-wg has decided that it will present the data model in a way that supports this common usage. In a separate document, an upgrade path will be proposed: to produce a more robust form of provenance, extra assertions can make explicit the extent to which provenance assertions keep an interpretation when changes in subjects occur.

Work on the fourth working draft has already begun; when complete, I will blog again about it.

W3C has launched a new HTML validation site, the Nu Markup Validation Service. From a Semantic Web point of view it is important to note that the default setting of the validator validates HTML5 with RDFa Lite 1.1 and with microdata, the two simpler syntaxes for adding structured data to HTML. Furthermore, the user can also choose the option of validating with RDFa 1.1 (instead of RDFa Lite) as well as microdata.

A report summarising the MultilingualWeb workshop in Limerick is now available from the MultilingualWeb site. Alongside the summaries are links to slides, video recordings, and the IRC log for each speaker and the discussion sessions.

Entitled “A Local Focus for the Multilingual Web”, the workshop surveyed and shared information about currently available best practices and standards that can help content creators and localizers address the needs of the multilingual Web. Attendees also heard about gaps that need to be addressed, and enjoyed opportunities to network and share information between the various different communities involved in enabling the multilingual Web.

This workshop also included a half-day Open Space discussion session run by Jaap van der Meer of TAUS, where attendees split into breakout groups to discuss topics of their own choosing.

You can also find links to videos, slides, etc., as well as links to social media related to the event, on the program page of the workshop.

Preparations have now begun for the next workshop, to be held in Luxembourg on March 15-16. It will be hosted by the Directorate General for Translation of the European Commission. See the Call for Participation to register.

The Provenance Working Group just had its second F2F meeting, where we made substantial progress on a number of issues in creating a way to interchange provenance on the Web. We wanted to let the community know where we are at and where we are going.

Overall, we have a good set of first drafts of the PROV family of documents but there’s still a ways to go in getting them all in-line with each other and well presented such that they are useful to the developer and user communities. This meeting focused on the issues of how we can make rapid progress while achieving that goal.

Simplify, Simplify, Simplify
We heard the response to our first working drafts and have been simplifying the PROV Data Model. Our mantra has been simplify, simplify, simplify. You’ll see some of it in our recent 3rd Working Draft of PROV-DM. But there’s still a ways to go to get the document where we want it to be, in particular, in terms of how constructs and concepts are explained.

While most of the constructs will remain the same, at the F2F meeting, we agreed to simplify the notion of account (a container for provenance) to focus on the use case of provenance of provenance.

Two Broad Use Cases
At the meeting, it became clear that one of the hard parts of devising a common interchange format was being able to support the group’s two broad use cases:

  1. The ability to use the PROV vocabulary to make provenance statements about existing things on the Web. Think, for example, of adding simple provenance metadata (e.g. authorship) to a web page.
  2. The ability to exchange PROV information between provenance systems where a static or fixed view of data is key. This is common in current provenance tracking systems. Think of exchanging information between version control systems or between two scientific workflow systems.

This realization helped the group in thinking about how to best explain PROV. Since PROV supports both use cases, we will aim to first explain how to use it in the broad case and then describe how one can use it in use cases that require a more exacting view.

Working with Dublin Core
At the working group we are always aware of existing provenance vocabularies on the Web. In particular, we are excited that Kai Eckert will be leading a best practice document on how PROV works with Dublin Core, one of the most widely used provenance vocabularies on the Web.

The Community
If you’re interested in PROV, we encourage you to first begin with the PROV-Primer. This is the best place to get an understanding of PROV. In the next two months, we’ll be producing updated working drafts. We hope to have a complete set for the community to review. In the meantime, we are always interested in your input. If you’re using PROV now, please let us know.

We are expecting talks from Microsoft, Wikimedia, Mozilla, Joomla, the European Commission and CNGL representatives at the MultilingualWeb workshop in Luxembourg, and we will be filling the remaining slots soon. The deadline for submission of talk proposals is 10th February, so if you want to speak at the event please register as soon as possible. You can submit your proposal on the registration form.

We also recently announced that Ivan Herman, Semantic Web Activity Lead at the World Wide Web Consortium (W3C), will deliver the keynote talk.

This fourth MultilingualWeb workshop will be held in Luxembourg, hosted by the Directorate-General for Translation (DGT) of the European Commission.

The MultilingualWeb project is looking at best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. The project aims to raise the visibility of existing best practices and standards and identify gaps, with a view to helping content creators, localizers, tools developers, and others meet the challenges of the multilingual Web.

Participation is free. We welcome participation from both speakers and non-speaking attendees. For more information and to register, see the Call for Participation.

The Unicode Consortium has announced the release of Version 6.1 of the Unicode Standard, continuing Unicode’s long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world. A total of 732 new characters have been added.

This version of the Standard also brings technical improvements to support implementers. Improved changes to property values and their aliases mean that properties now have easy-to-specify labels. The new labels, combined with a new script extensions property, mean that regular expressions can be more straightforward and easier to validate.

Over 200 new Standardized Variants have been added for emoji characters, allowing implementations to distinguish preferred display styles between text and emoji styles. For example:

26FA FE0E TENT text style
26FA FE0F TENT emoji style
26FD FE0E FUEL PUMP text style
26FD FE0F FUEL PUMP emoji style
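As a small illustration of how these variation sequences are formed, the Python sketch below appends U+FE0E or U+FE0F to a base character and prints the resulting code point sequence. The helper names `with_style` and `describe` are hypothetical. Whether the two sequences actually render differently depends on the fonts and the rendering engine in use.

```python
import unicodedata

TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation


def with_style(base: str, selector: str) -> str:
    """Build a variation sequence: the base character followed by a selector."""
    return base + selector


def describe(sequence: str) -> str:
    """Show a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in sequence)


tent = "\u26FA"
fuel_pump = "\u26FD"

for base in (tent, fuel_pump):
    for selector, label in ((TEXT_STYLE, "text style"), (EMOJI_STYLE, "emoji style")):
        seq = with_style(base, selector)
        print(describe(seq), unicodedata.name(base), label)
```

Each sequence is still two code points long, so code that counts or truncates characters needs to keep the base character and its selector together.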

Among the notable property changes and additions in Unicode 6.1 are two new line break property values, which improve the line-breaking behavior of Hebrew and Japanese text. Segmentation behavior was also improved for Thai, Lao, and similar languages.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.1. These will be finalized in February:

UTS #10, Unicode Collation Algorithm
UTS #46, Unicode IDNA Compatibility Processing

The W3C RDF Web Applications Working Group has published three Last Call Working Drafts today: RDFa Core 1.1, RDFa Lite 1.1, and XHTML+RDFa 1.1.

Together, these documents outline the vision for RDFa in a variety of XML and HTML-based Web markup languages. RDFa Core 1.1 specifies the core syntax and processing rules for RDFa 1.1 and how the language is intended to be used in XML documents or in HTML. RDFa Lite 1.1 provides a simple subset of RDFa for novice Web authors. XHTML+RDFa 1.1 specifies the usage of RDFa in the XHTML markup language.

A number of improvements have been made to RDFa 1.1 over the past year by working closely with Google, Microsoft, Yahoo! and the other search engine developers. Public review and comments have resulted in a number of further refinements to the language that ease the learning curve for beginner Web authors.

The release of these documents as Last Call Working Drafts is a signal to the public that the Working Group believes that all of the technical requirements, public comments and reported issues have been addressed. It is also an open invitation to the general public to review and provide feedback on the finalization of this technology via the RDF Web Applications Working Group mailing list, by 21 February.

Tomás Saorín & Juan Antonio Pastor Sánchez have published a Spanish translation of the W3C Linked Library Data Incubator Group’s report “Datasets, Value Vocabularies, and Metadata Element Sets”, under the title “Conjuntos de Datos, Vocabularios controlados y Conjuntos de Elementos de Metadatos”.

I had a great time reading a paper on Semantic Search [1]. Although the paper is on the details of a specific Semantic Web search engine (DERI’s SWSE), I was reading it as somebody not really familiar with all the intricate details of such a search engine setup and operation (i.e., I would not dare to give an opinion on whether the choice taken by this group is better or worse than the ones taken by the developers of other engines) and wanting to gain a good image of what is happening in general. And, for that purpose, this paper was really interesting and instructive. It is long (ca. 50 pages), so I did not even try to understand everything at my first reading, but it did give a great overall impression of what is going on.

One of the “associations” I had, maybe somewhat surprisingly, is with another paper I read lately, namely a report on basic profiles for Linked Data [2]. In that paper Nally et al. look at what “subsets” of current Semantic Web specifications could be defined, as “profiles”, for the purpose of publishing and using Linked Data. This was also a general topic at a W3C Workshop on Linked Data Patterns at the end of last year (see also the final report of the event) and it is not a secret that W3C is considering setting up a relevant Working Group in the near future. Well, the experiences of an engine like SWSE might come in very handy here. For example, SWSE uses a subset of the OWL 2 RL Profile for inferencing; that may be a good input for a possible Linked Data profile (although the differences are really minor, if one looks at the appendix of the paper that lists the rule sets the engine uses). The idea of “Authoritative Reasoning” is also interesting and possibly relevant; that approach makes a lot of pragmatic sense, and I wonder whether this is not something that should be, somehow, documented for general use. And I am sure there are more: in general, analyzing the experiences of major Semantic Web search engines on handling Linked Data might provide a great set of input for such pragmatic work.

I was also wondering about a very different issue. A great deal of work had to be done in SWSE on the proper handling of owl:sameAs. On the other hand, one of the recurring discussions on various mailing lists and elsewhere is on whether the usage of this property is semantically o.k. or not (see, e.g., [3]). A possible alternative would be to define (beyond owl:sameAs) a set of properties borrowed from the SKOS Recommendation, like closeMatch, exactMatch, broadMatch, etc. It is almost trivial to generalize these SKOS properties for the general case but, reading this paper, I was wondering: what effect would such predicates have on search? Would it make it more complicated or, in fact, would such predicates make the life of search engines easier by providing “hints” that could be used for the user interface? Or both? Or is it already too late, because the usage of owl:sameAs is already so prevalent that it is not worth touching that stuff? I do not have a clear answer at this moment…

Thanks to the authors!

  1. A. Hogan, et al., “Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine”, Journal of Web Semantics, vol. 4, no. December, pp. 365-401, 2011.
  2. M. Nally and S. Speicher, “Toward a Basic Profile for Linked Data”, IBM developerWorks, 2011.
  3. H. Halpin, et al., “When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data”, Proceedings of the International Semantic Web Conference, pp. 305-320, 2010.



Ivan Herman, Semantic Web Activity Lead at the World Wide Web Consortium (W3C), will deliver the keynote talk at the upcoming MultilingualWeb workshop. This 4th MultilingualWeb workshop will be held in Luxembourg, hosted by the Directorate-General for Translation (DGT) of the European Commission.

Ivan will give an overview of the current work done at the W3C related to the Semantic Web, Linked Data, and related technical issues. The goal is not to give a detailed technical account but, rather, to give a general and accessible overview, and to use this as a basis for further discussions on how that particular technology can be used for the general issue of the Multilingual Web.

Formerly head of the worldwide W3C Offices program, Ivan has been with the W3C since 2001, and also holds a tenure position at the Centre for Mathematics and Computer Sciences (CWI) in Amsterdam. He is a member of IW3C2 (International World Wide Web Conference Committee), and of SWSA (Semantic Web Science Association), the committee responsible for the International Semantic Web Conferences series.
