One of the exciting events of the past few months was the joint announcement of schema.org by three major search engine providers (Google, Yahoo, and Microsoft). It was a major step in the recognition that structured data, embedded in Web pages or otherwise, has a huge role to play on the Web. Put another way: structured data on Web sites is now definitely mainstream.
The role of the schema.org site is twofold. It defines a family of vocabularies that search engines “understand”; although these vocabularies are still evolving, they reflect the areas that search engines consider most important for average Web pages. Independently of the vocabularies, schema.org also defines the syntax that search engines understand, i.e., how the vocabularies should be embedded in an HTML page. At the moment, the emphasis from schema.org is on the use of microdata.
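To make "embedded in an HTML page" concrete, here is a minimal sketch in Python of what a consumer of microdata might do. It is an illustration only, not how any search engine actually processes pages: the sample markup, the `FlatMicrodataParser` class, and the `extract_items` helper are all hypothetical, and the parser handles only flat (non-nested) items whose property values are plain text.

```python
from html.parser import HTMLParser

# Hypothetical page fragment. The Person type and its property names come
# from the schema.org vocabulary; the values are made up for illustration.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <span itemprop="jobTitle">Editor</span>
</div>
"""


class FlatMicrodataParser(HTMLParser):
    """Collects text-valued itemprop values of flat (non-nested) items.

    Assumption: this ignores nested items and values carried in
    attributes such as href, src, or content.
    """

    def __init__(self):
        super().__init__()
        self.items = []            # one dict per itemscope element
        self._current_prop = None  # itemprop whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            # A new item starts; remember its vocabulary type, if any.
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
        elif "itemprop" in attrs and self.items:
            self._current_prop = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self._current_prop and text:
            self.items[-1]["properties"][self._current_prop] = text
            self._current_prop = None

    def handle_endtag(self, tag):
        # An element closed without text: stop waiting for a value.
        self._current_prop = None


def extract_items(html):
    """Return the list of flat microdata items found in an HTML string."""
    parser = FlatMicrodataParser()
    parser.feed(html)
    return parser.items


if __name__ == "__main__":
    for item in extract_items(SAMPLE_HTML):
        print(item["type"], item["properties"])
```

The point of the sketch is only that the vocabulary (here, the `Person` type and its properties) and the syntax (the `itemscope`, `itemtype`, and `itemprop` attributes) are separate concerns: the same vocabulary terms could in principle be carried by a different embedding syntax.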
As with all such important events, the announcement of the schema.org site has generated lots of discussion in the blogosphere, on different mailing lists, on Twitter, and so on. The discussion crystallized around two technically different sets of issues:
- What is the evolution path of the schema.org vocabularies; how do they relate to the vocabulary developments around the world that have already brought us such widely used vocabularies as Dublin Core, GoodRelations, FOAF, vCard, the different microformat vocabularies, etc.?
- What is the role of RDFa and microformats for search engines; would search providers also accept RDFa 1.1 or microformats as an alternative encoding of structured data? This also raises the more general issue of how microdata and RDFa relate to one another as W3C specifications, and to microformats, independently of the specific vocabularies.
These issues will be discussed at the upcoming schema.org workshop in Mountain View, CA, on 21 September. They are also within the scope of the Semantic Web Interest Group (SWIG). Accordingly, as a result of a variety of discussions, I am proposing two new SWIG Task Forces to discuss these issues and flesh out solutions. Note that this is also related to a TAG request from June. Assuming the proposals are approved, the two Task Forces will be:
- Web Schemas Task Force, to be chaired by R.V. Guha (Google), concentrating on general vocabulary-related discussions. The Task Force’s focus should be on collaboration around vocabularies, mappings between them, and syntax-neutral vocabulary design and tooling. Issues like the convergence of various vocabulary schemas, use cases, tools and techniques, and documentation of mappings and equivalences between schemas should all be in scope for this Task Force.
- HTML Data Task Force, to be chaired by Jeni Tennison, which should conduct a technical analysis of the relationship between RDFa and microdata and of how data expressed in the different formats can be combined by consumers. This Task Force may propose modifications, in the form of bug reports and change proposals on the microdata and/or RDFa specifications, where they would help users to easily translate between the two syntaxes or use them together. The Task Force should also work on a general approach for the mapping of microdata to RDF, as well as the mapping of RDFa to microdata JSON.
Both Task Forces should be public: anyone can join the respective mailing lists or follow the discussions via the public archives.
Everybody is welcome!