Use Case SEO
Search Engine Optimization
- Emmanuelle Bermes
- Antoine Isaac
Background and Current Practice
One of the key concerns of libraries is to make their data searchable through Web search engines. Library catalogues have been conceived as data silos, often part of the "deep web" or "hidden web": these databases are accessible to humans only through a user interface and cannot be crawled by bots. Hence, the data in these silos cannot be retrieved through search engines. Libraries are aware that they need to make their data more accessible on the Web, both by adopting an architecture that is compatible with crawling by bots and by optimizing the available content so that search engines can process it efficiently.
The major Web actors have shown interest in using semantic mark-up (RDFa or Microdata) to improve the presentation of data in results pages or in other tools such as social network pages: see Google Rich Snippets, Facebook Open Graph Protocol, and schema.org. This trend suggests that these actors are willing to process structured data when it is available.
Using structured data that is available in the web pages of a library service (catalogue, web site, digital library), a remote service provider is able to improve the presentation and retrieval of a resource.
Since the data is already expressed in a structured way (sometimes even in RDF), it is fairly easy to add RDFa or Microdata tags to the (X)HTML page.
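As an illustration of how little is needed to go from a structured record to marked-up HTML, here is a minimal sketch in Python. The record, its field names, the example.org URI, and the choice of the schema.org terms `Book`, `name` and `author` are all assumptions made for the example; a real catalogue would map its own fields to whichever vocabulary it selects.

```python
from html import escape

# Hypothetical catalogue record; field names and values are illustrative only.
record = {
    "uri": "http://example.org/catalogue/record/123",
    "title": "Les Misérables",
    "creator": "Victor Hugo",
    "type": "Book",
}


def render_rdfa(rec):
    """Return an HTML fragment that carries the record as RDFa attributes.

    The vocabulary (schema.org) and the property mapping are assumptions
    made for this sketch, not a recommended library profile.
    """
    return (
        f'<div vocab="http://schema.org/" typeof="{escape(rec["type"])}" '
        f'resource="{escape(rec["uri"])}">\n'
        f'  <h1 property="name">{escape(rec["title"])}</h1>\n'
        f'  <p>by <span property="author">{escape(rec["creator"])}</span></p>\n'
        f'</div>'
    )


print(render_rdfa(record))
```

The same template approach would work for Microdata by swapping the attributes (`itemscope`, `itemtype`, `itemprop`) for the RDFa ones.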
Formalized Goals: PUBLISH, SEARCH/BROWSE, REUSE-SCHEMAS
Service providers from the Web (search engines, social networks, etc.)
Use Case Scenario
A library provides its catalogue on the Web both as HTML pages and as Linked Data services. The HTML pages embed RDFa or Microdata tags describing the main characteristics of each resource: title, type, thumbnails. A service provider crawls the data and uses the tags to improve its presentation of the resource. The structured data extracted from a set of web pages can also be used to enhance access to the set as a whole, for instance through ranking that exploits the data.
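The consuming side of this scenario can be sketched with the Python standard library alone. The page fragment, the example.org thumbnail URL, and the `PropertyExtractor` class below are hypothetical, and the extractor is deliberately naive: it is not a conforming RDFa processor (it ignores prefixes, nesting and subject resolution) and only shows the kind of title, type and thumbnail data a crawler could pick up.

```python
from html.parser import HTMLParser


class PropertyExtractor(HTMLParser):
    """Collect (name, value) pairs from `typeof` and `property` attributes.

    Simplified for illustration; a real consumer would use a full RDFa
    or Microdata parser.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for its text value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.pairs.append(("type", attrs["typeof"]))
        prop = attrs.get("property")
        if prop is None:
            return
        # Prefer an explicit attribute value over the element's text content.
        for key in ("content", "src", "href"):
            if key in attrs:
                self.pairs.append((prop, attrs[key]))
                return
        self._pending = prop

    def handle_data(self, data):
        text = data.strip()
        if self._pending and text:
            self.pairs.append((self._pending, text))
            self._pending = None


# Hypothetical catalogue page fragment as a crawler might fetch it.
page = """
<div vocab="http://schema.org/" typeof="Book">
  <h1 property="name">Les Misérables</h1>
  <img property="image" src="http://example.org/thumbs/123.jpg" alt="cover">
</div>
"""

extractor = PropertyExtractor()
extractor.feed(page)
print(extractor.pairs)
```

A service provider could then use the extracted pairs to build an enriched result snippet, or aggregate them across many pages to support ranking over the whole set.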
Application of linked data for the given use case
When designing a website for structured information, following Linked Data architecture principles helps ensure that the website can be used by machines and, in particular, that its data can be crawled by bots. RDFa, a syntax for embedding RDF data in HTML pages, is a more elaborate and flexible solution than microformats, another way of putting structured data in HTML, and it is currently gaining momentum. Moreover, if the data for an object is already described in RDF, it is fairly easy to add RDFa tags to the corresponding HTML page, which capitalizes on existing RDF and Linked Data efforts. The HTML page and the RDF representation can be served from the same URI by content negotiation. Microdata, which was introduced with HTML5, provides another way of embedding structured data. It has notably gained prominence for Search Engine Optimisation purposes with the announcement of http://schema.org/ by Google, Microsoft and Yahoo.
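Content negotiation can be illustrated with a small Python function that chooses a representation from the HTTP `Accept` header. This is a simplified sketch, not a complete implementation of the HTTP rules: the function name, the list of available media types, and the fallback to HTML are assumptions, and wildcards such as `*/*` are simply ignored.

```python
def choose_representation(
    accept_header,
    available=("text/html", "application/rdf+xml", "text/turtle"),
):
    """Return the available media type with the highest q-value.

    Falls back to the first available type (HTML here) when nothing in the
    header matches. Wildcards and media-type parameters other than q are
    not handled in this sketch.
    """
    best, best_q = available[0], 0.0
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        if media_type in available and q > best_q:
            best, best_q = media_type, q
    return best


# A browser-like client gets HTML (possibly with embedded RDFa),
# while a Linked Data client asking for RDF gets an RDF serialization.
print(choose_representation("text/html,application/xhtml+xml;q=0.9"))
print(choose_representation("application/rdf+xml, text/html;q=0.5"))
```

Serving HTML as the default keeps the pages usable by ordinary browsers and crawlers, while clients that explicitly ask for RDF receive the Linked Data representation.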
Existing Work (optional)
- RDFa: deployment cases include PLoS, the White House, O'Reilly, Newsweek, BBC Music.
- Vocabularies like LCSH and RAMEAU are also published with (SKOS) RDFa mark-up.
- Ranking algorithms, e.g. for Google Books ("Inside the Google Books Algorithm", The Atlantic)
Related Vocabularies (optional)
Applicable standards mainly concern the vocabularies (schemas) that are used in the data represented in the semantic markup:
- http://rdfa.info/wiki/Learn#Vocabularies: for a list of commonly used vocabularies
- Facebook's OpenGraph: http://developers.facebook.com/docs/opengraph
- Vocabularies used by Google's Rich Snippets for RDFa: http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=146898
- http://schema.org/ created by Google, Microsoft and Yahoo in 2011
Problems and Limitations
The addition of machine-readable tags can improve the presentation of the resource, but there is no guarantee that it actually improves the ranking in a search engine's results list.
Although schema.org is extensible in principle, expressing library information with it would require many extensions, as most of the required vocabulary is lacking. The efforts at http://schema.rdfs.org/ provide some level of interoperability with linked data, but at this time it seems difficult to cultivate with this approach the high level of interconnectedness between library and other datasets that linked data makes possible.
Related Use Cases and Unanticipated Uses (optional)
- Creative Commons promotes using RDFa, with the Creative Commons Rights Expression Language, for publishing licensing information on works: http://wiki.creativecommons.org/RDFa
Library Linked Data Dimensions / Topics
- [MGT.PATTERNS]: sharing the same patterns across the board for library data would help convince the main search engines to consume it
- [LLD.REFERENCE-MODEL-FIT]: vocabularies used in RDFa markup should probably be much simpler than the ones used for reference library data.
References
- RDFa wiki: http://rdfa.info/wiki/RDFa_Wiki
- RDFa Primer: http://www.w3.org/TR/xhtml-rdfa-primer/
- Chris Crum: A Markup That Could Have Big Implications for SEO http://www.webpronews.com/topnews/2010/01/22/a-markup-that-could-have-big-implications-for-seo