This best practice describes how to exploit Linguistic Linked Data resources. The suggested steps for exploitation comprise:
- search for and discover relevant resources
- verify the license of the dataset
- navigate to the distribution of the data (download or SPARQL endpoint)
- extract the part of the data that is relevant for a particular purpose or application
This document was published by the Best Practices for Multilingual Linked Open Data community group.
It is not a W3C Standard nor is it on the W3C Standards Track.
Let us consider the example of a company developing sentiment analysis and opinion mining software that has a working system for the English language and wants to port the system to also support German. The company wants to find a corpus that is annotated at the sentiment level and extract a first seed lexicon of German subjective expressions with their polarity (positive, negative, neutral).
To exploit Linguistic Linked Data resources in this scenario, the steps listed above can be implemented as follows:
- Search and discovery: relevant linguistic resources can be discovered using LingHub, which was developed by the LIDER project.
- Licensing: once a relevant dataset has been found in LingHub, clicking the link of the resource leads to a page containing all the metadata about the resource, where the license can be verified.
- Distribution: from the metadata page in LingHub, one can either download the dataset or find the address of its SPARQL endpoint.
- Extraction: using W3C standards, in particular SPARQL as the RDF query language, one can extract the portion of the data that is needed for a particular purpose.
If the LIDER guidelines are followed during publication and metadata provision, and if the resource is registered at Metashare, CLARIN VO, LRE Map or DataHub, LingHub will crawl the resource and index it with the appropriate metadata. Further, if the de facto standards and vocabularies recommended by LIDER are followed, the same extraction patterns can be used to extract data from different datasets.
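For the sentiment scenario above, the extraction step might look roughly like the following Python sketch. It is only an illustration under stated assumptions: the text does not say which vocabularies the corpus uses, so the query assumes a simplified data model in which an annotated span carries its text via `nif:anchorOf` (NIF) and its polarity via `marl:hasPolarity` (Marl). The function names, the sample response and the endpoint address are hypothetical. The sketch builds a SPARQL 1.1 Protocol request URL and converts a result document in the SPARQL JSON results format into a seed lexicon, without contacting any server.

```python
import json
from urllib.parse import urlencode

# ASSUMPTION: the corpus annotates text spans with NIF (nif:anchorOf) and
# attaches polarity with Marl (marl:hasPolarity). Neither vocabulary is named
# in the text; adapt prefixes and properties to the actual dataset.
SEED_LEXICON_QUERY = """
PREFIX nif:  <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#>
PREFIX marl: <http://www.gi2mo.org/marl/ns#>
SELECT DISTINCT ?expression ?polarity WHERE {
  ?span nif:anchorOf ?expression ;
        marl:hasPolarity ?polarity .
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol GET URL (query passed in the `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def parse_seed_lexicon(results_json: str) -> dict:
    """Convert a SPARQL JSON results document into {expression: polarity}."""
    data = json.loads(results_json)
    lexicon = {}
    for binding in data["results"]["bindings"]:
        expression = binding["expression"]["value"]
        # Keep only the local name of the polarity IRI, lowercased.
        polarity = binding["polarity"]["value"].rsplit("#", 1)[-1].lower()
        lexicon[expression] = polarity
    return lexicon


# Hand-written sample response for illustration only (not real corpus data).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["expression", "polarity"]},
    "results": {"bindings": [
        {"expression": {"type": "literal", "value": "gut"},
         "polarity": {"type": "uri",
                      "value": "http://www.gi2mo.org/marl/ns#Positive"}},
        {"expression": {"type": "literal", "value": "schlecht"},
         "polarity": {"type": "uri",
                      "value": "http://www.gi2mo.org/marl/ns#Negative"}},
    ]},
})

# The endpoint address is a placeholder; use the one listed on the metadata page.
url = build_request_url("http://example.org/sparql", SEED_LEXICON_QUERY)
lexicon = parse_seed_lexicon(SAMPLE_RESPONSE)
```

To run the query against a real endpoint, the URL would be requested with an `Accept: application/sparql-results+json` header and the response body passed to `parse_seed_lexicon`. If several datasets follow the same vocabularies, the same query could be reused unchanged and only the endpoint address would differ.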