Licensing information

From Linked Data for Language Technology Community Group

Comments on the representation of the licensing information

Introduction

The purpose of this page is to comment on the representation of licensing information in Metashare’s metadata model and in the UPF ontology derived from it, and to propose possible improvements to both. The following input documents have been considered:

  • On Metashare License Model. The MetaShare (MS) metadata model is an ontology (see [1]) that is not formally described but is implemented as an XML Schema. Throughout this page, the git URI has been taken as the source for the several Metashare XML schemata. MS elements can be readily understood as properties (e.g. name, gender, version) and are formalized as simple-type elements in the XSD schema; components are defined as complex-type elements.
  • On the UPF translation to RDF. UPF's translation of the MS model to RDF is well described in [3]. Rather than starting from the model itself, UPF has mostly translated the XSD files. The mapping itself is described in an Excel spreadsheet.


Comments on the Metashare Model

The Metashare model provides extensive support for the detailed representation of licensing information, in some respects going beyond what license-specialized models describe. This representation permits two levels of detail. First, the MS model allows referencing some of the most widely used licenses. Second, it allows explicitly representing as RDF some of the most important rights, conditions and constraints found in the most popular licenses.

The MetaShare model uses the "distributionInfo" component to link a resource to the licensing conditions under which it is distributed. More specifically, the distributionInfo component includes all the legal information deemed important for a resource, e.g. IPR holder, distribution rights holder, availability conditions, etc. The licensing information per se is described in Section "6.2.2 licenceInfo" of [1], where a component groups several license-related properties. The most direct of these, licence, declares the license as one of 29 possible values, which include some of the most common licenses and a few general ones. This list of possible values is a list of string literals (also enumerated as strings in the XSD implementation). Additionally, the MS XML Schemata devote a full schema to the licensing information: see META-SHARE-LicenseMetadata.xsd. The remaining properties further refine the license description and make it more legible to human users. Detailed information includes the restrictionsOfUse (evaluationUse, commercialUse, shareAlike, etc.), the price and the distributionAccessMedium (CD-ROM, paperCopy, etc.), among others. Each resource may be linked to one or more licenceInfo components, in case the same resource is made available under different formats and/or licensing conditions (e.g. free for non-commercial purposes vs. at a price for commercial purposes, downloadable for commercial users vs. accessible through an interface for academic users).

The following comments can be made:

  • Use of specific languages. Existing Rights Expression Languages (RELs) could have been used, such as ODRL, XACML, or others.
  • Reuse of vocabularies. Even within its own model, some terms could have been reused from existing vocabularies (price specification, etc.).
  • License identification. Existing licenses are referenced by their name rather than by their URI. Only a URI identifies a license unambiguously.
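
The last point can be illustrated with a minimal sketch. The license names on the left are hypothetical string literals (they are not taken from the actual META-SHARE enumeration), and the mapping itself is an assumption made for illustration; only the Creative Commons URIs are real. It shows that a bare name may stand for several distinct license URIs, whereas a URI stands only for itself.

```python
from typing import Dict, List

# Hypothetical name-to-URI table; the names are illustrative only.
CANDIDATE_URIS: Dict[str, List[str]] = {
    "CC-BY": [
        "http://creativecommons.org/licenses/by/3.0/",
        "http://creativecommons.org/licenses/by/4.0/",
    ],
    "CC-BY-SA": [
        "http://creativecommons.org/licenses/by-sa/3.0/",
        "http://creativecommons.org/licenses/by-sa/4.0/",
    ],
}


def candidate_uris(name: str) -> List[str]:
    """Return every license URI that a plain name could refer to."""
    return CANDIDATE_URIS.get(name, [])


def is_ambiguous(name: str) -> bool:
    """A name is ambiguous when it matches more than one license URI."""
    return len(candidate_uris(name)) > 1
```

With this table, is_ambiguous("CC-BY") is true because the name does not say which version is meant.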

Comments on the UPF ontology

The UPF ontology stays faithful to the MS model. Licence (whose name could have been the more common "License") is an object property whose expected objects are individuals of the class License. This class already has some predefined individuals, most of them with a label and a description, but lacking a URI. The remaining elements have also been translated as properties attributed to the class Resource. The translation of the licensing elements has been made as described in this spreadsheet excerpt:

The following comments can be made:

  • The translation is not complete. Some elements have not been translated, such as the membership info.
  • Reuse of existing terms. This mapping could have reused existing terms: for example, ms:attributionText could have been replaced by cc:attributionName or cc:attributionURL (elements in the Creative Commons namespace).
  • The structural information has been lost in the translation. In the MS model, two structuring elements gather all the distribution and licensing information (distributionInfo and licenceInfo). In the UPF ontology, datatype properties such as “Download location” and “Execution location” are attributed to a Resource but should be attributed to the distribution class. Provenance information such as “Original source” follows the same pattern. Instead, a “licenseInfo” class should have been created, exactly as in the MS model, which would improve the readability of the licensing (and possibly provenance) information. This amounts to qualifying a relationship, turning the abstract concept of licensing info into an entity.
  • Licenses as RDF. The most common licenses might have been defined purely as RDF. This is a heavy but doable task. As an example, the following figure shows how resources can be provided with licensing (in red) and provenance (in blue) information. This information uses the standard vocabularies of ODRL 2.0 (Open Digital Rights Language) and PROV (Provenance Ontology).

In the figure below, the license might be one that "Gives access to the resource to Spanish research institutions. Redistributing or transforming the work is forbidden"

The next figure may be the representation of a META-SHARE NonCommercial NoRedistribution NoDerivatives For-a-Fee Licence
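
As a rough textual counterpart to the first license quoted above (access for Spanish research institutions, with redistribution and transformation forbidden), the sketch below builds an ODRL-style policy as plain triples and prints them as N-Triples. It is only an illustration under stated assumptions: the example.org URIs, the function names and the choice of odrl:use for "access" are hypothetical, and the ODRL terms should be checked against the version of the vocabulary actually in use.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

ODRL = "http://www.w3.org/ns/odrl/2/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example


def build_policy(policy: str, resource: str, assignee: str) -> List[Triple]:
    """Permit 'use' to the assignee; prohibit 'distribute' and 'derive'."""
    permission = policy + "/permission"
    triples: List[Triple] = [
        (policy, RDF_TYPE, ODRL + "Set"),
        (policy, ODRL + "permission", permission),
        (permission, ODRL + "target", resource),
        (permission, ODRL + "action", ODRL + "use"),
        (permission, ODRL + "assignee", assignee),
    ]
    for action in ("distribute", "derive"):
        prohibition = f"{policy}/prohibition-{action}"
        triples += [
            (policy, ODRL + "prohibition", prohibition),
            (prohibition, ODRL + "target", resource),
            (prohibition, ODRL + "action", ODRL + action),
        ]
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize URI-only triples, one N-Triples statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(build_policy(
        EX + "policy/1",
        EX + "resource/1",
        EX + "party/spanish-research-institutions",  # hypothetical party URI
    )))
```

Restricting the assignee to a party URI is a simplification; a geographic constraint on the permission would be another way to express the same condition.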

Recommendations

These are the recommendations and guidelines on which the license representation in the ontology is founded.

R1. Improve the license representation in RDF.

Create a licenseInfo class, whose properties detail the licensing information. The licenseInfo instance must include at least the "license" property, which declares the license with a well-known license URI like http://creativecommons.org/licenses/by/4.0/. The license details can be given with vocabularies like CC-REL or the ODRL vocabulary (the latter being preferred for its richness). The licenseInfo must include a human-readable label for the license, and a license version. A resource may be linked to one or more instances of licenseInfo, allowing dual licensing.
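
The constraints in R1 can be sketched as a small data structure. This is an illustration only, not the ontology itself: the class and field names are assumptions that mirror the wording above, the URI check is deliberately naive, and the example.org license is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LicenseInfo:
    license: str  # well-known license URI (mandatory)
    label: str    # human-readable label
    version: str  # license version

    def __post_init__(self) -> None:
        # Naive check: the license must be given as a URI, not as a name.
        if not self.license.startswith(("http://", "https://")):
            raise ValueError("license must be a URI, got %r" % self.license)
        if not self.label or not self.version:
            raise ValueError("label and version are required")


@dataclass
class Resource:
    uri: str
    license_infos: List[LicenseInfo] = field(default_factory=list)

    def is_multi_licensed(self) -> bool:
        """True when the resource is offered under more than one license."""
        return len(self.license_infos) > 1


# Dual licensing: an open license plus a hypothetical commercial one.
corpus = Resource("http://example.org/resource/1", [
    LicenseInfo("http://creativecommons.org/licenses/by/4.0/", "CC BY 4.0", "4.0"),
    LicenseInfo("http://example.org/licenses/commercial", "Commercial license", "1.0"),
])
```

Here corpus.is_multi_licensed() is true, while LicenseInfo("CC-BY", "CC BY", "4.0") is rejected because the license is given by name rather than by URI.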

R2. Make an RDF version of the most common licenses. Licenses will then be more easily classified, searched and retrieved, and access control systems for linguistic resources will become possible.

R3. Use the distribution class (cf. also dcat:Distribution) to group together properties related to the distribution instance of a resource, including legal information that is not specific to a license.
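
A minimal sketch of R3, assuming DCAT and Dublin Core terms are used for the grouping: the function name and the example.org URIs are hypothetical, and which properties belong on the distribution is a modelling choice left to the ontology.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_distribution(resource: str, distribution: str, download_url: str,
                          license_uri: str, rights_holder: str) -> List[Triple]:
    """Attach distribution-level facts to the distribution, not the resource."""
    return [
        (resource, DCAT + "distribution", distribution),
        (distribution, RDF_TYPE, DCAT + "Distribution"),
        (distribution, DCAT + "downloadURL", download_url),
        (distribution, DCT + "license", license_uri),
        (distribution, DCT + "rightsHolder", rights_holder),
    ]


triples = describe_distribution(
    "http://example.org/resource/1",           # hypothetical resource
    "http://example.org/resource/1/dist/zip",  # hypothetical distribution
    "http://example.org/download/resource1.zip",
    "http://creativecommons.org/licenses/by/4.0/",
    "http://example.org/org/rights-holder",
)
```

The resource itself carries a single statement, the link to its distribution; the download location, license and rights holder all hang off the distribution node.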

References

[1] Documentation and User Manual of the META-SHARE Metadata Model, online here and here

[2] M. Gavrilidou, P. Labropoulou, E. Desipri, S. Piperidis, H. Papageorgiou, M. Monachini, F. Frontini, T. Declerck, G. Francopoulo, V. Arranz, and V. Mapelli, "The META-SHARE metadata schema for the description of language resources," in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey: European Language Resources Association (ELRA), May 2012. Available: http://www.lrec-conf.org/proceedings/lrec2012/pdf/998_Paper.pdf

[3] Marta Villegas, Maite Melero and Núria Bel. “Metadata as Linked Open Data: mapping disparate XML metadata registries into one RDF/OWL registry.” In Proc. of LREC’2014, Reykjavik, Iceland, May 2014