Using SKOS for classification information

From Linked Data for Language Technology Community Group

MS/OWL parameters that may be relevant for a SKOS classification


The original MetaShare model included a number of XML elements that are used as classifying information, either of the Language resource itself or of its contents. We can distinguish between the following cases:


(a) Taxonomy of the Language resource itself, i.e. the class ms:LanguageResource and its subclasses.

Proposal: It is best to keep it as an OWL ontology, but we should discuss some details, as outlined in the relevant cells of the spreadsheet.


(b) Classificatory information about the language resource that is also used outside the NLP community (e.g. subjects, topics, etc.), with reference to the external classification system used (e.g. the LCSH, the PAROLE text genre classification, etc.).

In the XSD implementation, this includes the following properties for text corpora: domain, subjectTopic, textGenre, textType and register, all of which are used in conjunction with the element conformanceToClassificationScheme.

In the MS/OWL version, the elements have been mapped to properties (with some differentiation for domainInfo, although I am not sure of the reason for this). Also, conformanceToClassificationScheme has been renamed to classificationScheme, but its values have not been mapped.

Note: Similar elements also exist for audio/video/image corpus classification, but they have not been mapped to the MS/OWL version. They should be added when the remaining resource types are added to the final model.

Proposal:

1) Keep the five properties (domain, subjectTopic, textGenre, textType and register) as distinct subproperties of dct:subject (in the same way that DCAT defines dcat:theme as a subproperty of dct:subject) and link them to SKOS concept schemes.

2) Where possible, we can recommend well-established classification systems that are available in SKOS format; otherwise, users can include their own systems (again in SKOS). The conformanceToClassificationScheme element points to some of the most widely used classification systems:

For the rest of the above systems, check whether a SKOS implementation exists or whether they could easily be mapped to one.
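The two proposals above can be sketched together in Python using only the standard library. This is a minimal illustration under stated assumptions, not a MetaShare implementation: triples are plain tuples rather than an RDF library graph, and the ms: and ex: prefixes, the in-house genre scheme ex:genreScheme, its labels and the corpus ex:corpus1 are all hypothetical. It shows a user-defined flat list turned into a SKOS concept scheme, and why declaring the classification properties as subproperties of dct:subject helps: the RDFS subproperty rule (rdfs7) lets a generic dct:subject query find the more specific textGenre value.

```python
"""Sketch of proposals 1) and 2): classification subproperties of dct:subject
whose values come from a SKOS concept scheme.

Hypothetical names (illustration only): the ms:/ex: prefixes, the in-house
scheme ex:genreScheme, its labels and the corpus ex:corpus1.
Triples are plain (subject, predicate, object) tuples of prefixed names.
"""
import re

CLASSIFICATION_PROPERTIES = ["ms:domain", "ms:subjectTopic", "ms:textGenre",
                             "ms:textType", "ms:register"]


def schema_triples():
    """Proposal 1: each classification property is a subproperty of dct:subject."""
    return {(prop, "rdfs:subPropertyOf", "dct:subject")
            for prop in CLASSIFICATION_PROPERTIES}


def slug(label):
    """URI-safe local name from a label, e.g. 'Spoken news' -> 'spoken-news'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def scheme_from_labels(scheme, labels):
    """Proposal 2: turn a user's own flat list of labels into a SKOS concept scheme."""
    triples = {(scheme, "rdf:type", "skos:ConceptScheme")}
    for label in labels:
        concept = "ex:" + slug(label)
        triples |= {
            (concept, "rdf:type", "skos:Concept"),
            (concept, "skos:prefLabel", f'"{label}"@en'),
            (concept, "skos:inScheme", scheme),
            (concept, "skos:topConceptOf", scheme),
        }
    return triples


def apply_subproperty_rule(triples):
    """RDFS rule rdfs7 (s p o, p subPropertyOf q => s q o), applied to a fixpoint."""
    inferred = set(triples)
    while True:
        sub_of = {(s, o) for s, p, o in inferred if p == "rdfs:subPropertyOf"}
        new = {(s, q, o) for s, p, o in inferred for p2, q in sub_of if p == p2}
        if new <= inferred:
            return inferred
        inferred |= new


graph = schema_triples() | scheme_from_labels("ex:genreScheme",
                                              ["Spoken news", "Legal text"])
# Classify a (hypothetical) corpus with a concept from the in-house scheme.
graph.add(("ex:corpus1", "ms:textGenre", "ex:spoken-news"))
graph = apply_subproperty_rule(graph)

# A generic dct:subject query now also finds the genre classification.
print(sorted(o for s, p, o in graph if s == "ex:corpus1" and p == "dct:subject"))
# ['ex:spoken-news']
```

In practice an RDF toolkit with RDFS inference would replace the hand-written rule; it is spelled out here only to make the entailment explicit.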


(c) Classificatory information used mainly inside the NLP community. This includes the following properties and classes (mapped from the original XML elements):

  • AnnotationType (used together with the standardsBestPractices property)
  • ModalityType
  • EncodingLevel
  • LinguisticInformation
  • UseNLPSpecific

The values of the original XML elements have been mapped to subclasses and/or individuals.

Proposal: They could be represented as SKOS concept schemes, but I am not sure of the benefits. In any case, improvements can be made (e.g. grouping some individuals under subclasses, given that they came from a flat list of values); this is to be discussed once a decision between SKOS and OWL has been made.
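If the SKOS route is chosen, the regrouping of a flat value list could be expressed with skos:broader links, as in the minimal Python sketch below (standard library only). It assumes, purely for illustration, that values follow a "group-detail" naming pattern, so the part before the first hyphen can become a broader concept; the sample values, the ex: prefix and the scheme name ex:annotationTypeScheme are hypothetical and not taken from the actual MetaShare lists. In the OWL version, the same grouping would instead be expressed with intermediate subclasses.

```python
"""Sketch: regroup a flat value list into a SKOS hierarchy.

Assumption: values follow a hypothetical 'group-detail' naming pattern, so the
part before the first hyphen names a broader concept. Sample values, the ex:
prefix and the scheme name are illustrative only.
"""


def build_hierarchy(values, scheme="ex:annotationTypeScheme"):
    """Return SKOS triples; hyphenated values get a skos:broader parent."""
    triples = {(scheme, "rdf:type", "skos:ConceptScheme")}
    for value in values:
        concept = f"ex:{value}"
        triples.add((concept, "rdf:type", "skos:Concept"))
        triples.add((concept, "skos:inScheme", scheme))
        group, sep, _ = value.partition("-")
        if sep:  # e.g. 'syntacticAnnotation-dependencyTrees'
            parent = f"ex:{group}"
            triples.add((parent, "rdf:type", "skos:Concept"))
            triples.add((parent, "skos:inScheme", scheme))
            triples.add((parent, "skos:topConceptOf", scheme))
            triples.add((concept, "skos:broader", parent))
        else:  # values without a group prefix stay top-level concepts
            triples.add((concept, "skos:topConceptOf", scheme))
    return triples


flat = ["lemmatization",
        "syntacticAnnotation-dependencyTrees",
        "syntacticAnnotation-constituencyTrees",
        "semanticAnnotation-namedEntities"]
graph = build_hierarchy(flat)
narrower = sorted(s for s, p, o in graph
                  if p == "skos:broader" and o == "ex:syntacticAnnotation")
print(narrower)
# ['ex:syntacticAnnotation-constituencyTrees', 'ex:syntacticAnnotation-dependencyTrees']
```

Real value lists are unlikely to follow such a clean pattern, so the grouping would probably have to be curated by hand; the sketch only shows the target SKOS shape.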