Re: [ISSUE-2] Module suggestions for META-SHARE RDF vocabulary from Sebastian Hellmann on 2014-06-12 (public-ld4lt@w3.org from June 2014)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Thu, 12 Jun 2014 11:49:39 +0200
To: Asunción Gómez Pérez <asun@fi.upm.es>, public-ld4lt@w3.org
Message-ID: <539977B3.1070906@informatik.uni-leipzig.de>
Hi Asun, all,

the image looks quite appropriate. Here are some additions to the names
mentioned in the image:

## General Metadata
The DBpedia community is currently pursuing an implementation and
extension of DCAT and VoID called DataID [1]. While DCAT and VoID are
vocabularies, DataID will also provide guidelines on how and where
exactly to publish the DataID file (similar to the robots.txt or sitemap
file). There will be a validator implementation to help adoption.
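To make the DCAT/VoID combination concrete, here is a minimal sketch (Python, standard library only) that writes a Turtle description of one dataset typed as both `dcat:Dataset` and `void:Dataset`, with one downloadable distribution. The dataset URI, title, dump URL and triple count are hypothetical placeholders. No DataID-specific terms or publication location are shown, because the DataID guidelines are not spelled out here.

```python
# Minimal sketch: a dataset description using DCAT and VoID terms, emitted as Turtle.
# All URIs and values are hypothetical placeholders supplied by the caller;
# no DataID-specific terms are used.

PREFIXES = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "void": "http://rdfs.org/ns/void#",
    "dct": "http://purl.org/dc/terms/",
}


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def describe_dataset(dataset_uri: str, title: str, dump_url: str, triples: int) -> str:
    """Describe one dataset and one downloadable distribution of it."""
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [
        "",
        f"<{dataset_uri}> a dcat:Dataset, void:Dataset ;",
        f'    dct:title "{escape_literal(title)}" ;',
        f"    void:triples {triples} ;",          # plain Turtle integer literal
        f"    void:dataDump <{dump_url}> ;",
        "    dcat:distribution [",
        "        a dcat:Distribution ;",
        f"        dcat:downloadURL <{dump_url}>",
        "    ] .",
    ]
    return "\n".join(lines)


print(describe_dataset(
    "http://example.org/dataset/mydata",
    "My dataset",
    "http://example.org/dumps/mydata.nt.bz2",
    1000,
))
```

Typing the same resource with both classes lets DCAT-aware catalogues and VoID-aware Linked Data tools read one description.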

## Linguistic Specific Metadata

Language codes for ISO 639-1 and ISO 639-2 are provided by the Library of
Congress (LoC):
http://id.loc.gov/vocabulary/iso639-1/ab
http://id.loc.gov/vocabulary/iso639-2/eng
Also in RDF:
http://id.loc.gov/vocabulary/iso639-2/eng.rdf

Unfortunately, the most widely used code set, ISO 639-3, is not available
from LoC; http://lexvo.org is currently the authority here:
http://www.lexvo.org/page/iso639-3/eng
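A small hypothetical helper can make the split between LoC and lexvo.org explicit. It builds URLs purely from the example URL patterns listed above; the patterns are inferred from those examples rather than taken from an official API. Applying the `.rdf` suffix to ISO 639-1 (it is shown only for ISO 639-2 above) is a further assumption.

```python
# Hypothetical helper: build a language-code URL from the URL patterns quoted above.
# ISO 639-1 and ISO 639-2 codes point to the Library of Congress (id.loc.gov);
# ISO 639-3 codes point to lexvo.org. Patterns are inferred from the examples.

LOC_BASE = "http://id.loc.gov/vocabulary/"
LEXVO_PAGE_BASE = "http://www.lexvo.org/page/"
CODE_LENGTH = {"iso639-1": 2, "iso639-2": 3, "iso639-3": 3}


def language_code_url(code: str, standard: str, rdf: bool = False) -> str:
    """Return a URL for `code` in one of the ISO 639 parts listed in CODE_LENGTH."""
    code = code.strip().lower()
    if standard not in CODE_LENGTH:
        raise ValueError(f"unsupported standard: {standard}")
    if len(code) != CODE_LENGTH[standard] or not (code.isascii() and code.isalpha()):
        raise ValueError(f"{code!r} is not a well-formed {standard} code")
    if standard == "iso639-3":
        # Not available from LoC; use the lexvo.org page pattern instead.
        if rdf:
            raise ValueError("no RDF URL pattern is assumed for lexvo.org")
        return f"{LEXVO_PAGE_BASE}{standard}/{code}"
    # Assumption: the ".rdf" suffix shown for ISO 639-2 also works for ISO 639-1.
    suffix = ".rdf" if rdf else ""
    return f"{LOC_BASE}{standard}/{code}{suffix}"


print(language_code_url("ab", "iso639-1"))             # http://id.loc.gov/vocabulary/iso639-1/ab
print(language_code_url("eng", "iso639-2", rdf=True))  # http://id.loc.gov/vocabulary/iso639-2/eng.rdf
print(language_code_url("eng", "iso639-3"))            # http://www.lexvo.org/page/iso639-3/eng
```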

## Linguistic Data
In my opinion, NIF and lemon are able to cover most industrial use cases.

* lemon for dictionaries and terminological data
* NIF as an annotation format for text
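As an illustration of the dictionary side, here is a minimal sketch of a single lemon lexical entry (the original lemon namespace, `http://lemon-model.net/lemon#`) with a canonical written form and one sense pointing to an ontology reference. The entry URI is a hypothetical example; the DBpedia reference is only there to show the linking pattern.

```python
# Minimal sketch of one lemon lexical entry, serialised as Turtle.
# The entry URI is hypothetical; the reference URI shows linking to DBpedia.

LEMON = "http://lemon-model.net/lemon#"


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def lemon_entry(entry_uri: str, written_rep: str, lang: str, reference_uri: str) -> str:
    """A lexical entry with a canonical form and one sense linked to an ontology reference."""
    return "\n".join([
        f"@prefix lemon: <{LEMON}> .",
        "",
        f"<{entry_uri}> a lemon:LexicalEntry ;",
        "    lemon:canonicalForm [",
        f'        lemon:writtenRep "{escape_literal(written_rep)}"@{lang}',
        "    ] ;",
        "    lemon:sense [",
        f"        lemon:reference <{reference_uri}>",
        "    ] .",
    ])


print(lemon_entry("http://example.org/lexicon/cat", "cat", "en",
                  "http://dbpedia.org/resource/Cat"))
```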

NIF itself provides mechanisms to model (offset-based) annotations as
linked data; the following vocabularies are incorporated as NIF modules
for expressing the annotations themselves:

* ITS RDF ontology - http://www.w3.org/2005/11/its/rdf# (based on
http://www.w3.org/TR/its20/)
* NERD - for entity classification (person, location, ...):
http://nerd.eurecom.fr/ontology
* MARL - for sentiment analysis: http://purl.org/marl/0.1/ns
* OLiA - for morpho-syntax, POS tag sets, etc.: http://purl.org/olia
* DBpedia + DBpedia Ontology - for entity linking, e.g.
http://dbpedia.org/resource/Barack_Obama

We started to collect them all here: 
https://github.com/NLP2RDF/ontologies/tree/master/vm
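To show how the offset mechanism and an annotation module fit together, here is a minimal sketch of one entity-linking annotation in NIF 2.0 core terms, with the link expressed through ITS RDF (`itsrdf:taIdentRef`) to a DBpedia resource. Assumptions: the document URI is a hypothetical example, offsets are Python string indices, and only the first occurrence of the surface form is annotated.

```python
# Minimal sketch of a NIF 2.0-style offset annotation (entity linking), as Turtle.
# Assumptions: hypothetical document URI; offsets are Python string indices;
# only the first occurrence of the surface form is annotated.

PREFIXES = {
    "nif": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#",
    "itsrdf": "http://www.w3.org/2005/11/its/rdf#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def offset_uri(doc_uri: str, begin: int, end: int) -> str:
    """NIF string URI with an RFC 5147-style '#char=begin,end' fragment."""
    return f"<{doc_uri}#char={begin},{end}>"


def index(value: int) -> str:
    """Typed literal for nif:beginIndex / nif:endIndex."""
    return f'"{value}"^^xsd:nonNegativeInteger'


def annotate_entity(doc_uri: str, text: str, surface: str, entity_uri: str) -> str:
    begin = text.find(surface)
    if begin < 0:
        raise ValueError(f"{surface!r} does not occur in the text")
    end = begin + len(surface)
    context = offset_uri(doc_uri, 0, len(text))
    mention = offset_uri(doc_uri, begin, end)
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines += [
        "",
        # The context resource carries the full text once.
        f"{context} a nif:Context, nif:RFC5147String ;",
        f'    nif:isString "{escape_literal(text)}" ;',
        f"    nif:beginIndex {index(0)} ;",
        f"    nif:endIndex {index(len(text))} .",
        "",
        # The mention points back to its context and links to the entity via ITS RDF.
        f"{mention} a nif:Phrase, nif:RFC5147String ;",
        f"    nif:referenceContext {context} ;",
        f'    nif:anchorOf "{escape_literal(surface)}" ;',
        f"    nif:beginIndex {index(begin)} ;",
        f"    nif:endIndex {index(end)} ;",
        f"    itsrdf:taIdentRef <{entity_uri}> .",
    ]
    return "\n".join(lines)


print(annotate_entity("http://example.org/doc1", "Barack Obama visited Berlin.",
                      "Barack Obama", "http://dbpedia.org/resource/Barack_Obama"))
```

The same mention URI could carry further module properties (e.g. a NERD class or an OLiA tag) without repeating the text or the offsets.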

## Limitations of the above

* If the ISO language codes are not sufficient, http://glottolog.org/ is
an option.
* If you need finer-grained features, such as annotations of annotations,
http://www.openannotation.org/ can be used. However, it produces many more
triples than NIF, so scalability can become a problem.
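A rough count gives a sense of the triple overhead for a single entity annotation. Both modellings below are minimal assumptions of our own: NIF 2.0 core plus ITS RDF on one side, and the Open Annotation data model (`oa:` terms) with a text position selector on the other, each recording only the offsets and the linked entity. The NIF context resource is left out because all annotations in a document share it. Real deployments typically add more triples on both sides (surface strings, provenance), so the numbers are illustrative only.

```python
# Rough illustration of per-annotation triple counts under two minimal modellings.
# Assumptions: hypothetical document URI and a 28-character text; only offsets and
# the linked entity are recorded; the shared NIF context resource is excluded.

DOC = "http://example.org/doc1"                     # hypothetical document URI
ENTITY = "http://dbpedia.org/resource/Barack_Obama"
BEGIN, END = 0, 12


def nif_triples():
    """One NIF mention linked to an entity via ITS RDF."""
    mention = f"{DOC}#char={BEGIN},{END}"
    return [
        (mention, "rdf:type", "nif:RFC5147String"),
        (mention, "nif:referenceContext", f"{DOC}#char=0,28"),
        (mention, "nif:beginIndex", BEGIN),
        (mention, "nif:endIndex", END),
        (mention, "itsrdf:taIdentRef", ENTITY),
    ]


def oa_triples():
    """One Open Annotation with a specific-resource target and a position selector."""
    ann, target, selector = "_:ann", "_:target", "_:sel"   # blank nodes
    return [
        (ann, "rdf:type", "oa:Annotation"),
        (ann, "oa:hasBody", ENTITY),
        (ann, "oa:hasTarget", target),
        (target, "rdf:type", "oa:SpecificResource"),
        (target, "oa:hasSource", DOC),
        (target, "oa:hasSelector", selector),
        (selector, "rdf:type", "oa:TextPositionSelector"),
        (selector, "oa:start", BEGIN),
        (selector, "oa:end", END),
    ]


print(len(nif_triples()), len(oa_triples()))  # prints: 5 9
```

The gap comes from the extra intermediate nodes (annotation, target, selector) that OA introduces; these are also what make annotations of annotations possible.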

All the best,
Sebastian

[1]  http://wiki.dbpedia.org/coop/DataIDUnit



On 31.05.2014 11:49, Asunción Gómez Pérez wrote:
>
> Dear all,
>
> Please consider the following picture as a starting point for
> identifying the different metadata clusters and separating them from
> the content-oriented part of the LR. Issues related to country codes
> are not included in this slide, but it should be easy to extend. In
> the middle, the white boxes refer to candidate vocabularies to be
> reused or to initiatives that could help us with the definition of
> the properties and their values.
>
> I hope that it helps
>
> Asun
>
>
>
>
> On 22/05/2014 14:00, Marta Villegas wrote:
>> Dear Penny, Dave and all,
>>
>> For things like ORGANIZATION, PROJECT, DOCUMENT and PEOPLE (i.e.
>> non-linguistic things) we could use existing ontologies such as foaf,
>> doap, bibo, swrc, etc. (just choose the one that best fits your purpose).
>> The same goes for language names/codes, country names, MIME types (we
>> did not find anything but ...), etc.
>>
>> Best
>>
>>
>>
>>
>> 2014-05-22 11:55 GMT+02:00 Penny Labropoulou <penny@ilsp.gr>:
>>
>>     Dear Dave and all,
>>
>>     We agree that a separation into modules will help the discussion, and
>>     we basically agree with your proposal.
>>
>>     One point regarding the RESOURCE_TYPE module: all LRs are described
>>     via the same set of "administrative/descriptive" components plus an
>>     additional set of more specific components, depending on their
>>     resourceType AND mediaType values; the latter set corresponds to all
>>     the components included in the resourceComponentType part. So there
>>     is a specific set of components for corpora, lexical/conceptual
>>     resources, language descriptions and tools/services (the four resource
>>     types recognized by META-SHARE); inside these, we have separate
>>     components depending on the mediaType, so we have text corpora
>>     components, video corpora components and audio corpora components, but
>>     also lexical/conceptual text components, etc. Within each of these
>>     combinations, some elements are shared (e.g. linguality and language,
>>     time classification, etc.) or can be similar (e.g. there are similar
>>     classification components for text, audio, video and image). So it
>>     might be more convenient to have separate RESOURCE_TYPE and MEDIA_TYPE
>>     modules. What do you think?
>>
>>     We also suggest adding three further modules, ORGANIZATION, PROJECT
>>     and DOCUMENT, corresponding to the organizationInfo, projectInfo and
>>     documentationInfo parts of the original model.
>>
>>     Best,
>>     Penny
>>
>>     -----Original Message-----
>>     From: Dave Lewis [mailto:dave.lewis@cs.tcd.ie]
>>     Sent: Thursday, May 22, 2014 12:38 PM
>>     To: public-ld4lt@w3.org
>>     Subject: [ISSUE-2] Module suggestions for META-SHARE RDF vocabulary
>>
>>     Hi all,
>>     At the last call we discussed the template for the meta-share
>>     ontology, kindly initiated by Jorge:
>>     https://docs.google.com/spreadsheets/d/15SE4_qAqYFostmD52uKxpkCPZh1f5TrPeoXKNTlDYpQ/edit#gid=0
>>
>>     with further information at:
>>     https://www.w3.org/community/ld4lt/wiki/Meta-Share_OWL_metamodel
>>
>>     We discussed modules for this to help break down the task and to
>>     partition out the parts that might take longer to agree on, or that
>>     need the involvement of different subgroups.
>>
>>     We have already agreed to have a CORE component and to split out a
>>     LICENSES module, but had asked for other suggestions.
>>
>>     I'd like to propose two further modules:
>>
>>     RESOURCE_TYPE corresponding to the resourceComponentType part of the
>>     meta-share schema:
>>     http://www.meta-share.org/portal/knowledgebase/Resourcecomponenttype
>>
>>     and
>>
>>     USAGE_TYPE corresponding to the usageInfo part of the meta-share
>>     schema:
>>     http://www.meta-share.org/portal/knowledgebase/Usageinfo
>>
>>     These contain large enumerations that are likely to be subject to
>>     ongoing debate and are likely candidates for extension/specialization.
>>     By separating these out, we can avoid such debate delaying work on the
>>     CORE module.
>>
>>     Should we add these as modules to the spreadsheet?
>>
>>     From an ontology modelling viewpoint, how should we manage the
>>     modelling in these proposed modules? Would a class taxonomy be a
>>     better approach than an enumeration?
>>
>>     Kind Regards,
>>     Dave
>>
>>
>> -- 
>> Marta Villegas
>> marta.villegas@gmail.com
>
> -- 
> Prof. Asunción Gómez-Pérez
> Full Professor (Catedrática de Universidad)
> Director of the Ontology Engineering Group
> Facultad de Informática owl:sameAs Escuela Técnica Superior de Ingenieros Informáticos
> Universidad Politécnica de Madrid
> Campus de Montegancedo, sn
> Boadilla del Monte, 28660, Spain
> Home page: www.oeg-upm.net
> Email: asun@fi.upm.es
> Phone: (34-91) 336-7417
> Fax: (34-91) 352-4819
Received on Thursday, 12 June 2014 09:50:18 UTC