LD4LT telco -- 17 Jul 2014

<daveL> Meeting: LD4LT Community Group Call

<daveL> chair: Dave Lewis

<daveL> Agenda: http://lists.w3.org/Archives/Public/public-ld4lt/2014Jul/0018.html

<daveL> apologies, we just lost GoToMeeting for a minute, back now

agenda review

dave: this week we want to focus on the meta-share ontology
... then we will go through the meta-share changes made since last time
... and we will cover Victor's suggestions on licensing
... Penny is not here today; she mailed comments on licensing, and we will go through those

action items

action-5?

<trackbot> action-5 -- Víctor Rodríguez-Doncel to proposal for a license module -- due 2014-06-19 -- OPEN

<trackbot> http://www.w3.org/community/ld4lt/track/actions/5

<scribe> done

close action-5

<trackbot> Closed action-5.

<daveL> ACTION-7: Felix - Check with w3c groups if there are other approaches to represent languages as uris

<trackbot> Notes added to ACTION-7 Check with w3c groups if there are other approaches to represent languages as uris.

action-7?

<trackbot> action-7 -- Felix Sasaki to Check with w3c groups if there are other approaches to represent languages as uris -- due 2014-06-19 -- OPEN

<trackbot> http://www.w3.org/community/ld4lt/track/actions/7

<daveL> http://lists.w3.org/Archives/Public/public-ld4lt/2014Jul/0004.html

close action-7

<trackbot> Closed action-7.

felix: OK to discuss this in the meta-share context; it is hard to resolve in general

action-8?

<trackbot> action-8 -- David Lewis to Look into isa work related to dcat profiles and report back -- due 2014-07-10 -- OPEN

<trackbot> http://www.w3.org/community/ld4lt/track/actions/8

<scribe> done, see mail from dave

close action-8

<trackbot> Closed action-8.

see mail at http://lists.w3.org/Archives/Public/public-ld4lt/2014Jul/0011.html

action-9?

<trackbot> action-9 -- Jorge Gracia to Implement changes in metashare spreadsheet -- due 2014-07-10 -- OPEN

<trackbot> http://www.w3.org/community/ld4lt/track/actions/9

close action-9

<trackbot> Closed action-9.

<daveL> ACTION-10: Jorge - Identify some external vocabularies to use in ms

<trackbot> Notes added to ACTION-10 Identify some external vocabularies to use in ms.

dave: will discuss later in the call

jorge: will be covered during meta-share discussion

close action-10

<trackbot> Closed action-10.

dave: thanks to all for working on your action points :)

<daveL> http://mlode2014.nlp2rdf.org/lider-roadmapping-workshop/

event announcements

dave: LD4LT / LIDER roadmapping workshop in Leipzig

http://www.w3.org/blog/International/2014/07/14/linked-data-meets-content-analytics-4th-lider-ld4lt-event-2nd-september-leipzig/

<daveL> felix: this event aims to gather more input on content analytics use cases and their needs for linked data

<daveL> ... as many of those companies will be there

dave: it will be a good opportunity for people in this group to meet face to face and discuss content analytics as well as general topics
... also, we recently held a workshop in Dublin at LocWorld
... we now have an opportunity to repeat that in Vancouver
... it will be in the last week of October
... so FYI, I'll send details around later

META-SHARE vocabulary CORE

<daveL> https://docs.google.com/spreadsheets/d/15SE4_qAqYFostmD52uKxpkCPZh1f5TrPeoXKNTlDYpQ/edit#gid=0

jorge: modifications to the format of the Google Docs spreadsheet:
... I created new columns for the new information
... we keep track of the old information: the old columns have been hidden, as you can do in Excel; just click and the previous information appears
... I added colors so that you can see what changed; changes are shown in blue
... in the discussion column, Penny, Dave, others and I have added comments to feed the discussion
... my proposal: go through the rows in the spreadsheet, re-read the discussion column, and see what we can decide
... I colored in red the discussions that may be more critical
... I propose to go through the whole list of rows

dave: agree

jorge: the set of classes is short
... first: agent
... I propose to use foaf:Agent for both person and organization
... see the suggestion in the comments to use the provenance (PROV) agent

dave: using the PROV agent by itself, without the rest of the PROV ontology, does not make sense, of course

jorge: as a first step, I propose foaf:Agent, without the PROV ontology

dave: sure
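A minimal sketch of the agreed first step, assuming plain Turtle output and a placeholder `ms:` namespace (`http://example.org/ms#` is hypothetical, not the real META-SHARE namespace): persons and organizations are both typed as `foaf:Agent`, with no PROV terms involved. The helper names `turtle_literal` and `agents_to_turtle` are illustrative only.

```python
# Minimal sketch: type both persons and organizations as foaf:Agent.
# The ms: namespace below is a hypothetical placeholder.
FOAF = "http://xmlns.com/foaf/0.1/"
MS = "http://example.org/ms#"  # hypothetical META-SHARE namespace


def turtle_literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def agents_to_turtle(agents):
    """agents: iterable of (local_id, name) pairs for persons or organizations.
    No person/organization distinction is made: all become foaf:Agent."""
    lines = [f"@prefix foaf: <{FOAF}> .", f"@prefix ms: <{MS}> ."]
    for local_id, name in agents:
        lines.append("")
        lines.append(f"ms:{local_id} a foaf:Agent ;")
        lines.append(f"    foaf:name {turtle_literal(name)} .")
    return "\n".join(lines)


print(agents_to_turtle([("jdoe", "Jane Doe"), ("acmeLab", "ACME Language Lab")]))
```

If a finer distinction is needed later, FOAF's own `foaf:Person` and `foaf:Organization` (both subclasses of `foaf:Agent`) could replace the single type without changing the rest of the output.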

jorge: now row 6: some labels were written in camel case
... I changed them into separate words
... I also suggest writing labels in lower case
... it is recommended to write labels as normal English
... something to keep in mind when we do the clean version of this
... for corpus: I removed the disjointness axiom, as I thought it was not useful
... in row 10: corpus collection
... Penny explains that this value does not come from the meta-share model
... I suggest we could introduce the Dublin Core Collection class
... we need to check with the meta-share people whether that fits for them
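For the clean version, the camel-case-to-label conversion could be automated. A minimal sketch, assuming labels are derived purely from the element names; identifiers such as `corpusTextInfo` are used for illustration only:

```python
import re

# Tokens: an acronym run followed by a capitalized word (e.g. "ISO" in
# "ISOLanguageCode"), an optionally capitalized lower-case word,
# a trailing acronym, or a run of digits.
_WORD = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def camel_to_label(identifier: str) -> str:
    """Turn a camel-case identifier into a lower-case, space-separated label."""
    return " ".join(word.lower() for word in _WORD.findall(identifier))


for name in ["corpusTextInfo", "lingualityType", "ISOLanguageCode"]:
    print(f"{name!r} -> {camel_to_label(name)!r}")
```

Acronyms such as "ISO" are lower-cased too; labels written as normal English would probably need a small exception list for those.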

<jgracia> http://purl.org/dc/dcmitype/Collection

<daveL> http://www.w3.org/TR/vocab-dcat/#vocabulary-overview

dave: in DCAT there is the idea of a dataset
... that would be the language resource in our case
... but there is also the catalog, which is a collection of datasets
... so dcat:Catalog rather than the Dublin Core Collection may be a better way of doing it

marta: for a corpus you can have the audio and the transcript

<jgracia> Marta Villegas

marta: in a sense you have two corpora
... so you need two instances of corpus to encode both parts of the corpus
... that is the idea: to build a higher-level node to which you can add more corpora

dave: so that is probably different from a DCAT catalog
... in your description a collection is a sub-grouping

<daveL> http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=dcmitype#Collection

dave: Dublin Core defines a Collection as "an aggregation of resources"
... so the Dublin Core Collection is perhaps more accurate

jorge: we have to decide how to map DCAT datasets to language resources

dave: so stay with collections as suggested

jorge: ok
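A minimal sketch of the collection approach Marta describes, assuming a spoken corpus split into an audio part and a transcript part. `dcmitype:Collection` and `dcterms:hasPart` are real Dublin Core terms; `ms:Corpus`, `ms:mediaType` and the `ms:` namespace are hypothetical placeholders for whatever the META-SHARE terms end up being.

```python
# Minimal sketch: a Dublin Core Collection grouping two corpora.
# ms:Corpus, ms:mediaType and the ms: namespace are hypothetical.
DCMITYPE = "http://purl.org/dc/dcmitype/"
DCTERMS = "http://purl.org/dc/terms/"
MS = "http://example.org/ms#"  # hypothetical META-SHARE namespace


def corpus_collection_ttl(collection_id, parts):
    """parts: list of (corpus_id, media_type) pairs, one per member corpus."""
    lines = [
        f"@prefix dcmitype: <{DCMITYPE}> .",
        f"@prefix dcterms: <{DCTERMS}> .",
        f"@prefix ms: <{MS}> .",
        "",
        f"ms:{collection_id} a dcmitype:Collection ;",
        "    dcterms:hasPart " + ", ".join(f"ms:{cid}" for cid, _ in parts) + " .",
    ]
    # Each member corpus is a separate instance with its own media type.
    for corpus_id, media_type in parts:
        lines += ["", f"ms:{corpus_id} a ms:Corpus ;", f'    ms:mediaType "{media_type}" .']
    return "\n".join(lines)


print(corpus_collection_ttl(
    "speechCollection",
    [("speechAudio", "audio"), ("speechTranscript", "text")],
))
```

The higher-level node stays open-ended: further corpora are simply additional `dcterms:hasPart` values.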

discussion on the definition of "corpus"

philipp: let's postpone the discussion and decide later whether we define this as a property or a class

marta: in meta-share a corpus may have different media types
... you can have an audio part or a text part
... Penny can give us more information; it is not trivial to move from the annotation schema to an ontology at this point

john: two options: we map to other concepts, or we just represent what is in meta-share. What is the goal here?

jorge: the aim is closer to using what is in meta-share and converting it into an OWL ontology
... meta-share is based on decades of discussion

john: so if there is a corpus collection in meta-share we use that; if not, we can use something from a Semantic Web vocabulary

marta: corpus collection has been added; it is not in the original meta-share

john: if this is about alignment, shouldn't we avoid adding new vocabulary that is not in meta-share?

dave: this is a first attempt to map existing xml format into rdf
... that's slightly different from mapping one vocabulary to another

(scribe has a hard time to capture discussion, will see if there is a conclusion)

<daveL> https://www.w3.org/community/ld4lt/wiki/Meta-Share_OWL_metamodel

<daveL> http://www.meta-net.eu/meta-share/META-SHARE%20%20documentationUserManual.pdf

https://github.com/metashare/META-SHARE/tree/master/misc/schema/v3.0

marta: the schema above is the latest version of the XML schema

<Tcarrasco> Proposal - corpus: collection of linguistic data; it can be in several media types. A corpus can be media-type homogeneous or heterogeneous; monolingual or multilingual.

jorge: are these types of corpus first-class citizens in the meta-share model?

marta: yes

jorge: then it may be good to define this as a first-class entity

<Tcarrasco> Today, the relevant corpora are n-lingual plain text

dave: how do we wrap up the discussion on the corpus definition, jorge?

jorge: let's move to the license topic
... one major issue to clarify: the mapping between language resources and the DCAT dataset and distribution classes
... this is still open

<scribe> ACTION: daveL to gather info on how to provide more detailed mapping from meta-share to dcat [recorded in http://www.w3.org/2014/07/17-ld4lt-minutes.html#action01]

<trackbot> Created ACTION-11 - Gather info on how to provide more detailed mapping from meta-share to dcat [on David Lewis - due 2014-07-24].

<Tcarrasco> Human annotation is realistic for small corpora; large corpora require programmatic processing for cleaning, annotation and other tasks

jorge: agreed, now let's move to the license topic

META-SHARE vocabulary LICENSE

<daveL> https://www.w3.org/community/ld4lt/wiki/Licensing_information

<daveL> http://lists.w3.org/Archives/Public/public-ld4lt/2014Jul/0014.html

dave: the wiki page is from Victor; Penny sent the mail above

<Tcarrasco> Sound is poor

victor: Penny likes the approach and made some comments
... she said we should declare more precisely which elements we use
... currently the literals are plain strings; they should be replaced by URIs
... Penny says the license name has to be kept
... using a URL is also OK
... resources with dual licensing should be supported; I agree
... she also listed further information that should be included
... Penny has not yet added her comments to the wiki
... I can do it for her or she can do it herself; I'll send her a mail about that
... the next step will be to update the spreadsheet
... we can use the meta-share terms and declare the corresponding ODRL terms
... connecting both via owl:sameAs
... in Marta's translation I missed an element that aggregates the license information
... in Marta's model these properties were attached directly to the resources

dave: is the aggregation necessary, or can you retrieve that via SPARQL?

victor: if a resource has two licenses, each property must be related to either license one or license two
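A minimal sketch of why the aggregating element matters for dual licensing, assuming one aggregating node per license. All `ms:` property names (`ms:licenseInfo`, `ms:licenseName`, `ms:licenseURL`, `ms:restrictionsOfUse`) and the class names are hypothetical placeholders, not the actual META-SHARE or ODRL terms; triples are plain tuples rather than output from an RDF library.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class LicenseInfo:
    """One aggregating node per license (hypothetical ms:licenseInfo)."""
    name: str                    # the license name is kept
    url: Optional[str] = None    # a URL for the license is also acceptable
    restrictions: List[str] = field(default_factory=list)


@dataclass
class LanguageResource:
    identifier: str
    licenses: List[LicenseInfo] = field(default_factory=list)


def license_triples(resource: LanguageResource) -> List[Triple]:
    """Give each license its own blank node so that every restriction stays
    attached to the license it belongs to, even for dual-licensed resources."""
    subject = f"ms:{resource.identifier}"
    triples: List[Triple] = []
    for i, lic in enumerate(resource.licenses, start=1):
        node = f"_:{resource.identifier}License{i}"
        triples.append((subject, "ms:licenseInfo", node))
        triples.append((node, "ms:licenseName", f'"{lic.name}"'))
        if lic.url:
            triples.append((node, "ms:licenseURL", f"<{lic.url}>"))
        for restriction in lic.restrictions:
            triples.append((node, "ms:restrictionsOfUse", f'"{restriction}"'))
    return triples


corpus = LanguageResource("corpusX", [
    LicenseInfo("CC-BY", "http://creativecommons.org/licenses/by/4.0/", ["attribution"]),
    LicenseInfo("proprietary", restrictions=["academic use only", "no redistribution"]),
])
triples = license_triples(corpus)
for s, p, o in triples:
    print(s, p, o, ".")
```

If the restrictions were instead attached directly to `ms:corpusX`, as in the flat model, no query could tell which restriction belongs to which license; the intermediate node is what keeps that information.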

dave: ok
... when I look at DCAT I will take the discussion of multiple licenses into account too

felix: if you have issues with DCAT, you may want to talk to Phil Archer directly; he is on top of things

dave: makes sense; for DCAT we can make a wiki page
... so that it is digestible for the DCAT people

felix: makes a lot of sense

<jgracia> +1

dave: so Victor will liaise with Penny on the wiki page
... and then we can make changes to the actual spreadsheet

moving forward

dave: do people want to have another call next thursday?
... I won't be around but we could arrange it

people can do both weeks

dave: I could not chair next week but maybe somebody else can do that
... trying to nail things down before we get to August

process reminder

dave: we want to finish off the spreadsheet, then produce a stable core part, and then hand that back to Marta / Penny to publish on their own GitHub
... hope that we can get to that after the holidays

aob

dave: we will arrange a call next week, make sure that someone can start the session, and then have another call in two weeks too
... thanks to all for your efforts on the mailing list and here!

adjourned
