LLD XG -- 23 Oct 2010

<emma> Scribe: Lars

<emma> scribenick: LarsG

Introductions:

19 participants

three more people arrive, makes it 22

TomB: basic principles
... it's not like a DC working group, guests are _not_ encouraged to participate unless they have something very specific to contribute. If necessary, guests please move to the back
... WiFi is not free, and we have no sponsors. TomB paid out of his own pocket, so we will pass the hat around
... $300
... agenda is tight, so let's go

Use Case Discussion

Presentations are at http://www.w3.org/2005/Incubator/lld/wiki/F2F_Pittsburgh_UCslides

scribe: We received 42 UseCases (that's the meaning of it)

emma: We try to group UseCases
... it's OK to twitter about the meeting. Hashtag is #lld
... UseCases at http://www.w3.org/2005/Incubator/lld/wiki/UseCases

kcoyle: There are guests who came for specific use cases. TomB will present those

Preparation of use case descriptions. Distribution of PostIts. Presenters please write names of the use cases they present on them

TomB: presents 3 FAO use cases
... 1) Agrovoc

TomB: 1980 multilingual thesaurus, since 2000 an owl ontology, since 2009 SKOS
... 2) FAO authority control
... description at http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_FAO_Authority_Description_Concept_Scheme
... 1) is at http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_AGROVOC_Thesaurus
... 3) AGRIS. Description at http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_AGRIS

emma: the three UCs fit together since they are from the same organisation. For clustering purposes, it might be better to group them differently.

kcoyle: One large piece of paper with the topic, and then move PostIts with UC names around until we're satisfied

one flipchart per UC area

Antoine will consolidate all UC presentation slides into one presentation and upload it to the wiki

scribe: all presenters please mail their slides to Antoine

Jeff Young: UC Authority Data Enrichment (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Authority_Data_Enrichment)

scribe: authority data used to collocate information, need to consolidate internationally
... goal: enrich authority data by linking in and out
... how can we remodel the LinkedData back into MARC
... how far can we re-use existing vocabularies and how much do we need to define ourselves

Jeff Young: UC Open Library Data (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Open_Library_Data)

scribe: Open Library has much bibliographic information from different sources (people, Amazon). It's not in MARC but in key-value pairs
... problems: forms of personal names not preserved, no subfield structure preserving structure of data
... concepts (subject authority data) are probably more user friendly and less librarianesque
... they use FRBRish structure
... one goal just to present the data as LinkedData and see if it's useful
... vocabularies used: owl, skos, foaf, frbr, rdvocab, dcterms

TomB: if UCs don't have a list of used vocabularies, we should add that to the UC

kcoyle: four cases with authority data
... 1) AuthorClaim (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_AuthorClaim)
... goal: try to identify authors and encourage authors to use the same name form in future, so that authors can find themselves in the database
... vocabulary: METS

<michaelp> METS, I believe

3) VIAF (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Virtual_International_Authority_File_(VIAF))

scribe: vocabularies viaf, owl, skos, foaf, frbr entities, frbr elements, dcterms

<edsu> http://www.vivoweb.org/ would've been a nice use case to have in this area ...

scribe: makes sense to cluster it with Jeff's UC
... i. e. UC Data Enrichment (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Authority_Data_Enrichment)

alex: UC DNB Linked Data (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Linked_Data_Service_of_the_German_National_Library)
... Service in prototypical state
... topics: alignment (DBPedia, Wikipedia, VIAF)
... vocabularies: rda, foaf, relationship vocab, gnd (dnb internal)

kcoyle: NEP: New economic paper is the same as author claim

Thus we have 41 UCs

emma: does GordonD want to cluster with Jeff (Open Library Data)

GordonD: three different clusters
... 1) Language technology (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Language_Technology)
... problem: different library communities use different terminology (access points)
... no real authority control for subjects
... differences include language (multilinguality), authority terminologies and notations, uncontrolled terminology (natural language)
... need: link terms from different languages (singular/plural etc). Translate user input into controlled terminology
... LinkedData allows term-by-term matching (if the vocabulary allows it...)
... also issue with compound vs simple terms (broader/narrower, part/whole)
... Translation architectures:
... * one2one: translate term in vocab1 to exactly one term in vocab2 (scalability issues)
... * Hub-spoke: One vocabulary as hub. Issue: What to choose as hub? Issue: semantic drift between spoke vocabularies
... examples:
... * Vocabulary mapping framework (hub-spoke) http://cdlr.strath.ac.uk/VMF/
... * HILT (hub-spoke) using DDC as hub http://hilt.cdlr.strath.ac.uk/
... HILT experimented with multilinguality and it seemed to work
... * MACS (one2one) SWD, LCSH, Rameau, DDC http://www.d-nb.de/eng/wir/projekte/macs.htm
... 2) UC Library Address Data (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Library_Address_Data)
... libraries to publish information about themselves as linked data to allow identification, perhaps including collection-level data
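
A rough sketch of the scalability point raised for the two translation architectures above: with one2one, every pair of vocabularies needs its own mapping, while with hub-spoke each spoke vocabulary only maps to the hub. The function names, the toy term tables and the hub notation `hub:001` are illustrative assumptions, not taken from VMF, HILT or MACS.

```python
def one2one_mapping_count(n_vocabularies: int) -> int:
    """Pairwise architecture: one mapping per unordered pair of vocabularies."""
    return n_vocabularies * (n_vocabularies - 1) // 2


def hub_spoke_mapping_count(n_vocabularies: int) -> int:
    """Hub-spoke architecture: every non-hub vocabulary maps to the hub only."""
    return max(n_vocabularies - 1, 0)


# Toy spoke-to-hub tables (hypothetical terms and hub notation).
TO_HUB = {
    "vocab_a": {"cats": "hub:001"},
    "vocab_b": {"Katzen": "hub:001"},
}


def translate(term, source, target):
    """Translate a term from source to target via the hub.

    Returns None when either side has no mapping to the hub.
    """
    hub_id = TO_HUB.get(source, {}).get(term)
    if hub_id is None:
        return None
    for candidate, candidate_hub in TO_HUB.get(target, {}).items():
        if candidate_hub == hub_id:
            return candidate
    return None
```

With ten vocabularies the pairwise approach needs 45 mappings and the hub approach 9. The price of the hub is that every translation passes through it, which is where the semantic drift between spokes mentioned above can creep in.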

emma: topic of morning session is to identify the clusters

TomB: then analyse the clusters one by one
... we hear recurring themes

Marcia to present on Vocabulary Merging (http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Vocabulary_Merging)

marcia: if users find things in a local service or a tag cloud
... a vocabulary service to relate terms
... vocab merging service to work at the back end e. g. as a super structure
... sometimes actual merging, sometimes switching system
... mapping of user terms (synonyms) to vocabularies
... presentation of different projects: HILT, MACS, OCLC terminology services
... UMLS metathesaurus (creating a superstructure) over 1 million concepts and 4.3 million concept names
... there concepts have unique URIs

antoine: who does what in this UC? What does the process look like?

GordonD: It's about terminology services. Black box: Service takes a user term and maps that back to a particular terminology a catalogue/community uses. It's transparent to the user: They enter a term and get a bunch of terminology back they can use in specific services.

marcia: It's very much a silo

GordonD: DDC and UDC do the same thing but don't talk to one another, but there's rapid progress
... do you need a terminology service layer to organise the LinkedOpenData
... good example of statistical mapping technique in the DDC/LCSH mappings from WorldCat

michaelp: through consistent use of URIs we can get the whole cluster.

<edsu> also nat'l diet library: http://id.ndl.go.jp/auth/ndlsh map their subject terms to lcsh now as linked data / skos

emma: interesting discussion, but we're pushing the break
... postpone discussion

Break, 1/2 hour

<michaelp> Scribe: MichaelP

<emma> Scribenick: michaelp

Gordon: UC Library Address Data

GordonD: Libraries to publish information about themselves for identification
... this can be subsumed under collection-level description
... There is a DCMI AP for this which could be used
... In a LOD environment this still has to be triplified
... This type of collection-level metadata allows for pre-search filtering and informs users' decisions

Jeff: VCard could be used for this.

edsu: Martin has already done this in Sweden.

GordonD: We have this in a DB but not as linked data. We need advice.
... We want to link Sweden up with Scotland and the US. Sounds crazy, but is important for travelers.
... and cross-cultural researchers.

Alexander: Accessibility is key here. The accessibility of e.g. digital documents is in scope here.

GordonD: Also availability of assistive technology is important info here.

UC: Bibliographic Network
... Seeking the use of FRBR to bring metadata components together.
... Matching and deduping is another task in large-scale aggregations.
... Background issue to this cluster: data in catalogs is heterogeneous.
... But users want a homogeneous discovery interface.
... Linked data help by breaking these records down into components.
... Some statements will be the same.
... Focus shifts from the record to the statement.
... Deduping can happen at a much lower level.
... We need to get to the triples from the legacy records. There is a lot of work going on in this area.
... Main barriers:
... Need to find identification methods.
... Matching URIs, establishing equality of sub-properties.
... Comparing values; Dewey numbers same as Dewey caption?
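
A minimal sketch of the shift from record to statement described above, assuming two records have already been identified as describing the same resource (which is itself one of the barriers listed). The identifier `ex:work1`, the property names and the sample values are hypothetical.

```python
def record_to_statements(subject, record):
    """Break a flat record (a dict) into (subject, property, value) statements."""
    return {(subject, prop, value) for prop, value in record.items()}


# Two legacy records for the same resource, from different catalogues.
record_a = {"title": "Moby Dick", "creator": "Melville, Herman", "date": "1851"}
record_b = {"title": "Moby Dick", "creator": "Melville, Herman", "language": "en"}

statements_a = record_to_statements("ex:work1", record_a)
statements_b = record_to_statements("ex:work1", record_b)

# Identical statements collapse automatically in a set,
# so deduplication happens per statement rather than per record.
merged = statements_a | statements_b
shared = statements_a & statements_b
```

Here the two three-field records merge into four distinct statements, two of which were shared. The sketch compares values as exact strings; matching URIs, sub-properties and equivalent values (a Dewey number versus its caption) is the hard part the discussion points to.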

TomB: Do these fit into the same category?

GordonD: They are all about record identification.
... But they are still multidimensional in terms of the way we have split up the topics.

<Jeff> http://www.w3.org/2005/Incubator/lld/wiki/Topics

TomB: My use cases fell into different topics.

emma: We don't have to do the clustering today.

TomB: If a UC has three salient topics, there should be a sticker in each category for the UC.

GordonD: I would leave it like it is at the moment; we can go back and look at the aspects of UCs in relation to topics later.

TomB: We now try to identify the key topic. We break up the aspects later.

kcoyle: Open Library UC has some FRBR aspects to it.

TomB: Ok, we place it into LLD SW Technologies category.

UC: Subject search

antoine: Better use of subject vocabs for web search.
... Subjects, works, web pages about subjects and works
... The case addresses all of these aspects
... The scenario allows the user to select a controlled subject that the system has selected.
... Requirements/Linked Data: Availability of vocabs on the web.
... and use of identifiers.
... Issues: Human readable URIs
... URI patterns for real-world objects.
... Also, there might be a difference in the view of the concepts of the concept provider vs. the user of the info.
... Another issue is the presentation of simple subjects (user-friendly)
... Vocab merging is another issue.
... Cluster: It is about authority data and bibliographic data.

Jeff: What I was trying to say in that UC is that by modeling these systems as linked data we can use web search technology like Google to do web searches with controlled vocabularies.
... Leveraging Google for semantic purposes

kcoyle: Would that put it in the Semantic Web section?

Jeff: Semantic Web environment
... Ok

Antoine: UC Digital preservation
... Goal is to support planning and realization of digital preservation
... Two kinds of data: technical data and preservation processes and agents.
... Some vocabs of interest: Preservation vocabs from LC
... OAI-ORE
... DOAP: Description of a project
... Scenario: Finding objects based on preservation criteria, tracking and checking preservation actions.
... Value of LD technology: linking items, sharing data across organizations.
... Two main issues: Scalability and persistence; coverage of existing vocabs incomplete.
... No related UC, but the data could be used in other UCs than preservation.
... Cluster: Data management?

emma: Non-bibliographic information?
... We could cluster together with recollection, but the issue is completely different.
... but the same context.

antoine: We put this UC in non-bibliographic data.
... UC: Publishing 20th century press archive

<TomB> Antoine: Provide every item of this collection a persistent identifier for citing.

antoine: General goal: provide for every item a persistent identifier.
... Support the use of a standard metadata viewer.
... Kind of data: bibliographic data + context data
... Scenarios: User interacts with the system using provided metadata
... search and browse

<edsu> just added CDL's Merritt digital repository software to the digital preservation use case, since they use linkeddata for coordination of curation services: http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Digital_Preservation

<emma> thx edsu !

antoine: User can then view the images of the pages with the standard viewer.
... Also, info from other sources is pulled in for the end user.

antoine: There is also a back-end service side that focusses on harvesting
... Value of LD technology:
... Good vocabs available
... Availability of external sources as LD
... RDFa for machine/human publication of metadata
... Vocabs: ORE, SKOS, FOAF, RDA (persons), EXIF
... Issues:
... Representation of ad hoc aggregations
... end-user display of rich data aggregations
... Capturing the order of documents. Big problem in RDF
... There are only cumbersome solutions available.
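
To illustrate why ordering is awkward: an RDF graph is an unordered set of triples, so sequence has to be encoded explicitly. One standard option is an RDF collection built from `rdf:first` and `rdf:rest`, sketched below with plain tuples. The speaker did not say which solutions were meant, so this is only one possibility, and the page identifiers and blank node labels are made up.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def ordered_list_triples(items, prefix="_:n"):
    """Encode an ordered sequence as an RDF collection.

    Returns (head_node, triples). Each item costs two triples:
    one rdf:first pointing at the item, one rdf:rest pointing at the next node.
    """
    if not items:
        return RDF + "nil", []
    nodes = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", item))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


head, triples = ordered_list_triples(["ex:page1", "ex:page2", "ex:page3"])
```

Three pages turn into six triples and three extra nodes, and reading the order back means walking the chain node by node.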

<LarsG> added PRONOM as vocabulary for digital preservation http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Digital_Preservation#Related_Vocabularies_.28optional.29

antoine: Related UC: NDNP (Chronicling America), Europeana, VIAF

<edsu> here's an example of martin's linked data for library institutions: http://libris.kb.se/resource/library/S

TomB: Please squeeze in Europeana here.

antoine: I don't think so. It touches many different aspects of several cases.

TomB: Europeana is a mega-case!
... Can we present NDNP now?

emma: We had a presentation from Ed on the telecon.

TomB: OK, so we just cluster it.

<edsu> Scribe: Ed Summers

<edsu> ScribeNick: edsu

<michaelp> Scribenick: edsu

:-)

antoine: Digital Text Repositories
... linking texts to authors and other contextual resources
... there are some repositories that curate at level of books, and some that will curate at different levels, portions of books, poems, etc
... there was some frbr mentioned, digital editions as manifestations
... linking is useful for authors, topics and to existing descriptions from external sources ; to make cataloging faster
... also to enable citation
... also automatic alignment tools could be of use, for suggesting links in the text to other linked data resources
... linkeddata useful for adopting and sharing identifiers, and possibly for representing provenance data
... related to the open library data, subject search, and bibliographic network use cases
... not sure where to fit it in precisely

emma: we can create a topic if necessary

antoine: it seems bibliographic

emma: it seems to be about using library data that's used elsewhere

antoine: ok let's put it under USE.Consuming and using library data

kai: Citation of Scientific Data Sets http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Citation_of_Scientific_Datasets
... there is growing interest in making data associated with research available
... in some domains there are some best practices, but they aren't globally identifiable
... focused on making the data citable
... there are 3 use cases
... 1) verification of research
... 2) find publications based on a dataset
... 3) reputation system to provide incentives for researchers to make their data available and citable
... a citation is nothing but a link, and they want to link the data so it's relevant for Linked Data
... an interesting case is if the data itself is linked data
... maybe the distributed nature to it, fits linked data as well
... possibly a future role for libraries: making data available
... existing work in the healthcare/life science domain
... it's a cross domain problem, not very easy to define requirements
... we have different roles for people that are part of the process: authors, reviewers, etc
... there's no existing vocabulary for doing this
... may need to link the citations in publications as well

antoine: there is the need to reference an article in a newspaper
... in some other use cases

kai: i'm not sure how to classify the use case: maybe library data ; but also handling digital objects

antoine: is it also connected to the authorclaim case?

kai: yes
... it relies on authority data
... especially for people

emma: are you looking to enhance publication?

kai: yes

TomB: where are we going to put it, which category?

kai: is citation the main aspect, or scientific data?

oai-ore was kind of designed for this use case btw: http://dlib.org/dlib/october06/vandesompel/10vandesompel.html

kai pins the tail on Citation

markva: Enhanced Publications UC
... aggregates of papers, chapters, datasets
... contributed by the SURF foundation where they have 4 projects where they actually implemented it
... fits in with what kai just presented
... they've been using foaf, oai-ore, dctypes, dcterms
... i have some questions about what the use case is about
... are they annotating the content?
... otherwise very little added on top of ORE
... i think it should be clustered with citation scientific data uc

antoine: it also seemed kind of bibliographic too, focused on the publication

markva: it's focused on aggregates

kcoyle: kind of background information

markva: they have high res geological images that they would like to include

kcoyle: part of that is a data management issue; how do you make sure you store things and can assemble them again
... why don't you put it in Data Management

markva: Mapping Scholarly Debate UC
... modelling rebuttals, reactions, disagreements ; to capture evolution of thought
... the schemas are frbr like (work/manifestation) ; i wasn't able to access the schema
... it would be very useful to link to the actual schemas so you can see what people have been doing
... they have an implementation at bibliographica.org ; i couldn't drill down to the relationships ; wasn't clear if it is work that they would like to do, or have done
... could be relevant to Digital Text Repository UC
... also NDNP UC, 20th Century Press Archives UCs

kcoyle: seems relevant to citation
... the *why* of citation

TomB: i think there might be overlap with linking across datasets

kcoyle: i think in the end we'll have things in multiple places

antoine: we could go back to the owner to figure what vocabulary they use, since william is in the IG

TomB: it's 12:30 so it's lunch
... ray, lars, emma still have to present

rayd: Radio Station Archive Digitization UC
... current practice is that audio programs aren't often digitized, little metadata ; the goal is to enable cross references, and search
... the scenario is about an archivist who is creating and annotating the digital versions
... linked data is useful for subclassing dc:identifier, creating new vocabulary for interviewer, people, etc
... there is little guidance for creating metadata about audio recordings, and provenance information (who created various things)
... also seem to be missing vocabulary for documenting uncertainty
... it all boils down to a vocabulary problem
... a vocabulary for radio programming

kcoyle: it sounded like building an internal system

rayd: i didn't get that sense that it was internally focused

kcoyle: it is almost identical to the linkeddata discussion we had around someone from pbs who was creating vocabulary for programming

edsu: also the work that the bbc are doing

emma: LOCAH Project and Photo Museum UCs
... they have a connection because they are both about archival material
... the materials in archives are generally unique, in high quantities, and multiple content carriers
... the challenge is to get common view of these materials, so that they can be found
... they have hierarchical descriptions, contextual information is very important
... ordered sequences, which are more difficult in RDF
... sometimes the data is semi-structured, and there are quality issues (similar to radio archive)
... they want linkeddata to provide a hub, to make it easier for users to get to the materials, and related materials via the context
... linking to dbpedia, library content, library authorities

they used dcterms, bibo, foaf, skos, rdfs, frbr

emma: but the use of bibo wasn't clear, they said they just put it in there
... they aren't working on converting ead to rdf, they are going back to ISAD(G)
... similar to the FRBR -> RDF efforts, which aren't oriented around marc
... maybe cluster with radio station archive
... Recollection UC
... an effort from NDIIPP to enable discovery of resources, to provide a tool to easily aggregate archives, to create descriptions of them, and to publish as linked data
... could be a bit different because it is a digital archive

antoine: that case is quite connected to the europeana one

corey harper: just last week there was an interesting thread about generating OWL for EAD

<charper> The thread I mentioned starts here: http://listserv.loc.gov/cgi-bin/wa?A2=ind1010&L=ead&T=0&P=1910

antoine: yes, i've been involved in one of those things

charper: it's strange because it's more a document format for finding aids

GordonD: there was a meeting in helsinki about the archival community's search for a data model that connects up with libraries and museums

kai: are they going to publish it as an ontology

GordonD: on the CIDOC/CRM site it is published as rdfs
... it's an evolving supermodel across libraries/archives and museums

LarsG: PODE UC
... it's about pulling together linked data
... wikipedia, project gutenberg
... phase 1 is about frbrising, mashing library data through web service apis
... 2nd phase is about finding non-fiction material via links to external datasets
... marc records are very inconsistent, 40 years of doing things sort of the same way
... also dewey.info is only summaries
... uses frbr, dc, bibo, lexvo, geonames, foaf, skos
... not really sure about what people want to use the data for
... i would put it under USE

antoine: also related to bibliographic network
... perhaps what we should do later is flag the more user oriented ones

See post meeting cleaning:Outcome of the use case discussion

TomB: ok, it's time to break for lunch

Meeting resumes after lunch

<emma> Scribe: Jeff

<emma> scribenick: jeff

Vocabulary discussion

antoine: looking at vocabularies that are being used and how they can be aligned

alexander: still intend to look at requirements?

antoine: yes, look at requirements first

<jodi> hi! I'll just be popping in while I'm online this weekend. :)

antoine: do the vocabularies we have do what we want and where are the gaps?

alexander: should requirements include sparql and protocols in the discussion?

antoine: focus on vocabularies first and then talk about other requirement issues
... gordon wrote document about library standards and linked data
... but start first with use cases and look at vocabularies they're using
... start with bibliographic data vocabularies

bib networks

<marma> http://www.w3.org/2005/Incubator/lld/wiki/Use_Case_Bibliographic_Network

<antoine> -> Use Case Bibliographic Network

gordon: bibo and frbrcore

gordon: concerns of frbrcore, including modeling mistakes

karen: but it was the earliest and frbrer has only been around for 3 weeks

emma: but frbrcore is being used outside the library community

tomb: is persistence of frbrcore a concern?

gordon: ifla frbrer can be trusted with persistence. unlike frbrcore

dianeH: persistence and ownership is critically important esp. for larger libraries
... not willing to invest in ontologies they don't trust

edsu: I'm willing to trust frbrcore, but it's behind the scenes

<emma> +1 with dianeH's statement

karen: frbrcore was published before FRBR was cooked

kcoyle: encourage groups dragging their feet to realize people want to use these ASAP

antoine: what about bibo? is there a relationship with FRBR?

kcoyle: bibo is more about academic articles and citations
... and journal articles

bibo uses frbrcore, dc, and a mashup of other vocabularies with some additions

bibo and frbr could be derived from the same underlying data

edsu: bibo is concrete and intuitive and that's a useful thing

karen: looking at bibo, they don't include frbr

<edsu> kcoyle is right (i was wrong) bibo doesn't use frbr at all

<edsu> looks like bibo uses: dcterms, foaf, vann, owl, skos, event, prism

mpanzer: they're more interested in a citation perspective

martin: casual users will be attracted to bibo
... mapping between frbr and bibo is a useful thing

gordon: true. Who's responsible for dealing with this mapping?

antoine: the LLD XG wiki could be used to list vocabularies and maintain links between them

TomB: LLD XG could provide guidelines for others to maintain links to vocabularies rather than expecting them to be managed centrally

tom: mapping relationships between different vocabularies that are constantly evolving is a complex process

corey: the expertise in this room can help explain how others can connect their vocabularies to others

gordon: the issue of cross relationships becomes a problem of institutional agreements and politics

gordon: who could manage these: IFLA, W3C and ...
... DCMI

gordon: cultural shift that needs to happen to open world movement/assumption. It's a foreign idea still

<TomB> Gordon: cultural shift - orgs rooted in 20th century - open movement - something completely foreign as paradigms. It suits everyone's interest to move into that for the future, because failed in the past.

edsu: an opportunity to create a process for things to incubate elsewhere and then be adopted and developed by major organizations.

major organizations need to be more open to foreign models

NISO has this problem

NISO has a process to move projects from somebody's garage to a managed space

<edsu> TomB and Harry Halpin's paper: http://www.aaai.org/ocs/index.php/SSS/SSS10/paper/view/1140

TomB: vocabulary developers partner with cultural memory organizations and national libraries. Partnership where the organization takes over long term
... this creates a level of trust without imposing too much early bureaucracy

emma: could the major organizations take the initiative to encourage and nurture promising vocabularies

<marma> one triple, one vote?

<ww> marma: after minimisation? :P

mpanzer: Simply using a vocabulary is an endorsement, but it's still not curation

<TomB> Ed thanks Antoine for opening up the can of worms :-)

<edsu> it's an important can of worms though :-)

antoine: keep track of links from vocabularies to use cases and vice versa

jeff: we can create a database of two way linking

edsu: I'm keeping a tally, but the links would be useful

<edsu> here's the tally i made of vocabs mentioned during the presentations this morning: http://gist.github.com/642570
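
The two-way linking between use cases and vocabularies discussed above could be kept as one table and inverted on demand, roughly as sketched here. The vocabulary lists for the two use cases are the ones recorded in these minutes; the variable and function names are assumptions.

```python
from collections import defaultdict

# Use case -> vocabularies, as reported in this morning's presentations.
USE_CASE_VOCABS = {
    "Open Library Data": ["owl", "skos", "foaf", "frbr", "rdvocab", "dcterms"],
    "PODE": ["frbr", "dc", "bibo", "lexvo", "geonames", "foaf", "skos"],
}


def invert(use_case_vocabs):
    """Build the reverse index: vocabulary -> sorted list of use cases."""
    index = defaultdict(list)
    for use_case, vocabs in use_case_vocabs.items():
        for vocab in vocabs:
            index[vocab].append(use_case)
    return {vocab: sorted(cases) for vocab, cases in index.items()}


VOCAB_USE_CASES = invert(USE_CASE_VOCABS)
```

Maintaining only one direction by hand and deriving the other avoids the two lists drifting apart.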

ACTION: for each use case champion: on the Vocabularies page, link to each URL use case that uses it - see http://www.w3.org/2005/Incubator/lld/wiki/Vocabularies [recorded in http://www.w3.org/2005/Incubator/lld/minutes/2010/10/23-lld-minutes.html#action01]

antoine: continue to look at use cases...

TomB: identification and deduplication

Gordon: no vocabularies listed

TomB: Regional catalog/vocabularies

gordon: bibo, FRBR, etc.

gordon: RDF
... problems and limitations: lack of political will, ownership, rights, finding synonymous identifiers, lookup service for bibliographic items

Data BNF: skos, foaf, rda

gordon: frbrization is a concern because it makes assumptions in the underlying data

antoine: the data needs to be enriched

gordon: this may be more of an assumption than a reality. Is it a mistake to mix and match vocabularies?

antoine: there are perceptions that specific vocabularies are psychologically difficult to embrace.

kcoyle: what's the goal of identifying vocabularies listed in the use case? What's the purpose?

antoine: the purpose is to identify the issues and concerns of using vocabularies

gordon: are we imagining difficulties and issues because the vocabularies are or are not being used?

TomB: persistence, mapping, is good. Scope and limitations may not be so important
... If the ontologies are slow to publish URIs, is that a clue to complexity and uncertainty? Bounded/unbounded concerns may be a problem.
... The goal isn't to "review" these vocabularies, just to identify the issues

kcoyle: RDF vocabularies only, or are other vocabularies in scope?

antoine: assume that non-RDF vocabularies will be developed eventually

TomB: it's worth mentioning potential issues converting vocabularies into RDF

<emma> Scribe : Karen

<emma> Scribenick : kcoyle

library standards time is very long -- three years is a short time (frbr, etc.)

gordon: we may be providing a framework to encourage linking between vocab developers

karen has better sense of what we are doing. we can go on

gordon: polymath case... viaf, lcsh, rameau, linked data services of dnb, Instituto Geográfico Nacional España
... (IGN), EDM (Europeana Data Model), dbpedia

using lcsh -- is the data set, not a metadata schema

is a controlled vocabulary; emma: we have them on the wiki page for vocabs

that is a big can of worms because of all of the semantic alignments between them. (gordon, and others)

scribe: this may be too difficult

edsu: disagrees, because there aren't many more than 10 in the library world
... they should be kept separate, but maybe we can get to that later

jeff: viaf ontology doesn't always make sense; maybe needs revision before others begin to use it

gordon: feedback mechanism that causes ontologies to be revised

tom: needs for namespace policies that articulate how vocab will evolve, e.g. dc: if semantics change, will coin new uri

not clear if dbpedia/wikipedia have such a policy

what does stability mean on web?

mpanzer: most semantics are conveyed in notes fields
... do vocabs from same ontology have to be used together to have correct semantics?

jon: we are identifying organizational level problems for use of linked data, but are in an environment that doesn't have that commitment

emma: points: ownership, official and not
... institutions should provide links between vocabs and curate them
... barriers - some are perceived as more difficult than others
... persistence policies
... can you pick some from a vocabulary and not use whole vocabulary guidelines?

kc: how do you know what can stand alone?

jon: does it matter?

kc: there can be dependencies between items in vocab

mpanzer: ontological baggage, is not part of linked data stack

charper: isn't that covered by domains and ranges?

mpanzer: domains and ranges are only two pieces of a relationship; there can be other parts/relationships
... and domains and ranges are not constraints
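
A small sketch of the point that domains and ranges are not constraints: under RDFS semantics, a domain or range declaration is used to infer the type of the subject or object, not to reject a statement that looks wrong. The property `ex:creator`, the classes and the sample statement are hypothetical.

```python
RDF_TYPE = "rdf:type"

# Hypothetical schema: ex:creator has domain ex:Work and range ex:Person.
DOMAINS = {"ex:creator": "ex:Work"}
RANGES = {"ex:creator": "ex:Person"}


def rdfs_infer_types(triples):
    """Apply the RDFS domain and range rules to a list of triples.

    Nothing is ever rejected; each use of a property with a declared
    domain or range simply adds a type statement.
    """
    inferred = set()
    for subject, prop, obj in triples:
        if prop in DOMAINS:
            inferred.add((subject, RDF_TYPE, DOMAINS[prop]))
        if prop in RANGES:
            inferred.add((obj, RDF_TYPE, RANGES[prop]))
    return inferred


data = [("ex:pawprint", "ex:creator", "ex:lassie")]
inferred = rdfs_infer_types(data)
```

Rather than flagging the statement, the rules conclude that `ex:lassie` is an `ex:Person`. Anything meant as real validation has to come from outside the schema, for instance from an application profile.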

jon: we are talking about LLD, which has an existing domain model, exemplified by marc21
... and marc21 is not expressed anywhere in rdf

tomB: we are talking about a larger environment

jon: we are talking about other things because we can't talk about marc21 in a linked data context

emma: can't, or don't want to?

<ww> jon: marc21 as rdf: curl -H "Accept: text/n3" http://bibliographica.org/301b111e-0dc0-5e34-a5e6-06c461d51789/57512

mark: what could go wrong when we use pieces from other vocabs?

mpanzer: if you assume everyone using ore properties provides a resource map... but not necessarily the case
... linked data doesn't know about APs, doesn't know about records. our domain has highly structured data
... what does linked data mean for us?

<ww> mpanzer: quite so, bibliographica uses ore to group together graphs... and doesn't provide a resource map

<ww> e.g: http://bibliographica.org/aggregate/301b111e-0dc0-5e34-a5e6-06c461d51789/57512/contributor/1

tomB: libraries rely on data definitions that are out of band in the lld environment
... data received may not meet users' definitions; LD has formal relationships, but not a community view

jon: this is a significant flaw in the way we think about linked data
... in LD, each statement in itself makes sense
... but for a complete description, may need more than one statement

gordon: where we started... choosing different properties from different name spaces
... one issue is definitions; if they aren't absolutely precise, they will be used wrongly
... meaning that definitions have to be very clear, but in library world we have many assumptions
... frad has class called Person defined as "an individual" - not helpful

<ww> even if the definitions are very precise they will be used wrongly cf. owl:sameAs

gordon: the mark twain sam clemens problem
... lassie is creator of paw print outside of grauman's chinese

<emma> in FOAF i read "Something is a Person if it is a person." is that much better ?!

gordon: vocabulary creators need guidance on creating definitions that can make sense outside of the context of the vocabulary

edsu: in the end, those that don't make sense won't be used

<jodi> gordon++

tomB: library community definitions are natural language concepts
... LD world uses formal relationships to other terms
... skos vocabulary terms were never defined in natural language

diane: ref. dcmi/rda task group work, and its lessons
... no 'how to' guidance for building vocabularies for the web
... this group is identifying some issues about what that guidance might be

<edsu> diane's paper http://www.dlib.org/dlib/january10/hillmann/01hillmann.html

antoine: strongly related to concept of application profiles

mpanzer: w3c has recipes and best practices; that is what could come out of this group
... not normative, but helping people who need to do something
... could be aimed just at library data, so it is do-able

<marcia> +1 mpanzer recipes and best practices

<edsu> +1 to michael's suggestion for best practice docs

<marma> +1

<ww> +1

<LarsG> +1

antoine: let's make this part of the deliverables discussion

break!

<TomB> as mentioned during break: http://lists.w3.org/Archives/Public/public-lld/2010Oct/0098.html - just posted - Mikael Nilsson on Thoughts on validation / documentation / abstract models in reaction to yesterday's application profile discussion

<emma> Scribe: Marcia

<emma> Scribenick: marcia

another hour for vocabularies

TomB: encoding vocabularies issues
... how to identify the sources that control the controlled vocab. terms
... this is an issue
... waiting for Jon for some special issue related to MARC
... differences discussed yesterday about DCAM and metadata language
... community and info services case

Gordon: Use Case Community Information Service

tomb: Use Case Linked Data and legacy library applications case?

Jeff: Use Case Open Library Data: FRBR, RDA vocab

karen: Use Case Virtual International Authority File (VIAF):

Jeff: there is a problem. In VIAF, we kept adding individual elements that make sense. There are no vocabs available.

Gordon: the future is for FRAD to do all the control-related things

Gordon: FRAD has very rich properties for person.
... compared person as defined by FRBR, FRAD, FOAF

Karen: there are properties in FOAF that library data do not use at all.

Alex: our database has to do a detour to link each variant first name with the corresponding last name. We had to add a bnode

Ed: to help library users, could libraries be partners to develop

michaelP: issues of complexity. local properties are not expected to be adopted by others. They should be added as FOAF sub-properties, so that in the future people can use the dumb-down approach
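The dumb-down approach can be sketched as follows: if local properties are declared as sub-properties of FOAF properties, a consumer that only understands FOAF can replace each local property with its more general parent and still get usable data. The `lib:` property names and the `ex:` resource are hypothetical; in RDF the mapping would be stated with rdfs:subPropertyOf.

```python
# Hypothetical local properties mapped to the FOAF properties they refine.
SUB_PROPERTY_OF = {
    "lib:preferredNameForThePerson": "foaf:name",
    "lib:variantNameForThePerson": "foaf:name",
}

def dumb_down(triples, sub_property_of):
    """Replace each known local property with its more general super-property.

    Triples whose property has no declared super-property are kept unchanged.
    """
    return [(s, sub_property_of.get(p, p), o) for (s, p, o) in triples]

local_data = [
    ("ex:person1", "lib:preferredNameForThePerson", "Twain, Mark"),
    ("ex:person1", "lib:variantNameForThePerson", "Clemens, Samuel"),
    ("ex:person1", "foaf:homepage", "http://example.org/twain"),
]

for triple in dumb_down(local_data, SUB_PROPERTY_OF):
    print(triple)
```

The library-specific distinction between preferred and variant names is lost in the result, which is the trade-off of dumbing down.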

<markva> +1 michaelp

<edsu> for the record I was just relaying to Alex that danbri is looking to partner w/ people like alex and the dnb in the library community to add missing things to foaf

<edsu> doesn't necessarily need to be The Library Community

<edsu> +1 charper # linking to foaf, so that library data can interoperate with the larger world of linked data

gordon: to distinguish different identities of people, libraries may use other data such as home address to help.
... other issues: redundant, deprecated

<michaelp> Expressing person name authority data as linked data doesn't necessarily mean producing triples that can act as a surrogate of the MARC data.

gordon: context is important to the meaning, and is not always carried in the definitions. authority headings are different from describing persons

Jeff: that is, the label is different from the concept. what we heard is more about the label of the person

gordon: conversion issue

<LarsG> GordonD: models develop through feedback and eventually they converge.

tomB: dc:creator was defined before DC had merged with RDF; dcterms:creator was created later in order to assign a domain and range. Difficult to explain to the RDF people
... heard people prefer to have property un-constrained

ed: yesterday's Linked Data session by Karen and Corey discussed constraint issues

<emma> ...it's about ontological commitment : the more you say, the more guidance, but also constraints

ed: may bring new problems

tomB: the group is getting carried away a little from LLD per se

<emma> TomB: feedback on DC was that it's good to make that commitment

mark: not constraining the range is only OK if you have a mechanism to constrain it locally

mark: in some cases it is good to have the range constrained

<markva> ... has a function in recommending people what you want in ranges, either literals or URIs

<markva> ... in context of linked data, often you want URIs, e.g. for creators

<antoine> Coming back to MARC: there was a MARC ontology under construction at DERI a few years ago, but now it's gone
... none of our use cases mentions the need for a MARC vocabulary

karen: there is probably a big gap between MARC people and linked data

<LarsG> kcoyle: there is a use for MARC in RDF

<emma> ... the issue is to translate legacy data into something else; one way may be MARC

Ray: regarding MARC expressed as RDF
... MADS

<emma> MADS and MODS were actually mentioned in Use Cases

<edsu> markva: is your dissertation available online somewhere?

<edsu> markva: i was just fishing around on http://www.few.vu.nl/~mark/

<markva> http://www.cs.vu.nl/~mark/papers/thesis-mfjvanassem.pdf

emma: AACR, RDA, ISBD

MODS and MADS are formats

Jon: there was a presentation that broke MARC records into statements
... it explained how this data can be expressed in linked data

Karen: there are problems with turning MARC data into that kind of statement

gordon: UNIMARC still aligns with ISBD
... some other alignments are complicated
... registered ISBD in registry

<markva> funny to hear people talk about modelling me ;)

<edsu> markva: thanks!

<markva> hope somebody actually reads it...

tomB: Use Case FAO Authority Description Concept Scheme: SKOS, RDF, FOAF, ???
... there is an issue of RDA

Alex: GND vocabulary, not registered yet, not for reuse. there was one vocab that was not mentioned. Not sure what's coming next -- official or not
... RDA, SKOS, ???
... connecting the headings to other vocabs.
... person including academic title
... map to MADS
... all the mappings are being worked on, maybe over the next 6 months

Lars: about the timeline of the LD project

Alex: we have the vocabulary, but did not register it
... it already has the description, document

tomB: this is an issue

registry of registries

scribe: URIs are being point
... registry become a portal and management tool, a secondary thing

the word 'registry' in the context of pointing to URIs is a problem

michael: you could do that in your data

tomB: registry is problematic now. it is confusing
... there are registries under registries.
... has a problem with the word 'to register'

<michaelp> In linked data context, "registering" in an external database is quite misleading.

gordon: I used term consistently, to represent your property in the registry

<markva> TomB: "registering" is same as coining a URI

<michaelp> Coining a URI, defining semantics and making this definition available in RDF when this URI is dereferenced is enough.

karen: there are people who do not know the meaning of registry, with domain name behind it

tomB: there is nothing about the registry in Diane and Jon's sense that I do not like
... the issue is the environment

<michaelp> DNB could do that without relying on an external provider.

tomB: registering in the LD context is to coin a URI.
... the URI is coined in a registry is... using the word 'to register' is important from the vocab management point of view, but that is not the sense in linked data.

<markva> TomB: putting something in registry is orthogonal to use in LD

Jon: formal official namespace registry
... registration is a formalization of that namespace

Alex: one of the requirements is that a vocab has to have a place to be referred to, to look for provenance, etc

michael: national libraries do not need domain names, no requirements to rely on external services

Alex: we have internal and external services. Human-readable version and machine-readable version.

tomB: something is resolvable is machine ...?

<michaelp> Registries are helpful if you have no easy access to a domain name / namespace. This is usually not an issue for a big library organization.

corey: how do you track the change of the data, should be an important issue to be discussed here

<michaelp> There is no requirement "to register" properties / vocabularies externally to make them "official".

antoine: what are the basic requirements, what are the most important, distinguish with others

tomB: move on.

<michaelp> Coining the URI is the statement that matters.

tomB: wrap up this discussion
... I would like the group not to use the verb 'to register' when it does not mean coining a URI

<michaelp> Corey: Registries provide services that are not available by just using conneg and publishing a flat RDF.

gordon: not happy to use the word 'publishing' either
... to register implies some requirements

jon: registry is a namespace service, also for trusted vocabularies
... it is more than linked data environment

<markva> michaelp: maintaining a URI is not a function of technology but function of an organisation (after Stu Weibel)

<LarsG> +1 to Stu Weibel

<michaelp> maintaining and persistence ...

<edsu> +1 for moving on

<michaelp> Alexander: We have to keep this issue in mind in terms of best practices.

<michaelp> TomB: Registries good, not required.

<edsu> +1 for rolling registry information into potential best practice doc

Alex: need some way to say a property is deprecated, etc.

Corey: Need to know of previous version, etc.

Karen: Let people use the technology they have
... Best practice, not requirement
... We need to say that versioning, etc. is a good thing, but shouldn't dictate that this is a requirement
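As one possible illustration of the need to mark a property as deprecated: OWL 2 already defines the annotation property owl:deprecated, and dcterms:isReplacedBy can point to a successor. The sketch below is not something the group agreed on. It assumes a vocabulary published with such statements, and the `ex:` terms and function names are hypothetical.

```python
OWL_DEPRECATED = "owl:deprecated"

def deprecated_terms(vocab_triples):
    """Return the set of terms the vocabulary marks with owl:deprecated "true"."""
    return {s for (s, p, o) in vocab_triples if p == OWL_DEPRECATED and o == "true"}

def find_deprecated_usage(data_triples, vocab_triples):
    """List data triples whose property is marked deprecated in the vocabulary."""
    deprecated = deprecated_terms(vocab_triples)
    return [t for t in data_triples if t[1] in deprecated]

# Hypothetical vocabulary description and data.
vocab = [
    ("ex:oldTitle", OWL_DEPRECATED, "true"),
    ("ex:oldTitle", "dcterms:isReplacedBy", "ex:title"),
]
data = [
    ("ex:book1", "ex:oldTitle", "Huckleberry Finn"),
    ("ex:book1", "ex:title", "Adventures of Huckleberry Finn"),
]

print(find_deprecated_usage(data, vocab))
```

A check like this works with plain published RDF and does not depend on a registry, although a registry could offer it as a service.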

All: Agree, it's best practice, and a good thing to have a registry to support important services

See post-meeting cleaning: Outcome of the vocabulary discussion

<edsu> paulwalk's creating a tool to visualize vocabulary usage in our use cases: http://172.22.172.216/topics

<paulwalk> I have deployed an early version of my visualising app here: http://www.paulwalk.net/lldvis/ Feel free to play with this - any changes you make will **NOT** be persistent yet

<emma> Scribe: Gordon

Deliverables required by May 2011

Emmanuelle: One report expected.
... Discuss YouTube video this evening
... Have captured vocabulary requirements from this afternoon's discussion ...
... Requirements on three slides

All: discussion on requirements, slides adjusted, some requirements need to be revisited and further discussed

Emmanuelle: Concern now is to move from use cases/requirements to deliverable

Antoine: Go through the components of the deliverable and identify who is interested in developing them

Emmanuelle: Small groups could analyze use-case clusters

Kai: For each cluster, extract scenarios, abstract from them, and develop single-action use-cases (what a use-case really is)

Antoine: Allows a check that these really are clusters

Karen: What are the clusters?

Emmanuelle: May be other clusters emerging from use-cases not discussed today

Alex: What is the deadline for completing this work?

Emmanuelle: By end of December
... We will invite XG members not present to add their names to curation teams

<antoine> ACTION: Karen and Emma to curate archive cluster for end of december [recorded in http://www.w3.org/2005/Incubator/lld/minutes/2010/10/23-lld-minutes.html#action02]

Emmanuelle: Other deliverable is relevant technology pieces, etc.

Antoine: Outreach and dissemination activities are in charter - some progress to this already embedded in wiki

Emmanuelle: Tomorrow, we should take each topic and see if it translates into deliverable
... If we want to create further W3C activity we should charter it

Antoine: We should attempt to inventory what we know is out there (in addition to output from use-case and vocabulary discussions). Using CKAN as for the LOD cloud

Antoine: -> http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets/CKANmetainformation

Karen: Any inventory is a moving target, and we should acknowledge that - but inventory useful

<charper> antoine++ re: CKAN

<edsu> antoine: would be good to have ww walk us through adding a package to ckan on a telecon

See post-meeting cleaning: Outcome of the deliverables discussion

<antoine> ACTION: Kai and Ed to curate citations cluster for end of december [recorded in http://www.w3.org/2005/Incubator/lld/minutes/2010/10/23-lld-minutes.html#action03]

<antoine> ACTION: Mark (and someone else) to curate digital objects cluster for end of december [recorded in http://www.w3.org/2005/Incubator/lld/minutes/2010/10/23-lld-minutes.html#action04]

<antoine> ACTION: Gordon and Martin to curate bibliographic data cluster for end of december [recorded in http://www.w3.org/2005/Incubator/lld/minutes/2010/10/23-lld-minutes.html#action05]

<antoine> ACTION: Jeff and Alexander to curate authority data cluster for end of december [recorded in http://www.w3.org/2005/Incubator/lld/minutes/2010/10/23-lld-minutes.html#action06]

<antoine> ACTION: Antoine and Michael to curate vocabulary alignment cluster for end of december [recorded in http://www.w3.org/2005/Incubator/lld/minutes/2010/10/23-lld-minutes.html#action07]

LLD XG F2F meeting - Day one

23 Oct 2010

Summary of Action Items