Application Profiles for Linked Data: models and requirements -- 22 Oct 2010

<TomB> Scribe: Mark van Assem

Antoine: welcome joint meeting DC architecture group and W3C LLD
... issues with DCAM and APs built on them, explore options, consequences for future activities

Antoine: TomB presents his work
... 2nd presentation Michael Panzer et al
... after coffee break exploration of reqs of APs in context of subject authority data
... then informal discussion

Review of DCMI Abstract Model and possible options

Presentation from Tom: http://www.w3.org/2001/sw/wiki/File:2010-10-22.dcam-joint-meeting-20101006.pptx.pdf

TomB: with pete johnston, walk through history of DCAM
... early 2000s: two mindsets: RDF and record format mindset
... interoperability among DC implementations problematic
... but RDF hard sell: researchy; perceived as flavor of XML

TomB: role of DCAM: bridge between mindsets; tree struct vs. graphs
... DCAM future: descriptive patterns reflecting existing metadata practice
... notion of bounded records
... notion of constraints
... (shows diagram summarizing DCAM)

TomB: DCAM can be expressed in diff syntaxes; RDF/XML, HTML
... common interface for operating across syntaxes
... allows diff applications to communicate
... DCAM family: DCAM, DSP, syntaxes (DC-XML, HTML, DC-Text, ...), user guidance (Singapore Framework, Guidelines ...)
... Description Set Profile Constraint Language: layer on top of DCAM
... example: book, creator. Template for instances of Book
... Statement template: slots: property, literal value, language, Syntax Encoding Scheme
... Statement template for creator, only use slot value string
... "cookie cutter" for creating descriptions; Book's title is a literal, creator with dcterms:creator
... wiki syntax for combining template representation and html presentation of template
... XML syntax for DSPs
... motivation: configure metadata editor; use template to generate form for entering metadata
... validating metadata
... create OWL expression of constraints
... (diagram of Singapore Framework)
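
A minimal sketch of the "cookie cutter" idea from Tom's presentation, assuming a hypothetical in-memory template format (plain Python dictionaries, not the DSP XML or wiki syntax). The names `BOOK_TEMPLATE` and `validate`, the use of `dcterms:title`, the value kinds, and the occurrence bounds are all assumptions made for illustration. It only shows how a description template made of statement templates could be used to validate one description:

```python
# Hypothetical description template for a Book: each statement template
# names a property, the kind of value expected, and occurrence bounds.
BOOK_TEMPLATE = {
    "dcterms:title": {"value": "literal", "min": 1, "max": 1},
    "dcterms:creator": {"value": "nonliteral", "min": 0, "max": None},
}

def validate(description, template):
    """Return a list of human-readable problems; an empty list means valid.

    `description` is a list of (property, kind, value) statements.
    """
    problems = []
    # Check every statement against the statement template for its property.
    for prop, kind, _value in description:
        rule = template.get(prop)
        if rule is None:
            problems.append(f"{prop}: not allowed by template")
        elif kind != rule["value"]:
            problems.append(f"{prop}: expected {rule['value']}, got {kind}")
    # Check occurrence bounds for every statement template.
    for prop, rule in template.items():
        count = sum(1 for p, _k, _v in description if p == prop)
        if count < rule["min"]:
            problems.append(f"{prop}: at least {rule['min']} required")
        if rule["max"] is not None and count > rule["max"]:
            problems.append(f"{prop}: at most {rule['max']} allowed")
    return problems

book = [
    ("dcterms:title", "literal", "Moby Dick"),
    ("dcterms:creator", "nonliteral", "http://example.org/person/melville"),
]
untitled = [("dcterms:creator", "nonliteral", "http://example.org/person/melville")]
```

The same template could in principle drive a metadata entry form, since it lists the allowed properties and how often each may occur.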

TomB: interoperability levels: informal; semantic; Description set syntactic interop; Description Set Profile interop
... ~ shared natural language, shared formal model, shared records, shared constraints
... future scenarios: (1) carry on as before (2) DCAM 2 spec, better aligned with RDF (3) deprecate, continue with RDF (4) nothing
... (1) interest? editors? review?

(2a) simplified and better aligned with RDF; structural constraints of APs

TomB: impact of DCAM 2 on DCAM family?
... (2b) goal: clarification; transitional, to be deprecated in favor of RDF
... (3) negative impact? existing specs status? change in message? basis for APs gone?
... (4) does DCMI stand behind it or not? reputation? credibility?
... DCAM abstract syntax vs. RDF
... Description (sets) ~ named graphs?
... VES ~ SKOS concept schemes?
... use of rdf:value continues or something else such as skos:prefLabel?
... Issue: APs
... syntax pattern checks; checking patterns in the graph? Use OWL with closed world assumption?
... split in Singapore Framework
... constraints in underlying vocabulary or patterns on the data?

<edsu> what the room looks like (if you are interested) http://www.flickr.com/photos/inkdroid/5105654040/

<mini> @edsu hey, thanks, I was just wondering

<andypowe11> @edsu thanks

Jon Phipps: continue developing DCAM only realistic option
... RDF has no notion of record; DCAM provides that
... enormous value outside RDF world

TomB: remote participants comments?
... additions to presentation?

<andypowe11> nothing from me at this stage

<andypowe11> i'm lost - is the floor open for discussing the options?

<antoine> @andy : yes

Akira Mijasawa: DCAM DCAM2 differences?

TomB: DCAM2 mostly RDF except where RDF does not have constructs
... get rid of DC terminology that is mapped to RDF

andypowe11: options 2b, 3 and 4: all work to RDF, which is where we want to get to
... which of these is better to get to that end game, wrt time available
... 4 seems not ideal, but less effort
... lean to 3; 2b has political value by taking along community; but 3 better given time

<edsu> loud and clear :-)

Stu Weibel: frustrated; no productive outcomes all these years
... we should just adopt the Web as the model
... nobody understands DCAM
... W3C published architecture document after actual implementation
... revive effort: develop reference software; easily drop in data, generate linked data

andypowe11: support Stu
... DC effort was trying to say Web is model; got confused

TomB: gap: how to express constraints? Or not necessary?

<jphipps> Just because the DCAM is poorly expressed and poorly understood, doesn't obliterate its value as a model

<jphipps> The world is NOT rdf-centric and is not likely to be

Michael Panzer: was puzzled by description sets; but it does make ontological commitment clear
... bundles of assertions have to make sense; requires way to communicate this
... RDF struggles with same issues
... DCMI should get involved with RDF development

<jphipps> It's clear to me that even (maybe especially) the creators of the DCAM don't understand its value

Mikael Nilsson: DC close to data and data production
... lots of RDF data being produced
... different position now: syntax not problem anymore
... RDF encounters problems that DC has too
... look at problems, solve collaboratively
... DCAM started replicating stuff in RDF; RDF has broader base
... DC produces vocabulary that's used in RDF; produces set of terms not linked to RDF in natural language

<petej> I agree w Andy that the RDF model is where we want to get to, and 3 seems to me the best option, tho I'm willing to be persuaded there is a value in 2b

<kai> Scribe: Kai Eckert

Application Profiles and OWL

presentation by Jeff Young: http://lists.w3.org/Archives/Public/public-lld/2010Oct/0101.html

<andypowe11> i'm going to drop out at this point - can't see presentation or hear very well - sorry

<edsu> andypowe11: thanks for voicing your opinion so clearly

<andypowe11> bye all - enjoy rest of the conf :-)

Jeff Young: Introduction to next presentation: Application Profiles in OWL
... I want to make clear that I am not that familiar with DCAM and that I come from the Semantic Web world.
... shows picture of FRBR as a DCAP domain model
... from SWAP
... simple translation to OWL, classes, properties, ...
... domain and range restrictions are used in OWL

Jeff Young: I want to name the things, so I introduced UnnamedAbstraction...

<LarsG> UnnamedAbstraction is a name for the union of Work, Manifestation and Item

<mini> yep

scribe: comparison to UML diagram
... cardinalities in OWL do not prevent anyone from ignoring them

<mini> thx...

<JennRiley> TomB: unclear to me if there's widespread support for keeping the dev't of some kind of constraint language. (I think we need this but didn't have a chance to get up to say so.) So I think we should verify the degree of support for that. And if there is support, discuss whether to do it within DCMI or use resources to push this in core RDF

Jeff Young: Example from Tom's presentation with DCAM usage
... DCMI Type Text is a class, but you cannot be sure of that from the XML representation

<mini> I would like to see this done with RDF community on board, in any case.

<mini> Even the Topic Maps standard has a CL

<mini> -- http://www.isotopicmaps.org/tmcl/tmcl.html

scribe: in RDF, a subject should be a concept
... everything else about the concept should be obtained by dereferencing
... I cached it here to make it look more like a record. You want the additional data here. You want to cache.

TomB: DCAM is historical and DC-RDF is also available.

Jeff Young: This is an example how I progressed.

<TomB> Jeff is showing the first page of XML output extracted by the wiki tool from SWAP

Jeff Young: Now I try to convert that into OWL
... Types are already there, so let's look at the title

<TomB> Jeff tripped up on fact that the first property cited in SWAP is dc:type (and he thinks in terms of rdf:type).

<TomB> Michael Panzer is coming to the microphone, setting up.

Switch to Michael Panzer

<TomB> JennRiley - let's come back to this during the discussion in the second half

Michael Panzer: Main difference between using OWL for DSP vs. DSP constraint language:
... DSP CL example
... Title has mincardinality of 1
... title has to be there
... a title with two types would not be valid
... with two types you could infer that both have to be the same

<TomB> Pellet - an inference engine (reasoner) for OWL 2. Has a dialect that treats OWL as a constraint language.

Michael Panzer: test with pellet shows constraint violation
... with removed type it is valid
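
The two readings of a cardinality restriction that Michael contrasts can be sketched as follows. This is a toy model in plain Python, not Pellet or OWL itself; the function names, the `ex:` identifiers, and the sample data are invented for illustration. Under the constraint (closed-world) reading, a missing or extra value is a violation; under the inference (open-world) reading, two values of a max-one property are concluded to denote the same thing.

```python
def _values_by_subject(triples, prop):
    """Group the values of `prop` by subject."""
    values = {}
    for s, p, o in triples:
        if p == prop:
            values.setdefault(s, set()).add(o)
    return values

def violations_min_one(triples, subjects, prop):
    """Constraint reading of min cardinality 1: the value must be present."""
    having = set(_values_by_subject(triples, prop))
    return sorted(s for s in subjects if s not in having)

def violations_max_one(triples, prop):
    """Constraint reading of max cardinality 1: two values are an error."""
    values = _values_by_subject(triples, prop)
    return sorted(s for s, objs in values.items() if len(objs) > 1)

def inferred_same(triples, prop):
    """Inference reading of max cardinality 1: the values must be identical,
    so return the pairs that would be inferred to be the same."""
    pairs = []
    for objs in _values_by_subject(triples, prop).values():
        ordered = sorted(objs)
        pairs.extend((a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:])
    return sorted(pairs)

data = [
    ("ex:book1", "dc:type", "ex:Text"),
    ("ex:book1", "dc:type", "ex:Document"),
    ("ex:book1", "dc:title", "An example"),
    ("ex:book2", "dc:type", "ex:Text"),
]
```

Here `violations_min_one(data, ["ex:book1", "ex:book2"], "dc:title")` flags `ex:book2`, whereas under open-world semantics a missing title is not an error: the reasoner only concludes that a title exists but is unknown.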

<TomB> Mikael: Nice because you add the constraints to the class.

Michael Panzer: People should remember that OWL approaches constraints in a different way

Maja Zümer: Explains that Work, Expression, ... are not subclasses, so there was a reason to model it that way

Akira Mijasawa: How can we incorporate management properties of the record? Who made it, ... provenance information, management properties

<TomB> Akira: Description set constraints in OWL - how can we incorporate [metametadata]?

Jeff: Named Graphs would be a possible solution, probably not the best, but possible. We can attach properties to graphs

<TomB> Jeff: Create new entity, "record", attach property to that. Not clear how much overlap how much DCAP and how much OWL can express. Not clear to me.

Jeff: Lot of further work to be done, it is not yet clear what OWL can do for us, what DCAM can do...
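
Jeff's named-graph suggestion can be sketched with quads (subject, predicate, object, graph): the triples of a description live in a named graph, and the management properties are attached to the name of that graph. Plain Python tuples stand in for an RDF store here; all `ex:` identifiers, the property choices, and the helper names are assumptions made for illustration.

```python
RECORD = "ex:record/42"  # hypothetical name of the graph holding one description

quads = [
    # The description itself, stored inside the named graph.
    ("ex:book1", "dcterms:title", "An example", RECORD),
    ("ex:book1", "dcterms:creator", "ex:person/1", RECORD),
    # Metametadata: statements about the graph, kept in a separate graph.
    (RECORD, "dcterms:creator", "ex:cataloguer/7", "ex:provenance"),
    (RECORD, "dcterms:modified", "2010-10-22", "ex:provenance"),
]

def description(quads, graph):
    """Triples that make up the bounded record identified by `graph`."""
    return [(s, p, o) for s, p, o, g in quads if g == graph]

def about_record(quads, graph):
    """Management properties attached to the record itself."""
    return {p: o for s, p, o, _g in quads if s == graph}
```

The creator of the book and the creator of the record use the same property but stay separate, because they have different subjects and sit in different graphs.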

Coffee break

<mini> i'll drop out here, thanks for an interesting discussion.

<mini> I'll add a few lines of comments for the discussion later:

<mini> 1. We need very concrete functional requirements, what kinds of constraints do we need? what precisely is "validation"? based on example records and profiles.

<LarsG> Perhaps we should forget about records. We use to think in records because that's what we had, but now we have new possibilities. The metametadata problem is really the same as with provenance, and there's work underway with that, too.

<mini> 2. We can test if OWL with constraint semantics can do it, and if DSPs can.

<mini> 3. The critical question is: based on DCAM, or based on RDF. I certainly prefer the latter, but requires DCMI to adopt RDF.

<mini> I personally see many advantages and potential use cases for an RDF CL that can specify "valid" graphs down to every last triple.

<mini> Now I'm off, good luck!

<petej> I'm leaving too. Thanks for discussions. I think Mikael's closing comments above summarise the key issues/questions very well

Coffee break is over

still scribing

TomB: Repeats mini's statements for the audience

Application profile uses

Marcia Zeng: Presentation about Application Profiles (based on FRSAD model) for subject domains: http://www.w3.org/2001/sw/wiki/File:FRSAD-AP.ppt.pdf
... Questions: 1. Why do we need APs for FRSAD?
... and two more...
... FRSAD conceptual model. Notion of thema: anything that can be a subject of a work
... different ways to group things
... examples: FRBR, SUMO
... even within one domain it is difficult to map thesauri.
... In general relationships between themas are hierarchical
... but there are others, ALA came up with 100s
... different types of KOS have different ways to represent concepts: classifications, thesauri, ...
... 2nd question: How formally can the AP be defined?
... communities have different domain models and usage guidelines
... FRSAD-AP Functional Requirements:
... in general vocabularies, but with specific different applications

<TomB> FRSAD is a general model. Need more specific models for different types of vocabulary (classification versus thesauri), subject domains (medical vs consumer health)...

Marcia: FRSAD-AP domain model: a general model, needs more specific ones for different types of KOS

<TomB> ...what are the characteristics of your subject vocabulary?

KOS = Knowledge Organization System (Thesauri, Classifications, ...)

Marcia: Triples have challenges, e.g. how to preserve order

<TomB> ...specify the set of properties in a particular subject domain?

Marcia: Nomen specifies different, general attributes
... Usage Guidelines for FRSAD-AP: Recommendations, e.g. SKOS, MADS, standards (BS, ISO)
... 3rd question: Difference between APs for subject domains and descriptive metadata
... serious sameAs issues: Is a concept from one KOS the same as the concept from another?

<edsu> Scribe: Ed Summers

Switch to Gordon Dunsire

Gordon Dunsire: Classification/subject schemes presentation: http://www.w3.org/2001/sw/wiki/File:ClassificationAP.pptx.pdf

Gordon: there are things in faceted classification schemes that need application profiles
... semifaceted sub-divisions also have issues that require AP: DDC, LCSH
... some subdivisions are mandatory in some schemes and optional in others
... also sequencing is important: Law--Sociology vs. Sociology--Law
... something that APs need to address, particularly for validation purposes
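
One way to keep the sequence of subdivisions in triples is an RDF collection (`rdf:first` / `rdf:rest` / `rdf:nil`). A minimal sketch in plain Python, with invented blank-node names, a hypothetical `ex:subdivisions` property, and helper names chosen for illustration; it shows that Law--Sociology and Sociology--Law stay distinguishable even when the triples are merged:

```python
def encode(subject, prop, items, prefix):
    """Encode an ordered list as rdf:first/rdf:rest triples."""
    if not items:
        return [(subject, prop, "rdf:nil")]
    nodes = [f"_:{prefix}{i}" for i in range(len(items))]
    triples = [(subject, prop, nodes[0])]
    for i, item in enumerate(items):
        rest = nodes[i + 1] if i + 1 < len(items) else "rdf:nil"
        triples.append((nodes[i], "rdf:first", item))
        triples.append((nodes[i], "rdf:rest", rest))
    return triples

def decode(triples, subject, prop):
    """Walk the collection back into a Python list, restoring the order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    node = next(o for s, p, o in triples if s == subject and p == prop)
    items = []
    while node != "rdf:nil":
        items.append(first[node])
        node = rest[node]
    return items

a = encode("ex:heading1", "ex:subdivisions", ["Law", "Sociology"], "a")
b = encode("ex:heading2", "ex:subdivisions", ["Sociology", "Law"], "b")
```

A profile that validates sequencing would then compare the decoded list against the order a scheme allows, rather than looking at an unordered set of values.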

Gordon Dunsire: FRBRer vs ISBD: OWL vs DCAP http://www.w3.org/2001/sw/wiki/File:FRBRerISBD.pptx.pdf

Gordon: I'm working with FRBR conceptual model: nothing mandatory, sequenced or encoded
... monolithic record split into 4 related parts, with some cardinality constraints
... seems to me the best way to model this is w/ OWL
... e.g Expression is a realization of *exactly* one Work
... not sure how to model that in AP
... contrasted w/ ISBD - which is a data model
... made up of 9 separate sections or areas, sequencing is very important
... there is also 'mandatory if applicable', which makes some things required depending on the resource being described
... seems to me the best way to model that is a DC application profile
... there are aggregations
... I'm wondering if there need to be 2 separate approaches, and how others would do it

Brainstorming

TomB: any questions?
... I'd like to circle back to the OWL method, I understood from the discussion before the break that the idea was to model constraints with OWL, and to validate those constraints with closed world assumptions
... in pellet the owl is used to generate a sparql query to validate
... can someone confirm this?

Michael Panzer: pellet is an owl2 reasoner, for doing inferencing ... but there is a project called pellet integrity constraint validator http://clarkparsia.com/pellet/icv/
... it doesn't change anything in your owl, but it generates sparql queries from the owl ... the same owl is used for both the inferencing and the validation
... the integrity constraints wouldn't generate any inferences
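
The idea of deriving a validation query from a constraint can be sketched as below. This is not the output of Pellet ICV; the function name, the query shape, and the `ex:` terms are assumptions, and the query relies on SPARQL 1.1 `FILTER NOT EXISTS`. It turns "every instance of a class must have at least one value for a property" into a query that returns the violating resources:

```python
def min_one_query(cls, prop, prefixes):
    """Build a SPARQL SELECT query listing instances of cls that lack prop."""
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in sorted(prefixes.items()))
    body = (
        "SELECT ?s WHERE {\n"
        f"  ?s a {cls} .\n"
        f"  FILTER NOT EXISTS {{ ?s {prop} ?value }}\n"
        "}"
    )
    return header + "\n" + body

query = min_one_query(
    "ex:Book",
    "dcterms:title",
    {"ex": "http://example.org/", "dcterms": "http://purl.org/dc/terms/"},
)
```

An empty result would mean the data satisfies the constraint under a closed-world reading; the OWL axioms themselves are left untouched and can still be used for inferencing.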

Karen Smith-Yoshimura: i'm trying to separate what it is, from what you are doing with it

scribe: sequencing (how things are presented) needs to vary on language context, and the application
... I'm not sure what happens with translations

TomB: i wonder if jon or corey might have some thoughts
... do RDF and linked data need standard approaches to "application profiles"?

Stu: do application profiles need to consider RDF/Linked data to be useful? Does Linked data/SemWeb need application profiles?

Antoine: that's a valid question. in rdf there isn't so much guidance on how to reuse vocabularies. i think semweb community could benefit from this

JonPhipps: an application profile at this point is documentation, too many organizations lack the documentation about their data, similar to what mike bergman talked about this morning

TomB: are there only documentation requirements, or do we need to express constraints?

JonPhipps: i'm deeply critical of people who think they have the answers in this space

:-)

TomB: not looking for answers, but suggestions

JonPhipps: if you don't document what your data is, are you really communicating anything? It seems essential for trust.

Gordon Dunsire: i think isbd would be a lot easier to understand as an AP. for communicating what this thing is

Emmanuelle: i don't feel like i can say what's good for rdf, but the library community needs something that's like AP but for the linked data world

TomB: i'm hearing a requirement to communicate the purpose and substance of a metadata model to a community for coherence of data and sharing an approach
... not hearing a clear requirement for standardizing an approach to modeling constraints for validation. does anyone want to argue for that?

Gordon Dunsire: look at the FRBR model, if you convert legacy data to that model, having something with which you can validate aggregations of triples is quite important

JennRiley: i agree, there are two reasons validation is important: it makes tool support easier ; it's also important for public relations, to constrain the world of linked data, and allows you to scope the web of data into manageable chunks (my wording)

TomB: is the Description Set Profile language a good start at that?

JennRiley: i don't have an opinion about whether it needs to be dcmi related

Jeff Young: i think we should come up with some example use cases, it's hard to say -- we are grasping here

TomB: can we identify different scenarios for different types of profiles?

JonPhipps: there is creation metadata, there is the publishing metadata, and there is the consumption of the metadata
... there isn't a notion of constraints around publishing / consuming data for rdf ; those are areas that need to be covered by an AP

antoine: there is agreement that some guidance should be provided when using vocabularies, but does this require a language?
... the fact that there was a formal language for the description set profile wasn't useful to me

JonPhipps: i second that

<emma> Markva asked whether Antoine's comment implied stopping the effort on DCAM

TomB: we have the singapore framework, if we ignore the DCAM is the rest valid?

[Singapore Framework diagram on screen: http://dublincore.org/documents/2008/01/14/singapore-framework/singapore-framework.png]

Diane Hillman: the idea that we will have to explain AP in terms of RDF...i've been through lots of phases of technical wonder. i'm worried that we are getting too far into thinking in one mode, need more general thinking than that

JonPhipps: [gesturing at large parts of the Singapore Framework diagram and saying it is documentation related]

TomB: what about data format?

JonPhipps: that's a specification, perhaps somewhere else like SKOS

(thumbs up from the modelers in the back)

Michael Panzer: the abstract model is a meta model, and in this way it clashes with RDF
... how would you do some of the things in the DCAM with OWL? are you going to throw out some requirements?
... we could get involved in rdf next steps. but in the end dcam and rdf are at odds, and one must win

Stu: Jon's assertion that we have confused syntax and semantics is a really strong point
... i wonder if someone is willing and able to explain what the abstract model means. we know how to describe items. i don't understand the singapore framework. we've got models that we don't believe. we haven't connected them with what we are trying to do.
... I'm not saying DCAM or RDF must win. if we were to sit down and write a document that would not allow us to use models, triples, domain models ... a plain natural language description of what we are trying to do...i tried to write about it in my blog and i got feedback that I didn't understand it.
... if you can't describe what the framework is to practitioners then we can't move forward

JonPhipps: the value of the upper two layers is that they allow us to document a domain model, in a way that is independent of the bottom layer (the implementation)
... it provides a valuable documentation model, there are bits that are too technical. it would help to have it rewritten in a way that's understandable.

markva: could add some documents that explain it in very clear ways, like what the owl community has done
... could add some documents that explain how to go from the conceptual level to the implementation

antoine: keeping the rdf reference you can do without a reference implementation guideline, that might not even express all the requirements. Just as informal guidance

TomB: Michael do you think you can do without this bottom layer of RDF?

Michael Panzer: the question is more where the wind is blowing
... why build it on RDF? do we do it because it's a good brand, or that it's useful? how important is that?
... the DC way of working with metadata, will enough people find it useful without anchoring it to the RDF specs?

JonPhipps: perhaps the bottom layer can be informative, and the middle layers would be normative

TomB: adjourned

DCMI Architecture Forum and W3C Library Linked Data Incubator Group joint meeting on "Application Profiles for Linked Data: models and requirements"
