Linked Enterprise Data Patterns Workshop

Semantic Web Application Patterns: Pipelines, Versioning and Validation

<inserted> scribenick: SteveBattle

<dbooth> by David Booth

<dbooth> http://dbooth.org/2011/ledp/

dbooth is talking

joined new startup PangenX in healthcare

Addressing the semantic integration problem

create the illusion of a single unified data source

scribe: using RDF in the middle

There's some kind of data production pipeline

Question: Martin Nally: We made people turn data at the edge into http resources

use ontologies/rules etc within the data pipeline

David Wood: Suggests breakout session about minimal server for HTTP+RDF

Illustrates with Cleveland Clinic reporting pipeline

A dependency graph

Each node is a consumer and producer of data

Martin Nally: Suggests a comparison of standard ETL techniques with an RDF based approach would be interesting.

Convert into RDF at the edges - but you're not done, plenty more work to do.

David Wood: LeeF is going to talk about data segmentation - also relevant.

Need to be smart about tracking when data changes - not doing redundant work

open source project "RDF Pipeline" PoC

The pipeline (graph) is described in RDF

Each node is a processing stage combining a wrapper and update code

Each node declares its inputs (as an RDF List)

Each node has a URL

The updaters are found in well known places (not described in the RDF)

The updater does the interesting work

This is like RDF Make / Ant

Updaters have their own policy about when to recompute their output (lazy, eager, periodic etc)
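
The node, input and update-policy ideas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real RDF Pipeline describes its graph in RDF, and the `Node` and `Source` classes, the `version` counter and the `runs` counter below are hypothetical names invented here, not the project's API. Only the lazy policy is shown.

```python
from typing import Callable, List, Optional


class Node:
    """A pipeline stage: a generic wrapper around node-specific updater code."""

    def __init__(self, name: str, updater: Callable[..., object],
                 inputs: Optional[List["Node"]] = None):
        self.name = name
        self.updater = updater        # does the interesting work
        self.inputs = inputs or []    # declared upstream nodes
        self.version = 0              # bumped each time the node recomputes
        self.output = None
        self._seen = None             # input versions used for the last run
        self.runs = 0                 # how many times the updater actually ran

    def get(self):
        """Lazy policy: recompute only if an input changed since the last run."""
        values = [n.get() for n in self.inputs]
        versions = tuple(n.version for n in self.inputs)
        if self._seen != versions:
            self.output = self.updater(*values)
            self._seen = versions
            self.version += 1
            self.runs += 1
        return self.output


class Source(Node):
    """A node with no inputs whose value is set from outside the pipeline."""

    def __init__(self, name: str, value):
        super().__init__(name, updater=lambda: value)
        self.output = value
        self.version = 1

    def set(self, value):
        self.output = value
        self.version += 1

    def get(self):
        return self.output


patients = Source("patients", [3, 1, 2])
labs = Source("labs", [10, 20])
merged = Node("merged", lambda p, l: sorted(p) + l, [patients, labs])
report = Node("report", lambda m: len(m), [merged])

report.get()
report.get()            # nothing changed upstream, so no updater runs again
patients.set([5, 4])
report.get()            # an input changed, so merged and report recompute
```

Tracking a version per node is one simple way to avoid redundant work; a wrapper that talks HTTP could use cache validators for the same purpose.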

The wrappers are typed (eg. FileNode, SparqlGraphNode)

FileNode is simply a wrapper for a shell-script

The SparqlGraphNode inputs and outputs to/from named graphs

Wrappers handle inter-node communication, HTTP by default

If you have 2 nodes in the same JVM you don't need to go out to HTTP, the wrapper can be directly invoked

This is a framework not an API. based on an RDF description.

This is language agnostic - you don't even need to use RDF

part 2: Ontologies and rules for semantic transformations

uses SPARQL as a rules language (see also SPIN)

(unlike SPIN) this uses SPARQL INSERT

This is more efficient than using CONSTRUCT, because data is handled within the triple-store

Uses dynamic combination of named graphs

Equivalent to multiple FROM NAMED clauses

David Wood: Mulgara does something like this

My question: Why not use multiple FROM NAMED clauses? Because you might have thousands (e.g. think of patient records)
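
To make the scale concrete, here is a small Python sketch that builds a query listing every graph explicitly. The `http://example.org/patient/N` URIs and the helper name are assumptions for illustration; nothing here reflects how dbooth's system names its graphs.

```python
def query_with_from_named(graph_uris, pattern):
    """Build a SPARQL SELECT that lists every named graph explicitly."""
    clauses = "\n".join(f"FROM NAMED <{uri}>" for uri in graph_uris)
    return (
        "SELECT *\n"
        f"{clauses}\n"
        f"WHERE {{ GRAPH ?g {{ {pattern} }} }}"
    )


# Hypothetical: one named graph per patient record.
graphs = [f"http://example.org/patient/{n}" for n in range(5000)]
query = query_with_from_named(graphs, "?s ?p ?o")
line_count = query.count("\n") + 1
```

With 5000 patient graphs the query text already runs to thousands of lines before any real pattern is written, which is the motivation for naming the combination of graphs instead.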

TBL: Have you considered other graph operations?

David Wood: Algebraic operations sneak in through real-world use-cases

TBL: Are you looking for the language for writing pipeline operations? E.g. in CWM you merge and filter graphs.

The motivation is that ad-hoc pipelines are a nightmare to maintain.

Martin Nally: Was this produced for a customer? Would be more convinced there's a market if a major customer had demanded this.

Oracle has an RDF product (Oracle semantic technology)

There will be RDF support in DB2 - not a major announcement, just incremental development.

Martin Nally: Is there a role for the W3C to compile information about customer use-cases?

TBL: W3C will do education outreach, including case-studies.

These will explain the ROI

David Wood: Aware of a use-case where they have millions of (named) graphs. They run into exactly this kind of processing problem.

Last points about URI versioning

Change the URI or the semantics?

point 1: Publish your versioning policy whatever way you go.

Point 2: Old and new URIs can co-exist (in RDF). Doesn't break extant software

Martin Nally: Works well in publishing web. Read/Write web is more problematic

<ericP> http://dbooth.org/2011/ledp/

How to validate in an open world?

1: model integrity: is what I'm producing sensible?

2: Suitability for use - this is defined by the consumer with own expectations.

Producers supply validator for the data they produce, the consumer for the data they expect.

"SPARQL is my hammer"

Martin Nally: How do I publish my validation (rules) ahead of time - not just a runtime task.

dbooth: The validator is a design-time construct serving that purpose

Martin Nally: There's runtime variability. How do we capture that?

<ericP> [slide 24]

How do you describe that _structure_ (metadata)?

The consumer expects more than just RDF

Sandro: first approx is an enumeration of the vocabulary

Then the constraints on the data.

Martin Nally: Schema can be used to form and constrain data. Can SPARQL be used in the same way?

Sandro: Easy to write inscrutable SPARQL
... Should be possible to use SPARQL (ASK) and schema interchangeably

dbooth: SPARQL ASK can be used as a validator

Martin: This is a view definition, it can be used in different ways.

Question: Isn't RDF self-describing? Properties the consumer recognises can be used however they like. What's the role of validation?

TBL: For example, in geodata there's a lat for every long.
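
TBL's geodata example can be written as a yes/no validator, which is the role SPARQL ASK plays in the discussion above. The sketch below is plain Python over triples held as tuples, not SPARQL; the `geo:lat` and `geo:long` strings and the `ex:` subjects are illustrative assumptions.

```python
LAT, LONG = "geo:lat", "geo:long"


def subjects_with(triples, predicate):
    """Subjects that have at least one triple with the given predicate."""
    return {s for (s, p, o) in triples if p == predicate}


def missing_coordinate(triples):
    """Return subjects that have a latitude or a longitude but not both."""
    return subjects_with(triples, LAT) ^ subjects_with(triples, LONG)


def is_valid(triples):
    """ASK-style validator: True when no subject is missing a coordinate."""
    return not missing_coordinate(triples)


data = [
    ("ex:boston", LAT, "42.36"),
    ("ex:boston", LONG, "-71.06"),
    ("ex:cleveland", LAT, "41.50"),   # no longitude, so this is flagged
]
```

A producer could ship such a check with the data it publishes, and a consumer could run its own stricter one, matching the two kinds of validation described earlier.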

dbooth: has finished

<martynas> I wanna mention SPIN again - it has some vocabulary for validation. Haven't tried it though

One more question: How does this contrast with SPARQL motion?

dbooth: that has a centralized architecture

Web Access Control

EricP begins

<ericP> https://www.w3.org/2011/Talks/1207-acls-egp/

on slide 2

About Web Access Control

If the atomicity is the document, you end up with hundreds of named graphs expressing the ACLs

Express ACLs for a particular graph pattern, not a particular document.

Analogous to Oracle label security

Existing Abstract Policy Language

Healthcare is dominated by XACML

XACML needs to be profiled, leans heavily on HL7

The expressivity eg. "SalesManager in Boston implies access to regional projections". A bunch of conjunctions

Enforcement ensures request falls within the policy

Expressed in RDF "just for fun"

Enforcement by SPARQL extension functions used in a SPARQL filter. Standard tooling doesn't do anything for you.

Virtual views (SPARQL constructs create a virtual view)

Gives you control over what the consumer can see.

For example you can hide private information

Sprinkle the access controlled data in SPARQL OPTIONALS that will leave the sensitive data unbound.

Each optional has a condition that looks at the access (named) graph for authorisation
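
The effect of guarding sensitive data with conditional OPTIONALs can be imitated in Python. This is an analogy under stated assumptions, not EricP's implementation: the `ACCESS` set stands in for the access graph, and the users, properties and `view` function are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical access graph: (user, property) pairs that are authorised.
ACCESS: Set[Tuple[str, str]] = {
    ("dr_jones", "ex:medication"),
    ("dr_jones", "ex:diagnosis"),
    ("billing", "ex:diagnosis"),
}

SENSITIVE = ("ex:medication", "ex:diagnosis")


def view(record: Dict[str, str], user: str) -> Dict[str, Optional[str]]:
    """Return the record as `user` may see it. Unauthorised fields stay None,
    much as an unmatched OPTIONAL leaves its variables unbound."""
    result: Dict[str, Optional[str]] = {}
    for prop, value in record.items():
        if prop in SENSITIVE and (user, prop) not in ACCESS:
            result[prop] = None
        else:
            result[prop] = value
    return result


patient = {"ex:name": "Pat", "ex:medication": "warfarin", "ex:diagnosis": "AF"}
billing_view = view(patient, "billing")
```

The row is still returned for every requester; only the guarded values go missing, which is what makes the policy easy to inspect next to the data it protects.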

slide 13

ACLs are visibly next to the sensitive information.

Makes it easier to debug and inspect

Arbitrary SPARQL expressivity: eg. you can have conditions that are selective on the medication.

XACML doesn't give you that degree of expressivity

Martin: IBM has done some query rewriting for security - ran into performance issues

dbooth: Was the performance issue because they were doing it on the fly?

Martin: No not construction, but execution of the (rewritten) query.

dbooth: What did you do instead?

Martin: fell back to something closer to the virtual graphs. Jena built this at a lower level rather than doing this in-query.

EricP: data obligation (time sensitivity) is hard.

Policy injection (of patterns from a XACML) is future work.

Contrast with Oracle object level security - the users will choose.

Ashok: Is fine grained access control (at the triple level) really useful?

Elsevier: We need to create views that particular customers can see.

EricP has finished

LeeF up next

<LeeF> http://dl.dropbox.com/u/11365687/ledp/ledp2011-data-segmenting-in-anzo.pptx

"Data Segmenting in Anzo"

<dbooth> by Lee Feigenbaum

Where do we see value in the linked data space?

Cambridge Semantics support deployment of the Anzo platform.

Our hammer is named graphs.

Smallest unit of granularity for versioning, access control. Using TriG

Does everything go into a named graph? single triples (eg. for statement level control) ?

Do named graphs correspond to documents? sometimes docs are artificial constructs.

Or all triples sharing the same subject - gets the job done. subject triple closure.

concise bounded description - though the moral is to avoid bnodes where possible.

Use annotation to denote some properties as internal. eg. The wheels of a car are treated as 'internal' to the car (and end up in the same named graph) rather than having separate graphs for each wheel.
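
The subject-based segmentation with 'internal' properties can be sketched as follows. This is a guess at the general idea only; the `segment` function, the `ex:` names and the way ownership is resolved are assumptions, not Anzo's actual algorithm.

```python
from collections import defaultdict


def segment(triples, internal=frozenset()):
    """Group triples into named graphs keyed by subject.

    Triples whose subject is reached through an `internal` property are
    filed under the owning subject's graph instead of their own.
    """
    owner = {o: s for (s, p, o) in triples if p in internal}

    def root(node):
        seen = set()
        # Follow ownership links upward, guarding against cycles.
        while node in owner and node not in seen:
            seen.add(node)
            node = owner[node]
        return node

    graphs = defaultdict(set)
    for triple in triples:
        graphs[root(triple[0])].add(triple)
    return dict(graphs)


triples = [
    ("ex:car1", "rdf:type", "ex:Car"),
    ("ex:car1", "ex:hasWheel", "ex:wheel1"),
    ("ex:wheel1", "ex:diameter", "17"),
    ("ex:lee", "ex:owns", "ex:car1"),
]
graphs = segment(triples, internal={"ex:hasWheel"})
```

Here the wheel's triples land in the graph named `ex:car1`, while `ex:owns` is not internal, so Lee keeps a separate graph. Without the annotation there would be one graph per subject.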

TBL: You're treating the direction of the relationship as being significant.

EricP: You're making access control decisions.

TBL: RDF isn't a tree - don't do object oriented programming in disguise.

<sandro> TBL: with this design, my decision about modeling with "parent" vs "child" becomes an Access Control decision. Direction of link shouldn't matter.

dbooth: The direction is being used as a heuristic way to organize the data.

Sandro: The unit of AC is a subject oriented graph.

TBL: This makes it harder to navigate to the parent - it ends up in a different named graph.

<sandro> sandro: do we automatically also get to see who the subject parents are, or who the children are, having been given access to the subject graph...?

The impact is that we end up with millions of small (named) graphs

But this corresponds with the natural granularity for permissions.

The challenge is to find the right graph

scribe: and sometimes multiple graphs (eg. including both the parents and children of a resource)

The graph name is the same as the resource name

Ora: RDF lets you use the same name for different things so that's OK

Sandro: You mean the name (URI) of the subject? Yes

Elsevier: We treat named graphs as a simple kind of packaging. They are combined later for a given application.

eg in notepad: ex:Lee { ex:Lee a ex:Person; ex:name "Lee" }

That was TriG, the name ex:Lee serves two roles

not kosher but pragmatic

The fallback is a system-wide sparql query.

<sandro> ( IMHO, the reason Lee hasn't come across problems with this is they're not really doing decentralization / linked data. )

graphs are replicated (cached) on the client

Now for linked - data... everything exposed as LD. It dereferences URIs.

LD principles are not used internally, but for public consumption.

Sandro: Does it do 303 redirects?

Impractical to enumerate millions of FROM NAMED clauses. Similar to dbooth but we call them 'named datasets'.

TBL: It's a virtual union graph

RDFS, OWL used as expressive data model (not really open world semantics)

publish RDFa and support JSON serializations.

SPARQL rules. Like dbooth, used CONSTRUCT but switched to INSERT

SPARQL ASK for preconditions and validation

Message: Anzo driven by semweb technologies, but it needs to be integrated within conventional software architectures.

Wary of standards that don't affect interoperability

<sandro> Lee: The time is totally ripe for a new Semantic Web Education / Outreach effort.

time is ripe for education and outreach. We're often asked to use their tools against an arbitrary SPARQL endpoint. The answer is NO.

SPARQL 1.1 Service Description may make this easier.

standards needed for: advertising content of linked data sources, SPARQL endpoints

standards for named datasets and other SPARQL extensions

David: What extensions?

A few hundred function extensions.

Question: Do you have a feeling for the right kind of sizing for graphs in a typical application?

Answer: It depends on the application, depending on performance metrics

Have you looked at Kasabi (re advertising content of LD)? Will look into it.

David Wood: Taking an action item to set up group to look into use/extensions/federation of SPARQL

e.g. what functions, extensions are supported by an endpoint.

David Wood: Also which predicates. vocabulary

TBL: That's a different level

There are a few different aspects, different groups are concerned with different aspects, they need to get together and agree, adopt and implement.

David Wood: There's a core of people who would be interested in talking about this and making a member submission.

EricP has picked up the ball and taken the action item.

LeeF has finished

<betehess> scribe: Alexandre

<betehess> scribenick: betehess

[note: DS will stand for David Schaengold, Revelytix]

DS: starting a background on revelytix

Validation of Distributed Enterprise Data is Necessary, and RIF Can Help

DS: we're in DC area

<dbooth> http://www.w3.org/2011/09/LinkedData/ledp2011_submission_16.pdf

DS: we're profitable

<dbooth> by David Schaengold (Revelytix)

DS: around 35 people
... will explain our software architecture
... we have different kinds of stores
... it's not exactly a semweb architecture
... it's more like data integration
... we have rdbms, flat files, sometimes triple stores, etc.
... we have spyder, an implementation of r2rml
... and then a module for sparql query federation
... r2rml is an emerging standard (at w3c)
... relational to rdf mapping language

Ora: how do you tweak the sql for your database?
... speaking about case where databases can't do some kind of joins

ashok: you have full power of sql

Ora: let me ask again: there is some stuff you can't do with SQL, like Hive

<sandro> "hide" ???

<sandro> (something which "can't do self-join")

DS: we can take advantage of indexes, and other things
... the result set is embedded into triples, that can be given to spyder
... spyder does optimizations during the federation
... we intend to support virtualized queries
... related to enterprise ontologies
... we love sparql, but http is kind of slow
... Sherpa is an adapter that is faster
... between sparql federation and spyder
... so, we've run into problems
... for example at the integration level
... for example when you unify IDs among different stores
... so we thought about using owl or sparql
... but they have different modelling capabilities

[DS giving examples of limits of sparql/owl]

DS: we need *one* language
... for the set of rules
... we're using RIF

[ enumerating key points for RIF from the slides ]

DS: RIF has 2 dialects
... SPARQL is not a rule language
... so we don't do validation with it
... there is no distinction between sparql query and sparql rule
... too confusing

<ora> earlier, the database I was referring to was "Hive" (for Hadoop)

ora, is it even considered as SQL-compliant?

<ora> no, it isn't, it is "SQL-like"

DS: just by clicking a button, we can know which source is concerned

<LeeF> Permanent link to my slides: http://www.slideshare.net/LeeFeigenbaum/data-segmenting-in-anzo

<rulesguy> I am curious to see RIF used as a 'rule language' as opposed to a 'rule interchange format'

DS: triples are frames, will give an example later
... here is the example
... [see slides for the query]
... it's a disjoint property written in owl
... if you know rdf, it's easy to write

sandro: it's not the interchange format
... it's a convenient syntax

<ericP> davidwood, Arnaud, ashok, et al, proposals for breaks? strawman: http://www.w3.org/2011/09/LinkedData/Report

timbl: but these people wrote a parser for that

sandro: there is some history there, we chose XML and RDF for the exchange format

DS: problem: we don't have NOT
... I mean not generic negation
... but we needed it
... sometimes, we can use built-in predicates
... if rules are too complex, you need it
... our solution: implement Not() with rif:error()
... we anticipate there will be a profusion of zero-argument predicates in the future

<davidwood> ericp, We should have a breakout regarding the definition of a minimal RDF/REST server.

sandro: rif:error is not part of RIF core

DS: but we use it a lot, we'll be happy to see it standardized
... the rules have labels in documents, we see it in the output
... so we can show you all the uris that triggered an error
... there is no inference here
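
The idea of labelled validation rules that report the URIs triggering an error can be sketched in Python. This is not RIF and not Rex: the rule label, the `ex:` properties and the `validate` function are hypothetical, and the disjoint-property rule simply echoes the kind of example DS mentioned.

```python
def disjoint_properties(p1, p2):
    """Build a check that fails for any (s, o) pair linked by both p1 and p2."""
    def check(triples):
        pairs1 = {(s, o) for (s, p, o) in triples if p == p1}
        pairs2 = {(s, o) for (s, p, o) in triples if p == p2}
        return sorted(s for (s, o) in pairs1 & pairs2)
    return check


# Hypothetical rule set: label -> check returning the offending subject URIs.
RULES = {
    "parent-and-child-disjoint": disjoint_properties("ex:parentOf", "ex:childOf"),
}


def validate(triples):
    """Return {rule label: [URIs that triggered it]} for the rules that fired."""
    report = {}
    for label, check in RULES.items():
        offenders = check(triples)
        if offenders:
            report[label] = offenders
    return report


data = [
    ("ex:ann", "ex:parentOf", "ex:bob"),
    ("ex:ann", "ex:childOf", "ex:bob"),    # violates the disjointness rule
    ("ex:bob", "ex:childOf", "ex:ann"),
]
errors = validate(data)
```

Keeping the label alongside the offending URIs mirrors the point that rule labels show up in the output; no new triples are inferred by this check.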

timbl: and they are tagged as strings?

<davidwood> ericp, ...and another breakout regarding pattern collection (I think). Let's poll immediately following David S's talk.

DS: not sure

sandro: this is addressed by RDF 1.1

DS: we have entailments as a bonus
... we don't use not() here
... we do entailment before validation
... you can try it out
... download spyder, it's free
... not currently open-source, but we're working on it (maybe next year)
... spinner is the federated query engine
... and Rex is the RIF-implementation

davidwood: was told they would open-source it before semtech in June
... not sure what that meant
... looked on the Web right now, there is still a license agreement
... so maybe there is a misunderstanding about free and open

DS: we want to make the source available
... I don't know about the license
... just want people to be able to use it

davidwood: maybe it's just a communication issue

DS: there is code overlap between spyder and spinner
... but you don't see it operationally
... we haven't implemented the sending of email from the rules yet
... thanks

davidwood: what will be the breakout sessions?
... trying to do that before eric comes back

<dbooth> http://www.w3.org/2011/09/LinkedData/Report

Breakout discussions

[people trying to organize the breakout sessions]

<sandro> Martin: (1) LDEP Basic Profile / 101 Curriculum. Self-contained set.

<Sumalaika> so topics (1) Missing specifications (2) Education (3) ROI

<Sumalaika> (A) Missing Specifications (B) Education and ROI

<sandro> sandro: How about each breakout makes a list of deliverables?

[trying to identify the deliverables a group would create]

timbl: I see 3 deliverables
... spec of the basic profile
... which is not best practice
... then tutorial for developers new to this world
... and a white paper for management

@elsevier@: the tutorial is a set of resources to get started

davidwood: it's already on the web

people: there are too many resources

brad: the thing is that you still have to make choices
... it's different from providing a minimal profile to achieve something
... we have big players in this room, that want to do linked data
... we can reduce freedom to make it easier to start

timbl: pagination is a feature part of the spec

<bheitman> here is the Linked Data Patterns text: linkeddatabook.com/

<bheitman> http://linkeddatabook.com/

Cornelia: profile is basically a subset of things that already exist. I see a Linked Data pattern spec being missing. trying to solve the identity for example

<sandro> http://willyou.typewith.me/p/eldw

Arnaud: they are not always subsets
... it can be about combining specs
... I'm ok to choose another name
... doesn't have to be totally new

<sandro> ericP, http://willyou.typewith.me/p/eldw

Arnaud: it's more about putting things together

<Sumalaika> There are topics that come up over and over again in LD projects

<Sumalaika> They need specific attention

<Sumalaika> e.g., URL opaqueness

<Sumalaika> Resource Identity

<Sumalaika> these topics need attention urgently

@@: depending on your community, your starting point is different

scribe: so we need to identify the enterprise developer we want to reach
... for example, take the pet store from the jee community

timbl: I've got a worry
... take anzo clients, they are all different
... if it's too simple, people may think they can do it with only XML
... dangerous to tell people how to do things if they come to the LD world
... for example, sometimes ETL is good in some situations, sometimes it is not suitable

@@: I agree, but we have to be specific

scribe: because we may not make any progress
... maybe one example is not enough
... we need the courage to be specific
... my suggestions are:

<timbl> No server is available to handle this request.

scribe: 1. describing the target, what they know, etc.
... 2. get specific about examples
... usecases can be expressed as examples

ericP: let's see if we can identify some patterns
... people see that outreach is a major issue

Arnaud: one of the problems is the lack of definition
... could see that at IBM

<davidwood> Linked Data materials:

<davidwood> Linked Data: Evolving the Web into a Global Data Space, by Tom Heath and Chris Bizer: http://linkeddatabook.com/editions/1.0/

<davidwood> Linked Data Patterns, edited by Leigh Dodds and Ian Davis: http://patterns.dataincubator.org/book/

<davidwood> Linking Enterprise Data (the entire book is on the Web): http://3roundstones.com/linking-enterprise-data/

<davidwood> The Joy of Data: A cookbook for publishing Linked Data on the Web, by Bernadette Hyland. Slides: http://www.slideshare.net/bhylandwood/bernadette-hyland-semtech-2011-west-linked-data-cookbook (book chapter to be online shortly)

<davidwood> Linked Data 101: http://3roundstones.com/linked-data-101/

<davidwood> Searching for "linked data tutorial" on Google brings up lots of developer-oriented materials.

Arnaud: can't point to one single document
... we need an official standing, like a spec, not just a note from timbl
... would make it a lot easier for people to understand

davidwood: that would just add one more resource on the Web

Martin: it's not about creating more patterns
... it's about identifying the important ones
... I think there is a core set we can identify
... as a basis to start in an enterprise context

Arnaud: martin, we have at least 2 different communities
... so maybe 2 different profiles?
... that's what profiles are for
... makes it very hard for us to communicate to clients
... it will help putting names on these concepts

sandro: will tell a story here
... some time ago, the UK produced a bunch of data
... first org I knew that just said: let's just try to do it
... when they finished, they came to us, saying it had to be standardized
... at that time, it was only govs
... no enterprises were there!

<sandro> http://www.w3.org/2011/gld/charter

sandro: the charter says things about best practices
... including linked data system
... so that people can point to it
... it's sort of a basic profile
... next is about how to build vocabularies
... then how to publish

Cornelia: and there is security too

sandro: just concerned that gov people have not participated in this workshop
... sensing there is some energy here
... so maybe we can recharter the other group

Arnaud: I looked at this charter before
... but with the gov context
... but maybe there is a significant overlap here

sandro: yes, but it can make things more difficult, for example URI construction

Arnaud: URI creation can be very domain specific

sandro: lot of invited experts
... due to structure of w3c
... you find DERI, RPI, etc.

<davidwood> http://www.w3.org/2000/09/dbwg/details?group=47663&public=1

sandro: there are active people, but that's difficult
... still a lot of work here
... no FPWD yet

dbooth: there is clearly an overlap
... I also had these goggles on when I looked at the charter

Cornelia: would help for education, outreach

ericP: there is opportunity for bringing other Members

davidwood: there are issues that don't happen with govs, like IP
... will participants participate with such a charter?

Cornelia: at least, I will press my legal department to participate
... but there is definitely IP

sandro: but people can say that standards are more important than IP :-)

Arnaud: don't commit for IBM, but I don't foresee any issue at this point

martin: my fear is that if it's out of scope
... we came here with enterprise in mind

sandro: ibm, emc, oracle are big players for govs as well
... so enterprise data for you can be gov data for them

timbl: gov data is more about read, enterprise needs read/write

Arnaud: @sandro, as the group can't do everything, practically, what are they doing?

sandro: don't really know, we have people who accepted each item

davidwood: we can write deliverables in the charter to make it more concrete

martin: yes, and I would add deadlines
... that's why you define the kind of persona you want to reach
... so that you can focus on it

ericP: so, pet store could be too small
... and may not provide enough details
... so if you say you 1, 2, 3, ...
... it can be much bigger

ashok: there is a CG about networked data
... does anyone know about it?

people: we don't know it

<sandro> http://www.w3.org/community/networked-data/

sandro: I'm a bit worried about Gov data group, because they have not made commitments yet

<davidwood> Government Linked Data Cookbook draft: http://www.w3.org/2011/gld/wiki/Linked_Data_Cookbook

sandro: there may be answers to most of the questions, but they have not shared them with the group yet

ericP: also, we may produce something that is not a charter, but close enough
... so we start documenting work that IBM has done
... can help see the overlap

sandro: can't build a community without a place to do that

ericP: mailing list?

timbl: we should try to find the content for the deliverable
... what do we need to specify
... making sure that some things are not messed up
... like sparql update for example :-)
... we had people speaking about collections, seems important
... maybe webacls too?

Arnaud: liked how tim framed it earlier
... that's a standard, a spec
... speaks about compliance

<sandro> "Linked Data Basic Platform" -- compliances

[break for lunch]

<bheitman> scribenick bheitman

<bheitman> scribenick: bheitman

Discussion of how to proceed

eric is bringing up the list of possible deliverables

john: how is enterprise data distinct from linked open data ?

ericP: we will run into debates as the difference between enterprise and open data is not clear, and this could be about any kind of linked data

discussion about use cases for data which requires read / write data

timbl: for crowdsourced data, people need to change the data
... governments have different problems with LD, enterprises have different ones again
... a chain of data providers where one buys data from the previous is a very enterprise thing

ericP: the gov LD and the enterprise LD people can work together on the intersection

david booth: would it make sense to recharter the gov LD group ?

martin nally: we should not add a new thing to a dormant working group

arnaud le hors: we could take over that part of gov LD group

<dbooth> DavidWood: I strongly object to the assertion that the gov LD group is dead.

ericP goes through gov LD out of scope list.

ericP: almost everything in the out of scope list is in scope for us here

david wood: in an enterprise context you want to define e.g. a relationship to XML

TimBL: this new group would be about creating a core architecture
... any client who conforms can use that system, you can build any kind of app on top

david wood: we need to define all the edges, how do you get to RDF ?

david booth: it's too ambitious to require it to work with any client

TimBL: this is just about the spec, not the "on ramp" guide
... it's about some turtle, some control ontology... (?)
... a linked data basic platform

<dbooth> s/any client/any app/

sandro: similar to the w3c web app platform name

ericP: are we moving towards a linked enterprise data charter ? who thinks this is wrong ?

sandro: this is about solving martin nally's use case

<sandro> ... in an entirely standards-compliant manner

<sandro> tim: With Access Control as an Optional deliverable.

timbl: we might not handle access control in this charter.
... sometimes when people handle problem a, they realise they have enough for a spec for prob b as well

ericp: if we use martin's story, which is very short, then this is a very doable task

martin nally: we are looking at a short term solution

scribe: this is about solving small problems to gain something in the short term
... i would be nervous about being too ambitious

ashok: we should do this fairly quickly without a 4-5 year work group

ashok and ericp: let's do it in 6 months

david booth: what do you want to sacrifice for that ?

consensus about taking out a lot

arnaud le hors: don't try to solve everything, scope it so that the result is useful

<sandro> (not consensus, I think)

sandro: sorry, was not sure how to document a quick discussion

arnaud le hors: the remainder can be part of the next version

<sandro> "schedule-driven", "deadline-driven", "time-boxed"

timbl describes data.fm

timbl: we can use data.fm to describe test cases for read write data

<dbooth> http://data.fm/

<sandro> s/timbl: we can use data.fm to describe test cases for read write data/timbl: we can use data.fm as one example of a platform implementation/

ericP: the list of topics for the ld platform basic profile contains many entries which are output of other working groups

sandro: we could make an interest group which uses the expected specs as input. we could have a mailing list , maybe a few phone calls

discussion about writing a primer for linked enterprise data (?)

sandro: there is no list of what precisely describes linked data. there is no spec or primer (?)

martin nally: we need this kind of document

ericP: a primer could be sufficient

alexandre b.: a primer is not enough

<sandro> sandro: What's needed is a definition of Linked Data, which is basically a list of specs needed.

ericP: is our time best spent writing a spec, or a primer which explores the same use cases as IBM has already explored?

arnaud le hors: you can *NOT* claim compliance against a primer

<sandro> Arnaud: You can't claim conformance with a Primer.

martin nally: we have about 5 tools in ibm rational, they work together because we made some choices together

scribe: others will make other choices, that's the problem

ericP: looks like we will need a test suite and such

david wood: why could you not claim compliance with a primer if it is a package of standards ?

arnaud le hors: what we are talking about is a combination of standards

lee feigenbaum: to me it's pretty clear that there is more than one way to use the standards, a spec would mean that there is a clear way.

<dbooth> bheitman: This sounds a lot like the AWWW. Defines high level concepts and how they work together.

<dbooth> http://www.w3.org/TR/webarch/

<timbl> http://www.w3.org/DesignIssues/ReadWriteLinkedData.html

<dbooth> timbl: Design issues docs ended up like a primer.

<dbooth> ... I've added questions that got asked.

timbl: this introduces new details. there is a little bit of spec in there, the rest is about how to test things
... it's nothing to which you can claim compliance. there is just one test, you would need several thousand tests
... profile means saying which level of which standard to use together

<sandro> PROPOSED: We want a Working Group to produce a W3C Recommendation which defines a Linked Data Platform -- something that solves IBM Rational's use case (presented yesterday). We expect this to be an enumeration of specs which constitute lnked data, with some small additional specs to cover things like pagination, if necessary.

<dbooth> s/lnked/linked/

arnaud le hors seconds it

<davidwood> -0, because I question the need/desire for a WG, but agree with the deliverable.

john battle: make sure it's about constructive use of specs

<davidwood> Could this instead be a Member Submission that could be subsequently blessed by W3?

<dbooth> s/john battle/SteveBattle/

ora: sounds like a conglomeration of specs

<davidwood> s/john battle/steve battle/

ora: I am not opposing this, I am just wondering what it means for the process

john wood: why do we need a separate working group?

arnaud le hors: making a submission is definitively on our agenda

<sandro> s/john/david/

<LeeF> s/john wood/david wood

ericP: many successful working groups started as member submissions

sandro: it would be possible as a starting point to specify that the group can fall back on a working group

david booth: an interest group is an alternative

<sandro> arnaud: what's your resistance to the WG, DavidW ?

<LeeF> I agree wholeheartedly with dwood

david wood: there is a lot of time spent unproductively in a working group

<sandro> davidwood: They spend a lot of time. The process, the sociology, ....

scribe: there might be a possibility for parts of the deliverable to be made by different existing groups

arnaud le hors: should the deliverable be hosted by an existing working group or by a new one ?

<sandro> davidw: I support it being a Rec, but not necessarily a new group.

david wood: we should use the w3c process in a lighter way

david booth: I am not convinced that it needs this level

ericP: we need tests.

<betehess> my take is that the Recommendation is the only way to have a strong enough voice to exist within the Semantic Web ecosystem

ericP: if we want the recommendation because we want tests, then we can write the tests independently

martin nally: we cannot get serious traction for this tech in IBM if it is not a standard

<sandro> martin: The reality is we'll never really convert even other parts of IBM without this being a standard. A Note or Best Practice or whatever is not enough.

<dbooth> Fair enough.

timbl: the working group chartering process has a lot of flexibility, so e.g. face to face meetings are optional
... you cant have a member submission but do it behind closed doors

david wood: i like the idea of a meta spec

<sandro> RESOLVED: We want a Working Group to produce a W3C Recommendation which defines a Linked Data Platform -- something that solves IBM Rational's use case (presented yesterday). We expect this to be an enumeration of specs which constitute linked data, with some small additional specs to cover things like pagination, if necessary.

<sandro> (no objections)

<sandro> (obviously lots of people could word-smith that phrasing.)

ericP: it looks like we have enough resources

arnaud le hors: IBM should not work on something for 6 months before submitting it as a spec (?)

<sspeiche> http://www.ibm.com/developerworks/rational/library/basic-profile-linked-data/index.html

scribe: we should not work behind closed doors on this

<betehess> scribenick: Cornelia

<sspeiche> Which is an updated version of what we have done(learned) by work done in public at http://open-services.net/wiki

sandro: Use a community group to create the member submission?

<tlr> note you don't need a member submission if you have a community group. But you can set up those groups very quickly and easily for initial exploration.

Timbl: what about submitting something very close to what IBM already has, right away?

sandro: use the community group to allow inclusion on the member submission

ashok: It takes a long time for the lawyers of places like Oracle to bless employees to join even community groups

<timbl> saved copy of pad: http://www.w3.org/2011/12/07-ledp-pad.txt

ericP: Not everyone offering advice must be a member of the community group

arnaud: I'm not against community groups but I don't want to be sidetracked by that. I don't want it to delay the formation of the working group

ericP: the community group will want editorial control

arnaud: Suggest taking current document, allow others to sign on in support, and make that a member submission as is.

sandro: What about overlap with other standards?
... i.e. REST and SPARQL - do we have them stop this and we take it on or let them do a 1.0?

EricP: Do we have any evidence that what they are doing here isn't what we would want?

sandro: yes. PATCH

Timbl: The PATCH paragraph is informational and loaded with SHOULDs

<Arnaud> http://www.w3.org/2009/sparql/docs/http-rdf-update/#http-patch
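For readers unfamiliar with the PATCH discussion, the following is a minimal sketch of what an HTTP PATCH against a single RDF resource might look like, assuming the server accepts a SPARQL 1.1 Update body with the application/sparql-update media type. The resource URL, the helper name build_patch_request and the update text are illustrative assumptions, not taken from the draft or from any participant's implementation. The request is only constructed, not sent.

```python
import urllib.request


def build_patch_request(resource_url, sparql_update):
    """Build (but do not send) an HTTP PATCH request carrying a SPARQL Update body.

    Assumption: the target server accepts application/sparql-update on PATCH.
    """
    return urllib.request.Request(
        resource_url,
        data=sparql_update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="PATCH",
    )


# Hypothetical resource and update, for illustration only.
request = build_patch_request(
    "http://example.org/people/alice",
    'INSERT DATA { <http://example.org/people/alice> '
    '<http://xmlns.com/foaf/0.1/name> "Alice" }',
)
```

A client would pass the request to urllib.request.urlopen to send it. Whether a given server honours PATCH at all is exactly the point under discussion here.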

LeeF: no one in the working group uses PATCH

EricP: I am hearing that there are people who use PATCH

<LeeF> http://www.w3.org/TR/sparql11-http-rdf-update/

Martin: the way that I read the spec: "If you know RDF, etc. and want to expose resources, do REST, here's how you do it"

<sandro> LEE, want to paste the Ed's Draft, too?

Martin: my people are not coming from that "graph store" perspective. They know resources, not "named graphs"
... people coming from different planets

<betehess> this is basically what I'll *need* to specify anyway for the Validator Suite project, as it will rely on all that stuff: http://www.w3.org/2011/12/07-linked-data-betehess.txt

<LeeF> editor's draft of Graph Store Protocol - http://www.w3.org/2009/sparql/docs/http-rdf-update/

<timbl> MS-Author-Via

timbl: discussion around MS-Author-Via header and how it can be used to detect that something is editable
... but this pattern isn't written up anywhere
... discussion around a spec that mostly references other specs.
... <rant>
... there have been a lot of things that have not been written because they are too small.
... but if you just defined that one predicate (for example) there are a huge number of things you can do as a result

</rant>
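The header-based detection timbl describes can be sketched roughly as follows. This is a minimal illustration, assuming the server lists its authoring protocols as comma-separated tokens in MS-Author-Via (for example DAV or SPARQL). The function names and the sample header values are hypothetical, and since the pattern is not written up in any spec, the token handling here is a guess at a reasonable convention.

```python
def editing_protocols(headers):
    """Return the set of authoring protocols advertised in MS-Author-Via.

    `headers` is a plain dict of response header names to values.
    Header names are matched case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "ms-author-via":
            # Split the comma-separated list and normalise each token.
            return {tok.strip().upper() for tok in value.split(",") if tok.strip()}
    return set()


def looks_editable(headers):
    """Treat a resource as editable if it advertises any authoring protocol."""
    return bool(editing_protocols(headers))
```

A client could call looks_editable on the headers of a GET or HEAD response before offering an edit control to the user.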

EricP: I think we are arguing about the editorial nature of a spec.

sandro: I want to come back to the question of whether SPARQL REST should be moved from the current working group

davidwood: I'm a bit concerned about this workshop resulting in taking something away from the core SPARQL WG

sandro: The REST part is arguably not core

<timbl> public-rdf-dawg-comments@w3.org

<timbl> http://www.w3.org/2009/sparql/docs/http-rdf-update/#http-patch

<timbl> davidwood ^

timbl: Can someone from SPARQL working group ... do what?...

yes - LeeF

EricP: SPARQL endpoint vs. Web resource

sandro: Doing a RESTful interface over a graph takes some work

timbl: disjoint from the read/write web

martin: I don't want to tell every web programmer at IBM that their resources are graphs.

timbl: We will talk about web abstractions, not graphs

sandro: When do we try to work through this SPARQL/web resource issue? Before the WG?

<davidwood> I took an action to make a public comment to the SPARQL WG regarding the lack of PATCH support in IE's XMLHttpRequest object.

sandro: This working group must have input into SPARQL HTTP spec

martin: it seems odd that the spec assumes an RDF store - it's an implementation detail.

timbl: All of the SPARQL implementations operate against an RDF store

Arnaud: Might make sense to have what timbl suggests - that this WG write their version of the spec

sandro: with the same test suite across the two groups

<LeeF> I'm highly suspicious of what seems to be an underlying assumption here that standards get widespread adoption from popularly-accessible specs, rather than from copying working examples, working code, and educational materials

EricP: I have some outstanding issues here
... are we poking our fingers into SPARQL's affairs?

<davidwood> Sent email to public-RDF-dawg-comments@w3.org re IE's lack of PATCH support in its XMLHttpRequest object.

EricP: a r/w interface that is web oriented

<davidwood> LeeF, ^

<LeeF> thanks, davidwood

<timbl> http://www.w3.org/wiki/EditingData

Arnaud: There is a document already out there and I invite comments but I don't want to spend a lot of time preparing the member submission.
... there will be authors and co-submitters.

<timbl> http://www.w3.org/community/rww/

<timbl> http://www.w3.org/wiki/WebAccessControl

Arnaud: goal to have member submission in Jan 2012

davidwood: when I look at open-services.net I see lots of gaps

Cornelia: this is not what is proposed as a member submission. The developerWorks article is.

<sspeiche> A proposal started by IBM http://www.ibm.com/developerworks/rational/library/basic-profile-linked-data/index.html

Discussion on the way to collaboratively edit...

googledocs, W3C Wiki, word doc?

timbl: we can put the docs in a database and use the r/w web to post updates

davidwood: I would rather use w3 infrastructure.

timbl: would then have a history in cvs

Arnaud: Davidwood has a good point. How do we move forward
... next step: Have a look at the developerWorks article and provide comments

sspeiche: and also indicate the level of engagement you would have.

sandro: I can create a mailing list.

Arnaud: maybe not public yet?

timbl: I think you want to make it public because there are likely a lot of interested parties outside of this group.

Arnaud: concerned that too large a forum might delay getting the member submission in.
... Let's not over-engineer the process of providing the member submission

sandro: r/w web is only 10% of the problem in linked data

timbl: pagination

sandro: validation

The above are other parts of the problem.

Arnaud: the working group needs to decide these things. Not prior to the working group formation.

davidwood: the more immediate concern is getting a good member submission.

sandro: The charter is a bigger problem. Need community consensus

davidwood: Agree that we need to engage the existing linked data community
... how do we do that?

sandro: In ... we did a survey, ...

<sandro> sandro: for the RDF 1.1 WG we did a survey before the charter

<sandro> davidwood: the charter needs to be shopped 1-1 to the enterprises.

timbl: Suggest sandro could publish the current draft charter relatively immediately.

sandro: someone needs to talk to Microsoft and various other players to see if they would come on board.

Arnaud: How are charters created these days?

sandro: I do them in public

dbooth: Do you mean public or member?

sandro: public

Cornelia: why are we calling out MSFT in particular?

<sspeiche> What about Google?

davidwood: they are big, browser vendor, ...

Others

scribe: Amazon?

davidwood: some are in the room. IBM, Oracle, EMC,...

<sandro> who else should be in the group, maybe...?

Arnaud: What do we need to produce for this workshop?

<LeeF> How about the people actually doing enterprise semantic web? e.g. topquadrant? or is enterprise semantic web disjoint from enterprise linked data?

EricP: This decision, the formation of the WG and the member submission, are the most significant.
... everyone please give me your slides.

<sandro> LeeF, certainly we should discuss it with them.

<dbooth> Eric's email: eric@w3.org

EricP: Resolution
... IBM holds the keys to the member submission

<Arnaud> lehors@us.ibm.com

EricP: people will contact Arnaud to indicate interest in participating in this.

Arnaud: We will keep the member submission process lean. This is just the starting point.

<betehess> Arnaud, what's the ETA for the IBM's Member Submission?

sandro: Many people in the room have remained silent on this. Any concerns?

There were none voiced.

EricP: Sandro to own the charter

sandro: modulo w3c staffing issues.

?: If we are not creating a mailing list then where do we communicate?

sandro: I will create a mailing list for charter discussions.

<betehess> sandro, I will be working on this stuff anyway because of the W3C Validator Suite

EricP: Where do we have technical discussions?
... or do we want to have those direct with Arnaud?

timbl: TOC for the spec could happen on this new mailing list.

<sandro> public-ldp@w3.org created. http://lists.w3.org/Archives/Public/public-ldp/

timbl: actually solving technical problems maybe on an interest group list

sandro: suggest having those discussion on the new list

davidwood: with alerts to the interest group lists.

<timbl> public-ldp-request@w3.org

davidwood: Details on who wants to be or should be on the member submission?

Arnaud: interested in having others there. Said (from EMC) expressed an interest earlier. Nokia says they are interested.

davidwood: 3 Round Stones is interested (albeit a small company)

Revelytix also interested.

<davidwood> Oracle, too

Arnaud: Is there anything else, beyond the formation of the WG, that this workshop would like to recommend?

timbl: such as tutorials, etc.

sandro: primers for different audiences, i.e. gov, enterprise, EU gov, ... maybe?
... there are others who can write those (other than W3C)
... Other things?

Cornelia: ROI for selling the concept in the enterprise

timbl: As case studies

ryan: We need hard facts in those.

timbl: NASA use case

sandro: this and others had the form - we tried x at cost(x) and failed then did RDF and succeeded

ora: we have a lot of qualitative data but little quantitative

davidwood: 60% of the cost of a product is in maintenance and some large percentage of that is in assessment
... can Elsevier provide such numbers?

Brad: Have to think about that.

davidwood: there are other people I can go to and ask

sandro: Who is going to write this up and publish it?

timbl: Talk to Ivan

sandro: davidwood will gather case studies. Only those without proprietary information

davidwood: Only those without proprietary information
... Use cases, case studies already up for collection
... add ROI information as something to collect

<Arnaud> http://www.w3.org/2001/sw/sweo/public/UseCases/

sandro: I'd like to have this group review these case studies, use cases, ROIs

timbl: suggest Ian Jacobs to do interview
... the list of workshop participants should be distributed. Any objections?

davidwood: chairs will send out pointer to minutes, contact list, appropriate mailing list, etc.

Meeting is adjourned

<betehess> http://en.wikipedia.org/wiki/Stream_processing#Stream_Programming_Languages

Linked Enterprise Data Patterns Workshop - day 2

07 Dec 2011

Attendees

Contents