Data Activity Coordination Group

26 Feb 2014

See also: IRC log


PhilA, Guus_Schreiber, David_Wood, Arnaud, JeniT, Sandro, ivan, caroline_, yaso, ericP


<PhilA> hi yaso - code is 26631

<yaso> Tks Phil :-)

what's the code?

<yaso> yea

<yaso> yes

I'm happy to scribe

<scribe> Scribe: Jeni

<scribe> ScribeNick: JeniT


<Guus> zakim who is here?

yaso: I work for the Brazilian Internet Steering Committee
... UX designer & developer
... I work with data viz

<PhilA> yaso also works for W3C Brasil

yaso: one of the co-chairs of the Data on the Web Best Practices WG
... working on Best Practices based on use cases
... collecting common questions at the moment
... which we will try to answer

PhilA: the WG is new, it has been running for just over a month
... most of you were involved in the SemWeb coord group
... the Data Activity is a combination of the SemWeb & eGov activities
... the coord group brings the chairs together
... to understand what other groups in the area are doing

davidwood: the RDF 1.1 WG is the 2nd group I chaired with Guus
... 2 year WG that took 3.25 years
... shocked that it's ended successfully
... I'm a software engineer in industry

Guus: I'm co-chair of the RDF 1.1 WG, and have held several other co-chair positions, e.g. of the OWL group

<PhilA> RDF 1.1 to Rec announcement

Guus: my real job is professor of computer science in Amsterdam
... going on sabbatical to Southampton for 6 months

PhilA: have you completely finished on RDF 1.1?

davidwood: there are possibly some minor edits to come

Arnaud: linked data standards lead at IBM
... at IBM almost 15 years
... before that at W3C as staff member
... I chair the Linked Data Platform WG
... in existence almost 2 years, current charter expires June
... trying to finish main spec
... the Last Call review period ended with few comments, but significant ones
... including from timbl, so we couldn't ignore them
... we finally closed all the issues, editors are trying to implement the resolutions
... there are significant changes, so we'll go to a 2nd Last Call
... and there are some people in the group who are questioning some of the functionality in the spec
... so we're still trying to decide if we can go to 2nd Last Call or not
... hopefully we'll decide next Monday

PhilA: are you already thinking about LDP2?

Arnaud: yes

PhilA: but getting to Rec by June is going to be tough

Arnaud: yes, there's no hope of Rec, we might be Proposed Rec
... it depends on what comments we get
... and on implementations
... we have a wish list
... we have different camps with different use cases
... and lots of wishes we couldn't address in the time
... so we had to agree to limit the scope
... which we did by creating the wish list for LDP2

PhilA: creating a wish list is a useful way to make people happy with scoping
... would it be helpful to start framing that wish list into a charter?

Arnaud: yes, I'd like to when we get past the Last Call

<PhilA> scribe: PhilA

JeniT: I'm Technical Director at the Open Data Institute, I'm on the W3C TAG, and I'm co-chairing the CSV on the Web WG (with DanBri)
... started at the end of Jan with Ivan as team contact
... scope is how to supply extra info with CSV files to give context etc. that you currently can't get into CSV
... We've got 2 specs being edited, one of them a UCR (Use Cases & Requirements)
... all grounded in real world cases
... also started work on putting together a model of tabular data - annotated tabular data - that can be expressed as CSV
... working out requirements before looking at the many alternatives available
... charter due to June 2015

<scribe> scribe: JeniT

ericP: I work for W3C, working in Healthcare & Lifesciences Interest Group
... we're trying to promote uptake of Semweb in Healthcare & Lifesciences
... been focusing on clinical care standards for data
... eg electronic health records
... and clinical trials for the FDA, building ontologies so future clinical trial submissions will be in RDF
... to make it easy to integrate clinical trial data
... the Interest Group rarely provides support -- I do most of the work

sandro: I've been taking care of RDF WG for last 2 years
... now transitioning from semweb work towards decentralised social work
... redecentralising the web
... away from Facebook etc
... we have some research funding
... LDP & other standards are key on this

ivan: I'm staff contact for CSV on the Web WG, and otherwise am uninteresting
... not part of this activity, but I'm involved in Force11 which looks at collaborative publishing
... I co-organised a workshop a couple of years ago
... I'm in the "Board of Directors" of that stuff
... we're finalising a set of principles on how to cite datasets etc in scholarly publications
... with final text published this week or next
... that's on the side
... I'm also activity lead for digital publishing
... there are several areas there that raise issues relevant to this group
... eg metadata has become a huge issue for publishers
... the vocabularies are in chaos

<PhilA> Data Citation Principles that Ivan's talking about

ivan: there are probably three different vocabularies for each term
... the IG can give an overview of what vocabularies are around and how they're used
... the other thing is what kind of URIs you use to identify specific areas within a book
... vs the book on my machine etc
... and if there's a URI scheme that's usable for that
... and issues around annotation, essential in books particularly in the educational market
... which also relies on an RDF vocabulary

PhilA: there's clearly a lot of overlap there
... The point of getting together on the phone is to make sure we know what each other are doing
... and keep an eye on the community
... what people need, what we should be doing next
... davidwood, can you summarise your view of how linked data & the RDF world is doing?
... in the context of the US public sector?

davidwood: general feeling is that Europe is way ahead on the interesting use & funding of linked data projects
... with the UK well out in front
... the public sector use in the US has been centred on the environment & on health (HHS)
... driven by a single individual
... that project has slowed
... EPA has been in a morass of bureaucracy
... they've been stuck on system security bureaucracy for the last 2 years
... I wrote a developer's guide for linked data, released as an ebook last year, as print in Jan
... sales in the US have been strong
... currently Manning's #6 in their best seller list

<ivan> -> David's book: "Linked Data", David Wood, Marsha Zaidman and Luke Ruth, Manning Publications (2013), http://manning.com/dwood/

davidwood: there's quite a bit of activity in China, driven by govt owned agencies
... there are conflicting messages
... there's some interest in Brazil, but they're struggling for support
... I'd say Jeni's group were the leading people on linked data in govt

PhilA: yaso, when you talk to people about linked data, what do people ask?

yaso: they ask what data to put on the web
... 2nd question is how to collect data correctly, when planning to open it

PhilA: does anyone ask about technology?

yaso: yes, about exposing catalogs like CKAN or Socrata or Junar
... they don't know about URIs or vocabularies

PhilA: we had a talk from a CEO in Palo Alto the other day
... I asked about whether they'd considered using URIs for eg buildings
... he said he didn't know what URIs were

<PhilA> scribe: JeniT

<PhilA> scribe: PhilA

JeniT: The ODI is trying to support the use of open data. When I talk to devs, almost none of them ever ask for RDF. The only ones that do are academics
... that puts us in a position. We need to provide data to devs in a way they understand it
... but building in a tech architecture. We call them URLs for political reasons - reduces barrier to entry
... normal devs are happy with idea of embedding URLs in data as long as you don't say RDF or Linked Data
... less religion

<davidwood> +1 to URLs over URIs. That works for me, too.

JeniT: Hence interest in CSV, tabular data etc

<JeniT> ScribeNick: JeniT

PhilA: The data activity was put together to try to address that
... Arnaud, you represent big business, how do you see the world?

<ericP> actually, best practice would entail a new term: IRL

<davidwood> People seem to know URLs, but not IRIs, URIs, URI-references, etc.

Arnaud: IBM is very big
... data is a big thing at the moment eg big data, data analysis, huge amount of activity
... IBM Watson has a whole division which builds on semantic technologies
... the Watson engine is fed with things like DBpedia
... there is interest in linked data from that point of view
... we always have people in IBM interested in every technology, because we're big
... IBM is technology agnostic, we'll do whatever the customer wants
... we haven't made a decision to adopt & push linked data
... we have groups who have embarked on linked data
... linked data is big for some groups at IBM
... the IM (?) group doesn't see much demand
... the DB2 group decided to add linked data on their database
... and there hasn't been much uptake
... customers don't really care much

ericP: IBM doesn't have the greatest history of marketing: do Healthcare & Lifesciences communities know about this?
... they always go to Oracle

Arnaud: tbh Oracle is a better choice in this regard
... so the question of open data
... IBM gets a bunch of requests asking for help complying with the mandate to open data
... that is significant
... but customers don't care about which technology is used, so long as they comply with these mandates

PhilA: It sounds as if, when you talk to people and mention "RDF" & "linked data", some of them will be turned off
... some won't know what you're talking about
... we have a marketing problem
... yet here we are trying to bring the benefits of this to these people whether they want it or not
... my underlying question today is, how can we address this?
... and is there something we need to standardise now?
... it's a wider question about W3C's role too

Guus: I'm not sure it's a marketing problem
... even in social conversations, people ask about big data & data transparency
... I think it's about how you market it

PhilA: should we talk about big data instead?

<Zakim> davidwood, you wanted to suggest that not all is bleak; there are well defined use cases

Guus: it's all data: how you make it available is a theme & we should make sure we're part of that

davidwood: I'm giving a tutorial today at a conference on Healthcare & Lifesciences
... the people come from pharma companies
... they're all looking outside mainstream technologies
... there are pockets of these kinds of use cases
... where there are combinatorial data needs
... if we apply linked data to these industries, we're very successful
... it's not mainstream
... the big data meme has taken off, because of clever marketing

<Guus> in the application area I work in (libraries, museums, archives) publishing data on the Web is a central issue

davidwood: if I say that the work that we do is part of big data or data transparency or open data, we can have a conversation

PhilA: Eric & I are speaking to Tom Baker from DCMI about RDF validation / shape expressions etc

ericP: Dublin Core in 2008 did a Description Set Profile (DSP), a way of saying what properties you're expecting
... it associates a set of properties with a shape, then you say a node matches that shape
... I extended the expressivity for shape expressions
... we looked at the expressivity of DSP & shape expressions

PhilA: given that, and given that the workshop in September might lead to a WG

ericP: that's our expectation, with Arnaud

Arnaud: we've submitted a spec, with some co-submitters
... we hope to have a WG created for that, and the workshop concluded with that recommendation

PhilA: there's a bit more community building to do

ericP: the DCMI work will provide use cases & W3C will do spec design & test cases

<PhilA> LGD14

PhilA: the other thing coming up is the Linking Geospatial Data workshop next week
... we don't know what the outcome will be, but it looks like it could lead to a WG with participation from OGC, jointly branded
... but we don't know

JeniT: to do what?

PhilA: maybe best practices
... but I don't know
... usually when a workshop starts, you have a pretty good idea about what's going to come out, but this time not
... there's lots of people going to be there from a number of communities
... if something else comes up that we should be looking at, please tell us
... there are chairs with nothing to do!
... please tell me if this call isn't useful
... I propose we do these monthly
... I suggest 9am Boston time every month
... would scheduling them every other week, with the expectation that they are frequently cancelled, be more useful than scheduling them every month?