Chatlog 2010-05-06 Data Catalog Vocabulary

From W3C eGovernment Wiki

<sandro> Guest: Niklas (lindstream) Lindström
14:43:50 <RRSAgent> RRSAgent has joined #egov
14:43:50 <RRSAgent> logging to
14:43:55 <sandro> rrsagent, make log public
14:44:03 <sandro> zakim, this will be egov
14:44:03 <Zakim> ok, sandro; I see T&S_EGOV(LD TECH)10:00AM scheduled to start 44 minutes ago
14:44:12 <sandro> chair: cygri
14:44:15 <sandro> scribe: sandro
14:44:19 <sandro> meeting: DCAT
14:44:44 <sandro> Agenda:
14:45:11 <sandro> sandro has changed the topic to: DCAT 6 May,
14:58:00 <Zakim> T&S_EGOV(LD TECH)10:00AM has now started
14:58:09 <Zakim> +Sandro
14:59:43 <Zakim> +[IPcaller]
15:00:00 <edsu> zakim, IPcaller is edsu
15:00:03 <Zakim> +edsu; got it
15:00:15 <lindstream> lindstream has joined #egov
15:00:37 <fadi> fadi has joined #egov
15:01:07 <jonphipps> jonphipps has joined #egov
15:02:57 <Zakim> +jonphipps
15:04:18 <Zakim> +cygri
15:04:22 <Zakim> +??P7
15:05:20 <sandro> zakim, ??P7 is lindstream
15:05:20 <Zakim> +lindstream; got it
15:05:23 <kate_geyer> kate_geyer has joined #egov
15:07:40 <sandro> zakim, who is on the call?
15:07:40 <Zakim> On the phone I see Sandro, edsu, jonphipps, cygri, lindstream
15:08:04 <cygri> zakim, fadi is with me
15:08:04 <Zakim> +fadi; got it
15:09:30 <LuigiMontanez> sorry, we will be joining shortly
15:10:17 <LuigiMontanez_> LuigiMontanez_ has joined #egov
15:11:17 <Zakim> +LuigiMontanez
15:12:26 <cygri> zakim, who is here?
15:12:26 <Zakim> On the phone I see Sandro, edsu, jonphipps, cygri, lindstream, LuigiMontanez
15:12:29 <Zakim> cygri has cygri, fadi
15:12:30 <Zakim> On IRC I see LuigiMontanez_, kate_geyer, jonphipps, fadi, lindstream, RRSAgent, Zakim, cygri, LuigiMontanez, hughb, ww, edsu, sandro, trackbot
15:12:46 <sandro> topic: Admin
15:13:01 <sandro> cygri: Looks like no one new today.     
15:13:27 <sandro> ... fourth meeting.     We need to have scribes.   Sandro's doing it again today, then Fadi will do it; after that we need volunteers.
15:13:40 <sandro> ... please get comfortable with the idea of scribing.
15:14:04 <DavidJames> DavidJames has joined #egov
15:14:28 <sandro> ... Three things today: use cases and requirements;  scheduling some more presentations; namespace for rdf vocab
15:14:53 <sandro> ... (and maybe name -- all lower case, all upper, etc.)
15:15:04 <sandro> topic: Scheduling presentations of relevant related work 
15:15:42 <sandro> cygri: Questions around synchronization and notification-of-updates.    seems to be in scope, some solution is required, so we should hear from people doing work in this area.
15:16:03 <sandro> ... maybe Dataset Dynamics (dady) group
15:16:18 <sandro> ... it'd be good to hear their thoughts.
15:16:52 <sandro> ... Also, we've been talking about using Atom for data catalogs.   Maybe we can map dcat into atom feeds, as surface syntax.
15:17:02 <sandro> ... Would it be for updates?  Or would it be the API?
15:17:30 <sandro> ... If the latter, then we'd need to use some Atom Extensions, like feed paging and feed archives and tombstones.
15:17:45 <sandro> ... It would be good to hear from someone familiar with these concepts
15:18:05 <sandro> ... RDFa 1.1 may also have some features that would make RDFa a more realistic option here.
15:18:08 <sandro> q+
15:18:51 <sandro> ... for the next 1-2 meetings we could have short presentations from folks working on this stuff.   Volunteers or candidates to present?
<sandro> Topic: Scope (rdf vocab?  active feeds?)
15:20:19 <sandro> sandro: I would hate to see dcat say you have to use atom -- these seem like orthogonal technologies.
15:20:50 <sandro> cygri: two things -- the dcat vocabulary (all sorts of ways to use and deploy)
15:21:03 <edsu> q+ to ask sandro about rdf serialization
15:21:38 <sandro> ... but also some need for specific recommendations on deploying dcat in practice; eg for federation of data catalogs.      Can't be done JUST by declaring a vocab.   Need more guidance.
15:22:17 <sandro> ... maybe the first outcome is the vocab, but to have really solved the problem we need more than just the vocab.    we'll need some guidance beyond that.
15:22:36 <sandro> ... I don't know if dady, sparql, atom, etc, has the right solution there.
15:23:05 <sandro> ... So I see that as the next step after teasing out the use cases and nailing down the vocabulary.
15:24:07 <sandro> sandro: I'm worried about getting lost in this space before nailing down the vocabulary, but yes, I see the need for guidance for the community.
15:24:29 <sandro> cygri: an RDF-only solution could be awkward for some of the publishers we care about.
15:24:38 <sandro> ... we have to strike a balance.
15:24:43 <sandro> +1 
15:24:47 <sandro> q/
15:24:50 <sandro> q?
15:24:53 <sandro> q-
15:24:54 <lindstream> q+ distinction between a dataset and a catalogue (and updates from them)
15:25:09 <sandro> queue=edsu, lindstream
15:25:25 <sandro> ack edsu 
15:26:08 <sandro> edsu: Sandro, would you be perfectly happy with the output of this group being an RDF vocabulary?
15:27:17 <sandro> sandro: Yes.     Sync is important, but needs a different community.
15:28:37 <sandro> edsu: I don't think RDF-only works.
15:29:40 <sandro> (i can't scribe what you just said ed)
15:30:15 <sandro> (because i can't figure out what you're talking about.)
15:30:55 <sandro> edsu: i agree that having an RDF vocabulary is important.
15:30:57 <DavidJames> i have a question about the agenda (makes sense after this discussion)
15:32:41 <Zakim> -lindstream
15:32:44 <sandro> sandro; i think basing this on the RDF model is the best way to get interoperability.
15:33:13 <sandro> s/sandro;/sandro:/
15:33:39 <Zakim> +??P7
15:33:54 <DavidJames> I'm not sure if it made it on the agenda for today, but during the last call we had planned to vote to see if our group wanted to support
15:34:01 <sandro> zakim, ??P7 is lindstream 
15:34:01 <Zakim> +lindstream; got it
15:34:04 <sandro> ack lindstream 
15:34:45 <sandro> lindstream: relationship between dcat and datasets in general; how it relates to void and dady
15:35:07 <sandro> cygri: It would be great to have dady folks tell us about it.    
15:35:35 <sandro> ... in terms of void -- that's about describing RDF datasets, while dcat is for describing government data catalogs.
15:36:16 <sandro> ... I'm talking to void folks about how to make this relationship clearer
15:36:17 <sandro> q?
15:37:17 <sandro> lindstream: If you use dcat to describe a dataset you're creating a new (RDF) dataset, which you could describe with VOID.
15:37:35 <sandro> ... so that could be used to discover how a data catalog is updated.
15:38:08 <sandro> cygri: Yeah, important to get dady person on call.
15:38:13 <sandro> ack DavidJames 
15:38:40 <sandro> DavidJames: i was hoping for a vote count on the requirements
15:38:56 <sandro> +1 to talking about requirements (instead of arguing about RDF)  :-)
15:39:54 <sandro> cygri: I thought we had consensus on the deliverables.
15:40:07 <sandro> sandro: It can be nice to have +1 on the record 
15:40:14 <cygri>
15:40:29 <lindstream>  also: I think void *can* describe basically any dataset (since any dataset *could* be expressed in rdf. not necessarily *should*.)
15:41:20 <edsu> DavidJames: thanks for bringing that back up, it does help to vote on this stuff
15:41:31 <lindstream> +q
15:41:33 <sandro> (pause for people to look at )
15:42:07 <cygri> ack lindstream
15:42:20 <edsu> q+ about api
15:42:31 <sandro> lindstream: A Data Catalog API -- would that be *suggested*, one of many possible ones?
15:42:34 <DavidJames> I generally like the statement of deliverables; however the part of "data catalog API" is a little vague
15:43:00 <sandro> cygri: Maybe we want to document multiple ways?   Atom + RDF ?
15:43:11 <sandro> cygri: can we agree on good way of doing it?
15:43:30 <sandro> q?
15:43:43 <sandro> queue=edsu
15:44:30 <DavidJames> I would recommend that we start with the first two points, those are clear to be. And I think they are easy to understand. Those would be good first steps.
15:44:36 <sandro> edsu: The word "API" has bad connotations for me.
15:44:38 <sandro> +1 DavidJames 
15:44:46 <DavidJames> I meant "clear to me" instead of "clear to be". sorry
15:45:14 <lindstream>  suggestion: Data Catalog "REST practise(s)"?
15:45:22 <sandro> edsu: Because we're using the Web, and that's kind of the API
15:45:47 <DavidJames> I agree with edsu in the sense that "API" is a little misleading of a term here
15:45:53 <sandro> edsu: the fourth deliverable, the Resource Guide, those would be good to have, but don't need to have the full weight.
15:46:05 <sandro> edsu: Let's remove DC APIs from that list.
15:46:06 <DavidJames> I agree with removing Data Catalog API from the list
15:46:10 <sandro> +1 remove it
15:46:36 <sandro> edsu: As long as tutorials includes expressing dcat in RDFa, JSON, etc.
15:47:09 <lindstream> we need a "conceptual hub" for the practises. whatever that is...
15:47:16 <sandro> cygri: Value in these options.   If we declare 10 ways of doing dcat, then that could be a problem.
15:47:41 <sandro> cygri: maybe we can reduce this to two options, we'd be doing a favor to users.
15:48:38 <sandro> edsu: For developers, the more choices, the less likely to figure out the best way to do it.
15:49:04 <sandro> edsu: On the other hand, if all choices are RDF, maybe folks will lose interest.
15:50:00 <sandro> sandro: How about we agree UCR, Schema, and RG are three good deliverables to start with?
15:50:25 <sandro> q- edsu 
15:50:47 <sandro> cygri: For API, I was thinking of some particular set of serialization formats and approaches to notifications/feeds.
15:51:16 <DavidJames> cygri: does your bullet point about API refer to this snippet? "The central catalog must somehow be able to discover newly published datasets on an agency's web site, e.g., by crawling or by receiving an automated notification from the agency. There also has to be a way of notifying about changes to the metadata."
15:51:16 <lindstream> q+ for the orthogonal pieces, see e.g.
15:51:20 <sandro> cygri: A well-defined and well-documented way to figure out what's in the catalog and how to track its updates.
15:51:36 <sandro> s/cygri:/cygri,/
15:52:02 <sandro> edsu: Isn't GET the API you need?
15:52:31 <sandro> DavidJames: I like the distributed processing sections -- the central catalog must be able to discover newly published datasets and learn about changes to the data.
15:52:50 <sandro> cygri: The "notification" bit requires more than just GET.
15:53:17 <sandro> cygri: which datasets changed today
15:53:26 <sandro> cygri: Incremental updates.
15:53:35 <DavidJames> right, i think this is important
15:53:37 <sandro> cygri: feed, ping, something.
15:53:40 <edsu> ok, i'm with you there
15:53:51 <sandro> cygri: That goes beyond just an RDF vocab.
15:53:52 <sandro> q?
15:53:56 <sandro> ack DavidJames 
15:54:08 <sandro> DavidJames: i do like that we talk about that problem and try to reach consensus on it.
15:54:22 <sandro> ... I'm okay with nailing down the first points, then returning to this.
15:54:24 <jonphipps> wrt publish/subscribe and notification/distribution methods there are quite a few APIs already out there
15:54:28 <sandro> +1 nail down the first two first.
15:54:31 <cygri> q?
15:55:02 <sandro> lindstream: I agree, nail first two down first.   It's a very general topic for anyone publishing data on the web.
15:55:14 <edsu> q? suggest we should update bullet point 3 to talk about updates/synchronization instead of API
15:55:27 <sandro> lindstream: The question we should come back to:  should dcat define this practice?   or list some practices, maybe recommend them?
15:56:14 <sandro> cygri: In a perfect world, I imagine we'd figure out (ah ha!) there is an easy mechanism that uses established technologies we can just employ to meet our requirements.   if you use this, you'll be fine.
15:56:30 <sandro> cygri: It may be we need to invent/define these bits, though.
15:56:46 <sandro> ... I'd prefer not to define new stuff beyond the vocab.
15:56:52 <sandro> q?
15:56:55 <sandro> ack lindstream 
15:56:55 <Zakim> lindstream, you wanted to discuss the orthogonal pieces, see e.g.
15:57:17 <sandro> q+ to say I'd object to spec'ing new protocols or formats
15:57:34 <sandro> cygri: here's one way of doing it....
15:58:33 <sandro> lindstream: From my perspective, about timelines, we used that for legal info system.   Dead simple.  Atom, ... tombstones, most entries contain RDF.
15:58:36 <sandro> q-
15:58:47 <sandro> cygri: Sounds great.
15:59:14 <sandro> cygri: do you know how this compares to what's going on in dady?
16:00:07 <lindstream> +1 extend
16:00:09 <sandro> +1
16:00:09 <edsu> +1
16:00:18 <DavidJames> +1
16:00:20 <cygri> +1
16:00:41 <sandro> q+
16:01:11 <sandro> q+ to propose we let feed issues wait until after we publish a vocab spec
16:02:01 <sandro> lindstream: we could ask them; I haven't compiled the options, just used Atom.
16:02:38 <sandro> cygri: maybe wait until we have our list of requirements.
16:03:35 <DavidJames> I agree with sandro
16:04:03 <lindstream> the vocab is the "what", the rest is the "how"
16:04:33 <DavidJames> To be more specific, I think we should get the vocabulary out first. We can always come back to changesets / updates / syndication
16:04:50 <kate_geyer> kate_geyer has left #egov
16:04:59 <sandro> edsu: People need these updates
16:05:25 <sandro> sandro: It can be a requirement; let's just do the vocab first.
16:06:18 <sandro> cygri: we can do the vocab independently, but we'll need to work with others on the sync protocol.
16:06:33 <sandro> +1 (to put it mildly)
16:06:59 <sandro> cygri: So maybe it makes sense to focus our work on the vocab, while having a conversation with other groups on syndications/notification/update.
16:07:48 <sandro> sandro: I think it would be a real mistake to have to solve the syndication problem before publishing the vocab.
16:07:54 <DavidJames> I think syndication / changesets are "In scope" for later. (as a matter of phasing as someone said)
16:07:56 <sandro> edsu: then why do we even need this call?
16:08:04 <lindstream> the "orthogonality" leakage is dcat:distribution ...
16:08:30 <sandro> q+ weekly call
16:08:36 <sandro> q?
16:08:51 <sandro> zakim?
16:09:36 <DavidJames> I think the vocabulary can be developed over the next several weeks. (i agree with sandro)
16:09:52 <DavidJames> a lot will happen offline, in email, etc.
16:10:28 <DavidJames> but i think the call will also be useful
16:11:00 <sandro> DavidJames: Let's have a vote/statement.
16:13:10 <sandro> PROPOSED: We focus our work for now on the vocabulary, at least until we have it published, at the same time we should work on use cases and requirements in parallel.   Issues around feeds/notification are postponed a bit.    Deliverables for now are UCR and Schema.   When they're published, we'll revisit list of deliverables.
16:13:38 <sandro> +1
16:13:42 <cygri> +1
16:13:43 <DavidJames> +1
16:13:43 <fadi> +1
16:13:44 <lindstream> +1
16:13:44 <LuigiMontanez> +1
16:13:49 <jonphipps> +1
16:13:56 <edsu> abstain # yeah
16:14:13 <sandro> RESOLVED: We focus our work for now on the vocabulary, at least until we have it published, at the same time we should work on use cases and requirements in parallel.   Issues around feeds/notification are postponed a bit.    Deliverables for now are UCR and Schema.   When they're published, we'll revisit list of deliverables.
16:14:32 <sandro> lindstream: I do, absolutely, want to come back to these other two.
16:14:36 <DavidJames> i agree that we want to come back to the other deliverables (as do others)
16:14:37 <sandro> cygri: as do I.
16:15:19 <sandro> cygri: Everyone, please look again at UCR document, and comment on mailing list.
16:15:46 <DavidJames> thanks everybody
16:15:51 <LuigiMontanez> thanks cygri, sandro for transcribing
16:15:52 <sandro> ADJOURN
16:17:32 <DavidJames> DavidJames has joined #egov
16:17:40 <LuigiMontanez> LuigiMontanez has joined #egov
16:17:42 <lindstream> thanks all.
16:18:15 <lindstream> lindstream has left #egov
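
Editorial sketch (not part of the log): the notification discussion above (cygri at 15:52:50 to 15:53:37 on "which datasets changed today", lindstream at 15:58:33 on Atom with tombstones) can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not anything the group agreed on: the dataset identifiers under example.org, the function names build_feed and changes_since, and the choice to key everything on timestamps are all hypothetical. The tombstone element follows the Atom deleted-entry extension (RFC 6721); required feed-level metadata (feed id, title, updated, author) is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM = "http://www.w3.org/2005/Atom"
TOMB = "http://purl.org/atompub/tombstones/1.0"  # Atom tombstones (RFC 6721)


def parse_ts(text):
    """Parse an RFC 3339 timestamp; 'Z' is rewritten for older Pythons."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def build_feed(updated, deleted):
    """Build a bare feed element (hypothetical helper).

    updated: list of (dataset_id, title, timestamp)
    deleted: list of (dataset_id, timestamp)
    """
    feed = ET.Element(f"{{{ATOM}}}feed")
    for ds_id, title, when in updated:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = ds_id
        ET.SubElement(entry, f"{{{ATOM}}}title").text = title
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = when
    for ds_id, when in deleted:
        # A tombstone marks a dataset that was withdrawn from the catalog.
        ET.SubElement(feed, f"{{{TOMB}}}deleted-entry", {"ref": ds_id, "when": when})
    return feed


def changes_since(feed, since):
    """Return (changed_ids, removed_ids) stamped at or after `since`."""
    changed = [
        entry.findtext(f"{{{ATOM}}}id")
        for entry in feed.findall(f"{{{ATOM}}}entry")
        if parse_ts(entry.findtext(f"{{{ATOM}}}updated")) >= since
    ]
    removed = [
        tomb.get("ref")
        for tomb in feed.findall(f"{{{TOMB}}}deleted-entry")
        if parse_ts(tomb.get("when")) >= since
    ]
    return changed, removed


# Made-up catalog state, for illustration only.
feed = build_feed(
    updated=[
        ("http://example.org/dataset/budget", "Budget", "2010-05-06T09:00:00Z"),
        ("http://example.org/dataset/census", "Census", "2010-05-01T12:00:00Z"),
    ],
    deleted=[("http://example.org/dataset/old-roads", "2010-05-06T10:30:00Z")],
)
# Round-trip through XML text, as a harvester reading the feed would.
feed = ET.fromstring(ET.tostring(feed))
today = datetime(2010, 5, 6, tzinfo=timezone.utc)
changed, removed = changes_since(feed, today)
print(changed, removed)
```

A central catalog polling such a feed would still only use GET, which is edsu's point at 15:52:02; what the feed adds is a place to record updates and deletions so that a harvester does not have to re-crawl every dataset description.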