Chatlog 2010-05-06 Data Catalog Vocabulary
From W3C eGovernment Wiki
<sandro> Guest: Niklas (lindstream) Lindström
14:43:50 <RRSAgent> RRSAgent has joined #egov
14:43:50 <RRSAgent> logging to http://www.w3.org/2010/05/06-egov-irc
14:43:55 <sandro> rrsagent, make log public
14:44:03 <sandro> zakim, this will be egov
14:44:03 <Zakim> ok, sandro; I see T&S_EGOV(LD TECH)10:00AM scheduled to start 44 minutes ago
14:44:12 <sandro> chair: cygri
14:44:15 <sandro> scribe: sandro
14:44:19 <sandro> meeting: DCAT
14:44:44 <sandro> Agenda: http://www.w3.org/egov/wiki/Data_Catalog_Vocabulary/2010-05-06
14:45:11 <sandro> sandro has changed the topic to: DCAT 6 May, http://www.w3.org/egov/wiki/Data_Catalog_Vocabulary/2010-05-06
14:58:00 <Zakim> T&S_EGOV(LD TECH)10:00AM has now started
14:58:09 <Zakim> +Sandro
14:59:43 <Zakim> +[IPcaller]
15:00:00 <edsu> zakim, IPcaller is edsu
15:00:03 <Zakim> +edsu; got it
15:00:15 <lindstream> lindstream has joined #egov
15:00:37 <fadi> fadi has joined #egov
15:01:07 <jonphipps> jonphipps has joined #egov
15:02:57 <Zakim> +jonphipps
15:04:18 <Zakim> +cygri
15:04:22 <Zakim> +??P7
15:05:20 <sandro> zakim, ??P7 is lindstream
15:05:20 <Zakim> +lindstream; got it
15:05:23 <kate_geyer> kate_geyer has joined #egov
15:07:40 <sandro> zakim, who is on the call?
15:07:40 <Zakim> On the phone I see Sandro, edsu, jonphipps, cygri, lindstream
15:08:04 <cygri> zakim, fadi is with me
15:08:04 <Zakim> +fadi; got it
15:09:30 <LuigiMontanez> sorry, we will be joining shortly
15:10:17 <LuigiMontanez_> LuigiMontanez_ has joined #egov
15:11:17 <Zakim> +LuigiMontanez
15:12:26 <cygri> zakim, who is here?
15:12:26 <Zakim> On the phone I see Sandro, edsu, jonphipps, cygri, lindstream, LuigiMontanez
15:12:29 <Zakim> cygri has cygri, fadi
15:12:30 <Zakim> On IRC I see LuigiMontanez_, kate_geyer, jonphipps, fadi, lindstream, RRSAgent, Zakim, cygri, LuigiMontanez, hughb, ww, edsu, sandro, trackbot
15:12:46 <sandro> topic: Admin
15:13:01 <sandro> cygri: Looks like no one new today.
15:13:27 <sandro> ... fourth meeting. We need to be having scribes. Sandro's doing it again today, then Fadi will do it, then please we need volunteers.
15:13:40 <sandro> ... please get comfortable with the idea of scribing.
15:14:04 <DavidJames> DavidJames has joined #egov
15:14:28 <sandro> ... Three things today: use cases and requirements; scheduling some more presentations; namespace for rdf vocab
15:14:53 <sandro> ... (and maybe name -- all lower case, all upper, etc.)
15:15:04 <sandro> topic: Scheduling presentations of relevant related work
15:15:42 <sandro> cygri: Questions around synchronization and notification-of-updates. seems to be in scope, some solution is required, so we should hear from people doing work in this area.
15:16:03 <sandro> ... maybe Dataset Dynamics (dady) group
15:16:18 <sandro> ... it'd be good to hear their thoughts.
15:16:52 <sandro> ... Also, we've been talking about using Atom for data catalogs. Maybe we can map dcat into atom feeds, as surface syntax.
15:17:02 <sandro> ... Would it be for updates? Or would it be the API?
15:17:30 <sandro> ... If the latter, then we'd need to use some Atom Extensions, like feed paging and feed archives and tombstones.
15:17:45 <sandro> ... It would be good to hear from someone familiar with these concepts
15:18:05 <sandro> ... RDFa 1.1 may also have some features that would make RDFa a more realistic option here.
15:18:08 <sandro> q+
15:18:51 <sandro> ... for the next 1-2 meetings we could have short presentations from folks working on this stuff. Volunteers or candidates to present?
<sandro> Topic: Scope (rdf vocab? active feeds?)
15:20:19 <sandro> sandro: I would hate to see dcat say you have to use atom -- these seem like orthogonal technologies.
15:20:50 <sandro> cygri: two things -- the dcat vocabulary (all sorts of ways to use and deploy)
15:21:03 <edsu> q+ to ask sandro about rdf serialization
15:21:38 <sandro> ... but also some need for specific recommendations on deploying dcat in practice; eg for federation of data catalogs. Can't be done JUST by declaring a vocab. Need more guidance.
15:22:17 <sandro> ... maybe the first outcome is the vocab, but to have really solved the problem we need more than just the vocab. we'll need some guidance beyond that.
15:22:36 <sandro> ... I don't know if dady, sparql, atom, etc, has the right solution there.
15:23:05 <sandro> ... So I see that as the next step after teasing out the use cases and nailing down the vocabulary.
15:24:07 <sandro> sandro: I'm worried about getting lost in this space before nailing down the vocabulary, but yes, I see the need for guidance for the community.
15:24:29 <sandro> cygri: an RDF-only solution could be awkward for some of the publishers we care about.
15:24:38 <sandro> ... we have to strike a balance.
15:24:43 <sandro> +1
15:24:47 <sandro> q/
15:24:50 <sandro> q?
15:24:53 <sandro> q-
15:24:54 <lindstream> q+ distinction between a dataset and a catalogue (and updates from them)
15:25:09 <sandro> queue=edsu, lindstream
15:25:25 <sandro> ack edsu
15:26:08 <sandro> edsu: Sandro, would you be perfectly happy with the output of this group being an RDF vocabulary?
15:27:17 <sandro> sandro: Yes. Sync is important, but needs a different community.
15:28:37 <sandro> edsu: I don't think RDF-only works.
15:29:40 <sandro> (i can't scribe what you just said ed)
15:30:15 <sandro> (because i can't figure out what you're talking about.)
15:30:55 <sandro> edsu; i agree that having an RDF vocabulary is important.
15:30:57 <DavidJames> i have a question about the agenda (makes sense after this discussion)
15:32:41 <Zakim> -lindstream
15:32:44 <sandro> sandro; i think basing this on the RDF model is the best way to get interoperability.
15:33:13 <sandro> s/sandro;/sandro:/
15:33:39 <Zakim> +??P7
15:33:54 <DavidJames> I'm not sure if it made it on the agenda for today, but during the last call we had planned to vote to see if our group wanted to support http://lists.w3.org/Archives/Public/public-egov-ig/2010Apr/0083.html
15:34:01 <sandro> zakim, ??P7 is lindstream
15:34:01 <Zakim> +lindstream; got it
15:34:04 <sandro> ack lindstream
15:34:45 <sandro> lindstream: relationship between dcat and datasets in general; how it relates to void and dady
15:35:07 <sandro> cygri: It would be great to have dady folks tell us about it.
15:35:35 <sandro> ... in terms of void -- that's about describing RDF datasets, while dcat is for describing government data catalogs.
15:36:16 <sandro> ... I'm talking to void folks about how to make this relationship clearer
15:36:17 <sandro> q?
15:37:17 <sandro> lindstream: If you use dcat to describe a dataset you're creating a new (RDF) dataset, which you could describe with VOID.
15:37:35 <sandro> ... so that could be used to discover how a data catalog is updated.
15:38:08 <sandro> cygri: Yeah, important to get dady person on call.
15:38:13 <sandro> ack DavidJames
15:38:40 <sandro> DavidJames: i was hoping for a vote count on the requirements
15:38:56 <sandro> +1 to talking about requirements (instead of arguing about RDF) :-)
15:39:54 <sandro> cygri: I thought we had consensus on the deliverables.
15:40:07 <sandro> sandro: It can be nice to have +1 on the record
15:40:14 <cygri> http://www.w3.org/egov/wiki/Data_Catalog_Vocabulary#Deliverables
15:40:29 <lindstream> also: I think void *can* describe basically any dataset (since any dataset *could* be expressed in rdf. not necessarily *should*.)
15:41:20 <edsu> DavidJames: thanks for bringing that back up, it does help to vote on this stuff
15:41:31 <lindstream> +q
15:41:33 <sandro> (pause for people to look at http://www.w3.org/egov/wiki/Data_Catalog_Vocabulary#Deliverables )
15:42:07 <cygri> ack lindstream
15:42:20 <edsu> q+ about api
15:42:31 <sandro> lindstream: A Data Catalog API -- would that be *suggested*, one of many possible ones?
15:42:34 <DavidJames> I generally like the statement of deliverables; however the part of "data catalog API" is a little vague
15:43:00 <sandro> cygri: Maybe we want to document multiple ways? Atom + RDF ?
15:43:11 <sandro> cygri: can we agree on good way of doing it?
15:43:30 <sandro> q?
15:43:43 <sandro> queue=edsu
15:44:30 <DavidJames> I would recommend that we start with the first two points, those are clear to be. And I think they are easy to understand. Those would be good first steps.
15:44:36 <sandro> edsu: The word "API" has bad connotations for me.
15:44:38 <sandro> +1 DavidJames
15:44:46 <DavidJames> I meant "clear to me" instead of "clear to be". sorry
15:45:14 <lindstream> suggestion: Data Catalog "REST practise(s)"?
15:45:22 <sandro> edsu: Because we're using the Web, and that's kind of the API
15:45:47 <DavidJames> I agree with edsu in the sense that "API" is a little misleading of a term here
15:45:53 <sandro> edsu: the fourth deliverable, the Resource Guide, those would be good to have, but don't need to have the full wait.
15:46:05 <sandro> edsu: Let's remove DC APIs from that list.
15:46:06 <DavidJames> I agree with removing Data Catalog API from the list
15:46:10 <sandro> +1 remove it
15:46:36 <sandro> edsu: As long as tutorials includes expressing dcat in RDFa, JSON, etc.
15:47:09 <lindstream> we need a "conceptual hub" for the practises. whatever that is...
15:47:16 <sandro> cygri: Value in these options. If we declare 10 ways of doing dcat, then that could be a problem.
15:47:41 <sandro> cygri: maybe we can reduce this to two options, we'd be doing a favor to users.
15:48:38 <sandro> edsu: For developers, the more choices, the less likely to figure out the best way to do it.
15:49:04 <sandro> edsu: On the other hand, if all choices are RDF, maybe folks will lose interest.
15:50:00 <sandro> sandro: How about we agree UCR, Schema, and RG are three good deliverables to start with?
15:50:25 <sandro> q- edsu
15:50:47 <sandro> cygri: For API, I was thinking of some particular set of serialization formats and approaches to notifications/feeds.
15:51:16 <DavidJames> cygri: does your bullet point about API refer to this snippet? "The central catalog must somehow be able to discover newly published datasets on an agency's web site, e.g., by crawling or by receiving an automated notification from the agency. There also has to be a way of notifying about changes to the metadata."
15:51:16 <lindstream> q+ for the orthogonal pieces, see e.g. http://code.google.com/p/court/wiki/Timelines
15:51:20 <sandro> cygri: A well-defined and well-documented way to figure out what's in the catalog and how to track its updates.
15:51:36 <sandro> s/cygri:/cygri,/
15:52:02 <sandro> edsu: Isn't GET the API you need?
15:52:31 <sandro> DavidJames: I like the distributed processing sections -- the central catalog must be able to discover newly published datasets and learn about changes to the data.
15:52:50 <sandro> cygri: The "notification" bit requires more than just GET.
15:53:17 <sandro> cygri: which datasets changed today
15:53:26 <sandro> cygri: Incremental updates.
15:53:35 <DavidJames> right, i think this is important
15:53:37 <sandro> cygri: feed, ping, something.
15:53:40 <edsu> ok, i'm with you there
15:53:51 <sandro> cygri: That goes beyond just an RDF vocab.
15:53:52 <sandro> q?
15:53:56 <sandro> ack DavidJames
15:54:08 <sandro> DavidJames: i do like that we talk about that problem and try to reach consensus on it.
15:54:22 <sandro> ... I'm okay with nailing down the first points, then returning to this.
15:54:24 <jonphipps> wrt publish/subscribe and notification/distribution methods there are quite a few APIs already out there
15:54:28 <sandro> +1 nail down the first two first.
15:54:31 <cygri> q?
15:55:02 <sandro> lindstream: I agree, nail first two down first. It's a very general topic for anyone publishing data on the web.
15:55:14 <edsu> q? suggest we should update bullet point 3 to talk about updates/synchronization instead of API
15:55:27 <sandro> lindstream: The question we should come back to: should dcat define this practice? or list some practices, maybe recommend them?
15:56:14 <sandro> cygri: In a perfect world, I imagine we'd figure out (ah ha!) there is an easy mechanism that uses established technologies we can just employ to meet our requirements. if you use this, you'll be fine.
15:56:30 <sandro> cygri: It may be we need to invent/define these bits, though.
15:56:46 <sandro> ... I'd prefer not to define new stuff beyond the vocab.
15:56:52 <sandro> q?
15:56:55 <sandro> ack lindstream
15:56:55 <Zakim> lindstream, you wanted to discuss the orthogonal pieces, see e.g. http://code.google.com/p/court/wiki/Timelines
15:57:17 <sandro> q+ to say I'd object to spec'ing new protocols or formats
15:57:34 <sandro> cygri: here's one way of doing it....
15:58:33 <sandro> lindstream: From my perspective, about timelines, we used that for legal info system. Dead simple. Atom, ... tombstones, most entries contain RDF.
15:58:36 <sandro> q-
15:58:47 <sandro> cygri: Sounds great.
15:59:14 <sandro> cygri: do you know how this compares to what's going on in dady?
16:00:07 <lindstream> +1 extend
16:00:09 <sandro> +1
16:00:09 <edsu> +1
16:00:18 <DavidJames> +1
16:00:20 <cygri> +1
16:00:41 <sandro> q+
16:01:11 <sandro> q+ to propose we let feed issues wait until after we publish a vocab spec
16:02:01 <sandro> lindstream: we could ask them; I haven't compiled the options, just used Atom.
16:02:38 <sandro> cygri: maybe wait until we have our list of requirements.
16:03:35 <DavidJames> I agree with sandro
16:04:03 <lindstream> the vocab is the "what", the rest is the "how"
16:04:33 <DavidJames> To be more specific, I think we should get the vocabulary out first. We can always come back to changesets / updates / syndication
16:04:50 <kate_geyer> kate_geyer has left #egov
16:04:59 <sandro> edsu: People need these updates
16:05:25 <sandro> sandro: It can be a requirement; let's just do the vocab first.
16:06:18 <sandro> cygri: we can do the vocab independently, but we'll need to work with others on the sync protocol.
16:06:33 <sandro> +1 (to put it mildly)
16:06:59 <sandro> cygri: So maybe it makes sense to focus our work on the vocab, while having a conversation with other groups on syndications/notification/update.
16:07:48 <sandro> sandro: I think it would be a real mistake to have to solve the syndication problem before publishing the vocab.
16:07:54 <DavidJames> I think syndication / changesets are "In scope" for later. (as a matter of phasing as someone said)
16:07:56 <sandro> edsu: then why do we even need this call?
16:08:04 <lindstream> the "orthogonality" leakage is dcat:distribution ...
16:08:30 <sandro> q+ weekly call
16:08:36 <sandro> q?
16:08:51 <sandro> zakim?
16:09:36 <DavidJames> I think the vocabulary can be developed over the next several weeks. (i agree with sandro)
16:09:52 <DavidJames> a lot will happen offline, in email, etc.
16:10:28 <DavidJames> but i think the call will also be useful
16:11:00 <sandro> DavidJames: Let's have a vote/statement.
16:13:10 <sandro> PROPOSED: We focus our work for now on the vocabulary, at least until we have it published, at the same time we should work on use cases and requirements in parallel. Issues around feeds/notification are postponed a bit. Deliverables for now are UCR and Schema. When they're published, we'll revisit list of deliverables.
16:13:38 <sandro> +1
16:13:42 <cygri> +1
16:13:43 <DavidJames> +1
16:13:43 <fadi> +1
16:13:44 <lindstream> +1
16:13:44 <LuigiMontanez> +1
16:13:49 <jonphipps> +1
16:13:56 <edsu> abstain # yeah
16:14:13 <sandro> RESOLVED: We focus our work for now on the vocabulary, at least until we have it published, at the same time we should work on use cases and requirements in parallel. Issues around feeds/notification are postponed a bit. Deliverables for now are UCR and Schema. When they're published, we'll revisit list of deliverables.
16:14:32 <sandro> lindstream: I do, absolutely, want to come back to these other two.
16:14:36 <DavidJames> i agree that we want to come back to the other deliverables (as do others)
16:14:37 <sandro> cygri: as do I.
16:15:19 <sandro> cygri: Everyone, please look again at UCR document, and comment on mailing list.
16:15:46 <DavidJames> thanks everybody
16:15:51 <LuigiMontanez> thanks cygri, sandro for transcribing
16:15:52 <sandro> ADJOURN
16:17:32 <DavidJames> DavidJames has joined #egov
16:17:40 <LuigiMontanez> LuigiMontanez has joined #egov
16:17:42 <lindstream> thanks all.
16:18:15 <lindstream> lindstream has left #egov