09:21:28 RRSAgent has joined #dwbpbestpractices
09:21:28 logging to http://www.w3.org/2014/04/01-dwbpbestpractices-irc
09:21:29 Scribe: Caroline
09:22:06 Vagner_Br has joined #dwbpbestpractices
09:22:26 BernadetteLoscio_: do we go through the list or do we choose some?
09:23:22 Ig_Bittencourt: let's start with the subjects related only to BP. If we have time to discuss the others that are also related to Q&G, we will do it later
09:23:57 BernadetteLoscio_: We can skip metadata since we discussed it yesterday
09:24:02 I agree
09:24:14 laufer_: we can check what we have written about it
09:24:23 long discussion about it yesterday
09:24:40 BernadetteLoscio_: check https://docs.google.com/spreadsheet/ccc?key=0AhTZf3B9yQ3odGVvU3pBazFsY3pyUVppNDFSZGtyQkE&usp=sharing#gid=5
09:25:38 which tab are we looking at?
09:25:49 the one above
09:25:55 sorry
09:26:03 this one https://docs.google.com/spreadsheet/ccc?key=0AhTZf3B9yQ3odGVvU3pBazFsY3pyUVppNDFSZGtyQkE&usp=sharing#gid=6
09:26:20 laufer_: how can we define real time?
09:26:47 ... if we have an update time, do we have the old data archived?
09:27:01 to me a challenge seems to be that data is often about phenomena in reality which change
09:27:13 so data may be added or changed to reflect that
09:27:16 q+
09:27:24 ack BernadetteLoscio_
09:27:33 I can't hear you guys on the hangout
09:27:34 q+
09:27:34 BernadetteLoscio_: real time
09:27:55 ... if we have a dataset we can have it in a catalogue, it might be in an API
09:28:06 laufer_: we can have this in datasets that are not in real time
09:28:16 q+ to say bp should be just provide data in a timely manner and then elaborate defs or whatever on that basis
09:28:18 ... we have a fixed time to update the data
09:28:55 now we can hear!!!
09:28:58 q+ to say that metadata should express frequency of updates, timestamp of last update
09:29:06 Zakim, who`s in the queue
09:29:06 I don't understand 'who`s in the queue', Caroline
09:29:13 BrianMatthews has joined #dwbpbestpractices
09:29:17 q?
09:29:29 +1 to CarlosIglesias' view.
09:29:36 According to Wikipedia... Real-time data: Real-time data denotes information that is delivered immediately after collection. There is no delay in the timeliness of the information provided. Real-time data is often used for navigation or tracking.
09:29:37 BernadetteLoscio_: real time is to update the data
09:29:49 markharrison_: very real-time data can be observation data
09:29:50 gatemezi has joined #dwbpbestpractices
09:30:21 There are two issues: the date/time when the observation was made
09:30:29 The time it takes for data to reach the intended audience
09:31:02 If the time it takes for data to reach the intended audience (from observation) is known, then the date/time when the observation was made can be derived
09:31:26 laufer_: from the position of the consumer: I need data. Some data with one week between updates is okay
09:31:34 To = the date/time when the observation was made
09:31:36 ... they say they will update weekly
09:31:44 q?
09:31:46 why is this a best practice?
09:31:48 Td = the time it takes for data to reach the intended audience
09:32:01 The problem is that Td is usually unknown
09:32:08 ... if the data goes 3 weeks without an update, it could be a problem
09:32:21 It all depends on the type of data...
09:32:23 So, the best practice is how to deal with the fact that we may need to know To
09:32:27 ... if it is only one dataset, we must guarantee this dataset is updated
09:32:35 yes, agree gatemezi
09:32:37 BernadetteLoscio_: this is one requirement
09:33:01 q+
09:33:05 q+ to say that in addition to specifying expected update frequency for data, there is an (SLA) expectation to honour that update frequency
09:33:15 q+
09:33:18 ... the question is: one of the use cases is on real time.
09:33:19 Data may have to be indexed in time
09:33:33 q?
09:33:37 ... this data can be available
09:33:46 ack CarlosIglesias
09:33:46 CarlosIglesias, you wanted to say bp should be just provide data in a timely manner and then elaborate defs or whatever on that basis
09:33:47 ack me
09:33:56 someone is making noise
09:34:02 ack laufer
09:34:05 In the weather domain, you might need more frequent updates (10 minutes?), while geodata for districts can be updated each year
09:34:09 CarlosIglesias: we should try to define what best practices are
09:34:19 zakim, who ismaking noise?
09:34:19 I don't understand your question, Vagner_Br.
09:34:22 ... defining what is real time
09:34:40 Zakim can't track the sound of the hangout
09:34:44 http://sunlightfoundation.com/policy/documents/ten-open-data-principles/
09:35:06 ... if you have real-time data you must update it
09:35:18 ... some data have value only in real time
09:35:51 q+
09:35:52 ... we could have at least the titles of the best practices
09:36:22 ack markharrison_
09:36:22 markharrison_, you wanted to say that metadata should express frequency of updates, timestamp of last update and to say that in addition to specifying expected update frequency for
09:36:25 ... data, there is an (SLA) expectation to honour that update frequency
09:36:35 markharrison_: the metadata should express the frequency
09:36:44 q-
09:36:52 ... the expectation to provide the data with some frequency
09:37:01 markharrison +1
09:37:11 yes!
09:37:30 +1 markharrison_
09:37:34 Data is a requirement: data may have to be indexed in time, in order to cope with the fact that we do not know Td
09:37:36 q?
09:37:57 ack Vagner_Br
09:38:06 Vagner_Br: I want to support CarlosIglesias
09:38:31 ... and to add that in terms of requirements the point is that data and metadata should be available in a timely manner
09:38:45 ... we should change the title
09:39:25 BrianMatthews: the data should be available at a determined frequency
09:39:26 I meant "that is a requirement"
09:39:28 Is it possible to have metadata without the data it is about?
09:39:45 laufer_: so the publisher has the obligation to do it on time
09:39:50 ack Ig_Bittencourt
09:40:11 Useful to declare update frequency in metadata to avoid the need to poll more frequently than the update frequency
09:40:15 Ig_Bittencourt: if you have data from the stock market you must update it every 5 minutes
09:40:50 q+
09:41:09 ... in this case, the best practices should establish that the publisher should release the data with a certain frequency
09:41:22 but when you update you may: (i) add new time-indexed entries or (ii) change the content of data [no time-indexing]
09:41:32 these are two different approaches
09:41:52 CarlosIglesias: we can also write general best practices regarding this issue
09:41:54 q+ to ask about support for standing query (publish/subscribe) capabilities for streaming data feeds?
09:42:08 q?
09:42:50 ack gatemezi
09:42:52 gatemezi: Is it possible to have metadata without the data it is about?
09:42:57 ack JoaoPauloAlmeida
09:43:06 JoaoPauloAlmeida: but when you update you may: (i) add new time-indexed entries or (ii) change the content of data [no time-indexing]
09:43:16 ... these are two different approaches
09:43:24 laufer_: I think this is an issue of archiving
09:43:34 ... you must maintain the old data
09:44:13 q+ to respond to laufer - it depends whether the dataset is journalled or not - the metadata should declare whether the data is journalled (and time-stamped)
09:44:25 q?
09:44:35 ack BrianMatthews
09:44:47 BrianMatthews: I don't think we need to worry about the tech being used
09:44:54 ... we should stick to the method
09:45:08 ... what policy and frequency the publisher adopts
09:45:14 +1 to BrianMatthews point
09:45:18 ... not worry about tech
09:45:21 q?
09:45:27 ack markharrison_
09:45:27 markharrison_, you wanted to ask about support for standing query (publish/subscribe) capabilities for streaming data feeds? and to respond to laufer - it depends whether the
09:45:31 ... dataset is journalled or not - the metadata should declare whether the data is journalled (and time-stamped)
09:45:41 markharrison_: we never delete anything regarding data
09:45:54 laufer_: we now have to make a sentence summarizing all this
09:46:13 markharrison_, if the data is not time-indexed then we have to delete!
09:46:16 q?
09:46:27 so, there should be best practices for time-indexing, this is my point
09:46:48 CarlosIglesias: we could make an action for people to detail it
09:47:42 +1 João Paulo
09:48:06 ok, nice
09:48:40 PROPOSAL: Metadata should declare 1) expected/scheduled frequency of update, 2) if the dataset is journalled (i.e. no deletions, only append), 3) if the dataset is timestamped (can request data for a specific time interval), 4) actual timestamp of last update
09:49:22 +1
09:50:06 +1
09:50:28 Just to understand the 2) point... you mean adding in a different URI?
09:50:39 This is a good proposal, I think we should just also note that there should be guidelines/best practices for the specification of time
09:51:00 ack markharrison_
09:51:01 ... when you "append"...
09:51:17 if you point the laptop to the person with the floor, it will help us a lot (sorry to ask you guys that)
09:51:24 markharrison_: it doesn't change
09:51:37 thanks
09:51:54 No, just to understand, laufer_
09:51:58 gatemezi: URI / access method for the dataset should not change, in my opinion
09:52:14 laufer_: we are not saying how the data will be provided to the consumer
09:52:17 because imagine you were already consuming data at time To
09:52:25 ... you can have a URI or an API
09:52:33 ok
09:52:35 ... we don't know how the publisher will define the scheme
09:52:49 +1 then
09:52:52 +1
09:52:54 +1
09:52:58 +1
09:53:02 +1
09:53:07 +1
09:53:08 +1
09:53:09 +1
09:53:16 +1
09:53:19 +1
09:53:54 Why don't we use the wiki for this?
09:54:29 Ok Carol, I understand
09:54:47 https://docs.google.com/spreadsheet/ccc?key=0AhTZf3B9yQ3odGVvU3pBazFsY3pyUVppNDFSZGtyQkE&usp=sharing#gid=6
09:55:03 it is in the Group Challenges
09:55:13 Caroline: Can someone put it on the wiki later?
09:55:20 RESOLVED: Metadata should declare 1) expected/scheduled frequency of update, 2) if the dataset is journalled (i.e. no deletions, only append), 3) if the dataset is timestamped (can request data for a specific time interval), 4) actual timestamp of last update
09:55:24 I can put
09:56:09 ACTION: nathalia will put RESOLVED 1 on the Wiki (RESOLVED: Metadata should declare 1) expected/scheduled frequency of update, 2) if the dataset is journalled (i.e. no deletions, only append), 3) if the dataset is timestamped (can request data for a specific time interval), 4) actual timestamp of last update)
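
A minimal sketch of how RESOLVED 1 could look as dataset metadata in Turtle: dct:accrualPeriodicity and dct:modified are standard Dublin Core terms, but no vocabulary was agreed for points 2) and 3), so the ex: properties below are hypothetical placeholders.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical vocabulary

<http://example.org/dataset/stock-quotes> a dcat:Dataset ;
    # 1) expected/scheduled frequency of update (standard Dublin Core term,
    #    here pointing at the Dublin Core frequency vocabulary)
    dct:accrualPeriodicity <http://purl.org/cld/freq/continuous> ;
    # 2) journalled: append-only, no deletions (hypothetical property)
    ex:journalled true ;
    # 3) timestamped: data can be requested for a specific time interval
    #    (hypothetical property)
    ex:timeIndexed true ;
    # 4) actual timestamp of last update (standard Dublin Core term)
    dct:modified "2014-04-01T09:55:00Z"^^xsd:dateTime .
```

As noted at 09:40:11, a consumer reading such metadata would also know not to poll more often than the declared frequency.
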
09:58:50 Caroline: let's talk about "tools"
09:59:07 I'm not seeing you
09:59:13 Vagner_Br: who could explain better what is the idea about "tools"
09:59:29 TOPIC: Tools
09:59:52 laufer_: we must look at the use cases to understand what tools are
10:00:01 the camera is looking at the roof
10:00:22 much better now
10:00:40 Ig_Bittencourt: Berna, can you explain about the tools?
10:00:55 scribe: Caroline
10:01:12 Bernadette: it is related to skills and expertise
10:01:23 Vagner_Br: when you talk about tools as a challenge
10:01:33 ... how can we generalize that as a challenge?
10:01:44 Bernadette: I saw this in the use case from Recife
10:01:51 q+
10:01:54 laufer_: NYC uses Socrata for example
10:02:00 Vagner_Br: tools about catalogues?
10:02:06 Bernadette: in general
10:02:19 CarlosIglesias: provide a single or centralized access point for the data
10:02:27 ... could be a CKAN catalogue
10:02:37 ... or another kind
10:02:52 ... matching access to data
10:03:30 Bernadette: what could be a best practice for this
10:03:34 q?
10:03:39 ack Ig_Bittencourt
10:03:51 Ig_Bittencourt: we should be agnostic
10:04:12 ... a question on documentation: if we have components with APIs, we should provide documentation
10:04:15 q?
10:04:22 ack laufer_
10:04:37 q+
10:04:42 laufer_: the publisher might have a best practice to publish the data
10:04:58 ... he must choose a tool that can do what the publisher wants
10:05:28 ... the choice of the tool will depend on what the publisher wants
10:05:39 ... if there is a tool that can do what he or she expects
10:05:55 ... people who have Excel need a kind of tool
10:06:03 I am wondering if we should recommend a tool here...
10:06:21 ... if the publisher thinks that tool is good, we can recommend what is the best practice to choose a tool
10:06:23 q+
10:06:31 No gatemezi. According to the charter, we need to be agnostic.
10:06:43 laufer_: I think we have to think about the consumer and the publisher
10:07:05 ... the publisher has to choose a tool that can do what we want
10:07:09 ack Vagner_Br
10:07:24 Vagner_Br: I agree with Ig_Bittencourt and laufer_
10:07:35 agree with Laufer
10:07:38 ... it is hard to find any kind of requirements
10:07:41 q+
10:07:52 so maybe we can skip this and come back later?
10:07:53 ... even if we must be agnostic
10:08:12 ... if we say any kind of tools you use should be well documented
10:08:20 q+ to say the tool is just a means, the bp is provide centralized access to data
10:08:24 ... if you don't want to say that just drop this topic
10:08:44 q?
10:09:00 A tool can also be an implementation of our bp
10:09:12 Caroline: we should try to resolve this
10:09:20 q+ to say that Tools are very useful for helping data publishers to check that translation of data into different formats retain their meaning
10:09:22 ... at least to have a way to go
10:09:25 ack BrianMatthews
10:09:46 BrianMatthews: we can write a recommendation about a method description about a dataset
10:09:59 ... we can say a dataset could publish a data description
10:10:12 I can't hear a thing
10:10:13 laufer_: this is a requirement of the publisher
10:10:26 ... the tool he will choose will have to do what he needs
10:10:33 +1 João Paulo
10:10:43 ... we can make a recommendation about what he should use
10:11:06 ... you should interoperate with the tools in a standard way
10:11:25 q+ to compare this discussion with the a11y use case
10:11:32 ack caroline
10:11:38 Caroline: has already spoken
10:11:59 q?
10:12:09 ack CarlosIglesias
10:12:09 CarlosIglesias, you wanted to say the tool is just a means, the bp is provide centralized access to data and to compare this discussion with the a11y use case
10:12:09 ack me
10:12:49 CarlosIglesias: the best practices do not require any tool
10:13:14 ... sometimes it will be an API, sometimes a data catalogue, or another thing
10:13:16 BrianMatthews, you meant we could make a recommendation about metadata about the tool used to publish the data?
10:13:46 ... the BP could provide different sets of tools
10:14:17 ... we could do something similar to ??
10:14:34 ... on one side you have guidance for the content
10:14:57 ... on the other side, after creating it, we can have data tools guidelines
10:15:44 q?
10:15:49 ack markharrison_
10:15:49 markharrison_, you wanted to say that Tools are very useful for helping data publishers to check that translation of data into different formats retain their meaning
10:16:23 markharrison_: tools are very useful for data publishers to expose data in multiple formats
10:16:32 similar case to WCAG and UAAG use case
10:16:38 ... but they can also be useful to check that the meaning of the data is not lost
10:16:54 Vagner_Br: we are not defining any kind of requirements, only a few recommendations
10:17:06 laufer_: we don't have to say what the tool is
10:17:14 ... but a best practice to use the tools
10:18:03 one is general best practices and the other is about how tools should implement best practices
10:18:05 q?
10:18:29 we can follow here a similar approach
10:18:54 ok, waiting for the proposal..
10:19:01 proposal: bp is to provide a single access point for data
10:19:10 ?
10:19:18 I don't understand the proposal
10:19:23 Me neither
10:20:05 is a general bp for providing an access point (i.e. data catalogue, API, SPARQL endpoint, etc.)
10:20:12 technology agnostic
10:20:25 but "single" is quite strong
10:20:39 centralized?
10:20:46 is that better?
10:21:04 centralized to me is not good, ... the web has a distributed nature
10:21:14 I think centralised is not good
10:21:31 agreed with JoaoPaulo
10:21:33 +1 to JoaoPauloAlmeida
10:21:42 New text is coming out
10:22:37 Proposal: Data might be provided via various access mechanisms including (but not limited to) Data catalogues, APIs, SPARQL endpoints, REST interfaces, dereferenceable URIs - and best practice is that data publishers should make use of available tools to support multiple access mechanisms
10:23:02 ok
10:23:08 much better
10:23:09 now I get it
10:23:38 ack CarlosIglesias
10:23:46 q+
10:24:01 CarlosIglesias: "single" or "centralized" refers to the catalogue
10:24:09 ack
10:24:12 ack brian
10:24:21 q?
10:24:34 BrianMatthews: could we provide a mechanism vocab?
10:24:55 q+
10:25:44 ... regarding the centralized issue
10:26:02 ... if you can find a description of the dataset you can find them in different places
10:26:22 laufer_: the HYDRA specification is a way to specify APIs
10:26:39 http://www.hydra-cg.com/spec/latest/core/
10:26:44 could we extend VoID http://www.w3.org/TR/void/#access
10:26:45 it is a way of describing web APIs
10:26:45 q?
10:27:05 ack JohnGoodwin_
10:27:30 ack me
10:27:30 JohnGoodwin_: maybe we could extend VoID
10:27:40 +q
10:29:02 In DCAT, http://www.w3.org/TR/vocab-dcat/ there are different ways to access a dcat:Distribution
10:29:41 ... like dcat:accessURL, API and so on
10:30:06 q+
10:30:23 q?
10:30:50 Proposal 2: There is value in provision of a small number of well-known data catalogues - and registration of data with such catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) - so that data can be found easily
10:31:12 q?
10:31:28 ack CarlosIglesias
10:31:36 ack Caroline
10:31:51 Caroline: we have another proposal "2"
10:31:53 ack Vagner_Br
10:32:07 Vagner_Br: I don't think it makes sense to talk about centralized data
10:32:33 ... I agree with JoaoPauloAlmeida that centralized is against the spirit of the Web
10:32:34 rrsagent, make logs public
10:32:49 q?
10:33:10 The second proposal does not mention "tools"?
10:33:19 +1 to Vagner_Br and markharrison_'s first proposal
10:33:28 I don't understand this proposal
10:34:14 Proposal 2 is additional - to try to address concerns expressed by CarlosIglesias
10:34:43 ok
10:35:17 +1 for the first proposal
10:35:38 q+
10:35:40 ack to vote
10:35:48 proposal 2 is not related to tools
10:35:48 ack BrianMatthews
10:35:57 the text of the 2nd proposal is obscure
10:36:33 BrianMatthews: proposal 2 is like saying we should not have more than a few catalogues
10:36:43 +1 JoaoPauloAlmeida and Brian
10:36:48 it is not related to the discussed topic
10:37:30 it is not about tools, but about discoverability
10:37:50 CarlosIglesias: we should think more about the federation concept
10:38:06 laufer_: we are talking about tools to provide access
10:38:16 ... how we are organizing this information
10:38:52 Reworded: Proposal 2-a: The registration of data within data-set catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) should be supported so that data can be found easily.
10:38:56 ... maybe we could change "multiple" for "federated"
10:39:30 Proposal 1: Data might be provided via various access mechanisms including (but not limited to) Data catalogues, APIs, SPARQL endpoints, REST interfaces, dereferenceable URIs - and best practice is that data publishers should make use of available tools to support multiple access mechanisms
10:39:31 +1 to BrianMatthews proposal 2a
10:39:33 I think the text is better now
10:39:48 +1
10:39:51 +1
10:39:51 +1
10:39:54 +1
10:39:55 +1 to Proposal 1
10:39:57 +1
10:40:01 +1
10:40:03 +1 to proposal 1
10:40:04 +1 to Proposal 1
10:40:07 proposal: to further discuss the federation concept in relation with previous proposal
10:40:11 +1
10:40:15 RESOLVED: Data might be provided via various access mechanisms including (but not limited to) Data catalogues, APIs, SPARQL endpoints, REST interfaces, dereferenceable URIs - and best practice is that data publishers should make use of available tools to support multiple access mechanisms
10:40:25 +1 to carlos
10:40:36 +1
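
A hedged sketch of how this resolution could look in practice, combining the standard DCAT and VoID terms mentioned above (all URIs are illustrative):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .

# One dataset exposed through several access mechanisms (illustrative URIs).
<http://example.org/dataset/city-budget> a dcat:Dataset, void:Dataset ;
    dct:title "City budget" ;
    # downloadable file, described as a DCAT distribution
    dcat:distribution [
        a dcat:Distribution ;
        dcat:accessURL <http://example.org/downloads/city-budget.csv> ;
        dcat:mediaType "text/csv"
    ] ;
    # SPARQL endpoint and RDF dump, described with VoID (see /TR/void/#access)
    void:sparqlEndpoint <http://example.org/sparql> ;
    void:dataDump <http://example.org/dumps/city-budget.nt> .
```
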
10:40:52 for voting now proposal 2: to further discuss the federation concept in relation with previous proposal
10:41:06 +1
10:41:18 +1
10:41:19 0
10:42:35 ok, after the break will the whole group reconvene?
10:43:09 +1
10:43:23 According to the agenda
10:43:32 For voting now Brian's proposal, Proposal 2-a: The registration of data within data-set catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) should be supported so that data can be found easily.
10:43:36 +1
10:43:37 +1 to proposal 2-a
10:43:38 we continue the discussion in groups
10:43:38 +1 to brian
10:43:43 +1
10:43:53 +1 for proposal 2a
10:44:04 +1 to Proposal 2-a
10:44:29 RESOLVED: The registration of data within data-set catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) should be supported so that data can be found easily.
10:45:18 RESOLVED: to further discuss the federation concept in relation with previous proposal
10:45:27 is it time to break?
10:46:57 Yes, nathalia — I think they are getting coffee
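
And a minimal sketch of the catalogue-registration side of RESOLVED 2-a, using standard DCAT terms (URIs illustrative):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# A well-known catalogue listing the dataset so it can be found easily;
# a crawler could equally auto-discover and index the dataset from its
# published metadata, as the resolution allows.
<http://example.org/catalog> a dcat:Catalog ;
    dct:title "Example open data catalogue" ;
    dcat:dataset <http://example.org/dataset/city-budget> .
```
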
11:04:23 Vagner_Br has joined #dwbpbestpractices
11:12:30 ok
11:14:06 zakim, generate minutes
11:14:06 I don't understand 'generate minutes', Caroline
11:16:13 markharrison has joined #dwbpbestpractices
11:16:32 Scribe: JohnGoodwin_
11:17:38 Your minutes are here: http://www.w3.org/2014/04/01-dwbpbestpractices-irc
11:17:49 yes
11:18:05 TOPIC: Privacy/security
11:18:56 I am a total newbie in this topic
11:19:06 Caroline: is everybody awake?
11:20:21 Capability URLs: http://www.w3.org/TR/capability-urls/
11:20:27 Ig_Bittencourt: we could look at capability URLs as a means to hide data on the web
11:20:39 I think we should start by reviewing the challenges in the spreadsheet
11:21:36 BrianMatthews has joined #dwbpbestpractices
11:21:48 as listed in the spreadsheet it is not about confidentiality-integrity and availability of the data itself
11:21:55 it is about the content of data
11:22:01 markharrison: these are a form of obfuscation?
11:22:08 Ig_Bittencourt: yes
11:23:44 laufer_: we can't have private data published on the web
11:23:45 q+ to say that capability URLs only work for one-time access in a limited time window
11:24:13 Ig_Bittencourt: can provide access to a group of people
11:24:14 q?
11:24:22 ack markharrison
11:24:22 markharrison, you wanted to say that capability URLs only work for one-time access in a limited time window
11:24:49 markharrison: capability URLs only work for one-time access, but not for repeated access of data
11:25:10 ... we have to respect data protection legislation
11:25:24 q+ need to respect Data Protection legislation - especially for personally identifiable data
11:25:37 q?
11:25:46 Ig_Bittencourt: example - data from health areas
11:26:04 ... medical history of individual people
11:26:44 laufer_: no use cases for publishing such personal data due to issues of privacy
11:27:07 ... what is the metadata about privacy and security?
11:27:20 q+ on his vision on this (1) published data by default with only limitation of privacy/security (2) data combination and merging (3) personal data management
11:27:31 Ig_Bittencourt: also related to quality!?
11:27:44 q?
11:27:58 q?
11:28:03 q+ to say that often the solution / best practice is to publish aggregated data at coarser granularity, so that the original sensitive raw data cannot be extracted / reverse-engineered
11:28:08 ack CarlosIglesias
11:28:08 CarlosIglesias, you wanted to comment on his vision on this (1) published data by default with only limitation of privacy/security (2) data combination and merging (3) personal
11:28:11 ... data management
11:28:17 CarlosIglesias: three things here
11:28:34 q?
11:30:32 ack markharrison
11:30:32 markharrison, you wanted to say that often the solution / best practice is to publish aggregated data at coarser granularity, so that the original sensitive raw data cannot be
11:30:35 ... extracted / reverse-engineered
11:32:10 Vagner_Br: privacy and security are the data publisher's responsibility - not our concern to make any recommendation/requirement?
11:32:50 markharrison: if the publisher is aware raw data is sensitive then they have the responsibility to decide whether to publish fine-grained data or aggregate it so individuals cannot be identified
11:32:54 q+
11:33:14 +q
11:33:25 q+
11:33:27 ack BrianMatthews
11:33:57 BrianMatthews: are people aware of P3P? W3C initiative closed some years ago.
11:34:05 http://www.w3.org/P3P/
11:34:18 ... guidelines about privacy etc.
11:34:39 ... P3P could provide useful material to consult
11:34:45 ack Caroline
11:34:56 ack laufer
11:34:59 +q
11:36:18 laufer_: best practice is that publishers should provide mechanisms to control privacy
11:37:09 ack Caroline
11:37:51 +q to say that an example of commercially sensitive data is serial-level traceability data (which can reveal inventory volumes, trading relationships, flow patterns) - such data is shared on a 'need-to-know' basis but consumers might be interested in a high-level summary of this data, without needing access to every observation event
11:38:07 P3P is more about gathering and using private data - focussed on data consumers rather than data providers
11:38:47 q?
11:38:48 Caroline: proposes two directions: 1) principles for personal data, and 2) also consider technical issues, e.g. P3P
11:38:50 ack markharrison
11:38:50 markharrison, you wanted to say that an example of commercially sensitive data is serial-level traceability data (which can reveal inventory volumes, trading relationships, flow
11:38:53 ... patterns) - such data is shared on a 'need-to-know' basis but consumers might be interested in a high-level summary of this data, without needing access to every observation
11:38:53 ... event
11:39:50 q+ to say minimum requirement is legislation
11:40:07 markharrison: need permissions
11:40:15 q?
11:40:22 ... it's complicated
11:40:31 ack CarlosIglesias
11:40:31 CarlosIglesias, you wanted to say minimum requirement is legislation
11:41:29 q?
11:41:39 CarlosIglesias: minimal requirement is to comply with data legislation
11:42:25 q?
11:42:32 ack need
11:42:32 need, you wanted to respect Data Protection legislation - especially for personally identifiable data
11:43:10 q+ to ask if there is a Freedom of Information aspect to request not only what data a company collects about an individual - but to ask whether - and where that data is published on the web?
11:43:18 ack mark
11:43:18 markharrison, you wanted to ask if there is a Freedom of Information aspect to request not only what data a company collects about an individual - but to ask whether - and where
11:43:21 ... that data is published on the web?
11:44:13 markharrison: are there consequences for FOI requests from publishing data on the web?
11:46:33 provide different permissions to access data
11:46:46 q?
11:47:11 laufer_: why/how can we give permission to access data?
11:47:14 I can't hear you
11:47:24 q+ to say Data Security/Privacy is a matter of either public legislation or internal policy. Shouldn't we avoid any requirements? Should we make any recommendation on traceability and licencing?
11:47:38 q+
11:47:42 q+ to suggest considering the concept of security realms - to indicate in metadata what security credentials should be presented in order to gain access to the data
11:47:53 q?
11:48:16 ack Vagner_Br
11:48:16 Vagner_Br, you wanted to say Data Security/Privacy is a matter of either public legislation or internal policy. Shouldn't we avoid any requirements? Should we make any
11:48:19 ... recommendation on traceability and licencing?
11:49:22 Vagner_Br: data security and privacy are topics for government legislation and the internal policy of organisations
11:49:35 rrsagent, make logs public
11:49:40 \o/
11:52:51 q?
11:52:56 ack BrianMatthews
11:53:28 BrianMatthews: are we in a good position today to make concrete recommendations - should we park this discussion for now?
11:53:42 +1
11:53:46 +1
11:53:47 +1
11:53:48 ack markharrison
11:53:48 markharrison, you wanted to suggest considering the concept of security realms - to indicate in metadata what security credentials should be presented in order to gain access to
11:53:51 ... the data
11:53:53 +1
11:56:03 agree with markharrison
11:57:20 An RDF Schema for P3P 1.0: http://www.w3.org/TR/p3p-rdfschema/
11:57:57 and http://www.w3.org/TR/P3P/
11:58:09 Proposal: Acknowledging that much further discussion is needed on security, metadata could include information about security realms (see OASIS SAML/XACML) that apply to restricted-access data on the web. Realms indicate which security credentials need to be presented in order to be considered for access to the data.
11:59:10 +1
11:59:15 +1
11:59:22 +1
11:59:28 +1
11:59:50 +1
11:59:54 +1
12:00:13 +1
12:00:21 +1 and also note that other technologies such as http://www.w3.org/TR/P3P/ may also be relevant to consider in metadata
12:00:47 +1
12:00:59 RESOLVED: Acknowledging that much further discussion is needed on security, metadata could include information about security realms (see OASIS SAML/XACML) that apply to restricted-access data on the web. Realms indicate which security credentials need to be presented in order to be considered for access to the data.
12:00:59 +1
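
Purely as a sketch of the security-realm idea in this resolution (there is no agreed vocabulary for it, so apart from dcat:Dataset every property below is hypothetical):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/security#> .   # hypothetical vocabulary

# A restricted-access dataset advertising which security realm applies,
# i.e. which credentials must be presented to be considered for access.
<http://example.org/dataset/patient-summaries> a dcat:Dataset ;
    ex:securityRealm <http://example.org/realms/health-research> ;
    ex:credentialType "SAML 2.0 assertion" ;    # cf. OASIS SAML/XACML
    ex:accessPolicy <http://example.org/policies/health-research.xacml> .
```
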
12:01:14 Proposal: Lunch now?
12:01:14 0
12:01:15 +1
12:01:27 +1
12:02:00 +10
12:02:09 +1 to breakfast
12:02:47 tks
12:03:01 JoaoPauloAlmeida_ has joined #dwbpbestpractices
12:03:07 +1
12:03:12 +11111
12:03:42 good breakfast to you JoaoPaulo
12:03:48 see you
12:19:43 JoaoPauloAlmeida has joined #dwbpbestpractices
13:04:04 HadleyBeeman has joined #dwbpbestpractices
13:12:02 rrsagent, draft minutes
13:12:02 I have made the request to generate http://www.w3.org/2014/04/01-dwbpbestpractices-minutes.html HadleyBeeman
13:15:26 markharrison has joined #dwbpbestpractices
13:17:01 We are moving on to skills and expertise.
13:17:10 scribenick: hadleybeeman
13:18:25 Laufer: A best practice is to study.
13:19:03 laufer has joined #dwbpbestpractices
13:19:12 +laufer
13:19:28 Ig_Bittencourt has joined #DWBPBestPractices
13:19:32 CarlosIglesias_ has joined #dwbpbestpractices
13:20:04 CarlosIglesias has joined #dwbpbestpractices
13:20:11 q+
13:20:16 BrianMatthews has joined #DWBPbestpractices
13:20:18 ack ig
13:20:33 http://www.w3.org/TR/2014/NOTE-ld-bp-20140109/
13:20:35 IG_bittencourt: The W3C has some best practices about that.
13:21:00 ...: For example, best practices on publishing linked data. Not useful for any kind of data, but useful for linked data.
13:21:08 q+ to add again that, again, we should not focus only on data offer but include also demand
13:21:39 ack carlos
13:21:39 CarlosIglesias, you wanted to add again that, again, we should not focus only on data offer but include also demand
13:22:50 CarlosIglesias: We should broaden the focus to include data demand too. Engage with data reusers, help them to acquire the skills needed for data reuse. Universities with business, the civil society organisations, etc. Not just about the skills of the government; all those in the ecosystem.
13:23:44 ... We should help data reusers as well. Skills and collaboration. Better data culture within society, etc.
13:24:45 Laufer: Searching for people who are interested in the same things, and participating in a community, will encourage the reuse of the data.
13:25:52 CarlosIglesias: Will also help address the problems faced by data reusers. For example, civil society organisations often don't have skills in IT. Most people don't have these skills. It's the entire data chain; involve them all in the process of opening the data.
13:26:47 q+ so can we recommend that publishers of data on the web should provide some simple examples of how it can be used / accessed and what it can be used for?
13:27:00 ack mark
13:27:28 Some useful references for this point:
13:27:30 http://www.businessofgovernment.org/report/designing-open-projects-lessons-internet-pioneers
13:27:33 +1 markharrison
13:27:39 markharrison: We should recommend to publishers of data on the web to include simple examples of how to use their data. Gives data reusers confidence to try.
13:27:43 And other good documentation
13:27:43 http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/
13:28:00 JohnGoodwin has joined #DWBPBestPractices
13:28:07 were already mentioned on the first F2F day
13:28:11 +1 markharrison
13:31:31 Caroline has joined #dwbpbestpractices
13:32:00 TOPIC: Skills/Expertise
13:33:33 hadley has joined #dwbpbestpractices
13:33:41 google hangout https://plus.google.com/hangouts/_/72cpi53b5160goccfvbb33ho18
13:34:24 sure
13:34:57 it is ok here too
13:35:12 Scribe: markharrison
13:35:33 scribenick: markharrison
13:36:02 q?
13:36:13 Thank you Hadley!
13:36:36 q+
13:36:48 q-
13:37:04 Vagner_Br has joined #dwbpbestpractices
13:37:58 +q
13:38:07 ack CarlosIglesias
13:38:35 CarlosIglesias: was discussing the capability / capacity of data publishers and also data re-users / potential re-users.
13:39:10 should we consider the '5-stars of data engagement' http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/ and build on these
13:40:04 ... Not only provide best practices for the data provider. Data providers should encourage re-use by building capacity for external re-users
13:40:09 JoaoPauloAlmeida has joined #dwbpbestpractices
13:40:40 laufer: useful to incentivise a community of data providers and data users - so both sides can enhance their skills and understand each other's needs
13:40:53 ack JohnGoodwin
13:41:23 JohnGoodwin: 5-star scheme of data engagement - see link above (Tim Davies)
13:41:31 http://www.businessofgovernment.org/report/designing-open-projects-lessons-internet-pioneers
13:41:57 CarlosIglesias: and also see link http://www.businessofgovernment.org/report/designing-open-projects-lessons-internet-pioneers
13:42:39 Vagner_Br: concern about burdening data publishers with the task of encouraging re-use
13:43:01 ... more of a recommendation than a requirement
13:43:15 CarlosIglesias: perhaps a SHOULD, not a MUST
13:43:29 q?
13:44:02 One way of encouraging reuse is to have metrics for data, by "promoting" them each time they are reused and reported
13:44:11 ... also include for each best practice some real-world example of how it is being done
13:44:33 ... serves as a proof of implementation of best practices
13:44:52 laufer: so it's a use case we can point to?
13:45:12 q?
13:46:06 Vagner_Br: just wanted to say that we should take care not to only consider the perspective / responsibilities of the data provider
13:46:43 ... expectations in terms of expertise, opening up the data. Would also like to consider the perspective of the consumers and re-users of the data
13:47:12 ... they should encourage government / publishers to open up the data - it's a two-way street
13:47:31 q?
13:47:50 q+
13:47:59 laufer: CarlosIglesias also raised these points - and the need for synergies between data providers and data re-users / consumers - and incentivisation and feedback
13:48:15 +1
13:48:27 Vagner_Br: the entire data ecosystem needs to play a role
13:48:41 ack Caroline
13:48:41 I can't hear you
13:49:41 Caroline: should we explain which roles / actors are in the ecosystem? - data publishers, data re-users, end-consumers of data
13:50:58 CarlosIglesias: also talk about collaboration and co-operation
13:51:03 http://blog.okfn.org/2011/03/31/building-the-open-data-ecosystem/
13:51:54 Participation and collaboration lessons from Internet pioneers:
13:52:03 Caroline: also mention the value of the communities that are already engaged
13:52:04 1 - Let everyone play
13:52:14 2 - Play nice
13:52:24 q?
13:52:25 Caroline: to encourage existing communities to grow
13:52:29 3 - Tell what you are doing while you are doing it
13:52:39 4 - Use multiple communication channels
13:52:46 5 - Give it away
13:52:47 +1 to CarlosIglesias
13:52:54 6 - Reach for the edges
13:53:02 7 - Take advantage of all organizations
13:53:12 8 - Design for participation
13:53:19 9 - Increase network impact
13:53:27 and 10 - Build platforms
13:53:44 +1
13:53:45 from http://www.businessofgovernment.org/report/designing-open-projects-lessons-internet-pioneers
13:53:47 +1 to CarlosIglesias
13:53:49 more details there
13:55:00 +q to say W3C already maintain lists of tools - but do we need more commentary to say why these are useful
13:55:20 on a side note
13:55:28 ack markharrison
13:55:28 markharrison, you wanted to say W3C already maintain lists of tools - but do we need more commentary to say why these are useful
13:55:30 may be worth also looking at http://www.webfoundation.org/wp-content/uploads/2013/06/OGD-Indonesia-FINAL-for-publication.pdf
13:55:58 and http://data.worldbank.org/sites/default/files/1/od_readiness_-_revised_v2.pdf
13:56:39 to see which dimensions are usually associated with (open) data
13:57:06 e.g. provide more background in addition to what is already at https://www.w3.org/2001/sw/wiki/Tools
13:57:56 q+
13:58:07 laufer: interaction within the ecosystem is a way to improve the skills and expertise of all actors
13:58:18 ack Ig_Bittencourt
13:58:20 ... and to provide feedback and incentivisation
13:59:16 q+ to say Are we saying that: The interaction between the ecosystem's actors is the way to increase the expertise and skill among them?
13:59:25 Ig_Bittencourt: even if we provide links to tools, we need to provide more info to guide which tools to use for specific purposes
13:59:42 ... e.g. could also be helpful to provide benchmarking
14:00:27 ack Vagner_Br
14:00:27 Vagner_Br, you wanted to say Are we saying that: The interaction between the ecosystem's actors is the way to increase the expertise and skill among them?
14:00:30 q?
14:01:33 q+
14:01:40 ack Ig_Bittencourt
14:02:25 Ig_Bittencourt: if we consider all actors, it may be interesting to provide some step-by-step guide for publishing data / linked data
14:02:31 q?
14:02:38 ack CarlosIglesias
14:03:05 I guess the interaction among the actors in the ecosystem could help increase the quality of the data (detecting and reporting errors, etc)
14:03:17 I think it is important to consider the consumers too
14:03:25 +1 to gatemezi and nathalia
14:03:32 CarlosIglesias: In all projects, a big part is to consider participation - via hackathons, collaboration with entrepreneurs - we can provide some high-level guidance on this
14:03:36 q+
14:03:38 how people can reuse the data for other purposes
14:04:11 ... also including educational materials for school and university at every level
14:04:42 ... think about engagement techniques - discuss further later
14:04:42 ack Vagner_Br
14:04:47 q+ to say we may suggest examples of interaction among actors, not how-to
14:05:44 Vagner_Br: perhaps not a step-by-step guide. Many already exist. We see our role as providing real examples of interactions among actors.
14:06:30 CarlosIglesias: not only hack events, but also direct access to the platform (one-click registration) and sharing reuse cases and applications.
14:06:44 Agree that it is not to build a step-by-step guide but to link to them, e.g. http://www.w3.org/TR/2014/NOTE-ld-bp-20140109/
14:07:59 +1 to Ig and Vagner
14:09:31 rrsagent, draft minutes
14:09:31 I have made the request to generate http://www.w3.org/2014/04/01-dwbpbestpractices-minutes.html HadleyBeeman
14:09:40 q?
14:09:50 ack Vagner_Br
14:09:50 Vagner_Br, you wanted to say we may suggest examples of interaction among actors, not how-to
14:10:41 Like this? --- the interaction among the actors in the ecosystem could help increase the skills among them and the value of the data (detecting and reporting errors, etc)
14:10:51 we should make sure that we state certain practices that, when followed, lead to better interaction among actors in the ecosystem
14:11:09 +1 to JoaoPauloAlmeida
14:11:18 this should be our mission, to then increase the value of the whole ecosystem
14:11:47 if you follow this advice (best practices) then the data you publish can be more valuable to others
14:12:07 +1
14:12:15 I mean in general to all our best practices
14:12:21 q+
14:12:33 our mission is to produce advice to make this ecosystem viable and valuable
14:14:05 JoaoPauloAlmeida: we can only read you - we cannot hear you by audio on Google Hangout
14:14:49 ok, I am also only able to follow in text. So, perhaps my comment is a bit out of context. I understood that Vagner_Br was reasoning on the mission of the group, ...
14:14:58 laufer: the collective effect of collaboration improves the value of the data by improving the expertise of all actors
14:15:00 ack Ig_Bittencourt
14:15:04 JoaoPauloAlmeida: I think what we are saying is that there should be a "way" for different actors to come together and "speak about" the data, the way it is used, etc. And this can be achieved via many channels (hackathons, other events, etc.)
14:15:21 +1 to gatemezi
14:15:31 thanks gatemezi, that clarifies
14:15:35 - need feedback loop from users to publishers
14:15:45 who on the hangout can hear us?
14:15:58 I can't...
14:16:04 I'm not hearing well