09:21:28 RRSAgent has joined #dwbpbestpractices
09:21:28 logging to http://www.w3.org/2014/04/01-dwbpbestpractices-irc
09:21:29 Scribe: Caroline
09:22:06 Vagner_Br has joined #dwbpbestpractices
09:22:26 BernadetteLoscio_: do we go through the list or do we choose some?
09:23:22 Ig_Bittencourt: let's start with the subjects related only to BP. If we have time to discuss the others that are also related to Q&G, we will do it later
09:23:57 BernadetteLoscio_: We can skip metadata since we discussed it yesterday
09:24:02 I agree
09:24:14 laufer_: we can check what we have written about it
09:24:23 long discussion about it yesterday
09:24:40 BernadetteLoscio_: check https://docs.google.com/spreadsheet/ccc?key=0AhTZf3B9yQ3odGVvU3pBazFsY3pyUVppNDFSZGtyQkE&usp=sharing#gid=5
09:25:38 which tab are we looking at?
09:25:49 the one above
09:25:55 sorry
09:26:03 this one https://docs.google.com/spreadsheet/ccc?key=0AhTZf3B9yQ3odGVvU3pBazFsY3pyUVppNDFSZGtyQkE&usp=sharing#gid=6
09:26:20 laufer_: how can we define real time?
09:26:47 ... if we have an update time, do we have the old data archived?
09:27:01 to me a challenge seems to be that data is often about phenomena in reality which change
09:27:13 so data may be added or changed to reflect that
09:27:16 q+
09:27:24 ack BernadetteLoscio_
09:27:33 I can't hear you guys on the hangout
09:27:34 q+
09:27:34 BernadetteLoscio_: real time
09:27:55 ... if we have a dataset we can have it in a catalogue, it might be in an API
09:28:06 laufer_: we can have this in datasets that are not in real time
09:28:16 q+ to say bp should be just provide data in a timely manner and then elaborate defs or whatever on that basis
09:28:18 ... we have a fixed time to update the data
09:28:55 now we can hear!!!
09:28:58 q+ to say that metadata should express frequency of updates, timestamp of last update
09:29:06 Zakim, who`s in the queue
09:29:06 I don't understand 'who`s in the queue', Caroline
09:29:13 BrianMatthews has joined #dwbpbestpractices
09:29:17 q?
09:29:29 +1 to CarlosIglesias' view.
09:29:36 According to Wikipedia... Real-time data: Real-time data denotes information that is delivered immediately after collection. There is no delay in the timeliness of the information provided. Real-time data is often used for navigation or tracking.
09:29:37 BernadetteLoscio_: real time is to update the data
09:29:49 markharrison_: very real-time data can be observation data
09:29:50 gatemezi has joined #dwbpbestpractices
09:30:21 There are two issues: the date/time when the observation was made
09:30:29 The time it takes for data to reach the intended audience
09:31:02 If the time it takes for data to reach the intended audience (from observation) is known, then the date/time when the observation was made can be derived
09:31:26 laufer_: from the position of the consumer: I need data. Some data with one week between updates is okay
09:31:34 To = the date/time when the observation was made
09:31:36 ... they say they will update weekly
09:31:44 q?
09:31:46 why is this a best practice?
09:31:48 Td = the time it takes for data to reach the intended audience
09:32:01 The problem is that Td is usually unknown
09:32:08 ... if the data goes 3 weeks without an update, it could be a problem
09:32:21 It all depends on the type of data...
09:32:23 So, the best practice is how to deal with the fact that we may need to know To
09:32:27 ... if it is only one dataset, we must guarantee this dataset is updated
09:32:35 yes, agree gatemezi
09:32:37 BernadetteLoscio_: this is one requirement
09:33:01 q+
09:33:05 q+ to say that in addition to specifying expected update frequency for data, there is an (SLA) expectation to honour that update frequency
09:33:15 q+
09:33:18 ... the question is: one of the use cases is on real time.
09:33:19 Data may have to be indexed in time
09:33:33 q?
09:33:37 ... this data can be available
09:33:46 ack CarlosIglesias
09:33:46 CarlosIglesias, you wanted to say bp should be just provide data in a timely manner and then elaborate defs or whatever on that basis
09:33:47 ack me
09:33:56 someone is making noise
09:34:02 ack laufer
09:34:05 In the weather domain, you might need more frequent updates (10 minutes?), while geodata for districts can be updated each year
09:34:09 CarlosIglesias: we should try to define what best practices are
09:34:19 zakim, who ismaking noise?
09:34:19 I don't understand your question, Vagner_Br.
09:34:22 ... defining what is real time
09:34:40 Zakim can't track the sound of the hangout
09:34:44 http://sunlightfoundation.com/policy/documents/ten-open-data-principles/
09:35:06 ... if you have real-time data you must update it
09:35:18 ... some data have value only in real time
09:35:51 q+
09:35:52 ... we could have at least the titles of the best practices
09:36:22 ack markharrison_
09:36:22 markharrison_, you wanted to say that metadata should express frequency of updates, timestamp of last update and to say that in addition to specifying expected update frequency for
09:36:25 ... data, there is an (SLA) expectation to honour that update frequency
09:36:35 markharrison_: the metadata should express the frequency
09:36:44 q-
09:36:52 ... the expectation to provide the data with some frequency
09:37:01 markharrison +1
09:37:11 yes!
09:37:30 +1 markharrison_
09:37:34 Data is a requirement: data may have to be indexed in time, in order to cope with the fact that we do not know Td
09:37:36 q?
09:37:57 ack Vagner_Br
09:38:06 Vagner_Br: I want to support CarlosIglesias
09:38:31 ... and to add that in terms of requirements the point is that data and metadata should be available in a timely manner
09:38:45 ... we should change the title
09:39:25 BrianMatthews: the data should be available at a determined frequency
09:39:26 I meant "that is a requirement"
09:39:28 Is it possible to have metadata without the data it is about?
09:39:45 laufer_: so the publisher has the obligation to do it on time
09:39:50 ack Ig_Bittencourt
09:40:11 Useful to declare update frequency in metadata to avoid the need to poll more frequently than the update frequency
09:40:15 Ig_Bittencourt: if you have data from the stock market you must update it every 5 minutes
09:40:50 q+
09:41:09 ... in this case, the best practices should establish that the publisher should release the data with a certain frequency
09:41:22 but when you update you may: (i) add new time-indexed entries or (ii) change the content of data [no time-indexing]
09:41:32 these are two different approaches
09:41:52 CarlosIglesias: we can also write general best practices regarding this issue
09:41:54 q+ to ask about support for standing query (publish/subscribe) capabilities for streaming data feeds?
09:42:08 q?
09:42:50 ack gatemezi
09:42:52 gatemezi: Is it possible to have metadata without the data it is about?
09:42:57 ack JoaoPauloAlmeida
09:43:06 JoaoPauloAlmeida: but when you update you may: (i) add new time-indexed entries or (ii) change the content of data [no time-indexing]
09:43:16 ... these are two different approaches
09:43:24 laufer_: I think this is an issue of archiving
09:43:34 ... you must maintain the old data
09:44:13 q+ to respond to laufer - it depends whether the dataset is journalled or not - the metadata should declare whether the data is journalled (and time-stamped)
09:44:25 q?
09:44:35 ack BrianMatthews
09:44:47 BrianMatthews: I don't think we need to worry about the tech being used
09:44:54 ... we should stick to the method
09:45:08 ... what policy and frequency the publisher adopts
09:45:14 +1 to BrianMatthews point
09:45:18 ... not worry about tech
09:45:21 q?
09:45:27 ack markharrison_
09:45:27 markharrison_, you wanted to ask about support for standing query (publish/subscribe) capabilities for streaming data feeds? and to respond to laufer - it depends whether the
09:45:31 ... dataset is journalled or not - the metadata should declare whether the data is journalled (and time-stamped)
09:45:41 markharrison_: we never delete anything regarding data
09:45:54 laufer_: we now have to make a sentence summarizing all this
09:46:13 markharrison_, if the data is not time-indexed then we have to delete!
09:46:16 q?
09:46:27 so, there should be best practices for time-indexing, this is my point
09:46:48 CarlosIglesias: we could make an action for people to detail it
09:47:42 +1 João Paulo
09:48:06 ok, nice
09:48:40 PROPOSAL: Metadata should declare 1) expected/scheduled frequency of update, 2) if the dataset is journalled (i.e. no deletions, only append), 3) if the dataset is timestamped (can request data for a specific time interval), 4) actual timestamp of last update
09:49:22 +1
09:50:06 +1
09:50:28 Just to understand the 2) point... you mean adding in a different URI?
09:50:39 This is a good proposal, I think we should just also note that there should be guidelines/best practices for the specification of time
09:51:00 ack markharrison_
09:51:01 ... when you "append"...
09:51:17 if you point the laptop to the person with the floor, it will help us a lot (sorry to ask you guys that)
09:51:24 markharrison_: it doesn't change
09:51:37 thanks
09:51:54 No, just to understand, laufer_
09:51:58 gatemezi: URI / access method for the dataset should not change, in my opinion
09:52:14 laufer_: we are not saying how the data will be provided to the consumer
09:52:17 because imagine you were already consuming data at time To
09:52:25 ... you can have a URI or an API
09:52:33 ok
09:52:35 ... we don't know how the publisher will define the scheme
09:52:49 +1 then
09:52:52 +1
09:52:54 +1
09:52:58 +1
09:53:02 +1
09:53:07 +1
09:53:08 +1
09:53:09 +1
09:53:16 +1
09:53:19 +1
09:53:54 Why don't we use the wiki for this?
09:54:29 Ok Carol, I understand
09:54:47 https://docs.google.com/spreadsheet/ccc?key=0AhTZf3B9yQ3odGVvU3pBazFsY3pyUVppNDFSZGtyQkE&usp=sharing#gid=6
09:55:03 it is in the Group Challenges
09:55:13 Caroline: Can someone put it on the wiki later?
09:55:20 RESOLVED: Metadata should declare 1) expected/scheduled frequency of update, 2) if the dataset is journalled (i.e. no deletions, only append), 3) if the dataset is timestamped (can request data for a specific time interval), 4) actual timestamp of last update
09:55:24 I can put
09:56:09 ACTION: nathalia will put RESOLVED 1 on the Wiki (RESOLVED: Metadata should declare 1) expected/scheduled frequency of update, 2) if the dataset is journalled (i.e. no deletions, only append), 3) if the dataset is timestamped (can request data for a specific time interval), 4) actual timestamp of last update)
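
A minimal sketch of how RESOLVED 1 could look as dataset metadata in Turtle: dct:accrualPeriodicity and dct:modified are standard Dublin Core terms, but no vocabulary was agreed for points 2) and 3), so the ex: properties below are hypothetical placeholders.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/ns#> .   # hypothetical vocabulary

<http://example.org/dataset/stock-quotes> a dcat:Dataset ;
    # 1) expected/scheduled frequency of update (standard Dublin Core term,
    #    here pointing at the Dublin Core frequency vocabulary)
    dct:accrualPeriodicity <http://purl.org/cld/freq/continuous> ;
    # 2) journalled: append-only, no deletions (hypothetical property)
    ex:journalled true ;
    # 3) timestamped: data can be requested for a specific time interval
    #    (hypothetical property)
    ex:timeIndexed true ;
    # 4) actual timestamp of last update (standard Dublin Core term)
    dct:modified "2014-04-01T09:55:00Z"^^xsd:dateTime .
```

As noted at 09:40:11, a consumer reading such metadata would also know not to poll more often than the declared frequency.
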
09:58:50 Caroline: let's talk about "tools"
09:59:07 I'm not seeing you
09:59:13 Vagner_Br: who could explain better what is the idea about "tools"
09:59:29 TOPIC: Tools
09:59:52 laufer_: we must look at the use cases to understand what tools are
10:00:01 the camera is looking at the roof
10:00:22 much better now
10:00:40 Ig_Bittencourt: Berna, can you explain about the tools?
10:00:55 scribe: Caroline
10:01:12 Bernadette: it is related to skills and expertise
10:01:23 Vagner_Br: when you talk about tools as a challenge
10:01:33 ... how can we generalize that as a challenge?
10:01:44 Bernadette: I saw this in the use case from Recife
10:01:51 q+
10:01:54 laufer_: NYC uses Socrata for example
10:02:00 Vagner_Br: tools about catalogues?
10:02:06 Bernadette: in general
10:02:19 CarlosIglesias: provide a single or centralized access point for the data
10:02:27 ... could be a CKAN catalogue
10:02:37 ... or another kind
10:02:52 ... matching access to data
10:03:30 Bernadette: what could be a best practice for this
10:03:34 q?
10:03:39 ack Ig_Bittencourt
10:03:51 Ig_Bittencourt: we should be agnostic
10:04:12 ... a question on documentation: if we have components with APIs, we should provide documentation
10:04:15 q?
10:04:22 ack laufer_
10:04:37 q+
10:04:42 laufer_: the publisher might have a best practice to publish the data
10:04:58 ... he must choose a tool that can do what the publisher wants
10:05:28 ... the choice of the tool will depend on what the publisher wants
10:05:39 ... if there is a tool that can do what he or she expects
10:05:55 ... people who have Excel need a kind of tool
10:06:03 I am wondering if we should recommend a tool here...
10:06:21 ... if the publisher thinks that tool is good, we can recommend what is the best practice to choose a tool
10:06:23 q+
10:06:31 No gatemezi. According to the charter, we need to be agnostic.
10:06:43 laufer_: I think we have to think about the consumer and the publisher
10:07:05 ... the publisher has to choose a tool that can do what we want
10:07:09 ack Vagner_Br
10:07:24 Vagner_Br: I agree with Ig_Bittencourt and laufer_
10:07:35 agree with Laufer
10:07:38 ... it is hard to find any kind of requirements
10:07:41 q+
10:07:52 so maybe we can skip this and come back later?
10:07:53 ... even if we must be agnostic
10:08:12 ... if we say any kind of tools you use should be well documented
10:08:20 q+ to say the tool is just a means, the bp is provide centralized access to data
10:08:24 ... if you don't want to say that just drop this topic
10:08:44 q?
10:09:00 A tool can also be an implementation of our bp
10:09:12 Caroline: we should try to resolve this
10:09:20 q+ to say that Tools are very useful for helping data publishers to check that translation of data into different formats retain their meaning
10:09:22 ... at least to have a way to go
10:09:25 ack BrianMatthews
10:09:46 BrianMatthews: we can write a recommendation about a method description about a dataset
10:09:59 ... we can say a dataset could publish a data description
10:10:12 I can't hear a thing
10:10:13 laufer_: this is a requirement of the publisher
10:10:26 ... the tool he will choose will have to do what he needs
10:10:33 +1 João Paulo
10:10:43 ... we can make a recommendation about what he should use
10:11:06 ... you should interoperate with the tools in a standard way
10:11:25 q+ to compare this discussion with the a11y use case
10:11:32 ack caroline
10:11:38 Caroline: has already spoken
10:11:59 q?
10:12:09 ack CarlosIglesias
10:12:09 CarlosIglesias, you wanted to say the tool is just a means, the bp is provide centralized access to data and to compare this discussion with the a11y use case
10:12:09 ack me
10:12:49 CarlosIglesias: the best practices do not require any tool
10:13:14 ... sometimes it will be an API, sometimes a data catalogue, or another thing
10:13:16 BrianMatthews, you meant we could make a recommendation about metadata about the tool used to publish the data?
10:13:46 ... the BP could provide different sets of tools
10:14:17 ... we could do something similar to ??
10:14:34 ... on one side you have guidance for the content
10:14:57 ... on the other side, after creating it, we can have data tools guidelines
10:15:44 q?
10:15:49 ack markharrison_
10:15:49 markharrison_, you wanted to say that Tools are very useful for helping data publishers to check that translation of data into different formats retain their meaning
10:16:23 markharrison_: tools are very useful for data publishers to expose data in multiple formats
10:16:32 similar case to WCAG and UAAG use case
10:16:38 ... but they can also be useful to check that the meaning of the data is not lost
10:16:54 Vagner_Br: we are not defining any kind of requirements, only a few recommendations
10:17:06 laufer_: we don't have to say what the tool is
10:17:14 ... but a best practice to use the tools
10:18:03 one is general best practices and the other is about how tools should implement best practices
10:18:05 q?
10:18:29 we can follow here a similar approach
10:18:54 ok, waiting for the proposal..
10:19:01 proposal: bp is to provide a single access point for data
10:19:10 ?
10:19:18 I don't understand the proposal
10:19:23 Me neither
10:20:05 is a general bp for providing an access point (i.e. data catalogue, API, SPARQL endpoint, etc.)
10:20:12 technology agnostic
10:20:25 but "single" is quite strong
10:20:39 centralized?
10:20:46 is that better?
10:21:04 centralized to me is not good, ... the web has a distributed nature
10:21:14 I think centralised is not good
10:21:31 agreed with JoaoPaulo
10:21:33 +1 to JoaoPauloAlmeida
10:21:42 New text is coming out
10:22:37 Proposal: Data might be provided via various access mechanisms including (but not limited to) Data catalogues, APIs, SPARQL endpoints, REST interfaces, dereferenceable URIs - and best practice is that data publishers should make use of available tools to support multiple access mechanisms
10:23:02 ok
10:23:08 much better
10:23:09 now I get it
10:23:38 ack CarlosIglesias
10:23:46 q+
10:24:01 CarlosIglesias: "single" or "centralized" refers to the catalogue
10:24:09 ack
10:24:12 ack brian
10:24:21 q?
10:24:34 BrianMatthews: could we provide a mechanism vocab?
10:24:55 q+
10:25:44 ... regarding the centralized issue
10:26:02 ... if you can find a description of the dataset you can find them in different places
10:26:22 laufer_: the HYDRA specification is a way to specify APIs
10:26:39 http://www.hydra-cg.com/spec/latest/core/
10:26:44 could we extend VoID http://www.w3.org/TR/void/#access
10:26:45 it is a way of describing web APIs
10:26:45 q?
10:27:05 ack JohnGoodwin_
10:27:30 ack me
10:27:30 JohnGoodwin_: maybe we could extend VoID
10:27:40 +q
10:29:02 In DCAT, http://www.w3.org/TR/vocab-dcat/ there are different ways to access a dcat:Distribution
10:29:41 ... like dcat:accessURL, API and so on
10:30:06 q+
10:30:23 q?
10:30:50 Proposal 2: There is value in provision of a small number of well-known data catalogues - and registration of data with such catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) - so that data can be found easily
10:31:12 q?
10:31:28 ack CarlosIglesias
10:31:36 ack Caroline
10:31:51 Caroline: we have another proposal "2"
10:31:53 ack Vagner_Br
10:32:07 Vagner_Br: I don't think it makes sense to talk about centralized data
10:32:33 ... I agree with JoaoPauloAlmeida that centralized is against the spirit of the Web
10:32:34 rrsagent, make logs public
10:32:49 q?
10:33:10 The second proposal does not mention "tools"?
10:33:19 +1 to Vagner_Br and markharrison_'s first proposal
10:33:28 I don't understand this proposal
10:34:14 Proposal 2 is additional - to try to address concerns expressed by CarlosIglesias
10:34:43 ok
10:35:17 +1 for the first proposal
10:35:38 q+
10:35:40 ack to vote
10:35:48 proposal 2 is not related to tools
10:35:48 ack BrianMatthews
10:35:57 the text of the 2nd proposal is obscure
10:36:33 BrianMatthews: proposal 2 is like saying we should not have more than a few catalogues
10:36:43 +1 JoaoPauloAlmeida and Brian
10:36:48 it is not related to the discussed topic
10:37:30 it is not about tools, but about discoverability
10:37:50 CarlosIglesias: we should think more about the federation concept
10:38:06 laufer_: we are talking about tools to provide access
10:38:16 ... how we are organizing this information
10:38:52 Reworded: Proposal 2-a: The registration of data within data-set catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) should be supported so that data can be found easily.
10:38:56 ... maybe we could change "multiple" for "federated"
10:39:30 Proposal 1: Data might be provided via various access mechanisms including (but not limited to) Data catalogues, APIs, SPARQL endpoints, REST interfaces, dereferenceable URIs - and best practice is that data publishers should make use of available tools to support multiple access mechanisms
10:39:31 +1 to BrianMatthews proposal 2a
10:39:33 I think the text is better now
10:39:48 +1
10:39:51 +1
10:39:51 +1
10:39:54 +1
10:39:55 +1 to Proposal 1
10:39:57 +1
10:40:01 +1
10:40:03 +1 to proposal 1
10:40:04 +1 to Proposal 1
10:40:07 proposal: to further discuss the federation concept in relation with previous proposal
10:40:11 +1
10:40:15 RESOLVED: Data might be provided via various access mechanisms including (but not limited to) Data catalogues, APIs, SPARQL endpoints, REST interfaces, dereferenceable URIs - and best practice is that data publishers should make use of available tools to support multiple access mechanisms
10:40:25 +1 to carlos
10:40:36 +1
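
A hedged sketch of how this resolution could look in practice, combining the standard DCAT and VoID terms mentioned above (all URIs are illustrative):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix void: <http://rdfs.org/ns/void#> .

# One dataset exposed through several access mechanisms (illustrative URIs).
<http://example.org/dataset/city-budget> a dcat:Dataset, void:Dataset ;
    dct:title "City budget" ;
    # downloadable file, described as a DCAT distribution
    dcat:distribution [
        a dcat:Distribution ;
        dcat:accessURL <http://example.org/downloads/city-budget.csv> ;
        dcat:mediaType "text/csv"
    ] ;
    # SPARQL endpoint and RDF dump, described with VoID (see /TR/void/#access)
    void:sparqlEndpoint <http://example.org/sparql> ;
    void:dataDump <http://example.org/dumps/city-budget.nt> .
```
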
10:40:52 for voting now proposal 2: to further discuss the federation concept in relation with previous proposal
10:41:06 +1
10:41:18 +1
10:41:19 0
10:42:35 ok, after the break will the whole group reconvene?
10:43:09 +1
10:43:23 According to the agenda
10:43:32 For voting now Brian's proposal, Proposal 2-a: The registration of data within data-set catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) should be supported so that data can be found easily.
10:43:36 +1
10:43:37 +1 to proposal 2-a
10:43:38 we continue the discussion in groups
10:43:38 +1 to brian
10:43:43 +1
10:43:53 +1 for proposal 2a
10:44:04 +1 to Proposal 2-a
10:44:29 RESOLVED: The registration of data within data-set catalogues (or auto-discovery and indexing/classification by such catalogues based on published metadata) should be supported so that data can be found easily.
10:45:18 RESOLVED: to further discuss the federation concept in relation with previous proposal
10:45:27 is it time to break?
10:46:57 Yes, nathalia — I think they are getting coffee
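
And a minimal sketch of the catalogue-registration side of RESOLVED 2-a, using standard DCAT terms (URIs illustrative):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# A well-known catalogue listing the dataset so it can be found easily;
# a crawler could equally auto-discover and index the dataset from its
# published metadata, as the resolution allows.
<http://example.org/catalog> a dcat:Catalog ;
    dct:title "Example open data catalogue" ;
    dcat:dataset <http://example.org/dataset/city-budget> .
```
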
11:04:23 Vagner_Br has joined #dwbpbestpractices
11:12:30 ok
11:14:06 zakim, generate minutes
11:14:06 I don't understand 'generate minutes', Caroline
11:16:13 markharrison has joined #dwbpbestpractices
11:16:32 Scribe: JohnGoodwin_
11:17:38 Your minutes are here: http://www.w3.org/2014/04/01-dwbpbestpractices-irc
11:17:49 yes
11:18:05 TOPIC: Privacy/security
11:18:56 I am a total newbie in this topic
11:19:06 Caroline: is everybody awake?
11:20:21 Capability URLs: http://www.w3.org/TR/capability-urls/
11:20:27 Ig_Bittencourt: we could look at capability URLs as a means to hide data on the web
11:20:39 I think we should start by reviewing the challenges in the spreadsheet
11:21:36 BrianMatthews has joined #dwbpbestpractices
11:21:48 as listed in the spreadsheet it is not about confidentiality-integrity and availability of the data itself
11:21:55 it is about the content of data
11:22:01 markharrison: these are a form of obfuscation?
11:22:08 Ig_Bittencourt: yes
11:23:44 laufer_: we can't have private data published on the web
11:23:45 q+ to say that capability URLs only work for one-time access in a limited time window
11:24:13 Ig_Bittencourt: can provide access to a group of people
11:24:14 q?
11:24:22 ack markharrison
11:24:22 markharrison, you wanted to say that capability URLs only work for one-time access in a limited time window
11:24:49 markharrison: capability URLs only work for one-time access, but not for repeated access of data
11:25:10 ... we have to respect data protection legislation
11:25:24 q+ need to respect Data Protection legislation - especially for personally identifiable data
11:25:37 q?
11:25:46 Ig_Bittencourt: example - data from health areas
11:26:04 ... medical history of individual people
11:26:44 laufer_: no use cases for publishing such personal data due to issues of privacy
11:27:07 ... what is the metadata about privacy and security?
11:27:20 q+ on his vision on this (1) published data by default with only limitation of privacy/security (2) data combination and merging (3) personal data management
11:27:31 Ig_Bittencourt: also related to quality!?
11:27:44 q?
11:27:58 q?
11:28:03 q+ to say that often the solution / best practice is to publish aggregated data at coarser granularity, so that the original sensitive raw data cannot be extracted / reverse-engineered
11:28:08 ack CarlosIglesias
11:28:08 CarlosIglesias, you wanted to comment on his vision on this (1) published data by default with only limitation of privacy/security (2) data combination and merging (3) personal
11:28:11 ... data management
11:28:17 CarlosIglesias: three things here
11:28:34 q?
11:30:32 ack markharrison
11:30:32 markharrison, you wanted to say that often the solution / best practice is to publish aggregated data at coarser granularity, so that the original sensitive raw data cannot be
11:30:35 ... extracted / reverse-engineered
11:32:10 Vagner_Br: privacy and security are the data publisher's responsibility - not our concern to make any recommendation/requirement?
11:32:50 markharrison: if the publisher is aware raw data is sensitive then they have the responsibility to decide whether to publish fine-grained data or aggregate it so individuals cannot be identified
11:32:54 q+
11:33:14 +q
11:33:25 q+
11:33:27 ack BrianMatthews
11:33:57 BrianMatthews: are people aware of P3P? W3C initiative closed some years ago.
11:34:05 http://www.w3.org/P3P/
11:34:18 ... guidelines about privacy etc.
11:34:39 ... P3P could provide useful material to consult
11:34:45 ack Caroline
11:34:56 ack laufer
11:34:59 +q
11:36:18 laufer_: best practice is that publishers should provide mechanisms to control privacy
11:37:09 ack Caroline
11:37:51 +q to say that an example of commercially sensitive data is serial-level traceability data (which can reveal inventory volumes, trading relationships, flow patterns) - such data is shared on a 'need-to-know' basis but consumers might be interested in a high-level summary of this data, without needing access to every observation event
11:38:07 P3P is more about gathering and using private data - focussed on data consumers rather than data providers
11:38:47 q?
11:38:48 Caroline: proposes two directions: 1) principles for personal data, and 2) also consider technical issues, e.g. P3P
11:38:50 ack markharrison
11:38:50 markharrison, you wanted to say that an example of commercially sensitive data is serial-level traceability data (which can reveal inventory volumes, trading relationships, flow
11:38:53 ... patterns) - such data is shared on a 'need-to-know' basis but consumers might be interested in a high-level summary of this data, without needing access to every observation
11:38:53 ... event
11:39:50 q+ to say minimum requirement is legislation
11:40:07 markharrison: need permissions
11:40:15 q?
11:40:22 ... it's complicated
11:40:31 ack CarlosIglesias
11:40:31 CarlosIglesias, you wanted to say minimum requirement is legislation
11:41:29 q?
11:41:39 CarlosIglesias: minimal requirement is to comply with data legislation
11:42:25 q?
11:42:32 ack need
11:42:32 need, you wanted to respect Data Protection legislation - especially for personally identifiable data
11:43:10 q+ to ask if there is a Freedom of Information aspect to request not only what data a company collects about an individual - but to ask whether - and where that data is published on the web?
11:43:18 ack mark
11:43:18 markharrison, you wanted to ask if there is a Freedom of Information aspect to request not only what data a company collects about an individual - but to ask whether - and where
11:43:21 ... that data is published on the web?
11:44:13 markharrison: are there consequences for FOI requests from publishing data on the web?
11:46:33 provide different permissions to access data
11:46:46 q?
11:47:11 laufer_: why/how can we give permission to access data?
11:47:14 I can't hear you
11:47:24 q+ to say Data Security/Privacy is a matter of either public legislation or internal policy. Shouldn't we avoid any requirements? Should we make any recommendation on traceability and licencing?
11:47:38 q+
11:47:42 q+ to suggest considering the concept of security realms - to indicate in metadata what security credentials should be presented in order to gain access to the data
11:47:53 q?
11:48:16 ack Vagner_Br
11:48:16 Vagner_Br, you wanted to say Data Security/Privacy is a matter of either public legislation or internal policy. Shouldn't we avoid any requirements? Should we make any
11:48:19 ... recommendation on traceability and licencing?
11:49:22 Vagner_Br: data security and privacy are topics for government legislation and the internal policy of organisations
11:49:35 rrsagent, make logs public
11:49:40 \o/
11:52:51 q?
11:52:56 ack BrianMatthews
11:53:28 BrianMatthews: are we in a good position today to make concrete recommendations - should we park this discussion for now?
11:53:42 +1
11:53:46 +1
11:53:47 +1
11:53:48 ack markharrison
11:53:48 markharrison, you wanted to suggest considering the concept of security realms - to indicate in metadata what security credentials should be presented in order to gain access to
11:53:51 ... the data
11:53:53 +1
11:56:03 agree with markharrison
11:57:20 An RDF Schema for P3P 1.0: http://www.w3.org/TR/p3p-rdfschema/
11:57:57 and http://www.w3.org/TR/P3P/
11:58:09 Proposal: Acknowledging that much further discussion is needed on security, metadata could include information about security realms (see OASIS SAML/XACML) that apply to restricted-access data on the web. Realms indicate which security credentials need to be presented in order to be considered for access to the data.
11:59:10 +1
11:59:15 +1
11:59:22 +1
11:59:28 +1
11:59:50 +1
11:59:54 +1
12:00:13 +1
12:00:21 +1 and also note that other technologies such as http://www.w3.org/TR/P3P/ may also be relevant to consider in metadata
12:00:47 +1
12:00:59 RESOLVED: Acknowledging that much further discussion is needed on security, metadata could include information about security realms (see OASIS SAML/XACML) that apply to restricted-access data on the web. Realms indicate which security credentials need to be presented in order to be considered for access to the data.
12:00:59 +1
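
Purely as a sketch of the security-realm idea in this resolution (there is no agreed vocabulary for it, so apart from dcat:Dataset every property below is hypothetical):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix ex:   <http://example.org/security#> .   # hypothetical vocabulary

# A restricted-access dataset advertising which security realm applies,
# i.e. which credentials must be presented to be considered for access.
<http://example.org/dataset/patient-summaries> a dcat:Dataset ;
    ex:securityRealm <http://example.org/realms/health-research> ;
    ex:credentialType "SAML 2.0 assertion" ;    # cf. OASIS SAML/XACML
    ex:accessPolicy <http://example.org/policies/health-research.xacml> .
```
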
12:01:14 Proposal: Lunch now?
12:01:14 0
12:01:15 +1
12:01:27 +1
12:02:00 +10
12:02:09 +1 to breakfast
12:02:47 tks
12:03:01 JoaoPauloAlmeida_ has joined #dwbpbestpractices
12:03:07 +1
12:03:12 +11111
12:03:42 good breakfast to you JoaoPaulo
12:03:48 see you
12:19:43 JoaoPauloAlmeida has joined #dwbpbestpractices
13:04:04 HadleyBeeman has joined #dwbpbestpractices
13:12:02 rrsagent, draft minutes
13:12:02 I have made the request to generate http://www.w3.org/2014/04/01-dwbpbestpractices-minutes.html HadleyBeeman
13:15:26 markharrison has joined #dwbpbestpractices
13:17:01 We are moving on to skills and expertise.
13:17:10 scribenick: hadleybeeman
13:18:25 Laufer: A best practice is to study.
13:19:03 laufer has joined #dwbpbestpractices
13:19:12 +laufer
13:19:28 Ig_Bittencourt has joined #DWBPBestPractices
13:19:32 CarlosIglesias_ has joined #dwbpbestpractices
13:20:04 CarlosIglesias has joined #dwbpbestpractices
13:20:11 q+
13:20:16 BrianMatthews has joined #DWBPbestpractices
13:20:18 ack ig
13:20:33 http://www.w3.org/TR/2014/NOTE-ld-bp-20140109/
13:20:35 IG_bittencourt: The W3C has some best practices about that.
13:21:00 ...: For example, best practices on publishing linked data. Not useful for any kind of data, but useful for linked data.
13:21:08 q+ to add again that, again, we should not focus only on data offer but include also demand
13:21:39 ack carlos
13:21:39 CarlosIglesias, you wanted to add again that, again, we should not focus only on data offer but include also demand
13:22:50 CarlosIglesias: We should broaden the focus to include data demand too. Engage with data reusers, help them to acquire the skills needed for data reuse. Universities with business, the civil society organisations, etc. Not just about the skills of the government; all those in the ecosystem.
13:23:44 ... We should help data reusers as well. Skills and collaboration. Better data culture within society, etc.
13:24:45 Laufer: Searching for people who are interested in the same things, and participating in a community, will encourage the reuse of the data.
13:25:52 CarlosIglesias: Will also help address the problems faced by data reusers. For example, civil society organisations often don't have skills in IT. Most people don't have these skills. It's the entire data chain; involve them all in the process of opening the data.
13:26:47 q+ so can we recommend that publishers of data on the web should provide some simple examples of how it can be used / accessed and what it can be used for?
13:27:00 ack mark
13:27:28 Some useful references for this point:
13:27:30 http://www.businessofgovernment.org/report/designing-open-projects-lessons-internet-pioneers
13:27:33 +1 markharrison
13:27:39 markharrison: We should recommend to publishers of data on the web to include simple examples of how to use their data. Gives data reusers confidence to try.
13:27:43 And other good documentation
13:27:43 http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/
13:28:00 JohnGoodwin has joined #DWBPBestPractices
13:28:07 were already mentioned on the first F2F day
13:28:11 +1 markharrison
13:31:31 Caroline has joined #dwbpbestpractices
13:32:00 TOPIC: Skills/Expertise
13:33:33 hadley has joined #dwbpbestpractices
13:33:41 google hangout https://plus.google.com/hangouts/_/72cpi53b5160goccfvbb33ho18
13:34:24 sure
13:34:57 it is ok here too
13:35:12 Scribe: markharrison
13:35:33 scribenick: markharrison
13:36:02 q?
13:36:13 Thank you Hadley!
13:36:36 q+
13:36:48 q-
13:37:04 Vagner_Br has joined #dwbpbestpractices
13:37:58 +q
13:38:07 ack CarlosIglesias
13:38:35 CarlosIglesias: was discussing the capability / capacity of data publishers and also data re-users / potential re-users.
13:39:10 should we consider the '5-stars of data engagement' http://www.timdavies.org.uk/2012/01/21/5-stars-of-open-data-engagement/ and build on these
13:40:04 ... Not only provide best practices for the data provider. Data providers should encourage re-use by building capacity for external re-users
13:40:09 JoaoPauloAlmeida has joined #dwbpbestpractices
13:40:40 laufer: useful to incentivise a community of data providers and data users - so both sides can enhance their skills and understand each other's needs
13:40:53 ack JohnGoodwin
13:41:23 JohnGoodwin: 5-star scheme of data engagement - see link above (Tim Davies)
13:41:31 http://www.businessofgovernment.org/report/designing-open-projects-lessons-internet-pioneers
13:41:57 CarlosIglesias: and also see link http://www.businessofgovernment.org/report/designing-open-projects-lessons-internet-pioneers
13:42:39 Vagner_Br: concern about burdening data publishers with the task of encouraging re-use
13:43:01 ... more of a recommendation than a requirement
13:43:15 CarlosIglesias: perhaps a SHOULD, not a MUST
13:43:29 q?
13:44:02 One way of encouraging reuse is to have metrics for data, by "promoting" them each time they are reused and reported
13:44:11 ... also include for each best practice some real-world example of how it is being done
13:44:33 ... serves as a proof of implementation of best practices
13:44:52 laufer: so it's a use case we can point to?
13:45:12 q?
13:46:06 Vagner_Br: just wanted to say that we should take care not to only consider the perspective / responsibilities of the data provider
13:46:43 ... expectations in terms of expertise, opening up the data. Would also like to consider the perspective of the consumers and re-users of the data
13:47:12 ... they should encourage government / publishers to open up the data - it's a two-way street
13:47:31 q?
13:47:50 q+
13:47:59 laufer: CarlosIglesias also raised these points - and the need for synergies between data providers and data re-users / consumers - and incentivisation and feedback
13:48:15 +1
13:48:27 Vagner_Br: the entire data ecosystem needs to play a role
13:48:41 ack Caroline
13:48:41 I can't hear you
13:49:41 Caroline: should we explain which roles / actors are in the ecosystem? - data publishers, data re-users, end-consumers of data
13:50:58 CarlosIglesias: also talk about collaboration and co-operation
13:51:03 http://blog.okfn.org/2011/03/31/building-the-open-data-ecosystem/
13:51:54 Participation and collaboration lessons from Internet pioneers:
13:52:03 Caroline: also mention the value of the communities that are already engaged
13:52:04 1 - Let everyone play
13:52:14 2 - Play nice
13:52:24 q?
13:52:25 Caroline: to encourage existing communities to grow
13:52:29 3 - Tell what you are doing while you are doing it
13:52:39 4 - Use multiple communication channels
13:52:46 5 - Give it away
13:52:47 +1 to CarlosIglesias
13:52:54 6 - Reach for the edges
13:53:02 7 - Take advantage of all organizations
13:53:12 8 - Design for participation
13:53:19 9 - Increase network impact
13:53:27 and 10 - Build platforms
13:53:44 +1
13:53:45 from http://www.businessofgovernment.org/report/designing-open-projects-lessons-internet-pioneers
13:53:47 +1 to CarlosIglesias
13:53:49 more details there
13:55:00 +q to say W3C already maintain lists of tools - but do we need more commentary to say why these are useful
13:55:20 on a side note
13:55:28 ack markharrison
13:55:28 markharrison, you wanted to say W3C already maintain lists of tools - but do we need more commentary to say why these are useful
13:55:30 may be worth also looking at http://www.webfoundation.org/wp-content/uploads/2013/06/OGD-Indonesia-FINAL-for-publication.pdf
13:55:58 and http://data.worldbank.org/sites/default/files/1/od_readiness_-_revised_v2.pdf
13:56:39 to see which dimensions are usually associated with (open) data
13:57:06 e.g. provide more background in addition to what is already at https://www.w3.org/2001/sw/wiki/Tools
13:57:56 q+
13:58:07 laufer: interaction within the ecosystem is a way to improve the skills and expertise of all actors
13:58:18 ack Ig_Bittencourt
13:58:20 ... and to provide feedback and incentivisation
13:59:16 q+ to say Are we saying that: The interaction between the ecosystem's actors is the way to increase the expertise and skill among them?
13:59:25 Ig_Bittencourt: even if we provide links to tools, we need to provide more info to guide which tools to use for specific purposes
13:59:42 ... e.g. could also be helpful to provide benchmarking
14:00:27 ack Vagner_Br
14:00:27 Vagner_Br, you wanted to say Are we saying that: The interaction between the ecosystem's actors is the way to increase the expertise and skill among them?
14:00:30 q?
14:01:33 q+
14:01:40 ack Ig_Bittencourt
14:02:25 Ig_Bittencourt: if we consider all actors, it may be interesting to provide some step-by-step guide for publishing data / linked data
14:02:31 q?
14:02:38 ack CarlosIglesias
14:03:05 I guess the interaction among the actors in the ecosystem could help increase the quality of the data (detecting and reporting errors, etc)
14:03:17 I think it is important to consider the consumers too
14:03:25 +1 to gatemezi and nathalia
14:03:32 CarlosIglesias: In all projects, a big part is to consider participation - via hackathons, collaboration with entrepreneurs - we can provide some high-level guidance on this
14:03:36 q+
14:03:38 how people can reuse the data for other purposes
14:04:11 ... also including educational materials for school and university at every level
14:04:42 ... think about engagement techniques - discuss further later
14:04:42 ack Vagner_Br
14:04:47 q+ to say we may suggest examples of interaction among actors, not how-to
14:05:44 Vagner_Br: perhaps not a step-by-step guide. Many already exist. We see our role as providing real examples of interactions among actors.
14:06:30 CarlosIglesias: not only hack events, but also direct access to the platform (one-click registration) and sharing reuse cases and applications.
14:06:44 Agree that it is not to build a step-by-step guide but to link to them, e.g. http://www.w3.org/TR/2014/NOTE-ld-bp-20140109/
14:07:59 +1 to Ig and Vagner
14:09:31 rrsagent, draft minutes
14:09:31 I have made the request to generate http://www.w3.org/2014/04/01-dwbpbestpractices-minutes.html HadleyBeeman
14:09:40 q?
14:09:50 ack Vagner_Br
14:09:50 Vagner_Br, you wanted to say we may suggest examples of interaction among actors, not how-to
14:10:41 Like this? --- the interaction among the actors in the ecosystem could help increase the skills among them and the value of the data (detecting and reporting errors, etc)
14:10:51 we should make sure that we state certain practices that, when followed, lead to better interaction among actors in the ecosystem
14:11:09 +1 to JoaoPauloAlmeida
14:11:18 this should be our mission, to then increase the value of the whole ecosystem
14:11:47 if you follow this advice (best practices) then the data you publish can be more valuable to others
14:12:07 +1
14:12:15 I mean in general to all our best practices
14:12:21 q+
14:12:33 our mission is to produce advice to make this ecosystem viable and valuable
14:14:05 JoaoPauloAlmeida: we can only read you - we cannot hear you by audio on Google Hangout
14:14:49 ok, I am also only able to follow in text. So, perhaps my comment is a bit out of context. I understood that Vagner_Br was reasoning on the mission of the group, ...
14:14:58 laufer: the collective effect of collaboration improves the value of the data by improving the expertise of all actors
14:15:00 ack Ig_Bittencourt
14:15:04 JoaoPauloAlmeida: I think what we are saying is that there should be a "way" for different actors to come together and "speak about" the data, the way it is used, etc. And this can be achieved via many channels (hackathons, other events, etc.)
14:15:21 +1 to gatemezi
14:15:31 thanks gatemezi, that clarifies
14:15:35 - need feedback loop from users to publishers
14:15:45 who on the hangout can hear us?
14:15:58 I can't...
14:16:04 I'm not hearing well