<hwest> Hey all - waiting on the phone connection before we get started
<hwest> Anyone on the phone?
<tlr> we can do skype if there's anybody remote.
<tlr> Right now, I'm not sure there is anybody
<rigo> RvE: red orange and green logs
<rigo> red have lots of identifying and precious information e.g. TomTom with geolocation stuff in it
<rigo> in TomTom, they had an ID and changed it to a pseudonymous ID; within 24 hours you could translate it back, and after 24 hours they break that link
<rigo> unlink data and maintain data that can be only de-identified on behavioral patterns
<rigo> so 1/ de-identification
<hwest> Anyone on the phone?
<rigo> 2/ and manage the risk of re-identification
<rigo> wanted to have speed average information in aggregate, did not want to throw data away
<rigo> SW: de-identification being bundled, technical operational and administrative
<rigo> RvE: de-identification by ?? In order to have it work. Can only work if the time the link is active is limited. To get from orange to green you have to de-link it
<rigo> ... if you want to list all the people used, it will get very difficult. Establish de-identification by throwing away the SALT
<rigo> ... wanted to prove that red-orange-green works well. Full URI == red
<rigo> ... URI + link to identifiable data == orange
<rigo> ... throw away the link == green
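A minimal sketch of the hash-and-throw-away-the-SALT idea described above. It assumes a keyed hash (HMAC-SHA-256) stands in for the hashing step, which the discussion did not specify. The class and method names are hypothetical, and the length of time the SALT is kept is left to the caller.

```python
import hashlib
import hmac
import secrets


class SaltedPseudonymizer:
    """Hypothetical helper: orange while the SALT exists, green once it is discarded."""

    def __init__(self):
        # Random secret SALT; whoever holds it can still link records (orange).
        self._salt = secrets.token_bytes(32)

    def pseudonym(self, identifier: str) -> str:
        """Replace a raw (red) identifier with a keyed hash."""
        if self._salt is None:
            raise RuntimeError("SALT discarded; new records can no longer be linked")
        return hmac.new(self._salt, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

    def discard_salt(self) -> None:
        """Throw away the SALT so stored pseudonyms cannot be recomputed from raw IDs."""
        self._salt = None

    @property
    def stage(self) -> str:
        return "orange" if self._salt is not None else "green"


p = SaltedPseudonymizer()
token = p.pseudonym("user-123")   # same input gives the same token while the SALT lives
p.discard_salt()                  # after this, the link to "user-123" is broken
```

Whether data in the green stage is still re-identifiable through behavioral patterns is a separate risk question that this sketch does not address.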
<rigo> JeffWilson (JW): did you do final analysis and result
<rigo> RvE: did so after investigation, was able to quantify the risk, risk was low
<rigo> SW: APEC: the most stringent things are in Japan and South Korea. Most of what we discuss would satisfy APEC
<rigo> RW: so finding consensus on something that satisfies Art. 29 would allow us to play everywhere
<rigo> SW: yes, would have worded it differently, we have to try to find common ground
<rigo> RvE: color scheme, green is mostly uninteresting to DPA as normally also accompanied with safeguards
<rigo> HW: with hashing and throwing away the SALT, how often should the SALT be rotated?
<rigo> RvE: if you do not throw away the SALT, it remains in orange
<rigo> SW: as soon as you break the SALT to chunks. In the chunks we do not want to be prescriptive.
<rigo> RvE: agree, depends on the purposes of processing
<rigo> RW: what does that mean?
<hwest> Jeff: all contextual as to what the right period here would be or when something carries personal information
<rigo> FW: only for permitted uses, we have already identified concrete purposes there. Can have different things. Web analytics could be days, some traffic measures need less
<rigo> HW: recapitulates discussion to peterswire
<rigo> SW: we haven't created consensus on security. There we can not do de-identification. There retention periods are longer and not de-identify
<rigo> ... there you would rely on technical operational and administrative measures
Rob: if you want to store data because you're worried about click fraud
it would be good to just store the data you need
Rob: look at history of Data Retention Directive in EU
there was long discussion about what was necessary to retain
and it was separate from how long
Rob: the whole question of what you actually need to accomplish your goal is a relevant one
what I don't hear in the DNT discussion often is taking account of what I really need to accomplish that
Jeff: Rigo mentioned AOL data incident
there was a small number of people exposed with a data set that large
<rigo> RvE: Security is not a green card to do whatever. Click fraud, e.g.: long discussion about Data Retention Directive showed that we needed it for 6 months to 3 years. Relevant discussion. Real need and necessity for security data collection should be discussed
Even PHI released for public analysis and research is a higher threshold
I want to make sure that if we're saying that there is something to learn, that it was far under the radar
that should count as deidentified data.
Rigo - my point is procedural.
Theoretically in the future decrypting could take one second
the standard we are doing can't provide this - we can only have a momentary way to fix this for now
so that it's workable
because we are having in my opinion
we are sitting between 2 extremes - do nothing and run the cart into the wall
how do you do usable privacy with concrete technical hints
You need to go into the red-tape field and ask technicians
our client is the average website provider
I'm concerned that if we identify rules that require certification, then that's the end of it
Shane - I agree
with that sentiment
deID is not a simple problem.
All concepts of DeID - we have today's state where there is none. Typically, small to medium size companies don't know what to do with that
They don't try to address the data
They probably don't even know what they have
Asking them to rotate keys, do admin controls, etc. - small companies won't know what to do
They can just delete the data.
They can choose not to implement DNT
or I do think that this will spawn a new business and there will be companies
that give you a server plugin that will do deidentification for server users
I think it's important to keep small and medium size businesses in mind
Rigo - we have a good understanding of what we could do
I would, at the risk of creating some kind of error, compare this to the environmental movement
where 30 years ago they were laughed at
and today this creates a billion dollar business.
If we are laying boundaries for businesses, we should be aware of it. This should be workable for businesses.
Shane - we have Small Biz at Yahoo
we host hundreds of thousands of sites today
There are store systems, payment systems, etc.
Rigo - where is the pain point
Heather - The pain will be around what is appropriate
Let's note down that we have agreement on Rob's plan
we need to drill down on Rob's approach for the three color plan
Heather - we need more context
Rigo - we can go to the main group about Rob's approach
We think of this as an approach that could be supported
the trouble is now defining how the links would be constructed - how they would be thrown away.
How is the risk assessment done?
Heather - i would go further
For now, we can work with the hash, throw away the salt technical mechanism,
we should talk about the policy
Rigo - I am against that because once you get a tech solution that everyone agrees to, policy people will bicker and wash away the tech side
Heather - in my mind, we are here to discuss the policy stuff
talk through the definitions
if we are going to address lifetime browsing history, then what is the policy that should be implemented
Because otherwise, there will be differences
Rob - it's about accountability -
<rigo> RvE: time is accountability
Shane - a lot of people see DeIDed and delinked to be the same
<rigo> SW: don't like colors, need new names
depends on your definition
<rigo> => discussion on names for the three buckets we have
Rob - there is a misconception in the US about what is anonymous and pseudonymous
<rigo> RvE: we should not talk about anonymous/pseudonymous
<rigo> stage 1 /2/ 3/
<rigo> JW: raw, transition, de-identified
<rigo> red == raw data
<rigo> orange == obfuscated
<rigo> green == de-identified (that includes de-identified event data and completely aggregated data)
<rigo> orange still change
<rigo> line between orange and green is whether it is still considered personal data anymore. If re-identification is too hard, it goes green
<rigo> red: all of data of red exists also in orange, replace all identifiers with a lookup table. In rotating hash with SALT as orange. You still have the knowledge of SALT and key. Can link in your domain
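The lookup-table variant mentioned above could be sketched as follows. This is only an illustration under stated assumptions: the function name `to_managed`, the `user_id` field, and the event layout are hypothetical, and random tokens are one possible choice of replacement identifier.

```python
import secrets


def to_managed(raw_events, id_field="user_id"):
    """Replace every raw identifier with a random token via a lookup table.

    Returns the managed (orange) events plus the lookup table. While the
    table exists, the holder can still link tokens back to raw identifiers;
    deleting it removes that direct link.
    """
    lookup = {}   # raw identifier -> random token
    managed = []
    for event in raw_events:
        real_id = event[id_field]
        if real_id not in lookup:
            lookup[real_id] = secrets.token_hex(8)
        # Copy the event so the raw (red) record is left untouched.
        managed.append({**event, id_field: lookup[real_id]})
    return managed, lookup


raw = [
    {"user_id": "alice", "url": "/a"},
    {"user_id": "bob", "url": "/b"},
    {"user_id": "alice", "url": "/c"},
]
managed, lookup = to_managed(raw)
# Events from the same user still share a token, so linking within the domain works.
```

Everything else in each event (here the URL) is carried over unchanged, so any stripping of other fields would have to be a separate step.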
<hwest> Ok, so raw event data -> managed data -> deidentified data
<rigo> orange == managed data
<rigo> managed data: all characteristics of raw plus some new
<rigo> JW: we could have permitted uses depending on what kind of data is the object. So Security should remain raw. Once it is managed, change requirements
<hwest> Is anyone on the phone?
<rigo> RvE: start with raw. managed, de-identified and then talk about permitted uses
<hwest> Rigo: Several axes - one is time and one is restrictions and one is sensitivity
<hwest> Rigo: The more sensitive it is, the more restrictions we apply.
<rigo> SW: managed means stripping
<rigo> RW: what stripping means is not clear yet
<rigo> HW: some data will remain high level (gives scoring example)
<rigo> RvE: this is augmentation of identity data
<rigo> RW: We have a rather clear understanding of raw and de-identified, but in between we have a range of possibilities and need to hash them out
<rigo> JW: orange data could be stripped to have it less rich than the raw data you got.
Minutes formatted by scribe.perl Revision: 1.137 of Date: 2012/09/20 20:19:01. Scribe (inferred): RichardatcomScore_. Date from IRC log name: 11 Feb 2013. Guessed minutes URL: http://www.w3.org/2013/02/11-dnte-minutes.html