<scribe> scribe: Karen
Wendy: Let's start with our
agenda curation
... and introductions
<wseltzer> https://github.com/AdRoll/privacy/blob/main/SPURFOWL.md
Wendy: We have Mikko and team
from NextRoll offering a discussion of SPURFOWL
... thank you, Mikko
... and James asking about Verizon Media CONNECT ID
... I wonder whether we have a little more clarification on
what the question is there
... to see if we have someone available to share a bit of
information
Wendell: yes, I can speak to that
Wendy: thank you
... any other business that people want to discuss or add for a
future agenda?
Kris: I just emailed
... make sure folks aware that the private click discussion
will be happening in the Privacy CG
Wendy: thanks, Kris. Great reminder that there are discussions on-going in the Privacy Community Group
<kris_chapman> Here's the github issue: https://github.com/privacycg/private-click-measurement/issues/57
Wendy: Kris invites us to take
particular note of the private click measurement reporting
discussions
... they meet every second and fourth Thursday at noon
Eastern
... do we have any intros from new participants?
Alessandro Pireno: interested to be here
Wendy: welcome, Alessandro
... Let us move to our SPURFOWL agenda
... what would you like to tell us?
<wseltzer> https://github.com/AdRoll/privacy/blob/main/SPURFOWL.md
Mikko: thank you; can you hear
me?
... the link is on irc
... I have some slides
... but I know not everyone is able to see
... [reads acronym]...took a lot of effort
... what is the idea...to be able to do
<wseltzer> https://github.com/AdRoll/privacy/blob/main/SPURFOWL.md#motivation
Mikko: what I would call
complicated reporting when data stays inside the browser
... multi-touch attribution models are one example
... another is a bounce rate you can compute with first party
traffic
... but for organic and non-organic traffic, you need to know
which users saw the impression
<gendler> present
Mikko: to slice
... environment for privacy first
[next slide]
scribe: secure aggregation
framework
... whether we aggregate...by individual
... multiparty aggregation service
... browser computes metrics
scribe: aggregates in such a way
to keep privacy
... based on idea that there is a secure application
framework
... now to the actual proposal
... two things
... first is what we call the "trail store"
<wseltzer> https://github.com/AdRoll/privacy/blob/main/SPURFOWL.md#spurfowl-summary
scribe: all the impression
events, first party events; JS objects and is write-only except
for the JS functions
... they compute the report
... combine these two mechanisms
... click and report in trail store
... go to site and see browsing different locations and put
those into the trail store
... events both from the impression and the first-party
data
... the trail store is write only
<dialtone> I think he muted himself :)
scribe: could code some of
data
... but see what private events happen
... see what JS functions
... that is really the meat
... I will walk through an example of what is going on
... let's say I am shoes.com site
... first thing that happens, new user appears on my site
... when they visit a new page, I have a new piece of JS on the
page
... I can attach whatever metadata I want
<weiler> are these slides available? where?
scribe: like I just logged into
shoes.com, I can store in the trail store
... user has done that
... next steps user navigates away; they did not buy
anything
... some mechanism, they get served some ads, retargeting
... maybe with one of the proposals we have discussed, like
Turtledove
... we want to be able to insert trail store at the impression
time
... and add whatever metadata we have
... what is the bidding price when we pushed this event
... or other metadata
... and store that in the trail store
... second part, finally
... after some time, we have populated some first-party events
in the trail store
... then we fetch the reporting code to compute the
reporting
... sandbox functions...cannot do network host, or do
persistent changes in browser
... we get the read from the trail store, not just the
write
... so we can compute what we want, like attribution
model
... we want this function with whatever metric we want and
return from the reporting function
... final part, report is sent out
... look at some other proposals, this is standard in the @
aggregation proposal
... send report to some address
... this framework computes
... shoes.com never learns what the sandbox functions are
computing
... make sure you cannot compute something that reveals info
about a particular user
... that is one example
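[Editor's illustration, not from the call: a minimal Python sketch of the shoes.com walk-through above. The names TrailStore, push, run_report and last_touch_report are hypothetical and are not the SPURFOWL API; the proposal itself envisions JS functions running in a browser sandbox, and the last-touch rule is just one possible attribution model.]

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass(frozen=True)
class Event:
    kind: str            # e.g. "page_view", "impression", "purchase"
    timestamp: float     # seconds on any consistent clock
    metadata: dict = field(default_factory=dict)


class TrailStore:
    """Hypothetical store: pages may push, only reporting functions may read."""

    def __init__(self) -> None:
        self._events: list[Event] = []

    def push(self, kind: str, timestamp: float, **metadata: Any) -> None:
        # Write-only from the page's point of view: nothing is returned.
        self._events.append(Event(kind, timestamp, dict(metadata)))

    def run_report(self, report_fn: Callable[[tuple[Event, ...]], dict]) -> dict:
        # The reporting function gets a read-only snapshot in time order.
        # A real sandbox would also block network access and persistent
        # changes; plain Python cannot enforce that, so it is only noted here.
        trail = tuple(sorted(self._events, key=lambda e: e.timestamp))
        return report_fn(trail)


def last_touch_report(trail: tuple[Event, ...]) -> dict:
    """Credit each purchase to the most recent impression before it."""
    credit: dict[str, int] = {}
    last_campaign = None
    for event in trail:
        if event.kind == "impression":
            last_campaign = event.metadata.get("campaign")
        elif event.kind == "purchase" and last_campaign is not None:
            credit[last_campaign] = credit.get(last_campaign, 0) + 1
    return credit


store = TrailStore()
store.push("page_view", 1.0, page="/sneakers")          # first-party event on shoes.com
store.push("impression", 2.0, campaign="A", bid=0.40)   # retargeting ad, bid price as metadata
store.push("impression", 3.0, campaign="B", bid=0.55)
store.push("purchase", 4.0, value=80)                   # user returns and buys
report = store.run_report(last_touch_report)
```

[In the flow described on the call, the returned report would go to an aggregation service rather than directly to shoes.com.]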
... more technical details and code examples on how this would
actually work technically
... in practice it would likely contact a DSP
... first push
... figure out when that would be honored
scribe: same-origin policy
... to do that for same impressions and clicks
... shoes.com...establishes which DSP
... and some security issues
... if users hack the browser....
... mechanism to have a proof
... metric about page view
... you can say that my page views are between 0-100
... but not telling you exactly what that is
... point is there is a way to mitigate this
... cannot see what they are doing
... if someone is trying to do something bad, you can look at
the code
... we have a public Github repository and there is already an
issue
... on how we establish trust between publisher and
aggregator
... Main ideas, high level
... is browser should have concept of a trail store...with
first party data
... and we allow arbitrary functions over that data
... those are the main ideas
... I wanted to say
... I will now take questions
Wendy: Thank you very much
... I saw a question if we might have the slides available to
look at afterwards
... I can post or share a link
Charlieharrison: thanks a lot
for presenting this
... I like this proposal, it is interesting
<charlieharrison> https://github.com/WICG/conversion-measurement-api/blob/master/AGGREGATE.md#multi-touch-attribution-mta
Charlieharrison: in our
aggregate API explainer, we hand-waved something similar
... a sandbox to isolate a JS environment for multi-touch
attribution
... for rules based, and use own logic
... we hand-waved something similar
... you extended it and that's great
... two questions
... you mentioned organic as well as ad clicks
... want to understand how you plan on calling trail store push
in context of when there is only an organic link
... may not be embedded JS with organic links
... how do you plan to handle organic clicks
Mikko: I will post slides on
Github
... Charlie, I have not seen the MTA
... I think there was a strawman proposal
... this is kind of an extension to that
... for organic clicks
... first example, may not have been clear
... we may not know
... what is pushed into the trail store
... absence of report
... may have been a bad example, this proposal would not deal
with that
Charlieharrison: you mentioned
like attribution
... as one thing you could be computing
... any other examples of aggregate reports you want to
generate
... takes as input all the trails the user went through
Mikko: like a histogram of
impressions
... how to say
... first party data
<angelina_tl> i would ask for assisted and unassisted conversion reporting
Mikko: and how people behave; want to slice; see if behavior is different
<angelina_tl> and cross site duplication reporting
Mikko: we have a frequency
metric
... cannot access it now
... I have another proposal
... this proposal is not entirely tied to adv
... maybe an event, a teleforum
... see what events people are viewing
... want to be nice and do privacy
... do this telemetry
... doing in such a way as not to show private info
... with this reporting you can push events and get results
back later
... advertising is like a special case
charlieharrison: you are imagining
it as generic reporting
... cross-site data; only way to extricate is through aggregate
report
Mikko: yes
... idea is to install this
... details can be argued about
Valentino: If I may answer
more
... we have some web one proposals coming for ML
... all these proposals rely on this concept of a trail
store
... interesting to use it for a number of problems for
reporting and machine learning
... and other things we have not thought about
Charlieharrison: thank you
Ben: I like this idea, what we
have called a "path to conversion"; trail store makes
sense
... I like way in which we attribute conversions, being
externalized outside browser
... when browser determines choice for all companies, it
centralizes a lot of power in their hands
... doing your own attribution methodology is good; it
decentralizes it
... question...stuck on running arbitrary functions
... do they have to be pre-computed and specified before this
is running
... or run multiple queries that were not envisioned after the
fact
... how do you meet epsilon privacy requirement; data base you
can query multiple times?
Mikko: if you have link
please?
... not 100 percent confident I understood your question
... asking about third party functions after the fact, what
code
Ben: how to support arbitrary
functions; like here is the function on the ad tag
... browser knows and is computing
... and that is sent out
... or raw data exists in some service
Mikko: raw data stays in the
trail store
... first argument is the reporting origin
... where to send JS code
... browser @ the code
... whatever that code is at the time is what it is; not true
fetch
... whoever is running that end point can change the code
... not so much @
... does that answer make sense to you?
Ben: I think you are saying the
browser maintains the trail store; some time out
... before computation sends out; it downloads computation at
that time...
[missed]
Mikko: I did not talk about this
in the proposal
... periodically reporting code
... part where we download that code at different time
... and download that file
... some details about that
<weiler> Ben: data can only be used by the function that asks it to be collected? [as least, that's how I heard you, Ben]
<btsavage> Similar proposal: https://github.com/w3c/web-advertising/blob/master/privacy_preserving_multi_touch_attribution_and_cross_publisher_lift_measurement.md
Mikko: as long as you push event
to trail store, browser would compute
... no longer fetch
... that's it
... there was a third question, but I forgot
Ben: it doesn't apply
... it looks like you run computation one time in browser and
lose the raw inputs
... cannot put out computation, get result and debug
... that is impossible because raw data is lost, you cannot go
back in time
Mikko: whenever you run function
again
... every time you see the trail of events
... some kinds of cap on how much browser keeps
... we don't remove the data in the trail store when we do the
reporting
... needed to compute the reports
... some attribution models
... keep data in the trail store even when you do multiple
times
Ben: when you run....[a report,
it doesn't delete the trail store?]
... you can potentially run multiple computations and pull out
multiple reports
Mikko: yes, I believe so
... may be an issue there
Ben: If you have some data
base
... if you are running one query against it
... to achieve a differential privacy epsilon of one
number....
... to have two queries...you need more noise
... can combine info together
... usability
Mikko: if we can do this offline on Github
+1
Ben: would you be willing to do
that?
... yes
Mikko: so, open an issue and put
your thoughts there
... I would appreciate that
Wendy: thank you
... I closed the queue because we have a long list of
people
... let's see if we can get short exchanges
... and we can bring back on a follow up call
... and Github for deeper explorations
<weiler> Ben++ for the great question
Brian: I have been thinking about
something along these lines
... for getting info from browser to second and third
parties
... wonder if it makes sense to have an upfront
registration
... collects info...user involved in negotiation, determine if
data collection is allowed
... only data allowed based on a strict schema
... and data exfiltrated by browser
... and designated to be privacy preserving, done on
schedule
... browser collects data for attribution, provides that data
that cleans it of user ID
... don't need to know user
... and provide that to server that provides to a third
party
... so browser gets more involved, and has more well defined
semantics
... browser determines trail and how it interacts with outside
world
Mikko: that was one of ideas we
were thinking, instead of arbitrary JS
... a pre-defined schema
... a little more flexible
... maybe we want to do something more exotic than this
Brian: having people register
up-front is more privacy-preserving
... if my domain is interested, I can signal to have it
captured and not interact with browser from that point forward,
and user can say not to collect data on me
Mikko: I understand
Kleber: thank you for the proposal
Kleber: I agree, I like it; it
seems in line with our privacy sandbox approach on how to
handle data
... I was definitely concerned by some of same questions that
Ben brought up
... the ability to report out from same data store multiple
times
... agree on what the threat model is
... if ability to write to data store is arbitrary and not
strict schema associated with it
... we have to assume the trail store has my first party
cookie
... that info could be available in the trail store
... only by limiting the trail store
... so whatever restrictions we put on processing and
reporting
... have to be strict enough
... to protect the crown jewels
Mikko: yes, the trail store can
contain sensitive data
... aggregation will preserve the privacy
... and @ from the aggregation site
Kleber: we are in agreement
... only question I have
... can there be a time limit on the trail store
... you mentioned limit on number of events
... wonder about time limit, like clear after 30 days
Mikko: yes, there is a list of
limits to adhere to
... time limit
... and event limit
... yes, there should be a time limit on the events as
well
... browser should be picky on how much time
... how much memory they can take
... encourages whoever is writing to make efficient code
... from privacy perspective makes sense to limit time
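[Editor's illustration, not from the call: a minimal Python sketch of applying both a time limit and an event limit to a trail store. The function name prune_trail, the (timestamp, kind) tuple layout, and the 30-day and 2-event values are assumptions chosen for the example.]

```python
DAY = 86_400  # seconds in a day


def prune_trail(events, now, max_age_seconds, max_events):
    """Drop events older than max_age_seconds, then keep only the newest max_events.

    `events` is a list of (timestamp, kind) tuples; this layout is hypothetical.
    """
    fresh = [e for e in events if now - e[0] <= max_age_seconds]
    fresh.sort(key=lambda e: e[0])
    # Guard against max_events == 0, since fresh[-0:] would return everything.
    return fresh[-max_events:] if max_events > 0 else []


events = [
    (0 * DAY, "page_view"),
    (10 * DAY, "impression"),
    (25 * DAY, "impression"),
    (29 * DAY, "purchase"),
]
kept = prune_trail(events, now=35 * DAY, max_age_seconds=30 * DAY, max_events=2)
```

[Here the day-0 page view is dropped by the time limit, and the event cap then keeps only the two most recent remaining events.]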
Kleber: thank you
brodriguez: Mikko, thank
you
... concepts of arbitrary JS
... and @
... is similar to TERN, PARRROT, SPARROW TD extension
concepts
... explore how to do against multiple use cases
Mikko: yes, I agree
... there is some tie-in
... idea of keeping events in history
... anything about bidding, we should share that mechanism
unless no good reason not to
Sam: I think that Ben's
question
... is going to be a key part of the proposal
... look forward to clarifying that
... wonder if we can more strictly tie the trail store to the
reporting
[missed]
scribe: would that work?
<btsavage> I filed an issue here: https://github.com/AdRoll/privacy/issues/5
Mikko: that might work
... might have a large number of trail stores, but could be
inefficient
... my first thoughts
Eriktaubeneck: follow up on one
thing that Ben and Michael chatted about
... if we are allowing for arbitrary JS to run
... I think we need strongest amount of privacy Michael talking
about
... we would need event level differential privacy for each
event pushed into the trail store
... aggregate functions and pre-defined schema
... we could use global differential privacy
... noise to be added is only once
... instead of once for every input
... more utility for same amount of privacy
... consider an extension
... subset of well defined schemas and functions
... along with flexibility of JS to use differential
privacy
... stick to schema and well known functions
... you can still use differential privacy
... predefined some of this
... in the global differential privacy model
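[Editor's illustration, not from the call: a simplified Python comparison of noise added once per input versus once on the aggregate. It assumes Laplace noise with the same scale (sensitivity / epsilon) in both cases and independent draws; the function names are hypothetical, and real event-level and global mechanisms differ in more ways than this.]

```python
import math


def laplace_std(sensitivity, epsilon):
    """Standard deviation of Laplace noise with scale sensitivity / epsilon."""
    return math.sqrt(2.0) * sensitivity / epsilon


def noise_std_event_level(n_events, sensitivity, epsilon):
    # Independent noise on each of n inputs: variances add, so the
    # standard deviation of the summed noise grows with sqrt(n).
    return math.sqrt(n_events) * laplace_std(sensitivity, epsilon)


def noise_std_global(sensitivity, epsilon):
    # Noise is added a single time to the aggregate.
    return laplace_std(sensitivity, epsilon)


ratio = noise_std_event_level(10_000, 1.0, 1.0) / noise_std_global(1.0, 1.0)
```

[Under these assumptions the per-input approach carries sqrt(n) times more noise in a sum over n events, which is the utility gap the comment above points at.]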
Mikko: that sounds useful to
me
... and get less noise in the results
... that sounds useful; thank you for writing it on Github
Angelina: I am not the most tech
savvy for coding
... coming from buy side
... I have some questions
... as you develop proposals and how things function
... from media planner and buyer perspective
... how long that info is stored
... sometimes we do pull queries throughout the year
... we would look at different reports with diff outcomes
... like overall reach and frequency over time
... get a different number if pulling on weekly, quarterly or
annual basis
... paths to conversions
... at the aggregate level
<joshua_koran> @Angelina your comment re buyer perspective also raises how fraud detection (malicious script masquerading as browser) can be detected if only partial data is sent
Angelina: I understand there is a
concern if too many queries being recorded there is possible
way to get more insights on an individual
... but if we can keep more anonymous that would be great
... I consistently pull month over month reports.... cross-site
duplication reports, assisted and unassisted,
path-to-conversion
... if there is way to support
... for these queries to be done in the browser, I don't
care
... as long as I can get these insights
Mikko: this is technical details
for how long to store in browser
... question you are asking is secure aggregation service
... not familiar with how to query results
charlie: I can jump in
... similar to question of how many times we use data in trail
store to generate a report
... and include in query multiple times
... and query in a weekly rollup
... and monthly and quarterly rollup
... all this means is
... this gets back to the similar set of mitigations we would
need to implement to follow along
... with Ben's comment
... add more noise as you generate more
... as you query more data
... natural thing to do is add limits on number of times you
query
... and scale noise on that maximum queries cap you set
... we go into a little bit of detail in the aggregation
service explainer on record level and user level noise
... add these limits for how often you do these
... or privacy degrades to no privacy as you query an infinite
number of times
... simplest thing to understand is every report can contribute
to five queries, and scale noise to the hard limit that we
set
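[Editor's illustration, not from the call: a minimal Python sketch of capping the number of queries and scaling noise to that cap. It assumes Laplace noise and basic sequential composition (each query gets epsilon_total / max_queries); the class name ReportBudget and its parameters are hypothetical and are not taken from the aggregation service explainer.]

```python
import random


class ReportBudget:
    """Hypothetical tracker: a hard cap on queries, with noise scaled to that cap."""

    def __init__(self, epsilon_total, max_queries, sensitivity=1.0):
        self.max_queries = max_queries
        self.queries_used = 0
        # Splitting the budget evenly means each query uses
        # epsilon_total / max_queries, so the Laplace scale grows with the cap.
        self.scale = sensitivity * max_queries / epsilon_total

    def noisy_answer(self, true_value, rng):
        if self.queries_used >= self.max_queries:
            raise RuntimeError("query cap reached for this data")
        self.queries_used += 1
        # The difference of two exponentials with mean `scale`
        # is a Laplace(0, scale) sample.
        rate = 1.0 / self.scale
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        return true_value + noise


budget = ReportBudget(epsilon_total=1.0, max_queries=5)
rng = random.Random(0)
answers = [budget.noisy_answer(120, rng) for _ in range(5)]  # e.g. five rollups
```

[A sixth call raises instead of answering, which is the hard limit; a cap of five with a total epsilon of 1 gives five times the noise scale of a single query.]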
Wendy: thank you, Mikko for this
proposal
... you already have open issues on Github
... invite you to bring back for more discussion for future
agenda
... leave a little time
... for James' question to Wendell
<wseltzer> zakim take up agendum 3
Wendy: on the Verizon Media ID
Wendell: I can give a general overview, unless James, you had a specific question?
Valentino: I don't think James is on the call
Wendell: in roughly beginning of
Dec., Verizon announced product in market now
... licensed product that requires business terms
... not intended to be part of the discussions here in web
platform
... there is a press release
... gives you features, functions, intents and a contact if you
wish to avail yourself of this service in your operations
<wseltzer> -> Press release, https://www.verizonmedia.com/en-au/press/2020/12/02/Verizon-Media-launches-ConnectID-solution
Wendell: Verizon is still
committed to the tech that the web tech players are
developing
... continue to attend here
... and build private web solutions based on what [names
browsers] are building
Wendy: Thanks, Wendell
Ben: glancing at it now
... it does not say how this ConnectID is computed
... says it's consent based; assume it's #PII
Wendell: ask you to contact the
business team for the specific terms
... and tech being used
... It's not part of the web
... that is the Verizon Media view
Wendy: thanks for that
differentiation
... our project here is to build technologies for open web
interoperability
... good to hear about what is going on in other places
... and hear what feedback that gives us for open web
proposals
... if no other questions for Wendell, I'm happy to go back to
the NextRoll presentation
... Joao, do you still have a question?
<Matt_Z> joao had to jump
<Matt_Z> i can ask his question
scribe: any other questions?
Kris: more of a comment
... I like the idea
... and interested to see where it goes
... a lot of this we tend to think of being advertising all
within same browser
<Matt_Z> i was double muted! figured it out
Kris: but there are a lot of
situations where you are crossing lines
... things like email marketing
... where not necessarily able to execute
... a JS in the original environment but only do when clicked
through to end environment
... we should explore further
... when only have one page view
... when things crossing in-app to web browser
... that is a general challenge we have talked about but not
figured out proposals
Matt_Z: here I am
<btsavage> One could sync the "trail store" between the various browsers belonging to the same person (a la this proposal: https://github.com/w3c/web-advertising/blob/master/cross-browser-anonymous-conversion-reporting.md)
Matt_Z: try to capture spirit of
Joao's question
... when Neustar presented PELICAN
... SPURFOWL paradigms are compatible
... one of big questions I think we have a path to
answering
... is how to make these arbitrary functions run in a privacy
preserving way
... when ad impression or ad clicks or conversion happens on
advertiser site
... can pass...let this list of three vendors get this
info
... and browser sends encrypted reports to ad tech
vendors
... and vendors uses helper servers to run arbitrary code
... server doesn't let out data that is not aggregated
... as long as metadata associated with trail store
events
... [missed]
... one way to do this
... to define roles more
... instead of running in code
Mikko: I missed question
... trail store
... I refer to motivation for why to do things
... there is a link to the PELICAN proposal
M: we will publish information and make comments
<angelina_tl> will need to allow for multiple reporting companies - marketers can have multiple agencies using different accounts, across different platforms.
Wendy: great
... we are at time
... sounds as though we have lots of interest and
discussion
... scribe asks to fill in gaps where concepts got missed
<angelina_tl> BTW - just wanna say - Karen is great Scriber
Wendy: you can do that in irc and when minutes are posted afterwards
<kleber> Apologies, I won't be able to join next week
Wendy: thanks for all the energy; see you next week
<wseltzer> [adjourned]
Present: kleber Karen wbaker Mikjuo dialtone ionel kris_chapman bmay cpn Mike_Pisula_Xaxis tomkershaw arnoldrw wseltzer dkwestbr eriktaubeneck imeyers AramZS pl_mrcy zerth mlerra GarrettJohnson blassey jrobert mjv hober apascoe weiler pedro_alvarado
Scribe: Karen
Agenda: https://lists.w3.org/Archives/Public/public-web-adv/2020Dec/0006.html