Improving Web Advertising BG -- 08 Dec 2020

<scribe> scribe: Karen

Wendy: Let's start with our agenda curation
... and introductions

Agenda-curation, introductions

<wseltzer> https://github.com/AdRoll/privacy/blob/main/SPURFOWL.md

Wendy: We have Mikko and team from NextRoll offering a discussion of SPURFOWL
... thank you, Mikko
... and James asking about Verizon Media CONNECT ID
... I wonder whether we have a little more clarification on what the question is there
... to see if we have someone available to share a bit of information

Wendell: yes, I can speak to that

Wendy: thank you
... any other business that people want to discuss or add for a future agenda?

Kris: I just emailed
... make sure folks are aware that the private click discussion will be happening in the Privacy CG

Wendy: thanks, Kris. Great reminder that there are discussions on-going in the Privacy Community Group

<kris_chapman> Here's the github issue: https://github.com/privacycg/private-click-measurement/issues/57

Wendy: Kris invites us to take particular note of the private click measurement reporting discussions
... they meet every second and fourth Thursday at noon Eastern
... do we have any intros from new participants?

Alessandro Pireno: interested to be here

SPURFOWL

Wendy: welcome, Alessandro
... Let us move to our SPURFOWL agenda
... what would you like to tell us?

<wseltzer> https://github.com/AdRoll/privacy/blob/main/SPURFOWL.md

Mikko: thank you; can you hear me?
... the link is on irc
... I have some slides
... but I know not everyone is able to see
... [reads acronym]...took a lot of effort
... what is the idea...to be able to do

<wseltzer> https://github.com/AdRoll/privacy/blob/main/SPURFOWL.md#motivation

Mikko: what I would call complicated reporting when data stays inside the browser
... multi-touch attribution models are one example
... another is a bounce rate you can compute with first party traffic
... but for organic and non-organic traffic, you need to know which users saw the impression

<gendler> present

Mikko: to slice
... environment for privacy first

[next slide]

scribe: secure aggregation framework
... whether we aggregate...by individual
... multiparty aggregation service
... browser computes metrics

scribe: aggregates in such a way to keep privacy
... based on idea that there is a secure application framework
... now to the actual proposal
... two things
... first is what we call the "trail store"

<wseltzer> https://github.com/AdRoll/privacy/blob/main/SPURFOWL.md#spurfowl-summary

scribe: all the impression events, first party events; JS objects and is write-only except for the JS functions
... they compute the report
... combine these two mechanisms
... click and report in trail store
... go to site and see browsing different locations and put those into the trail store
... events both from the impression and the first-party data
... the trail store is write only

<dialtone> I think he muted himself :)

scribe: could code some of data
... but see what private events happen
... see what JS functions
... that is really the meat
... I will walk through an example of what is going on
... let's say I am shoes.com site
... first thing that happens, new user appears on my site
... when they visit a new page, I have a new piece of JS on the page
... I can attach whatever metadata I want

<weiler> are these slides available? where?

scribe: like I just logged into shoes.com, I can store in the trail store
... user has done that
... next steps user navigates away; they did not buy anything
... some mechanism, they get served some ads, retargeting
... maybe with one of the proposals we have discussed, like Turtledove
... we want to be able to insert trail store at the impression time
... and add whatever metadata we have
... what is the bidding price when we pushed this event
... or other metadata
... and store that in the trail store
... second part, finally
... after some time, we have populated some first-party events in the trail store
... then we fetch the reporting code to compute the reporting
... sandbox functions...cannot do network host, or do persistent changes in browser
... we get the read from the trail store, not just the write
... so we can compute what we want, like attribution model
... we want this function with whatever metric we want and return from the reporting function
... final part, report is sent out
... look at some other proposals, this is standard in the @ aggregation proposal
... send report to some address
... this framework computes
... shoes.com never learns what the sandbox functions are computing
... make sure you cannot compute something that reveals info about a particular user
... that is one example
... more technical details and code examples on how this would actually work technically
... in practice it would likely contact a DSP
... first push
... figure out when that would be honored

scribe: same-origin policy
... to do that for same impressions and clicks
... shoes.com...establishes which DSP
... and some security issues
... if users hack the browser....
... mechanism to have a proof
... metric about page view
... you can say that my page views are between 0-100
... but not telling you exactly what that is
... point is there is a way to mitigate this
... cannot see what they are doing
... if someone is trying to do something bad, you can look at the code
... we have a public Github repository and there is already an issue
... on how we establish trust between publisher and aggregator
... Main ideas, high level
... is browser should have concept of a trail store...with first party data
... and we allow arbitrary functions over that data
... those are the main ideas
... I wanted to say
... I will now take questions
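
A minimal sketch of the flow described above, using Python as a stand-in for the browser. The names `TrailStore`, `push`, `run_report`, and `last_touch` are hypothetical and are not taken from the SPURFOWL explainer. The sandbox is only simulated here by handing the reporting function a copy of the events; a real one would also have to block network access and persistent changes.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Event = Dict[str, Any]


@dataclass
class TrailStore:
    """Hypothetical stand-in for a browser-held trail store."""

    _events: List[Event] = field(default_factory=list)

    def push(self, kind: str, **metadata: Any) -> None:
        # Page scripts can only append; there is no public read method.
        self._events.append({"kind": kind, **metadata})

    def run_report(self, report_fn: Callable[[Tuple[Event, ...]], Any]) -> Any:
        # Only the reporting function sees the trail, and only as a copy,
        # so it cannot alter the stored events.
        snapshot = tuple(dict(e) for e in self._events)
        return report_fn(snapshot)


def last_touch(trail: Tuple[Event, ...]) -> Dict[str, int]:
    """Credit each conversion to the most recent preceding impression."""
    credit: Dict[str, int] = {}
    last_campaign: Optional[str] = None
    for event in trail:
        if event["kind"] == "impression":
            last_campaign = event["campaign"]
        elif event["kind"] == "conversion" and last_campaign is not None:
            credit[last_campaign] = credit.get(last_campaign, 0) + 1
    return credit


# First-party and impression events are pushed as they happen.
store = TrailStore()
store.push("page_view", site="shoes.com", logged_in=True)
store.push("impression", campaign="retargeting-a", bid_price=1.25)
store.push("impression", campaign="retargeting-b", bid_price=0.80)
store.push("conversion", site="shoes.com")

# Later, the reporting code is run over the whole trail.
report = store.run_report(last_touch)
print(report)
```

In this sketch the events stay in the store after the report runs, and the value returned by `run_report` is what would be handed to an aggregation service rather than to shoes.com directly.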

Wendy: Thank you very much
... I saw a question if we might have the slides available to look at afterwards
... I can post or share a link

Charlieharrison: thanks a lot for presenting this
... I like this proposal, it is interesting

<charlieharrison> https://github.com/WICG/conversion-measurement-api/blob/master/AGGREGATE.md#multi-touch-attribution-mta

Charlieharrison: in our aggregate API explainer, we hand-waved something similar
... a sandbox to isolate a JS environment for multi-touch attribution
... for rules based, and use own logic
... we hand-waved something similar
... you extended it and that's great
... two questions
... you mentioned organic as well as ad clicks
... want to understand how you plan on calling trail store push in context of when there is only an organic link
... may not be embedded JS with organic clicks
... how do you plan to handle organic clicks

Mikko: I will post slides on Github
... Charlie, I have not seen the MTA
... I think there was a strawman proposal
... this is kind of an extension to that
... for organic clicks
... first example, may not have been clear
... we may not know
... what is pushed into the trail store
... absence of report
... may have been a bad example, this proposal would not deal with that

Charlieharrison: you mentioned like attribution
... as one thing you could be computing
... any other examples of aggregate reports you want to generate
... takes as input all the trails the user went through

Mikko: like a histogram of impressions
... how to say
... first party data

<angelina_tl> i would ask for assisted and unassisted conversion reporting

Mikko: and how people behave; want to slice; see if behavior is different

<angelina_tl> and cross site duplication reporting

Mikko: we have a frequency metric
... cannot access it now
... I have another proposal
... this proposal is not entirely tied to adv
... maybe an event, a teleforum
... see what events people are viewing
... want to be nice and do privacy
... do this telemetry
... doing in such a way as not to show private info
... with this reporting you can push events and get results back later
... advertising is like a special case

Charlieharrison: you are imagining it as generic reporting
... cross-site data; only way to extricate is through aggregate report

Mikko: yes
... idea is to install this
... details can be argued about

Valentino: If I may answer more
... we have some web one proposals coming for ML
... all these proposals rely on this concept of a trail store
... interesting to use it for a number of problems for reporting and machine learning
... and other things we have not thought about

Charlieharrison: thank you

Ben: I like this idea, what we have called a "path to conversion"; trail store makes sense
... I like way in which we attribute conversions, being externalized outside browser
... when browser determines choice for all companies, it centralizes a lot of power in their hands
... doing your own attribution methodology is good; it decentralizes it
... question...stuck on running arbitrary functions
... do they have to be pre-computed and specified before this is running
... or run multiple queries that were not envisioned after the fact
... how do you meet epsilon privacy requirement; database you can query multiple times?

Mikko: if you have link please?
... not 100 percent confident I understood your question
... asking about third party functions after the fact, what code

Ben: how to support arbitrary functions; like here is the function on the ad tag
... browser knows and is computing
... and that is sent out
... or raw data exists in some service

Mikko: raw data stays in the trail store
... first argument is the reporting origin
... where to send JS code
... browser @ the code
... whatever that code is at the time is what it is; not true fetch
... whoever is running that end point can change the code
... not so much @
... does that answer make sense to you?

Ben: I think you are saying the browser maintains the trail store; some time out
... before computation sends out; it downloads computation at that time...

[missed]

Mikko: I did not talk about this in the proposal
... periodically reporting code
... part where we download that code at different time
... and download that file
... some details about that

<weiler> Ben: data can only be used by the function that asks it to be collected? [at least, that's how I heard you, Ben]

Mikko: as long as you push event to trail store, browser would compute
... no longer fetch
... that's it
... there was a third question, but I forgot

Ben: it doesn't apply
... it looks like you run computation one time in browser and lose the raw inputs
... cannot put out computation, get result and debug
... that is impossible because raw data is lost, you cannot go back in time

Mikko: whenever you run function again
... every time you see the trail of events
... some kinds of cap on how much browser keeps
... we don't remove the data in the trail store when we do the reporting
... needed to compute the reports
... some attribution models
... keep data in the trail store even when you do multiple times

Ben: when you run....[a report, it doesn't delete the trail store?]
... you can potentially run multiple computations and pull out multiple reports

Mikko: yes, I believe so
... may be an issue there

Ben: If you have some database
... if you are running one query against it
... to achieve a differential privacy epsilon of one number....
... to have two queries...you need more noise
... can combine info together
... usability

Mikko: if we can do this offline on Github

Ben: would you be willing to do that?
... yes

Mikko: so, open an issue and put your thoughts there
... I would appreciate that

Wendy: thank you
... I closed the queue because we have a long list of people
... let's see if we can get short exchanges
... and we can bring back on a follow up call
... and Github for deeper explorations

<weiler> Ben++ for the great question

Brian: I have been thinking about something along these lines
... for getting info from browser to second and third parties
... wonder if it makes sense to have an upfront registration
... collects info...user involved in negotiation, determine if data collection is allowed
... only data allowed based on a strict schema
... and data exfiltrated by browser
... and designated to be privacy preserving, done on schedule
... browser collects data for attribution, provides that data that cleans it of user ID
... don't need to know user
... and provide that to server that provides to a third party
... so browser gets more involved, and has more well defined semantics
... browser determines trail and how it interacts with outside world

Mikko: that was one of ideas we were thinking, instead of arbitrary JS
... a pre-defined schema
... a little more flexible
... maybe we want to do something more exotic than this

Brian: having people register up-front is more privacy-preserving
... if my domain is interested, I can signal to have it captured and not interact with browser from that point forward, and user can say not to collect data on me

Mikko: I understand

Kleber: thank you for the proposal

Kleber: I agree, I like it; it seems in line with our privacy sandbox approach on how to handle data
... I was definitely concerned by some of same questions that Ben brought up
... the ability to report out from same data store multiple times
... agree on what the threat model is
... if ability to write to data store is arbitrary and not strict schema associated with it
... we have to assume the trail store has my first party cookie
... that info could be available in the trail store
... only by limiting the trail store
... so whatever restrictions we put on processing and reporting
... have to be strict enough
... to protect the crown jewels

Mikko: yes, the trail store can contain sensitive data
... aggregation will preserve the privacy
... and @ from the aggregation site

Kleber: we are in agreement
... only question I have
... can there be a time limit on the trail store
... you mentioned limit on number of events
... wonder about time limit, like clear after 30 days

Mikko: yes, there is a list of limits to adhere to
... time limit
... and event limit
... yes, there should be a time limit on the events as well
... browser should be picky on how much time
... how much memory they can take
... encourages whoever is writing to make efficient code
... from privacy perspective makes sense to limit time
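
A minimal sketch of the two limits discussed above (a time limit and an event limit), assuming a Python stand-in for the browser. The class name `BoundedTrail`, the 30-day default (taken from Kleber's example), and the 1000-event cap are illustrative assumptions; no concrete numbers were agreed on the call. The sketch also assumes timestamps arrive in non-decreasing order.

```python
import time
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple

MAX_AGE_SECONDS = 30 * 24 * 60 * 60  # assumed 30-day window
MAX_EVENTS = 1000                    # assumed cap; no number was given


class BoundedTrail:
    """Hypothetical trail store that enforces both retention limits."""

    def __init__(self, max_age: float = MAX_AGE_SECONDS,
                 max_events: int = MAX_EVENTS) -> None:
        self.max_age = max_age
        self.max_events = max_events
        self._events: Deque[Tuple[float, Dict[str, Any]]] = deque()

    def push(self, event: Dict[str, Any], now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._events.append((now, event))
        self._prune(now)

    def _prune(self, now: float) -> None:
        # Drop events older than the time limit (oldest sit at the left).
        while self._events and now - self._events[0][0] > self.max_age:
            self._events.popleft()
        # Then enforce the event-count limit, again dropping oldest first.
        while len(self._events) > self.max_events:
            self._events.popleft()

    def __len__(self) -> int:
        return len(self._events)


trail = BoundedTrail(max_age=10, max_events=3)
for t in (0, 1, 2, 3):
    trail.push({"kind": "page_view"}, now=t)
count_after_cap = len(trail)     # the count limit has dropped the oldest event

trail.push({"kind": "impression"}, now=20)
count_after_expiry = len(trail)  # the time limit has dropped the earlier events
print(count_after_cap, count_after_expiry)
```

Pruning on every push keeps memory bounded without a background timer; a browser could equally prune when a report is computed.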

Kleber: thank you

brodriguez: Mikko, thank you
... concepts of arbitrary JS
... and @
... is similar to TERN, PARRROT, SPARROW TD extension concepts
... explore how to do against multiple use cases

Mikko: yes, I agree
... there is some tie-in
... idea of keeping events in history
... anything about bidding, we should share that mechanism unless no good reason not to

Sam: I think that Ben's question
... is going to be a key part of the proposal
... look forward to clarifying that
... wonder if we can more strictly tie the trail store to the reporting

[missed]

scribe: would that work?

<btsavage> I filed an issue here: https://github.com/AdRoll/privacy/issues/5

Mikko: that might work
... might have a large number of trail stores, but could be inefficient
... my first thoughts

Eriktaubeneck: follow up on one thing that Ben and Michael chatted about
... if we are allowing for arbitrary JS to run
... I think we need strongest amount of privacy Michael talking about
... we would need event level differential privacy for each event pushed into the trail store
... aggregate functions and pre-defined schema
... we could use global differential privacy
... noise to be added is only once
... instead of once for every input
... more utility for same amount of privacy
... consider an extension
... subset of well defined schemas and functions
... along with flexibility of JS to use differential privacy
... stick to schema and well known functions
... you can still use differential privacy
... predefined some of this
... in the global differential privacy model
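
A rough illustration of the utility point above, assuming Laplace noise in both settings so the two are directly comparable. This is a simplification: practical per-event mechanisms often use other techniques, such as randomized response. The function names are hypothetical. It compares the standard deviation of the error in a sum when noise is added once per input with the error when noise is added once to the aggregate.

```python
import math


def laplace_std(sensitivity: float, epsilon: float) -> float:
    """Standard deviation of Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return math.sqrt(2.0) * scale


def per_input_noise_std(n_inputs: int, sensitivity: float, epsilon: float) -> float:
    # Independent noise on each input: variances add, so the error in the
    # sum grows with the square root of the number of inputs.
    return math.sqrt(n_inputs) * laplace_std(sensitivity, epsilon)


def single_noise_std(sensitivity: float, epsilon: float) -> float:
    # Noise drawn once on the aggregate: independent of the number of inputs.
    return laplace_std(sensitivity, epsilon)


n = 10_000
local = per_input_noise_std(n, sensitivity=1.0, epsilon=1.0)
central = single_noise_std(sensitivity=1.0, epsilon=1.0)
print(f"per-input noise std: {local:.1f}")
print(f"single noise std:    {central:.1f}")
print(f"ratio:               {local / central:.0f}")
```

Under these assumptions the gap grows as the square root of the number of inputs, which is the "more utility for same amount of privacy" argument for pre-defined schemas and aggregate functions.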

Mikko: that sounds useful to me
... and get less noise in the results
... that sounds useful; thank you for writing it on Github

Angelina: I am not the most tech savvy for coding
... coming from buy side
... I have some questions
... as you develop proposals and how things function
... from media planner and buyer perspective
... how long that info is stored
... sometimes we do pull queries throughout the year
... we would look at different reports with diff outcomes
... like overall reach and frequency over time
... get a different number if pulling on weekly, quarterly or annual basis
... paths to conversions
... at the aggregate level

<joshua_koran> @Angelina your comment re buyer perspective also raises how fraud detection (malicious script masquerading as browser) can be detected if only partial data is sent

Angelina: I understand there is a concern if too many queries being recorded there is possible way to get more insights on an individual
... but if we can keep more anonymous that would be great
... I consistently pull month over month reports.... cross-site duplication reports, assisted and unassisted, path-to-conversion
... if there is way to support
... for these queries to be done in the browser, I don't care
... as long as I can get these insights

Mikko: this is technical details for how long to store in browser
... question you are asking is secure aggregation service
... not familiar with how to query results

charlie: I can jump in
... similar to question of how many times we use data in trail store to generate a report
... and include in query multiple times
... and query in a weekly rollup
... and monthly and quarterly rollup
... all this means is
... this gets back to the similar set of mitigations we would need to implement to follow along
... with Ben's comment
... add more noise as you generate more
... as you query more data
... natural thing to do is add limits on number of times you query
... and scale noise on that maximum queries cap you set
... we go into a little bit of detail in the aggregation service explainer on record level and user level noise
... add these limits for how often you do these
... or privacy degrades to no privacy as you query an infinite number of times
... simplest thing to understand is every report can contribute to five queries, and scale noise to the hard limit that we set
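
A minimal sketch of the query cap described above, assuming basic sequential composition: a total budget `total_epsilon` is split evenly across at most `max_queries` queries, so the Laplace scale is `max_queries * sensitivity / total_epsilon`. The class `QueryBudget` and its parameters are hypothetical and are not taken from the aggregation service explainer.

```python
import random
from typing import Optional


class QueryBudget:
    """Caps the number of noisy queries and scales noise to that cap."""

    def __init__(self, total_epsilon: float, max_queries: int,
                 sensitivity: float = 1.0, seed: Optional[int] = None) -> None:
        if total_epsilon <= 0 or max_queries <= 0:
            raise ValueError("total_epsilon and max_queries must be positive")
        # Each query gets total_epsilon / max_queries of the budget, so the
        # noise scale grows linearly with the cap.
        self.scale = max_queries * sensitivity / total_epsilon
        self.remaining = max_queries
        self._rng = random.Random(seed)

    def _laplace(self) -> float:
        # The difference of two i.i.d. Exp(1) draws is Laplace(0, 1);
        # multiplying by self.scale gives Laplace(0, self.scale).
        return self.scale * (self._rng.expovariate(1.0) - self._rng.expovariate(1.0))

    def noisy_answer(self, true_value: float) -> float:
        if self.remaining == 0:
            raise RuntimeError("query cap reached")
        self.remaining -= 1
        return true_value + self._laplace()


budget = QueryBudget(total_epsilon=1.0, max_queries=5, seed=0)
answers = [budget.noisy_answer(120.0) for _ in range(5)]

try:
    budget.noisy_answer(120.0)
    sixth_allowed = True
except RuntimeError:
    sixth_allowed = False

print(budget.scale, sixth_allowed)
```

Fixing the cap up front means weekly, monthly, and quarterly rollups all draw on the same budget; raising the cap allows more rollups at the cost of noisier answers for each.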

Wendy: thank you, Mikko for this proposal
... you already have open issues on Github
... invite you to bring back for more discussion for future agenda
... leave a little time
... for James' question to Wendell

<wseltzer> zakim take up agendum 3

Wendy: on the Verizon Media ID

Verizon Media ConnectID

Wendell: I can give a general overview, unless James, you had a specific question?

Valentino: I don't think James is on the call

Wendell: in roughly beginning of Dec., Verizon announced product in market now
... licensed product that requires business terms
... not intended to be part of the discussions here in web platform
... there is a press release
... gives you features, functions, intents and a contact if you wish to avail yourself of this service in your operations

<wseltzer> -> Press release, https://www.verizonmedia.com/en-au/press/2020/12/02/Verizon-Media-launches-ConnectID-solution

Wendell: Verizon is still committed to the tech that the web tech players are developing
... continue to attend here
... and build private web solutions based on what [names browsers] are building

Wendy: Thanks, Wendell

Ben: glancing at it now
... it does not say how this ConnectID is computed
... says it's consent based; assume it's #PII

Wendell: ask you to contact the business team for the specific terms
... and tech being used
... It's not part of the web
... that is the Verizon Media view

Wendy: thanks for that differentiation
... our project here is to build technologies for open web interoperability
... good to hear about what is going on in other places
... and hear what feedback that gives us for open web proposals
... if no other questions for Wendell, I'm happy to go back to the NextRoll presentation
... Joao, do you still have a question?

<Matt_Z> joao had to jump

<Matt_Z> i can ask his question

scribe: any other questions?

Kris: more of a comment
... I like the idea
... and interested to see where it goes
... a lot of this we tend to think of being advertising all within same browser

<Matt_Z> i was double muted! figured it out

Kris: but there are a lot of situations where you are crossing lines
... things like email marketing
... where not nec able to execute
... a JS in the original environment but only do when clicked through to end environment
... we should explore further
... when only have one page view
... when things crossing in-app to web browser
... that is a general challenge we have talked about but not figured out proposals

Matt_Z: here I am

<btsavage> One could sync the "trail store" between the various browsers belonging to the same person (a la this proposal: https://github.com/w3c/web-advertising/blob/master/cross-browser-anonymous-conversion-reporting.md)

Matt_Z: try to capture spirit of Joao's question
... when Neustar presented PELICAN
... SPURFOWL paradigms are compatible
... one of big questions I think we have a path to answering
... is how to make these arbitrary functions run in a privacy preserving way
... when ad impression or ad clicks or conversion happens on advertiser site
... can pass...let this list of three vendors get this info
... and browser sends encrypted reports to ad tech vendors
... and vendors uses helper servers to run arbitrary code
... server doesn't let out data that is not aggregated
... as long as metadata associated with trail store events
... [missed]
... one way to do this
... to define roles more
... instead of running in code

Mikko: I missed question
... trail store
... I refer to motivation for why to do things
... there is a link to the PELICAN proposal

M: we will publish information and make comments

<angelina_tl> will need to allow for multiple reporting companies - marketers can have multiple agencies using different accounts, across different platforms.

Wendy: great
... we are at time
... sounds as though we have lots of interest and discussion
... scribe asks to fill in gaps where concepts got missed

<angelina_tl> BTW - just wanna say - Karen is great Scriber

Wendy: you can do that in irc and when minutes are posted afterwards

<kleber> Apologies, I won't be able to join next week

Wendy: thanks for all the energy; see you next week

<wseltzer> [adjourned]
