W3C

– DRAFT –
Improving Web Advertising BG

02 March 2021

Attendees

Present
apireno_groupm, AramZS, arnaud_blanchard, arnoldrw, blassey, BLeparmentier, bmay, Brendan_IAB_and_eyeo, brodriguez, dialtone, dinesh, dkwestbr_, dmarti, ErikAnderson, eriktaubeneck, GarrettJohnson, gendler, hober, imeyers, jrosewell, Jukka, Karen, Kelda, kleber, kris_chapman, lbasdevant, mjv, nics, nlesko, pedro_alvarado, pl_mrcy, wbaker, weiler, wseltzer
Regrets
-
Chair
Wendy Seltzer
Scribe
Karen, Karen Myers

Meeting minutes

<wseltzer> https://github.com/microsoft/privacy-preserving-ads/blob/main/Parakeet.md

<wseltzer> https://github.com/google/ads-privacy/blob/master/proposals/dovekey/dovekey_auction.md

<wseltzer> https://w3c.github.io/web-advertising/dashboard/

<nics> present

Wendy: Welcome folks
… reminder that we use the IRC channel to queue
… so please join there as specified at the bottom of the agenda,
… and "present" yourself to let us know you are here
… and "q+" will add you to the speaking queue
… Our agenda for today

[Wendy reads through agenda items]

Agenda-curation, introductions

Wendy: Any new participants who would like to introduce themselves?

<jrosewell> Are we going to go through the outstanding questions from last week?

Airey Baringer: Hi, Airey Baringer from TripleLift

Rob Beeler from @

Wendy: Welcome
… when we get to outstanding questions from last week, we will start with those
… and then come back to the questions

PARAKEET

Wendy: First substantive agenda item is to hear about the PARAKEET proposal
… Kelda emailed us with an introduction to that proposal
… and said some colleagues from the Microsoft team would present that
… Kelda, please kick us off

Kelda: Super excited to be here today

<wseltzer> https://github.com/microsoft/privacy-preserving-ads/blob/main/Parakeet.md

Kelda: to introduce PARAKEET
… Eric and Mehul will present

Mehul: Thank you everyone
… can you see my screen?

Wendy: Looks good

Mehul: need to go through objectives

Aswath_Mohan: We'll go first through the objectives and then go through the privacy and anonymization aspects
… so please wait until end and then we will do Q&A at the end

Aswath: Talk about main objectives
… We want to improve user privacy and limit the ability to individually identify users across sites
… we want to introduce a measurable privacy parameter
… so you can think about what the trade-offs should be between monetization and privacy
… and maintain key monetization functions so web thrives for everybody
… and also see if we can limit the churn to existing models to the extent possible
… and introduce an idea where the privacy function is handled by the browser
… but the bidding and optimization stays with the DSPs and SSPs
… Now I will hand off to Mehul and Erik

Mehul: High level, advertiser site flow is similar to Turtledove

<wseltzer> https://github.com/microsoft/privacy-preserving-ads/blob/main/Parakeet.md#api-flow-for-ad-serving

Mehul: a JS API adds user features or IGs to browser storage

<bmay> I'm not seeing slides, are others?

Mehul: browser stores those user features or IGs, and we will talk about how they are anonymized
… request is initiated through a pre-defined API, similar to @ API
… that goes through proxy
… browser adds user features or IG vector and passes through this proxy
… that forwards to ad network, anonymizes
… that we will explain more in detail
… once proxy hands over to network
… you can see...and now ad network can leverage all the information
… retrieval, auction and pricing, and check if budget, then gives ad back to proxy
… and ad gets rendered into fenced frame on publisher side
… clicks go through that anonymizing proxy
… click is registered
… and updated ranking models based on click feedback
… We are still hashing out some of the inputs
… will work similar to degredated...
… will discuss more later
… important part is ad network has access to the anonymized context
… and completes functionality
… browser takes key role of managing privacy of user
… rest of functionality stays on the network side
… browser is thinking about what are key user features
… let me walk you through what we mean by transforming user features
… Key problem to solve
… each advertiser is adding features
… we think of the user as a global binary vector of dimension
… talking about 100K features or higher
… this is global binary vector across domains
… when we send this global binary vector, we want to make sure there are privacy features
… a simplest approach
… if we cluster binary vectors into K clusters
… think of 7 or 8 billion
… number of users is much smaller

[Mehul walks through mathematics on slide]
… We can show this construct manages privacy request
… If we don't do any clustering and keep current construct as is, then epsilon is 23, very high
… if we just do anonymization of features
… or segregate
… if we do just 10 percent, epsilon drops to 22.5
… but if p = 0.9, then epsilon = 18.3
… alternatives to define, key area of research
… when system is up and running a lot of variations could be done
… further analysis
… if we do something more...could implement faster
… and leverage p
… pick nearby cluster
… and call it metric DP
… or use n clustering, pick cluster from different clustering technique
… if we introduce privacy in the original advertiser network
… we are trying to make it present
… Talking through anonymization
… going back to original diagram
… S prime
… how to translate publisher context C
… how to transfer to C prime
… three key signals: contextual signals
… IAB forums are making progress on taxonomies
… we will drop title in the ad request
… ID active IP
… so that is specific location
… pick up IP in time epoch
… for device signal, client has succinct UA string
… When we pass S prime, that network cannot memorize
… final construct
… if a request fails to meet the privacy parameter
… we won't add s prime to the request
… if the time of the hour is so low that only one user is active, then s prime will not be passed
… or if signals highly required
… not lose anything from contextual signal
… we provide such a parameter
… Let's quickly talk about the key advantages and key challenges
… Advantages, people see c, s...in ad request to support retrieval...
… proxy to control fingerprinting
… some monetization trade-offs
… jointly work towards improving that
… sort of measurable privacy
… One of challenges, c prime, s prime together
… take great care of having this together in some sort of trusted server
… this requires great care
… why we are trying to put this anonymization step
… try to avoid attack
… continuously work on it
… we also have a more advanced proposal
… there is a bit of noise on prime, that could affect user
… ad could have seen
… we are mindful of that
… we feel there is a very clear tradeoff
… third is need trusted service for segment and ad request anonymization
… Let me quickly see relation to all the current proposals
… before we go to next step
… This forum is familiar with the proposals

[Explains Turtledove]
… led to FLEDGE
… where retargeted ad can be given
… browser in both cases issues private report
… APIs talking about different noise
… differentially private report
… PARAKEET is kind of similar, user IG feature
… propose user feature
… when ad requests are initiated, we put c prime in ad request itself
… and c' s'
… see auction and bid model in box
… in all three cases
… network can target
… where feedback is available
… looking side by side comparisons
… Next steps...Erik
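
Illustration (not presented in the meeting): the gating rule described above, where s' is withheld when a request does not meet the privacy parameter (for example when only one user is active in the time epoch), might be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the PARAKEET design: the names `AdRequest`, `anonymize_request`, `active_users_in_bucket`, and `min_users` are hypothetical, and the idea of counting users per coarse location and time epoch is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AdRequest:
    c_prime: Tuple[str, ...]            # anonymized contextual signals (c')
    s_prime: Optional[Tuple[int, ...]]  # anonymized user features (s'), None if withheld


def anonymize_request(
    c_prime: Sequence[str],
    s_prime: Sequence[int],
    active_users_in_bucket: int,
    min_users: int = 50,
) -> AdRequest:
    """Attach s' only when enough users share this request's bucket.

    active_users_in_bucket: hypothetical count of distinct users the proxy has
        seen for the same coarse location and time epoch.
    min_users: hypothetical stand-in for the privacy parameter.
    """
    if active_users_in_bucket < min_users:
        # Too few users: s' could single someone out, so only c' is forwarded.
        return AdRequest(c_prime=tuple(c_prime), s_prime=None)
    return AdRequest(c_prime=tuple(c_prime), s_prime=tuple(s_prime))


# Example: a quiet hour with one active user versus a busy one.
quiet = anonymize_request(["news"], [1, 0, 1], active_users_in_bucket=1)
busy = anonymize_request(["news"], [1, 0, 1], active_users_in_bucket=500)
```

In this sketch the contextual part of the request always goes through, which matches the point that the contextual signal is not lost when s' is dropped.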

<wseltzer> https://github.com/microsoft/privacy-preserving-ads/blob/main/Parakeet.md#alternative-solutions

<wseltzer> https://github.com/microsoft/privacy-preserving-ads/issues

Erik: Call out some next steps
… we would love to get feedback in GitHub
… sense of urgency
… if feedback later, that's fine, but if immediate, please file in next 2-3 weeks
… we are happy to have a dedicated session
… and we can figure out logistics
… and come back here or future meetings

Mehul: We can open up queue
… for some discussion

Wendy: thank you for that

Aram: thank you for this presentation
… helpful to parse out your proposal
… I had two questions that popped out to me
… first one
… noticed that it would not send the page URL or the page title as part of this process
… this strikes me as an immediate issue
… ad world is interested in block-listing and content adjacency
… signal...harder to tamper with
… say advertise against news, and show up on NYTimes.com but not on Breitbart.com
… how does this proposal handle that
… with removal of bid server from process, does it increase difficulty of direct sales
… advertisers who go direct
… but not necessarily through an SSP
… but through the publisher directly?

Mehul: first question first
… to answer question, we are using the contextual signal
… publisher paradigms
… if publisher is NYTimes or WSJ
… we know that
… additional thing, as long as we don't see a risk to be unique enough in that time, we will let it pass
… if that page is so unique, and only one or a few users
… publisher will help to featurize the flow
… publisher works with SSP or own web service
… who creates those signals
… page placement
… high level
… publisher ID will pass through but ad signal will not
… risk of privacy on that
… to answer question
… if advertiser is worried about certain things, they can participate or pull out of auction
… done as pull URL
… Second question
… header bidding
… once hand over request to S end point
… have it part of auction, can see who is part of auction; SSP
… it can pull in the pre-sold

Aram: a lot of the direct logic happens on page
… would you be open to an issue diving into this on GitHub?

Mehul: yes, that would be great

Michael: First of all, thank you very much
… delighted to see this proposal and have it out there as part of the conversation
… great to have additional implementers
… and I appreciate your explaining how PARAKEET relates to TD and FLEDGE
… in terms of details of proposal
… I would really love for a proposal like this to actually meet the needs of the ad ecosystem
… I would love to land on something like this
… In Chrome, we explored some ideas similar to this
… I will point out the things that are hard about this approach
… which is why we ended up separating the two
… As you say, there is always a privacy vs. utility tension
… Aram described that tension well
… something that comes up on utility side in ad industry
… as soon as you change contextual signals, there are the grand sequential issues
… called the "plane crash" signal
… if you don't know if "plane crash" will be propagated through proxy
… or if it doesn't come through ad network, it's hard to know if it's ok to run an ad on a particular page
… this is why TD and FLEDGE lets it pass through
… it seems hard
… in tension with this is the privacy problem
… with this approach, with a server in between ad network
… fundamental privacy problem is the joinability
… if you assume the pub page and ad network are also talking directly to one another
… pub and ad network...s prime
… and two connects c' and s' going through proxy and one not going through proxy
… those things happening at same time, is there enough information for ad network to join those two things up
… if there is
… if possible, I would be happy if your proxy version could work
… but it's hard enough that we thought it not possible to put this in

Mehul: thank you, Michael

[please slow down Mehul]
… other than policy control
… c' is good enough
… constructs we are working on
… incrementally improving this further
… c working but not in plain text
… we are looking at this vigorously what can be done
… c' reduced in granularity to address joinability risk
… what functionality it loses with publisher sensitivity control
… that is what we are thinking

<wseltzer> ... that's why we have c'

Michael: thanks

Brendan: In eyeo context
… echo Michael's comments on your honoring the efforts of TD and FLEDGE
… this also strikes me as something similar to eyeo
… this type of discussion on SPECTACLE and joinability are current
… and would like to have the team at eyeo talk with your team at MS
… we spoke two weeks ago and look at the end points

Mehul: If you can add your interest, we can follow up offline
… thank you

ErikT: Thank you for sharing this
… echo earlier comments; great to see this area maturing and getting more attention
… One thing I am unclear on is the proxy step
… where u turns into u' and c turns into c'
… seems like there is some entity that has a 'god view' of all requests
… and can do this clustering in a global environment
… and figure out how much obfuscation has to be removed from context to keep it in the privacy constraints
… the FLoC proposal does that with algorithm in the browser
… but with publisher, looks like it would not happen independently

Mehul: great question
… there are two steps divided
… steps of user anonymization
… turning s into s'
… doesn't need to know actual user ID
… can be different from request time
… I acknowledge this is a trusted service model
… it does have info with s' be a direct request
… and s turning into s' is separate step
… both steps could be done in peer to peer model
… we kept it high level
… but no reason this could not run in a distributed manner
… we are not commenting about it
… cloud service model, but doesn't need to have hosted service all the time

ErikT: is trusted service, or more like MPC solutions that Chrome has proposed
… amount of work and design of how you cluster
… would be significantly affected by that decision
… you are constrained in the MPC case

Mehul: not just trusted entity
… second thing...we do think there is a potential scenario
… service model works more like centralized service
… MPC where helper one and helper two are working together
… and algorithm can run in that construct

[missed]
… parties could do...with good compute resource

Angelina: this is not specifically directed to you but kind of directed
… what occurred to me when you did the side by side comparison
… there is no indication of the ad server
… and how specific ads would be triggered by parameters coming from SSPs and DSPs to Id which ads to actually serve
… you are following what other proposals have framed
… but ask we consider the actual ads that are being programmed and set for particular audiences and placements
… rotation schedules and such
… ask that also be considered
… talked about how browser hosts creative
… but what happens when client wants to swap out creative
… time sensitive
… could be things in example of plane crash
… may not pull ad but swap the creative in the ad
… just a general comment for everyone

Mehul: I can spend a minute or so on that
… but also add to GitHub
… we assume ad coming back should adhere to the rotation
… for size of ad rendered
… carries ad with on the publisher side
… we can further discuss if I misunderstood your question

Angelina: more of ask to include publishers and have full representation of the ad-call flow

Mehul: We will discuss

Angelina: And the same for TD and FLEDGE, the Google Chrome team

Andrew: thanks for presentation
… first question is a math question

<AramZS> @angelina I think we have similar questions. Would appreciate your feedback on the GitHub issue on PARAKEET I'm working on

Andrew: you said that with probability p you return the true closest centroid
… with probability (1-p)/(k-1) you return a random centroid
… these don't add up to one

Mehul: for each random cluster, so adding to 1-p
… if you replace with metric dp, it's proportional

Andrew: So like an equal probability

Mehul: we are trying not to deep dive into the math
… highlight is if clear trade off to be monetized
… look for ways to improve that

Andrew: Do you have any documents for the epsilon calculation?

Mehul: yes, I can provide those
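
Illustration (not presented in the meeting): the mechanism discussed in this exchange, where the true centroid is returned with probability p and each of the other k-1 centroids with probability (1-p)/(k-1), has the shape of k-ary randomized response. A minimal sketch follows, assuming that reading is right. The function names are hypothetical, and the epsilon formula is the standard one for k-ary randomized response, not necessarily the calculation behind the figures quoted earlier in these minutes.

```python
import math
import random


def noisy_cluster(true_cluster: int, k: int, p: float, rng: random.Random) -> int:
    """Return true_cluster with probability p, else one of the other k-1 uniformly."""
    if k < 2 or not 0.0 < p < 1.0:
        raise ValueError("need k >= 2 and 0 < p < 1")
    if rng.random() < p:
        return true_cluster
    # Draw from the k-1 remaining cluster ids, skipping over true_cluster.
    other = rng.randrange(k - 1)
    return other if other < true_cluster else other + 1


def epsilon(k: int, p: float) -> float:
    """Log-ratio between the most and least likely outputs for any two inputs."""
    q = (1.0 - p) / (k - 1)  # probability of each individual "wrong" cluster
    return abs(math.log(p / q))


# Example: the probabilities sum to one, p + (k-1) * (1-p)/(k-1) = 1.
rng = random.Random(0)
samples = [noisy_cluster(3, k=10, p=0.9, rng=rng) for _ in range(10_000)]
```

Under this reading, epsilon grows with both p and k, which is one way to see the trade-off between keeping the true cluster (utility) and hiding it (privacy).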

Viraj: thanks a lot for proposal
… how would this work in the time dimension when differential privacy needs to account for budget and multiple requests?

Mehul: where method comes into play
… not lose effectiveness, but provide reasonable amount of time
… any privacy construct does not hold with infinite time...
… noise to be added is proportional to @@@#

[too fast, sorry]
… this field is less likely to happen; for active user

Viraj: in reporting piece
… you might have less features to worry about
… ad call
… worry about two dimensions
… lower on time part
… when you actually get request for one page visit and get many ad requests
… more prominent
… I think this makes sense

Brian: thanks again for presentation
… I'm wondering if you have thought at all on impact on the long tail
… and what mitigations for long tail publishers?

Mehul: great questions; could be two impacts
… on publisher and user
… we are trying to get some prototype up and running and get some analysis
… short answer, when we do c' and s' it will impact long tail publishers
… ad network would not get in plain text; run on functions
… next step is a proposal for that

Brian: Ok, I'll keep my eyes peeled

Wendy: Thank you for pointing us to the GitHub repo and the open issues
… if you want to set aside an ad hoc meeting
… Thanks for the presentation; if you can make the slides available, that would be a useful link
… we look forward to the ongoing discussion

Mehul: thank you

Dovekey Auction

Wendy: We have nine minutes left
… Gang, you wanted to talk about DOVEKEY auction proposal

Gang: 8 mins left, I can talk at super high level
… and next time we can go into more details on how it works

<wseltzer> https://github.com/google/ads-privacy/blob/master/proposals/dovekey/dovekey_auction.md

Wendy: Sounds good

Gang: let me present
… First a little background
… at end of last year, Google published explainer
… DOVEKEY where we will cache
… from buy side

<wseltzer> https://github.com/google/ads-privacy/tree/master/proposals/dovekey

Gang: when ad tech generates request to key server
… server is simple
… after we published explainer, we got a lot of feedback

<wseltzer> https://github.com/google/ads-privacy/blob/master/proposals/dovekey/dovekey_auction.md#motivation

Gang: we discussed how to further improve the proposal to satisfy business needs of industry partners
… and still satisfy privacy needs of users
… we added more functionalities
… to provide more functionality and privacy at same time
… We added auction functionality into DOVEKEY server
… that will pick from among bids and take winner back to browser
… this will benefit user by reducing bandwidth need and reducing JS for other functionalities we had to put on the browser
… because of that we can put additional functionality into DK server
… add triggering moments
… address users experience
… and prevent microtargeting to prevent creepy ads for users
… and put precise and timely budget enforcement into the same server
… On top of those basic functionality
… we envision other functionality being executed in the DK server
… this is overall architectural diagram
… DK server sits between browser and SSP
… DK server has trust model to handle sensitive user data
… like user's IG membership
… Let me walk through how DK server works
… If there is ad slot running in browser
… send out ad request to DK server
… ad request would be contextual ad request generated by SSP ad request
… send to SSP
… in this contextual request you have page URL, ad slot IDs, whatever other contextual info that SSPs and DSPs need
… access controls, branding controls

<wseltzer> https://github.com/google/ads-privacy/blob/master/proposals/dovekey/DovekeyCombinedAuctionArchitecture.png

Gang: send to DSPs...return two kinds of response
… first one is unconditional bid

<kleber> DoveKey server diagram looks pretty similar to PARAKEET diagram!

Gang: want to pay $1.20 for this ad creative
… another one called a conditional bid
… based on sensitive user data that browser is not willing to share
… such as the IG membership
… example of conditional bid
… if user belongs to this IG
… that is a conditional bid
… both conditional and unconditional bids go back to SSP
… SSP picks highest unconditional bid and returns to DK server and all the conditional bids
… has trust model by which browsers are willing to send IG membership to DK server
… so DK can evaluate if user belongs to this IG or not
… some conditional bids are rejected
… because condition is not satisfied
… others are accepted to the auction
… those bids compete
… and DK picks a single winner
… we come back to step six
… sends impression back to DK server
… based on that ping back DK can do k-anonymity check
… can I show to at least K users
… or budget says this much
… or other requirements
… at super high level this is proposal
… DK model can access sensitive user data and can access on behalf of SSPs and DSPs
… next week I can provide more details
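
Illustration (not presented in the meeting): the bid flow described above, where the SSP returns its highest unconditional bid plus all conditional bids and the DK server keeps only the conditional bids whose interest group the user belongs to before picking a single winner, might be sketched as follows. This is a sketch under stated assumptions, not the Dovekey design: `Bid`, `run_auction`, and the interest group names are hypothetical, and picking the highest price is an assumed selection rule.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Bid:
    creative: str
    price: float
    required_ig: Optional[str] = None  # None marks an unconditional bid


def run_auction(
    best_unconditional: Bid,
    conditional_bids: Iterable[Bid],
    user_igs: Set[str],
) -> Bid:
    """Reject conditional bids whose condition fails, then pick the top price."""
    # A conditional bid is accepted only if the user is in its interest group.
    eligible = [b for b in conditional_bids if b.required_ig in user_igs]
    # max() keeps the first of equal prices, so the unconditional bid wins ties.
    return max([best_unconditional, *eligible], key=lambda b: b.price)


# Example: the user is in one of the two interest groups being bid on.
unconditional = Bid("generic", 1.20)
conditional = [
    Bid("shoes", 2.00, "shoe-shoppers"),
    Bid("cars", 3.00, "car-buyers"),
]
winner = run_auction(unconditional, conditional, {"shoe-shoppers"})
fallback = run_auction(unconditional, conditional, set())
```

Only the server evaluates `user_igs` in this sketch, which mirrors the point that interest group membership is sent to the DK server and not to the SSP or DSPs.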

Wendy: Thank you
… people can queue up issues beforehand
… and continue to send us requests for discussions and new proposals
… See you again next Tuesday, March 9, same time [11:00am ET]
… thank you all

<wseltzer> [adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Maybe present: Andrew, Angelina, Aram, Aswath, Aswath_Mohan, Brendan, Brian, Erik, ErikT, Gang, Mehul, Michael, Viraj, Wendy