Meeting minutes
<wseltzer> https://
<nics> present
Wendy: Welcome folks
… reminder that we use the IRC channel to manage the queue
… so please join there as specified at the bottom of the agenda,
… and "present" yourself to let us know you are here
… and "q+" will add you to the speaking queue
… Our agenda for today
[Wendy reads through agenda items]
Agenda-curation, introductions
Wendy: Any new participants who would like to introduce themselves?
<jrosewell> Are we going to go through the outstanding questions from last week?
Airey Baringer: Hi, Airey Baringer from Triple Lift
Rob Beeler from @
Wendy: Welcome
… when we get to outstanding questions from last week, we will start with those
… and then come back to the questions
PARAKEET
Wendy: First substantive agenda item is to hear about the PARAKEET proposal
… Kelda emailed us with an introduction to that proposal
… and said some colleagues from the Microsoft team would present that
… Kelda, please kick us off
Kelda: Super excited to be here today
<wseltzer> https://
Kelda: to introduce PARAKEET
… Eric and Mehul will present
Mehul: Thank you everyone
… can you see my screen?
Wendy: Looks good
Mehul: need to go through objectives
Aswath_Mohan: We'll go first through the objectives and then go through the privacy and anonymization aspects
… so please hold questions until the end, when we will do Q&A
Aswath: Talk about main objectives
… We want to improve user privacy and limit the ability to individually identify users across sites
… we want to introduce a measurable privacy parameter
… so you can think about what the trade-offs should be between monetization and privacy
… and maintain key monetization functions so web thrives for everybody
… and also see if we can limit the churn to existing models to the extent possible
… and introduce an idea where the privacy function is handled by the browser
… but the bidding and optimization stays with the DSPs and SSPs
… Now I will hand off to Mehul and Erik
Mehul: High level, advertiser site flow is similar to Turtledove
<wseltzer> https://
Mehul: through a JS API, user features or IGs are added to browser storage
<bmay> I'm not seeing slides, are others?
Mehul: browser stores those user features or IGs, and we will talk about how they are anonymized
… the ad request is initiated through a pre-defined API, similar to @ API
… that goes through a proxy
… browser adds the user features or IG vector and passes it through this proxy
… which anonymizes the request and forwards it to the ad network
… that we will explain in more detail
… once the proxy hands over to the network
… you can see...and now the ad network can leverage all the information
… for retrieval, auction and pricing, and budget checks, then gives the ad back to the proxy
… and the ad gets rendered into a fenced frame on the publisher side
… clicks go through that anonymizing proxy
… click is registered
… and ranking models are updated based on click feedback
… We are still hashing out some of the inputs
… will work similar to aggregated...
… will discuss more later
… important part is ad network has access to the anonymized context
… and completes functionality
… browser takes key role of managing privacy of user
… rest of functionality stays on the network side
… browser is thinking about what are key user features
… let me walk you through what we mean by transforming user features
… Key problem to solve
… each advertiser is adding features
… we think of a user as a global binary vector of some dimension
… talking about 100K features or higher
… this is global binary vector across domains
… when we send this global binary vector, we want to make sure there are privacy features
… the simplest approach
… if we cluster binary vectors into K clusters
… think of 7 or 8 billion
… number of users is much smaller
[Mehul walks through mathematics on slide]
… We can show this construct manages privacy request
… If we don't do any clustering and keep current construct as is, then epsilon is 23, very high
… if we just do anonymization of features
… or segregate
… if we do just 10 percent, E drops to 22.5
… but if p= 0.9, then E= 18.3
… alternatives to define, key area of research
… when system is up and running a lot of variations could be done
… further analysis
… if we do something more...could implement faster
… and leverage p
… pick a nearby cluster
… and call it metric DP
… or use n clustering, pick cluster from different clustering technique
… if we introduce privacy in the original advertiser network
… we are trying to make it present
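
A minimal sketch of the clustering step described above, assuming users are binary feature vectors, clusters are represented by binary centroids, and distance is Hamming distance. The function names (`hamming`, `nearest_centroid`, `noisy_cluster`) and the choice of Hamming distance are illustrative assumptions, not taken from the slides. With probability p the true nearest cluster is reported; otherwise one of the other clusters is picked uniformly at random.

```python
import random
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_centroid(user: Sequence[int], centroids: List[Sequence[int]]) -> int:
    """Index of the centroid closest to the user vector (assumed metric: Hamming)."""
    return min(range(len(centroids)), key=lambda i: hamming(user, centroids[i]))


def noisy_cluster(
    user: Sequence[int],
    centroids: List[Sequence[int]],
    p: float,
    rng: random.Random,
) -> int:
    """Report the true cluster with probability p, else a uniformly random other one."""
    true_idx = nearest_centroid(user, centroids)
    if len(centroids) == 1 or rng.random() < p:
        return true_idx
    others = [i for i in range(len(centroids)) if i != true_idx]
    return rng.choice(others)


if __name__ == "__main__":
    rng = random.Random(0)
    centroids = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 0, 0]]
    user = [1, 1, 1, 1]
    # Most reports are cluster 1 (the true one); a minority are 0 or 2.
    print([noisy_cluster(user, centroids, p=0.9, rng=rng) for _ in range(10)])
```

A higher p keeps more targeting utility but gives weaker privacy, which is the trade-off the presentation describes.
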
… Talking through anonymization
… going back to original diagram
… S prime
… how to translate publisher context C
… how to transform it to C prime
… three key signals: contextual signals
… IAB forums are making progress on taxonomies
… we will drop title in the ad request
… ID active IP
… so that is specific location
… pick up IP in time epoch
… for device signal, client has succinct UA string
… When we pass S prime, that network cannot memorize
… final construct
… if a request fails to meet the privacy parameter
… we won't add s prime to the request
… e.g. if traffic at that hour is so low that only one user is active, then s prime will not be passed
… or if signals highly required
… not lose anything from contextual signal
… we provide such a parameter
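
A minimal sketch of the gating rule just described, assuming the proxy knows how many users are active with the same coarse signals in the current time epoch. The names `AnonymizedRequest`, `build_request`, `active_users` and `min_users` are hypothetical; the minutes only say that s' is withheld when the privacy parameter is not met while the contextual signal is kept.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class AnonymizedRequest:
    c_prime: Dict[str, Any]          # anonymized contextual signals
    s_prime: Optional[List[int]]     # anonymized user signals, or None if withheld


def build_request(
    c_prime: Dict[str, Any],
    s_prime: List[int],
    active_users: int,
    min_users: int,
) -> AnonymizedRequest:
    """Attach s' only when enough users are active in this epoch (assumed threshold)."""
    if active_users < min_users:
        # Too few users to hide among: send the contextual signal only.
        return AnonymizedRequest(c_prime=c_prime, s_prime=None)
    return AnonymizedRequest(c_prime=c_prime, s_prime=s_prime)


if __name__ == "__main__":
    context = {"topic": "news", "region": "coarse"}
    print(build_request(context, [1, 0, 1], active_users=1, min_users=50))
    print(build_request(context, [1, 0, 1], active_users=500, min_users=50))
```
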
… Let's quickly talk about the key advantages and key challenges
… Advantages, people see c, s...in ad request to support retrieval...
… proxy to control fingerprinting
… some monetization trade-offs
… jointly work towards improving that
… sort of measurable privacy
… One of challenges, c prime, s prime together
… take great care of having this together in some sort of trusted server
… this requires great care
… why we are trying to put this anonymization step
… try to avoid attack
… continuously work on it
… we also have a more advanced proposal
… there is a bit of noise on prime, that could affect user
… ad could have seen
… we are mindful of that
… we feel there is a very clear tradeoff
… third is need trusted service for segment and ad request anonymization
… Let me quickly see relation to all the current proposals
… before we go to next step
… This forum is familiar with the proposals
[Explains Turtledove]
… led to FLEDGE
… where retargeted ad can be given
… browser in both cases issues private report
… APIs talking about different noise
… differentially private report
… PARAKEET is kind of similar, user IG feature
… propose user feature
… when ad requests are initiated, we put c prime in ad request itself
… and c' s'
… see auction and bid model in box
… in all three cases
… network can target
… where feedback is available
… looking side by side comparisons
… Next steps...Erik
<wseltzer> https://
Erik: Call out some next steps
… we would love to get feedback in GitHub
… sense of urgency
… if feedback later, that's fine, but if immediate, please file in next 2-3 weeks
… we are happy to have a dedicated session
… and we can figure out logistics
… and come back here or future meetings
Mehul: We can open up queue
… for some discussion
Wendy: thank you for that
Aram: thank you for this presentation
… helpful to parse out your proposal
… I had two questions that popped out to me
… first one
… noticed that it would not send the page URL or the page title as part of this process
… this strikes me as an immediate issue
… ad world is interested in block-listing and content adjacency
… signal...harder to tamper with
… say advertise against news, and show up on NYTimes.com but not on Breitbart.com
… how does this proposal handle that
… with removal of the bid server from the process, does it increase the difficulty of direct sales
… advertisers who go direct
… but not necessarily through an SSP
… but through the publisher directly?
Mehul: first question first
… to answer question, we are using the contextual signal
… publisher paradigms
… if publisher is NYTimes or WSJ
… we know that
… additional thing, as long as we don't see a risk to be unique enough in that time, we will let it pass
… if that page is so unique, and only one or a few users
… publisher will help to featurize the flow
… publisher works with SSP or own web service
… who creates those signals
… page placement
… high level
… publisher ID will pass through but ad signal will not
… risk of privacy on that
… to answer question
… if advertiser is worried about certain things, they can participate or pull out of auction
… done as pull URL
… Second question
… header bidding
… once hand over request to S end point
… have it part of auction, can see who is part of auction; SSP
… it can pull in the pre-sold
Aram: a lot of the direct logic happens on page
… would you be open to an issue diving into this on GitHub?
Mehul: yes, that would be great
Michael: First of all, thank you very much
… delighted to see this proposal and have it out there as part of the conversation
… great to have additional implementers
… and I appreciate your explaining how PARAKEET relates to TD and FLEDGE
… in terms of details of proposal
… I would really love for a proposal like this to actually meet the needs of the ad ecosystem
… I would love to land on something like this
… In Chrome, we explore some ideas similar to this
… I will point out the things that are hard about this approach
… which is why we ended up separating the two
… As you say, there is always a privacy vs. utility tension
… Aram described that tension well
… something that comes up on utility side in ad industry
… as soon as you change contextual signals, there are brand safety issues
… called the "plane crash" signal
… if you don't know if "plane crash" will be propagated through the proxy
… or if it doesn't come through to the ad network, it's hard to know if it's ok to run an ad on a particular page
… this is why TD and FLEDGE let it pass through
… it seems hard
… in tension with this is the privacy problem
… with this approach, with a server in between the browser and the ad network
… fundamental privacy problem is the joinability
… if you assume the pub page and ad network are also talking directly to one another
… pub and ad network...s prime
… and there are two connections, c' and s' going through the proxy and one not going through the proxy
… those things happening at same time, is there enough information for ad network to join those two things up
… if there is
… if possible, I would be happy if your proxy version could work
… but it's hard enough that we thought it not possible to put this in
Mehul: thank you, Michael
[please slow down Mehul]
… other than policy control
… c' is good enough
… constructs we are working on
… incrementally improving this further
… c working but not in plain text
… we are looking at this vigorously what can be done
… c' reduced in granularity to address joinability risk
… what functionality it loses with publisher sensitivity control
… that is what we are thinking
<wseltzer> ... that's why we have c'
Michael: thanks
Brendan: In eyeo context
… echo Michael's comments on your honoring the efforts of TD and FLEDGE
… this also strikes me as something similar to eyeo
… this type of discussion on SPECTACLE and joinability are current
… and would like to have the team at eyeo talk with your team at MS
… we spoke two weeks ago and look at the end points
Mehul: If you can add your interest, we can follow up offline
… thank you
ErikT: Thank you for sharing this
… echo earlier comments; great to see this area maturing and getting more attention
… One thing I am unclear on is the proxy step
… where u turns into u' and c turns into c'
… seems like there is some entity that has a 'god view' of all requests
… and can do this clustering in a global environment
… and figure out how much has to be removed from the context to keep it within the privacy constraints
… the FLoC proposal does that with algorithm in the browser
… but with publisher, looks like it would not happen independently
Mehul: great question
… there are two steps divided
… steps of user anonymization
… turning s into s'
… doesn't need to know actual user ID
… can be different from request time
… I acknowledge this is a trusted service model
… it does have info with s' be a direct request
… and s turning into s' is separate step
… both steps could be done in peer to peer model
… we kept it high level
… but no reason this could not run in a distributed manner
… we are not commenting about it
… cloud service model, but doesn't need to have hosted service all the time
ErikT: is it a trusted service, or more like the MPC solutions that Chrome has proposed
… amount of work and design of how you cluster
… would be significantly affected by that decision
… you are constrained in the MPC case
Mehul: not just trusted entity
… second thing...we do think there is a potential scenario
… service model works more like centralized service
… MPC where helper one and helper two are working together
… and algorithm can run in that construct
[missed]
… parties could do...with good compute resource
Angelina: this is not specifically directed to you but kind of directed
… what occurred to me when you did the side by side comparison
… there is no indication of the ad server
… and how specific ads would be triggered by parameters coming from SSPs and DSPs to identify which ads to actually serve
… you are following what other proposals have framed
… but ask we consider the actual ads that are being programmed and set for particular audiences and placements
… rotation schedules and such
… ask that also be considered
… talked about how browser hosts creative
… but what happens when client wants to swap out creative
… time sensitive
… could be things in example of plane crash
… may not pull ad but swap the creative in the ad
… just a general comment for everyone
Mehul: I can spend a minute or so on that
… but also add to GitHub
… we assume ad coming back should adhere to the rotation
… for size of ad rendered
… carries ad with on the publisher side
… we can further discuss if I misunderstood your question
Angelina: more of ask to include publishers and have full representation of the ad-call flow
Mehul: We will discuss
Angelina: And the same for TD and FLEDGE, the Google Chrome team
Andrew: thanks for presentation
… first question is a math question
<AramZS> @angelina I think we have similar questions. Would appreciate your feedback on the GitHub issue on PARAKEET I'm working on
Andrew: you said that with probability p you return the true closest centroid
… with (1-p)/(k-1) you return a random centroid
… these don't add up to one
Mehul: for each random cluster, so adding to 1-p
… if you replace with metric dp, it's proportional
Andrew: So like an equal probability
Mehul: we are trying not to deep dive into the math
… highlight is that there is a clear trade-off with monetization
… look for ways to improve that
Andrew: Do you have any documents for the epsilon calculation?
Mehul: yes, I can provide those
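
A small sketch of the arithmetic behind the exchange above, assuming k clusters, probability p for the true cluster, and (1-p)/(k-1) for each of the k-1 other clusters. The helper names are hypothetical. The epsilon shown is the standard bound for this style of k-ary randomized response, ln(p(k-1)/(1-p)); it is not necessarily the calculation behind the figures quoted from the slides.

```python
import math
from typing import List


def report_distribution(k: int, p: float, true_idx: int) -> List[float]:
    """Probability of reporting each of k clusters when true_idx is the real one."""
    other = (1.0 - p) / (k - 1)
    return [p if i == true_idx else other for i in range(k)]


def krr_epsilon(k: int, p: float) -> float:
    """Standard epsilon for k-ary randomized response: |ln(p / ((1-p)/(k-1)))|."""
    other = (1.0 - p) / (k - 1)
    if p <= 0.0 or other <= 0.0:
        return math.inf  # one outcome is impossible, so the ratio is unbounded
    return abs(math.log(p / other))


if __name__ == "__main__":
    dist = report_distribution(k=5, p=0.9, true_idx=2)
    # p + (k-1) * (1-p)/(k-1) = p + (1-p) = 1
    print(sum(dist))
    print(krr_epsilon(k=5, p=0.9))
```

Each individual other cluster gets (1-p)/(k-1), and there are k-1 of them, so together they account for the remaining 1-p.
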
Viraj: thanks a lot for proposal
… how would this work in the time dimension when differential privacy needs to account for budget and multiple requests?
Mehul: where method comes into play
… not lose effectiveness, but provide reasonable amount of time
… any privacy construct does not hold with infinite time...
… noise to be added is proportional to @@@#
[too fast, sorry]
… this field is less likely to happen; for active user
Viraj: in reporting piece
… you might have less features to worry about
… ad call
… worry about two dimensions
… lower on time part
… when you actually get request for one page visit and get many ad requests
… more prominent
… I think this makes sense
Brian: thanks again for presentation
… I'm wondering if you have thought at all on impact on the long tail
… and what mitigations for long tail publishers?
Mehul: great questions; could be two impacts
… on publisher and user
… we are trying to get some prototype up and running and get some analysis
… short answer, when we do c' and s' it will impact long tail publishers
… ad network would not get in plain text; run on functions
… next step is a proposal for that
Brian: Ok, I'll keep my eyes peeled
Wendy: Thank you for pointing us to the GitHub repo and the open issues
… if you want to set aside an adhoc meeting
… Thanks for the presentation; if you can make the slides available, that would be a useful link
… we look forward to the ongoing discussion
Mehul: thank you
Dovekey Auction
Wendy: We have nine minutes left
… Gang, you wanted to talk about DOVEKEY auction proposal
Gang: 8 mins left, I can talk at super high level
… and next time we can go into more details on how it works
<wseltzer> https://
Wendy: Sounds good
Gang: let me present
… First a little background
… at end of last year, Google published explainer
… DOVEKEY where we will cache
… from buy side
<wseltzer> https://
Gang: when ad tech generates request to key server
… server is simple
… after we published explainer, we got a lot of feedback
<wseltzer> https://
Gang: we discussed how to further improve the proposal to satisfy business needs of industry partners
… and still satisfy privacy needs of users
… we added more functionalities
… to provide more functionality and privacy at same time
… We added auction functionality into DOVEKEY server
… that will pick from among bids and take winner back to browser
… this will benefit user by reducing bandwidth need and reducing JS for other functionalities we had to put on the browser
… because of that we can put additional functionality into DK server
… add triggering moments
… address user experience
… and prevent microtargeting to prevent creepy ads for users
… and put precise and timely budget enforcement into the same server
… On top of that basic functionality
… we envision other functionality being executed in the DK server
… this is overall architectural diagram
… DK server sits between browser and SSP
… DK server has trust model to handle sensitive user data
… like user's IG membership
… Let me walk through how DK server works
… If there is ad slot running in browser
… send out ad request to DK server
… ad request would be contextual ad request generated by SSP ad request
… send to SSP
… in this contextual request you have page URL, ad slot IDs, whatever other contextual info that SSPs and DSPs need
… access controls, branding controls
Gang: send to DSPs...return two kinds of response
… first one is an unconditional bid
<kleber> DoveKey server diagram looks pretty similar to PARAKEET diagram!
Gang: want to pay $1.20 for this ad creative
… another one called a conditional bid
… based on sensitive user data that browser is not willing to share
… such as the IG membership
… example of conditional bid
… if user belongs to this IG
… that is a conditional bid
… both conditional and unconditional bids go back to SSP
… SSP picks the highest unconditional bid and returns it, along with all the conditional bids, to the DK server
… has trust model by which browsers are willing to send IG membership to DK server
… so DK can evaluate if user belongs to this IG or not
… some conditional bids are rejected
… because condition is not satisfied
… others are accepted to the auction
… those bids compete
… and DK picks a single winner
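
A minimal sketch of the auction step just described, assuming each conditional bid names a single interest group (IG) and that the highest price wins. The `Bid` type, `run_auction` and the highest-price rule are illustrative assumptions; the minutes only say that bids whose condition is not satisfied are rejected, the rest compete, and the DK server picks a single winner.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Bid:
    creative: str
    price: float
    required_ig: Optional[str] = None  # None means the bid is unconditional


def run_auction(
    top_unconditional: Optional[Bid],
    conditional_bids: List[Bid],
    user_igs: Set[str],
) -> Optional[Bid]:
    """Keep conditional bids whose IG the user belongs to, then pick one winner."""
    candidates = [b for b in conditional_bids if b.required_ig in user_igs]
    if top_unconditional is not None:
        candidates.append(top_unconditional)
    # Assumed rule: the highest price wins; returns None if there are no bids.
    return max(candidates, key=lambda b: b.price, default=None)


if __name__ == "__main__":
    unconditional = Bid("generic-banner", 1.20)
    conditional = [
        Bid("shoes-retargeting", 2.00, required_ig="shoes"),
        Bid("cars-retargeting", 3.00, required_ig="cars"),
    ]
    # The user is only in the "shoes" IG, so the cars bid is rejected.
    print(run_auction(unconditional, conditional, user_igs={"shoes"}))
```

In this sketch the IG membership never leaves `run_auction`, which mirrors the idea that only the DK server sees it.
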
… we come back to step six
… sends impression back to DK server
… based on that ping back DK can do k-anonymity check
… can I show to at least K users
… or budget says this much
… or other requirements
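
A minimal sketch of the k-anonymity check mentioned above, assuming the server counts distinct users per creative from impression pings. `KAnonymityTracker` and the per-user token are hypothetical; the minutes do not say how users are counted or what state the DK server keeps.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class KAnonymityTracker:
    """Tracks how many distinct users each creative has been shown to."""

    def __init__(self, k: int) -> None:
        self.k = k
        self._seen: DefaultDict[str, Set[str]] = defaultdict(set)

    def record_impression(self, creative: str, user_token: str) -> None:
        """Register a ping-back; repeat impressions for one user count once."""
        self._seen[creative].add(user_token)

    def is_k_anonymous(self, creative: str) -> bool:
        """True once the creative has reached at least k distinct users."""
        return len(self._seen.get(creative, ())) >= self.k


if __name__ == "__main__":
    tracker = KAnonymityTracker(k=2)
    tracker.record_impression("ad-1", "user-a")
    print(tracker.is_k_anonymous("ad-1"))
    tracker.record_impression("ad-1", "user-b")
    print(tracker.is_k_anonymous("ad-1"))
```
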
… at super high level this is proposal
… DK model can access sensitive user data and can access on behalf of SSPs and DSPs
… next week I can provide more details
Wendy: Thank you
… people can queue up issues beforehand
… and continue to send us requests for discussions and new proposals
… See you again next Tuesday, March 9, same time [11:00am ET]
… thank you all
<wseltzer> [adjourned]