W3C

- DRAFT -

Improving Web Advertising BG
22 Oct 2020

Attendees

Present
ErikAnderson, Karen, kris_chapman, hober, jrosewell, shigeki, ajknox, lbasdevant, dialtone, mlerra, jonrburns, weiler, bmay, kleber, btsavage, bleparmentier, jonasz, lukwlodarczyk, Joel_Meyer, johnsabella, Mike_Pisula_Xaxis, wbaker__, AramZS, brodriguez, charlieharrison, mjv, Garrett_Johnson, br-rtbhouse_, bmilekic, blassey, piwanczak_, Kazuhiro_Hoya, mserrate, jeffwieland_, pedro_alvarado, hong, eriktaubeneck, dkwestbr, cpn, jeff, pbannist, John_Wilander, (~100 others)
Regrets
Chair
wseltzer
Scribe
Karen

Contents


Overview slides with links

<wseltzer> Meeting: Improving Web Advertising BG vF2F, day 2

<scribe> Chair: Wendy Seltzer

<scribe> Scribe: Karen

Wendy: Welcome back!
... Give people a few moments to get back online [for day 2]
... Share the introductory slide again

[on webex screen]

scribe: Agenda is linked in our Github repository
... we are on irc once again
... using the irc mechanism of "q+" to speak
... and if you use "q+ to say" then zakim will remind you at time you are acknowledged, and helps the chair to see that comments are related to one another
... you need to include the word "to"
... so zakim doesn't think you are adding people to the queue
... Also "present+" yourself to give us a record of attendance
... We are working under the W3C professional code of ethics and conduct

Wendy: Starting off today on state of the art on multi-party computation
... Does anyone have any logistical questions before we dive into the agenda?
... hearing none, Ben Savage, Facebook and Andrew Knox, are you ready to start?

State of the art on multi-party computation (MPC) 

Ben: can you make our presentation public, Andrew?

Wendy: Invite anyone who is new to us today and who would like to introduce themselves

Erik: I have been away for a few months, happy to be back with Ben and Andrew

Ben: now you can hear and see my screen

+1

Ben: quick recap on why we are discussing this
... as we look at new APIs, there are two high-level approaches
... client only and client and server combination aspects
... ask ourselves what privacy guarantee it offers and what the trust model is
... hope we're doing the right thing or offer extreme guarantees
... talk about MPC systems and their trust offers
... Secure Multi-Party Computation (Secure MPC or sMPC)
... is an older technology back to 1980s
... high level headline is you don't actually have to trust...
... there are helper servers
... don't have to trust that they are honest or not curious
... only trust that one is not colluding actively with the other
... talk about the "secret share" type
... you have a private input value
... some conversion value, or other info

<wseltzer> btsavage: in the model we're discussing, so long as one is non-colluding, you're secure

Ben: in advertising land, a value you want to keep safe
... "M" secret shares
... if you have M-1 you have nothing
... look at super basic way of doing things
... pick some modulo base
... pick 100 to keep it simple
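
Ben's "super basic" scheme is additive secret sharing over a small modulus. As an editorial illustration (not from the slides), a minimal Python sketch assuming the modulo base of 100 and three helpers from the toy example:

```python
import random

MOD = 100  # the small modulo base from the toy example

def share(value, n_helpers=3):
    """Split a private value into additive secret shares mod MOD."""
    shares = [random.randrange(MOD) for _ in range(n_helpers - 1)]
    # The last share is chosen so that all shares sum back to the value.
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# Any n-1 shares are uniformly random on their own: with "M-1" shares
# you have nothing; only all M shares together reconstruct the input.
```

For example, share(5) might return [92, 18, 95]; each share alone is meaningless, but reconstructing all three gives back 5.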

<ajknox> link to slides Ben is currently presenting: https://docs.google.com/presentation/d/1brmw1Hkv3L9XjCyQHEfD7IiC3GltBfek45kUnrsrqO0/edit?usp=sharing

Ben: and it M

[Ben walking through slides]

<wseltzer> [slides 4-5, example of shared secret]

[slide 6]

scribe: compute the sum of these in MPC
... measure return on ad spend, have basket value
... and do that in private way
... we have made up some secret shares to make up the number five

[walks through first row]

scribe: helper A only receives helper A's share; cannot reverse engineer this
... true for all three helpers
... the cool thing is that if you add all the rows or add the columns, you get same result
... Helper A column addition
... get these randomish looking numbers
... if you are an ad tech platform querying these
... you get 75 and at no time did you learn what the inputs or helpers were
... but you could compute the total sum
... Look at colluding helpers
... say helper A and B are bad and sharing every piece of info with the other colluding server
... but helper C is not colluding
... what can helper A and B do together?
... Nothing
... can add the two numbers together
... without that third secret share you have nothing
... if you get any rows or columns added together you have nothing
... even if helper C is looking at these inputs, you maintain the privacy of the inputs
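
The "add the rows or add the columns, you get the same result" property is just associativity of modular addition. A hypothetical worked example (these numbers are made up, not the ones on the slides):

```python
MOD = 100

# Each row is one browser's three secret shares; each row sums,
# mod 100, to that browser's private input.
rows = [
    [92, 18, 95],  # input 5
    [40, 70, 10],  # input 20
    [33, 33, 49],  # input 15
    [99, 99, 12],  # input 10
    [50, 50, 25],  # input 25
]

# Each helper sees only its own column and returns its local sum.
helper_totals = [sum(col) % MOD for col in zip(*rows)]

# The querier adds the helper totals and recovers the total (75)
# without ever learning an individual input.
total = sum(helper_totals) % MOD
```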
... what if you poison system by feeding fake data
... MPC does not guarantee privacy of output
... trying to reveal the sum of things together
... think about 4 of 5 browsers, controlled remotely, one controlled by human
... only one of values is real person's value
... when MPC computation is complete, you get sum of inputs, 75, you can reverse engineer what the individual input was
... difference is 15
... you reverse engineered this private info
... kept security of input but output is not necessarily private
... this is to show what is or is not MPC
... solution is "global differential privacy"
... Take total and add random noise
... needs to be proportional to one input
... versus adding noise to every single input
... five people contributing
... one sixth will be random noise

<wseltzer> [slide 11]

scribe: one thousand browsers reporting is relatively small
... much more sophisticated ways to do it
... but go with simplistic way to show this
... assume inputs are on some range
... can find that in advance; find some range we agree upon
... each private input is in range of 0-25
... simplest way is to generate a random value between -25 and 25
... how to run the computation in your MPC
... helper takes secret shares to get helper total
... it generates random value
... add to local total and return the 49

[going through each column totals]

scribe: add the three helper total numbers to get 91
... but not the true answer which was 75
... we added a total of 16 in randomness
... what this effectively does is protect the privacy of the output
... we have five inputs, four of which are fake
... sum was 60
... revealed value came back as 91
... compute difference of 26
... that is not the true value
... we have protected the privacy
... you cannot know the actual real value
... I used Excel to generate a Monte Carlo simulation
... and generated random numbers, added them together 1000 times for actual set of outputs
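
A sketch of that simulation, assuming the simplistic mechanism described (each of the three helpers adds uniform noise in [-25, 25] to its local total; the inputs here are made up):

```python
import random

def noisy_query(inputs, n_helpers=3, bound=25):
    """True sum plus one uniform noise draw per helper (ignoring the
    modulus, as the walkthrough does for the noise step)."""
    noise = sum(random.randint(-bound, bound) for _ in range(n_helpers))
    return sum(inputs) + noise

random.seed(42)
inputs = [5, 20, 15, 10, 25]  # true sum is 75

# Monte Carlo: run the query 1000 times to see the spread of outputs.
results = [noisy_query(inputs) for _ in range(1000)]
# Each result lands somewhere in 75 +/- 75, so a single noisy report
# cannot pin down any individual input, while the average stays near 75.
```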

[slide 17]

scribe: here is the distribution chart
... give a more realistic example
... use 100 as mod base
... if the sum of all inputs is more than 100 it would truncate and we would lose information

[back to slide 14]

scribe: differential privacy operates on large volumes of data

[slide 15]

scribe: showing sums
... generate a 1000 random numbers
... add this random noise to sums
... got negative 9
... returns 75376
... not a particularly large difference
... if this is number of cents
... being 9 cents off is not a big deal
... illustrates that global differential privacy does good job of preserving utility
... but if values were fake
... you would be getting a number nine off
... ran this Monte Carlo simulation to get the distribution
... you could see anything

[slide 17]

scribe: if private value is 12, you could see -75 to +75
... and you can do better than that
... we would use random noise pulled from a Laplace distribution
... to get better utility
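
Calibrating Laplace noise to sensitivity divided by epsilon, rather than drawing uniformly over the whole range, is the usual route to better utility. A hedged sketch using inverse-CDF sampling (the parameter choices are illustrative, not from the talk):

```python
import math
import random

def laplace_noise(sensitivity, epsilon):
    """Sample from Laplace(0, b) with scale b = sensitivity / epsilon."""
    b = sensitivity / epsilon
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -b * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_sum(inputs, max_contribution=25, epsilon=1.0):
    # One person changes the sum by at most max_contribution, so that
    # is the sensitivity of the sum query.
    return sum(inputs) + laplace_noise(max_contribution, epsilon)
```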

[PRIO slide]

scribe: only trust that one of the parties is not colluding
... if you want to protect privacy of output statistic
... have to have program, add a bit of random noise
... to protect privacy of output stat
... PRIO is cool
... real innovation here is that they found a way of doing range proofs that limit the effectiveness of fraud

<wseltzer> https://crypto.stanford.edu/prio/paper.pdf

[Range of Proofs slide]

scribe: instead of generating 3 secret shares of an in-range value, you could produce 3 random secret shares that encode a value outside of the range

[walks through example of numbers]

scribe: they appear to be smaller than the mod base, but there's no way to know they're invalid
... but you get this huge sum
... if you expect range of 0-25, then get this huge number, that's a problem
... PRIO sends data to helper server
... only tells it proof that secret shares are in that range
... all it tells you
... if it gives an invalid proof, you ignore it
... makes it hard for them to screw up the entire measurement
... if you only expect zeros and ones
... it can only affect the total sum by one
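
The fraud scenario is easy to simulate with the additive scheme from earlier (numbers are made up): one malicious client submits shares of an out-of-range value, and no helper can tell, because every share looks uniformly random.

```python
import random

MOD = 100

def share(value, n=3):
    shares = [random.randrange(MOD) for _ in range(n - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

random.seed(7)
# Four honest clients reporting 0-or-1 values, plus one malicious client.
honest = [share(v) for v in (1, 0, 1, 1)]
malicious = share(99)  # far outside the expected {0, 1} range

rows = honest + [malicious]
helper_totals = [sum(col) % MOD for col in zip(*rows)]
total = sum(helper_totals) % MOD
# The honest count is 3, but the reported total is (3 + 99) % 100 = 2:
# one bad client wrecked the measurement. A PRIO-style range proof lets
# the helpers reject the bad shares, capping the damage at 1.
```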
... PRIO is also fast
... works nicely
... Google aggregated reporting API
... appears to be compatible
... seems to match with secret sharing
... trust model holds
... adds Laplace noise
... to differential privacy
... seems to draw
... see if Charlie wants to join

[slide 21]

Charlie: I have a slot later
... maybe wait

Wendy: maybe take to take questions for Ben
... great presentation of some of the primitives and tools we have available
... take questions, and then Charlie, thanks for offering your input

<GangWang> great presentation!

Charlie: quickly say +1 what Ben said; it's compatible with what we have published
... not just one tool we use, investigating MPC aware
... we are iterating internally
... and taking heavy inspiration from PRIO

Wendy: Great

bleparmentier: thank you, Ben
... very interesting
... jumping in to say that I disagree with one of your statements on Global Differential Privacy
... what you mean by "work"
... have this discussion as measurement session
... works if what you want to compute is overall ROAS
... it works if you only have a basket
... we work with rare data
... 25 you add is not the range we are used to working in
... zero and one click
... 1000 display
... one click is very rare
... just adding one @ is a lot of noise
... equivalent to 50K displays
... a lot of noise is huge
... we are working in a very fat-tailed world
... one example
... look at income wealth, if Bill Gates walks into a bar, the average income goes up enormously
... if you want to protect some privacy, the number is huge
... I don't think this is well adapted for advertising
... for reporting purposes
... not look at global but look at small spaces, every publishers
... is it on top of the web page or not
... make that measurement noise
... if on very big group
... you can see
... this noise is very big
... first point
... dimensionality with all these publishers and placements, location data
... differential privacy is huge
... wanted to save this for next talk
... I have issues with differential privacy; not well adapted to the work we do

Ben: I would summarize your statement
... when we say differential privacy works
... we mean it protects privacy of the inputs
... you could get anything in 0-25 range, with equal probability
... doesn't tell you real value
... utility is a different question

<dkwestbr> protecting the privacy of the “XX” - what? Ben broke up - for me at least

Ben: Privacy CG virtual F2F
... conversation about difference between optimization and measurement
... global diff privacy is good tool for measurement, counting things
... good tool to use
... for optimization, I agree, you have a lot of noise
... the range really matters
... if range is 0-1, great
... won't radically change
... but 0 to 5billion range, adds same level of noise

bleparmentier: If only once every 5K times
... I fully agree with what you just said

kleber: Did you say noise of one event messes everything up?
... draw conclusions based on a single conversion?

bleparmentier: I do mean that
... if you have noise of one
... to every
... with ML
... big research is you work with @
... learn statistically significant

<ajknox> I would like to respond very quickly to Basile on the BIll Gates point

<wseltzer> qq+ ajknox

bleparmentier: you will have one display for one advertiser and one publisher with one size
... and only one conversion
... if only that I cannot learn anything
... if same size
... I can see that size is better than another one

[asks Basile to please summarize]

scribe: you have a lot of uniqueness

<wseltzer> bleparmentier: when you look at the groups by all of our features, many are unique

scribe: if you are not able to merge

<btsavage> ...let's move on to the next question

scribe: data will help me learn things with other data
... rarity of level
... and dimensionality of the input
... rarity is big noise
... cases with only one sale

Wendy: Thanks, I think this is one of the reasons that the use cases document is so important

<wseltzer> https://github.com/w3c/web-advertising/blob/master/support_for_advertising_use_cases.md

Wendy: to make connections between what the use case needs are and what the proposals do and don't do to solve it
... think about which cases are covered by these tools and which use cases need a different set of tools
... we can keep thinking about it

Andrew: Speak quickly
... some areas are legit
... others have convincing responses
... the Bill Gates example is known as winsorization
... some issues there
... active research in differential privacy
... million dollars may reveal info
... subtly beyond scope of this conversation

<wseltzer> https://en.wikipedia.org/wiki/Winsorizing

Andrew: there is a lot research on how this adds bias
... even one event is critical
... that critical signal is already noisy
... based on parameters you have available

<Zakim> ajknox, you wanted to react to bleparmentier

Andrew: if that event is interesting, the amount of noise the differential @ is adding...
... using stats you can back out; called de-biasing

<scribe> ...new techniques coming out for stricter paradigms

UNKNOWN_SPEAKER: in our research we are seeing some positive things
... about our ability to get info from privacy

bleparmentier: problem is not....I agree with you
... so much we need to learn
... we can use all together
... group everything, have such noise
... heard a lot of work
... big issue is that it's a binary problem when binary is almost always equal to zero
... ways to circumvent some of them
... way differential privacy was designed
... is a little weak when there is so much dimensionality
... want to say that we can do better
... just afraid because of how whole idea is set up
... what you are talking out won't give back reasonable performance

Wendy: Suggest this level of detailed conversation
... move to an issue for further discussion because we have a long queue gathering

appacoe: Thanks for presentation, Ben
... at NextRoll we are starting to think about reporting proposals
... you give this toy example
... see max of one input
... you say it's applicable to aggregate reporting API
... two questions on noise
... I understand that differential privacy....
... lots of records could be produced by one individual
... to protect groups

<wseltzer> differential privacy allows plausible deniability that an individual was in the data set

appacoe: if you have epsilon differential
... you would require K epsilon differential
... would increase noise linearly based on max number of records in the data set
... second question is the noise to be added

<bleparmentier> Just want to state that we have actually already created issue, even a blogpost (https://github.com/Pl-Mrcy/privacysandbox-reporting-analyses/blob/master/differential-privacy-for-online-advertising.md) and a full reporting proposal in SPARROW (https://github.com/WICG/sparrow/blob/master/Reporting_in_SPARROW.md)

appacoe: also seems to scale with the number of queries you engage against the data base
... with toy example
... one query
... add a little noise and get away of it
... but with aggregate reporting API with tens of thousands of queries and computations across that data set
... sums across 100K campaigns, then noise scales
... do you see problems?
... is noise applicable to aggregate reporting API?

ajknox: Lifetime privacy guarantees
... if you have multiple queries, you can add the epsilons together
... two plus four is six

@: Don't believe that's true
... composition is additive in the noise
... multiple queries means you increase noise to maintain diff privacy
... run same query twice
... epsilon per query

[who is speaking?]

<wseltzer> eriktaubeneck:

@: the more queries, the lower the epsilon per query and the higher the noise
... mechanism defines the epsilon
... build mechanism to support queries
... you can tune it
... with three queries and budget of three
... depending upon what is more important
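
A sketch of the budget arithmetic being described, assuming the basic sequential-composition rule (epsilons add, so splitting a fixed budget across more queries means less epsilon, and hence more Laplace noise, per query):

```python
def laplace_scale_per_query(total_epsilon, n_queries, sensitivity=1.0):
    """Each of n queries gets total_epsilon / n under sequential
    composition; the Laplace scale b = sensitivity / epsilon then
    grows linearly with the number of queries."""
    per_query_epsilon = total_epsilon / n_queries
    return sensitivity / per_query_epsilon

# With a total budget of 3 split across three queries, each query runs
# at epsilon = 1 and gets three times the noise scale of spending the
# whole budget on a single query.
```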

AP: when we want to add lots of queries
... means epsilon needs to be very low
... noise coming back will be high

<bleparmentier> +1 This is what I tried to say about dimensionality

AK: you need to define queries against same data base
... jeopardy against the same interaction
... query every single day
... legit to think about those queries separately
... stuff happening today...new database tomorrow
... get around idea of lots of queries all the time
... so many people get so many shots
... epsilon adding up that row

AP: more concrete

ErikT:

AP: DSP receives encrypted logs
... and can maintain as long as they want to support multiple queries later
... problem doesn't jibe with aggregate reporting

<Garrett_Johnson> Rare moments where I'm glad for taking real analysis courses: https://twitter.com/rootbert/status/1318278899881537536?s=20;)

AK: sure question that Charlie and Michael will address
... kind of specificity
... choose database carefully
... we have had success with types of problems we are trying to solve
... but you have to make your choices carefully to get a good utility

BL: you are forced to make choices or you lose information

Michael: your high dimensionality is another question
... overlapping, repeating queries that take same data into account is what is being discussed

BL: publisher size, two are there together
... a lot of dimentionality
... want size, position

Ben: let's ask Charlie
... how many times can you make a batch request for a particular row?

Charlie: we call this out in the explainer
... we need to have some bound on the number of times...participate in the query
... could choose K is not equal to one
... to do this kind of dimensionality; different slices of data
... need to build in
... there are more sophisticated ways to do privacy budgeting
... might be able to do something more clever
... not get into all the details, but there is research on privacy budget
... big distinction if talking about multiple queries over disjoint data
... don't get this privacy flaw to add the epsilons
... means we can say things
... and bound epsilons in time windows
... making daily queries and roll-ups
... day one doesn't impact day two so you can analyze differently
... happy to take questions in my session if we don't have time here
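
Charlie's distinction is the difference between sequential and parallel composition in differential privacy; the bookkeeping can be sketched as:

```python
def combined_epsilon(query_epsilons, disjoint_data):
    """Sequential composition (the same records queried repeatedly):
    epsilons add, e.g. 2 + 4 = 6. Parallel composition (each query
    touches disjoint records, e.g. day one vs. day two): only the
    largest epsilon counts."""
    if disjoint_data:
        return max(query_epsilons)
    return sum(query_epsilons)
```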

Wendy: discussion is going deep and scribe having hard time keeping up
... wondering if we schedule a separate white board session
... thanks for raising these questions
... let's call this back
... for another discussion
... head back to the queue

dkwestbr: thanks, Ben for pulling this together

<kleber> Garrett_Johnson -- definitely not *that* epsilon!

dkwestbr: we are looking into this as well
... my first question
... you laid out a security mechanism and layering on global differential privacy
... you have some mechanisms and an overall scheme for privacy
... people like me may want to go deeper
... is there a catalogue of MPC mechanisms and schemes?
... Google blog doesn't cover everything
... we did not go into all the different mechanisms here
... curious if there were catalogues available?

Ben: you can do anything you want in MPC, although it might be slow

<charlieharrison> +1 to Ben there are some universality proofs afaik

<ajknox> https://github.com/MPC-SoK/frameworks

Ben: compilers exist

<ajknox> this is a good survey of the state of the art about a year ago

Ben: this is the thing I want to do, make it happen in MPC, and go
... catalogue would be good for things to do efficiently

ajknox: I just dropped one into irc
... academic researchers are doing work here; best public one I know of

DaveW: MS has blog posts, Baidu as well
... other question, is there any practical
... Google and Facebook are mature in your understanding here

<eriktaubeneck> slides from the Real World Crypto presentation for MPC-SoK https://rwc.iacr.org/2020/slides/Hastings.pdf

DaveW: any practical applications you can point to?
... I don't know how far I'm moving into what is ok to talk about publicly?
... any real world examples?

ajknox: Erik and I work on team and we try to release things publicly

Dave: Thank you

charlieharrison: I want to take the chance to ask some of our
... peer browser vendors
... Erik and Tess on the call
... curious if you have opinions on security model of MPC and models that add diff privacy to outputs
... curious to hear Apple and Microsoft perspectives?

Wendy: anything anyone wants to say?
... or respond to the question later?

Tess: I can ask around

<ErikAnderson> Mic problems on my end apparently...

Charlie: great, would be curious to hear

Wendy: thank you

<ajknox> we released this paper and open source library earlier this year regarding how to compute an identity that is useful for MPC https://engineering.fb.com/open-source/private-matching/

Wendy: let us know later

ErikA: a lot of folks know who I am
... from the Edge team
... we have been evaluating a lot of service driven helpers
... to get more privacy guarantees
... we are interested in multi-party computations
... another discussion today on auditability of services and how we set up
... how to audit services, how they get funded
... if we believe services are needed to support monetization sufficiently
... we want to reduce risks of what the service understands
... we may end up discovering things, may be too limiting
... to Basile's point a bit
... at high level, willing to collaborate with everyone to solidify this

Kris: take a step back
... this is really interesting
... but found myself thinking this is a big tech answer that ignores a data privacy concern
... that transparency aspect
... this is a complicated system
... targeting ads
... publishers don't know
... let alone consumers
... when I think about this, it elevates the complexity that much further
... and we would have a very tough time explaining why you got this ad
... I feel like it is steering in the direction of "we are smart and trustworthy" in this big black box
... wondering as we dive into these possible solutions
... what thought is going toward how to make these solutions understandable so that the average consumer knows what is going on
... is anyone looking at this?

Ben: I disagree with your analysis as applied to aggregate
... questions raised yesterday about where data is going, how did they obtain it; how is it being used
... if specific data sharing is not transparent and clear
... that is a very different question than can I measure the sum of a thousand numbers

Kris: from a marketers' standpoint
... why does FB say 1000 and Google says 1010
... that level is complicated enough
... how do we get around those conversations to make it easier for a marketer

Ben: Those are orthogonal questions
... data privacy is different from why FB and Google's numbers don't match
... if we split up into solutions for targeting, measurement and optimization
... if you wonder about a creepy ad
... you wonder how the targeting work
... nothing has changed
... use magic tech, then we may not have solved root of people's concerns
... aggregate measurement not a substitute to provide that optimization or targeting
... you cannot provide opt and targeting with aggregate measurement
... at great expense of other things
... we could achieve a lot
... as for the marketers and if they understand
... why FB and Google don't match is they don't know one another
... there you need some different type of solution

Kris: I think you are focusing on the wrong part of what I am asking
... not also saying it does not work
... I am saying this is very difficult for most people to follow and understand the process
... gets into I just have to trust
... across the board, we are making things more complicated
... it may not be avoidable
... but as we make things more complicated, we have a responsibility to explain simply to folks about what is going on
... we have a problem with big tech not being transparent
... don't see a forum for addressing that in these conversations

Ben: in order for a system to be comprehensible
... doesn't mean every component has to be simple

<blassey> (I'm queued to respond to Kris's question)

Ben: average consumer doesn't know how a transistor works
... but we can explain the output of noisy @

WendellB: other voices


WendellB: point is that the transistor and other inert things are not attacking you back
... Alysia's point is the stack is attacking you at all levels

<jrosewell> I agree with Kris that we need to improve trust and how well a solution improves trust is an important consideration.

WendellB: explaining how things work is a principle from accounting
... we would be well served to understand the algorithmic solutions
... and take the social and moral questions elsewhere
... I have a question about whether we can master the algorithms
... some people who don't have computers...
... businesses at a wide range of sophistication have to understand

Dave: add one other thing
... on challenges of how, why we are showing 1K v 1100 for same campaign
... and on conversions
... if we got into the other mechanisms, like the Google security blog
... there are opportunities to do accounting level validation
... assure the output is accurate but don't know the user
... have control on that final output and both parties get some level of independent verification
... comes back to catalogue
... and independent verifications
... comes down to how we build it

<Zakim> dialtone, you wanted to ask about the usecase for this reporting (ML vs billing reporting)

dialtone: thanks for the presentation
... bring back to original MPC and reporting API
... tie together comments
... we need to understand if this API
... is this the idea that it will solve both optimization and the billing reporting
... or just solving for the billing reporting
... ML solves different thing
... we look for co-occurrence
... run a ton of overlaps to get the right set of labels
... we do a lot of overlapping queries on same set of data
... how much is spent
... that's the end of it
... same time, this data should remain constant throughout the years
... customer will complain if the numbers go sideways
... interesting to understand

AK: go back and compute numbers from three years ago
... you may only get one shot at the data; that day is over
... use those numbers in perpetuity
... you cannot go back to well again and again
... it's a difficult thing
... operationally it's important to restate the data
... makes that specific thing difficult
... other point about optimization
... we are focusing on the measurement use case
... not highest value but it's a simpler problem to solve, and it's worth solving
... the problems of optimization are harder and more contentious
... less alignment on whether it's a problem worth solving
... measurement is the right place to start

Charlie: From Chrome's perspective
... we are trying to solve this from both ends at the same time
... event level for conversions and aggregate API for conversions...
... looking at results in aggregate
... both are useful; iterating on both
... event doesn't work for all use cases
... we see a big benefit in providing event level data
... high dimensional feature vectors
... and becomes hard to train these ML models

AK: you are better off with event-level for high-dimensionality
... advantage of global differential privacy is if you have few queries
... more success with that level of randomization

Valentino: dimensionality turns into a complex problem in all the current specs
... I am not sure we are getting closer to that solution
... perhaps some misalignment with parties on what we are trying to achieve with optimization
... maybe we need another conversation

Wendy: I will note more conversations about optimization are a good thing

Brad: go back to what Kris was saying about consumer understanding this complicated math
... we have analogous cases with TLS and certificates
... very few people understand the math behind the certificates
... they have to put some level of trust
... unlike the transistor example, this is a thing being attacked
... not so much about the math, need to trust experts that the math is correct and that it provides the benefits that it claims
... consumer understanding about why I am seeing this ad
... there are audit schemes to talk about
... and make those explanations more clear and more truthful
... so that pipeline that served you that ad can make reasonable claims as to why you are seeing that ad

Aram: thanks for the very detailed presentation; very illuminating
... not get into details now
... but there are questions around differential privacy, but when agents control many stages of the pipeline
... not sure how effective that is there
... objections come out of how a lot of control could decode this; whether you control top and bottom
... or if not decode, potentially fake it
... I'm not clear if this is answered by the math
... note this is the objection I hear in discussing these issues

eriktaubeneck: In most sophisticated MPC models, you would include a zero knowledge proof
... a tradeoff in performance with every addition and multiplication you do
... can prove based on outset plan, or abandon
... a gradient of how much trust
... do those proofs randomly every 100 operations
... and decide what that probability should be

<alextcone> FWIW, Tech Lab is having a very active and prioritized discussion on how to do standards and guidelines for evidence based proofs

eriktaubeneck: there are schemes to make sure it's impossible for one of those helper servers to actively cheat
... nothing you can do other than the range proofs to prevent garbage in
... you have to have some trust in the client providing the input data

<wseltzer> thanks alextcone, would be great to hear more at a future discussion

Aram: getting that info is helpful to get past the objections

Measurement: Reporting APIs, WebView, cross-site and cross-device measurement

Wendy: let's move onto the more generalized discussion of measurement

<kleber> scribenick: kleber

charlieharrison: Update on Privacy Sandbox measurement proposals progress
... working at Google on measurement use cases for web advertising
... First, Event-Level API
... you get event-level data about the ads engagement that led to a particular conversion
... you don't learn much about the actual conversion event, but about what came before
... happens by moving the attribution into the browser
... When you see an ad, there is a unique ID for the ad (here the ad click event)
... The ad click link specifies a bunch of additional attributes which communicate information to the browser about the ad event
... When the user clicks on the ad — which is important, only supports clicks right now, looking to add things like view later — the browser stores all this metadata in a new browser storage area
... If the user converts, either that day or up to a month later, the conversion page signals to the browser "Hey, were there any previous impressions that led to this site?"
... If so, the browser links up these two events, the conversion with the prior ad event in browser storage
... Implementation mechanisms involves an image tag and an HTTP redirect, which is a familiar method for ads
... At conversion time, this redirect gives an opportunity for a server to provide some low-entropy data about the conversion (3 bits)
... Then after some delay, the browser schedules a report being sent to ad tech about the report that was saved in the browser (which ad it was, and the low-entropy data about the conversion)
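
As an editorial sketch of the flow just described (store click metadata in the browser, match a later conversion on the destination site, attach only 3 bits of conversion data), here is a toy model; all names are illustrative, not the actual API surface:

```python
class ToyAttributionStore:
    """Toy in-browser attribution store; illustrative only, not the
    real Conversion Measurement API."""

    def __init__(self):
        self._clicks = []  # metadata saved when the user clicks an ad

    def register_click(self, click_id, destination, reporting_origin):
        self._clicks.append({
            "click_id": click_id,
            "destination": destination,
            "reporting_origin": reporting_origin,
        })

    def register_conversion(self, site, conversion_data):
        # The conversion side may only contribute 3 bits (values 0-7).
        if not 0 <= conversion_data < 8:
            raise ValueError("conversion data is limited to 3 bits")
        for click in reversed(self._clicks):  # most recent click first
            if click["destination"] == site:
                # The real API sends the report after a delay; here we
                # just return what would be reported.
                return {
                    "click_id": click["click_id"],
                    "conversion_data": conversion_data,
                    "report_to": click["reporting_origin"],
                }
        return None  # no matching impression, nothing to report
```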
... We care about this API for a number of reasons
... First, important for training ML models, as just discussed
... Second, it's nice because there are no helper servers etc; it's entirely on-device, so it's easy to launch quickly
... In fact, it's ready for external testing right now!
... We published a blog post about it on web.dev, and partners can register for Origin Trials
... An Origin Trial is a Chrome mechanism: you register for a token that lets you try out experimental new features even before they launch on the web
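
The on-device flow Charlie describes can be sketched roughly as follows (illustrative Python; the class, method, and field names are hypothetical and do not correspond to the actual Chrome API surface):

```python
import time

class BrowserAttributionStore:
    """Toy model of the browser-side click storage described above."""
    def __init__(self):
        self.clicks = []  # metadata saved at ad-click time

    def register_click(self, click_id, destination, reporting_origin):
        self.clicks.append({
            "click_id": click_id,              # unique ID for the ad click event
            "destination": destination,        # advertiser site the click points to
            "reporting_origin": reporting_origin,
            "time": time.time(),
        })

    def register_conversion(self, site, conversion_data):
        # conversion metadata is capped at 3 bits in the proposal
        assert 0 <= conversion_data < 8
        # link the conversion to the most recent stored click for that site
        for click in reversed(self.clicks):
            if click["destination"] == site:
                # the real API schedules this report after a delay; returned here
                return {"click_id": click["click_id"],
                        "conversion_data": conversion_data,
                        "report_to": click["reporting_origin"]}
        return None  # no matching impression: no report is sent

store = BrowserAttributionStore()
store.register_click(click_id=123456, destination="shoes.example",
                     reporting_origin="adtech.example")
report = store.register_conversion("shoes.example", conversion_data=5)
```

The key property is that the join happens inside the browser: ad tech only ever receives the final linked report, never the raw conversion event.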

https://web.dev/conversion-measurement/#setting-up-your-browser-for-local-development

scribe: Working on privacy enhancements that could allow view-through conversions, but with new limits to avoid the possibility of abuse (don't want to let a malicious site register a million impressions just to track the user)
... Also looking at anti-fraud mechanisms, same goals as PRIO range proofs, but simpler since we're not doing secret shares / MPC stuff. Still a challenge
... Second: the Aggregate API
... We published an aggregate measurement proposal in our conversion repo, very similar to the ideas Ben was talking about in the MPC slot this morning
... We can ramp up features and get differentially private output
... we're prototyping this internally; stay tuned for updates on GitHub, even though it's only a rough prototype to prove out ideas
... We hope this general architecture will support a host of other generic aggregate reporting use cases, beyond just conversions
... Third, other things we're investigating:
... Since our mechanism so far is linking things up in the browser, it's difficult to handle App-to-web (where the events are not both seen by the browser) or cross-device (where it's not the same browser)
... We're investigating technologies to make these use cases possible
... In the app-to-web case, we lose some amount of verification that the browser can perform on the click/ad event itself; the browser needs to know the user actually clicked on an ad to help prevent abuse, which is harder in apps where the browser doesn't have insight
... In cross-device, there's prior work that Facebook published in the web-adv github repo to join across devices privately
... But there are privacy and security risks, how do we know the devices are really owned by the same person, can an adversary get at any user data through this sync'ing mechanism? We're actively looking into it
... These are important use cases, and weaknesses of on-device, but all things we would like to support
... Fourth, more generic aggregate reporting use cases
... We published some tentative ideas last year for a generic JS API to interface with the helper service MPC thing
... The final API will probably look very different from what we published, but we're keeping our eye on the same use cases
... Of particular note: Reach Measurement, goal = to count distinct viewers of an ad campaign across different sites
... Also working on Fenced Frames, of use in TURTLEDOVE API and A/B experiments
... Those fenced frames can lead to situations where reporting needs to be aggregate, so looking at those use cases as well
... Happy to hear about your use cases also
... Happy to get updates from other browser vendors about the reporting story as well, or I (Charlie) can take Q's

wseltzer: Apple's John Wilander was double-booked so not here now, indicated he would join later
... Please add link to slides!

jeffwieland: I have a high-level but maybe simplistic use case

<charlieharrison> https://docs.google.com/presentation/d/1gixF934l4LRuVNpDSIuicEJSAC0DWf6dV4goE8ZSZW0

<johnsabella> doc needs access?

jeffwieland: Is the expectation that ad tech companies will use these measurement APIs to go bill their clients?
... We bill DSPs and pay publishers. How can we expect them to pay us if we give them a bill with non-exact numbers?

charlieharrison: This is indeed a big unresolved question, and something we need to get right
... I'm not qualified to say what you should do to get billing with noisy data, but we understand that it's a huge challenge
... as we understand it, there is already noise tolerated in billing today, so maybe the extra noise added by DP is in the same range as what people are content with now
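
Charlie's point about DP noise relative to existing billing tolerance can be made concrete: the absolute noise added is fixed by the privacy parameters, so its relative impact shrinks as counts grow (illustrative Python; the epsilon, sensitivity, and campaign-size values are made up for the example).

```python
import math
import random

def laplace_noise(scale, rng):
    # inverse-CDF sampling of a Laplace(0, scale) variate using only the stdlib
    u = rng.random() - 0.5
    return -scale * math.copysign(math.log(1 - 2 * abs(u)), u)

rng = random.Random(42)
epsilon, sensitivity = 1.0, 1.0   # assumed DP parameters, for illustration only
scale = sensitivity / epsilon

true_billable = 1_000_000          # billable conversions in a large campaign
noisy_billable = true_billable + laplace_noise(scale, rng)
relative_error = abs(noisy_billable - true_billable) / true_billable
# for a campaign this size the DP noise is a vanishing fraction of the bill
```

For small campaigns the same absolute noise is a much larger fraction of the total, which is where the billing concern really bites.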

jeffwieland: I'm not qualified to know for sure, but seems like people could be scratching their head about how to handle this

charlieharrison: We want these to be usable, otherwise nobody will adopt our solutions, so we're willing to explore lots of options to provide what is needed by the industry
... could include different policies on reporting used for billing than for other use cases, these are all worth exploring

Basile: On the conversion reporting API, do we have one line per conversion, or one line per click?

Charlie: In the API published so far, you only get one line per conversion — only for attributed reports
... We could augment the API to get reports for non-attributed conversions also, but I don't think it would change the utility of the output much

Basile: We need negative examples too, for ML training! Quite important from my point of view
... If I have both, then at least I can understand post-click behavior. Having only positives is too little

Charlie: From what we've talked about, you can recover something like what you're asking for if you have a measure of unattributed clicks/views and also attributed conversions, then you can subtract
... These are easy in the non-TURTLEDOVE case, but when you get into ads that need to render in the secure Fenced Frame environment it's trickier
... We can look into reports for the non-conversion use cases too, if it's needed

Valentino: Regarding exact numbers for billing, it's not just about numbers, it's about who is the trusted party
... Today the trusted party is the exchange which is resolving the auction
... Publisher side and buyer side each compare their reports against the exchange
... If the auction happens in the browser and uses multi-browser aggregation API, and all of them use their own helpers, and all helpers add different noise, and no one party knows all the data, then how can we trust?

Charlie: Good point, and aligns with the topics from Ben's MPC talk earlier today
... How do we show that the output really is the result of summing all of these inputs, and that the inputs really are trustworthy?
... There are techniques to do this, if the exchange has a way of authenticating these inputs to the helper infrastructure, that helps a lot — these bills are the sum of verified browsers from verified auctions signed in some way
... There are definitely holes to this; there are some things where we can only know the bill contribution is in a range, so a client could mess with values within that range; we're iterating on techniques to improve this, authenticate inputs better
... Also working on providing flexibility so that the advertiser can delegate validation responsibilities to the party they trust the most, e.g. the exchange

Valentino: But if the auction happens on the device, as in TURTLEDOVE, only the browser knows about the bid. If the browser is captured by a malicious actor, there's no way to know the bidding was legitimate

Charlie: Certainly Dovekey/SPARROW makes this easier; we can get the Gatekeeper to be trustworthy, and later reporting from the browser can exploit signing mechanisms
... If this all happens on browser and there's no pinging any server, it becomes tricky to verify what's happening
... There are approaches like Trust Tokens to verify the device, but that's very coarse-grained, we probably want something that can bind to verified events

Valentino: Sharing screen of Criteo's SPARROW

<wseltzer> https://github.com/WICG/sparrow/blob/master/Reporting_in_SPARROW.md#example-of-ranked-privacy-preserving-report

Valentino: Maybe you should provide a companion spec for how to generate bids for an auction, to show what replaces the "joined log with labels" that are how bids are generated today
... We generate a joined table to compute the probability that a conversion will happen, assuming there is a click beforehand
... If you're changing this, then it would be helpful to understand how you could do this with the event-level API, how you expect bidders to use this
... Maybe we're not fully aligned on understanding the end-to-end plan and that's part of our communication challenges

Charlie: My quick response is that for the event-level API, we hope this will be 1-to-1 change in how things work
... In the simple case, where it's not a remarketing ad and you do have access to the publisher context, this diagram is very similar to the one you just showed
... The table includes a unique ID for the ad serving event, so you can look up all the contextual signals that you want from ad serving time
... The label is the 3-bit conversion data value
... This lets you build the table of clicks along with 0's if the user didn't convert and 1's (or a little more information) if the user did convert
... The biggest challenge is in the remarketing case, where we can't provide event-level data in the same way, where we don't want the publisher to learn information about the interest group of the user
... If you had an event-level reporting that showed both publisher context and interest group, it would satisfy your use case, but of course would violate the privacy guarantee — publisher could learn each user's interest groups
... Don't have a great answer to your question here
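
The training table Charlie describes for the non-remarketing case can be sketched as a simple join keyed on the ad event ID, with unattributed clicks defaulting to label 0 (illustrative Python; the field names are hypothetical):

```python
# ad-serving-time log: contextual signals keyed by a unique ad event ID
serving_log = {
    101: {"publisher": "news.example", "slot": "banner"},
    102: {"publisher": "blog.example", "slot": "sidebar"},
    103: {"publisher": "news.example", "slot": "banner"},
}

# delayed event-level reports: (ad event ID, 3-bit conversion value)
conversion_reports = [(101, 5), (103, 1)]

def build_training_table(serving_log, conversion_reports):
    labels = dict(conversion_reports)
    # every click becomes a row; clicks with no report become the negatives
    return [{**features, "label": labels.get(event_id, 0)}
            for event_id, features in serving_log.items()]

rows = build_training_table(serving_log, conversion_reports)
```

This is also why Basile's negative examples come for free here: any click ID absent from the reports is a negative, which is exactly the subtraction Charlie mentions.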

<alextcone> +q

Valentino: My focus is: ML and learning can be quite complex
... This is about a whole pipeline, and it's hard to understand how the whole flow works
... If conversions are 1-in-50,000 events and you only receive it a week later, there are a lot of consequences
... I think the W3C group could help find and understand viable alternatives that are privacy-preserving, if we talk about this conversion-optimization flow in concrete terms, not abstractly
... A skeleton of the flow end-to-end could help facilitate that discussion

Basile: Let me point out that in SPARROW, we proposed to do that with k-anonymity
... Not as powerful as differential privacy, but we think it would be OK to use it for this use case
... So there is a proposal out there that addresses this question

wseltzer: More use cases, and more detail to the use cases, so that we can understand what is met and what still needs work
... We're at over-2-hours now, and I know people need to stretch. Let's try to keep the rest of this topic tight
... Also I see John Wilander is here, and folks are interested in hearing from him. let's keep it short

jrosewell: We seem to have reached a point where proof-of-concepts are coming forward, there's a lot to follow
... Also it's nearly the ads industry holiday season freeze
... We're almost at the time where we have only a year left, need multiple implementations, test and deploy them — I can't get my head around how we can achieve that in a year
... It would be helpful if Google could make a public statement about their intentions, but I can't align that with the timeline I see now

Mehul Parsana: Thank you for walking through the detailed proposal

scribe: We've been thinking continuously about aggregated measurement and differential privacy
... Have you thought about event-level but anonymized?

Charlie: What is "anonymized"? Is this about stripping out PII, or about DP?

<jrosewell> We have now reached the point where PoCs and ideas are coming forward. It’s been fascinating exploring these ideas today. I followed about 70%. There are clearly still many gaps.

<jrosewell> Halloween is in 9 days and marks the start of the annual change freeze window for many businesses.

Mehul: DP needs to be done at aggregate level

<jrosewell> We are working to a deadline that Google set of 2 years.

<bleparmentier> +1 on k annonymity

<jrosewell> In practice we have 1 year to conclude a design, gain consensus at the W3C – Google have set a public expectation they will achieve W3C consensus before shipping anything under privacy sandbox – create multiple implementations, test and deploy them, and train an entire global industry to switch to any new processes and business models.

<jrosewell> I’m sure it is not just me who can’t fathom how this will be achieved. It would be helpful to other W3C members if Google could make a public statement in the coming week concerning their intentions.

Mehul: k-anonymity and PII-stripping provides information but also event-level data
... as long as the underlying events are actually k-anonymous

Charlie: That's a good direction to explore, but the nature of the data that you're collecting really is aggregate in practice; if the data is k-anonymous, then aggregate reporting really is a natural way to report it (with a small amount of noise)
... If we didn't apply DP to aggregate measurement, then it just reduces to k-anonymity, and that aligns with techniques we've talked about before
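
The reduction Charlie mentions is easy to see in miniature: aggregate reporting with no added noise and a minimum reporting threshold is just k-anonymity (illustrative Python; the bucket names and k are arbitrary):

```python
from collections import Counter

def k_anonymous_report(events, k):
    """Aggregate events into buckets and drop any bucket smaller than k."""
    counts = Counter(events)
    return {bucket: n for bucket, n in counts.items() if n >= k}

events = ["campaign_a"] * 120 + ["campaign_b"] * 3 + ["campaign_c"] * 47
report = k_anonymous_report(events, k=10)
# campaign_b appears fewer than k times, so it is suppressed entirely
```

Adding DP noise on top of this thresholding is what distinguishes the aggregate proposal from plain k-anonymity.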

Mehul: I'd like to discuss more, but not the time

Marcal: Thanks to Google for delivering this API, good to have tangible things
... It seems like we get a key corresponding to the reporting origin destination
... What if an advertiser uses two different reporting origins for running two campaigns? Each will claim 100% of the conversions

Charlie: You're exactly right, thanks for bringing it up — since we're scoping attribution to a reporting origin, if you have multiple origins then attribution could get messed up
... We're working on how to handle it — send reports to multiple endpoints, scope to union of two origins, etc — to let you use multiple entities to measure. Not easy, but looking into it

wseltzer: Thank you! I know people are interested in hearing from John Wilander on Safari's story

wilander: Sorry, prior commitment scheduled 2 months ago, so I haven't heard the past 2 hours of conversation
... update on Private Click Measurement, which is in the Privacy Community Group
... We do not have the "unlinkable tokens" that would provide fraud prevention
... There is the open issue with how many bits of entropy on the click/conversion side

wilander: Was originally 6+6, was proposed to change to 8+4 or maybe 8+3 bits
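
The bit-split trade-off John mentions is just arithmetic over how a fixed entropy budget is divided between the click side and the conversion side (illustrative Python; the interpretation of each side as campaign IDs vs. conversion values is an assumption):

```python
# each (click-side bits, conversion-side bits) split caps the number of
# distinct values each side can carry across the click/conversion boundary
splits = {"6+6": (6, 6), "8+4": (8, 4), "8+3": (8, 3)}
combos = {name: (2 ** c, 2 ** v) for name, (c, v) in splits.items()}
# 6+6 -> 64 x 64 values; 8+4 -> 256 x 16; 8+3 -> 256 x 8
```

Moving to 8+4 or 8+3 gives the click side more room (e.g. more campaign IDs) at the cost of coarser conversion data, without increasing total cross-site information.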

<wseltzer> https://github.com/privacycg/private-click-measurement

wilander: Working to come to joint solutions on names etc, so that Chrome and Safari proposals can look the same as far as servers are concerned
... Recently opened an issue about having multiple origins that we report to, to send it to both advertiser and publisher (click source) — please comment on the issue

<wseltzer> https://github.com/privacycg/private-click-measurement/issues

wilander: Re unlinkable tokens: What is the merit of shipping PCM in Safari before we have the tokens for fraud prevention, then adding the tokens later? Could let you try it all out even before the complexity of tokens has landed

AramZS: I think there would be interest in testing these things as long as they are not blocking existing methodologies in the current release

csharrison: I'm also interested in opinions on sending reports directly to publishers and advertisers instead of to 3rd-party ad tech

Valentino: It seems reasonable to send it directly to the advertiser, might help with discrepancies
... remains to be discussed who is the source to trust in all of this

David Westbrook: Worth noting that most advertisers get most of their reporting from Google anyway

AramZS: I think most advertisers get data from a lot of different sources, as many as possible, even if Google's is most trusted you don't really trust anyone

Alex Cone: That's maybe not the way it should happen, just the facts on the ground

AramZS: There is clearly interest in a lot of different sources of reporting, especially in cases like VPAID for video, with lots of spec'd additional analytics in reporting
... That also needs to be limited in some respect as well... when the amount of reporting is too large, it slows things down too

Jon Burns: (representing Shopify, >1M advertisers)

<charlieharrison> Sorry, I realized my previous link to my presentation was not the published version. Here: https://docs.google.com/presentation/d/e/2PACX-1vQgO8VejseyeX9R7-guDYrjQYhr81K9JbiaftPm7YtEyHPnOlp0Fimi9stk1QgzuzV_7NuQy3L3Reo4/pub?start=false&loop=false&delayms=3000

scribe: We would love a chance to try the API
... We use third party reporting and also our own models
... Most of our measurement now is post-click, but if we could use these proposals to get at it directly, would be valuable

eriktaubeneck: One of our concerns when we were first thinking about the "blind token" approach was about an open-to-the-web API endpoint
... If you want to authenticate then it's an identifier, which is why you need blind tokens
... If there is a way to sign the request with a key that only the browser has, so that you know the token goes back to the right party, it would mitigate some of the obvious attacks

wseltzer: Break! At least 10 minutes
... See you at :35 or :40 to talk about Gatekeeper and browser-server trade-offs

Gatekeeper: browser/server tradeoffs, certification

<inserted> scribenick: charlieharrison

Jeff: thanks everyone. Jeff Wieland, work for magnite, formerly rubicon. In ad tech industry for ~10 yrs
... part of this group since feb/jan. Heard lots of compelling arguments for a more private web
... that meets ad tech use-cases
... took to heart Garret Johnson's statement, could identify the argument based on where they work
... we all want the same thing generally, w/out corporate hats. We want a safe web
... Find content that is cheap / free. Not unique to our work
... trying to walk a middle path, open web, ad tech and browser needs
... Brad Rodriguez also with me, colleague
... Gatekeeper not just scoped to SPARROW bidding, or cohort assembly. There is an independent entity for more sensitive aspects
... those things don't need to be handled exclusively by browsers
... Have a demo, similar to what FLoC does
... also talk about certification and deep dive into sparrow style bidding

<kleber> (FLoC analysis last night, that Jeff referred to: https://github.com/google/ads-privacy/blob/master/proposals/FLoC/README.md)

Jeff: we believe cohorts are an _improvement_ over UID based ad targeting
... but for cohorts to be a viable alternative, we insist the assembly be done in a way that is open and transparent

<Garrett_Johnson> Can we get a link to the deck? Thanks.

Jeff: Three main proposals. FLoC --> browser side cohort assembly, added to RTB auctions in place of 3p ids / cookies
... FLoC + server: essentially same thing but cohorts assembled on server by a trusted entity
... cohorts added to RTB auctions in place of 3p ids
... Finally, hybrid of the two "Proprietary cohorts". Client side cohort assembly but there is some JS that is owned and provided by a DMP
... We want cohorts in place of cookies in RTB, we are really just discussing how that gets done
... privacy goals? Want to replace UIDs; want targeting done in a private way (no cohorts of size 1; those are not even economically viable)
... Want reporting done in a sufficiently private manner (no realtime reporting)
... waiting 24h or some number of hours is not reasonable; 24 hrs is too long to wait
... What are the advertising goals? Assemble cohorts in an open and transparent manner
... don't accept a browser-centric cohort assembly
... "Figure out what the cohort represents and bid accordingly" does not work for us
... Solutions we come up with must acknowledge the size of a publisher/marketer. If the output of this group is that more spend moves to walled gardens and the open web atrophies away, that is a problem for all of us
... When we talk about solutions we need to talk about alternatives when something will work for a large publisher but not small publisher
... Going into demo
... Cohort assembler needs to generate a performant cohort. Converts at rates we expect. Reachable at the scale we expect
... Takes as input a session ID and a domain
... needs to be auditable. Log inbound / outbound requests
... some number of days
... that data available to audit
... clustering code should be open sourced
... all of us should be able to look at it and cut prs against it
... transparent
... give publishers / brands dials to turn to tune cohort assembly

<demo starting>

Jeff: essentially what the demo does is return a cohort ID
... simulate a client / server interaction. Imagine a publisher sending a session ID to gatekeeper and returning back a cohort
... session ID + domain
... get a 4 char cohort and a confidence score
... won't go into data science about confidence, but represents this session ID vs. all the other session IDs
... matches at 83%
... minimum feedback a cohort assembler should be prepared to return to a consumer
... fairly similar to what I read in FLoC
... So why do we need independent cohort assembly
... make an economic arg
... fairly rational
... We need market forces to drive innovation
... markets don't thrive when there is only one producer of a good or service
... Trust: ad tech should not be asked to accept a dependency on Chrome-generated cohorts while at the same time competing with Google's ads business
... hard for the open web to trust a black box Chrome generated cohort
... Finally, adoption: as humans we need to stop micro- and psychographic targeting
... not blaming ad tech, but these targeting techniques don't seem like a good idea for western democracies
... adoption piece is essential
... I appreciate Chrome's willingness to join in these conversations

<jrosewell> Relevant Economist article : https://www.economist.com/leaders/2020/02/20/how-to-make-sense-of-the-latest-tech-surge

Jeff: believe Chrome when they say they support the open web
... If chrome becomes the only cohort assembler, could become a lightning rod for bad conclusions people draw about advertising
... find it hard to believe Google would want to accept that level of responsibility
... for something like cohort assembly

<taking Q&A>

scribe: Michael Kleber: thank you. Didn't talk exactly about the cohort provider in a lot of detail. Want to be clear
... This server that puts people into cohort
... For each session ID (a single browser across time), server will learn all the websites that a particular session includes
... While it won't be enough to uniquely identify, surely there are a lot of people whose browsing could be identified
... if this server is malicious it could leak browsing data
... underlying difficulty of cohort creation outside the browser, need to spend time talking about

Jeff: yes if the server is malicious that is possible

Kleber: MPC might make this possible without the server learning the actual information
... potentially difficult
... haven't said anything about the algorithm
... Point out how this fits into the other privacy sandbox proposals. This server is doing 2 jobs. 1 is training (looking at browsing profiles from lots of different people)
... some clustering on those
... also doing evaluation step
... slotting users into clusters
... training POV, want to ask if training is something you could do with aggregate data
... at midnight, browsers send the full set of sites visited today
... but you only get k-anon versions
... unique browsing history would be omitted or blurred
... Would that be good enough for you to train?

Tom Kershaw: where is the assembly taking place?

scribe: Gatekeeper is doing that assembly

Kleber: The scheme from this morning should let us provide an API that inputs the list of sites a browser visited and goes through a processing scheme in the cloud, where the property of the output is <set of sites, # of people who visited>
... you don't get reports one at a time, you get a pre-aggregated report
... Maybe noise added, or thresholding

Tom Kershaw: would not work

scribe: who is the you? The gatekeeper is a trusted entity
... not an untrusted entity
... out there in the cloud, some mystical thing, who is doing that?
... you could have the gatekeeper do MPC

<dialtone> charlieharrison: the deck link you shared earlier seems to not have permissions open

Brad Rodriguez: to michaels point the aggregate data _could_ be sent to an untrusted server

<dialtone> oh ok, sorry

dialtone: yes I am busy taking meeting notes :)

<dialtone> sorry sorry :)

dialtone: in this proposal, the gatekeeper is trusted

Tom: we do agree that only groups / aggregate should be presented to untrusted entity
... trusted entity has the same ability as cloud server / browser

Kleber: right. That's exactly the problem
... who is trusted / what they are trusted with is not a boolean attribute
... goal of all the server stuff is how to get utility from servers with _minimum_ trust
... in aggregated reporting scenarios, the servers do not learn anything at all
... from ben's presentation
... servers just do computation on random looking data
... and Chrome proposal we add additional protection on output
... What you are talking about is a server that learns very sensitive data
... trust required is much higher

Tom: possible to say the gatekeeper is distributed
... we are talking about the same thing
... same level of trust as a browser

Kleber: difference is that my browser only knows about me
... not a good answer

Tom: can be designed such that the data the gatekeeper can provide is limited. Can add MPC

Kleber: would love to see a design

<alextcone> Michael, does a server know anything about your browser history when the sync with Google control is set in the browser?

Kleber: second key difference
... question is how the results end up getting used
... in FLOC, everyone is in only 1 cohort

<alextcone> +q

Kleber: cohort non sensitive
... can be joined to a user's contextual browsing data
... <url + cluster> in RTB
... Really need to be careful that it is not too sensitive
... urge folks to take a look at the paper published last night
... in TURTLEDOVE family of proposals is about assigning people to cohorts
... but the trade-off is that cohort is not joinable to contextual data
... independent to contextual, but can target to ads
... this idea could work if we scope it to generating TURTLEDOVE-style interest groups
... browser can't know how sensitive the cohorts are

Jeff: difference between FLOC and cohort assembly and the birds
... turtledove is just re-targeting
... stops and ends there

Kleber: but what if the entire clustering server was to create a TURTLEDOVE interest group
... server could put users in lots of IGs
... want to put people into clusters, this is a way to let you

Jeff: the advertiser has to see the user before they can be added to an IG

Kleber: disagree

Jeff: I have understood TURTLEDOVE as retargeting

Kleber: retargeting is just one thing. It is about grouping people and showing ads to groups of people

Jeff: How could that happen?

Kleber: Your existing idea

Brad: The servers could chain together session history; TURTLEDOVE as currently proposed could not do that.
... it's more an aggregate of behaviors

Kleber: This is what you should be proposing
... server that does the job of server learning aggregate browsing behavior in some way
... use that to let browser know which IGs to join
... Magnite's IGs
... Magnite's IG can be used to target ads
... every person can be in many IGs

Wendy: apologize for opening up queue

<bleparmentier> Just want to say that we have a second prez going^^

Wendy: get back to the conclusion of the presos

Jeff: just want to close with a comment, would love to see details on how we Magnite could get access to aggregate browsing data

<kleber> Sorry I got carried away, everyone

Basile: we have been discussing FLOC today
... I am surprised because we almost never discuss FLOC
... POV that Google is throwing lots of birds to hide the FLOC
... nothing on FLOC at TPAC
... Wondering, is it going to be a main proposal or not
... it should be put forward more by Google if so

Kleber: quickly answer
... FLOC got a lot of discussion at end of 2019
... Since 2020 spending more time on TURTLEDOVE
... discourse threads on both FLOC & TURTLEDOVE

<jrosewell> +1 to Basile - no other participant supports at WICG - https://discourse.wicg.io/t/proposal-federated-learning-of-cohorts-floc/4473

Kleber: For FLOC not a lot of messages
... why there is more interest in TURTLEDOVE
... encourage folks to show interest in FLOC if interested

Basile: want to say there are two parts
... how can we trust gatekeeper, could be extended to other use-cases
... why we think there is value in certifying sparrow GK
... 3 aspects to billing trust
... 3. Design will have a lot of impact
... must agree on the role of 3p server

<brodriguez> Also related, it seems the closest thing we have to Michaels suggestion of private cohort generation is Proprietary Cohorts: https://discourse.wicg.io/t/proposal-proprietary-cohort-generation/4704

Basile: once we agree on the role, we can see how we can technically help with trust
... MPC, etc
... open source
... ad tech is quite complex, technical may not be enough
... 2. process auditing

1. tech constraints

scribe: Zoom on auditing process
... code of conduct; a certification entity can audit based on it
... certification can be costly, so it's important to have a way of paying for things
... if no interest in running a server, no one will do it
... trusted server needs to pay for certification
... Happy to find another way
... this is something widely accepted, but happy to find another way
... if GK does not follow rules, loss of certifications. E.g. could not receive interest groups
... the GK also needs to be paid
... in SPARROW it's mostly the advertiser that will pay
... other possibilities though
... important not to forget incentives
... may adjust audit process based on technical details
... good way of introducing a trusted server. Want to hear feedback
... maybe better way to do it?

AramZS: biggest issue is that there is a lot of incentive to misuse, all data going to GK
... misuse privilege via holes in the system. Already have plenty of examples of "legit" organizations within ad tech misusing
... a new place where misuse could be centralized is dangerous
... servers would also become targets for hackers
... or just disrupting the ad-tech economy
... lastly, certification. Troubling that the payment is coming from advertisers
... Don't have an alternative, but advertisers would have the most incentive to de-anonymize user data
... Incentive misalignment
... Conflict of interest that would turn off most people from this proposal

Basile: happy to see another way of payment
... This is not a new problem
... lots of industries have gone with this model
... Other point, this is why the full design needs to be taken into account. when you receive extremely sensitive data that could be used for tracking, need really strong design
... MPC
... If trusted server only gets partial information (like SPARROW). requirements still high but lower
... code of conduct for each proposal
... then we have audits, technical mechanisms, same for all GKs, but not all GKs will be the same
... Naive GKs will have lighter requirements
... Idea is to agree on a bar, then we have a way to check that the bar is met

Jeff: there are different payment models e.g. by CPM
... requests by pubs / advertisers
... to your other point, honeypot of user data. The alternative isn't any better
... alternative hands keys to Chrome that also runs an ads business
... turned the balance of power between browsers and open web in favor of browsers
... wont have an open web left

AramZS: The standard we're trying to meet: can you identify user data?

<jrosewell> +1 Jeff we are talking about the future of the open web, not just advertising

AramZS: if you are sending user data to a server on the web, people will have objections. Alternative living in Chrome, hope other browsers will adopt
... even if other browsers did not adopt, I think existing within the browser is a place (regardless of Google's browser), we can create more accountability
... more than the many ad tech companies and cause problems
... Agree would prefer not to be reliant on a company that owns an ads and browser business
... Have trouble trusting other players too, and them being on servers is much less transparent. Can open up browser and see what it is doing
... Prepping for other feedback

Tom Kershaw: If trust is in the hands of 2-3 companies, you will have more problems than with a distributed system

scribe: pluses and minuses
... there has to be trust somewhere, or put trust in engineering
... what we are trying to do with TURTLEDOVE is no trust anywhere. Admirable, but we need to make sure we can accomplish it

AramZS: A system where no trust is required seems to be the way to go


<Karen> Scribenick: Karen

<charlieharrison> thanks :D

Wendy: thank you Karen and Charlie
... as we go into the final stretch, we have a queue
... work to wrap up the session
... James

James: To pick up on Aram's point on audit
... auditors who deal with problems
... that question can be overcome, works well in other industries like finance and oil
... ask those industries to help inform the dialogue
... businesses must be allowed to choose whom they trust
... protect ability of businesses to make a choice
... to pick up on malicious server, trust

<AramZS> to be blunt here: I would not consider the energy industry or banking industry to be good examples of well audited industries at this current time.

James: leads to questions less about engineering and more about policy
... we need to understand that explicitly
... what evidence is there that people trust Google as their browser
... facets of info being processed
... what role would identifiers play in that

<bleparmentier> I just disconnected

James: some of the largest fines for privacy in Europe have involved Google
... yesterday we saw from Aram the research
... that dropping cookies is less prevalent in Europe
... thank you for arranging that session
... these are issues for which we created the success criteria document

<AramZS> Sorry, are we talking about Garrett's presentation? I wouldn't want to take credit for that.

James: that document is now part of the chartering of the decentralized identifier group
... take that to next stage
... also there is a breakout session on trust
... ultimately I am convinced, having heard the conversations,
... that these policy issues need to be resolved first or we will go round and round

<jrosewell> I think the big 4 auditors, or other auditors, should answer Aram’s question about how this is overcome. It happens in many industries. Agree it is not a new problem. Maybe we should ask them to be involved? Any business can trust others. It’s important to have choice.

James: not hearing you; we will come back

<jrosewell> Pick up on Michael’s, Jeff’s, Tom’s and Aram’s point about malicious server, resolving the question of trust, creation of groups. This leads to questions.

Gang: thank you, folks for publishing the Gatekeeper proposal

<jrosewell> Under what conditions, if any, could Google trust another entity? What with?

Gang: if I understand it

<jrosewell> What evidence is there that people trust Google or its browser? Or do people prefer one company to process all their personal data vs have fragmented facets of their information processed by a competitive ecosystem? The largest fines for privacy breaches in Europe have involved Google.

<jrosewell> What role do pseudonymous IDs play compared to directly-identifiable IDs?

Gang: sensitive user data must be on that gatekeeper

<jrosewell> Aaron showed yesterday laws like GDPR have a demonstrable impact on the privacy of the open web.

<jrosewell> This is the sort of issue others people and I created the success criteria document to provide a method of answering back in the spring.

Gang: a lot of code in one server has a lot of IP for a lot of adtech companies

<charlieharrison> OK third time's the charm for my presentation: https://docs.google.com/presentation/d/1dOfUXD8lQy8svDeoi60yB5V3vI5gT0VCWsQlByEpHEU/edit?usp=sharing

<jrosewell> There is now a charter for the Decentralization Interest Group to further these documents and help with these sort of problem. https://w3c.github.io/charter-drafts/decentralized-charter.html

<charlieharrison> sorry for the run-around

Gang: are you concerned that some adtech companies will not want to reveal their IP and will be hesitant to adopt?
... large code base; adding lots of features to it; can the auditors keep up with all this change
... is the code privacy preserving enough; and will this slow down the development process?

<jrosewell> There is also a breakout session next week to discuss definitions of parties and trust. https://www.w3.org/2020/10/TPAC/breakout-schedule.html#party-time

<jrosewell> Ultimately these policy issues need to be resolved before we do anything else.

bleparmentier: did not hear all of what you said
... development being slowed because a piece of code is not private enough, I think this is an issue
... even if I don't see many cases arise, but it will
... I think it's going to be an issue with whatever proposal we do
... pace at which we will be changing stuff is a big issue
... we will need to ask for the browser to do it
... I have no interest to do it
... sorry if I did not answer your questions due to my connection

Gang: I can repeat the questions
... question number one
... are adtech companies comfortable showing their core competencies
... question two, can auditors keep up
... and how does that impact development?

bleparmentier: on second part
... I don't think any change needs to be in the sense there will be an API
... ensures data that goes in will be @
... data goes out
... I don't think, as long as there is a gatekeeper, changes in code are needed
... big ML modules
... change does not need to be checked by auditors
... as long as function has clear open points
... in Sparrow you get a request
... and you return the bit
... doesn't always need to be checked by auditors
... I don't see how it could be done
... maybe not have two latencies
... not everything needs to be checked for every change
... way happier to trust
... where there will be access to the @
... JS...
... while still agents
... in the browser
... do feel that from adtech POV
... everything in a trusted server is way better
... than having it running on a competitor
... Google is a competitor to most of us

Tom: Adtech industry would support a clear and open process
... majority are built on open source
... opportunities for open source and for billing proprietary on top of it

kleber: apologize for getting carried away earlier and redesigning the Magnite proposal
... on high level question of gatekeepers
... I am extremely supportive of this avenue of investigation
... and make these privacy preserving approaches
... with gatekeeper rather than on device computation
... The on-device models have downsides
... clear ways to have some centralized server involvement with on device computation
... not all gatekeepers are created equal
... point out that three operate at very different levels
... Dovekey is probably compatible with MPC model
... where server doesn't learn anything at all
... aggregated measurement proposal can be aggregated like that
... Gatekeeper from SPARROW... does not build a user profile over time
... not on a user by user level
... gatekeeper of clustering in Magnite proposal is to build some information about a person across time
... those are three extremely different things
... device computation is best, but not practical
... where you cannot do entirely on device
... first goal is for on-device stuff to be done in the browser
... or be in a position if gatekeeper went rogue or was malicious or hacked
... then we do best job we can to protect user's information
... what I just answered is a Chrome centric version
... I know we have people from Safari and Edge
... wonder if you are open to trusted servers, gatekeeper types of things
... are you swayed by arguments that servers learn as much as possible
... or by code audit
... love to hear multi-implementer positions

Erik: At high level MS and Edge team is supportive
... of having some form of servers to have reasonable levels of monetization while still protecting privacy
... we are still trying to reason over what it looks like for third party
... how privacy preserving can we be with various use case
... third party and first party both have inherent risks
... cannot guarantee there will never be a data breach
... do users trust us to do what we say we do
... users might trust if they understood who is doing it
... a bit wishy washy
... we would like to make auditing for ourselves as simple as possible
... have not precluded data flows
... comes down to complex set of balances and tradeoff

<jrosewell> We should stop thinking in terms of third and first parties. It's about trust choice, under what conditions, audit requirements, and supply chain choices. It's not as simple as 1st and 3rd.

bleparmentier: does this make sense?
... fact that we have a design that @
... with the auditing of the process?

ErikA: any auditing
... if browsers have a list of third parties they trust in default
... not sure user would understand that
... if auditor is doing something, cannot spend six years in the bowels of a company to understand the code
... what Google was talking about before, wilful IP blindness
... if easier to assert some privacy protections upfront to certify the rest, that would be valuable
... the big three/four auditing firms, comes down to what they can understand
... examples of auditors who did not understand exactly what they signed off on
... issues with complexity

Mehul: interesting from auditor and advertiser perspectives, users also need to understand what is happening

Michael: we talked about that this morning
... there is no chance users are going to understand DNS or device
... at some level we need to make those guarantees, but technical details out of scope

Mehul: from targeting pov, how well it does
... privacy service could provide
... can see aggregate data
... for certain things

<jrosewell> The partners of auditors are personally liable for failing. When they get it wrong they lose everything. Big incentive to do a good job. https://en.wikipedia.org/wiki/Arthur_Andersen#Enron_scandal

Mehul: would come as benefit not just to advertiser, but also benefit to privacy as a cost

Michael: I completely agree with that
... if multiple ways to implement server side API
... can look for more private ways to improve privacy of the web
... prioritize those over ones that have privacy based on policies
... Tess, eager to hear if Apple has thoughts?

Wendy: not sure

Tess: A big question
... not sure how to succinctly answer it
... that's all i've got

Michael: Ok

Wendy: thank you
... I think it's a good sign that we are having to wrap up in the middle of vibrant discussions
... we are at the end of our time slot
... I want to note a few administrative pieces
... First, thanks to everyone who spoke, scribed and participated
... we will bring topics back to future meetings
... we had over 100 people on these calls
... thank you for the discussions
... TPAC continues next week with self-organized break-out sessions
... schedule online
... most are at 14:00 UTC
... a few scheduled a couple other times
... and a conflict with our usual 11am Tuesday slot
... suggests that our next Adv BG meeting should be Tuesday, Nov. 11th
... I will extract subjects from the minutes
... host the materials and presentations
... and I welcome other inputs from here
... how do we continue these discussions in productive ways
... I gathered a few items
... from places where we were diving deep today

[Wendy reads list]

scribe: we could discuss function and role of a server side component
... can we make that a piece of designs that work to serve our joint goals
... invite people to raise issues in Github
... and raise issue by email
... discussion for future meetings to keep this conversation going
... I will also send a survey around

<Alan> [still 65 people here at the end of the f2f!]

scribe: love to gather feedback on what worked well, what to do differently, and when to do another meeting like this
... after a bit of a break to catch up
... we still have 65 people at the end
... Apologies to all of the questions and conversations we did not get to

<jrosewell> We need to also address the key questions raised earlier. Progress will be so much easier if we do.

scribe: I look forward to getting back to those in future meetings
... thank you
... we are adjourned

<wseltzer> [adjourned]

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version (CVS log)
$Date: 2020/10/28 14:07:38 $
