W3C

– DRAFT –
Improving Web Advertising BG

23 February 2021

Attendees

Present
AramZS, arnaud_blanchard, aschlosser, blassey, bleparmentier, bmay, Brendan_TechLab_eyeo, btsavage, dialtone, Diarmuid-CRTO, Dinesh, dkwestbr, dmarti, ErikAnderson, eriktaubeneck, GarrettJohnson, gendler, imeyers, joelstach, jrosewell, Karen, kleber, kris_chapman, Mike_Pisula, mjv, mserrate, nics, pedro_alvarado, pl_mrcy, robin, shigeki, wbaker, wseltzer
Regrets
-
Chair
Wendy Seltzer
Scribe
Karen, Karen Myers

Meeting minutes

Wendy: Welcome
… thanks Deepak for the presentation I see on screen

Deepak: Sure, thank you

Wendy: Our agenda today

[Wendy reviews the agenda]
… Next week we have a presentation on PARAKEET by Microsoft
… any additions to the agenda?
… Any introductions?
… Let's get right into our first item
… Thanks a lot Deepak
… it looks like you have materials prepared to present
… and you have questions on the list
… I invite people, as you hear this, to note that this is one experiment in which we are gathering data
… we also talked about other data that would be useful in building and understanding the privacy-preserving web
… so ask if there are other questions, or other data to build
… as we look through the proposals
… with that, Deepak, why don't you start

Agenda-curation, introductions

Data-gathering, including FLoC questions. https://lists.w3.org/Archives/Public/public-web-adv/2021Feb/thread.html#msg9 (Arnaud Blanchard and follow-on)

Deepak: thank you, Wendy for the introduction
… can people see the presentation?

[yes we see it]

Deepak: This is a joint work with many people across research teams
… the FLoC principals I have listed here
… prevent cross-site tracking; cohorts consist of users with similar browsing behavior; use unsupervised algorithms
… we did not want algorithms tied to a specific use case
… and we limit the use of magic numbers
… and it should be simple enough to run in a browser
… Two parts to any clustering algorithm: how to represent users
… and how to do clustering
… User representation
… Take each user and put them in a hyper-dimensional feature space
… take the user's browser history; each URL or domain visited could be one dimension
… also other interesting things like the number of visits
… and dates associated with that, and put the user in that hyper-dimensional space
… one option is a domain feature
… it's super easy to implement
… the con is that it's super coarse
… for a large web site, everyone gets only one entry
… that seems too coarse and not useful, but it's easy to implement
… here is an example
… a user visits NYTimes

<wseltzer> https://github.com/google/ads-privacy/blob/master/proposals/FLoC/FLOC-Whitepaper-Google.pdf

Deepak: arts, info pages
… and you encode as features on right side [of slide]
… another way is vertical feature extraction
… a user is extracted across all verticals
… pros are better granularity
… cons are @
… take an example
… verticals are extracted in the middle
… user visited pages on left side [of slide]
… Current version of Chrome has one-hot encoding of domains
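
[A minimal Python sketch of the one-hot domain encoding just described; the function name, vocabulary, and example domains are illustrative assumptions, not Chrome's actual code:]

    # Each known domain is one dimension; any visit sets it to 1, so even a
    # large site with many pages contributes a single entry (hence "coarse").
    from typing import Dict, List

    def one_hot_domains(history: List[str], vocab: Dict[str, int]) -> List[int]:
        vec = [0] * len(vocab)
        for domain in history:
            idx = vocab.get(domain)
            if idx is not None:
                vec[idx] = 1
        return vec

    vocab = {"nytimes.com": 0, "youtube.com": 1, "example.com": 2}
    print(one_hot_domains(["nytimes.com", "nytimes.com", "youtube.com"], vocab))
    # -> [1, 1, 0]
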
Deepak: Moving on to Clustering
… want to group users similar to each other
… and make sure each cluster has at least K users, where K could be 1,000
… doing this in Chrome, make sure it's decentralized and auditable
… One of the simplest algorithms is decentralized clustering (SimHash)
… relatively easy to implement, but tends to produce a lot of unbalanced clusters
… Sorting LSH... come up with a system that is sort of balanced
… user clusters are more evenly distributed; helps in making sure advertising produces @ results while privacy metrics are respected
… this is the current Chrome implementation in M80
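
[A minimal sketch of the SimHash and Sorting LSH ideas just described, in Python; the bit count, the chunking rule, and all names are assumptions for illustration, not the shipped Chrome implementation:]

    import random
    from typing import List

    def simhash(features: List[float], n_bits: int = 16, seed: int = 42) -> int:
        # Random-hyperplane LSH: each bit is the sign of a random projection,
        # so users with similar feature vectors tend to share hash bits.
        rng = random.Random(seed)
        h = 0
        for _ in range(n_bits):
            plane = [rng.gauss(0, 1) for _ in features]
            dot = sum(p * f for p, f in zip(plane, features))
            h = (h << 1) | (1 if dot >= 0 else 0)
        return h

    def sorting_lsh_cohorts(hashes: List[int], k: int) -> List[int]:
        # Sort users by hash, then cut the sorted order into contiguous
        # cohorts of at least k users, keeping cluster sizes balanced.
        order = sorted(range(len(hashes)), key=lambda u: hashes[u])
        cohort_of = [0] * len(hashes)
        cohort, count = 0, 0
        for pos, user in enumerate(order):
            cohort_of[user] = cohort
            count += 1
            if count >= k and len(order) - pos - 1 >= k:
                cohort, count = cohort + 1, 0
        return cohort_of
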
Deepak: Evaluation
… let me go over the kinds of evaluation we did

Michael: I think there are questions
… should we take questions before going on?

<Zakim> AramZS, you wanted to say Does using verticals as a differentiator require that all sites involved standardize on a small set of ways to indicate a vertical and would that way require specific URL structures, or can it be based off Schema dot org data or something along those lines?

Deepak: Sure

<AramZS> can you hear me?

[no]

<eriktaubeneck> cannot hear you Aram

[Wendy reads the question]

Deepak: Great question
… different ways to do verticals

<AramZS> urk well, the question is in there, I will try to correct

<Robert> ?

Deepak: one way is each publisher declares the verticals represented by the web site; or use an automated service
… and use HTML code
… with a schema
… or invite them into verticals with @
… there are pros and cons
… for publishers already tagged, great
… if not tagged, you have a chicken-and-egg problem
… anything on device has challenges with accuracy, and also all the system requirements that go with shipping on device
… or have some sort of combination
… any other questions?

Pedro: good morning
… in this experiment, what was the value of K used?

Deepak: that was my next slide
… we tried values starting from one
… which corresponds to third-party cookies
… and experimented with various versions
… the one in Chrome is K=1,000

Arnaud: thank you for this intro
… question on material you used to identify the page
… do you plan to rely only on the URL?
… on sub-domain components, or more ways to find what the page is really about?
… it seems vastly different from one web site to another
… the nytimes.com/politics is explicit
… but on YouTube you might watch a trailer
… and the URL might be high level and doesn't bring value to the contribution
… do you plan on having other ways to extract data for what page is about?

Michael: let me jump in
… to clarify roles
… what we are writing in Chrome is being tested, and will be used for origin trials in March
… the clustering we will use will just look at the domain
… the only thing that matters is what domain name is there
… it collapses to: this user just visited NYTimes.com
… some more sophisticated things to do are look at entire URL, or look at words on page
… this is harder to do
… in March, we are starting with the simplest thing
… and see if feasible
… and later this year, we will try other experiments
… and assign features, and compare and contrast benefits

Arnaud: Aren't you afraid having those vastly different ways to extract data will give vastly different results?
… let's say YouTube.com, and start to include what video is about
… then it brings a tremendous amount of new info that would make FLoC far more interesting for some parties
… and less for some entities
… I feel that testing those things changes the dynamics
… and changes way you extract the data
… how do you make those decisions?
… start at domain level; then change and scrape data on the web site
… some people pull out
… they feel data is leaking, so they pull out and it decreases the value

Michael: That is an excellent question
… if we had a single answer about how to extract, we could have a conversation
… and if testing different implementations against each other, it is harder to get a signal from the world at large
… there are a lot of hard questions, and that is one of them
… Only once we have something beyond using the domain name that we think is feasible in the browser
… can we have an intelligent conversation on feature extraction approach
… answer may be different with different types of extractions
… too early to tell, but you are right that it is an important question

Deepak: The system that extracts verticals... versus one that only extracts domains; I agree with Arnaud that it's an important question
… done on vertical side
… you can see significant differences on the right side

Wendy: thanks
… I do see several people on the queue
… I have closed it for the moment
… take a few questions and then get to the "Evaluation" part of your presentation
… invite others to queue up after

Wendy: Can your question wait until the presentation is done?
… unless a quick question for clarification, go ahead

Angelina: I think there is one thing to consider
… we have done work on content taxonomy and audience taxonomy and granular topics
… highly recommend considering that

<AramZS> IPTC Media Topics are also a useful topic

Angelina: what we heard from taxonomy group that trusting publishers is a challenge for buyers
… how to ID the topics being identified down to the page level

Deepak: Great point

Robert: every couple of months, on third party cookie basis, you are centralizing the data and you can build the clusters
… is it the intention to freeze everyone in time
… or will there be on-going learning process; how will data be centralized; how will browsers collaborate
… what is the learning on the architecture outside of the third-party cookies?

Michael: there is no centralization at all
… everybody's browsing history uses sorting type technique and not sending to central server
… how do you ensure K in that case?
… Making sure each cluster is large enough is right; secure multiparty computations
… differentially private aggregated infrastructure
… Chrome has published an explainer

<GarrettJohnson> Is there a link to the deck and/or updated doc this refers to?

Michael: where all browsers can send messages to some central location
… but it's impossible to decrypt them individually
… we can extract how many people are in each FLoC but are not allowed to use more
… until multiparty computation is in place... we check K for anonymity
… with info saved in a Google server; we count how many are in each FLoC
… once multiparty computation is online, we don't do the counting
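
[A minimal sketch of the interim k-anonymity check described above, in Python; the names and flow are assumptions, not the actual Chrome or Google service:]

    from collections import Counter

    K = 1000  # minimum cohort size before a FLoC may be exposed
    cohort_counts = Counter()

    def report_cohort(cohort_id: int) -> None:
        # Each browser reports its cohort ID; the server only counts.
        cohort_counts[cohort_id] += 1

    def cohort_is_exposable(cohort_id: int) -> bool:
        # A browser surfaces its FLoC only once the cohort is k-anonymous.
        return cohort_counts[cohort_id] >= K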

Robert: centralization of data?
… either it's federated or somehow integrated
… with collaboration between servers?
… tell me what third notion is

Michael: Idea of secure multiparty computation...let me post a link here

Wendy: Aram, something for this round?

<kleber> Chrome's Multi-Browser Aggregation Service Explainer: https://github.com/WICG/conversion-measurement-api/blob/master/SERVICE.md

Aram: Just wondering if it's clear to publishers what topics they are categorized under
… down to page requests in the system
… if we go down to individual verticals
… how can publishers see what page is tagged with FLoC?

Deepak: I don't have a very good answer for that
… Michael?

Michael: Aram, can you repeat question?

Aram: if the FLoC system applies to specific pages, is it transparent to publishers?
… you are planning an origin trial
… publishers want to weigh in on the topics, how do they get visibility?

Michael: Only thing we are using is the domain name of the page
… there is no notion of assignments of pages or specific topics
… FLoC will only be useful based on what is inferred from FLoC; how adtech companies use this and derive some kind of meaning from FLoC
… the question is trickier when assigning verticals to pages and assigning people based on verticals
… If I look at verticals someone visited, that could tell me what a FLoC means, but that could be misleading
… need to look at what FLoC actually does
… even if derived from FLoCs actually visited, rather than inputs to the clusters
… but it's not clear; more research needed to figure out how to use them

Wendy: Deepak, go ahead

Deepak: Evaluation slide
… you have this algorithm, how did we evaluate?
… one is to see if we have a public data set
… running open source data is challenging
… first we ran the algorithm on the Million Song Dataset
… we tried random, SimHash, affinity centroid and sorting LSH algorithms
… on the X axis we have cohort size; on the Y axis we have cosine similarity
… as you can see, the random performs the worst
… and affinity does the best
… every point [missed]

<wseltzer> FLOC whitepaper
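
[A minimal Python sketch of the utility metric on the Y axis above: the average cosine similarity of users to their own cohort's centroid. The exact averaging in the whitepaper may differ; this is illustrative:]

    import math
    from typing import Dict, List

    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def centroid(vectors: List[List[float]]) -> List[float]:
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def mean_within_cohort_similarity(users: List[List[float]],
                                      cohort_of: List[int]) -> float:
        # Group users by cohort, then average each user's similarity
        # to their own cohort's centroid.
        cohorts: Dict[int, List[List[float]]] = {}
        for vec, c in zip(users, cohort_of):
            cohorts.setdefault(c, []).append(vec)
        sims = [cosine(vec, centroid(members))
                for members in cohorts.values() for vec in members]
        return sum(sims) / len(sims)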

Deepak: took same algorithm and ran on proprietary Google ads data set
… on display ads we tend to get conversion data
… ran on classifiers and got a tag with it
… we tried to predict based on the simulated FLoC data
… we would compute FLoC on a seven-day basis
… standard blocks
… and compare to random baseline
… First algorithm is features
… first one is domains, second was doing some sort of @ with domains, third one doing verticals

<wseltzer> [figs 11a and 11b]

Deepak: for these experiments we changed the value of K
… and computed @ position
… compute the F score
… more sophisticated presentation involves vertical @
… Outside of Google ads, could do same methodology as Google ads
… so these results could be independently verified
… Second experiment is an A/B experiment
… there is a challenge with black-box evaluation
… take one component and try to replace it
… and let the production system run
… replace with third-party cookies
… let the system run as is
… had to make sure the system ran as-is. In the second trial, it runs
… within the allocated time, because of the time constraints and other challenges
… wanted to test and simulate
… part I want to illustrate
… they were done to evaluate Google audiences
… the remarketing use case will be addressed by FLEDGE and TURTLEDOVE
… what does Google audiences mean?
… Links in the explainers
… describe where Google audiences are
… points to link where audiences are described, such as affinity audiences, book lovers, people who travel to Europe
… We wanted to do an evaluation
… way to explain our experiment
… took Web 3P cookie traffic and divided it into "not Chrome" and "is Chrome"
… even with third-party cookies, we don't have them for a long time
… we need them to do the simulation
… we took the "has FLoC" part and took two slices
… is it remarketed or not remarketed
… we computed based on standard advertising metrics
… one is the conversions-per-dollar metric
… we did it for K of 500
… we got 95% back
… this is my final slide
… with that I'll stop
… we described some algorithms to define privacy utility
… we hope this is a framework for others to define algorithms
… happy to take questions

Wendy: one question is whether slides are linked?

Brian: I have a couple questions related to how much effort we will have to put into
… understanding what a FLoC means to us; how we go about doing that
… how stable over time, based on your experiments

Deepak: first question is how much effort to figure out what FLoC means
… from a systems POV, it's like the system for third-party cookies
… there is a lot of intelligence that goes on by a DSP to understand what a user means
… can apply FLoC
… you get more info for same space
… could be some amount of work involved
… could see it being a fairly light replacement
… How stable it is, my answer is that an average user
… visits between 3 and 7 domains on an average day, and they tend to be fairly stable over time
… the FLoCs are recomputed every seven days
… so you recompute after 7 days
… FLoC stability can vary
… but tend to remain constant over time
… may change week to week, but over time remain stable

Brian: Generally, users will be in the same FLoC consistently
… at that point there is likely to be shifting around of users

Deepak: FLoCs could change if user behavior changes, or if the algorithm changes

Brian: We could have huge populations migrating from one FLoC to another
… is there going to be some notion of a FLoC versioning for ID; is this one we are familiar with or is it new?

Michael: Brian is right; would be hugely disruptive if too much change

[missed...please fill in]
… if we find something involving topics, we would use those FLoCs
… Chrome will change algorithm in future, and we would give sufficient notice
… if people are going to consume this ML signal
… a lot of effort to keep it as stable as we can

Brian: Go back to question on how we figure out what a FLoC is; as how we gather data about users is deprecated
… have no ability to have a relationship with a small population, let alone small number of users
… would we explore part of campaign looking at which boxes performed and see if FLoC conversion was involved?

Deepak: you can look at different ways
… think of FLoC as adding to your giant prediction models
… for this current page I have this user who visited bluebook.com for used cars
… from query you can infer the cross geo
… and get info on what kinds of ads performed
… and get giant prediction
… that is one way
… another way is to bucket FLoC with all these features
… FLoC could be replacement; if coarse there are challenges
… for user-defined audiences, FLEDGE is the right proposal
… use FLoC on other side of call; when ads sent to auction, something interesting there
… this is some sort of tool kit
… we can see different ways these things can conceptually be the same at a high level
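
[A minimal Python sketch of "bucket FLoC with all these features": the cohort ID treated as one more hashed categorical feature in a click or conversion model. The feature layout and names are illustrative assumptions:]

    import hashlib
    import math
    from typing import Dict

    def bucket(feature: str, n_buckets: int = 2 ** 20) -> int:
        # Deterministically hash a raw feature string into an index.
        return int(hashlib.md5(feature.encode()).hexdigest(), 16) % n_buckets

    def predict(weights: Dict[int, float], context_url: str, floc_id: int) -> float:
        # Logistic score over hashed features; the FLoC cohort is just
        # one more categorical input alongside contextual ones.
        feats = [f"url={context_url}", f"floc={floc_id}"]
        z = sum(weights.get(bucket(f), 0.0) for f in feats)
        return 1.0 / (1.0 + math.exp(-z))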

Brian: I was going to let you go until you said use TURTLEDOVE instead of FLoC
… I question the value of TURTLEDOVE

Michael: Why Chrome's FLoC and TURTLEDOVE both have value
… at the moment you recognize a user and want to add them to your audience
… if you are building up an audience, then FLEDGE is the right way
… if you want to target people in the population at large where you don't have a previous relationship or identified @
… the FLoC lets you know something
… about people at large
… depends upon what kind of campaign you are running
… both are useful for different types of ad campaigns
… once it's possible to try out FLoC and FLEDGE
… and if they don't both work out, we'll revisit our development

Brian: I think it's right that they both address two different dimensions
… for FLoC, we'll need to develop long-term relationships

Wendy: We have a long queue

Bleparmentier: if I understand, you replace user data with a bunch of FLoC
… there is no frequency capping
… done only on FLoC
… everything is on FLoC, no user data is used?

Deepak: Frequency capping good point
… other than that
… for predictions we made sure that even the prediction algorithms didn't do that
… we let the system run as-is
… the system uses contextual data
… any part of the third-party system was running only for those advertisers

<jrosewell> It would be helpful to explain the last slide. Is the 95% based on the bottom right box compared to the other bottom boxes? If so was that the average of the other boxes or the sum of them?

[missed]
… for third party cookies I agree with you

Ble: you just changed that; all the rest is based on cookies
… it's a huge over-statement of the performance
… I was under impression that the right way of facing this
… to a lot of journalists
… if we just change audiences
… bidding info; do I understand?

<jrosewell> Could you provide the maths behind the 95% at a greater level of detail?

Deepak: Bidding is now poor
… if you take model spec for FLoC; there is a huge training-vs-scoring skew
… frequency capping is a key use case
… that could be addressed by some gradient of FLEDGE
… more or less closer to production
… skews associated with prediction were not positively handicapped

Ble: important to understand the scale of the result
… was it in the model or not in the model?

Deepak: Not in the scoring part of the model
… what I am trying to say is: you have a PCR model, no specific keys
… only related to features of users
… they are not fed into the system when scored

Ble: so users removed from pipeline

Deepak: yes

Arnaud: so you mean the production model, based on what you use today
… does not use user-level data in the bidding

Deepak: we use user features but not user IDs

Arnaud: @@

Deepak: the result is actually a lower bound; not sure if I can convince you

Arnaud: it's not about convincing; it's confusing whether FLoC would work
… disappointed that the comms sent to journalists said we performed at the 95% level
… graph showed a tiny use case of what you can do; not precise in any of the articles
… looks like an over-simplification
… creates confusion
… I wish we had those explanations before
… we have an obligation to explain to our clients how things will work
… and having those discussions on one side and info on other, is weird conversation

Deepak: I appreciate that

Arnaud: I will keep question for next round

Ble: more detailed explanations; what was removed; lower bound
… excluding second @
… we are reading what you published
… there are five different interpretations of what you did
… if you can provide clear explanation of what was removed, etc.
… we all have different interpretations

Deepak: did you get a sense of the slicing?
… industrial research is hard
… make sure you give right amount of info
… did you have a good understanding from the slides?


Ble: better from what was tested
… the white paper had issues because info was not clear
… it would benefit...try to explain better this experiment

Deepak: Ok, thank you

<robin> [FWIW I don't think that "too much information that might confuse people" is a real risk here]

Mehul: thank you
… for presenting the experiment
… I concur with Criteo team's concern about how it's computed
… if you take out the @ cookie; those features will not function
… you are folding the 3P cookie to cluster level; seems like misrepresentation
… defining clustering; how much data it has

<jrosewell> For my part I still don't understand the 95%. I would like to see the variables and math involved. No confidential needed to do this. Several meetings ago we were told remarketing was involved.

Mehul: what data to extract; conversion modeling; audience segregation
… clustering for a small DSP; FLoC may look like random noise
… what they care about in retail is what is similar
… put two users together
… with sports or @
… that aggregation could be noisy due to misaligned targeting
… When FLoC ID is used in conjunction with geo, browser or other user features
… not sure privacy still holds
… as FLoC ID itself
… when you introduce cross products it becomes unique to user
… unless K ...

[missed]

Michael: so for the first question
… I think you are absolutely right that some FLoCs, some clusters
… may be more useful for retail, some less useful as less relevant for commercial behavior
… we decided when doing clustering for Chrome, do unsupervised
… and not define type of behavior to cluster

<GarrettJohnson> I will add my voice to the Criteo guys. The white paper does not include details on the 95% figure or the "experiment slicing", so we can't assess the work. I will add that the "experiment slicing" raises in my mind experimental validity concerns (e.g. due to market spillovers) and certainly generalizability concerns.

Michael: then conversation would be how to choose which things to cluster or not; which ad network, etc.
… we deliberately are not optimizing for any particular use of FLoC. You are right some clusters are more useful
… FLoCs overall will be more useful; research tends to show that is likely

Deepak: one clarification

<AramZS> I mean even if Chrome *says* it isn't putting its finger on the weight and we don't have transparency into how the FLoC Cluster is being applied... isn't it just as questionable if something is happening as if there *was* weighting built in?

Deepak: I did not mean to say some cluster values are @
… SimHash; the whole technique could be useful itself
… users don't do activities in silos; go to sports, retail, news

Michael: hard to predict in advance how to...that's why we want to start origin trials
… second question you asked was about K anonymity

[missed]
… other work in sandbox; we are comfortable with FLoC the way it is now

Wendy: I apologize; there are many people waiting to ask questions
… in what forum...should we continue the discussion

Ble: Continue this discussion next week

<joshua_koran> +1 to continuing the discussion

Ble: this is very important

<jrosewell> If the 95% can be explained as per GarrettJohnson summary for the start of the next meeting that would address my initial question. Very important to do this as soon as possible.

<btsavage> Perhaps we can schedule a breakout session on this topic?

<nics> +1 to go on this discussion

Wendy: we have some other proposals for next week
… I will work offline and see to which meetings to bring this topic back
… assume Michael and Deepak are willing to speak further

<kris_chapman> thank you!

Michael: And happy to take questions on email and Github

Wendy: Thank you, we are adjourned

Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Diagnostics

Succeeded: s/body/value/

Succeeded: s/@/content taxonomy/

Succeeded: s/system/system run as-is. In second trial,/

Succeeded: s/@/remarketing/

Succeeded: s/ID/versioning for ID/

Succeeded: s/@/frequencyt capping/

Succeeded: s/@/as-is/

Succeeded: s/@/training-vs-scoring/

No scribenick or scribe found. Guessed: Karen

Maybe present: Angelina, Aram, Arnaud, Ble, Brian, Deepak, Mehul, Michael, Pedro, Robert, Wendy