IWABG one-off, Q&A on FLoC Origin Trial

01 April 2021


Wendy Seltzer
Wendy: We have a lot of folks on the call; thank you for joining us for this one-off
… Logistically we are meeting as a one-off of the Improving Web Adv BG; using same irc channel for notes and queuing
… channel is #web-adv
… if you could also please present+ yourself to let us know you are here
… Our agenda today is to get deeper into the queue of questions people have about the FLoC origin trial

Wendy: while I had offered a presentation time, I heard that there is nothing to present
… You have links to the FLoC materials
… and FLoC proposers are happy to hear questions and provide answers in preparation for the origin trial
… and to learn more about this proposal
… to see how it meets overall goals of use cases, meeting privacy-preserving goals of web platform
… I think that's enough of an introduction
… Does anyone else have an introductory question?

Q&A on FLoC Origin Trial

Ben: I would like to dive into details

Wendy: Go for it

Ben: I would love to be super clear in my head on two things
… as I understand, the test run was a live A/B test

Michael: Could I interrupt before we go on?
… are we talking about the FLoC origin trial and making FLoCs available to people to try out
… that there was press about

Michael: or are we talking about the A/B test of ads running Google simulation
… I thought it was the Chrome thing
… Looking at agenda, my impression was the origin trial; not prepared to talk about last year

Ben: That's fine
… my question on that topic
… is the following
… Looking at various negative feedback that the FLoC proposal has received, and where that is coming from
… what is the underlying nature of that criticism
… Because of the way the FLoCs are constructed in an algorithm
… we don't know who is in that block
… theoretically there could be one or more FLoC
… where composition of the FLoC varies greatly

Ben: Take some sensitive characteristic of people for whatever definition of that
… and say what is percentage of people with that characteristic, say 3%
… there could also be a FLoC with a higher percentage of representation in that FLoC
… and someone will be able to figure out one FLoC has a higher background rate
… so learning someone's FLoC ID could dramatically improve probability that someone belongs to a sensitive group
… So maybe knowing the FLoC ID raises the probability that someone is in that group to 20%
… if the API is on by default, you could leak sensitive data about people to any web site on the web

Ben: in that case, is this an API you plan to gate behind a user's consent
… to potentially be leaking sensitive information about people?
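
The arithmetic behind this concern can be sketched as follows. The 3% and 20% figures are the illustrative numbers from the question, not measurements, and the function name `lift` is hypothetical.

```python
def lift(base_rate: float, cohort_rate: float) -> float:
    """Return how many times more common a trait is inside a cohort
    than in the population at large."""
    if not 0 < base_rate <= 1 or not 0 <= cohort_rate <= 1:
        raise ValueError("rates must be probabilities")
    return cohort_rate / base_rate


# Illustrative numbers from the question: 3% overall, 20% in one cohort.
print(f"{lift(0.03, 0.20):.1f}x")  # prints 6.7x
```

In this hypothetical case, learning the cohort ID alone would make the trait almost seven times more likely than the background rate.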

Michael: That was a lot of material
… and a bunch of questions
… Let me try to organize a variety of responses
… First of all you mentioned "unsupervised clustering"
… that is correct; FLoC is about unsupervised clustering based on the browser
… Chrome felt that was an important part of the clustering of FLoCs
… you may think of clustering as a ML problem
… goal, group people together as useful for some N state
… and tell algorithm which people to group together with others, and iteratively improves the algorithm
… as you train it
… that is a supervised, standard ML approach
… that is NOT what we are doing in FLoC
… that actually is what we would do if we used Federated Learning, which is the "FL" in "FLoC"
… so there is no ML
… FLoC algorithm is clustering people together without an objective function for what makes people similar to each other
… We measure attributes of people, do some random number generation...carve up into regions
… and which region you land in is the FLoC cohort you end up in
… We have mechanism to make sure every region we call FLoC cohort has thousands of people in it
… and there is some sensitivity stuff
… but we are not picking the regions for the purpose of Google Ads for example
… if we were carving up, there would be more discussion
… this unsupervised clustering doesn't have a Google-created objective function
… that is an important part of this effort
… When different people run tests on FLoC
… we expect to see tests useful for a bunch of other things
… that discussion of what FLoC is useful for, will be part of the origin trial
… I hope you, Ben, FB, will participate in the discussion of what FLoC will be useful for
… and we expect to add other clustering techniques
… just one way is M@@
… hope you will experiment and give feedback on
… That's the first thing on various clustering techniques and unsupervised learning
… Next is the sensitivity question; hope people have looked at the white paper we just published
… if someone can drop in a link
… Ben, you asked the right question
… Are there sensitive attributes in the population at large
… and in an elevated rate in some cohorts
… that is the right way to think about the question
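
The region-carving described in this answer ("do some random number generation...carve up into regions") can be illustrated with a small sketch. The minutes do not name the algorithm, so this uses a SimHash-style locality-sensitive hash as an assumed stand-in; `domain_bit`, `region_id`, and the 8-bit region size are hypothetical choices. Note that there is no objective function: regions come only from hashed random signs, and nothing here enforces a minimum cohort size.

```python
import hashlib
from typing import Iterable


def domain_bit(domain: str, bit_index: int) -> int:
    """Pseudo-random +1 or -1 for a (domain, bit) pair, derived from a hash."""
    digest = hashlib.sha256(f"{bit_index}:{domain}".encode()).digest()
    return 1 if digest[0] & 1 else -1


def region_id(domains: Iterable[str], bits: int = 8) -> int:
    """Map a set of visited domains to one of 2**bits regions.

    Each output bit is the sign of a sum of per-domain random signs, so
    similar domain sets tend to share bits and land in the same region.
    """
    unique = set(domains)
    region = 0
    for i in range(bits):
        total = sum(domain_bit(d, i) for d in unique)
        region = (region << 1) | (1 if total > 0 else 0)
    return region


history = ["news.example", "shop.example", "bike.example"]
print(region_id(history))
```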

Michael: The answer we have tried to use in the first iteration of FLoC
… is to look for cohorts that imply something sensitive
… using only notion of "sensitive" that Chrome has at its fingertips
… have you visited a page that is sensitive according to these pre-determined sensitive page classifiers
… which Google ads uses, and there are policies that relate to sensitive pages
… you can read all about it from the explainer
… and link to Google policies on sensitive material
… Properties we can evaluate
… does this FLoC make it clearer to you
… is for browsing sensitive content
… I won't say that the only thing about someone is that they browse sensitive content
… many things could be sensitive about something
… Definition that makes most sense in browser context is the visiting of sensitive pages
… that is a starting place
… we hope others will participate in this discussion to expand our notion of sensitive over time
… This is the first clustering origin
… we expect FLoC clustering utility will improve
… and we're sure there are ways to improve FLoC clustering
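
A minimal sketch of the kind of check described here: compare each cohort's rate of sensitive-page visits with the rate in the population at large and flag cohorts where it is elevated. The input shape, the function name, and the 0.1 threshold are assumptions for illustration; the minutes do not give the actual criterion Chrome applies.

```python
from collections import defaultdict


def flag_sensitive_cohorts(visits, threshold=0.1):
    """visits: list of (cohort_id, visited_sensitive_page) pairs, one per user.

    Return the set of cohorts whose sensitive-visit rate exceeds the
    overall rate by more than `threshold` (a hypothetical cutoff).
    """
    totals = defaultdict(int)
    sensitive = defaultdict(int)
    for cohort_id, is_sensitive in visits:
        totals[cohort_id] += 1
        sensitive[cohort_id] += int(is_sensitive)
    population = sum(totals.values())
    if population == 0:
        return set()
    overall_rate = sum(sensitive.values()) / population
    return {
        cohort_id
        for cohort_id in totals
        if sensitive[cohort_id] / totals[cohort_id] - overall_rate > threshold
    }
```

A cohort flagged this way would be a candidate for suppression or re-clustering; what is actually done with such cohorts is described in the linked material, not here.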

Wendy: Thank you and Marshall shared a link to the sensitive cohorts in the minutes
… Someone asked when origin trials start?

Brendan: as far as I understand, the Chrome origin trial is a matter of enabling it in Chrome 89
… is that a correct understanding, and what population will have it enabled without those browser flags
… so that we can evaluate it in publishers

Michael: Absolutely, let me go through exactly where we are
… This week we are in the middle of turning on FLoC
… using a complement of experiment frameworks
… one thing there is an origin trial going on
… FLoC is a new API, not part of OWP
… we let people try things out
… mechanism to register for an origin trial token
… and insert token into that web page
… and there is a new API
… if that token is in place, you might see the API, "Navigator.Interestcohort"

Brendan: origin trial is publisher centric?

Michael: We expect a lot of companies to be adtech and a wide variety of publishers
… this is called a third party origin trial
… you can register for a third-party token
… not a question of which first party
… user is visiting
… it's which third party is asking to use this
… and get a lateral cross-section of pages

Michael: Have to have JS served from their server that runs on some web page where they run trial and find the cohort
… and that script puts the token in the page
… Marshall has posted some links

Michael: That was the easier part
… Origin trials are supposed to be small things, for a small set of domains testing it
… and not exceed half a percent of all pages on the web
… hard to do when widely used third parties want to use it on the web more widely
… that is a more complicated task
… API only exists if you are in origin trial and plunk down token
… there is an experiment only running on half a percent of population that makes that token create the API
… on the other 99.5% it doesn't apply
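
One generic way to put a stable half a percent of clients into an experiment is hash bucketing, sketched below. This is an assumption for illustration only; the minutes do not describe how Chrome's experiment framework selects browsers, and `in_experiment`, `client_seed`, and the experiment label are hypothetical names.

```python
import hashlib


def in_experiment(client_seed: str, experiment: str = "floc-ot",
                  one_in: int = 200) -> bool:
    """Deterministically place roughly 1 in `one_in` clients in the experiment.

    The same seed always yields the same answer, so a given browser is
    consistently in or out.
    """
    digest = hashlib.sha256(f"{experiment}:{client_seed}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % one_in
    return bucket == 0


enabled = sum(in_experiment(f"client-{i}") for i in range(20000))
print(f"{enabled} of 20000 simulated clients enabled")
```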

bmay: Is that fraction of browsers...

Michael: 199/200 won't get it
… be consistent whether it's there or not
… everything I said
… Once this trial is available
… not sure if it's 0.5% stable
… Maybe Josh knows

<jkarlin> Not yet. Just canary/dev right now.

Michael: our usual path to turn on
… is first in Chrome developer, then Chrome beta, then Chrome stable
… you are welcome to try Chrome nightly build

Michael: it is coming to Chrome stable soon at 0.5%
… hope that answers questions

Brendan: There is a blog out there that says there are Chrome flags that let you use it

Michael: If you want to try this API on your own personal browser
… without depending on being in the 0.5%

Michael: there is command line to force on trial
… I think web.dev/floc includes that

mjv: yes it does

Michael: It doesn't have the right flags; that article has an outdated set of flags
… not the current version

Marshall: we will get it updated soon

Michael: Apologize; we will get you an updated set of flags

<Brendan_IAB_eyeo> ^^ whitepaper

Michael: if you use the flags now, you will see a 20-digit long number
… that is an internal representation that has not been converted into FLoC
… if you see a number in the tens of thousands, then you are seeing a FLoC
… between 1 and 33K

Michael: around 33K FLoCs total
… if you see a number bigger than that, it is not a FLoC ID
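
The rule of thumb above can be written as a small check. The upper bound of 33,000 is an assumption taken from "around 33K"; the exact number of cohorts is not stated in the minutes, so treat `MAX_COHORT_ID` as adjustable. The function name is hypothetical.

```python
MAX_COHORT_ID = 33_000  # assumption: "around 33K FLoCs total"


def looks_like_floc_id(value: str, max_id: int = MAX_COHORT_ID) -> bool:
    """Return True if `value` is a decimal number in the expected cohort range.

    A 20-digit number is the internal representation mentioned above,
    not a FLoC ID, and is rejected.
    """
    if not (value.isascii() and value.isdigit()):
        return False
    return 1 <= int(value) <= max_id


print(looks_like_floc_id("14159"))                 # True
print(looks_like_floc_id("12345678901234567890"))  # False
```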

James: thank you for arranging this session
… I'm based in UK
… I attended a session with BBC, NYTimes, Financial Times, Google
… Google rep talked about pretty good campaign performance
… and talked about justification
… it was unquestioned and unsubstantiated
… I think it's important to understand the benefits
… we can review work of others and conduct experiments ourselves
… and I understand we can look to future
… and not the past 95% prior work number
… I don't think that works with established norms for academia or standards setting
… and in Europe we are unable to participate
… other events set the narrative
… and set their strategies
… extremely important that the data is available for public inspection
… so we don't get into this situation again
… Not kid ourselves that these discussions don't impact the market
… discuss how data can be made available for inspection

Michael: let me be clear about a couple things
… the FLoC origin trial we are about to turn on
… that is the first time anyone can experiment with FLoCs created by Chrome
… hasn't been any experimentation before

Michael: you mentioned a publication about Google ads
… that was them taking their own Google ads information and simulating the FLoC assignment algorithm
… that paper took different FLoC assignment techniques and ran experiments
… on what they hoped FLoC might look like in Chrome

Michael: the actual algorithm we chose is similar to the algorithm used last year
… nobody has been able to experiment with the actual algorithm FLoC used until now
… We appreciate feedback from anyone running experiments
… and people here have been involved in design
… people have run those experiments and our design has been influenced
… we heard about what would be helpful, and now is the first time there is an actual thing in Chrome for anybody to try to use it

James: Thank you for making that clear
… would be helpful for a retraction of what Peter @ said
… that there was pretty good campaign performance
… But let's look to the future; what are we doing with these experiments so there is public inspection available
… I gave pre-amble about market-impacting

Chetna: I am happy to address that
… origin trials are the time for key participants to lean in and do the testing
… Google ads will do the same thing
… original results were based on simulations
… no intention to retract those
… that was simulation data

Chetna: but it's important for as many participants as possible to lean into the experiments
… we can get back to details on the technical questions
… absolutely, the data will be made public and shared

James: We can be sure you will be sharing your data

Chetna: of course, that is what we have been doing

James: Not sure I agree; not sure details of experiment were fully understood

Wendy: sounds as though we have a future direction

Sharif: Is there a way to estimate the size of a FLoC during origin trials or @

Michael: so that is a good question
… certainly observationally, you can see how many people are on a web site with a FLoC
… FLoC doesn't come imbued with a signal about how big different FLoCs are

Michael: we ensure FLoCs will be size of at least 2K
… average number of people
… cohorts will be of different sizes; we will learn by observation

Sharif: No plans to extend the API

Michael: that is true
… even Chrome doesn't know how many people are in each indiv FLoC
… this data does not exist, sorry

Sharif: Thank you

Pedro: Will it be possible to take advantage of other proposals
… such as aggregated reporting API
… to take advantage and learn about @@

Michael: FLoC and aggregate measurement should play nicely together
… how Chrome hopes to use that, when it exists, to know how big FLoC is
… we plan to do that using aggregated measurement tech; that is all forward-looking
… If you are using FLEDGE and want to know how many people are in an IG and want to use aggregated measurement to figure that out, that seems reasonable
… if wanting to see who visited, you can count them
… no magical thing to know size of all the FLoCs of browser you never interact with
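
The observational approach ("you can count them") could look like the sketch below. `visitor_key` stands for whatever first-party identifier a site already has and is a hypothetical name; the result only reflects visitors the site has actually seen, so it says nothing about the total size of any cohort.

```python
from collections import Counter


def observed_cohort_counts(page_views):
    """page_views: iterable of (visitor_key, cohort_id) pairs.

    Count distinct visitors seen per cohort on your own site.
    """
    distinct = set(page_views)  # collapse repeat views by the same visitor
    return Counter(cohort_id for _, cohort_id in distinct)


views = [("a", 1), ("a", 1), ("b", 1), ("c", 2)]
print(observed_cohort_counts(views))
```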

Brian: So I have a follow up
… we are in position where we cannot run scripts that interact with servers of most of our clients
… we depend upon statically executed pixels for scripts
… how can we get access to data
… how long can we access API from a static script

Michael: Accessing API from a static script
… If you are working with a site that wants to become involved in the origin trial, they can get a first-party origin trial and turn on FLoC API on their site directly
… in that case, the API, subject to limits, would be available to any third-party script on your page, including a static script
… that is one possibility
… once FLoC exits origin trial
… a regular API part of OWP
… you could employ it to invoke API and no origin trial token would be needed at that point
… those are only ways to get at cohort ID for now
… was discussion about using HTTP headers
… no way to make them compatible with origin trials
… no other way to make cohort available

Brian: if you can suggest how we can get a head start
… if we wait for things out of origin trials, that puts us behind

Michael: Can you put an iframe on those pages?
… that would be another way to do it

Brian: No

Michael: You can ask your partners to expand your logs to include the cohort ID that corresponds
… and give us the logs and include the cohort ID as way to enrich the data source
… cannot think of anything else to do

Brian: Would a data set be made available about some info about FLoC
… understand it's hard to add additional attributes around FLoC so as not to be sensitive

Michael: I don't think Chrome is in position to enrich data any more than what we are doing
… sorry

Brian: ok

Michael: I will keep thinking about it and let you know if I have other ideas

Kanishk: we had questions around this
… the algorithm used by Google ads and the one being used by Chrome are different
… how different is it, and how do you determine success criteria
… and once GA, what are criteria around that you use?

Michael: An interesting question
… I don't know the details of the clustering algorithm that Google Ads experimented with
… what is running in Chrome is what we have been working out to get balance of privacy and utility
… any evaluation of benefits of different FLoC clustering algorithms will also be a balance
… we expect to get feedback from the community and those in this group

Kanishk: We will build N-state products
… around FLoC
… and how we see
… we want to make sure it doesn't change from under us
… how much time, how do you envision that happening?

Michael: right; we feel we have freedom to experiment right now
… What FLoC clustering algorithm used on what browser
… in final steady state when FLoC is a released API
… changing the algorithm will be disruptive, so we expect any modifications of algorithm will happen slowly and with clear communications
… and feedback in discussions
… we don't want to change the clustering algorithm out from under you

Kanishk: How often will FLoC update?

Michael: Once a week is answer for this initial origin trials
… but as we figure out privacy and utility features...that is case once a week for now,
… but it might change in the future
… and there would be a different label

MichaelMN: Can you say more details about the actual implementation?

MichaelK: each individual user
… it computes the FLoC and sets a stop watch and 7 days later it says I should recompute my FLoC
… it will change
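
The per-user "stop watch" can be pictured as a simple elapsed-time check. This is a sketch only; the 7-day interval is the value stated for the initial origin trial, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

RECOMPUTE_INTERVAL = timedelta(days=7)  # stated for the initial origin trial


def needs_recompute(last_computed: datetime, now: datetime) -> bool:
    """Return True once at least seven days have passed since the last computation."""
    return now - last_computed >= RECOMPUTE_INTERVAL
```

Because each browser keeps its own timer, different users would recompute on different days rather than all at once.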

Brian: Is there a limit to the number of data points before you tag a cohort?

Michael: yes, there are a variety of elements in place
… can you post, Josh, link with tech details on the...

Josh: yes

Michael: quick summary
… there are a bunch of limits in place
… before we assign someone a FLoC
… one is they need to have visited a bunch of different sites, not a single site for privacy risk
… one of things FLoC clustering algorithm is to collect people with similar browsing histories into FLoCs
… I believe it's at least 7 different domains
… a bunch of other requirements also
… you can see a page to read more technical details

Brian: Seven different pages in a period of time?

Michael: yes, within the 7 days
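
The one eligibility limit spelled out here (at least 7 different domains within 7 days) can be sketched as below. The other requirements mentioned are not listed in the minutes and are not modelled; the history format and function name are assumptions.

```python
from datetime import datetime, timedelta

MIN_DISTINCT_DOMAINS = 7     # "I believe it's at least 7 different domains"
WINDOW = timedelta(days=7)   # "within the 7 days"


def eligible_for_cohort(history, now: datetime) -> bool:
    """history: iterable of (domain, visited_at) pairs.

    Return True if enough distinct domains were visited inside the window.
    """
    recent_domains = {
        domain
        for domain, visited_at in history
        if timedelta(0) <= now - visited_at <= WINDOW
    }
    return len(recent_domains) >= MIN_DISTINCT_DOMAINS
```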

Maddy: Tactical, mechanical question
… the FLoC IDs, about 33K
… intention is for them to stay the same
… and for users to move in and out of them
… even though not named
… if we see a certain FLoC, they will move in and out of it
… and not be replaced by a future ID

Michael: yes, that is exactly right
… what the FLoC ID number means as behavior
… then people move in/out of cohort as their behavior changes

Maddy: How would a buyer know whether that ID represents a FLoC they want to buy?
… how do they get that information about whether to purchase?

Michael: answer is somewhat similar to how you figure out whether a third-party cookie is what you want to buy on
… you might come up with some belief about person behind that cookie
… FLoC should be same thing
… If you spend a day looking at behavior in FLoC 1-2-2-4
… you should be able to assign interests to a FLoC as a whole group
… and that is something you should have some belief about whether to target a buyer
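
Assigning interests to a cohort as a whole could be approximated by tallying the topics of pages on which each cohort ID was observed. The `(cohort_id, topic)` input and the function name are hypothetical; how a buyer labels page topics is outside what was discussed.

```python
from collections import Counter, defaultdict


def top_topics_by_cohort(observations, top_n=3):
    """observations: iterable of (cohort_id, topic) pairs seen over some period.

    Return the most frequently observed topics for each cohort.
    """
    per_cohort = defaultdict(Counter)
    for cohort_id, topic in observations:
        per_cohort[cohort_id][topic] += 1
    return {
        cohort_id: [topic for topic, _ in counts.most_common(top_n)]
        for cohort_id, counts in per_cohort.items()
    }


seen = [(1224, "cycling"), (1224, "cycling"), (1224, "travel"), (7, "cooking")]
print(top_topics_by_cohort(seen))
```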

Wendy: We have five minutes left

Wendy: I want to ask about the right way to queue up further questions for discussion?
… moving on to Aram

Aram: my initial question already answered
… So it's different sites
… can choose to opt out of FLoC

Aram: for sensitive categories, some people may not see Google categories as being sensitive enough or in right ways
… can users decide how to opt out
… this site I want to black list for my FLoC membership?

Michael: That is a really interesting question
… right now, FLoC is off if you turn off third party cookies
… there will be a control in future about privacy sandbox APIs to let you turn them off

Michael: question of more refined controls to FLoC clustering algorithm is hard
… depends on notion of what it is
… hope conveyed, is that everything about the algorithm is subject to change
… we don't know the correct answer for what the FLoC control should look like
… our UX research is attuned to risk
… and not over-promise control to people
… and if people see ads about topic X
… they might feel that control did not live up to its promise
… and not convey notion that control has more power than we, Chrome, can give it
… details of what control we can have, is a subject for on-going research

Wendy: apologies, we need to close here
… Suggestion that we take up questions in a future meeting
… we have a few things queued up for next upcoming calls
… also welcome questions on Github and upcoming calls
… Thanks to Chrome team for sharing all this information

Michael: Thank you for all this engagement; the WICG repo is the best place to ask questions

