W3C

– DRAFT –
IWABG one-off, Q&A on FLoC Origin Trial

01 April 2021

Attendees

Present
btsavage, Deepak_Ravichandran, eriktaubeneck, jeff_burkett_gannett, Joey_Trotz, jonasz, Jukka, JulieK, lpilot, mallory, nics, paul-selden, pedro_alvarado, sharif, wbaker
Regrets
-
Chair
Wendy Seltzer
Scribe
Karen, Karen Myers

Meeting minutes

<apireno_groupm> present_

Wendy: We have a lot of folks on the call; thank you for joining us for this one-off
… Logistically we are meeting as a one-off of the Improving Web Adv BG; using same irc channel for notes and queuing
… channel is #web-adv
… if you could also please present+ yourself to let us know you are here
… Our agenda today is to get deeper into the queue of questions people have about the FLoC origin trial

<GarrettJohnson> what's the link for the FLOC Origin Trial github please?

Wendy: while I had offered a presentation time, I heard that there is nothing to present
… You have links to the FLoC materials
… and FLoC proposers are happy to hear questions and provide answers in preparation for the origin trial
… and to learn more about this proposal
… to see how it meets overall goals of use cases, meeting privacy-preserving goals of web platform
… I think that's enough of an introduction
… Does anyone else have introductory question?

Q&A on FLoC Origin Trial

Ben: I would like to dive into details

Wendy: Go for it

Ben: I would love to be super clear in my head on two things
… as I understand, the test run was a live A/B test

Michael: Could I interrupt before we go on?
… are we talking about the FLoC origin trial and making FLoCs available for people to try out
… that there was press about

<AramZS> I would like to talk about the first plz

<AramZS> and mostly the first

Michael: or are we talking about the A/B test of ads running Google simulation
… I thought it was the Chrome thing
… Looking at agenda, my impression was the origin trial; not prepared to talk about last year

Ben: That's fine
… my question on that topic
… is the following
… Looking at various negative feedback that the FLoC proposal has received, and where that is coming from
… what is the underlying nature of that criticism
… Because of the way the FLoCs are constructed in an algorithm
… we don't know who is in that block
… theoretically there could be one or more FLoC
… where composition of the FLoC varies greatly

<GarrettJohnson> I thought the whole point of this meeting was to talk about the FLOC experiment? As it was planned long before the origin trial was announced (yesterday).

Ben: Take some sensitive characteristic of people for whatever definition of that
… and say what is percentage of people with that characteristic, say 3%
… there could also be a FLoC with a higher percentage of representation in that FLoC
… and someone will be able to figure out one FLoC has a higher background rate
… so learning someone's FLoC ID could dramatically improve probability that someone belongs to a sensitive group
… So maybe you can increase to 20% knowledge of in that group
… if API by default, you could leak sensitive data about people to any web site on the web

<wseltzer> GarrettJohnson, the original request for a meeting was "so that participants can be as prepared as possible for origin-trials"

Ben: in that case, is this an API you plan to gate behind a user's consent
… to potentially be leaking sensitive information about people?
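
Ben's concern can be put in numbers with a little arithmetic. This is only an illustrative sketch: the 3% and 20% figures are the hypothetical rates from his question, and `cohort_lift` is a helper name made up here, not part of any FLoC API.

```python
def cohort_lift(base_rate: float, cohort_rate: float) -> float:
    """Return how many times more common an attribute is inside one cohort
    than in the population at large."""
    if not 0 < base_rate <= 1 or not 0 <= cohort_rate <= 1:
        raise ValueError("rates must be probabilities, with base_rate > 0")
    return cohort_rate / base_rate


# Hypothetical rates from the question: 3% overall, 20% within one cohort.
lift = cohort_lift(base_rate=0.03, cohort_rate=0.20)
print(f"Learning the cohort multiplies the inferred probability by {lift:.1f}x")
```

With these inputs the multiplier is a little under 7, which is the kind of jump in an observer's confidence that the question is about.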

Michael: That was a lot of material
… and a bunch of questions
… Let me try to organize a variety of responses
… First of all you mentioned "unsupervised clustering"
… that is correct; FLoC is about unsupervised clustering based on the browser
… Chrome felt that was an important part of the clustering of FLoCs
… you may think of clustering as a ML problem
… goal, group people together as useful for some N state
… and tell algorithm which people to group together with others, and iteratively improves the algorithm
… as you train it
… that is a supervised, standard ML approach
… that is NOT what we are doing in FLoC
… that actually is what we would do if we used Federated Learning, which is the "FL" in "FLoC"
… so there is not ML
… FLoC algorithm is clustering people together without an objective function for what makes people similar to each other
… We measure attributes of people, do some random number generation...carve up into regions
… and what region is which FLoC cohort you end up
… We have mechanism to make sure every region we call FLoC cohort has thousands of people in it
… and there is some sensitivity stuff
… but we are not picking the regions for the purpose of Google Ads for example
… if we were carving up, there would be more discussion
… this unsupervised clustering doesn't have a Google-created objective function
… that is an important part of this effort
… When different people run tests on FLoC
… we expect to see tests useful for a bunch of other things
… that discussion of what FLoC is useful for, will be part of the origin trial
… I hope you, Ben, FB, will participate in the discussion of what FLoC will be useful for
… and we expect to add other clustering techniques
… just one way is M@@
… hope you will experiment and give feedback on
… That's the first thing on various clustering techniques and unsupervised learning
… Next is the sensitivity question; hope people have looked at the white paper we just published
… if someone can drop in a link
… Ben, you asked the right question
… Are there sensitive attributes in population at large
… and in an elevated rate in some cohorts
… that is the right way to think about the question
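
The "measure attributes, do some random number generation, carve up into regions" description resembles locality-sensitive hashing. The sketch below assumes a SimHash-style scheme over visited domains purely for illustration; the minutes do not name the algorithm, and `region_for_history`, the 8-bit width, and the use of SHA-256 as the pseudo-random source are all assumptions. It also does not attempt the step that ensures each cohort holds thousands of people.

```python
import hashlib
from typing import Iterable


def region_for_history(domains: Iterable[str], bits: int = 8) -> int:
    """Map a set of visited domains to one of 2**bits regions (SimHash-style).

    No objective function is involved: each domain casts a pseudo-random
    +1/-1 vote per output bit, and the sign of each total picks the region.
    """
    if not 1 <= bits <= 256:
        raise ValueError("bits must be between 1 and 256 (one SHA-256 digest)")
    totals = [0] * bits
    for domain in set(domains):  # repeat visits do not add extra weight here
        digest = hashlib.sha256(domain.encode("utf-8")).digest()
        for i in range(bits):
            bit = (digest[i // 8] >> (i % 8)) & 1
            totals[i] += 1 if bit else -1
    region = 0
    for i, total in enumerate(totals):
        if total > 0:
            region |= 1 << i
    return region


print(region_for_history(["news.example", "recipes.example", "bikes.example"]))
```

Histories that share most of their domains tend to agree on most bits, so similar browsing lands in the same or nearby regions without anyone choosing what "similar" should mean.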

<mjv> Sensitivity whitepaper: https://docs.google.com/a/google.com/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDo1Mzg4MjYzOWI2MzU2NDgw

Michael: The answer we have tried to use in first iteration of FLoC
… is to look for cohorts that imply something sensitive
… using only notion of "sensitive" that Chrome has at its fingertips
… have you visited a page that is sensitive according to these pre-determined sensitive page classifiers
… which Google ads uses, and there are policies that relate to sensitive pages
… you can read all about it from the explainer
… and link to Google policies on sensitive material
… Properties we can evaluate
… does this FLoC make it clearer to you
… is for browsing sensitive content
… I won't say that the only thing about someone is that they browse sensitive content
… many things could be sensitive about something
… Definition that makes most sense in browser context is the visiting of sensitive pages
… that is a starting place
… we hope others will participate in this discussion to expand our notion of sensitive over time
… This is the first clustering origin
… we expect FLoC clustering utility will improve
… and we're sure there are ways to improve FLoC clustering

Wendy: Thank you; Marshall shared a link to the sensitive cohorts white paper in the minutes
… Someone asked when origin trials start?

Brendan: as far as I understand, the Chrome origin trial is a matter of enabling it in Chrome 89
… is that a correct understanding, and what population will have it enabled without those browser flags
… so that we can evaluate it in publishers

Michael: Absolutely, let me go through exactly where we are
… This week we are in the middle of turning on FLoC
… using a complement of experiment frameworks
… one thing there is an origin trial going on
… FLoC is a new API, not part of OWP
… we let people try things out
… mechanism to register for an origin trial token
… and insert token into that web page
… and there is a new API
… if that token is in place, you might see token, "Navigator.Interestcohort"
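
For the first-party case, origin trial tokens are generally delivered either as a `<meta http-equiv="origin-trial">` tag or as an `Origin-Trial` response header. A minimal sketch of both, assuming a placeholder token string; the helper names are invented for illustration, and the third-party flow discussed next injects the same tag from script instead.

```python
from html import escape
from typing import Tuple


def origin_trial_meta_tag(token: str) -> str:
    """Build the meta tag a page can include to present its trial token."""
    return f'<meta http-equiv="origin-trial" content="{escape(token, quote=True)}">'


def origin_trial_header(token: str) -> Tuple[str, str]:
    """Build the equivalent HTTP response header as a (name, value) pair."""
    return ("Origin-Trial", token)


# "TOKEN_GOES_HERE" is a placeholder, not a real registered token.
print(origin_trial_meta_tag("TOKEN_GOES_HERE"))
print(origin_trial_header("TOKEN_GOES_HERE"))
```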

Brendan: origin trial is publisher centric?

Michael: We expect a lot of companies to be adtech and a wide variety of publishers
… this is called a third party origin trial
… you can register for a third-party token
… not a question of which first party
… user is visiting
… it's which third party is asking to use this
… and get a lateral cross-section of pages

@: So a third-party must have someone's server call the token

<maddy_want> q*

<mjv> https://web.dev/origin-trials/

<mjv> https://web.dev/third-party-origin-trials/

Michael: Have to have JS served from their server that runs on some web page where they run trial and find the cohort
… and that script puts the token in the page
… Marshall has posted some links

<Joey_Trotz> does the publisher have to have the token in place as well to permit use of the third-party origin call?

Michael: about origin trials and third-party origin trials

<Joey_Trotz> on their domain

Michael: That was the easier part
… Origin trials are supposed to be small things, for a small set of domains testing it
… and not exceed half a percent of all pages on the web
… hard to do when widely used third parties want to use it on the web more widely
… that is a more complicated task
… API only exists if you are in origin trial and plunk down token
… there is an experiment only running on half a percent of population that makes that token create the API
… on other 99.5% it doesn't apply

bmay: Is that fraction of browsers...

Michael: 199/200 won't get it
… be consistent whether it's there or not
… everything I said
… Once this trial is available
… not sure if it's 0.5% stable
… Maybe Josh knows

<jkarlin> Not yet. Just canary/dev right now.

Michael: our usual path to turn on
… is first in Chrome developer, then Chrome beta, then Chrome stable
… you are welcome to try Chrome nightly build

<AramZS> *participate

Michael: it is coming to Chrome stable soon at 0.5%
… hope that answers questions

Brendan: There is a blog out there that says there are Chrome flags that says you can use it

Michael: If you want to try this API on your own personal browser
… and not depend on being in the 0.5%

<mjv> Information on what command flags you can use: https://web.dev/floc/

Michael: there is command line to force on trial
… I think web.dev/floc includes that

mjv: yes it does

Michael: It doesn't have the right flags; that article has an outdated set of flags
… not the current version

Marshall: we will get it updated soon

<Jessica> can you repost the link to the sensitive cohort whitepaper please?

Michael: Apologize; we will get you an updated set of flags

<Brendan_IAB_eyeo> https://docs.google.com/a/google.com/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDo1Mzg4MjYzOWI2MzU2NDgw

<Brendan_IAB_eyeo> ^^ whitepaper

Michael: if you use the flags now, you will see a 20-digit long number
… that is an internal representation that has not been converted into FLoC
… if you see a number in the tens of thousands, then you are seeing a FLoC
… between 1 and 33K

<imeyers> so https://floc.glitch.me/ also has incorrect flags listed?

Michael: around 33K FLoCs total
… if you see a number bigger than that, it is not a FLoC ID
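
The rule of thumb above can be written as a quick check. This is a sketch only: the minutes give the upper bound as "around 33K", so `APPROX_MAX_COHORT_ID` is an approximation rather than the exact count, and `looks_like_cohort_id` is a hypothetical helper.

```python
# Assumption: the minutes only say "around 33K" cohorts; the exact count is not given.
APPROX_MAX_COHORT_ID = 33_000


def looks_like_cohort_id(value: int, max_id: int = APPROX_MAX_COHORT_ID) -> bool:
    """Return True when the number is in the range of a converted FLoC ID,
    False for the much longer internal representation."""
    return 1 <= value <= max_id


print(looks_like_cohort_id(21_354))                  # a number in the tens of thousands
print(looks_like_cohort_id(12345678901234567890))    # a 20-digit internal value
```

Values close to the approximate bound are ambiguous and would need the exact cohort count to classify.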

James: thank you for arranging this session
… I'm based in UK
… I attended a session with BBC, NYTimes, Financial Times, Google
… Google rep talked about pretty good campaign performance
… and talked about justification
… it was unquestioned and unsubstantiated
… I think it's important to understand the benefits
… we can review work of others and conduct experiments ourselves
… and I understand we can look to future
… and not the past 95% prior work number
… I don't think that works with established norms for academia or standards setting
… and in Europe we are unable to participate
… other events set the narrative
… and set their strategies
… extremely important that the data is available for public inspection
… so we don't get into this situation again
… Not kid ourselves that these discussions don't impact the market
… discuss how data can be made available for inspection

Michael: let me be clear about a couple things
… the FLoC origin trial we are about to turn on
… that is the first time anyone can experiment with FLoCs created by Chrome
… hasn't been any experimentation before

<imeyers> ah, i see there are *different* flags provided on glitch, which may be correct.

Michael: you mentioned a publication about Google ads
… that was them taking their own Google ads information and simulating the FLoC assignment algorithm
… that paper took different FLoC assignment techniques and ran experiments
… on what they hoped FLoC might look like in Chrome

<nomad_manhattan> The expectation is that the user’s FLoC will be updated over time; is there a cadence? same consideration/concerns around cookie churn. Should the FloC is constantly being updated or rotated, it would create additional complexity for targeting accuracy

Michael: the actual algorithm we chose is similar to the algorithm used last year
… nobody has been able to experiment with the actual algorithm FLoC used until now
… We appreciate feedback from anyone running experiments
… and people here have been involved in design
… people have run those experiments and our design has been influenced
… we heard about what would be helpful, and now is the first time there is an actual thing in Chrome for anybody to try to use it

James: Thank you for making that clear
… would be helpful for a retraction of what Peter @ said
… that there was pretty good campaign performance
… But let's look to the future; what are we doing with these experiments so there is public inspection available
… I gave pre-amble about market-impacting

Chetna: I am happy to address that
… origin trials is time for key participants to lean in and do the testing
… Google ads will do same thing
… original results were based on simulations
… no intention to retract those
… that was simulation data
… we got

<jdelhommeau_> looking at origin trial registration for FLoC, it isn't clear to me what "origin" we should provide during registration? Is it the adtech domain that is called on the page? For example, if my embedded code calls adtech.com domain, should https://adtech.com be the origin to provide in registration?

Chetna: but it's important for as many participants to lean into the experiments
… we can get back to details on the technical questions
… absolutely, the data will be made public and share this

James: We can be sure you will be sharing your data

Chetna: of course, that is what we have been doing

James: Not sure I agree; not sure details of experiment were fully understood

Wendy: sounds as though we have a future direction

Sharif: Is there a way to estimate the size of a FLoC during origin trials or @

Michael: so that is a good question
… certainly observationally, you can see how many people are on a web site with a FLoC
… FLoC doesn't come imbued with a signal about how big different FLoCs are

<jrosewell> The video of the panel session I referred to is available here - https://wmg.wavecast.io/marketing-and-media-effectiveness-in-a-cookie-less-world/live

Michael: we ensure FLoCs will be size of at least 2K
… average number of people
… cohorts will be of different sizes; we will learn by observation

Sharif: No plans to extend the API

Michael: that is true
… even Chrome doesn't know how many people are in each individual FLoC
… this data does not exist, sorry

Sharif: Thank you

Pedro: Will it be possible to take advantage of other proposals
… such as aggregated reporting API
… to take advantage and learn about @@

Michael: FLoC and aggregate measurement should play nicely together
… how Chrome hopes to use that, when it exists, to know how big FLoC is
… we plan to do that using aggregated measurement tech; that is all forward-looking
… If you are using FLEDGE and want to know how many people are in an IG, and want to use aggregated measurement to figure that out, that seems reasonable
… if wanting to see who visited, you can count them
… no magical thing to know size of all the FLoCs of browser you never interact with
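
The "you can count them" approach is plain observational counting over a site's own logs. A minimal sketch, assuming each log row pairs some first-party visitor key with the cohort ID seen on that visit; the row layout and the name `observed_cohort_sizes` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def observed_cohort_sizes(rows: Iterable[Tuple[Hashable, int]]) -> Dict[int, int]:
    """Count distinct visitors seen per cohort ID on this site only.

    This says nothing about browsers the site never interacts with.
    """
    visitors: Dict[int, Set[Hashable]] = defaultdict(set)
    for visitor_key, cohort_id in rows:
        visitors[cohort_id].add(visitor_key)
    return {cohort_id: len(keys) for cohort_id, keys in visitors.items()}


rows = [("u1", 101), ("u2", 101), ("u1", 101), ("u3", 202)]
print(observed_cohort_sizes(rows))
```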

Brian: So I have a follow up
… we are in position where we cannot run scripts that interact with servers of most of our clients
… we depend upon statically executed pixels for scripts
… how can we get access to data
… how long can we access API from a static script

Michael: Accessing API from a static script
… If you are working with a site that wants to become involved in the origin trial, they can get a first-party origin trial and turn on FLoC API on their site directly
… in that case, the API, subject to limits, would be available to any third-party script on the page, including a static script
… that is one possibility
… once FLoC exits origin trial
… a regular API part of OWP
… you could employ it to invoke API and no origin trial token would be needed at that point
… those are only ways to get at cohort ID for now
… was discussion about using HTTP headers
… no way to make them compatible with origin trials
… no other way to make cohort available

Brian: if you can suggest how we can get a head start
… if we wait for things out of origin trials, that puts us behind

Michael: Can you put an iframe on those pages?
… that would be another way to do it

Brian: No

Michael: You can ask your partners to expand your logs to include the cohort ID that corresponds
… and give us the logs and include the cohort ID as way to enrich the data source
… cannot think of anything else to do

Brian: Would a data set be made available about some info about FLoC
… understand it's hard to add additional attributes around FLoC so as not to be sensitive

Michael: I don't think Chrome is in position to enrich data any more than what we are doing
… sorry

Brian: ok

Michael: I will keep thinking about it and let you know if I have other ideas

Kanishk: we had questions around this
… the algorithm used by Google ads and the one being used by Chrome is different
… how different is it, and how do you determine success criteria
… and once GA, what are criteria around that you use?

Michael: An interesting question
… i don't know the details of the clustering algorithm that Google Ads experimented with
… what is running in Chrome is what we have been working out to get balance of privacy and utility
… any evaluation of benefits of different FLoC clustering algorithms will also be a balance
… we expect to get feedback from the community and those in this group

Kanishk: We will build N-state products
… around FLoC
… and how we see
… we want to make sure it doesn't change from under us
… how much time, how do you envision that happening?

Michael: right; we feel we have freedom to experiment right now
… What FLoC clustering algorithm used on what browser
… in final steady state when FLoC is a released API
… changing the algorithm will be disruptive, so we expect any modifications of algorithm will happen slowly and with clear communications
… and feedback in discussions
… we don't want to change the clustering algorithm out from under you

Kanishk: How often will FLoC update?

Michael: Once a week is answer for this initial origin trials
… but as we figure out privacy and utility features...that is case once a week for now,
… but it might change in the future
… and there would be a different label

MichaelMN: Can you say more details about the actual implementation?

MichaelK: each individual user
… it computes the FLoC and sets a stop watch and 7 days later it says I should recompute my FLoC
… it will change
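
The "stop watch" behaviour amounts to a per-browser staleness check. A sketch under the assumption that the browser remembers when it last computed its cohort; `cohort_is_stale` and `RECOMPUTE_INTERVAL` are illustrative names, and the 7-day interval is the value quoted for this initial origin trial, which may change.

```python
from datetime import datetime, timedelta

# 7 days is the interval stated for the initial origin trial only.
RECOMPUTE_INTERVAL = timedelta(days=7)


def cohort_is_stale(computed_at: datetime, now: datetime,
                    interval: timedelta = RECOMPUTE_INTERVAL) -> bool:
    """Return True once the interval has elapsed since the last computation."""
    return now - computed_at >= interval


computed_at = datetime(2021, 4, 1, 12, 0)
print(cohort_is_stale(computed_at, computed_at + timedelta(days=6)))
print(cohort_is_stale(computed_at, computed_at + timedelta(days=7)))
```

Because each browser keeps its own timer, recomputation is spread across the week rather than happening for everyone at once.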

Brian: Is there a limit to the number of data points before you tag a cohort?

Michael: yes, there are a variety of elements in place
… can you post, Josh, link with tech details on the...

Josh: yes

Michael: quick summary
… there are a bunch of limits in place
… before we assign someone a FLoC
… one is they need to have visited a bunch of different sites, not a single site for privacy risk
… one of things FLoC clustering algorithm is to collect people with similar browsing histories into FLoCs
… I believe it's at least 7 different domains
… a bunch of other requirements also
… you can see a page to read more technical details

Brian: Seven different pages in a period of time?

Michael: yes, within the 7 days
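
One of the limits described here, enough distinct domains within the window, can be sketched as follows. The threshold of 7 domains is what Michael recalled ("I believe"), so treat it as unconfirmed; the visit layout and the name `has_enough_history` are assumptions, and the other requirements he mentions are not modelled.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def has_enough_history(visits: Iterable[Tuple[str, datetime]], now: datetime,
                       min_domains: int = 7,
                       window: timedelta = timedelta(days=7)) -> bool:
    """Return True when at least min_domains distinct domains were visited
    inside the window ending at now."""
    cutoff = now - window
    recent = {domain for domain, seen_at in visits if cutoff <= seen_at <= now}
    return len(recent) >= min_domains


now = datetime(2021, 4, 1)
visits = [(f"site{i}.example", now - timedelta(days=1)) for i in range(7)]
print(has_enough_history(visits, now))
print(has_enough_history(visits[:6], now))
```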

Maddy: Tactical, mechanical question
… the FLoC IDs, about 33K
… intention is for them to stay the same
… and for users to move in and out of them
… even though not named
… if we see a certain FLoC, they will move in and out of it
… and not be replaced by a future ID

<jkarlin> Some data about flocs is here: https://www.chromium.org/Home/chromium-privacy/privacy-sandbox/floc We'll add the details that Michael mentioned to that page.

Michael: yes, that is exactly right
… what the FLoC ID number means as behavior
… then people move in/out of cohort as their behavior changes

Maddy: How would a buyer know whether that ID knows if it represents a FLoC they want to buy?
… how do they get that information about whether to purchase?

Michael: answer is somewhat similar to how you figure whether a third-party cookie is what you want to buy on
… you might come up with some belief about person behind that cookie
… FLoC should be same thing
… If you spend a day looking at behavior in FLoC 1-2-2-4
… you should be able to assign interests to a FLoC as a whole group
… and that is something you should have some belief about whether to target a buyer

Wendy: We have five minutes left

<Zakim> AramZS, you wanted to say I assumed but want to double check: first parties may also permit in this origin trial in the usual way, correct?

Wendy: I want to ask about the right way to queue up further questions for discussion?
… moving on to Aram

Aram: my initial question already answered
… So it's different sites
… can choose to opt out of FLoC
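
For the site-level opt-out, the FLoC explainer described a `Permissions-Policy: interest-cohort=()` response header; that detail comes from the explainer rather than from this call and is worth checking against the current text. A sketch of adding it to a response's headers, with `with_floc_opt_out` as a hypothetical helper:

```python
from typing import Dict

FLOC_OPT_OUT = "interest-cohort=()"


def with_floc_opt_out(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with the opt-out directive present,
    keeping any Permissions-Policy directives that were already set."""
    updated = dict(headers)
    existing = updated.get("Permissions-Policy", "")
    if "interest-cohort" in existing:
        return updated  # a directive for this feature is already declared
    updated["Permissions-Policy"] = (
        f"{existing}, {FLOC_OPT_OUT}" if existing else FLOC_OPT_OUT
    )
    return updated


print(with_floc_opt_out({"Content-Type": "text/html"}))
```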

<GarrettJohnson> Can we pick this up in the main meeting? Last meeting was crickets...

Aram: for sensitive categories, some people may not see Google categories as being sensitive enough or in right ways
… can users decide how to opt out
… this site I want to black list for my FLoC membership?

Michael: That is a really interesting question
… right now, FLoC is off if you turn off third party cookies
… there will be a control in future about privacy sandbox APIs to let you turn them off

<jrosewell> Added my question regards experimentation results being available for public inspection to ensure accurate interpretation as a GH issue - https://github.com/WICG/floc/issues/86

Michael: question of more refined controls to FLoC clustering algorithm is hard
… depends on notion of what it is
… hope conveyed, is that everything about the algorithm is subject to change
… we don't know the correct answer for what the FLoC control should look like
… our UX research is attuned to risk
… and not over-promise control to people
… and if people see ads about topic X
… they might feel that control did not live up to its promise
… and not convey notion that control has more power than we, Chrome, can give it
… details of what control we can have, is a subject for on-going research

<kris_chapman> no worries

Wendy: apologies, we need to close here
… Suggestion that we take up questions in a future meeting
… we have a few things queued up for next upcoming calls
… also welcome questions on Github and upcoming calls
… Thanks to Chrome team for sharing all this information

Michael: Thank you for all this engagement; the WICG repo is the best place to ask questions

Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Maybe present: @, Aram, Ben, bmay, Brendan, Brian, Chetna, James, Josh, Kanishk, Maddy, Marshall, Michael, MichaelK, MichaelMN, mjv, Pedro, Wendy