Improving Web Advertising BG – 16 February 2021

Meeting minutes

<wseltzer> https://github.com/1plusX/swan

<wseltzer> https://gitlab.com/eyeo/lab/spectacle

<wseltzer> https://w3c.github.io/web-advertising/dashboard/

Agenda-curation, introductions

Wendy: Let's start by looking at our agenda for today
… we have SWAN, a presentation from 1plusX

<Stephan_Porz> o/

Wendy: SPECTACLE and Crumbs from eyeo
… we also recently saw a request from Arnaud to schedule a discussion of FLoC performance at a future meeting
… It looks like a busy agenda for today, so is there any other business
… that people think is urgent to cover?
… Otherwise, are there any new participants who would like to introduce themselves?
… Don't see or hear anyone
… I see that Angelo has a presentation on SWAN, Storage with Access Negotiation

Angelo: thank you

SWAN (from Angelo Brillout) https://github.com/1plusX/swan

Angelo: Let me introduce myself
… We are a small company and don't attend these meetings regularly
… My name is French: Brillout
… we have been looking at the proposals to see how we are going to adapt
… we had some ideas, so here they are
… Storage with Access Negotiation
… SWAN, like the bird; the name is a bit constructed

<wseltzer> https://github.com/1plusX/swan

<jrosewell> present+

Angelo: "Profiles in the browser" would be a more explicit name, saying what it actually does

[presents slides]
… The use case we had in mind
… comes mainly from publishers and marketers who want to monetize their pseudonymous profile data in a privacy-compliant way
… sticking to the privacy model sketched in the past
… and from publishers wanting to buy additional profile data to improve the performance of their audiences, in a privacy-compliant way
… and as a user of the browser, I don't want my full profile to leave the browser because it has all my information
… so no one can read the whole thing
… but I want to know who can access the profile
… How this works today:
… a cookie in the browser is used to track users server side
… using this cookie, one can construct a profile over many domains
… belonging to different parties; party 1 and 2 in this example
… audiences that use data from A, B, and C
… are computed by adtech providers, pushed to ad servers, and ultimately to the browser
… when the cookie is gone
… we have a silo effect
… instead of one cross-domain, cross-party profile
… we have many profiles
… A, B, C
… we compute data
… if we don't change anything, we would push these audiences to the ad servers
… three different audiences from three different profiles, but only one user
… Voila
… You already know what TURTLEDOVE (TD) is doing
… it proposes to collect the audiences in the browser, provided by adtech providers (ATPs) through marketers or publishers
… one for interest groups (IGs); it collects ads in the browser
… and then does in-browser bidding
… this is graphically what it is doing
… we have audiences A, B, C in the browser
… caused by the splitting of the profile
… all collected in the browser
… instead of on the back end, audiences are collected in the browser; then bidding happens
… and the ad is displayed at the moment it is needed
… Shortcomings of TD
… audiences are computed based on data from one domain only
… assuming no login
… another shortcoming is that no cross-party data collaboration is possible
… it is not possible to use data from C.com to improve the quality of my profile and of my audiences for targeting and performance
… Problem one, the domain problem, is covered in...
… it is important to have a notion of grouped domains
… the browser knows that some domains belong together
… perhaps to one legal entity
… SWAN builds on top of this
… its goal is not to collect audiences, but to actually construct the profile in the browser
… which is similar to SCAUP
… we compute audience memberships in the browser
… SWAN makes the profile boundaries clear
… is the boundary a domain, a party, or cross-party?
… we extend this so you can add data that comes from another party to enrich profiles, improving the quality of your profile and audiences
… here is a link
… Let me show you a link in the diagram again
… Instead of audiences we have profiles
… at the bottom, you see that A.com and B.com belong to the same party
… defined as belonging to a first-party set

<wseltzer> SWAN presentation

… we leverage this to construct one blue profile
… consisting of the partial profiles of A.com and B.com
… audience definitions
… the audience membership can use data from both A and B
… you can compute audience memberships using info from both A and B
… this info is registered in the browser
… and the browser eventually decides which audience it belongs to
… bidding, ad bundles, contextual requests...this remains all the same
… this is one improvement
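A minimal sketch of the first-party-set merge described above, assuming a toy in-memory model: the set membership (`FIRST_PARTY_SETS`), the partial-profile features, and the `party_profile` and `sports_fan` helpers are hypothetical illustrations, not SWAN's actual API. It shows why the merge matters: A.com's partial profile alone does not meet the audience threshold, but the combined A.com + B.com profile does.

```python
from __future__ import annotations

# Hypothetical first-party sets: owner -> member domains (illustrative only).
FIRST_PARTY_SETS: dict[str, set[str]] = {
    "party1": {"a.com", "b.com"},
    "party2": {"c.com"},
}

# Hypothetical partial profiles, each written by one domain into browser storage.
PARTIAL_PROFILES: dict[str, dict[str, int]] = {
    "a.com": {"visits_sports": 4},
    "b.com": {"visits_sports": 3, "visits_travel": 1},
    "c.com": {"purchases_shoes": 2},
}


def party_of(domain: str) -> str | None:
    """Return the first-party set that contains `domain`, if any."""
    for party, domains in FIRST_PARTY_SETS.items():
        if domain in domains:
            return party
    return None


def party_profile(domain: str) -> dict[str, int]:
    """Merge the partial profiles of every domain in `domain`'s first-party set.

    Features with the same name are summed; a domain outside any set
    only sees its own partial profile.
    """
    party = party_of(domain)
    domains = FIRST_PARTY_SETS[party] if party is not None else {domain}
    merged: dict[str, int] = {}
    for d in sorted(domains):
        for feature, value in PARTIAL_PROFILES.get(d, {}).items():
            merged[feature] = merged.get(feature, 0) + value
    return merged


def sports_fan(profile: dict[str, int]) -> bool:
    """Hypothetical audience definition: the result is a single in/out boolean."""
    return profile.get("visits_sports", 0) >= 5


print(sports_fan(PARTIAL_PROFILES["a.com"]))  # a.com alone: False
print(sports_fan(party_profile("a.com")))     # merged with b.com: True
```

Summing features is only one possible merge rule; the proposal leaves the audience logic to the audience-definition scripts.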
… you can compute audiences from the same party
… we also propose a third-party declaration
… it's a simple extension to the existing proposal
… you say party two declares it's a partner of party one
… and this time you have audience memberships computed based on the full profile
… that spans domains and parties, and the quality of the audience membership will be much better
… using these audiences you can then use TD to compete while bidding
… and hopefully you are selected for the winning ad
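Extending the same kind of toy model, here is a hedged sketch of the third-party declaration: party two's declaration that it is a partner of party one is modeled as a hypothetical `THIRD_PARTY_DECLARATIONS` set of `(provider, partner)` pairs, which lets party one's audience scripts combine party two's domains with its own. All names and data are illustrative assumptions; the actual declaration format is defined in the SWAN repository, not here.

```python
from __future__ import annotations

FIRST_PARTY_SETS = {"party1": {"a.com", "b.com"}, "party2": {"c.com"}}

# Hypothetical declarations: (data provider, partner allowed to use its data).
# ("party2", "party1") models "party two declares it's a partner of party one".
THIRD_PARTY_DECLARATIONS = {("party2", "party1")}

PARTIAL_PROFILES = {
    "a.com": {"visits_sports": 4},
    "b.com": {"visits_sports": 3},
    "c.com": {"purchases_shoes": 2},
}


def visible_domains(party: str) -> set[str]:
    """Domains whose partial profiles `party`'s audience scripts may combine."""
    domains = set(FIRST_PARTY_SETS.get(party, set()))
    for provider, partner in THIRD_PARTY_DECLARATIONS:
        if partner == party:
            domains |= FIRST_PARTY_SETS.get(provider, set())
    return domains


def cross_party_profile(party: str) -> dict[str, int]:
    """Sum features across every visible domain (own set plus declared providers)."""
    merged: dict[str, int] = {}
    for domain in sorted(visible_domains(party)):
        for feature, value in PARTIAL_PROFILES.get(domain, {}).items():
            merged[feature] = merged.get(feature, 0) + value
    return merged


def sporty_shoe_buyer(profile: dict[str, int]) -> bool:
    """Hypothetical audience that needs both party one's and party two's data."""
    return profile.get("visits_sports", 0) >= 5 and profile.get("purchases_shoes", 0) >= 1


print(sporty_shoe_buyer(cross_party_profile("party1")))  # True
print(sporty_shoe_buyer(cross_party_profile("party2")))  # False: declaration is one-way
```

The declaration is directional in this sketch (party two's data reaches party one's scripts, not the reverse); that is an assumption about how access would be scoped.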
… the main properties of the proposal
… the profile is stored in private storage
… you cannot read from it
… similar to SCAUP
… it's write-only
… so you cannot get the profile out of the storage
… it cannot leave storage
… audience scripts can look at features of the profile, but have a constrained signature
… they return a boolean value
… not private info
… another feature is that this profile can be seen in the browser
… here we make it clearer than in the SCAUP proposal
… we just want to make sure the user has access to, or can read, the profile
… is disclosed and readable
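The storage properties just described (writable by page scripts, readable only inside audience-definition scripts whose output is a single boolean, fully visible to the user) can be sketched as a small class. This is a hypothetical model, assuming method names `set_item`, `run_audience_script` and `user_view`; Python cannot actually stop code from touching `_data`, so the access rules here hold only by convention, whereas a browser would enforce them.

```python
from __future__ import annotations

from types import MappingProxyType
from typing import Any, Callable, Mapping


class PrivateProfileStorage:
    """Hypothetical model of SWAN-style private profile storage."""

    def __init__(self) -> None:
        # domain -> partial profile; deliberately no public read method.
        self._data: dict[str, dict[str, Any]] = {}

    def set_item(self, domain: str, key: str, value: Any) -> None:
        """Write path available to a domain's page scripts."""
        self._data.setdefault(domain, {})[key] = value

    def run_audience_script(
        self, domain: str, script: Callable[[Mapping[str, Any]], Any]
    ) -> bool:
        """Run an audience definition over a read-only copy of the domain's profile.

        Whatever the script returns is collapsed to one boolean, so only an
        in/out decision leaves the storage.
        """
        view = MappingProxyType(dict(self._data.get(domain, {})))
        return bool(script(view))

    def user_view(self) -> dict[str, dict[str, Any]]:
        """User introspection: a copy of everything stored, per domain."""
        return {domain: dict(items) for domain, items in self._data.items()}


storage = PrivateProfileStorage()
storage.set_item("a.com", "visits_sports", 7)
print(storage.run_audience_script("a.com", lambda p: p.get("visits_sports", 0) >= 5))  # True
print(storage.user_view())  # {'a.com': {'visits_sports': 7}}
```

Collapsing the output to one boolean does not by itself prevent leakage: as Brad points out later in the discussion, registering complementary audiences can still reveal a sensitive bit.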
… there is a contract between the two parties
… on how much of the data is consumed and how much is used
… use the aggregated reporting API
… and report how profile C is being used in the browser
… so party two can bill
… for the extra data being provided
… Again, the user has the possibility to restrict or forbid access to the profile for some third parties
… or completely shut off the local storage
… which would disable TD and SWAN
… it does not allow all the ML features you may need
… if you need a full profile on the back end
… It is not a decentralized ML solution like FLoC
… but together with SCAUP it could make this possible
… it defines who has access to and control over data
… there is more computation in the browser, as with FLEDGE and TD
… this would be one additional computation: the audience scripts you need to run
… it depends on what you want to do
… Summary
… SWAN prevents fragmentation of profiles without login
… Data collaboration across parties in browser
… and more transparency and control; users can block access if they want
… All the details are in the Github repository
… I am done from my side; happy to answer any question

<wseltzer> https://github.com/1plusX/swan

Wendy: thank you for that presentation
… we don't hear you Brian
… we'll come back

Brendan: thank you for sharing this today
… you are talking about first-party sets and keeping data structured with affinity to a site
… do you see the calculation... or only in the browser, to determine segment memberships?

Angelo: if it's on the back end, you could not leverage the capability to have a cross-domain and cross-party view
… it is only possible in the browser
… you cannot build it with the Chromium privacy model
… if you want to decide which IGs (or whatever they are called) a browser belongs to
… spanning many properties across domains, you have to take this decision in the browser

Brendan: is it based on browser behavior only, or is some additional info delivered to the browser?

Angelo: it's not a pure edge-computing thing
… there is still what you can do on the server side
… probably still ML applications: build good profiles on some domains and do some pre-processing
… do that on the back end and push a summary, in the form of some features
… to the browser
… the same per domain
… the eventual decision happens in the browser
… Did I answer the question? Cool.

Brad: Were you envisioning limits on how many third party sets a given domain could be joining or the size of those sets?

Angelo: no, no limits
… don't see reason why we would need this
… It's important to actually allow big alliances in general, to have this larger profile
… limit would be arbitrary
… depends upon how many actors
… we would need to learn empirically
… if it became too difficult or there was too much data in the browser, it would make sense to limit
… but in the proposal there was no limit
… does this satisfy the question?

Brad: yes
… a couple other clarifying questions
… Any thoughts on limiting the number of audience scripts?

Angelo: also no
… we could put a limit on the number per domain or per party, you could say
… that could make sense
… otherwise you can completely clog the browser
… I think we would need to learn empirically what is feasible
… if I say 10, I'm not sure whether that's a good number or not
… I see a clear reason to do this

Brad: did you give thought to adding noise or limits to the data that could be stored?

Angelo: no, nothing
… you mean silos of partial profiles?

Brad: e.g., randomly not setting an item or changing its value, or some other noise mechanism?

Angelo: no noise mechanism
… why add noise; to preserve privacy?

Brad: yeah, to preserve privacy

Angelo: the moment the profile is allowed to leave the browser
… the profile is in almost write-only storage
… the script has a limited signature
… you can say true/false for a given audience, in/out
… you can limit the number of browsers to be in an audience, so you can preserve privacy
… I don't see the need for noise

Brad: one reason to provide noise is plausible deniability
… in your example, "is female" or "is gay": if the user clicks on an ad
… based on that
… you would be linking that info in a cross-site manner

Angelo: I don't see how we would be able to leak this information
… who has access to the "is gay" audience?
… maybe as an attribute in storage
… but how do you get it out?

Brad: I register one audience script for "is gay" and one for "is not"
… the landing page is different for each
… so that info would be conveyed

Angelo: so this is a problem
… so yes, noise would help in this case; a good point
… if the user has introspection into his profile
… perhaps there is a need for introspection into the scripts too
… to understand how he is being put into an audience
… and to understand that he is being put into a problematic IG
… if you give introspection... but a noise solution is also possible

Brad: your proposal raises some interesting questions
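Brad's plausible-deniability point can be illustrated with randomized response, a standard local noise mechanism; it is our illustrative choice here, not something SWAN specifies. In this sketch the browser reports its true audience bit with probability `p_truth` and a fair coin flip otherwise, so a landing page that sees "in the audience" cannot be sure the user really is, while aggregate rates can still be estimated. The function names are hypothetical.

```python
import random


def noisy_membership(true_member: bool, p_truth: float, rng: random.Random) -> bool:
    """Randomized response: keep the true bit with probability p_truth,
    otherwise replace it with a fair coin flip."""
    if rng.random() < p_truth:
        return true_member
    return rng.random() < 0.5


def estimate_true_rate(reported_rate: float, p_truth: float) -> float:
    """Invert E[reported] = p_truth * true_rate + (1 - p_truth) * 0.5."""
    return (reported_rate - (1 - p_truth) * 0.5) / p_truth


rng = random.Random(42)
truth = [i < 30_000 for i in range(100_000)]  # 30% really are in the audience
reported = [noisy_membership(t, 0.5, rng) for t in truth]
print(round(estimate_true_rate(sum(reported) / len(reported), 0.5), 2))
```

With `p_truth = 0.5`, a true member reports "in" with probability 0.75 and a non-member with probability 0.25, so a single observation shifts the odds by at most a factor of 3 (ε = ln 3 in local differential privacy terms).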

Brian: I had the same question as Brad on propagation
… private storage won't allow read capabilities
… wouldn't it make it easier to manage in private storage?

Angelo: If I can read, I can extract...
… you could read it only using a script specific to a domain
… this is a possibility, sure
… if for some reason you need to read info, but always within your domain silo
… you could read outside the scripts, but not outside the domain
… technically, the domains are all origins
… I simplified that
… there it makes sense; reading within the silo makes sense

Brian: as a third party running a script in the browser, I don't have insight into whether my data is being meaningfully used for anything
… this sort of thing requires some way to identify data that is not serving a useful purpose

Angelo: in the GitHub proposal we have a method called get item
… you are allowed to use it with a third-party set declaration
… you could read a feature of a third party
… and that would be reported
… so the third party would be able to understand how often
… or, with other aggregations, how its data is being used
… maybe I did not understand the question

Brian: using the aggregated reporting API?

Angelo: every time you call this method, you do an increment
… we can think of other methods
… but just counting that some features have been accessed, maybe at the item or feature level
… and reporting this
… and for conversion measurement we need this reporting capability
… extend to include data usage from third parties
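Angelo's counting idea (each call to the proposal's get item method increments a counter, and only aggregates are reported) could look roughly like this. It is a sketch under assumptions: `UsageMeter`, `get_item`, `report` and `aggregate` are hypothetical names, and plain summation stands in for the aggregated reporting API, whose real interface and privacy protections are not modeled.

```python
from __future__ import annotations

from collections import Counter
from typing import Any, Iterable


class UsageMeter:
    """Hypothetical per-browser meter for reads of partner data."""

    def __init__(self, profile: dict[tuple[str, str], Any]) -> None:
        # (provider, feature) -> value stored in this browser's profile
        self._profile = profile
        self._counts: Counter[tuple[str, str]] = Counter()

    def get_item(self, provider: str, feature: str) -> Any:
        """Return a provider's feature and count the access."""
        self._counts[(provider, feature)] += 1
        return self._profile.get((provider, feature))

    def report(self) -> dict[tuple[str, str], int]:
        """Per-browser access counts; no feature values are included."""
        return dict(self._counts)


def aggregate(reports: Iterable[dict[tuple[str, str], int]]) -> dict[tuple[str, str], int]:
    """Sum access counts across browsers, as an aggregate report would."""
    total: Counter[tuple[str, str]] = Counter()
    for report in reports:
        total.update(report)
    return dict(total)


browser1 = UsageMeter({("party2", "purchases_shoes"): 2})
browser1.get_item("party2", "purchases_shoes")
browser1.get_item("party2", "purchases_shoes")
browser2 = UsageMeter({("party2", "purchases_shoes"): 1})
browser2.get_item("party2", "purchases_shoes")
print(aggregate([browser1.report(), browser2.report()]))  # {('party2', 'purchases_shoes'): 3}
```

A feature that is stored but never read simply never appears in the totals, which is as close as plain counting gets to Brian's follow-up question about data that is stored but not useful.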

Brian: thinking about the case where data is stored but not accessed, so not usable

Angelo: accessed but not useful
… nothing comes to mind to make such a differentiation

Ben: I am not sure
… I understand what use cases this enables that SCAUP doesn't already enable
… I think you can place adtech JS across many sites
… and not declare all of that; leverage it all together
… and then add browsers to those IGs
… is there a specific use case this enables? What is the difference?

Angelo: What is not clear to me is who owns/controls the data
… an ATP can provide its own profile
… how do you authenticate?
… first, who owns the data; to me it's the origin, the domain
… from the browser...hidden behind domains, parties
… that's why this proposal refines how to organize the data
… from a use case perspective now
… So in SCAUP, how do you make data available to a third party?
… if I am a marketer with some consumption data on my ecommerce site
… not retargeting use cases, something more complex
… how do I sell this data to a publisher and monetize it?

Ben: how to sell the data?

Angelo: to create meaningful profiles... not in SCAUP

Ben: my understanding is that data is partitioned by registered domain

[missed]
… all could place JS across domains
… and based on that, use SCAUP to add to various audiences
… not sure about the selling bit

<wseltzer> https://github.com/google/ads-privacy/tree/master/proposals/scaup

Angelo: a bit convoluted with SCAUP
… a bit unclear what happens if an alliance loses a member
… how do you get this data out?
… these are more technical questions
… The goal here is to reuse other proposals
… and to make sense out of other proposals
… TD, SCAUP and extension of first and third-party sets

Erik: thanks for your proposal
… Wanted to follow up on Brad's comment around privacy
… similar to idea that the single bit revealed could be sensitive
… and you have no limits on groups
… groups could create a fingerprint
… leaking one bit doesn't make it private, unfortunately
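Erik's concern comes down to counting: k one-bit audience memberships split browsers into at most 2^k buckets, so telling N browsers apart needs at least ⌈log₂ N⌉ bits, and not many more groups than that could in principle suffice. A small sketch with hypothetical helper names; the population figure is only an example.

```python
from __future__ import annotations


def min_bits_to_single_out(population: int) -> int:
    """Smallest k with 2**k >= population (exact integer arithmetic):
    the fewest yes/no audiences that could give every browser a distinct combination."""
    return (population - 1).bit_length()


def membership_fingerprint(memberships: dict[str, bool], audiences: list[str]) -> int:
    """Pack membership bits, in a fixed audience order, into one integer."""
    value = 0
    for position, audience in enumerate(audiences):
        if memberships.get(audience, False):
            value |= 1 << position
    return value


print(min_bits_to_single_out(8_000_000_000))  # 33
print(membership_fingerprint({"a": True, "c": True}, ["a", "b", "c"]))  # 5 (0b101)
```

This is a best-case lower bound for an observer who sees all the bits; real memberships are correlated, so more groups would be needed, but far fewer once other signals are combined with them.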

Angelo: my understanding is that leaking to the outside world makes it a fingerprint, but that is not happening
… it happens the same way as in TD
… IGs are called audiences here, so the same rule applies
… I don't see why it would not apply to SWAN

Erik: in TD, that info is only used in the auction
… hidden from the outside page
… there is a browser element being built
… to hide it from the outside context
… so that it would only be learned in aggregate
… if anybody else, including the domain or the publisher showing the ad,
… is able to see these, then you could construct a fingerprint, trivially

Angelo: the proposal doesn't leak
… the main difference is that the audience decision is made not on the back end but on the front end
… otherwise it's the same thing as TD
… I still have trouble seeing where in the proposal the membership would leak

Erik: the piece I am unclear about is that it's read-only except in a signature where it only gives you a zero-one output
… TD doesn't give you that property
… that read with a signature limited to one bit
… Maybe I am misunderstanding what you mean
… In the auction, that only works the same way TD figures out who is eligible to see an ad
… if there is a function to see which person is in which group
… you could construct a handful of groups and see

Angelo: This blue script, the audience definition
… makes a binary decision
… e.g., gay/female
… and decides yes/no whether I am part of the IG
… and registers it just like TD
… the audience definition script is the only one that has access to the profile
… you are not supposed to be able to read from this storage; it's private, outside of the definition script
… this script only outputs IG memberships
… and then acts the same as TD
… collects them in the browser
… two unrelated requests; the decision happens in the browser
… not sure if I clarified?

Erik: What is unclear to me is why this cannot happen in TD as is
… there is this ability to run this script and see the IGs that have been created, as long as the site is given access to those
… not sure if there are folks here from TD who see the differences more clearly than I do?

Wendy: not seeing anyone
… I wonder if you want to take that conversation to an issue thread in the repository
… and possibly come back for further discussion

Angelo: Yes, I would like to discuss further details

Wendy: Thanks for the presentation

<Brendan_eyeo_techlab> https://gitlab.com/eyeo/lab/spectacle/-/blob/master/README.md

SPECTACLE and Crumbs (from Brendan Riordan-Butterworth)

Wendy: and let's start with the discussion that you wanted to queue up

<Brendan_eyeo_techlab> https://crumbs.org/

Brendan: it doesn't have a bird name, but it addresses things in a certain way; published in Nov 2020
… it prototypes concepts
… and we wanted to present it today alongside SWAN
… it is client-side profiling of audiences
… a bit different: a direct deal between the user and the profiling system
… it presents to the user: "here's the trade"
… in the browser we will profile you in exchange for giving you privacy features
… right now there is a lot of material about CRUMBS
… and the first part of SPECTACLE is about a collection of privacy features
… to induce users to install the software
… what's interesting for this group is the local profiling agent
… the deal between the user and the profiling system
… we don't care what web sites they go to; we create one global profile
… and solicit info from the user on what things to infer, like vendor and location
… the goal of CRUMBS is to have that conversation with the user
… what web sites you visit, other info you want to disclose
… we are testing a hypothesis
… on what is being browsed
… and the tradeoff with audiences
… users will trust installed software when we couple it with control
… we give the option to toggle off certain segments they don't want to share
… and disclose only the things that are relevant
… an implementation that takes this forward
… the current version doesn't include cohortization
… what we do is create a large number of segment memberships on device
… based on ML
… these are segments to assign to the member
… we have something that works now
… engage what we do
… random sampling, shared with advertising systems
… and shared server side
… That is not in line with FLEDGE
… we see a path where @ to cohorts

[too fast]
… that concept with a centralized server managing the thinking
… instead of sending raw data to the server
… we have computed segment memberships
… instead of sending them all in the clear, we are doing research into differential privacy
… we know systems align in a predictable way
… we can make inferences on the size of cohorts
… and that can feed into the FLEDGE decisioning being envisioned
… with CRUMBS and SPECTACLE, we hope to establish that there can be more than one party assigning a cohort membership
… empower publisher choice
… a practical implementation, and we are excited to support FLoC and FLEDGE cohorts
… and couple that with local profiling

Wendy: Thanks, Brendan. Where would you like feedback?

Brendan: GitLab; feedback welcome there
… reach out to me or the team directly
… we are updating our licenses
… check it out; it is not an ad blocker, it's a tracking blocker
… outreach as well; we are happy to talk to more partners
… as we participate in this W3C process and look at how to use FLEDGE
… we'd love to hear thinking about the privacy implications of creating a profile and how to share it

Wendy: questions or comments from others?

Ben: I'm just skimming the proposal quickly
… and might have the wrong sense
… the idea is that people install a custom extension where they share their email address
… for the purpose of re-enabling cross-site tracking, with data access controls?

Brendan: email is optional... it's done with the local profiling agent
… we are seeing email-based cross-site tracking
… we are seeing email aliasing
… using different domains; not having that identity
… the goal is to create a profile
… and expose that to the user so they have control
… so transparency and control
… email is optional and only for that email-aliasing feature

Ben: this local profile
… is the idea that this extension is a proof of concept, or is it a long-term plan for a browser extension?

Brendan: Long-term, browser extension and profiles from not only one source

Ben: How do you use for ad selection?

Brendan: We make ad requests with the segments discovered
… if you sent all 17 memberships this browser has every single time you went there
… right now we are doing a random selection of them
… we are looking for a better solution; a path to cohorts
… such as a server that will count how many members are in a cohort and maintain anonymity
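A hedged sketch of the two mechanisms Brendan mentions: attaching a random subset of the on-device segment memberships to each ad request, and a server that counts members per segment and only uses segments above a minimum size. The names, the subset size and the threshold are hypothetical; this is not CRUMBS code.

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Iterable


def sample_segments(segments: list[str], k: int, rng: random.Random) -> list[str]:
    """Pick a random subset of at most k segments for one ad request."""
    return sorted(rng.sample(segments, min(k, len(segments))))


class CohortCounter:
    """Hypothetical server that tracks how many browsers hold each segment."""

    def __init__(self, min_size: int) -> None:
        self.min_size = min_size
        self.counts: Counter[str] = Counter()

    def register(self, browser_segments: Iterable[str]) -> None:
        """Count one browser once per distinct segment it holds."""
        self.counts.update(set(browser_segments))

    def usable(self, segments: Iterable[str]) -> list[str]:
        """Keep only segments shared by at least min_size browsers."""
        return [s for s in segments if self.counts[s] >= self.min_size]


rng = random.Random(7)
memberships = [f"segment-{i}" for i in range(17)]  # e.g. 17 on-device memberships
print(sample_segments(memberships, 3, rng))

server = CohortCounter(min_size=2)
server.register(["segment-1", "segment-2"])
server.register(["segment-1"])
print(server.usable(["segment-1", "segment-2"]))  # ['segment-1']
```

The threshold here is per segment; it does not bound how rare a particular combination of segments is, which is the re-identification risk Michael raises later in the discussion.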

Ben: Sounds like the PIGIN proposal
… where IG lives only in browser

Brendan: we need something now that works with server-side decisioning
… when it moves to the client, there are other ways to integrate
… profiling engine and decisioning engine
… APIs for communicating between those two is where we want to see standardization

Don: In some of the FLoC discussions we covered whether training should be opt-in or opt-out for a site
… FLoC has a permissions policy for the site
… I'm not seeing anything in SPECTACLE on how a site can opt in to IG data collection or opt out

Brendan: that is interesting
… we haven't updated the proposal since November
… as FLoC is fine-tuned and ownership gets integrated, we would like to support that
… Chrome is deal with @ and profiling agent
… publisher control hasn't been a priority yet

Don: thank you

Michael: Sorry I missed an interesting day of discussion
… I apologize
… As Ben mentioned, this does seem reminiscent of ideas
… ideas of taking a random subset and sending those
… in PIGIN we tried to take one value, but that is hard
… working with privacy researchers, picking a random subset
… was enough to recognize exactly who the user was
… enriching the ad request with info
… seems hard to do
… that's my primary worry

Brendan: my concern and our team's
… it is not as private as FLEDGE envisions
… but more private than having a unique identifier on the request
… we see it as transitional phase
… as rest of scenarios are being worked on
… we proxy all the ad and bidder requests
… removes some info from what a malicious actor could collect

Wendy: We are at time for today
… thanks to both for these enriching presentations and for all the work that has gone into them
… we'll be back again on the 23rd [Feb]

<wseltzer> [adjourned]

<Brendan_eyeo_techlab> Thanks all!

– DRAFT –