Ad Measurement and Privacy
18 Sep 2019


taraw, Alan, christine, kleber, scottlow, yoav, heejin, blassey
John Wilander


johnwilander: Hello. Talking about measuring effectiveness of ads, while still preserving privacy.
... Cookies, 1st/3rd parties.

<Alan> scribenick: mkwst

johnwilander: Tracking prevention. "Bits of entropy"
... Ads come in a few flavors.
... There are ads that are "viewed", ads that are "clicked on", might be rich media that the user watches.
... These may lead to "conversions", causing the user to take some action the advertiser desired.
... These result in an "attribution".
... An ad placed on a site caused a conversion, advertiser is happy, publisher is paid.

johnwilander: Done today via cookies. Can we do this in a privacy-preserving way?

johnwilander: Don't want to reveal that a particular user is the one who saw the ad and converted.
... Apple proposed "Private Click Measurement".
... Google proposed something similar: "Click-through conversion measurement"
... Brave has "Private Ad Confirmations"
... WebAdvertising Business Group proposed "Private Web Audience Measurement" document.

<christine> Apple - Private Click Measurement - https://wicg.github.io/ad-click-attribution/index.html

johnwilander: Session goals: privacy model for ad measurement, 1st/3rd parties.
... What's needed to turn on PCM
... Joint proposal?

<Alan> [to be precise, it's officially called Improving Advertising on the Web BG]

johnwilander: What happens beyond clicks? View attributions, video ads, ...
... Format:
... General intro including PCM
... [something]
... Back to ads:
... PCM targets clicked ads, not views, not video.
... You click an ad, and in Apple's proposal, the last conversion event will be attributed/reported.
... What is ad click attribution
... Say you're searching for a kite.
... Search engine has a cookie for you attached to your search history.
... You click an ad on the search engine, taken to an advertiser site.
... You put the kite in a shopping cart, that's a conversion.
... Advertiser might put a pixel on their page telling the search engine that a conversion happened. "John added a $30 kite to the cart."
... Note that these pixels are added regardless of whether the user clicks an ad or not.
... In browsers that block cookies, these requests will be cookieless, so the attribution no longer works.
... The pixel request back to the search engine doesn't have a cookie, no attribution possible.
... Can we support click attribution without cookies?
... PCM
... You click the ad, you're taken to the shopping site. In the `<a>` tag, you can add metadata.
... Ad destination, and ad campaign ID.
... Where is the user being sent, and which of my ads was it?
... Browser stores these two pieces of information.
... When a conversion happens, a well-known URL is requested (perhaps as a redirect from the preexisting pixel).
... This well-known URL causes the browser to grab the attribution information from its internal storage.
... At some point in the future, the information is sent to the search engine.
... "Some user who saw ad X converted with Y."
... Privacy Model
... Should 1st or 3rd parties receive attribution reports?
... In Safari, only the 1st parties get the reports.
... Let's say the ad was displayed on a news site: the news site would get the reports, not the 3rd party ad network.
... Who should issue trust tokens?
... We'd like to combat fraud so that these reports can't be issued with `curl`.
... Should the browser issue tokens? Trusted 3rd party? Click destination? Click source?
... Should click sources or destinations receive attribution reports?
... Should these clicks ever be able to be tied to individuals?
... Open for discussion.
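
A minimal sketch of the PCM flow described above, under stated assumptions. The class and method names, the `search.example` and `kites.example` domains, the 24-48 hour delay window, and the rule that a later click replaces an earlier one are all hypothetical; the minutes only say that the browser stores the ad destination and campaign ID, looks them up when the well-known URL is requested, and sends a report "at some point in the future". The 0-63 range is borrowed from the entropy figures discussed later in the session.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

MAX_ID = 63  # assumption: 6-bit values (0-63), as in the later entropy discussion


@dataclass
class PendingClick:
    source_site: str       # site where the ad was clicked
    destination_site: str  # "ad destination" from the link metadata
    campaign_id: int       # "ad campaign ID" from the link metadata


@dataclass
class Report:
    source_site: str
    destination_site: str
    campaign_id: int
    conversion_data: int
    send_after_hours: float  # hypothetical delay before the report is sent


@dataclass
class BrowserStore:
    """Hypothetical in-browser storage; it holds no cookie and no user ID."""
    clicks: Dict[Tuple[str, str], PendingClick] = field(default_factory=dict)
    outbox: List[Report] = field(default_factory=list)

    def record_click(self, source_site: str, destination_site: str,
                     campaign_id: int) -> None:
        if not 0 <= campaign_id <= MAX_ID:
            raise ValueError("campaign_id out of range")
        # Keyed by (source, destination): a later click replaces an earlier one.
        self.clicks[(source_site, destination_site)] = PendingClick(
            source_site, destination_site, campaign_id)

    def record_conversion(self, current_site: str, source_site: str,
                          conversion_data: int) -> Optional[Report]:
        """Called when source_site's well-known URL is requested on current_site."""
        if not 0 <= conversion_data <= MAX_ID:
            raise ValueError("conversion_data out of range")
        click = self.clicks.pop((source_site, current_site), None)
        if click is None:
            return None  # no stored click, so nothing to attribute
        report = Report(click.source_site, click.destination_site,
                        click.campaign_id, conversion_data,
                        send_after_hours=random.uniform(24, 48))
        self.outbox.append(report)
        return report


store = BrowserStore()
store.record_click("search.example", "kites.example", campaign_id=12)
report = store.record_conversion("kites.example", "search.example",
                                 conversion_data=3)
```

The queued report carries only the two sites and the two small numbers, which corresponds to "Some user who saw ad X converted with Y."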

kleber: Hi. I'm Michael Kleber from Google, involved in Google's proposal.
... To be clear, the Google proposal has two pieces.
... Click-based model, similar to this one.
... Also an aggregated one, which is distinct.
... Happy to talk about all of these? Also happy to let others hop in.

johnwilander: Talk about them!

kleber: Ok!
... To give context about why click attribution happens, let's talk about the ecosystem.
... We talk about things here as if a publisher site does something, or an advertiser site does something.
... But really, almost everything we're talking about here is done by third-parties.
... The important party here is the ad network acting on behalf of the advertiser.
... In your example, `kite.com` is not in the business of displaying ads about kites. They outsource that work to an advertising network that specializes in that capability.
... The shopping site has entrusted the ad server to spend its money. Shopping site has to spend money to get the ad to appear.
... Ad network's job is to use the advertisers' money wisely to get the ad to show up in places that will cause conversions.
... Picking places to display the ad where the value to the advertiser is as high as possible.
... Advertising on the internet works because the web is good at providing enough data to allow you to understand the relationship between the money you're spending and the money you make because of it.
... Figuring out which sales fit which ad display opportunities is critical to the whole operation.
... Every time the ad network gets an opportunity to put an ad somewhere, it needs to decide whether that opportunity is good or bad for a given advertiser.
... They need conversion information in order to make these decisions.
... This means that the attribution report clearly needs to go to the entity that's making decisions based on that information.
... In this case, it must be the ad network acting on behalf of the kite site.

ah. thanks. :)

johnwilander: We at Apple started from a user's perspective.
... Users don't know about the ad networks. They know about the site where the ad was displayed, and they know about the site where the conversion happened.
... If the browser is going to reveal information, the only thing we could explain to the user would be in terms of those two entities.
... From that point, of course, the site which gets the information can decide to forward the information somewhere else.
... That suggests that sending the report to the advertiser would be a better model, since there's a relationship between the advertiser and the ad network.
... But should always land somewhere the user knows about before going anywhere else.

kleber: I think you're right that the report could start on the advertiser or publisher site.
... My view of the web is that first-parties should have freedom to get other parties to do stuff for them.
... Delegate to other folks rather than doing everything myself.
... I prefer to make it as straightforward as possible for that delegation to happen.
... The thing that seems most reasonable to me is for the browser to enable that directly.
... An alternative is to send the data to `kites.com`, and have them send it on.
... Of course, further communication can happen on the backend, server-to-server.
... But it would be cleaner if the web would take care of that use case.

yoav: It seems like there's visibility loss if that's the case.
... Server-to-server is invisible to the user.

johnwilander: I think we don't have visibility anyway. Anyone can forward data all over the place.
... Showing the first parties is the only thing users can understand.
... Going back to sending to advertiser:
... we have to take abuse cases into consideration. Some actors would add metadata to all links, just to get reports back about activity on other sites.
... If data only goes to first parties, then third-parties couldn't game it.

englehardt_: Are you worried about sending to third-parties because of IP addresses? If you had Tor, would you be worried?

johnwilander: Probably. Also worried about pixel requests. We're only removing the cookie, after all.

blassey: If the concern is having third-parties add themselves as attribution endpoints, you could presumably control that with a policy. One reporting endpoint for a site, etc.

johnwilander: Third-parties have convinced sites to paste their scripts in. Security/privacy disaster.
... Can pressure sites to do things. "You must give us reports, otherwise you won't get dollars."

blassey: Sure, but if you restrict to a single endpoint for a given party, that isn't as much of a problem.

johnwilander: Ah. That's an interesting idea.

toml: If it's a service, that seems like something you don't need to delegate to a distinct domain.

<kleber> mkwst: cnaming means that the 3rd-party gets cookies for the 1st-party

<kleber> ...perhaps on other requests; cnaming is bad hygiene

toml: The concern is that I would hit that domain in some other way and deliver cookies?

yoav: An image on that domain could send credentials.

englehardt_: Would locking to one reporting backend limit competition?

kleber: I think it's plausible that `kites.com` would have multiple providers, yes.
... Might make deals with both Google and Facebook, for instance.
... Distinct networks. So, yes, that's a risk.

blassey: If you set up one network for a first-party, it might act as an agent for the rest, forwarding things on.

johnwilander: Let's move on to Trust Tokens.
... Regardless of where that reporting request is sent, it's vulnerable to fraud.
... No cookies, so no authentication.
... It would be good to fix that problem somehow.

kleber: Trust Token session later today.
... But, at a high level, no. No promise that Trust Tokens solve this problem.
... Trust Tokens are one of the tools we might be able to use in the absence of cookies.
... I don't think they're the end of the story, they're just a useful part of the story.

johnwilander: Would they get us back to where we were before?

kleber: No. Let's talk about it in the other session.

johnwilander: One major difference between Apple and Google's proposal is ...
... [slides]
... Entropy.
... WebKit -> 6-8 bits of entropy.
... Click source and click destination will have ~6-8 bits of entropy: 0-63 to represent ad campaign and conversion.
... Neither side can tie the value to a user identity (unless they have only 64 users).
... Google -> 64 bits available.
... Clearly can identify an individual. Only the click side, though. Only 3 bits on the conversion side.
... We think neither side should get that entropy.
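
The bit counts above translate into numbers of distinguishable values as follows. This is a small arithmetic sketch; the helper names and the rough world-population figure are assumptions for illustration, not from the minutes.

```python
def distinct_values(bits: int) -> int:
    """Number of different values that fit in the given number of bits."""
    return 2 ** bits


def can_label_every_user(bits: int, users: int) -> bool:
    """True if each user could be given their own unique value."""
    return distinct_values(bits) >= users


WORLD_POPULATION = 8_000_000_000  # rough assumption, for scale only

webkit_low = distinct_values(6)         # 64 values (0-63)
webkit_high = distinct_values(8)        # 256 values
google_click = distinct_values(64)      # 18446744073709551616 values
google_conversion = distinct_values(3)  # 8 values

# Smallest bit count that can give everyone on Earth a distinct value.
bits_for_everyone = (WORLD_POPULATION - 1).bit_length()  # 33
```

Under these assumptions, 6 bits can single out a user only on a site with at most 64 users, while 64 bits is far more than the 33 bits needed to label every person individually.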

kleber: So, why do we have this very large number?
... As I said, we think the information should go to the entity that caused the ad to run. Ad network.
... Our proposal is that the attribution says "This particular ad led to a conversion".
... The closed loop we want the ad network to learn is which ads caused conversions, benefited the advertiser.
... We want to tie the specific click event to its conversion.

<toml> +q to talk about the challenge posed

kleber: There was an opportunity to show an ad. Some algorithm led to displaying the ad.
... Train the model with lots of input signals, one bit of output.

<Zakim> toml, you wanted to talk about the challenge posed

kleber: Over lots of one bit outputs (did this ad convert or not?), you get a better outcome for the advertiser.

toml: The ad serving model you'd like to maintain does not allow for a distinction between the parameterization of your ML model and a reasonably global unique ID.
... As long as we're in the world you want, we can't resolve the privacy commitment in a way that's interrogatable at the client.

kleber: I wouldn't expect the 64 bits to describe the ML parameters. I'd expect the ad network to create a unique event ID, tied to those parameters.

toml: Sure. But I, as a person interacting with a client, can't distinguish the 64 bits as being a lookup into an event table vs those bits as a lookup into a user database.

kleber: Yes. Data measured by the ad at the time you saw it. Tied to the information it has about you at the time you saw it.
... Our proposal is that it's reasonable from the privacy point of view to allow that unbounded amount of information about the display of an ad to get a small additional amount of information (1 bit, 3 bits) about whether the ad display was a success or failure.
... In the status quo, you get infinity bits on the publisher side when you displayed an ad, associated with the publisher-side identity.
... And there's also infinity bits on the advertiser side. You put the thing in your cart, who "you" are, etc.
... Those are joinable in the status quo.
... The event-level proposal is to let you get the small amount of information to close the loop: did this ad convert?
... Without joining that information with the advertiser's view of your identity.
... Trying to make it impossible to link user profiles across sites.
... Sending minimal information to determine whether a conversion happened.
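
A minimal sketch of the event-level loop as described in this exchange, assuming a hypothetical `AdNetwork` class: a random 64-bit event ID is created when the ad is shown, the browser later reports that ID with a few conversion bits, and the network joins the report back to its own event table to get one training label per impression. The feature names and the 3-bit limit on conversion data follow the figures quoted in the discussion; everything else is illustrative.

```python
import secrets
from typing import Dict, List, Tuple

Features = Dict[str, float]


class AdNetwork:
    """Hypothetical ad-network side of an event-level report flow."""

    def __init__(self) -> None:
        self.events: Dict[int, Features] = {}  # event ID -> inputs at display time
        self.converted: Dict[int, int] = {}    # event ID -> reported conversion bits

    def serve_ad(self, features: Features) -> int:
        # The 64-bit value is an opaque lookup key into the event table.
        event_id = secrets.randbits(64)
        self.events[event_id] = features
        return event_id

    def receive_report(self, event_id: int, conversion_bits: int) -> None:
        if not 0 <= conversion_bits < 8:  # 3 bits on the conversion side
            raise ValueError("conversion data must fit in 3 bits")
        if event_id in self.events:       # ignore reports for unknown events
            self.converted[event_id] = conversion_bits

    def training_rows(self) -> List[Tuple[Features, int]]:
        # One bit of output per impression: did this ad convert or not?
        return [(features, 1 if event_id in self.converted else 0)
                for event_id, features in self.events.items()]


network = AdNetwork()
shown = network.serve_ad({"slot_position": 1.0, "hour_of_day": 14.0})
ignored = network.serve_ad({"slot_position": 3.0, "hour_of_day": 9.0})
network.receive_report(shown, conversion_bits=5)
rows = network.training_rows()
```

Nothing in this sketch stops the `features` entry from containing a user identifier, and the browser only ever sees the opaque 64-bit key; that is the distinction toml says cannot be checked at the client.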

johnwilander: We have to assume abuse.
... There's no way to know that this is being used for ads.
... All links should be assumed to have this.
... I believe your view is that Google will not abuse this.
... I believe you believe that. Other actors out there that get a lot of clicks.
... They'll abuse this power. If they have scripting power on the destination page, they'll use it to signal to themselves.
... That's cross-site tracking.

kleber: Two things:
... 1. "Should this be something that happens on all links, or restricted to ads?" I think it would be great to have a definition of "an ad", and it's reasonable for us to say "Oh, this is an ad frame" and tie restrictions and capabilities to that context.
... Different discussion, but worth talking about.
... 2. The capability of someone who has lots of clicks that go from A to B, and can filter small number of bits back to A? Yes. This enables that.
... If a user clicks from site A to site B lots of times, there are a host of ways for those two to join up their identifiers.
... Apple News to NYT 100 times, I'm sure both could communicate an identity to the other.
... We should talk about how to limit those capabilities.
... I don't think this thing is the low-hanging fruit.

blassey: There are affordances to limit that abuse. Limiting conversions to places where you have a click, for instance.
... Also opportunity, since both proposals recommend delaying reports, to create UX around the reporting information stored by the browser.
... Users have an opportunity to see that, say it's not right, brand reputation pain.

Alan: Before we're out of time:
... Same conversation is going on in Improving Adv. on Web BG.
... Would be great to get more involvement from the community.
... Similar conversations happened there.
... Web is stronger because of these conversations.
... That group meets every Thursday, would appreciate more folks joining.

johnwilander: If we're down to this difference, perhaps we can make a joint proposal.
... Will a reduced number of bits be used?

kleber: Good question.
... I've heard from folks inside Google, and heard folks from Facebook talking in the BG.
... 6 bits is not useful. Facebook: 6 bits is not enough, can't train the ML model.
... Search: the thing the advertiser might want to know is "Ok, what do users type in that leads them to buy kites afterwards?"
... You'd need to encode that query somehow in those 6 bits. "Blue Kite" isn't a kite, maybe it's something else.
... This non-personal contextual information is critical.
... Can't be encoded in 6 bits.

johnwilander: Absolutely. But the search side learns "This user typed these words and clicked this link, and converted." That's a lot!

kleber: Aggregate model?

johnwilander: No time!
... We'll have to revisit the 6-8 bits thing. Hope Google revisits the 64 bit thing.
... We'd rather not ship this at all than have 64 bits.

englehardt_: Similar position.
... Concerned about learning information about specific users cross-site.

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2019/09/18 06:33:54 $