W3C

- DRAFT -

Improving Web Advertising BG
01 Dec 2020

Agenda
https://lists.w3.org/Archives/Public/public-web-adv/2020Nov/0014.html

Attendees

Present
wseltzer, wbaker, bmay, ErikAnderson, eriktaubeneck, lbasdevant, mlerra, Mike_Pisula_Xaxis, kris_chapman, mjv, charlieharrison, jrosewell, joeyrobert, kleber, blassey, dialtone, zerth, Mikjuo, tomkershaw, GarrettJ, aschlosser, jeff_burkett_Gannett, imeyers, gendler, AramZS, pl_mrcy, bleparmentier, dkwestbr, apascoe, weiler, ajknox
Regrets
Chair
wseltzer
Scribe
Karen

Contents

Topics
  1. Agenda-curation, introductions
  2. PELICAN (Private Learning and Inference for Causal Attribution)
Summary of Action Items
Summary of Resolutions

[from Ryan Avecilla] https://github.com/neustar/pelican

https://w3c.github.io/web-advertising/dashboard/

Scribenick: wseltzer

Agenda-curation, introductions

<ravecilla> https://github.com/neustar/pelican

matt_zambelli: Neustar, a product manager working on measurement

robert_blanck: Axel-Springer, privacy, TCF

mike_waters: nextroll, engineer


PELICAN (Private Learning and Inference for Causal Attribution)

https://github.com/neustar/pelican

ravecilla: we shared a link to PELICAN
... invite Robert Stratton to kick off discussion

<kleber> :-)

Robert_Stratton: with the bird name, we're 98% of the way there
... 3 things: overview of how we see measurement in digital marketing; importance; minimal requirements in our proposal
... not a fully developed proposal, but focused on discussing main components, after feedback


Robert_Stratton: so far, thinking about an advertiser with an ad budget, digital channels
... drawing back, feedback loop from measurement channels
... advertiser deciding how to spend based on performance of various channels
... feedback on where to put money is important to advertiser
... how to measure, merge across channels

matt_zambelli: think about how the feedback loop works
... how do marketers understand which of their channels are working, for optimization

https://github.com/neustar/pelican#defining-attribution

scribe: [describes attribution, credit after a sequence of actions]
... rules-based approaches, e.g. last-touch, first-touch, fractional
... any touch

https://github.com/neustar/pelican#rules-based-attribution
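
[Editorial note: a minimal Python sketch of the rules-based approaches listed above (last-touch, first-touch, fractional credit). The channel names and the example path are hypothetical, not taken from the proposal.]

    # Illustrative only: simple rules-based attribution.
    def last_touch(path):
        return {path[-1]: 1.0}

    def first_touch(path):
        return {path[0]: 1.0}

    def fractional(path):
        share = 1.0 / len(path)
        credit = {}
        for channel in path:
            credit[channel] = credit.get(channel, 0.0) + share
        return credit

    path = ["display", "social", "search"]   # hypothetical converting path
    print(last_touch(path))    # {'search': 1.0}
    print(first_touch(path))   # {'display': 1.0}
    print(fractional(path))    # each channel gets 1/3 of the credit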

scribe: another method is learning-based approaches
... modeling to assign weights to touches, learned from data

https://github.com/neustar/pelican#beyond-rules-based-attribution
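
[Editorial note: a toy illustration of a learning-based approach as described above, assuming scikit-learn. Paths are encoded as channel-presence features and a logistic regression learns per-channel weights from converting and non-converting paths. The data, channels, and model choice are editorial assumptions, not part of PELICAN.]

    # Illustrative only: learned per-channel attribution weights.
    from sklearn.linear_model import LogisticRegression

    channels = ["display", "social", "search"]
    # each row marks which channels appeared in a path
    X = [
        [1, 0, 1],  # display + search -> converted
        [0, 1, 1],  # social + search  -> converted
        [1, 1, 0],  # display + social -> did not convert
        [0, 1, 0],  # social only      -> did not convert
    ]
    y = [1, 1, 0, 0]  # converted or not

    model = LogisticRegression().fit(X, y)
    weights = dict(zip(channels, model.coef_[0]))
    print(weights)  # higher weight = more credit under this toy model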

scribe: Accurate attribution [slide with research studies]

https://github.com/neustar/pelican#bibliography

matt_zambelli: rules-based attribution approaches are inaccurate or misleading, so we need to facilitate learning-based attribution

<GarrettJ> LOVE the slide deck, thanks! So much clearer!!

matt_zambelli: what happens if within the browser sandbox, we degrade accurate measurement?
... if we remove learning-based measurement, advertisers will lose confidence in their ability to measure digital
... so they'll spend elsewhere where they can measure
... introduce biases
... overall ecosystem less efficient

https://github.com/neustar/pelican#implications-of-removing-accurate-measurement

AramZS: your proposal: we have an existing set of tools, they're not working effectively, so we're looking at a more complex tool once those go away?
... learning-based tool isn't in the market right now?

matt: learning-based tools are possible today, and used by lots of folks.
... if this group takes no action, those tools will stop working

robert: it's split between vendor offerings and clients doing learning in-house

AramZS: it's usually ML based on last touch?
... and other data?

robert: ML generally gets to look at all the touches

AramZS: the signals exist now, being used to train ML. How would they cease to exist?

robert: some of the signals today are tied together by the 3rd-party cookie
... some elements of the sequence would be lost if the 3p cookie went away, hidden inside the Chrome system
... we can provide more information on the ecosystem today

bmay: important to focus on - marketers and advertisers will give attention to those they can see
... don't think ML will go away, but visibility of interactions will be reduced
... some will lose credit because they're invisible
... so will drive advertisers to first parties, who can be seen

robert: some advertisers have been quoted "if they can't measure, they won't invest"

bmay: so you'll spend money where you can get data, regardless of whether they're effective

joao_natali: not only based on visibility, we'll talk about what we think is needed
... whether the actual measurement allows advertisers to have a view on effectiveness
... not enough to correlate touches and marketing with valuable activity,
... but to understand what is causal

angelina_iabTL: highlighting these challenges is great.
... learning-based depends on level of detail
... advertising, martech, CRM goes into complex modeling
... post-view conversion, will that be available?
... enabling advertisers to figure out and decide what data points drive interaction
... giving insight to who those users are, demographic, technographic
... lots of data sets being used by advertisers to figure out what's the best way to communicate
... second and third tier publishers will be hurt

robert: agree. We've tried to develop the proposal within the Chrome sandbox world
... there's a broader conversation about integration

angelina_iabTL: also a challenge if other browsers are reporting something different
... challenge for attribution across Apple, Google, across ecosystem
... brand awareness campaign, how users are being driven down the funnel
... not simply serve an ad and figure out where it's being converted
... other insights, other behaviors offline

robert: designed to satisfy Chrome privacy conditions

joao_natali: proposal at high-level
... goal to get directional feedback
... discuss need for this kind of approach
... develop necessary APIs for effective learning-based measurement
... 3 elements required

https://github.com/neustar/pelican#what-might-be-required-to-support-accurate-attribution

joao_natali: browser would have to be able to aggregate activity from users, multi-vendor, multi-channel pathways
... would have to go beyond partitioning that exists today, to compile possible pathways
... organic and non-organic
... 2. collection of both converting and non-converting sequences. learning depends on reporting both
... to build a probabilistic model of what drives success
... understand causality
... 3. technical investment to collect sequences and events and transform them into form appropriate for model-building
... can do via helper servers or federation
... we've identified these as gaps in current measurement proposals for enabling learning-based approaches
... Feedback?
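
[Editorial note: a Python sketch of what the cross-vendor, cross-channel pathways described in points 1-3 above might look like as a data structure, including organic touches and both converting and non-converting sequences. All field names are editorial assumptions, not part of the proposal.]

    # Illustrative only: a hypothetical on-device pathway record.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Touch:
        timestamp: int        # coarse event time
        channel: str          # e.g. "search", "social", "display"
        vendor: str           # reporting ad-tech vendor
        organic: bool = False # organic interactions sit alongside paid ones

    @dataclass
    class Pathway:
        touches: List[Touch] = field(default_factory=list)
        converted: bool = False   # non-converting paths are kept too

    # One converting and one non-converting pathway, held on device until
    # they can be exported in an aggregated, privacy-preserving form.
    paths = [
        Pathway([Touch(1, "display", "vendorA"), Touch(5, "search", "vendorB")], converted=True),
        Pathway([Touch(2, "social", "vendorC", organic=True)], converted=False),
    ]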

<Karen> Scribe: Karen

joao_natali: whether opinion of group is aligned or understands
... browser collecting pathways, or failure to convert
... the ability to understand what is driving valuable outcome, cannot be simply marketing
... have to rely on first parties, organic interactions to have complete view
... to have true effect of marketing on the outcomes
... pause here for questions

Wendy: thank you to the Neustar team for the presentation

charlieharrison: thanks for presenting this proposal
... it is really interesting
... I am interested to learn more on the technical side
... this is pretty high level
... I would be interested if you have more details, or put out more details on how to do this learning
... either in federated way or MPC servers

Joao: that is the intent
... we have a deeper level of some of these proposals we want to start publishing
... but we also want to get your thoughts on the direction
... and what you are proposing on the measurement APIs
... in spirit, there is complexity
... but in terms of principles that guarantee privacy
... everything we intend to specify would be compliant with this

robertstratton: Google are champions
... charlie, what is best way to do next level of detail?
... in Github?

Charlie: I'm agnostic; ask Wendy
... side meeting ok, but no strong opinion

Wendy: thanks, that sounds great to add more detail in github
... and if you want to request more agenda time to discuss in this group, happy to offer it
... if we find discussion goes in direction that is highly technical and only of interest to a few people
... or you need more time than this meeting allows, side meetings are good, too
... unless I hear people say stop, the subject matter feels appropriate

Charlie: one more piece of feedback
... quickly, the challenge that we've had with these kinds of data-driven or learning-based attribution approaches
... is in reporting paths or sequences to reporting companies
... sequences are difficult to aggregate
... high entropy
... about user behavior
... sensitive, and hard to fit into an aggregation or DP model
... technique to have helpers learn sequences
... or do learning on device with federated learning
... seems more challenging and more complex
... to me, a more likely place for user privacy
... direction is good, but I am nervous about the complexity
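
[Editorial note: a toy sketch of the federated-learning direction Charlie mentions, in which raw sequences stay on device and only model updates are averaged. The model, data, and update rule are illustrative assumptions.]

    # Illustrative only: one federated-averaging round over made-up data.
    import numpy as np

    def local_update(global_weights, local_X, local_y, lr=0.1):
        """One gradient step of logistic regression on one device's data."""
        preds = 1.0 / (1.0 + np.exp(-local_X @ global_weights))
        grad = local_X.T @ (preds - local_y) / len(local_y)
        return global_weights - lr * grad

    global_w = np.zeros(3)  # one weight per channel
    device_data = [
        (np.array([[1, 0, 1], [0, 1, 0]]), np.array([1, 0])),
        (np.array([[0, 1, 1]]), np.array([1])),
    ]
    # Each device computes an update locally; only updates are averaged centrally,
    # so the raw touch sequences never leave the browser.
    updates = [local_update(global_w, X, y) for X, y in device_data]
    global_w = np.mean(updates, axis=0)
    print(global_w)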

<rstringham> https://github.com/w3c/web-advertising/blob/master/privacy-preserving-multi-channel-attribution.md

Russell: I want to say this is an area that's important to Adobe
... to do this type of ML
... posted link to a proposal we talked about several months ago
... privacy preserving multi-channel attribution
... talked about how it could be extended with a helper that could do the ML
... for calculating the models
... biggest drawback is it is only converting paths; doesn't have non-converting paths
... would need extension to do the ML
... it's a place where helpers can be used to do this type of computation
... thanks

Joao: agree
... I think it's definitely worth it
... to look at this proposal
... as a theme and a group
... to Charlie's and Russell's points
... we are in complete agreement
... that the idea of somehow exporting
... sequences and properties associated with users
... properties, uniqueness of behaviors
... is well known
... basically just reports out of system
... the same way aggregated measurement proposal
... data accepted by server in encrypted way
... protection helps
... minimal level of aggregation plus DP
... same intervals would be applied here
... heavy lifting server has to do is higher
... we think that is not insurmountable
... FLoC system would have to solve as well
... part of the solutions could be adopted for both sides
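
[Editorial note: an illustrative Python sketch of "a minimal level of aggregation plus DP" applied to path-level conversion counts: small buckets are suppressed and Laplace noise is added before anything leaves the aggregation server. The threshold and epsilon are arbitrary assumptions.]

    # Illustrative only: aggregate, suppress small buckets, add DP noise.
    import numpy as np
    from collections import Counter

    raw_paths = [
        ("display", "search"), ("display", "search"), ("social",),
        ("display", "search"), ("social", "search"),
    ]
    counts = Counter(raw_paths)

    MIN_COUNT = 3     # hypothetical minimum aggregation threshold
    EPSILON = 1.0     # hypothetical privacy budget

    noisy_report = {}
    for path, count in counts.items():
        if count < MIN_COUNT:
            continue                      # suppress small, identifying buckets
        noise = np.random.laplace(scale=1.0 / EPSILON)
        noisy_report[path] = max(0, round(count + noise))

    print(noisy_report)   # e.g. {('display', 'search'): 3}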

Mikko, NextRoll: for Charlie's questions

scribe: this PELICAN proposal is high level
... we are working on bit for browser on technical side
... on how to do these use cases
... we have a bird name, and hope to discuss next week

Wendy: look forward to a link to share

<Mikjuo> SPURFOWL

Wendy: that is the name
... as soon as we have a public document I will share it

Robert: If you want to talk offline
... before next week, we can
... or you can wait

Mikko: of course

WangGang: I missed some context
... I hear machine learning
... in context of the multi-party attribution
... could someone explain the kind of ML models you have in mind
... different ML add different levels of complexity

Robert: we are agnostic
... complexity is in computation rather than feature set
... random forest ...is computationally more expensive than regression
... different algorithms can be applied rather than one for all
... to generate features we are talking about
... we would have to deal with rest of it
... works on these features
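
[Editorial note: a small sketch of the algorithm-agnostic point above: once the privacy-preserving pipeline has generated the features, different models (e.g. a regression or a random forest, differing mainly in computational cost) can be swapped in. Data and model choices are illustrative assumptions.]

    # Illustrative only: same features, interchangeable models.
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier

    X = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 1, 0]]   # channel-presence features
    y = [1, 1, 0, 0]                                   # converted or not

    for model in (LogisticRegression(), RandomForestClassifier(n_estimators=10)):
        model.fit(X, y)
        print(type(model).__name__, model.predict([[1, 0, 0]]))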


Gang: Thanks

Angelina: I want to give some insights into some attribution models being conducted
... a lot of advertisers use one ad server, but they have many different types of campaigns
... running across many media channels and publishers
... for sophisticated marketers... having found the areas that are most effective, they increase ad spend on that line
... a lot doing customized attribution with campaign and publishers
... journey cycle is a long time frame, for example car buying
... being able to not give credit fully to those awareness campaigns to a completed sale
... if spending on FB may have certain links, for certain audiences
... or WSJ or NYT may have longer time frame
... than those on social
... different combination of attribution settings
... when collecting API level data, they are setting network at 30-day latency
... but also taking data and breaking it up and playing with attributions based on time stamps
... if diff between existing customers or new customers
... ads might not be as... less weight
... if one size fits all
... or the browser sets how attribution is designed, it is going to be challenging for a lot of advertisers
... also varies by vertical such as financial services
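
[Editorial note: an illustrative sketch of the "one size fits all" concern above: per-channel lookback windows that an advertiser might configure today, which a single browser-set attribution design could not express. The window lengths are hypothetical, not recommendations.]

    # Illustrative only: per-channel lookback windows for credit eligibility.
    LOOKBACK_DAYS = {"social": 7, "search": 30, "premium_publisher": 90}

    def eligible_touches(touches, conversion_day):
        """Keep only touches inside that channel's lookback window."""
        return [
            t for t in touches
            if conversion_day - t["day"] <= LOOKBACK_DAYS.get(t["channel"], 30)
        ]

    touches = [
        {"channel": "premium_publisher", "day": 10},
        {"channel": "social", "day": 80},
    ]
    # The long-cycle publisher touch stays eligible; the older social touch does not.
    print(eligible_touches(touches, conversion_day=90))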

<jrosewell> angelina_iabTL - thank you for making an important point about the economics of attribution and why marketers spend money on the open web

Brian: I'm looking at this
... you have lots of people contributing value into the advertising journey
... we are suggesting where there is a lot of cooperation involved

<jrosewell> One size fits all where browser sets design of attribution will be challenging as use cases vary by vertical and campaign

Brian: wondering how we are going to manage that cooperation so the right people get the right data in a privacy preserving way

Robert: the beneficiary of a multi-touch approach would be the advertiser himself
... he may pass back info
... that you are not cost effective and switch to another channel
... we did not envision this being a report to publishers or adtech cos
... but more to the advertiser paying for the advertising
... does that answer question?
... yes

Aram: thinking about next conversations
... I would like to see comparisons against other techniques
... concern with advertising going away if we don't do this
... are there use cases we are trying to prevent in a more private way
... how to get part of way here
... it won't go back to what it was
... it's not possible even with this proposal, although hard to tell for sure
... if there are other platforms doing more detailed tracking
... and what we do prevents more detailed tracking
... will getting halfway there be ok
... will it stop concern that revenue will exit web as a platform
... or are we putting a lot of work towards a halfway solution on web vs. non-web measurement
... I have a hard time envisioning the advanced state of non-web measurement
... would like to see comparison of this proposal vs. non-web other measurement proposals

Robert: we cited in github proposal
... 2018 study
... and other studies which compare methodologies
... we are not looking for timing or tracking
... just aggregates and attribution to different media channels and elements of those channels
... we are not identifying more entropy
... first party platforms, including Google, enable detailed reporting...with attribution
... another world where detailed attribution is available
... we can point you to examples

<jrosewell> Aram: I agree; understanding the economic and competition consequences of these conversations and eventual decisions and standards is very important

Angelina: Advertisers don't need to see true event level data
... they want to see large patterns
... and to be able to query them and put together different models
... time stamp, but more timing
... from first exposure to last exposure
... what is reach and frequency within a certain time frame, on a daily or hourly basis
... and see those patterns
... I have seen advertisers improve efficiencies in cost when they do tests
... including ad size, placements, time of day, day of week, devices, browsers, etc.
... interesting to see how people react and those various conversations
... and how advertisers can improve

<jrosewell> Angelina_iabTL : 20% to 25% improvements are significant for marketers - should be in the minutes

Robert: PELICAN considers different timing
... but it seems all of that is handled by the ML model
... none is exposed to advertisers
... we are not asking for timing to be exposed
... all that is exposed is privacy-preserving aggregates

<AramZS> I agree it is very interesting to see how people react, and that advertisers can improve their performance, but I don't think that advertisers are the ONLY stakeholder we are designing these changes to how the web works for.

Wendy: Any final questions or comments?

<robertblanck> +q

Wendy: where would you like comments and follow-up discussion on issues
... on your github?

Joao: yes, on our github, that would be great for us

Robert Blanck: Ok

<jrosewell> AramZS : As W3C we can only design for all stakeholders

robertblanck: I just want to say, from an advertisers POV, I understand
... there is better measurement than in other channels and there is efficiency
... there is monetization
... advertisers are the main stakeholders, although I am a publisher, I get that
... the efficiency is something very important
... although privacy should be respected
... and not be a hole for privacy
... this is an important step forward and we should see to that
... hopefully advertisers don't switch back to TV from the web ecosystem
... this is a good step forward and we should consider it

<AramZS> jrosewell: I agree, which means that advertisers aren't going to be the ONLY stakeholder that systems have to be built around. People with privacy concerns are also a stakeholder.

Wendy: it sounds as though we will hear a proposal from NextRoll next week, or at a future meeting
... and we invite continued discussion of PELICAN in GitHub, or we can bring back further discussion
... look forward to seeing you next week

<GarrettJ> AramZS: The issue is that publishers are stakeholders and measurement solutions that disadvantage them in favor of the walled gardens hurt the websites and therefore their users.

Wendy: I believe, unless I hear otherwise, we will meet up through 22nd December
... but not on the last Tuesday, the 29th of December

<jrosewell> Robert: here's how deal with this in TV https://www.samsung.com/us/account/privacy-policy/samsungads/

Wendy: a few more meetings this year
... thank you for all the proposals and discussions

<AramZS> I would say that being overly tracked also hurts our users, as does excessive leaking of user data.

Wendy: we are adjourned

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version (CVS log)
$Date: 2020/12/01 17:04:05 $
