W3C

– DRAFT –
Improving Web Advertising BG

30 November 2021

Attendees

Present
alextcone, arnoldrw, blassey, dialtone, eriktaubeneck, imeyers, johnwilander, kleber, lbasdevant, Lisa_Markou, mjv, weiler
Regrets
-
Chair
-
Scribe
Karen

Meeting minutes

<wseltzer> IAB Tech Lab's Content Taxonomy 3.0 (Benjamin Dick):

<wseltzer> https://iabtechlab.com/press-releases/tech-lab-releases-content-taxonomy-3-0/

<wseltzer> https://www.iab.com/guidelines/content-taxonomy/

Wendy: Invite you to join irc and "present+" yourself

Wendy: Let's start by looking at the agenda

[reviews]

Wendy: Do we have any introductions from those new to call or the group?

Mariah Hart: My first time joining call; on the Privacy Sandbox team
… will be owning strategy around these W3C meetings
… and funnel info from these calls back to the product teams
… happy to meet everybody
… say hello

Wendy: Welcome, Mariah; good to have you join

Shannon Janus: First time joining; I'm a taxonomist from Hearst Magazine

Introductions and Agenda Curation

Contextual advertising, with reference to IAB Tech Lab's Content Taxonomy 3.0

Wendy: welcome Shannon as well
… any agenda curation?
… or anything else people would like to discuss in the near future?
… Then we will get to the core of our meeting today
… We have a request to talk about contextual advertising and the content taxonomy work from IAB Tech Lab
… we have a presentation from Ben at IAB
… think about what W3C can contribute to that work, think about whether there is other standards work we would like to see in the area of contextual advertising

Ben: thanks, Wendy
… I have a short deck to walk through

<wseltzer> https://www.iab.com/guidelines/content-taxonomy/

Wendy: we see it
… people appreciate presentations
… thanks for sharing

Ben: for those who don't know me, I'm Ben from IAB Tech Lab; long-term contributor
… contextualize the slides

[slide 2]
… Content Taxonomy is a standardized way to describe the "aboutness" and context of a site or application

[reads off slide definitions of Audience Taxonomy and Ad Product Taxonomy]
… we will focus on the Content Taxonomy
… original taxonomy came out a number of years ago
… were basically aggregated together to create a common approach
… 2.0 came out in 2017/2018
… introduced additional vectors
… can signal aboutness and orthogonal attributes like language
… or attributes to inform what is going on in content itself
… 2.1 introduced shortly thereafter
… introduced concept of an SCD flag
… to indicate nodes of taxonomy that should NOT be used
… for profile
… for audience collection more broadly
… 2.2 added brand safety depth
… added GARM framework
… and 3.0 is latest iteration
… and just ended public comment
… big difference is the introduction of alphanumeric IDs in place of linear IDs, which create problems when attributes in the taxonomy change
… we needed to update this to a 3.x version
… also added new vectors around news categorization
… big driver was to prevent demonetization
… because of COVID
… from brand safety perspective, so legit advertisers wouldn't be afraid to advertise
… also added some video genres, podcasts, apps and gaming content
… I mentioned these additional vectors
… what are they?
… largely content categories, type, format, language and source
… can provide a rich signal for what is going on in the page
… to reiterate
… we added a new idea
… to labeling 3.0
… we added "politics & news" as a separate vector to describe the content or channel itself
… also added video support around genres and added form factor to describe the attributes further up in taxonomy
… and more on podcasts and games
… Largely driven because taxonomy was for monetization activity
… can get into systems design if you want to dig into it
… These taxonomy IDs intended to be used as primary approach instead of cookies
… brief update
… happy to answer questions
… I don't have irc up

<wseltzer> https://iabtechlab.com/standards/content-taxonomy/

<blassey> that's old

Wendy: I see Olaf...that's old
… questions or comments for Ben?
… I think the context for when this request came up
… was that we have been discussing in many of the proposals before us in the Business Group
… we have been discussing personalized, targeted advertising
… it was recalled that there is another big category, contextual
… and wondered if there was more work to be done
… and thinking about how we do contextual advertising and whether there is work for standards
… Ben, do you see a place where web side standards are needed to work with what you are doing?

Ben: yes, depends upon the use case
… often times can be used for site analytics by publishers
… and less used for addressability
… in most circumstances there is an intermediary
… downstream
… to make an attribute determination
… buy side uses as the bidding signal
… there is always a role to broaden input and make sure we are getting holistic feedback
… depends upon how the content taxonomy is being applied

Wendy: sure, and do you see need for APIs or other ways that the web stack could interact with this taxonomy?

Ben: unfortunately, not much that is automated in terms of algorithm input or ingestion
… not sure if role for APIs
… in informing what we are up to
… maybe you are referring to automation of content tagging

Kleber: thank you for the overview
… I think my question is similar to what you and Wendy just talked about
… where do these labels come from in practice?
… who is it that does the assignment of taxonomic classifications on pages?
… what goes into that in general
… and to know whether there are situations where you are worried about people applying the wrong taxonomic decisions to things
… you mentioned COVID
… it might be to someone's advantage for monetization to classify something one way

Ben: for sure
… I'll start with that one first
… you are right
… there is an inherent conflict of interest for publishers to misrepresent their content
… buyside relies on that
… orgs like IAS or DoubleVerify
… to apply semantic analysis for the buyside and cut through the possibility of misrepresentation
… that will always be there
… that incentive is hard to remove
… self attestation of inventory
… who's doing this and how it's applied?
… It's usually a taxonomist inside the publication
… happy to have additional input from them on this call
… or taxonomy teams who tag for analytics purposes
… and sometimes crawled for analysis

Kleber: let me make sure I understand the flow of info
… publishers who put content up on the web
… or whatever collection of people put content up
… often have people inside their orgs who apply some kind of taxonomic classifications
… and the taxonomy associated with a particular page
… is introduced into realtime bidding
… as part of audience ID
… and some independent party that does their own evaluation of what a page is about
… might do some after-the-fact checking
… that this URL goes with this taxonomy topic
… we may disagree
… and may have a reputation score
… just trying to put the info flow into a bigger picture
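
The information flow Kleber describes here, a publisher-declared category that an independent party checks after the fact and folds into a reputation score, might be sketched roughly as follows. This is only an illustration under stated assumptions: the names `Observation`, `reputation` and `disagreements`, the category IDs and the URLs are all hypothetical and are not taken from any IAB Tech Lab specification or vendor product mentioned on the call.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One page as seen by both parties (all fields are illustrative)."""
    url: str
    declared: str  # category ID self-declared by the publisher (hypothetical)
    verified: str  # category ID assigned by an independent crawler (hypothetical)


def reputation(observations: list[Observation]) -> float:
    """Fraction of pages where the declared label matches the verifier's label."""
    if not observations:
        return 0.0
    agree = sum(1 for o in observations if o.declared == o.verified)
    return agree / len(observations)


def disagreements(observations: list[Observation]) -> list[str]:
    """URLs where the two parties disagree, in input order."""
    return [o.url for o in observations if o.declared != o.verified]


# Made-up sample data: the publisher and the verifier agree on two of four pages.
sample = [
    Observation("https://publisher.example/a", "cat-1", "cat-1"),
    Observation("https://publisher.example/b", "cat-1", "cat-2"),
    Observation("https://publisher.example/c", "cat-3", "cat-3"),
    Observation("https://publisher.example/d", "cat-4", "cat-5"),
]
```

A buyer could, for instance, fall back to the verifier's label whenever a publisher's score drops below some threshold; the threshold and the policy are outside what was discussed.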

Ben: The nuts and bolts are correct
… the content taxonomy ID will be picked up
… certain locations where they are housed to have a consistent location where to look
… often in the bidstream itself
… most advertisers don't rely on it only for the bidding decision
… there is that midlayer of entities
… that inform vast majority of bidding
… the self-declared taxonomies from publishers are rarely used
… but the data flows you described are correct, but not always used

Kleber: yes, I understand the picture

Lisa Markou: How new is this and what are the adoption rates? Is it in market?
… how are you working with publishers and adtech to get it into market?

Ben: It has been around ten years
… but 3.0 has not been released as a final spec
… we don't have good ways to measure it
… we do know that it is pretty widely used; 2.0 most widely

<alextcone> @kleber - Grapeshot (acquired by Oracle) is a commonly used, real-time contextual integration a lot of the "buy-side" uses. https://www.oracle.com/cx/advertising/contextual-intelligence/

Lisa: I meant 3.0, so cool it's not widely used yet

Jukka: Comscore is doing automated classification of pages
… we have a crawler, one of bigger ones
… scanning pages and doing automated processing to figure out the taxonomy labels that go with the content

Ben: yup, another example of an org that evaluates the aboutness
… to inform downstream bidding behavior

Kris: a related question
… has there been any discussion with ad servers about applying the taxonomy to the ads
… when ad shows up it's content of page, but also the other ads
… trying to stop competitor ads from being displayed next to each other
… especially with fake news
… also seen sensitivity when a behaviorally targeted ad appears next to a contextual ad
… may come across as creepy
… wonder if there is work being done on ad side and not just publishers content

Ben: yes, thanks, Kris
… we are also looking to support ads
… newest taxonomy, Ad Product Taxonomy, to describe ads being utilized
… hope that gets to crux of first question
… concerning audience or behavioral advertising
… will be up to the preference of the publisher
… and the tech providers they work with
… a lot of that would be hard to support in meaningful scale
… depends upon publisher preferences and how they are monetized
… does that help, Kris?

Kris: yes, great

DavidDabbs: thank you
… Hi Ben, to address somebody's earlier question
… about the adoption
… this would be signaled in OpenRTB
… to show which level of a taxonomy
… if I'm right, Tech Lab is close to doing a 2.3?
… or is that not baked yet?
… going into weeds of signaling and not taxonomy

Ben: We have an example of that model
… with seller defined audiences that address @ specs
… we did address @ with OpenRTB 2.6
… idea is to anoint one location moving forward
… problem with doing anointing now is there are moving pieces with the actual extension itself
… we cannot do that yet
… have to make sure the OpenRTB group is comfortable with that

David: Seems like a natural inflection point
… that we are doing updates to a 2.6 and also adopt a more modern content taxonomy
… hard to change too many tires on the bus while it is in motion

Ben: Exactly
… look at what that SDA extension looks like
… trying to find slide
… here we go

[shows slide]

"About the Pipe"
… there is basically a location for audience taxonomy signaling and
… site audience data
… what we use for audience
… there is a lot to wrangle
… in the OpenRTB supply chain
… this is what the extension looks like for those who have not been exposed to it
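
The slide itself is not reproduced in these minutes. As a rough sketch of the kind of location Ben refers to, the snippet below reads taxonomy segments out of a bid-request-like structure. The field names (`site.content.data[]`, `ext.segtax`, `segment[].id`) follow one reading of the OpenRTB `data`/`segment` objects and the Seller Defined Audiences convention and should be checked against the published specs. The `segtax` value and the segment IDs are placeholders, and `contextual_segments` is a hypothetical helper.

```python
# Placeholder request fragment; every value below is made up for illustration.
bid_request = {
    "site": {
        "content": {
            "data": [
                {
                    "name": "publisher.example",
                    "ext": {"segtax": 6},  # assumed taxonomy identifier, verify against the spec
                    "segment": [{"id": "placeholder-1"}, {"id": "placeholder-2"}],
                }
            ]
        }
    }
}


def contextual_segments(request):
    """Collect (segtax, segment id) pairs from site.content.data, tolerating missing keys."""
    pairs = []
    content = request.get("site", {}).get("content", {})
    for data in content.get("data", []):
        segtax = data.get("ext", {}).get("segtax")
        for segment in data.get("segment", []):
            pairs.append((segtax, segment["id"]))
    return pairs
```

Carrying the taxonomy identifier next to each segment ID means a reader of the request does not have to guess which taxonomy version an ID belongs to, which matters when several versions are in use at once.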

Wendy: thanks
… sounds as though...where should people join the conversation
… if they want to participate?

Ben: Good question
… we have three relevant working groups
… a taxonomy and mapping WG
… that owns the product roadmap
… and the addressability WG that owns audience specs that activates stuff
… and also the OpenRTB WG that owns product map for the open spec
… all three working groups work together
… but there are also peripheral components
… hope that is helpful

BrianMay: Are the groups you mentioned working on this
… free for people, or do they need to be Tech Lab members?

Ben: The working groups are for Tech Lab members
… there are @ for non members

Wendy: very good, similar to W3C, varying levels and ways for participation and all sorts of groups to keep track of
… we try to share information

Ben: indeed, but easier said than done

Wendy: Can you make the deck you shared available to include with the minutes?

Ben: yes

Wendy: other questions or comments?

AOB

Wendy: anyone inspired by this to think more broadly about how this integrates with the other proposals we have heard
… or with the landscape of behavioral advertising?

… Our next item is any other business
… is there anything else people would like to draw our attention to?

Shailley: I'm not on irc; I'm with Tech Lab
… but wanted to add something about activation
… anything in adtech is not just one product
… it needs to be a complete system
… so just having a taxonomy is not enough
… and to Michael Kleber's question
… about how it all works
… Ben explained well
… what we are also trying to do
… is an open source project for a test benchmark
… and brand safety measures based on the taxonomy
… have a human analyzed benchmark
… if someone is trying to apply ML
… they can test against how close or good it is
… that is another effort
… I would love for people to look at it and expand the scope to a broader taxonomy test benchmark
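
Testing an ML classifier against a human-analyzed benchmark, as Shailley describes, could look something like the sketch below. It assumes both label sets are available as plain URL-to-label mappings; the actual format of the Tech Lab benchmark is not described in these minutes, and `score_against_benchmark` and the sample labels are hypothetical.

```python
from collections import Counter


def score_against_benchmark(human, predicted):
    """Compare model labels with human labels on the URLs both mappings cover.

    Returns (agreement rate, Counter of (human label, predicted label) mismatches).
    """
    shared = [url for url in human if url in predicted]
    if not shared:
        return 0.0, Counter()
    mismatches = Counter(
        (human[url], predicted[url])
        for url in shared
        if human[url] != predicted[url]
    )
    agreement = 1 - sum(mismatches.values()) / len(shared)
    return agreement, mismatches


# Made-up labels: the model misses one "unsafe" page out of four.
human_labels = {"a": "safe", "b": "unsafe", "c": "safe", "d": "safe"}
model_labels = {"a": "safe", "b": "safe", "c": "safe", "d": "safe"}
```

The mismatch counter shows which kinds of errors dominate, for example pages labeled unsafe by humans but safe by the model, which a single agreement number would hide.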

Wendy: Can you drop a link into irc?

Shailley: I don't have irc on, but can send it on Slack

Alex: I'll drop it in

Alex: it is on the previous topic
… I realize we may not have the right people on the call or may not feel comfortable to answer
… from Chrome side, what types of things are most important to be thinking about
… with relationship to taxonomies
… and the current thinking going on with FLoC
… anything that stands out
… like important to see "xyz" to fit in
… standardized ad taxonomies in FLoC
… or something that doesn't exist yet
… curious if anyone has top of mind thoughts
… to improve, change or amplify

Kleber: I think we don't have a good answer right now to the question Alex just asked
… seems clear the goals of FLoC and content taxonomy are closely related enough that we should have something to do with each other
… we are in the middle of going back to FLoC and see what successor version looks like after one we tested this year
… once that is out, that would be a good time to talk about it
… in the near future
… I look forward to it and thanks for the presentation

BrianMay: Totally off topic
… get some idea for the meeting schedule for the rest of the year?

Wendy: We are now meeting every other week

<alextcone> link Shailley (IAB Tech Lab) referenced: https://github.com/IABTechLab/Brand-Suitability-Test-Benchmarks

Wendy: next meeting is 14 December and I proposed we NOT meet on 28 Dec, so one more meeting in 2021

Brian: Thanks

David_Dabbs: Might have been your question, Wendy, or Michael's
… on how web standards can intersect with this
… if community says because of crawlers it would be good to annotate your pages
… if there is not a slot
… for publishers to appropriately annotate their web content
… the right HTML
… haven't looked at RDF and metadata in years

<jdcauley> that would be a schema.org thing yes?

Wendy: and a place to talk with our data standards activity folks to see if you have all the expressivity you need
… for the taxonomies
… I won't speculate on what those are
… there is Schema.org
… Linked Data, JSON-LD, and RDF, a variety of data standards
… and many of them are about the formats and semantics for data interchange
… I hope those are useful to this project
… those groups would welcome input if there are use cases
… seeing no one else on the queue
… Our final agendum was next meeting
… Brian's question anticipated that
… the next meeting is December 14th
… Like people to share agenda items and requests before that
… And to say to wrap up the year

<wseltzer> [adjourned]

Wendy: before going off on winter/summer breaks depending upon hemisphere
… with that I think we are at the end of this meeting
… Thanks very much, Ben for this presentation
… and thanks everyone for questions and comments

[adjourned]

Minutes manually created (not a transcript), formatted by scribe.perl version 159 (Fri Nov 5 17:37:14 2021 UTC).

Diagnostics

Succeeded: s/to describe/newest taxonomy, Ad Product Taxonomy, to describe/

No scribenick or scribe found. Guessed: Karen

Maybe present: Alex, Ben, Brian, BrianMay, David, David_Dabbs, DavidDabbs, Jukka, Kris, Lisa, Shailley, Wendy