User Agent Client Hints - TPAC 2020 - Breakout session -- 27 Oct 2020

<steveb> Hi

<wseltzer> chair: Travis

<scribe> scribenick: slightlyoff

Travis_: proposed by jrosewell
... opportunity to debate UACH tradeoffs; not sure how we want to proceed
... do we want to go through open issues?

<AramZS> is there a github repo?

<AramZS> ah here

<AramZS> https://github.com/WICG/ua-client-hints

jrosewell: yoav added tags to some issues asking for feedback? Perhaps start with use-cases?

<Travis_> https://github.com/WICG/ua-client-hints/issues?q=is%3Aopen+is%3Aissue+label%3Afeedback_requested

Travis_: can you give us some background on this, James?

jrosewell: 2 areas: HTTP header for UA; no definition for how that should be structured, so conventions have materialised over the years. Problematic for parsing and structure. Second consideration is re: fingerprinting
... other uses are fingerprinting and detection; e.g. for fraud detection
... analytics, etc.
... there's strong overlap between those use-cases and tracking-prevention policies from Mozilla/Apple/etc.
... Accept-CH is a header that provides extra information today, currently providing bandwidth, memory, etc. Proposal is around adding more fields, particularly information currently part of the UA header. There's some complexity around something called GREASE

<weiler> that misrepresents GREASE

jrosewell: an issue around the need to potentially obfuscate.
... another issue around the structure and value of the fields. Aligning the fields and making them easier to extract/parse.
... another issue around the first/second request timing
... another document that discusses the entropy that a device provides in different situations
... <discussion of parties involved>

<Masinter> I wanted to ask if

Travis_: does anyone else want to supplement this description of the feature?

<Masinter> anyone had considered the old IETF work on Media Features

<steveb> Perhaps, I think it relates to what the data is for. For example, if Sec-CH-UA is for telling the server what 'user-agent' (i.e. browser) the client is using, GREASE would appear to get in the way of that responsibility.

<Masinter> yes, thanks wwendy

jrosewell: <discussion of how/when browsers apply rules/policies>

<AramZS> @MasInter: can you link?

jrosewell: my business provides services that provide device information services; not in the grey areas talked about...feature phones in sub-saharan africa making server-side optimisations based on device model
... making heavy use of this

<Masinter> RFC 2506, 2913,2533

Travis_: this is on first request?

jrosewell: yes.
... there was a companion proposal for potentially making this available on first request

<AramZS> For those not familiar with the URL structure:

<AramZS> - https://www.rfc-editor.org/rfc/rfc2506.html

<AramZS> - https://www.rfc-editor.org/rfc/rfc2913.html

<AramZS> - https://www.rfc-editor.org/rfc/rfc2533.html

<AramZS> Thanks Masinter!

jrosewell: lots of firms involved in analytics use this data too. First party (site owners) learning how to make their sites better, e.g. based on which OSes and features are available. Next are aggregated analytics (e.g. statcounter), comscore, ipsos, etc. Aggregated from multiple sites
... the aggregated analytics cases are potentially impacted by this. Tried to collate these in our PR.

Travis_: using the queue for our discussion today

<jyasskin> The HTTPWG at the IETF had a discussion of first-request client hints at https://httpwg.org/wg-materials/interim-20-10/minutes.html#client-hint-reliability.

Masinter: wondering if I'm misunderstanding...have you looked at older work on media features from IETF? Trying to describe capabilities, characteristics, and content-type ("the 3 c's") as a model. Didn't succeed because the client may report something which the server couldn't trust (buggy), so folks moved to UAs instead

Travis_: is there a question there?

Masinter: question is: have you considered that older work?

jrosewell: appreciate the link to that body of work...that original use-case is partially what that information is used for today; trying to understand what that device can do, its capabilities, etc.

yoav: I wasn't aware of that earlier work, Masinter. Perhaps tackles a slightly different problem? CH doesn't try to tackle feature detection. CH as a draft has been in the HTTP WG for several years and is now graduating to an experimental RFC. In the review process nobody raised that earlier work

AramZS: biggest problem that CH needs to address is the question of fraud and how it's dealt with
... some feature detection outside of CH is available....two interests: how Users may restrict data about themselves, and how a website may restrict data available to third parties (for lack of better terminology)

<yoav> https://github.com/WICG/ua-client-hints#spam-filtering-and-bot-detection

jrosewell: 2 different scenarios; fraud detection services based on historic interactions resulting in, e.g., a captcha box...and once you get into a page that's loaded and can call APIs you can learn a lot more for identifying fraud

weiler: I don't think I understand these fraud use-cases

<Zakim> weiler, you wanted to ask Aram to say more re: fraud uses

AramZS: as a publisher we have our own fraud detection issues. A few main concerns: the first is fraudulent visitors. Bots (mostly) or some sort of click-farm operation...don't want to serve them some resources if they aren't legit...e.g. ads or the whole site. 3P fraud; advertisers also want to guard themselves, don't want to serve their ads to bots either. Ad networks don't trust publishers to report fraud information. Those networks need to be able to make assertions.

<weiler> /me may I interrupt?

AramZS: some capability detection, and some are user-agent based. E.g., not on a bot list. Some degree of fingerprinting. Ignoring its valence, it's being used to identify bots today. Seeing them in one place then blocking them on next encounter. Once detected as bots, a UA and other properties cause those bots to be blocked. If an ad is shown to a bot and then the bot is re-classified, there may be an accounting 'make good' to account for "illegitimate" impressions

weiler: so you're using "fraud" to mean "bot detection" primarily?

(sorry, scribe interrupt for 30 seconds)

AramZS: questions of DDoS...a big problem for smaller publishers who are reliant on 3p solutions....a redirect to captcha for too many users (dialed up)
... the lines are helped to be set by, e.g., 3p join-up of a cookie for a user that has previously passed a captcha

yoav: for fraud detection, linked previously to the explainer. Trying to include it in the UACH proposal. In that section, under "fingerprinting", but agree with you that it can be considered different in kind.
... question: what parts aren't covered in that use-case section? How can we do it better?
... is there something that's preventing you from accepting client-hints, either for your own or for 3p origins? Can you delegate that effectively w/ CH?

jrosewell: AramZS talked about publisher fraud, and we also see "survey fraud"...folks getting paid a small amount to fill out surveys

Travis_: don't really want to rathole on various sorts of fraud

jrosewell: currently the language's relationship to privacy budget is unclear. Who gets to decide?
... very dangerous...who gets to make the decision?

<AramZS> jyasskin: I'm not seeing anything at that link? But yeah, I would love to talk more about that. I played around with it and it looked like it somewhat worked, but wasn't sure

jrosewell: on the first request side of things, if you're making a ping to some environment to get access, there will be a performance impact

<eeeps> jyasskin: AramZS: that delegation is now defined in https://wicg.github.io/client-hints-infrastructure/ (and https://w3c.github.io/webappsec-permissions-policy/)
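
For readers following the delegation question, here is a minimal sketch of the response headers a first-party page might send to opt in to hints and delegate them to a third-party origin. It assumes the `Sec-CH-UA-*` hint names and the `ch-ua-*` policy feature naming described in the linked infrastructure draft; the helper name `delegation_headers` and the origin `https://ads.example` are hypothetical and were not mentioned in the session.

```python
from typing import Dict, List


def delegation_headers(hints: List[str], third_party_origin: str) -> Dict[str, str]:
    """Build Accept-CH plus a Permissions-Policy that delegates each hint.

    Assumption: a hint named "Sec-CH-UA-Model" maps to the policy feature
    "ch-ua-model" (lowercased, "sec-" prefix dropped).
    """
    features = []
    for hint in hints:
        name = hint.lower()
        if name.startswith("sec-"):
            name = name[len("sec-"):]
        # Allow the page's own origin and the named third party.
        features.append(f'{name}=(self "{third_party_origin}")')
    return {
        "Accept-CH": ", ".join(hints),
        "Permissions-Policy": ", ".join(features),
    }


headers = delegation_headers(
    ["Sec-CH-UA-Model", "Sec-CH-UA-Platform"], "https://ads.example"
)
```

Without the policy header, the hints would by default only go to the first-party origin, which is the gap the delegation mechanism is meant to fill.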

<Zakim> cpn, you wanted to mention another use case

<AramZS> Ah thank you jyasskin I will examine those.

cpn: just wanted to mention another use-case; similar to jrosewell 's point re: first request. For interactive TV applications, we're targeting non-evergreen environments. Targeting different models and manufacturers of devices to serve javascript that contains workarounds for specific models and devices

<yoav> https://tools.ietf.org/html/draft-davidben-http-client-hint-reliability-01

cpn: looking at CH with some interest to understand if we retain the ability to continue to work around issues

Travis_: jrosewell when you introduced the topic, you mentioned GREASE and analytics...were there other high-level topics you wanted to discuss?

jrosewell: who makes the decisions. Also, migration strategy. Used in many ways no one person can understand. Want to see migration done incrementally over time. Millions of websites are using this feature....if it's a half-day job for one site, that's millions of half-days
... the complexity of the new solution; are there alternatives? Can we tidy up what's there instead?

Travis_: <recaps topics>
... going back to yoav, did you want to continue on bot/entropy/fingerprinting?

<AramZS> No need for me to queue for this: but as transitions go, from what I've seen in Canary the switch over does sound fairly reasonable in terms of time request. That said, a version of the rollout where both are simultaneously available with decreasing quality on the old method seems reasonable?

yoav: on first-request, I posted a link to a CH reliability proposal that will address it (an IETF draft)
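
As a rough illustration of the first-request idea in that draft, the sketch below shows the client-side decision: if the response marks a hint as critical, also lists it in `Accept-CH`, and the request did not carry it, the client retries with the hint attached. This is a simplification under stated assumptions: it ignores user-agent policy (a client may decline to send a hint), retry-loop prevention, and header-name case handling on the response side. The function names are hypothetical.

```python
from typing import Dict, Set


def parse_tokens(header_value: str) -> Set[str]:
    """Split a comma-separated header value into lowercased tokens."""
    return {t.strip().lower() for t in header_value.split(",") if t.strip()}


def should_retry(request_headers: Dict[str, str],
                 response_headers: Dict[str, str]) -> bool:
    """Return True if a critical, accepted hint was missing from the request.

    Assumes response header names are spelled exactly "Accept-CH" and
    "Critical-CH" in the dictionary passed in.
    """
    accepted = parse_tokens(response_headers.get("Accept-CH", ""))
    critical = parse_tokens(response_headers.get("Critical-CH", ""))
    sent = {name.lower() for name in request_headers}
    # Only hints that are both critical and accepted can trigger a retry.
    missing = (critical & accepted) - sent
    return bool(missing)


first_request = {"User-Agent": "ExampleBrowser/1.0"}
response = {"Accept-CH": "Sec-CH-UA-Model", "Critical-CH": "Sec-CH-UA-Model"}
retry_needed = should_retry(first_request, response)
```

The retry costs one extra round trip on the first visit only, which relates to the performance concern raised earlier about first requests.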

Travis_: can someone give us an overview of GREASE?

yoav: I can try

<weiler> [I like the critical-CH proposal.]

<AramZS> weiler: link?

<weiler> https://tools.ietf.org/html/draft-davidben-http-client-hint-reliability-01

yoav: protocols tend to ossify; receivers of protocols tend to rely on de facto existing values. Protocol extensibility is in theory valuable, but ends up being irrelevant. In TLS, GREASE is used to exercise *all* the protocol features/values in order to make sure that clients handle all extensions.

Travis_: so a way to keep implementations on their toes?

yoav: a way to keep implementations conformant; hopes to keep protocol from ossifying. In the context of CH, we have seen over the years that various properties/sites rely on the UA in ways that hurt untested browsers. Want to avoid that this time around.

<AramZS> oh this critical-CH proposal is interesting!

yoav: want to make sure that consumers rely on structured headers instead of bad regexes

<MikeSmith> this is https://wicg.github.io/ua-client-hints/#grease I guess

yoav: want to make sure we don't repeat the mistaken abuse that happened with UA. Harder to deal with deliberate blocking, but we can relieve a big compat concern if we ensure folks don't shoot users in the foot accidentally

<AramZS> ahhh so many meetings so little time haha, thank you for the link

<MikeSmith> https://tools.ietf.org/html/rfc8701

steveb: working with jrosewell at 51 Degrees...RFC 8701...struggling to understand how this extends to CH. CH tells you what the browser is (which the site can use) or it's not (because it's so randomised that it's not useful)
... trying to say "it's going to contain this information" but so random that it doesn't

thanks, MikeSmith

jrosewell: if the goal is to avoid regexes, we put a lot out in OSS in order to avoid this
... there are ways around regexes that are working well

<steveb> The relevant RFC for GREASE in TLS is 8701

yoav: where we're currently using GREASE in the latest UACH impl in Chromium is to add another value to the brand version set that browsers send
... that added value includes characters that ensure a regex that isn't a conformant SH parser is likely to fail at some point
... so the value isn't randomised...the interesting bits of the value aren't randomised, but to read them you have to use a conformant parser
... goal is to ensure that implementations aren't awful
... that it also includes an unknown value is also to help discourage allow-listing of known browsers
... which has proven to be a bad practice for web compat
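
To make the point about delimiters concrete, here is a minimal sketch contrasting a naive split-based reader with a small quoted-string-aware parser on a brand list. The header value, including the GREASE-style brand `Not;A, Brand`, is an illustrative assumption and not a value quoted in the session. The parser covers only the subset needed here (a list of quoted strings, each with quoted-string parameters); a real deployment would use a full Structured Headers parser.

```python
from typing import List, Tuple

# Hypothetical value: the first brand contains ';' and ',' inside the quotes.
SEC_CH_UA = '"Not;A, Brand";v="99", "Chromium";v="86"'


def naive_brands(value: str) -> List[str]:
    """Split on delimiters without respecting quoting (the fragile approach)."""
    return [part.split(";")[0].strip().strip('"') for part in value.split(",")]


def parse_brand_list(value: str) -> List[Tuple[str, str]]:
    """Parse quoted-string items with ;key="value" params into (brand, version)."""
    pos, n = 0, len(value)

    def read_string() -> str:
        nonlocal pos
        if pos >= n or value[pos] != '"':
            raise ValueError(f"expected opening quote at offset {pos}")
        pos += 1
        out = []
        while pos < n:
            ch = value[pos]
            if ch == "\\":
                # Only \" and \\ are accepted as escapes in this sketch.
                if pos + 1 >= n or value[pos + 1] not in '"\\':
                    raise ValueError(f"bad escape at offset {pos}")
                out.append(value[pos + 1])
                pos += 2
            elif ch == '"':
                pos += 1
                return "".join(out)
            else:
                out.append(ch)
                pos += 1
        raise ValueError("unterminated string")

    def skip_spaces() -> None:
        nonlocal pos
        while pos < n and value[pos] == " ":
            pos += 1

    brands = []
    while pos < n:
        brand, version = read_string(), ""
        while pos < n and value[pos] == ";":
            pos += 1
            skip_spaces()
            eq = value.index("=", pos)  # raises ValueError if '=' is missing
            key, pos = value[pos:eq], eq + 1
            param = read_string()
            if key == "v":
                version = param
        brands.append((brand, version))
        skip_spaces()
        if pos < n:
            if value[pos] != ",":
                raise ValueError(f"expected ',' at offset {pos}")
            pos += 1
            skip_spaces()
    return brands


naive_result = naive_brands(SEC_CH_UA)
parsed_result = parse_brand_list(SEC_CH_UA)
```

Here the naive reader splits the GREASE brand into two bogus entries, while the quoting-aware parser returns two brands with their versions intact.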

jrosewell: not quite sure I'm understanding how, e.g., "Edge" and your regex is looking for "E", "D", "G", "E" is going to solve the problem of a regex looking for that set of chars

yoav: if that's your regex, that's indeed a hard problem to solve. Trying to attack problems that are more complex than that
... not sure if that's a realistic example of a regex?

jrosewell: you've got an experience of a browser being blocked?

yoav: you can have conformant SH parsing that can result in blocking. We're trying to avoid folks using naive regex impls for detection of browsers
... that's the reason for motion of delimiters in the serialised value...if you've got other ideas for how to prevent that, would appreciate them

weiler: steveb characterised this as an identifier of the browser...what the browser is...I'd thought of CH as more of a "what the browser can do"....memory, etc. rather than identity. More about capabilities rather than identity? Can yoav explain this change in direction?

yoav: UA CH is mostly about capabilities and user environment and that's how CH started...it's a content negotiation mechanism...various aspects used in this negotiation. Some, like device memory, DPR, viewport width, etc...
... there are some CH values for netinfo that tell you about the network situation. UACH are an extension of that...a different dimension of content negotiation but relying on the content negotiation mechanism

steveb: one of the main things about GREASE is that it's meant to prevent ossification of protocols...every UA has "KHTML" now...most UAs now have "Chromium"...
... you can see a situation where sites may rely on this

Travis_: does anyone else want to chime in on utility of UACH in lieu of UA?
... let's talk about migration strategy
... how are editors and implementers considering rolling this out over time?

yoav: not sure I'm the best person to represent this view -- don't own a large web property myself -- but what we had in mind for UACH is to make it available/shipped for a while so that properties can migrate towards it before any sort of information reduction is exercised against the UA string itself

Travis_: are there plans for changes to UAs as well brewing in the background?

yoav: yes. There are plans, but UACH is still being rolled out

jrosewell: I'm relatively new to all of this...there's an IETF doc that you and a few others were authoring...experimental stage...
... can you comment on the relationship between these docs?
... can you talk about document status and how that relates to mass availability?

<wseltzer> slightlyoff: There is no requirement of formal document status relative to any feature shipping in chromium

<wseltzer> ... governance body, API owners, make decisions about which features launch in our engine

<wseltzer> ... other vendors have similar process

<wseltzer> ... there's no requirement for standards process

<Zakim> weiler, you wanted to answer

weiler: I can give you a rough approximation at the IETF
... IETF document statuses are more descriptive than prescriptive....how much of this are we seeing in the wild?

jrosewell: thanks to Travis_ for chairing and to the scribe
... thanks to the w3c for arranging the session and thanks to everyone for discussing...lots to read. Would welcome more discussion in this forum.
... many issues we only touched on and didn't look into in detail...access restrictions...what browsers decide..."judge, jury, executioner"....
... "who gets to do what" question isn't one we touched on today
... great progress, would like to do this again

<wseltzer> [adjourned]

- DRAFT -
