Meeting minutes
@jugglinmike: Welcome everyone! This talk assumes a little familiarity with WPT. Come talk to me if you want to know more about contributing to WPT. I'm a worker-owner at Bocoup and have been contributing to WPT for 6-7 years. More recently, I've been working with Google to annotate WPT for web features.
… I also want to talk about how to push this work further along and brainstorm with everyone. But first a quick overview of web features. We'll do a crash course in annotation and talk about some trickier cases. This is still an early-stage project.
… Web Features is a project by the WebDX Community Group. Its goal is to promote understanding of the full web platform using web authors' own language. It's also intended to give web developers a shared language for advocating for changes to the web platform with implementors, and it helps make implementation status more transparent to web developers.
… Web Features is a glossary of features, or themes, that diverge a bit from the specs. I would recommend checking out "web-features and Baseline" by Patrick Brosset if you want to learn more.
… WebDX Features powers the Baseline project (which appears on MDN and Can I Use). It also powers webstatus.dev and many other projects.
… And we have stickers here!
… Test annotation started in the fall of this year. We incorporated annotation into Test262 for ECMAScript/JavaScript in November 2025. We're happy to talk more about that offline if anyone is curious.
… So now let's try to actually annotate WPT with a web feature! We're very open to hearing about suggestions for improvement in this overall process as well.
… Annotation happens in a WEB_FEATURES.yml file in any directory that has WPT tests. These files are scattered throughout the WPT repo wherever there are tests that can be classified as being about a web feature. The schema is pretty simple: it's just a list of mappings from a web feature ID to a list of files. There are globbing and ignore patterns that you can use to list the files that should be mapped to each web feature ID.
… For example, '**' means all tests in a directory and all subdirectories.
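… To make that concrete, an entry in a WEB_FEATURES.yml file might look roughly like the sketch below. The feature ID "example-feature" is a placeholder, and the field names reflect the schema as just described rather than an authoritative reference.
    features:
    - name: example-feature
      files:
      - "**"
… Here the "**" glob maps every test in the directory and its subdirectories to the placeholder feature ID.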
… Some considerations for classifying. The format is simple, but there is a lot of nuance behind the process. Many tests cut across a number of web features. For example, some tests are written to exercise many different properties with repeated logic in one single test, for convenience. These are challenging to include. You could refactor the tests to be more granular, since the classification system is very binary right now, but that has its own downsides (lots of duplication, etc.).
… Sometimes tests are cross-cutting by necessity. For example, `border-radius` and `box-shadow` may be tested in tandem because the test is about the interaction of the two features. This is one nuance that I would love to discuss with the group today.
… One other consideration is that there are trade-offs to pattern matching. The glob and negation operators make it easier to write out the YAML files. They make it easier to include new tests going forward, assuming those follow similar naming conventions. Globs also make it easy to pick up tentative tests automatically whenever they move out of tentative status. But a major downside is that there's a lot of potential for unintentional matching. For example, "**" will always include every file in a directory, so it will also include any new tests added to the directory later, whether or not they are relevant to the web feature.
… Now that we've covered the process, here's how you can contribute. Author tests to support classification and think about granularity in testing. You could also classify new tests as you write them, or update existing tests to improve their classifiability. You can also refine existing classifications, and prototype new feature IDs and propose them to the WebDX Community Group.
… So now I want to open it up to discussion! There are two discussions I want to lead. First, how do we promote maintenance of web feature classification in WPT? Currently, the process is owned by WebDX, which has a process for watching changes to WPT with an eye toward classification opportunities. We would like to make this more integrated into the general WPT contribution process.
… One way this could look is with some annotation reminders on WPT pull requests. Something like alerting test authors to how many new tests in a PR are classified or unclassified, or whether existing test classifications have been changed.
… We should probably shy away from automated comments on PRs, though. Historical experience suggests those annoy test authors. We want to do this in a less noisy way if possible.
tidoust: Is the intent to minimize the number of files associated with a web feature? Are we trying to improve granularity? One suggestion for engagement: having a generic dashboard with metrics that tracks classification progress has helped in the BCD project.
jugglinmike: Webstatus.dev is to some extent a kind of tracker for the overall classification process. It shows us how many features are classified and supported by some process. But there are a number of features that have no browser data.
Lola: Having no entries in webstatus doesn't mean that no tests have been written, correct? Just that they haven't been classified.
jugglinmike: That's correct. And on the question about the number of files being classified: currently, we're taking a very conservative approach, so we're sticking to tests that really only test one feature very explicitly.
foolip_: As far as updating PRs, I think it would be useful if there were a way to notify people that tests could be linked to a feature. Otherwise, how do we make this process more visible?
Lola: Is it possible to include that in the PR process more officially? Maybe this could be part of the review process?
foolip_: I think the problem is that neither the author nor the reviewer really knows much about the process. So the problem right now is really awareness. We definitely don't want to block PRs on this yet.
… There's not even a way to notify WPT contributors from a comment in an automated way.
… Could this work with the export bot in some way?
… As a test author, is there an easy way to tell whether a given test already has a feature associated with it? How would I even find the relevant WEB_FEATURES manifest? What if it's nested three folders up, for example?
jugglinmike: Right, that gets to the tradeoff between granularity and flexibility. Here's an example PR that shows how difficult it is to make the impact of the classifiers really transparent.
smaug: What about invalid or controversial tests? For example, tests that are focused more on a feature than a spec?
jugglinmike: Do you think that's a general conformance test problem?
foolip_: "Tentative" or "optional" tests are ways to make sure those tests don't end up in webstatus results.
<foolip_> Filed GoogleChrome/
jugglinmike: One more discussion topic. How do we classify tests that are cross-cutting by necessity? How do we discuss or visualize compatibility for web features that really depend on each other? I don't feel strongly one way or the other about this, but maybe there's a more nuanced way of visualizing browser support for features that are cross-cutting, for example tests that exercise both CSS animations and text-shadow.
foolip_: I've been thinking about that as well as I've been reviewing classification PRs. Most tests are primarily about one feature; there seem to be levels of precedence for what is under test. Tests that really test interactions between features on an equal footing are rarer, and for those maybe it makes sense to label both.
jugglinmike: I mostly agree with that, though I wonder how the number of tests ends up being reported on a dashboard downstream.
foolip_: My concern is false reporting of regressions when a test fails because of an interaction with a newer feature. Dialogs haven't regressed, but their interaction with popovers might cause failures.
Lola: To foolip_'s point, for a lot of accessibility tests, we are looking for the interaction or influence of multiple features. For example, how would one classify something like aria-label? Also, how do we enforce validation and integrity in classification?
jugglinmike: We're at time unfortunately, and I want to thank everyone for coming. I'm happy to keep talking about this with anyone, so let's keep the discussion going!
… The best way to stay updated on this process is by staying plugged in with WebDX!