W3C

– DRAFT –
Semantics for the Agentic Web

11 November 2025

Attendees

Present
alisonmaher, ari, AshwanyRayu, benvds, cfredric, csarven, dom, ErikAnderson, GabrielBrito, gerchiko, hadleybeeman, HirokiEndo, HisayyuiOhma, jamesn, Jemma, johannhof, KaustubhaGov, Kenji_Baheux, kush, lei_zhao, LeoLee, MattGarish, mattreynolds, mbgower, noamr, NourNabil, Penny, ShivaniSharm, Siri, Stephen_McGruer, Thomas_Steiner, ValdimirLevin, vmpstr, wendyreid, WensonHsieh, YuichiMorioka
Regrets
-
Chair
Penelope McLachlan
Scribe
smcgruer_[EST]

Meeting minutes

Semantics for the Agentic Web

Penny: I will present a few slides, but aiming for an open discussion afterwards

[We look at Penny's slides]

Penny: The agentic web is already here

<dom> Slideset: TBD

<dom> [slide 2]

Penny: I expect us to see a trough of disillusionment, people will learn about the brittleness of agents
… However, eventually users will start to delegate complex and high-risk transactions to agents
… And a 99.9% success rate is not acceptable, given bad actors

<Jemma> can someone share the slide url?

<dom> [slide 3]

[The slides will be available after, with apologies]

Penny: Today, agents use various fragile parsing mechanisms, often visual

Penny: On top of this, models are constantly changing, which also introduces uncertainty

Penny: I suspect we need to go beyond DOM or reading pixels for acceptable success rate

Penny: So the question here is - is ARIA the answer?
… ChatGPT Atlas already uses it

<dom> [slide 4]

Penny: It seems nice to tell developers that they need to get ARIA right
… But the risk is that we may need to tell robots different things than we tell humans
… And we risk degrading the ARIA-driven experience for real humans
… E.g. if we want to lie to the robot, or give more information than a human needs, etc

<dom> [slide 5]

Penny: For example, a trashcan icon in an application. Does this mean archive, or delete?
… Without clear signals, the agent may wipe data accidentally

Penny: And there's a performance and sustainability problem: pixel scraping is probably a lot more expensive than just looking at the accessibility tree
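The trashcan ambiguity can be made concrete in markup. In this illustrative sketch (the labels and the assumed archive behaviour are not from the slides), an explicit accessible name removes the archive-vs-delete guesswork for assistive technology and agents alike:

```html
<!-- Ambiguous: an agent (or screen reader) only sees an unnamed button -->
<button onclick="trash()"><img src="trash-icon.svg" alt=""></button>

<!-- Less ambiguous: the accessible name states what actually happens -->
<button onclick="trash()"
        aria-label="Archive conversation (recoverable from Trash for 30 days)">
  <img src="trash-icon.svg" alt="">
</button>
```

Whether such labels should carry agent-oriented detail, or whether that degrades the experience for human assistive-technology users, is exactly the conflation question under discussion.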

<dom> [slide 6]

Penny: Onto the discussion. Some question seeds:
… Do we accept the premise? Can we achieve 100% reliability just using inference?
… Are we comfortable conflating human accessibility with machine capability?
… Do we need dedicated agent augmentation for HTML markup?

<Zakim> mbgower, you wanted to say I launched a small trial balloon in this direction w3c/aria#2046

mbgower: A few years ago at IBM, we had the visual cues for AI material and no way to do it programmatically
… At the time, we wanted some kind of semantic structure to deal with non-accessibility-related information
… ARIA seemed like an interesting area to bring it up, but at the time ARIA group said it wasn't an accessibility issue, which was somewhat valid

mbgower: I don't believe we will get it 100% by inference. I am concerned with conflation, but I accept it may be necessary.

ErikAnderson: I as a human cannot get 100% reliability, but I take responsibility and/or blame the site for bad design
… I think it's important we do something here to help agents get closer to 100%

Penny: I think there's an analogy to self-driving cars - the bar is higher than for humans
… I think it's similar for agents on the web

benvds: Very interesting + important. Representing things differently to agents and users is possibly dangerous. Lying to agents becomes an attack, and can lead to differences in serving.
… I also like labelling semantics as destructive vs mutating but reversible

wendyreid: One big problem we already have is how context drives interpretation, e.g. an edit icon on one site could be a highlight icon on another. Accessibility when done well helps with that (e.g. via labels), but we cannot rely on that today for humans
… I do love the point of us having a higher bar, however


Isaac: Is the question here reliability for b2c or b2b? There's a trade off where we might accept less reliability to be able to achieve things that humans cannot

Penny: Agree. Hopefully nobody is trusting agentic systems in a life-or-death situation. But in terms of money, there's a lot more flowing in b2b than in b2c scenarios. So imagine an agent-based clearing house: what happens there if we don't have 100% certainty?

Isaac: Definitely, not suggesting b2b means less reliability. I think there may still be value tradeoffs that are more common in b2b

<mbgower> COGA did a presentation at CSUN back in 2022 about how ARIA could be leveraged in a way similar to what I infer is an approach here. I can't find the presentation, but here's the original doc w3c/aria#2046

mjwilson: I think there's a risk of overloading ARIA. I think there's a similar risk with dedicated tags, because developers may implement those in place of ARIA

jamesn: On the ARIA technique: agentic browsers are going to use it whether we like it or not. I think we should be giving an alternative, and avoid ARIA being misused
… if we rely on that being the only way for robots, I think it will make the web worse for humans
… ARIA should be a fallback, not the main approach

<wendyreid> +1 jamesn

reillyg: Agree with last two comments, I am concerned that sites will try to block bots and they will use ARIA as a way to do that, and end up blocking users who use assistive technologies
… It might be prudent for browser developers to say "on my browser, ARIA will be for humans"

<Zakim> csarven, you wanted to mention the W3C RDFa recommendation can express any significant unit of information at any level of granularity in a document, which can be used in any markup language

csarven: I think this problem is addressed by existing w3c recommendation (RDFa)
… Basically orthogonal attributes that don't interfere with ARIA
… Attributes in HTML elements
… Can express the content that the human is reading, or something that is only for machines
… So no duplication of labels since it can be for both, or it can just be for machines
… I strongly recommend reviewing RDFa
… I would also like to see more concrete examples of use cases

<csarven> RDFa: https://www.w3.org/TR/rdfa-core/ , https://www.w3.org/TR/html-rdfa/ , https://www.w3.org/TR/xhtml-rdfa-scenarios/
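As a rough sketch of the RDFa approach csarven describes (the schema.org vocabulary and the product values below are illustrative assumptions), the same visible text can double as machine-readable data, with a `content` attribute carrying a machine-only value where needed, orthogonal to any ARIA attributes:

```html
<article vocab="https://schema.org/" typeof="Product">
  <!-- The visible heading is also the machine-readable product name -->
  <h1 property="name">Espresso Machine</h1>
  <div property="offers" typeof="Offer">
    <!-- Humans see "$199"; machines get the normalized value -->
    Price: <span property="price" content="199.00">$199</span>
    <meta property="priceCurrency" content="USD">
  </div>
</article>
```

This is the "no duplication of labels" point: one source of truth, visible to humans and parseable by machines.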

Penny: Look forward to exploring that!

kbx: I am worried that developers won't know what they have to explain to models.
… Agents are not regular users, this might be hard

Penny: So that I understand, it's that the model may need very different information than humans?

kbx: Yes. So things like "these icons make sense to my users, because of wider context, but a model has no idea"

vasilii: For agentic systems to have utility, there needs to be scale. So I want to explore how we can do that today
… I really like the point of building in a way that works for both humans and agents, without building two separate things

<mbgower> One of your examples seemed to be a challenge with inferring the meaning of a button function based on a crappy label. I did a little exploration of how this could potentially be done just by using an established vocabulary. So it's something of an inferred approach, but it provides some underlying support. I'll see if I can dig it up

vasilii: to avoid expense and also avoid excluding systems that don't adapt
… I also agree that there's still place for deterministic systems, not everything should be agentic

benvds: Conflating accessibility and machine capability - to me the answer is no, based on what has been discussed
… However, you can still use the technology without conflating it
… There was/is a Better Together breakout Monday, looking at expanding ARIA
… The goal is to give benefit to assistive technologies without having it be taken over by robot use cases

Penny: Feels like temperature of the room is trending to don't conflate, can augment using the accessibility tree but don't rely on it
… One question that comes to mind is that any solution we add here needs to be very low effort for developers, and also needs to avoid staleness
… Having a single location for the information helps avoid staleness

Penny: I also want us to consider that browsers/agents will have to treat webpages as at best neutral if not hostile entities, so encourage folks to think about how this could be used to attack users via their agents
… It's a bit like having a trusted employee who gets tricked into being malicious: I would be more likely to trust that employee, and so be more vulnerable, than in my own relationship with a malicious website

noamr: Two things. I am struggling to find the distinction of where human ends and machine starts. Is an agent that helps me buy a car a different machine from something that helps me read a website? ARIA is still a machine; it's just not agentic.

noamr: I am also missing examples of where ARIA is harmful. I'm sure there are some, but I'm not aware of any

dom: Not an ARIA expert, but one rule I've heard is don't use ARIA unless you have to
… I think shifting the conversation towards the accessibility tree (rather than ARIA) is important
… A similar proposal to RDFa is there's a new community group for ???
… Looking at web semantics for agents

dom: Also, are agents a new class of users for webpages?
… Some people are now using ?? to refer to the experience of a website for agent 'users'

<Zakim> dom, you wanted to mention the first rule of ARIA, NLWeb and to mention "AX" (agentic experience) and to mention the role of AI in maintenance

Penny: I'm starting to see developer agencies marketing "is your website AX ready", and this seems concerning

dom: Yep, there's a risk of a two-class web, where rich people have their agents doing things for them, and the rest deal with a terrible web
… That reminds me, AI can also be used to help maintain things like the two ways of dealing with communicating info

tomayac: First rule of ARIA is don't use ARIA
… Before we reach to that, the solution is actually semantic web
… So many things are just form operations

<wendyreid> +1 tomayac

<benvds> +1

tomayac: If your website is not recognizable to a machine, then it's because you're building div-based UI, and then it's not machine readable
… I am concerned that people will start using ARIA wrong, making the web worse for human users
… So we should be careful about telling developers to use ARIA

johannhof: I want to take us one level up. Who are we serving, what's the outcome that we want?
… To me its that websites should be able to be optimized for browser agents
… This takes us back to the problem of interoperability, but now its no longer between browser engines but between different models/system prompts/etc!

johannhof: I want to unite us on this common goal, that websites should be interoperable
… Maybe we need to make it easier to test and optimize generally

<Zakim> jamesn, you wanted to discuss ARIA and the accessibility tree - and how we can't rely on aria labelling and roles unless the tree is created

jamesn: I wanted to talk about technical feasibility of using ARIA, if we were going to do that
… For example, the ARIA label: you can't get it until you spin up the accessibility tree
… That's extra compute cost; are browsers actually going to do that for all agent users?

tomayac: Is it cheaper to screenshot?

jamesn: Good question. But they'll be screenshotting anyway; they will be doing multiple things

johannhof: Agree, there's this concept that there's no point in building complex systems too early, when they get bypassed shortly afterwards just by processing the raw data
… because the systems get so good

jamesn: The accessibility tree is also fragile and causes crashes. Would be great if it didn't
… but that's the reality

Penny: On the topic of accessibility tree vs not: there's some speculation about using a cheap on-device model to lower the cost, and then selecting between models if the problem isn't solved by the cheap one.
… I do think it should be as inexpensive as possible to be able to get the answer quickly, yes.

raginpirate: Wanted to return to question #1 (do we accept the premise). I do agree that we cannot do 100% by inference alone, but want to extend the question
… Will using 'hints' like ARIA even reach 100%?
… I am skeptical that the fundamental problem is even feasible
… Want to challenge: is this the right path, or is the right path that we need agents to NOT operate on the normal web, if we want to achieve an actual 100%?

Penny: As in, parallel MCP service or similar?

<csarven> I wanted to comment on no content duplication and enabling agents to "follow their nose". Some of this is not at all distinct from good / useful information for humans. The Self-Describing Web document which is a TAG Finding talks about this in depth and again RDFa is mentioned there for use in markup languages like HTML.

<csarven> There is a distinction to be made about semantic HTML and reuse of independently developed vocabularies to express some information. Take schemaorg as an example or countless other vocabularies on the web. They are able to describe different kinds of knowledge on the web as well as operations for applications.

raginpirate: Exactly

<Zakim> csarven, you wanted to mention TAG Finding, Self-Describing Web: https://www.w3.org/2001/tag/doc/selfDescribingDocuments which discusses use of RDFa for self-describing HTML

csarven: Wanted to talk about content duplication that you mentioned earlier.
… A lot of that's not distinct from useful information for humans
… We can have a single source of truth that's not hidden information
… Thats accessible, visible, and available to machines

<csarven> https://www.w3.org/2001/tag/doc/selfDescribingDocuments

csarven: There's a TAG finding about self describing web
… Really talks to the question of what is the purpose, the relationship between pieces of information
… Concrete example, a citation
… This sentence/argument/thought maybe refutes or augments this other argument somewhere else on the web
… Actually real linkage between those things because humans have a lot more context about the document or application they are interacting with and that context can be made more clear to the agents - at least reducing that gap

csarven: Second point, there's a distinction to be made about semantic HTML
… Worth noting that there are other vocabularies than HTML

csarven: we need to take into account different ways of expressing knowledge by different communities
… No central authority says what a trashcan should be
… We have bodies that can talk about icons
… We have bodies that can talk about genes or health data

csarven: Are we describing all of human knowledge at W3C?
… think about why schema.org exists in the first place

<mbgower> In response to the COGA group advocating using ARIA for adding meaning semantically, I did a very limited exploration of a way of using a controlled vocabulary to prompt users for unconventional uses of inputs and buttons. Based on the button example, it seems somewhat valid.

<mbgower> https://drive.google.com/file/d/1Nkpl3GFlGOuc_PdzxH6GJH1_zc6ZOdAH/view?usp=sharing

<Zakim> wendyreid, you wanted to note sometimes the machines make the same mistakes humans do

wendyreid: I ran into a real use case for this. Developer tool for developers to recreate accessibility bugs before they try to fix them
… QA writes really good test steps, so its useful to give to an AI to do stuff
… But when I asked it to write tests, the AI failed: the test says something like 'click on the buy now button', and the element was a link, not a button
… Humans say things like "that looks like a button" vs the underlying technological semantics
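wendyreid's failure mode is easy to reproduce in markup. In this illustrative sketch (the class names and URL are assumptions), the first element looks like a button to a human but exposes the role of a link, so an instruction like "click on the buy now button" doesn't match the underlying semantics:

```html
<!-- Visually a button, semantically a link (role: link) -->
<a href="/checkout" class="btn btn-primary">Buy now</a>

<!-- Same appearance via CSS, but semantically a button (role: button) -->
<button type="button" class="btn btn-primary">Buy now</button>
```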

<mbgower> +1 to Wendy on the lack of correlation between semantic role and visual presentation

Yoav: Responding to noamr: we want our pages to talk to machines, yes, but there's a significant difference between deterministic and non-deterministic machines
… Also very unclear how developers are meant to test any of this (+1 to johannhof)
… Beyond just the massive combinatorial problem, there's non-determinism

Yoav: I think one reason developers don't test ARIA is that it's hard and there's not enough market share
… Agents will probably pass the market test side, but the testing is maybe even harder
… Maybe imperative solution like WebMCP is better than markup
… Does page markup really need to be machine readable
… Better if agent calls function rather than click a button

Penny: So is that giving up on functional declarative experiences that don't have JavaScript?

Yoav: Maybe, good point

johannhof: I think there's a balance

<Zakim> dotproto, you wanted to speak to browser automation via WebExtensions

dotproto: Browser automation has been a common use-case for extensions for a long time
… Unsurprisingly we have seen people push into agentic space
… Had challenges in better serving automation use case
… Hesitant about exposing all data to a given extension

<GabrielBrito> dom yes please

dotproto: Things like the "all URLs" permission are terrifying, but often used

<GabrielBrito> The queue has been closed, but I wanted to add that it's interesting to note that we (humans) also learn from our mistakes. So it would be interesting to think about similar mechanisms for agents. For example, if the agent deletes a cloud file by mistake, how can it identify that this happened and recognize that it should restore the file from the junk?

dotproto: I've heard from extension developers that there is a clear need for a better way to interact with and identify what content is available for interaction in a given webpage, without requiring the site author to do something
… looked at accessibility tree for this historically, but all the concerns we've heard today applied then

kush: This issue came up in WebMCP discussion.
… We are talking about having a declarative version of WebMCP
… We don't want to have a world where we're forcing developers to have script
… ARIA folks are reviewing the proposal

kush: We likely need new attributes to let sites opt into which parts of their page are relevant for agents
… But for describing a specific element's purpose, it would be nice to merge accessibility and agents to minimize developer pain where relevant
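A declarative opt-in like the one kush describes might look something like the sketch below. The `agent-*` attribute names are purely hypothetical, not part of WebMCP or any actual proposal; the point is only that relevance and consequence could be declared once, alongside (not inside) the accessibility semantics:

```html
<!-- Hypothetical attributes: agent-relevant marks the element as exposed
     to agents; agent-action/agent-reversible declare the consequence of
     activating it, without touching the ARIA label -->
<button aria-label="Delete conversation"
        agent-relevant
        agent-action="delete"
        agent-reversible="false">
  <img src="trash-icon.svg" alt="">
</button>
```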

<jamesn> kush: is there a link to that proposal?

tomayac: For screenshotting, one big problem is discovering that there's more results possible for scrollable content
… This is not a problem for the accessibility tree
… I suspect the accessibility tree will become essential for the web

Mark_Foltz: I think there's a spectrum between technologies that allow users to directly manipulate the page and fully autonomous agent flows, with points in between where the user is in the loop but not always acting.
… That middle ground may be the best space to augment or help

Penny: Want to thank everyone for the very interesting discussion
… Let's keep the conversation going
… Grab me in a break, lets chat in IRC or corridor

Minutes manually created (not a transcript), formatted by scribe.perl version 248 (Mon Oct 27 20:04:16 2025 UTC).

Diagnostics

Maybe present: dotproto, Isaac, kbx, Mark_Foltz, mjwilson, raginpirate, reillyg, tomayac, vasilii, Yoav

All speakers: benvds, csarven, dom, dotproto, ErikAnderson, Isaac, jamesn, johannhof, kbx, kush, Mark_Foltz, mbgower, mjwilson, noamr, Penny, raginpirate, reillyg, tomayac, vasilii, wendyreid, Yoav

Active on IRC: alisonmaher, ari, benvds, breakout-bot, cfredric, csarven, dmurph, dom, dotproto, ErikAnderson, GabrielBrito, hadleybeeman, jamesn, Jemma, johannhof, kbx, kush, lei_zhao, LeoLee, Mark_Foltz, mattreynolds, mbgower, mjwilson, noamr, Penny, raginpirate, reillyg, Siri, smcgruer_[EST], tomayac, vasilii, vmpstr, wendyreid, Yoav