Meeting minutes
Anssi: as a reminder, we'll use IRC-based queue management in this meeting:
https://
Anssi: to suggest agenda topics, use Agenda+ label, e.g.:
Anssi: please welcome our latest new participant, Tugce Tuncay, joining as an individual contributor
Prompt API
Repository: webmachinelearning/prompt-api
Model selection and availability
Anssi: issue #169
<gb> Issue 169 [Tag Review] - Model selection and availability (by etiennenoel) [Agenda+] [tag-tracker]
Anssi: TAG review of the Prompt API spec raised a question about how model selection and availability would work in the Prompt API
… per TAG review comment it would be "acceptable to indicate particular model capabilities instead of 'brand names'"
… concerns about cost of licensing a model
… Benjamin shared a reference to Mozilla's standards position
… two broader concerns "Calcifying around a single model and Lack of model neutrality"
Anssi: AiBrow browser extension for Chrome and Firefox has extended the Prompt API with a model selection mechanism
https://
const session = await AI.AIBrow.LanguageModel.create({
  model: "phi-3-5-mini-instruct-q4-k-m"
})
Anssi: in the AiBrow implementation, the model is automatically downloaded if not present
… and the model becomes available to all sites on the machine after that
… I believe we can see fingerprinting and cross-site tracking issues with this design
<tomayac27> A similar model selection mechanism exists in Chrome's extension
Anssi: however, I think we also see potential solutions to mitigate these issues
Anssi: Thomas from Google shared another extension developed by the Chrome team that allows selecting from cloud and local backends
Tom: an experiment, to test polyfills of the built-in AI APIs
… testing if we can use this extension to use these APIs on other browsers
… this also works on Edge and mobile
… Android supported
… in this extension, cloud backends supported are Google Gemini, OpenAI, Google Firebase
… and local backend is Transformers.js
… options allow to override the native implementation
… an experiment mostly
Benjamin: picking between models solves calcification to some extent, but makes sites branch their code paths on specific models and their versions
… I shared an idea of a possible solution in the GH issue
… if the browser downloads 1-of-N diverse models, it would make it harder for a site to depend on a particular model behavior
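One possible reading of the 1-of-N idea, as a minimal sketch. Everything here is a hypothetical illustration and not taken from the GH issue: the candidate list, `pickModelForProfile`, and `createSession` are assumed names. The point it illustrates is that the browser, not the site, makes the choice, and the site-facing object exposes no model identity.

```javascript
// Hypothetical sketch of browser-side 1-of-N model assignment.
// All names and the candidate list are illustrative assumptions.
const CANDIDATE_MODELS = ["model-a", "model-b", "model-c"];

// `random` is injectable so the choice can be exercised deterministically.
function pickModelForProfile(candidates, random = Math.random) {
  if (candidates.length === 0) {
    throw new Error("no candidate models available");
  }
  return candidates[Math.floor(random() * candidates.length)];
}

// The site-facing session object deliberately carries no model identity.
function createSession(random = Math.random) {
  const model = pickModelForProfile(CANDIDATE_MODELS, random);
  return {
    prompt(text) {
      // Stand-in for inference on `model`; the id never leaves the closure.
      return model ? `response to: ${text}` : "";
    },
  };
}
```

Because a site cannot observe which candidate was picked, it has less room to tune its prompts to one model's quirks.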
Reilly: I agree with Ben's idea of downloading 1-of-N diverse models, with fully randomized models, not sure how that'd work
… the hard line would be, we can't let the site pick which model to choose, model selection API is a non-starter due to fingerprinting aspect, to detect which models are installed
… another practical question is that it uses a lot of local storage
… if we allow sites to download multiple models it'd make this storage issue larger than with the current approach
… to require that the model B has an open license could be a path to pursue
Benjamin: openness of the models helps, as does use of a platform model; one of the hard things is that, without a model picker, we are assuming language models are interchangeable
… if that is not the case, it challenges the shape of the API
MikeWa: appreciate the discussion, I'm more aligned with Ben's comment on making the API treat the models as fundamentally interchangeable, it has been our design goal
… to be able to request specific modalities for example, use that level of abstraction instead of low-level model details
… that may be more useful than "I want this specific model version"
… also, we do see Microsoft contributing in Chromium exploring the use of OS-provided models, also explore the use of Apple Intelligence
… OS-provided models are another good option, also +1 to Reilly on moving toward open-weight models and open-source inference stacks, these are on our roadmap
… slightly different topic, one aspect we hope to explore is to reaffirm the interchangeability of models through a common suite of use cases, samples and benchmarks, to test interoperability
… this is important as Chrome launches new versions of models, to ensure high-value use cases are addressed also when new model versions are being rolled out
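To illustrate the level of abstraction MikeWa describes (requesting modalities rather than a model version), here is a minimal sketch. The function name `availabilityFor`, the modality strings, and the return values are assumptions made for this example; it does not reproduce the Prompt API surface.

```javascript
// Hypothetical sketch: the site states which modalities it needs and the
// browser answers whether its built-in model can satisfy them. No model
// name or version crosses the boundary.
function availabilityFor(requestedModalities, builtInModelModalities) {
  const supported = new Set(builtInModelModalities);
  const satisfied = requestedModalities.every((m) => supported.has(m));
  return satisfied ? "available" : "unavailable";
}

// Assumed capabilities of whatever model the browser happens to ship.
const builtInModel = ["text", "image"];

const textOnly = availabilityFor(["text"], builtInModel);
const textAndAudio = availabilityFor(["text", "audio"], builtInModel);
```

The same site code keeps working if the browser swaps in a different model, as long as the requested modalities are still covered.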
Anssi: any preferred next steps in mind?
Benjamin: bringing proposals to Mozilla's standards position issue might work
… and continue conversation on mitigation
Benjamin: I can also help bring Jake into the TAG review and this GH issue here to help drive productive discussion
MikeWa: I hope some discussion will continue in the Prompt API repo itself, to help advance the interop project; since the Mozilla standards position issue is closed, I don't expect us to engage there at this point
… appreciate continued engagement from Jake and other Mozillians in this repo and also in the TAG review issue
<reillyg> The latter.
<bvandersloot> +1 to latter
MikeWa: proposed resolution looks good to me, we are also looking into an interop project with Microsoft to further this aim
RESOLUTION: Explore solutions for the Prompt API to address calcification and model neutrality issues. (issue #169)
Sampling modes
Anssi: issue #203
<gb> Issue 203 Proposal: Introduce Categorical Sampling Modes for the Prompt API (by isaacahouma) [enhancement] [interop] [Agenda+]
Anssi: Isaac opened this issue
… Isaac explains that initially sampling parameters temperature and topK were part of the API
… based on TAG feedback, these parameters were deprecated in the web pages context, exposed to extensions only for now
… issue was behavior drift across models and versions that breaks API expectations
… there are valid use cases, so instead of removing entirely, proposal to bucket them into categorical sampling modes, currently:
… "deterministic", "precise", "balanced", "creative", "imaginative"
Anssi: it is an implementation detail how these modes map to temperature and topK and other sampling parameters
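A minimal sketch of what such an implementation-defined mapping could look like. The five mode names come from the proposal as quoted above; the numeric temperature and topK values and the `resolveSamplingMode` helper are purely illustrative assumptions, not values from the issue or from any implementation.

```javascript
// Hypothetical mapping from categorical modes to sampling parameters.
// The numbers are made up for illustration; each implementation would
// choose its own values per model and version.
const SAMPLING_MODES = new Map([
  ["deterministic", { temperature: 0.0, topK: 1 }],
  ["precise", { temperature: 0.3, topK: 3 }],
  ["balanced", { temperature: 0.7, topK: 8 }],
  ["creative", { temperature: 1.0, topK: 20 }],
  ["imaginative", { temperature: 1.3, topK: 40 }],
]);

function resolveSamplingMode(mode) {
  const params = SAMPLING_MODES.get(mode);
  if (params === undefined) {
    throw new TypeError(`unknown sampling mode: ${mode}`);
  }
  return params;
}
```

A site only ever names a mode, so the table can change between model versions without breaking the site's expectations about raw parameter values.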
MikeWa: developers want to express more deterministic behavior, and explicit tuning parameters are not interoperable, but the modes we propose are a better way that can be implemented across model families and versions interchangeably
… currently in Origin Trial in Chrome, getting developer feedback on it now
… feedback from this group would help determine if there are other options for configuring this aspect in an interoperable fashion
Anssi: the issue has the IDL proposal
… any comments or feedback on the proposal?
Reilly: wanted to clarify that this is intended to resolve TAG feedback
… hypothesis is this will give developers the level of control over model creativity without developers having to figure out specific values
… we do this as an Origin Trial to explicitly ask developers for feedback on this feature
… by running this experiment we want to get explicit feedback on why they are using parameters, so we can build the API in a model-agnostic way
<reillyg> We'd love to have the group's blessing to experiment. :)
<reillyg> +!
<msw> sgtm
<reillyg> +1
<bvandersloot> sgtm
RESOLUTION: Experiment with categorical sampling modes to inform the Prompt API design. (issue #203)
WebMCP
Repository: webmachinelearning/webmcp
WebMCP early wide review update
Anssi: I gave a heads up on our wide review plan to the chairs of the TAG, Security WG, Privacy WG
… for Architecture review with the TAG, the expectation is to file an incubation review with them, this class of review expects certain materials:
… - explainer that describes the problem to solve from an end-user's perspective
… - multi-stakeholder feedback from Chromium, Mozilla, WebKit, and other implementers, web developers, users
… - any major unresolved issues
Anssi: we have updated the explainer in PR #183, thanks Dominic for leading that rewrite
<gb> MERGED Pull Request 183 Explainer updates/rewrite (by domfarolino)
Anssi: minor tweak to the flow diagram in review, PR #189, expect this to land soon
<gb> Pull Request 189 Convert MCP flow diagram to mermaid (by bwalderman)
Anssi: for multi-stakeholder feedback, we have requested standards positions from WebKit and Mozilla:
WebKit/
<gb> Issue 670 WebMCP (by domfarolino) [topic: artificial intelligence (AI)] [from: Google] [venue: W3C Web Machine Learning CG]
mozilla/
<gb> Issue 1412 WebMCP (by domfarolino)
Anssi: when the explainer diagram is updated, we can proceed
… I propose the editors draft the proposed submission content somewhere in this repo, give the entire group a few days to review before we submit to the TAG so we have consensus on the content of the submission
… we can always update the content of the submission later if we need to and as we make progress, but it would be good to have a solid initial submission to not waste busy reviewers' time
… questions?
Brandon: the explainer cleanup looks good
Anssi: for W3C's security review with the Security WG, we are expected to write a "Security Considerations" section for the spec, taking into account the Self-Review Questionnaire on Security and Privacy
… we have combined security and privacy considerations due to the overlap and to avoid duplication of content, this is fine for this spec at this stage
… Security and Privacy considerations section landed in PR #181
<gb> MERGED Pull Request 181 Port Security & Privacy considerations from docs/ (by johannhof)
Victor: I looked at the self-review questionnaire while authoring the initial S&P consideration
https://
Anssi: thank you Johann, Victor and others for your work on this
Dominic: TAG reviewers seem to like having the questionnaire responses referenced, so I propose we document that
Victor: I can open a PR to add that to the repo
Dominic: sounds great, thanks
Johann: I agree with Dominic, let's get the self-review documented too
… agentic AI risks are covered, traditional security model considerations could be better covered in the content
… I've proposed we explore the threat model we've discussed in parallel in a non-blocking manner, awaiting confirmation from the Security WG on their expectations
… comments?
Anssi: for W3C's privacy review, the expectation wrt materials is similar
… "Privacy Considerations" section, we'll offer the combined S&P considerations for this group as well
https://
Anssi: I have also informed the Privacy WG chair about our plans, and asked about their expectations for the review
… will update the group when I have more information
<domfarolino> +1
<Victor> +1
RESOLUTION: The group uses the updated Explainer and the newly authored Privacy and Security Considerations as a reference, and adds self-review questionnaire responses, then stages review requests for the group to review before submission.
Hint for reversible or consequential actions
Anssi: issue #176
<gb> Issue 176 Hint for reversible or consequential actions (by johannhof) [Agenda+]
Anssi: the proposed design is that reversible or consequential actions hint is off by default, and the developer needs to explicitly mark the tool as such
… and last time we resolved to identify use cases to inform this design
… it looks like Dominic and Johann had a discussion around use cases for tools and concluded that most tools are not destructive or consequential, maybe ~90% fall in this category?
… Victor took a soft stance to support consequentialHint default=false based on review of existing use cases by MCP tools
… Johann notes the developer may forget to tag the tools, in which case the defaults apply
… proposed design assumes tools are "not consequential" by default
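A minimal sketch of the off-by-default behavior described above. The `consequentialHint` name follows the discussion; the tool object shape and the helper `isConsequential` are assumptions made for this example, not the WebMCP API.

```javascript
// Hypothetical tool descriptors. Only `consequentialHint` is named in the
// discussion; the surrounding shape is assumed for illustration.
const searchTool = { name: "search-products" };
const checkoutTool = { name: "place-order", consequentialHint: true };

// An untagged tool is treated as "not consequential" (default = false).
function isConsequential(tool) {
  return tool.consequentialHint ?? false;
}
```

Under this default, a developer who forgets to tag `place-order` would get the same treatment as `search-products`, which is the risk Johann notes.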
Johann: for the readOnly tooling, agents could enter into e.g. research mode where only read-only tools are allowed
… explicit read-only tools would be valuable in that case
… I'm wondering if this is the same situation, do we want to list all consequential tools
… I tend to agree with others that this should be a high user friction point
… "approve all the tool all the time" should not be the goal, due to prompt fatigue style issues
… I think we have alignment on the proposed design
… all the discussion seems to support that the "off-by-default" model hypothesis is worth exploring further, but we don't have enough data to confirm that hypothesis yet
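The research-mode idea raised at the start of this turn could be sketched as a simple filter. This is a hypothetical illustration: `readOnlyHint`, the `"research"` mode string, and `toolsForMode` are assumed names, and untagged tools are conservatively treated as not read-only.

```javascript
// Hypothetical sketch: in "research" mode an agent only sees tools that
// were explicitly tagged read-only. Names are illustrative assumptions.
function toolsForMode(tools, mode) {
  if (mode !== "research") {
    return tools;
  }
  // Untagged tools are excluded: a missing hint is not proof of read-only.
  return tools.filter((tool) => tool.readOnlyHint === true);
}

const registeredTools = [
  { name: "list-orders", readOnlyHint: true },
  { name: "cancel-order" },
  { name: "get-price", readOnlyHint: true },
];

const researchTools = toolsForMode(registeredTools, "research").map((t) => t.name);
```

This shows why explicit read-only tagging is valuable in that mode: without the tag, a harmless tool would simply be unavailable to the agent.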
Dominic: we're Origin Trialing the feature and will get data from users on whether this feature would work per our hypothesis
Johann: we have to make sure to work with partners to understand how to expose consequential tools, what is the right UX treatment for that
Benjamin: want to resurface that if the consequential hint defaults to false, it might be worth exploring hint matching with MCP open world hints
Johann: a somewhat difficult situation, existing hint naming is not optimal, do we follow the existing naming or pick better names that convey the meaning better
Victor: I want to respond to Ben and why we should not be following MCP inspired hints
… I think not much consideration was put in the design of MCP hints
… MCP hints used to just prompt the user somehow
… copying MCP design may not be the best design for WebMCP hints
… I prefer that the name reflects the intent of the hint
… in MCP clients, people need to enlist into use of the hints, only then they can be used
… we vet websites before use, thus defaults are more important than in MCP
… if people don't label things as destructive by default they assume them to be destructive
… lean on "consequential" hint to be false by default
Dominic: how would this be used other than prompting the user?
Dominic: should document why this is not called?
Victor: my rationale is, in this situation, this can be more of a hint than a deterministic contract
… in the future the model may be so good that it does not need to prompt the user
Johann: I agree with Victor, people tend to misunderstand, they don't know what needs prompting
Victor: open to wait for the Origin Trial data, but also OK to proceed
<domfarolino> +1
<Victor> +1
<johannhof> +1
RESOLUTION: Specify consequential hint as default=false (issue #176)