W3C

– DRAFT –
AI Model Management

25 September 2024

Attendees

Present
anatoly-scherbakov, dezell, Gregg_Kellogg, kenji_baheux, ora, reillyg
Regrets
-
Chair
Michael McCool
Scribe
sebastian

Meeting minutes

scribe sebastian

Slideset: https://github.com/webmachinelearning/hybrid-ai/blob/main/presentations/WebML%20Discussion%20-%20Hybrid%20AI%20for%20the%20Web%20-%20AI%20Model%20Management%20TPAC%202024%20Breakout.pdf

<McCool showing agenda>

Problem Statement ... Issues and Constraints ... Discussions ... Next steps

Problem statements

McCool: AI models ...
… can be very large
… will often be shared
… are often updated

McCool: There is also the current same-origin storage partitioning policy
… this preserves privacy
… this works for images (not shared) and for software libraries (often shared)

Use Cases for Large Models

McCool: Language translation, ...
… meeting captions
… background removal
… video creation and editing
… written language recognition
… personal assistant

<kenji_baheux> who is speaking?

XXX: Shall we specify a shared repository which only has my personal information?

@@@

Why run AI?

McCool: There are pros such as lower latency and offline use.
… cons are size limitations, download time, and storage costs

Model Size vs Download Time

McCool: Average home network speeds are between 45 and 216 Mbps
… downloading Phi-3-mini takes, e.g., about 22 minutes at the baseline speed

Existing APIs and Experiments

McCool: For same-origin storage there exist the HTTP Cache, ...
… Cache API
… IndexedDB API,
… Origin Private File System API

McCool: For cross-origin sharing there exists the File System Access API

Caching Desired Properties

McCool: we need to reduce latency, ...
… bandwidth
… storage
… and preserve privacy

Security and Privacy Considerations

McCool: browsers implement only per-origin local caches
… there is a cross-site privacy risk based on cache timing analysis
… per-origin caches are tolerable for “typical” (non-AI) web resources, but AI models are large and potentially shared

Issue Starter Pack

w3c/tpac2024-breakouts#15

McCool: Here are first issues for discussion:
… background model download and compilation,
… model naming and versioning
… allowing for model substitution when useful
… common interface for downloadable and “platform” models
… storage deduplication
… model representation independence
… API independence
… browser independence
… offline usage, including interaction with PWAs
… cache transparency

David: Seems like a big issue. Are you interested in working on specific APIs?

<Zakim> dsinger, you wanted to mention similar problems

Alternatives

McCool: One option is to define model-aware caches
… use 'fake misses' to avoid redundant downloads
… identify cache items by content-dependent hashes

McCool: the idea is that model caches would behave as if they were per-origin caches

McCool: there is another alternative: auto-expediting common models
… the more common a model is, the less of a tracking risk it is

<kenji_baheux> echoing Erik's point on finding if there is an actual problem; want to share what I've heard so far from partners. Enterprise / Edu customers would like to use a custom LLM that speaks their users' language / lingo across different origins (typically internal websites and/or popular 3p solutions to Enterprise/Edu needs). Other extreme, some

<kenji_baheux> partners want a custom LLM that speaks to their community but they wouldn't share it with other sites. Likely it would be running on the server as most of their users wouldn't necessarily want the big download.

<ErikAnderson> It's a proposal supported only by Chrome and Edge, but Related Website Sets is an example of a technology that may be applicable here _if_ we hear from customers that it's a common use case to share a company-specific model across multiple top-level eTLD+1s.

adjourn

Minutes manually created (not a transcript), formatted by scribe.perl version 235 (Thu Sep 26 22:53:03 2024 UTC).