W3C workshop on
Web and Machine Learning
Web Platform: a 30,000 feet view
Dominique Hazael-Massieux – @dontcallmeDOM
Hello, and a warm welcome to the W3C Web & Machine Learning workshop.
I am Dominique Hazael-Massieux. I am part of the W3C technical staff and have been working with the Workshop Program Committee to organize this virtual event.
You can find me on Twitter and GitHub as @dontcallmeDOM (but do call me Dom).
This talk aims to give a very high-level overview of the specificities of what we call the Web platform for those of you who may not be as familiar with how browsers work.
Since a lot of the existing Machine Learning development happens in non-Web environments, the Program Committee thought it would be useful to build a shared understanding of these specificities and how they impact the deployment of new technologies in Web browsers.
So what is it that we call the Web platform?
So that's for the technology stack that constitutes the Web platform.
Web Platform specificities
- Device Independent
But there are four key aspects I want to highlight today that make browsers a unique execution environment.
Anchored in the network
A first aspect is that browsers are built with networking as a core design point. This means that it is relatively easy to bring in resources from different network sources, but it also comes with constraints on that usage: in particular, a lot of the trust and security boundaries are anchored in network identifiers, mostly via DNS and certificates.
Browsers mediate user needs and wishes, including
A second aspect is that browsers are designed as user agents: they are positioned to be the agent of end users.
They are expected to keep the user in control and help ensure their needs are fulfilled. W3C pays particular attention to considerations of security, privacy, accessibility and internationalization when it comes to user needs.
To respect that position as agent of the user, it is often necessary to impose limits on what developers can do in the code that browsers will run for them.
A third important aspect is that all the technologies that constitute the platform are defined through open standardization processes, including naturally in W3C, and these standards are implemented in multiple competing products and are made available freely to developers and end-users alike.
- device type
- screen size
- hardware capabilities
- operating systems
Finally, browsers are available on most end-user devices, no matter their operating system and their hardware, including computing architectures and capabilities.
In particular, technologies made available through browsers need to be as device- and platform-independent as possible, and provide accommodations for situations where they cannot be.
Intersection with Machine Learning?
With that characterization in mind, what does it imply when it comes to bringing a new technology such as machine learning to the Web platform?
This means developers mostly aren't exposed to concurrency issues, which is a good thing, but it also creates constraints in how programs can be architected.
Various mechanisms are available to enable developers to run different pieces of their code concurrently, most notably asynchronous APIs, workers and worklets, but they come with their own set of execution constraints.
The high-level consideration to keep in mind is that technologies that require heavy CPU operations need to carefully integrate with these architectural constraints.
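As a minimal sketch of what integrating with these constraints can look like, the TypeScript below splits a CPU-heavy loop into chunks and hands control back to the event loop between chunks, using only a promise and `setTimeout`. The function names and the default chunk size are assumptions made for illustration; truly heavy work would more likely be moved to a worker.

```typescript
// Resolve on a later turn of the event loop, so that other queued work
// (in a browser: input handling and rendering) gets a chance to run.
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, 0));
}

// Hypothetical helper: sums the squares of `values`, pausing after every
// `chunkSize` items instead of holding the thread for the whole loop.
async function sumOfSquaresInChunks(
  values: readonly number[],
  chunkSize: number = 1000,
): Promise<number> {
  if (chunkSize < 1) {
    // Inside an async function this surfaces as a rejected promise.
    throw new RangeError("chunkSize must be at least 1");
  }
  let total = 0;
  for (let start = 0; start < values.length; start += chunkSize) {
    for (const value of values.slice(start, start + chunkSize)) {
      total += value * value;
    }
    // Give control back before starting the next chunk.
    await yieldToEventLoop();
  }
  return total;
}
```

The trade-off is that the computation takes a little longer overall in exchange for keeping the page responsive while it runs.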
Network sources define the security boundaries in Web browsers
A second characteristic is that from within a Web application, most interactions with the network will by default be limited to the domain where the application is hosted (more specifically, its origin), and access to other network destinations will only be granted if that other destination opts in to it.
This helps limit the exposure of private resources in local networks (for instance a home network or an enterprise network).
In the context of machine learning, this means that, for instance, loading a trained model from a third-party service would need to conform to this constraint and may also need additional sandboxing to respect the end user's security and privacy.
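To make the notion of origin concrete, here is a small TypeScript sketch that uses the standard `URL` class to check whether a resource, such as a model file, lives on the same origin as the application. The function name and the `.example` URLs are assumptions made for illustration; in a real browser the check and the opt-in (via CORS response headers) are enforced by the browser itself, not by application code.

```typescript
// Two URLs share an origin when their scheme, host and port all match.
// The `origin` property of URL serializes exactly that triple.
function isSameOrigin(appUrl: string, resourceUrl: string): boolean {
  const appOrigin = new URL(appUrl).origin;
  const resourceOrigin = new URL(resourceUrl).origin;
  // Opaque origins (for instance `data:` URLs) serialize as the string
  // "null" and must never be treated as matching anything.
  if (appOrigin === "null" || resourceOrigin === "null") {
    return false;
  }
  return appOrigin === resourceOrigin;
}
```

With this sketch, `isSameOrigin("https://app.example/", "https://app.example/models/m.bin")` is true, while a model hosted on `https://cdn.example/` is cross-origin and would only be readable if that server opts in.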
Constraint: Tracking protection
Protecting users’ privacy via
- Limiting uniquely identifiable data (“fingerprint”)
- Limiting how storage can be used by apps
A third, similar characteristic, and one of rapidly growing importance, is that browsers limit what identifiable information can be tracked across Web sites. While a lot of parties attempt to identify users as they browse through different Web sites, browsers attempt to give users control over what gets shared and when, and to limit what gets shared without user awareness.
In particular, a lot of the recent evolutions in how “cookies” are managed, how data gets stored and how much uniquely identifiable data is exposed in APIs (data that can be used to “fingerprint” a particular user or a particular device) can have a deep impact on how new Web technologies get designed.
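One illustrative way an API could expose less identifying detail is to report a coarse bucket rather than an exact hardware value. The TypeScript sketch below rounds a value down to a power of two and caps it; the function name, the bucketing scheme and the cap are assumptions of this sketch, not something the talk prescribes.

```typescript
// Hypothetical helper: report `value` as a coarse bucket so that many
// different devices end up sharing the same reported number.
function coarsen(value: number, cap: number): number {
  if (!(value >= 1) || !(cap >= 1)) {
    throw new RangeError("value and cap must be at least 1");
  }
  // Largest power of two that does not exceed `value`.
  let bucket = 1;
  while (bucket * 2 <= value) {
    bucket *= 2;
  }
  // Values above the cap all collapse into the same top bucket.
  return Math.min(bucket, cap);
}
```

For example, devices with 5, 6 or 7 units of some resource would all be reported as 4, which gives developers a usable hint while revealing less about any single device.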
Constraint: Platform neutrality
Optimize for but work without:
- Different CPU set up (architecture, # of cores, power)
<insert future hardware here>
As a final characteristic to call attention to in the context of bringing Machine Learning to the Web, Web technologies need to be designed to run across many different platforms and architectures.
Machine learning tends to be very compute-intensive and thus would generally need a lot of optimization to run as efficiently as possible; while these optimizations are definitely relevant in the context of the Web platform, they need to be exposed to developers in a way that lets browsers run these optimizations on all the devices they operate on.
In some cases, the lack of specific hardware makes running a given application likely not worth it. Various mechanisms of feature detection and context establishment have been deployed in other technologies (for instance, detecting the availability of virtual reality gear in WebXR), and they let developers determine if and how to adapt their application to the specific end-user context, while preserving the user's privacy as much as possible.
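The "optimize for, but work without" idea can be sketched as a small feature-detection routine in TypeScript. The `EnvironmentLike` shape and the backend names are assumptions made for illustration; the two properties mirror `navigator.gpu` (the WebGPU entry point) and `navigator.hardwareConcurrency`, and the presence of `gpu` alone does not guarantee that a usable adapter can actually be obtained.

```typescript
// Minimal shape of the environment this sketch inspects.
interface EnvironmentLike {
  gpu?: unknown;
  hardwareConcurrency?: number;
}

// Hypothetical backend names, ordered from most to least optimized.
type Backend = "gpu" | "multi-threaded-cpu" | "single-threaded-cpu";

// Pick the most capable backend the environment appears to support,
// always falling back to one that works everywhere.
function pickBackend(env: EnvironmentLike): Backend {
  if (env.gpu !== undefined) {
    return "gpu";
  }
  if ((env.hardwareConcurrency ?? 1) > 1) {
    return "multi-threaded-cpu";
  }
  return "single-threaded-cpu";
}
```

In a browser, the global `navigator` object could be passed in. The design choice worth noting is that every path returns a working backend, so the application degrades rather than fails on less capable devices.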
Web Platform Pillars
For people coming from other execution environments, these constraints can sometimes appear overwhelming, and in truth, they can be pretty challenging to address in a number of cases. A good chunk of the open standardization process that W3C hosts revolves around finding the right trade-offs among the competing needs they represent.
Naturally, these constraints don't exist in a vacuum: they are the pillars that have enabled the Web platform to grow to be the most deployed software platform, available to more than 4 billion people around the globe, and with the largest community of developers available.
W3C has built processes, culture and institutional knowledge to ensure that the Web platform keeps pace with the evolving capabilities of computing devices while preserving the characteristics that make the Web the most universal platform available.
I hope the presentations and discussions at the workshop will pave the way for Machine Learning to bring its capabilities to the service of the many billions of users that rely on Web browsers in their daily life, and I hope this presentation helped you better understand what it means to bring Machine Learning capabilities to the Web platform.
Thank you for your time.