APIs, Safety, and User Notifications on the Web

Mozilla strongly advocates One Web with different views into it, depending on the limitations or abilities of the device being used to access the web. At Mozilla, we generally apply the term Device API to mean Application Programming Interfaces (APIs) available to web content that allow access to hardware or underlying platform features, including cameras, GPS systems, connectivity, battery levels, and Personal Information Management (PIM) stores.

From Mozilla's perspective, these APIs will manifest themselves as Web APIs exposed to JavaScript that are equally applicable on desktop PCs, laptops, and mobile devices. Mozilla's software, which includes the Firefox web browser, the Fennec browser ("mobile Firefox") for mobile platforms, and platform technology such as XULRunner, propagates this notion of One Web within browsers, widgets, and web runtime variations. Thus, in the positions Mozilla advocates, security considerations for Device APIs are distilled as general security considerations for Web APIs.

Mozilla's Positions for "Security to Device APIs from the Web" Workshop

Mozilla encourages participation from user agent and web runtime software organizations to standardize Device APIs. As mentioned above, Mozilla sees these Device APIs as general Web APIs exposed to JavaScript, implemented in user agents, widgets, and web runtimes.

As a guiding principle, Mozilla believes that the user should be in control of any sensitive information shared with web applications, and that permissions, once granted, should be easy to review (e.g., by seeing a list of sites that have been granted permissions and what those permissions are) and easy to revoke.
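The review-and-revoke principle can be sketched as a small data structure. This is a minimal illustration only: PermissionStore and its method names are hypothetical, invented for this example, and do not describe any Firefox or Fennec interface.

```javascript
// Hypothetical permission store; names are illustrative, not a real browser API.
class PermissionStore {
  constructor() {
    this.grants = new Map(); // origin -> Set of granted feature names
  }

  // Record that the user granted `feature` to `origin`.
  grant(origin, feature) {
    if (!this.grants.has(origin)) {
      this.grants.set(origin, new Set());
    }
    this.grants.get(origin).add(feature);
  }

  // Deny by default: only an explicit, still-active grant returns true.
  isAllowed(origin, feature) {
    const features = this.grants.get(origin);
    return features !== undefined && features.has(feature);
  }

  // Review: list every site holding permissions, and what they are.
  review() {
    return [...this.grants].map(([origin, features]) => ({
      origin,
      features: [...features].sort(),
    }));
  }

  // Revoke one permission; forget the site once nothing remains.
  revoke(origin, feature) {
    const features = this.grants.get(origin);
    if (features === undefined) return;
    features.delete(feature);
    if (features.size === 0) this.grants.delete(origin);
  }
}

const store = new PermissionStore();
store.grant("https://maps.example", "geolocation");
store.grant("https://maps.example", "camera");
store.revoke("https://maps.example", "camera");
console.log(store.review());
```

Keeping grants keyed by site makes the review list a direct read of the stored state, so what the user sees is exactly what is enforced.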

This paper presents three positions on security for access to Device APIs from the web, and does so from the perspective of a user agent (web browser) such as Firefox or Fennec:

User agents should NOT present modal dialog boxes to users for soliciting permission
Device APIs should be asynchronous in nature
Web content panes should not solicit user permission independently from a user agent

Furthermore, we introduce two areas for further exploration and discussion, but on which Mozilla does not present a position here:

In some cases, network operators should have control over security policy
Trust and reputation systems can be used by users to make decisions

The next section discusses these positions in further detail.

No Modal Dialog Boxes To Solicit Permission

Historically, when an application seeks to cross a safe "sandboxed" execution zone to one that has implications for the user (including privacy or security implications), a dialog box has been raised to alert users. Mozilla's position is that such dialog boxes should expressly NOT be modal and blocking in nature, AND that access to features that are not part of the safe execution zone should be denied by default.

Modal dialog boxes require user interaction, and application flow is blocked until that interaction takes place (e.g., the user is presented with a persistent, blocking dialog box asking whether they want to "Grant Permission Once | Always Grant Permission | Cancel"). This creates potentially unsafe scenarios in which users pick an option that may not be the safest one, simply because they want the dialog box dismissed. It may also lead to inadvertent selections, and malicious content can raise dialog boxes in a loop to force a decision. The risk of inadvertent selection may be greater in the mobile context, where devices may feature touch screens. In general, Mozilla's experience is that users often make selections in dialog boxes without fully absorbing the implications of those selections. Papers and studies that support this reasoning about modal dialog boxes include:

A study conducted by the Psychology Department at North Carolina State University, cited by Ars Technica, on the actual impact dialog boxes have on user actions. The study suggests that users do not take sufficient time to absorb presented information, and that users generally find such interruptions more of a hindrance than a benefit in reasoning through security implications. Note that while Mozilla looks forward to publication of this study (scheduled for appearance in Proceedings of the Human Factors and Ergonomics Society), Mozilla does not agree with the editorial tone of the Ars Technica article with respect to user aptitude.
A paper by Peter Gutmann (University of Auckland) discussing usability factors in security. General themes include users "clicking to dismiss" for flow continuation, unhelpful messaging through dialog boxes, and the mismatch between a software developer's ideal expectations about a user's actions, and a user's actual actions.

Instead of modal dialog boxes, Mozilla thinks safer options include "infobar notifications" that do not block the user, and that do not block flow. A classic example of this approach has been Firefox's use of the infobar when popup windows are blocked (essentially forcing silent non-blocking failures of window.open()). A mockup for the Fennec device browser might look like this:

[Figure: mockup of the popup-block infobar in Fennec]

Note that web content continues to render below the infobar despite the notification of a silent failure (in this case, the "failure" to open a popup window, a JavaScript call whose failure typically doesn't affect the flow of the rest of the content).

Caveat: User Interfaces (UI) presented in this section are for illustrative purposes only, and do not reflect final product UI in Fennec or Firefox. A "close" option, for example, is missing, and is likely to be added in subsequent iterations of this UI.

The user can deal with the information at their leisure, and process preferences outside the flow of web content, such that layout isn't blocked by the information. Extending this idea to Device APIs that cross a "sandboxed" web content barrier, the "infobar" could be used as an asynchronous mechanism to notify users, solicit a decision, and then render (or re-render after a silent failure) a page after the user's decision has been noted. Here is an example of this in the case of the Geolocation API, which makes a JavaScript call for location information from the underlying platform or hardware implementation:

[Figure: mockup of the infobar for a Geolocation invocation]

Again, web content will render below the infobar. User agent implementations may choose to silently fail the initial invocation, but re-invoke the API after permission (and specificity) has been granted. In order to enable this mechanism of user notification and user interaction, Device APIs should be asynchronous.
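The notify, decide, then re-invoke flow described above can be sketched as follows. This is a simplified model under stated assumptions: InfobarUserAgent, requestPosition, userDecides, and readPosition are hypothetical names, the coordinates are made up, and a real user agent would involve far more than this.

```javascript
// Hypothetical user agent model; none of these names are real browser internals.
class InfobarUserAgent {
  constructor() {
    this.pending = [];  // permission requests awaiting a user decision
    this.infobars = []; // messages shown without blocking page rendering
  }

  // Called by web content. Returns immediately; nothing is delivered yet.
  requestPosition(origin, onPosition) {
    this.pending.push({ origin, onPosition });
    this.infobars.push(`${origin} wants to know your location.`);
  }

  // Called whenever the user gets around to answering the infobar.
  userDecides(origin, allowed) {
    const stillWaiting = [];
    for (const request of this.pending) {
      if (request.origin !== origin) {
        stillWaiting.push(request);
      } else if (allowed) {
        // Re-invoke on behalf of the page now that permission exists.
        request.onPosition(this.readPosition());
      }
      // On denial the request is dropped silently: no callback fires.
    }
    this.pending = stillWaiting;
  }

  // Stand-in for the platform's location provider (made-up coordinates).
  readPosition() {
    return { coords: { latitude: 37.39, longitude: -122.08 } };
  }
}

const agent = new InfobarUserAgent();
const log = [];
agent.requestPosition("https://maps.example", (p) => log.push(p.coords.latitude));
log.push("page keeps rendering");
agent.userDecides("https://maps.example", true);
console.log(log); // the rendering note comes first, then the latitude
```

Because the request returns before any decision exists, the page's own work is never held up by the prompt, which is the property a modal dialog box cannot offer.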

Asynchronous Device APIs

Device APIs should be asynchronous; in particular, user agents should not have to block on return values of Device API function calls, and Device APIs should be driven by callbacks.

The stipulation that APIs be asynchronous supports the point about NOT having modal (blocking) dialog boxes, since waiting for a return value (in synchronous APIs) makes non-modal user interfaces (UI) difficult to design. The security principle behind this position is thus one of UI flow for permission solicitation. More generally, asynchronous APIs are useful in scenarios where developers have to consider network latency, since waiting for return values can cause blocking or application delays (including timeouts). Because asynchronous APIs avoid such lags and timeouts, user agents can silently fail function calls until user permission has been granted.

Furthermore, synchronous APIs can reveal information about the user. For instance, an API design that blocks until the user makes a selection must explicitly inform the web application that the user has denied it access, usually through an error code. An asynchronous design carries no such obligation. In the more specific case of geolocation, a simple failure to return location information could have many causes, including user permission denial, lack of signal strength, or lack of a GPS subsystem, and the user agent need not tell the web application which of these applies.
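The ambiguity argument can be made concrete with a small model. This is a sketch of the stance taken in this section rather than of any shipping API: locate, observe, and the shape of the conditions object are all hypothetical.

```javascript
// Hypothetical model: `locate` and `conditions` are illustrative names only.
function locate(conditions, onPosition) {
  const { permitted, hasGps, hasSignal } = conditions;
  if (permitted && hasGps && hasSignal) {
    onPosition({ coords: { latitude: 37.39, longitude: -122.08 } });
  }
  // Otherwise: silent failure. The cause is never reported to web content.
}

// What a page can observe: whether its callback ran, and nothing more.
function observe(conditions) {
  let outcome = "no position";
  locate(conditions, () => { outcome = "position received"; });
  return outcome;
}

const ok = { permitted: true, hasGps: true, hasSignal: true };
console.log(observe({ ...ok, permitted: false })); // user denied
console.log(observe({ ...ok, hasSignal: false })); // weak signal
console.log(observe({ ...ok, hasGps: false }));    // no GPS subsystem
console.log(observe(ok));                          // everything available
```

The first three calls produce the same observation, so in this model a page cannot infer from a missing position that the user declined.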

The W3C Geolocation API serves as a good illustration of asynchronous API design:

    function showMap(position) {
      // Show a map centered at
      // (position.coords.latitude, position.coords.longitude).
    }

    // One-shot position request.
    navigator.geolocation.getCurrentPosition(showMap);

In the example above, the Geolocation object's asynchronous getCurrentPosition function is invoked with a developer-defined callback function called showMap, which receives the position once the asynchronous call completes.
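A slightly fuller invocation might look like the sketch below. It assumes the published W3C interface, in which getCurrentPosition also accepts an optional error callback and an options object, and in which coordinates sit under position.coords. The helper centerMapOnUser, the stub object, and the coordinates are hypothetical and exist only so the sketch can run outside a browser.

```javascript
// `geolocation` is passed in so the sketch also runs outside a browser.
function centerMapOnUser(geolocation, showMap, showFallback) {
  if (!geolocation) {
    showFallback("geolocation unsupported");
    return;
  }
  geolocation.getCurrentPosition(
    showMap,
    (error) => showFallback(error.message),
    { timeout: 10000, maximumAge: 60000 } // both in milliseconds
  );
}

// In a browser: centerMapOnUser(navigator.geolocation, showMap, showFallback);
// Outside a browser, a stub (assumed here) stands in for the user agent.
const stubGeolocation = {
  getCurrentPosition(success) {
    success({ coords: { latitude: 37.39, longitude: -122.08 } });
  },
};

centerMapOnUser(
  stubGeolocation,
  (position) => console.log(position.coords.latitude, position.coords.longitude),
  (reason) => console.log("No map centering:", reason)
);
```

The call returns immediately in every case; whichever callback eventually runs does so on the user agent's schedule, not the page's.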

Web Content Should Not Solicit User Permission Directly

Recently, a security bug (which has since been fixed) was uncovered in Flash whereby the configuration and settings panel could be spoofed via a clickjacking attack from within web content. In particular, this resulted in inadvertent access to the device's camera. This example illustrates the potential for malicious access to hardware and platform features through APIs available to web content (plugins in particular), but which fall outside the scope of Device APIs directly supported by user agents.

Mozilla proposes that web content should not directly solicit user permission, and strongly recommends that plugins in particular should adopt identical UI flow for soliciting user permission as Device APIs exposed to JavaScript in user agents.

A hypothetical scenario in which a plugin such as Flash supports the same Device API as user agents illustrates this point. Suppose a plugin supports the W3C Geolocation API, which is also directly supported by user agents. Currently, a web application leveraging this plugin would present a permission-granting flow in UI specific to the plugin, while a web application using the user agent's API directly would present a different permission-granting flow specific to the user agent. The net result is user confusion, and an increased attack surface for tricking users into granting permission through web content.

Note that advocating this position does mean that the legacy Netscape Plugin API (NPAPI) may have to be reworked so that binary extensions of user agents can also asynchronously invoke the user agent's UI and permission model. It is not Mozilla's intent to dictate final UI in other applications, including plugins; the position advocated in this paper, that a single UI flow for permission solicitation across similar features minimizes user confusion, is a recommendation, not a requirement.
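One way to picture a single permission flow shared by scripts and plugins is a broker owned by the user agent. The sketch below is purely illustrative: PermissionBroker and its methods are hypothetical, and nothing here describes NPAPI or any existing plugin interface.

```javascript
// Hypothetical broker: one user-agent-owned prompt path for every caller.
class PermissionBroker {
  constructor() {
    this.prompts = [];        // infobar prompts the user agent has raised
    this.waiting = new Map(); // "origin|feature" -> callbacks awaiting a grant
  }

  // Scripts and plugins both enter here; neither draws its own prompt.
  request(origin, feature, onGranted) {
    const key = `${origin}|${feature}`;
    if (!this.waiting.has(key)) {
      this.waiting.set(key, []);
      this.prompts.push(`${origin} wants to use ${feature}.`);
    }
    this.waiting.get(key).push(onGranted);
  }

  // The user answers once, in user agent UI, on behalf of all callers.
  decide(origin, feature, allowed) {
    const key = `${origin}|${feature}`;
    const callbacks = this.waiting.get(key) || [];
    this.waiting.delete(key);
    if (allowed) {
      for (const onGranted of callbacks) onGranted();
    }
    // On denial every caller fails silently.
  }
}

const broker = new PermissionBroker();
const served = [];
broker.request("https://maps.example", "geolocation", () => served.push("script"));
broker.request("https://maps.example", "geolocation", () => served.push("plugin"));
broker.decide("https://maps.example", "geolocation", true);
console.log(broker.prompts.length, served);
```

Both callers are served by one prompt, so the user sees the same UI regardless of whether the request came from JavaScript or from a binary extension.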

Further Areas for Exploration

The areas below are presented as discussion points, but not as Mozilla's advocated positions. We welcome further discussion on these ideas.

The Role of Network Operators and Enterprise Proxies

Typically, security discourse on the web has featured protecting users from malicious web content. The proliferation of mobile devices as a means to access web content raises the additional role of the network operator as an entity that must also be protected by Device API security considerations. Not all network operators may wish to enable applications that access cameras, for example. There may be valid bandwidth reasons for limiting certain camera applications, alongside valid contractual reasons.

Similar reasoning applies to enterprise proxies, whose operators may wish to limit access to hardware and platform features as well (including cameras and geolocation).

As a hypothetical example, suppose a given device is in a location where it has a choice of multiple wireless networks. One of the networks expressly does NOT allow the device to upload camera captures (whether video or still images). Another does. If both networks exposed this as a standardized policy, the device could choose, based on policy, which network to use for web applications that rely on a camera API.
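The network selection in this scenario could be sketched as below. No standardized policy format is defined in this paper, so the policy shape, the cameraUpload key, the network names, and chooseNetwork are all assumptions made for illustration.

```javascript
// Hypothetical policy shape; no standardized format is defined here.
const networks = [
  { name: "network-a", policy: { cameraUpload: false } },
  { name: "network-b", policy: { cameraUpload: true } },
];

// Return the first network whose published policy permits the capability.
// A network that publishes nothing about a capability is not selected.
function chooseNetwork(available, capability) {
  const match = available.find(
    (network) => network.policy && network.policy[capability] === true
  );
  return match ? match.name : null;
}

console.log(chooseNetwork(networks, "cameraUpload")); // network-b
console.log(chooseNetwork(networks, "geolocation"));  // null
```

Treating a missing policy entry as "not permitted" is one possible design choice; a real mechanism would have to define what silence from a network means.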

Mozilla solicits general developer feedback on mechanisms (including standardized interfaces) that allow networks to expose policies.

The Potential Use of Trust and Reputation Systems

The proliferation of mobile devices that both access web applications and hold Personal Information Management stores (such as address books) creates scenarios where trust and reputation systems can be used to make decisions, including decisions about hardware or platform access from web applications through Device APIs.

For example, in a non-modal infobar, the following information may be presented to users of a hypothetical application called Map Your 'Hood:

12 people in your address book have allowed Map Your 'Hood to know where they are. Would you like to tell Map Your 'Hood Exact Location | Neighborhood | City | Nothing ?

Your friend Dominique has allowed Map Your 'Hood to take photos of locations and post them. Would you like to allow Map Your 'Hood to access your camera?
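A prompt like the first one above could be assembled from address book data roughly as follows. This is only a sketch: buildLocationPrompt, the shape of the grants map, and the contact names other than Dominique are hypothetical, and how such data would be collected or shared is outside its scope.

```javascript
// Hypothetical data: `grants` maps a contact name to the features that
// contact has allowed for the application in question.
function buildLocationPrompt(appName, addressBook, grants) {
  const count = addressBook.filter((contact) =>
    (grants.get(contact) || []).includes("geolocation")
  ).length;
  const subject = count === 1 ? "1 person" : `${count} people`;
  const verb = count === 1 ? "has" : "have";
  return {
    message:
      `${subject} in your address book ${verb} allowed ${appName} ` +
      `to know where they are. Would you like to tell ${appName}`,
    choices: ["Exact Location", "Neighborhood", "City", "Nothing"],
  };
}

const grants = new Map([
  ["Dominique", ["geolocation", "camera"]],
  ["Sam", ["geolocation"]],
  ["Alex", []],
]);
const prompt = buildLocationPrompt(
  "Map Your 'Hood",
  ["Dominique", "Sam", "Alex", "Robin"],
  grants
);
console.log(prompt.message);
console.log(prompt.choices.join(" | "));
```

The choices are returned separately from the message so that a non-modal infobar can render them as distinct controls.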

Essentially, the potential use of social networks to establish trust and reputation for applications that seek to cross "sandboxed" boundaries may be evaluated against operator certification, or against other reputation mechanisms, including verification by a certificate authority in signed-code scenarios.