Whitepaper: Handling Trust and Permissions in Web Applications

This is an incomplete work in progress; comments are welcome!

Dave Raggett, W3C
July 2014

This white paper surveys both Web and native application platforms for how they approach the challenge of addressing trust, especially for capabilities requiring elevated permissions. The motivation for this work is to prepare for discussions on a road map for shared open standards for permissions for the Open Web Platform, as developers demand richer capabilities comparable to those available on native app platforms.

Introduction

The Open Web Platform is based upon open standards and is supported on billions of devices. It is the only vendor neutral platform that spans such a wide range of devices (e.g. desktop, smart phones, tablets, TVs and cars). For developers seeking to reach a wide range of devices and operating systems, the Web is the obvious choice. However, the success of the Web has encouraged proprietary platform owners to support native applications and app stores. This has been very successful on mobile devices, where developers have been able to take advantage of vendor support for a comprehensive range of APIs. A performance and capability gap has emerged between native and Web apps. Hybrid approaches have emerged which allow developers to use standard web technologies together with proprietary extensions, and then compile to the native app platform.

Web applications are traditionally hosted by HTTP servers, with the various resources making up the application being loaded dynamically by the web run-time (i.e. a Web Browser). Web applications can also be packaged for local installation, akin to native applications. There is a lack of interoperability for packaged apps due to variations across platforms, e.g. for the associated manifests, and the use of proprietary APIs. This whitepaper surveys the field, but does not claim to be fully comprehensive. However, it should provide a broad picture of the approaches that have been taken with respect to handling trust and permissions, and potential ways to move towards a general consensus on how to extend the Open Web Platform.

Developer tools range widely in their sophistication. Some require advanced programming skills, whilst others are aimed at end users. Note that some of the platforms listed below are no longer available.

Native Platforms

This section looks at trust and permissions for native application platforms.

Google's Android Platform

Android 4.4 start screen

Android provides an extensive suite of APIs for applications written in the Java programming language. Each application includes a manifest with a declaration of the permissions that the app needs. Users are required to give their consent before the application is installed. The user may be asked again when the manifest for an app that is about to be upgraded requests an expanded set of permissions.

The Android permissions are as follows:

screenshot

This list was included to illustrate the extensive range of permissions, and that many of these are specific to the design of the platform as opposed to generic capabilities. Not all devices will support all capabilities, e.g. NFC and IR.

Android's consent form provides generic descriptions of the requested permissions, but not what the given application will do with them.

This approach encourages users to tap the ACCEPT button to proceed to try out the app, without an understanding of why the app needs these permissions. Trust is based upon the popularity and ranking of the app on the Android app store (Google Play), the name of the developer (if a well known brand), and faith in security apps like Lookout to intervene if you inadvertently try to install malware.

Android's approach to permissions means that developers can rely on all the permissions listed in the app's manifest as the user has to agree to them as a whole and can't pick and choose. Note that just because a permission was granted doesn't mean the device has the hardware to support the associated capability, so developers need to exercise some caution.

Apple's iOS Platform

Apple's iOS platform was the first major smart phone operating system, and runs apps developed with the Objective-C programming language.

iOS permission dialogue

A major difference between iOS and Android is that in Android, permissions are requested up front before installing an application, whereas in iOS, permissions are requested at the time of use of a particular capability, and users may deny the request. This means that app developers need to explain how the capability will be used prior to invoking the permission dialogue. If users tap "Don't Allow", there is no easy way for them to change their mind. Brenden Mulligan's TechCrunch post on the right way to ask users for iOS permissions recommends providing clear explanations of benefits, followed by an app generated dialogue asking for a permission, and if the user says yes, this is then followed by the operating system generated permissions dialogue. If the user says no to the app generated dialogue, the app can later ask again, without the user having to go through the complicated steps to undo the operating system's record of the user's "Don't Allow" action. Note that unlike Android, developers need to write their code to fail gracefully if the permission isn't forthcoming.
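The two-step pattern described above can be sketched in JavaScript. This is a minimal illustration only: requestWithPrePrompt, askInApp and askSystem are hypothetical names standing in for the app generated dialogue and the operating system dialogue, and none of them is part of any iOS API.

```javascript
// Sketch of the two-step permission request. Both callbacks are hypothetical:
// askInApp shows the app's own dialogue, askSystem triggers the OS dialogue.
// Each returns a Promise that resolves to true (yes) or false (no).
async function requestWithPrePrompt(askInApp, askSystem) {
  const wantsToProceed = await askInApp();
  if (!wantsToProceed) {
    // The OS dialogue was never shown, so the app is free to ask again later.
    return { granted: false, askedSystem: false };
  }
  // From here on the decision is recorded by the operating system.
  const granted = await askSystem();
  return { granted: granted, askedSystem: true };
}
```

Because the operating system dialogue is only reached after the user has already said yes once, a refusal at the first step costs nothing: the app can explain the benefit again and retry.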

Changing the permissions in iOS6: Step 1 - activate the Settings dialogue. Step 2 - tap on "Privacy". Step 3 - tap on the category of capabilities you are interested in. Step 4 - step through the list of applications and toggle the permission on or off.

settings dialogue privacy dialogue app permissions dialogue

This illustrates another difference from Android. Apple has opted for a coarse set of capabilities for users to deal with when it comes to permissions, in contrast to the very long list of fine grained capabilities listed above for Android.

Question: Does iOS allow you to view and change all the permissions for a given app in a single dialogue for that app?

Microsoft Windows Runtime

The Microsoft Windows Runtime enables developers to create apps with JavaScript, C#, Visual Basic, and C++ APIs for the Microsoft Windows Store. This covers devices ranging from tablets to desktops to large wall mounted touch screen displays. Having done so, you can port your app to Windows Phone for distribution on the Windows Phone Store. This wide range of devices presents challenges for designing the user experience, and Windows 8 supports a variety of navigation patterns.

"Windows Runtime APIs will look and feel familiar to experienced Web developers. They represent a clean extension of standards based web development APIs"

The Windows Runtime supports an extensive suite of asynchronous APIs, and developers can take advantage of standard components such as FileReader, Web Sockets, Geolocation, IndexedDB and others. Asynchronous programming is handled with Promises, which provide an object representing a value that has yet to be computed, or an error that has yet to occur. The Windows Runtime defines more than 800 individual classes and enums, along with a hierarchy of namespaces. The Windows Runtime APIs are split into the following categories:

Some APIs are restricted to Windows Runtime Apps and are not supported for desktop apps or browsers. APIs also distinguish between HTML and XAML as user interface markup languages. Windows Runtime apps using JavaScript are executed using the Windows Internet Explorer Standards mode. As a result some HTML and DOM APIs behave differently or aren't supported.

Anssi Kostiainen comments:

I believe the reason for Windows Runtime disabling some Web APIs is mainly due to security (document.write, innerHTML etc.) or UI/UX (alert, close etc.) reasons, and not due to the standards-compliant mode being used by the rendering engine. This is actually rather similar to how Chrome Apps [1] disable some APIs that are exposed to Chrome the browser by default.

This seems to suggest APIs should be designed in such a way they can be gated behind e.g. a promise.
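One way to read this suggestion is that a restricted API returns a promise that settles only once the permission decision is known. The sketch below is illustrative only: gate, checkPermission and readContacts are hypothetical names, not Windows Runtime or Chrome APIs.

```javascript
// Wrap an API so that every call first waits for a permission decision.
// checkPermission is a hypothetical function returning a Promise<boolean>.
function gate(checkPermission, api) {
  return async function (...args) {
    const allowed = await checkPermission();
    if (!allowed) {
      throw new Error('permission denied');
    }
    return api(...args);
  };
}

// Hypothetical restricted API, used only for illustration.
const readContacts = gate(
  async () => false,          // pretend the user declined
  () => ['Alice', 'Bob']
);

readContacts().catch((err) => console.log(err.message)); // logs "permission denied"
```

Callers handle a refusal in the same way as any other asynchronous failure, so the run-time is free to decide when and how to consult the user.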

Windows Phone

windows phone start screen

Apps for Windows Phone can be developed in JavaScript, C++, C#, and VB.NET. Windows Phone requires users to agree to a list of permissions upfront as a precondition to install apps from the Windows Phone Store. The permissions needed are listed on the left side of the app's page in the Store under the heading "App requires". Here is a list of permissions that apps can request according to the Windows Phone Central website.

The required permissions are declared in the app manifest which is an XML file generated by Visual Studio from the app's project settings. Windows Phone is similar to Android in requiring upfront permissions.

Blackberry 10 Native Apps

Blackberry 10 home screen

Blackberry 10 is based on the QNX operating system. The Blackberry 10 native SDK allows you to develop apps in C/C++ or QML (a JavaScript like scripting language for Qt). The operating system generates a dialog box automatically to request permissions from the user. The user can decide which requested permissions to grant and the choices are recorded for use when the app runs the next time, or even after the app upgrades or updates. Developers must list the permissions they want in the bar-descriptor.xml file. Permissions are divided into categories:

Permissions can be structured into main and nested permissions. Users can choose whether to grant all sub-permissions under the main permission or just a subset.

Cross Platform Frameworks

These frameworks allow developers to create apps using cross platform technologies, and either compile them into native apps that can run on a variety of operating systems, or provide a native run-time that can execute cross platform code. This can cut the time to deliver to multiple target platforms. Cross platform frameworks inherit the trust/permissions model from the native platform.

According to Research Guidance (see below), the most popular tools are PhoneGap and jQuery Mobile, followed by Adobe Air, Qt Creator, Unity 3D, Titanium, Marmalade, Sencha Touch, Xamarin, Unity Mobile, and Corona SDK. Cross platform tools are mainly used to develop apps for games, followed by utilities, business, education and entertainment in decreasing order of popularity.

For more in-depth reviews of cross platform frameworks, see:

The following covers just a few cross platform frameworks. A wider set is covered in the appendices. These include: Telerik, Appcelerator Titanium, Xamarin/Mono, Qt, Unity 3D/Mobile, Corona SDK, Marmalade, GINGEE, Codename One, DragonRad, RunRev LiveCode, IBM Worklight, MoSynch, RhoMobile, and Whoop. Further research is needed to identify how most of these solutions approach the challenge of permissions.

PhoneGap/Apache Cordova

PhoneGap is an open source distribution of Cordova, which is a mobile app development framework supported by the Apache Cordova project, see the PhoneGap FAQ. It allows developers to use HTML5 and JavaScript to create native apps for Apple iOS, Blackberry, Google Android, LG webOS, Microsoft Windows Phone, Nokia Symbian OS, Tizen, Bada, Firefox OS and Ubuntu Touch. There are separate SDKs for each target platform. Cordova supports a small set of APIs, and these can be supplemented through plugins. The core plugins include:

Some of the commonly downloaded plugins include: device, console, file, inappbrowser, network-information, dialogs, splashscreen, camera and geolocation. At the time of writing this whitepaper, the Cordova website listed 234 plugins.

Adobe AIR

Flash settings panel

AIR (Adobe Integrated Runtime) is a cross platform run-time system for desktop and mobile, based upon Adobe's ActionScript and the Adobe Flash Player, together with extensions for device capabilities such as access to the local file system, taskbar/dock integration, accelerometer and GPS. The permissions model is inherited from Flash and tailored for each target platform, e.g. iOS, Android, Blackberry, and so forth. The screen shot is for the Flash Settings panel on Linux and has tabs for:

Web-based Platforms

The Web is characterised by the availability of browsers on many different devices and operating systems, and from a variety of different vendors. There is good interoperability for core features, although application developers do need to consider variations in support, especially from older browsers, or for new features where the standards are still emerging. Popular web libraries like jQuery can simplify development through APIs that mask differences across browsers.

A recent trend is the emergence of platforms that combine the core features of the web with proprietary features for an alternative to native application platforms. These support server based applications in the same way as conventional web browsers, and also support installed applications, where the various components have been packaged into a single file for easy installation and offline operation.

More recently, there is growing interest in enabling server-hosted applications to work better offline using Service Workers. Related work on application manifests provides a means for developers to supply the metadata associated with a web application. As these mature, they are expected to provide cross vendor alternatives to packaged apps, which today need to be tailored to each platform due to variations in packaging formats across vendors.

Chrome Apps

Google Chrome Apps are essentially web applications that run on the Google Chrome web run-time and execute without the regular browser UI (aka chrome). Google supports both server hosted and packaged apps.

Chrome Apps deliver an experience as capable as a native app, but as safe as a web page. Just like web apps, Chrome Apps are written in HTML5, JavaScript, and CSS. But Chrome Apps look and behave like native apps, and they have native-like capabilities that are much more powerful than those available to web apps. Chrome Apps have access to Chrome APIs and services not available to traditional web sites.

The app lifecycle has the following steps:

Installation
The user picks the app from the app store and chooses to install it, and in the process explicitly grants the permissions the app requests in its manifest.
Start up
The app launches with an "event page" and one or more "app pages".
Termination
The user or the operating system can terminate apps at any time. Apps can save their state for subsequent invocations.
Update
Apps can be updated at any time, but this doesn't affect apps while they are currently running.
Uninstallation
Users can uninstall apps. Google Chrome ensures that all executing code and private data associated with the app are purged.

Chrome apps use the same security model as the Open Web Platform. This includes the Same Origin model, support for Content Security Policies, app local storage, and isolation between different windows for the same app. The permissions model requires upfront consent by the user. It is unclear what requirements there are for app developers to explain to the user just how each of the requested permissions will be used by the app. This makes Chrome Apps along with Android subject to the click through effect, where users feel encouraged to give consent in order to start using the app.

The following lists the currently available permissions:

This can be compared with the Android permissions as described above. Android provides a more extensive set of permissions that reflect the richer integration with operating system level capabilities.

Some web features aren't available for use by Chrome apps or else are supported in a different way. Google justify this as avoiding security issues and improving programming practices. Some examples include cookies and document.write. For more details see Google's page on Disabled Web Features.

Firefox OS

Firefox OS is a Mozilla platform for smart phones and tablets based upon a web run-time layered on top of the Linux kernel. This allows developers to create apps using HTML5, JavaScript and CSS.

"The webapps platform that we use in FirefoxOS and Firefox Desktop allows any website to be an app store", Jonas Sicking, 2 June 2014, Mozilla

The way Firefox OS handles app permissions distinguishes between hosted apps and packaged apps. Hosted apps are dynamically downloaded from websites. Packaged apps are installed on the device analogous to native apps on other platforms, and are divided up into three categories:

Privileged and certified apps are required to have content security policies. Firefox OS makes use of JSON manifest files that are linked from the HTML for hosted apps or included as part of packaged apps. All apps are required to invoke an installation method to register the manifest. This directs Firefox OS to validate the app and ask the user for approval to install the app.

Here is a list of permissions with the minimum app type required, and whether the permission is enabled by default, or results in a prompt at the time of use. Note that permissions for certified apps are intended for system level applications.

| Permission | Description | Minimum | Default |
|---|---|---|---|
| alarms | schedule notification or app to be started | hosted | allow |
| audio capture | get audio stream from e.g. microphone | hosted | prompt |
| audio channel alarm | alarms from clock or calendar | privileged | allow |
| audio channel content | music, video | hosted | allow |
| audio channel normal | UI sounds, web content, music, radio | hosted | allow |
| audio channel notification | new email, incoming SMS | privileged | allow |
| browser | enables browser in iframe | privileged | allow |
| camera | take photos, video, record audio, control camera | privileged | prompt |
| contacts | read/write access to contacts on device or SIM | privileged | prompt |
| desktop notification | display notification on desktop | hosted | prompt for hosted apps, otherwise allow |
| device storage music | read/write access to music stored on device | privileged | prompt |
| device storage pictures | read/write access to pictures stored on device | privileged | prompt |
| device storage sdcard | read/write access to files stored on SD card | privileged | prompt |
| device storage videos | read/write access to video stored on device | privileged | prompt |
| fmradio | control fm radio | hosted | allow |
| geolocation | access device location | hosted | prompt |
| keyboard | allow app to act as virtual keyboard | privileged | allow |
| mobile network | access network info e.g. MCC, MNC | privileged | allow |
| push | enable app to wake up for notification | hosted | allow |
| storage | utilize appcache, indexedDB | hosted | allow |
| system XHR | enable cross origin XHR without CORS | privileged | allow |
| tcp socket | create and use TCP sockets | privileged | allow |
| video capture | obtain video stream from e.g. camera | hosted | prompt |
| attention | allow apps to open window in front of other apps | certified | allow |
| audio channel ringer | incoming phone calls | certified | allow |
| audio channel telephony | telephone and VoIP calls | certified | allow |
| audio channel notification | forced camera shutter sounds | certified | allow |
| background sensors | listen to proximity events in background | certified | allow |
| background service | allow apps to run in background | certified | allow |
| bluetooth | low level access to Bluetooth hardware | certified | allow |
| cell broadcast | fire event e.g. on emergency network notification | certified | allow |
| device storage apps | read/write files in apps storage location | certified | allow |
| embed apps | allow embedding of apps in mozApp frames | certified | allow |
| idle | notify when user is idle | certified | allow |
| mobile connection | access to info about voice and data connection | certified | allow |
| network events | monitor network uploads and downloads | certified | allow |
| network stats manage | access stats of data usage per interface | certified | allow |
| open remote window | window.open as new process | certified | allow |
| permissions | allow app to manage permissions of other apps | certified | allow |
| power | turn screen on/off, control CPU, listen to lock events | certified | allow |
| settings | configure and read device settings | certified | allow |
| sms | send and receive SMS | certified | allow |
| telephony | enable telephony APIs to make and receive calls | certified | allow |
| time | set current time | certified | allow |
| voicemail | access voicemail | certified | allow |
| webapps manage | manage installed open web apps | certified | allow |
| wifi manage | enumerate networks, access strength, connect to network | certified | allow |
| wappush | receive WAP push messages | certified | allow |

The Firefox OS approach to permissions implicitly grants some permissions depending upon the application type (hosted, privileged or certified) and asks the user for approval for other permissions, e.g. access to the camera. When Firefox OS doesn't prompt, trust is based upon the review performed by a human being (the App Store reviewer) who approves adding the app to the app store.
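The decision described here can be modelled as a simple lookup. The sketch below is not Firefox OS code: the PERMISSIONS map copies only a handful of rows from the table above, and LEVELS and decide are hypothetical names introduced for illustration.

```javascript
// App types ordered from least to most privileged.
const LEVELS = ['hosted', 'privileged', 'certified'];

// A few rows copied from the table above: minimum app type and default action.
const PERMISSIONS = {
  'alarms':      { minimum: 'hosted',     defaultAction: 'allow'  },
  'geolocation': { minimum: 'hosted',     defaultAction: 'prompt' },
  'camera':      { minimum: 'privileged', defaultAction: 'prompt' },
  'tcp socket':  { minimum: 'privileged', defaultAction: 'allow'  },
  'sms':         { minimum: 'certified',  defaultAction: 'allow'  }
};

// Returns 'allow', 'prompt' or 'deny' for a given app type and permission.
function decide(appType, permission) {
  const entry = PERMISSIONS[permission];
  if (!entry) return 'deny';                      // unknown permission
  if (LEVELS.indexOf(appType) < LEVELS.indexOf(entry.minimum)) {
    return 'deny';                                // app type is not privileged enough
  }
  return entry.defaultAction;                     // implicit grant or prompt at time of use
}
```

For example, decide('hosted', 'camera') yields 'deny' whereas decide('privileged', 'camera') yields 'prompt', which matches the camera row of the table.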

An open question is whether Firefox OS allows you to view all the permissions for a given app and choose whether to allow or deny them. It is likely that you can't deny permissions for certified apps.

Ubuntu Web Apps

Note: Ubuntu also supports QML based apps, see e.g. this tutorial.

Ubuntu Web Apps enable Ubuntu users to run online applications like Facebook, Twitter, Last.FM, Ebay and GMail directly from the desktop, with web apps treated as first class citizens. This means that you can search for and invoke web apps in just the same way as for native apps. Web apps can also be selected for particular roles e.g. chat or photo sharing. Apps can access standard Web APIs as well as Ubuntu platform APIs such as Content Hub, Alarms and Online Accounts. Cordova is also supported, and provides access to system and device level functionality like the camera and accelerometer, see the Ubuntu HTML5 apps developer page.

Ubuntu Policy Generation

HTML5 apps are executed in a security sandbox (AppArmor). Each app needs to provide a security profile using a web form to generate an AppArmor policy file before upload to the Ubuntu Software Center. Users can set their own security policy, and where this conflicts with app policies, this will block the apps from being installed or executed. AppArmor supports the following restrictions:

In addition to AppArmor, sandboxing will be required by other parts of Ubuntu, e.g.

In summary, Ubuntu web apps are subject to security policies set by the developer or the user for access to privileged APIs that extend the Open Web Platform.

Nokia's Cloudberry

"A cloud phone is a mobile device in which all customer-facing functionality is downloaded and cached dynamically from the Web, including all the applications and even the entire top-level user interface (UI) of the device."

Cloudberry is an HTML5-based cloud phone software platform developed by Nokia Research Center. In Cloudberry, all mobile device applications are written as Web applications, including core ones such as the phone dialer, contacts, calendar, messaging, music player, and maps. Offline support relies on standard HTML5 features. Device APIs are based on official W3C Device APIs wherever applicable, and proprietary APIs are used in those areas that standards don't yet cover. The security model is based upon Web Domains, i.e. the standard security model for HTML5. Cloudberry has a permission-based security model that restricts the use of device-specific functionality (such as device APIs) to only those applications from trusted domains.

The above draws upon material from a post by Antero Taivalsaari and Kari Systa in February 2013.

Tizen

Tizen supports web applications as signed web widgets installed from the Tizen app store, with the standard HTML5 APIs plus Tizen specific APIs. The Tizen specific APIs are scoped to the tizen object, e.g.

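try {
  // Tizen specific APIs are reached through the global tizen object
  var adapter = tizen.nfc.getDefaultAdapter();
} catch (err) {
  // Report the failure, e.g. if access to the NFC API is refused
  console.log(err.name + ": " + err.message);
}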

The Tizen APIs fall into the following categories:

Widgets require authorization to access restricted APIs. The widget manifest file lists the features that the application wants to be able to access. The manifest is represented in XML and each feature is assigned a URL based name, e.g.

<widget xmlns="http://www.w3.org/ns/widgets">
  <feature name = "http://example.com/api/contact" required = "false"/>
</widget>

Following the W3C Widget Access Request Policy (WARP), the app manifest is also used to declare which network resources (such as XMLHttpRequest, iframe, img, script, etc.) the widget would like to access, as by default, widgets are not allowed to access the network, e.g.

<access origin="http://example.org:8080" subdomains="false"/>
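The matching implied by such a declaration can be sketched in JavaScript. This is a simplified reading of WARP rather than the Tizen implementation: isAccessAllowed is a hypothetical helper, and the rule object simply mirrors the origin and subdomains attributes of the access element.

```javascript
// Decide whether a request URL is covered by one <access> declaration.
// The rule is a plain object mirroring the XML attributes, e.g.
// { origin: 'http://example.org:8080', subdomains: false }.
function isAccessAllowed(rule, requestUrl) {
  if (rule.origin === '*') return true;   // wildcard: any network resource
  const allowed = new URL(rule.origin);
  const target = new URL(requestUrl);
  const sameScheme = allowed.protocol === target.protocol;
  const samePort = allowed.port === target.port;
  const sameHost = allowed.hostname === target.hostname;
  // Subdomains only count when the declaration opts in to them.
  const isSubdomain = rule.subdomains === true &&
    target.hostname.endsWith('.' + allowed.hostname);
  return sameScheme && samePort && (sameHost || isSubdomain);
}
```

With the declaration shown above, a request to http://example.org:8080/data.json is covered, whereas requests to http://api.example.org:8080/ or to http://example.org/ are not.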

The Tizen web runtime grants access to features according to the policy, which sets which prompt type is to be used to request user approval.

Here is a sample Tizen policy file:

<policy-set id="Tizen-Policy" combine="first-matching-target">
  <policy id="Tizen-Policy-Trusted" description="Tizen's policy for trusted domain" combine="permit-overrides">
    <rule effect="prompt-session">  <!-- rules for specific resources -->
      <condition combine="and">
        <condition combine="or">
          <resource-match attr="device-cap" func="equal" match="XMLHttpRequest" />
          <resource-match attr="device-cap" func="equal" match="externalNetworkAccess" />
          <resource-match attr="device-cap" func="equal" match="messaging.send" />
        </condition>
        <environment-match attr="roaming" match="true" />
      </condition>
    </rule>
    <rule effect="permit" />  <!-- all other matches -->
  </policy>
</policy-set>
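Read literally, the sample policy prompts once per session when a widget uses one of three network related capabilities while roaming, and permits everything else. The sketch below expresses that reading in JavaScript. It is an interpretation of the sample file only and does not reproduce the Tizen policy engine: evaluateSamplePolicy and the deviceCap and roaming fields are hypothetical names, and it assumes that rules are tried in order with the first match taking effect.

```javascript
// Capabilities named by the <resource-match> elements of the first rule.
const PROMPTED_CAPABILITIES = [
  'XMLHttpRequest',
  'externalNetworkAccess',
  'messaging.send'
];

// request is a hypothetical object such as
// { deviceCap: 'messaging.send', roaming: true }.
function evaluateSamplePolicy(request) {
  // Inner condition (combine="or"): any one of the listed capabilities.
  const capabilityMatches = PROMPTED_CAPABILITIES.includes(request.deviceCap);
  // Outer condition (combine="and"): the device must also be roaming.
  const roaming = request.roaming === true;
  if (capabilityMatches && roaming) {
    return 'prompt-session';
  }
  // Second rule: no conditions, so it applies to all other requests.
  return 'permit';
}
```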

Tizen defines long lists of URLs for features, privileges, runtime, setting, and system. See also Setting Widget Configuration, which provides links to widget properties including license information, UI preferences, features, privileges, network policies, localization, and other properties.

Tizen runtime and system URLs are enums used by certain Tizen APIs such as System Information, and as such, not relevant to this paper. Settings are proprietary config.xml extensions, some of which are now being standardized in the W3C Manifest spec (e.g. orientation).

QNX automotive web apps

to be added

GM automotive web apps

to be added

GM provides an HTML5 platform for automotive apps

Ford SYNC AppLink

Ford Sync

Ford SYNC is an integrated system for Ford cars with support for telephone calls, music, traffic and navigation, etc. The system is based upon Microsoft's Windows Automotive Embedded platform. SYNC AppLink is an API for apps running on iOS, Android and Blackberry mobile devices to integrate with the car's stereo system, dashboard buttons and display, via a Bluetooth or USB connection.

For hands free operation, users can control apps with spoken commands, along with speech synthesis for feedback. Ford limits access to the AppLink API to certified applications as a safety measure. Apps are available for news and information, music and entertainment, and navigation and location. It is unclear whether Ford supports HTML5 apps with AppLink. News reports indicate that Ford will switch to QNX for its next generation Sync system.

TV web apps (HbbTV)

to be added

W3C and the Open Web Platform

Describe the standard security framework for web apps, CORS and CSP. Then describe the approach taken in recent W3C work (DAP, Geoloc, WebRTC, Automotive). Summarise sysapps thread on including justification in browser generated consent dialog.

The Open Web Platform (OWP) is defined by the set of standards for web protocols (HTTP, Web Sockets), HTML, CSS, JavaScript, and graphics (e.g. JPEG, PNG, SVG), and covers the core features that are widely interoperable across Web browsers and Web run-times. The security model is based upon the same origin policy, which constrains web page scripts to only accessing the execution context for pages originating from the same origin (a combination of URI scheme, host name and port number). Scripts can only access HTTP or Web Socket connections on the same origin as the page that loaded the script.
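The origin comparison can be illustrated with the standard URL class. This is a minimal sketch; sameOrigin is a hypothetical helper name, and browsers perform this check internally rather than exposing it in this form.

```javascript
// Two URLs share an origin when scheme, host name and port all match.
function sameOrigin(a, b) {
  const first = new URL(a);
  const second = new URL(b);
  return first.protocol === second.protocol &&
         first.hostname === second.hostname &&
         first.port === second.port;
}

// true: 80 is the default port for http, so both ports are equivalent
console.log(sameOrigin('http://example.com/a', 'http://example.com:80/b'));
// false: the schemes differ
console.log(sameOrigin('http://example.com/', 'https://example.com/'));
// false: the host names differ
console.log(sameOrigin('http://example.com/', 'http://www.example.com/'));
```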

There are workarounds, e.g. dynamically adding script elements to the web page as a means for remote procedure invocation. Cross-Origin Resource Sharing (CORS) is based upon additional headers in HTTP responses that indicate which origins may request a given URI. The browser/web run-time interprets these headers to relax the same origin policy. The document.domain property provides another solution for documents that share a common parent domain, e.g. foo.example.com and bar.example.com.
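The way a browser interprets the CORS response header can be sketched as follows. This is a deliberately simplified model: corsAllows is a hypothetical helper, it covers only the Access-Control-Allow-Origin header for requests made without credentials, and it ignores preflight requests.

```javascript
// Simplified model of how a browser reads Access-Control-Allow-Origin.
// responseHeaders is a plain object with lower-cased header names.
function corsAllows(responseHeaders, requestingOrigin) {
  const value = responseHeaders['access-control-allow-origin'];
  if (value === undefined) {
    return false;               // no header: the same origin policy applies
  }
  // Either any origin is allowed, or exactly the requesting one.
  return value === '*' || value === requestingOrigin;
}
```

A server that wishes to share a resource with a single partner site would therefore send that site's origin in the header, rather than the wildcard.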

Cross document messaging is possible with postMessage even when the documents are from different domains. One document calls postMessage to deliver data which the other document can handle by adding a listener for the 'message' event.
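Since any document can send a message, the receiver is expected to check where it came from. The sketch below shows one way to do so; makeMessageHandler is a hypothetical helper, and the origins and the targetWindow variable in the comments are placeholders.

```javascript
// Build a 'message' event handler that ignores messages from other origins.
// trustedOrigin and onData are supplied by the caller.
function makeMessageHandler(trustedOrigin, onData) {
  return function (event) {
    if (event.origin !== trustedOrigin) {
      return;                   // drop messages from unexpected senders
    }
    onData(event.data);
  };
}

// In a browser, the receiving document would register the handler:
//   window.addEventListener('message',
//     makeMessageHandler('https://example.org', console.log));
// and the sending document would call:
//   targetWindow.postMessage({ hello: 'world' }, 'https://example.com');
```

The second argument to postMessage names the origin that the sender expects the target window to have, so the data is not delivered if that window has since navigated elsewhere.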

Content Security Policies can be set by the web page to disable potentially harmful features. This can help defend against malicious changes to scripts, e.g. those loaded from other sites, or through content injection attacks on the web page itself.

Permissions in the Open Web Platform

The OWP has so far dealt with permissions individually, specification by specification. A key consideration has been to minimize disclosure of personal information without the consent of the user. APIs with minimal impact on privacy are enabled for any origin. For APIs with strong impacts on privacy, the user's action to invoke a feature may be taken as implicitly granting permission for use of an API, or the browser may ask the user for explicit consent. Browsers vary in how they support re-use of a decision in further sessions, and how users can revoke such decisions.

The geolocation API is exposed by the navigator.geolocation object. Scripts can access the device location by calling the getCurrentPosition() method, passing a function to be called with the current position. The browser then asks the user for permission for the app to access the location, before invoking that function. The W3C specification requires apps to disclose the purpose for the collection, how long the data is retained, how the data is secured, how the data is shared if it is shared, how users may access, update and delete the data, and any other choices that users have with respect to the data. Browsers typically offer to remember the user's decision for future sessions, along with a means to revoke permissions.
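A short sketch of this call sequence follows. The locate wrapper is a hypothetical helper; the geolocation object is passed in as a parameter so that the same code can be tried outside a browser with a stand-in object.

```javascript
// Ask for the current position and report the outcome to onResult.
// In a browser, geolocation would be navigator.geolocation.
function locate(geolocation, onResult) {
  geolocation.getCurrentPosition(
    function (position) {
      // Called only after the user has granted permission.
      onResult({
        ok: true,
        latitude: position.coords.latitude,
        longitude: position.coords.longitude
      });
    },
    function (error) {
      // error.code 1 is PERMISSION_DENIED: the user refused the request.
      onResult({ ok: false, denied: error.code === 1 });
    }
  );
}

// In a browser: locate(navigator.geolocation, console.log);
```

Supplying the error callback matters, since it is the only way the app learns that the user has declined and can fall back gracefully.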

Other examples include media capture via HTML forms, media streams and image capture via a camera. An extension to HTML forms allows the browser to prompt the user to select a media file from the local file system and upload it as part of the form submission process. The action taken by the user to select the media file is taken as implicit consent for uploading the file. W3C is also working on APIs for taking a photo, or streaming audio or video from the device's microphone and camera. These are handled in a similar way to the geolocation API in that the request by a script to use these features results in the browser asking the user for permission.

The Full Screen API allows apps to present in fullscreen mode. After entering fullscreen mode, the user is made aware that the presentation is full screen, and given a chance to revoke the permission. The specification states:

User agents should ensure, e.g. by means of an overlay, that the end user is aware something is displayed fullscreen. User agents should provide a means of exiting fullscreen that always works and advertise this to the user. This is to prevent a site from spoofing the end user by recreating the user agent or even operating system environment when fullscreen.

See explanation by Chris Pearce.

The W3C Device APIs Working Group previously worked on a proposed means for standardizing names for permissions for given APIs (Permissions for Device API Access); however, this hasn't been updated since October 2010.

What has been done right or wrong?

Marcos Caceres has asked for a meaningful discussion of what has been done right or wrong on the Web. He cites the way the Fullscreen API's permission works, and how the geolocation API works the same across the Web and iOS. The same applies to WebRTC and the other permission dialogs we encounter in browsers, and to how users manage those (e.g., pointer lock).

Boris Smus on installable hosted web apps

Boris Smus' blog on Installable web apps: extend the sandbox argues for installable server hosted apps with a clear "installation" step where apps can ask for additional permissions. The step also associates the app with an icon, e.g. on the user's home screen, that can be used to launch the app, review and revoke its permissions. He further suggests an API for installing a web page as an app. This would only work if the current execution thread is the result of a user action, e.g. clicking/tapping on a button.

var button = document.querySelector('button#install');
button.addEventListener('click', window.app.requestInstall);

He also proposes an API for apps to request permissions at install time:

window.app.requestInstall({permissions: ['audioCapture']});

Robert O'Callahan on Permissions For Web Applications

Robert O'Callahan's blog on Permissions For Web Applications argues against introducing Android-like bundling of permissions with "up front" permission grants. Instead, he encourages:

Discussions in the System Applications Working Group

A pertinent thread of discussion on the SysApps WG archives: Permissions UI & Necessary API

Doug initiated the discussion by referencing Brenden Mulligan's article on The Right Way To Ask Users For iOS Permissions. Anssi Kostiainen responds with:

I extracted the following recommendations that might work for the Web too. I probably missed some, so feel free to expand:

  1. Allow the developer to associate a custom text string with the permission request.

I observe some web-based platforms (see Firefox OS App manifest) already provide a similar mechanism, however, I’m not sure if e.g. Firefox OS uses the information in the context of use as recommended (or just for upfront grants)?

Marcos Caceres comments: AFAIK, in FxOS they are only used by marketplace reviewers to understand why a feature is being requested by the developer - and then so the store reviewer can make an assessment about the truthiness of the claim during code review. In other words, I think the descriptions are just things that mostly serve store owners. These things are not displayed to users - the APIs access is then granted by Mozilla based on a successful review.

Doug Reeder comments: Firefox OS requires an explanation string in the app manifest, for example:

"permissions": {
  "geolocation": {
    "description": "Needed for geotagging (where you wrote a memo)"
  }
}

IIRC, this is supposed to be displayed to the user during the install process. It is never displayed while an app is running. I'm proposing an explanation string per request, something like

navigator.geolocation.getCurrentPosition(
  successFunc, 
  errorFunc, 
  {timeout: 300000, description: "geotag memos"}
);

We (Mozilla) never prompt users to grant permissions at install time on Firefox OS. We only prompt at runtime for privacy related ones that users can understand: geolocation, sdcard access, contacts for instance.

We currently don't use the explanation string anywhere - one place we could is in the settings panel that let the user revoke or grant permissions for an app after installation.

Implications of this feature on the Web are a bit different than on native ecosystems in which the content is usually curated. For example, an evil application could lie to get you to grant access to some capabilities that it wants to use for purposes other than those it told you about, purposes that are potentially malicious or otherwise harmful.

Doug Reeder: Colin Walters, in a comment on Robert O'Callahan's blog post (Permissions For Web Applications) points out "you have to know applications can pass messages to one another, so the permission set is in reality the union of all of the ones from any apps installed from a developer (or cooperating developers)"

Once info is passed to an app, there's no technical control over what it does with that info, only social control (reviews saying "this app lies about what it does!" ... or a consumer protection agency investigation). In the current model, the app makes no promises (other than app store boilerplate). An explanation per request allows an app to be clear. If many apps are clear, the hope is that users will pay attention, be wary of apps that are vague, and avoid those that lie.

An explanation per request does not imply a security dialog per request. I envision the system showing one security dialog per description string. So, the user would grant permission to 'allow searchablenotes.hominidsoftware.com to use your computer's location to "geotag memos"'. Most apps & websites would use only one description, but some would use two or more different ones, allowing a separate permission for 'allow example.com to use your computer's location to "connect you to an appropriate call center"'.
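The idea of one security dialog per description string can be sketched as a store keyed on origin, capability and description. This is only an illustration of the proposal under stated assumptions: the class DescribedPermissions, its request method and the askUser callback are hypothetical names, and no shipping browser is claimed to work this way.

```javascript
// Hypothetical permission store: one user decision per
// (origin, capability, description) triple.
class DescribedPermissions {
  constructor(askUser) {
    this.askUser = askUser;      // function(promptText) -> boolean
    this.decisions = new Map();  // key -> true (granted) or false (denied)
  }

  request(origin, capability, description) {
    const key = JSON.stringify([origin, capability, description]);
    if (!this.decisions.has(key)) {
      // Only an unseen description triggers a dialog.
      const text = 'allow ' + origin + " to use your computer's " +
        capability + ' to "' + description + '"';
      this.decisions.set(key, this.askUser(text));
    }
    return this.decisions.get(key);
  }
}
```

With this shape, repeated requests carrying the same description reuse the earlier decision, while a new description from the same origin leads to a separate dialog.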

Some further notes: infobars that have proliferated recently are part of the chrome, and the convention has been that web content is unable to modify this part of the UI. That said, there’s precedent in the legacy alert(), prompt(), and confirm() which allow the developer to customise the message shown in these system UI widgets. These dialogs are modal, and are shown overlaid on top of the web content so they’re different to infobars in that regard.

Furthermore, in some browsers, e.g. Safari on iOS, modal dialogs are used to ask for permissions in the way that infobars are used in other browsers. Indeed, with confirm() you could closely emulate iOS Safari's “http://example.org Would Like to Use Your Foobar [ Don’t Allow ] / [ OK ]” dialog, if only the button labels were developer-configurable (or if there were a confirmPermission() with labels that match the platform conventions).

  2. Prefer user-triggered dialogs.

This reminds me of the good old <input type=file>. We’ve extended the model with some extra capabilities such as HTML Media Capture in the past.

Doug Reeder: This is great where you can do it, such as a standard map app, which can have a button "Show my location". I'm running into a situation where the user gets a system permission dialog, and it's not clear to him or her why. This is where it would be helpful for developers to pass a context string as part of an API request.

  3. Show an educational pre-permission overlay.

This does not require any changes to the platform APIs. The developer can build such a dialog with HTML and friends. I’d guess there must be some examples of this type of a pre-permission overlay being used on the Web too, anyone?
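A pre-permission overlay of this kind could be wired up roughly as follows. This is a sketch under assumptions: requestWithExplanation, showOverlay and requestCapability are hypothetical names, the overlay itself would be ordinary HTML built by the app, and requestCapability stands for whatever real API call triggers the browser's own prompt.

```javascript
// Hypothetical flow: explain first, and only trigger the browser-owned
// permission prompt if the user agrees to continue.
//   showOverlay(message, callback)      -> callback(true | false)
//   requestCapability(onGranted, onDenied) wraps the real API call
function requestWithExplanation(showOverlay, requestCapability, message, done) {
  showOverlay(message, function (userAgreed) {
    if (!userAgreed) {
      // The system prompt was never shown, so the app can ask again later.
      done('not-now');
      return;
    }
    requestCapability(
      function () { done('granted'); },
      function () { done('denied'); }
    );
  });
}
```

The point of the design is that declining the app's own overlay does not consume the user's one chance at the system dialog.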

Coming back to the key finding of the article. It appears the approaches outlined make sense for iOS given the significantly increased acceptance rates, so I think it is a worthwhile exercise to see whether some of this could be used for the Web too.

I asked if it makes sense to allow the developer to associate a custom text string with the permission request. However, for uncurated applications there is a risk of apps misleading users into granting permission by providing incorrect descriptions of the purposes involved.

Dave Raggett notes:

Moving the explanation to the app's content (as suggested by Brenden Mulligan for iOS apps) won't stop apps from lying, so I don't find that to be a compelling solution. It is up to reviewers and curators of app stores to ascertain when an app is misleading users or otherwise falls foul of the app store's requirements.

A more compelling argument is that app developers will want control of how the justification for using a given capability is presented to the user. This further suggests the requirement for apps to determine which of the following apply:

  a. user has yet to be asked for a decision
  b. user has previously granted permission
  c. user has explicitly denied permission

Without this information, it would be hard for developers to provide the appropriate user experience. Does Firefox OS offer developers this info?

p.s. if the user previously granted the permission just this time, the situation is essentially (a) in that attempting to access the capability now will result in the browser asking the user for permission.
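To make the requirement concrete, the following sketch shows how an app might branch on the three states. It is illustrative only: the function nextStep and the action names it returns are invented here, and the state names are borrowed from the values of the Notification API's permission attribute ('default', 'granted', 'denied') rather than from any agreed cross-API vocabulary.

```javascript
// Hypothetical helper: picks the app's next step for each of the three
// permission states. The returned action names are made up for this sketch.
function nextStep(state) {
  switch (state) {
    case 'default':
      // Not yet asked: explain why the capability is needed, then ask.
      return 'explain-then-ask';
    case 'granted':
      return 'use-capability';
    case 'denied':
      // Asking again would be futile; degrade gracefully instead.
      return 'offer-fallback';
    default:
      throw new Error('unknown permission state: ' + state);
  }
}
```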

Anssi responds:

An API that fulfils the requirements a-c above was experimented with in the Device APIs WG a couple of years ago, known as Feature Permissions [1]. The spec was put on hold as "the only immediately obvious relevant use case [was] for Web Notifications". Eventually, the Web Notifications API settled on a slightly different model [2] in which the UA’s default permission setting (allow or deny) is not exposed.

Some known issues with the model in [1] include privacy concerns over exposing the user’s preference to the web content (from the privacy perspective, the web content should not need to know whether you have explicitly declined or just ignored the permission prompt) and other potential for misuse (e.g. blocking the user’s flow until she grants permission). That said, this thread suggests there may also be legitimate use cases for such a feature.

Doug - the API shape aside, do you think an API such as [1] would be part of the solution to your problem?

Doug responds: Yes. If my app knew the user had not granted permission (despite setting the app preference), it could open a dialog setting out choices to the user.

I’m wondering what lessons have been learned from the Web Notifications API that ships in Firefox, Chrome, Safari, and some other browsers. I recall reading web developers’ feedback a while ago but I’m unable to find a good pointer now.

Marcos responds to Dave:

In respect to a-c, it might be interesting to map these out for various APIs. For example, Geolocation:

  a. user has yet to be asked for a decision

The developer can record this in localStorage.

localStorage.geoEnabled = "haven't asked yet";

  b. user has previously granted permission

navigator.geolocation.getCurrentPosition(function () {
  // The success callback only runs once the user has granted access.
  if (localStorage.geoEnabled !== "yep") {
    localStorage.geoEnabled = "yep";
  }
});

  c. user has explicitly denied permission

navigator.geolocation.watchPosition(function (position) {},
function (error) {
  // error.code 1 === PERMISSION_DENIED
  if (error.code === error.PERMISSION_DENIED) {
    localStorage.geoEnabled = "denied";
  }
});

So, with Geolocation you might have enough information.

Doug: Unfortunately for my situation, if the user allows geolocation once, it says nothing about whether it will be allowed the next time. In Chrome, allowing geolocation is persistent, but in Firefox and Firefox OS it's not persistent unless the user clicks a second control.

Marcos responds: Ok, good to know. Do you think just adding

`geolocation.permission === 'enabled'`

or similar would address the use case? Also, maybe good if we could move this back to the list?

Doug responds: Yes, that would solve my problem.

Theoretically, permission could change between checking and calling getCurrentPosition, but if I'm calling them in the same tick, occurrence should be vastly rarer than geolocation failures.

Anssi responds: I think that is not really a concern. I don’t see anything getting in between the following considering the single-threaded nature of JavaScript:

if (geo.permission === 'enabled') {
  geo.getCurrentPosition();
}

Or perhaps you have a more complete real life example in mind?

Anssi comments: Whether implementations persist, or allow the user to persist, permission settings varies by browser and by feature. And this will likely remain so.

To further complicate things, sometimes subtle hints of trustworthiness of the site are used to decide whether to allow persisting a permission or not. For example, Chrome allows persisting gUM permission only if the content is served over HTTPS, while for Geolocation the permission is persisted regardless of the protocol.

iOS is probably the strictest in this regard, and never persists permissions for regular web content (only for things bookmarked to homescreen, which is another hint of trustworthiness).

An opportunity to dig into this a bit more.

On May 6th, Anssi wrote:

A proposal by Nikhil elsewhere on how Promise.all() might be used:

Promise.all([
  Notification.requestPermission(), // Some shimmed form that returns a Promise
  navigator.push.hasPermission()
]).then(function(perms) {
  if (perms[0] == 'granted') { /* notifications ok */ }
  if (perms[1] == 'granted') { navigator.push.register(); }
});

Promise.all() returns a promise that resolves when all of the promises passed to it have resolved. I’m wondering if something like the following has obvious issues:

[NoInterfaceObject]
interface Permissions {
  Promise requestPermission ();
  Promise hasPermission ();
  /* ... */
};

Notification implements Permissions;
PushManager implements Permissions;
/* ... */
Geolocation implements Permissions;

I think not all APIs can be retrofitted with this, but many could.

Marcos responds: Seems unnecessary to have this return a promise, IMO. Just make it an attribute.

Anssi then says: Yeah, the reason for that was to make it work with Promise.all as suggested by Nikhil. However, it seems whatever is passed to Promise.all is converted to a promise by means of Promise.cast, so we could make it an attribute as you suggest.
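The behaviour Anssi refers to can be checked with a short sketch: Promise.all accepts plain values alongside promises, so a synchronous attribute and a promise-returning method can be mixed in one call. The objects fakeNotification and fakePush are illustrative stand-ins invented for this example, not real browser objects. Note that the conversion step mentioned above as Promise.cast is exposed as Promise.resolve in the standardized Promise API.

```javascript
// Illustrative stand-ins: one API exposes its permission state as a plain
// attribute, the other through a promise-returning method.
const fakeNotification = { permission: 'granted' };
const fakePush = {
  hasPermission: function () { return Promise.resolve('denied'); }
};

// Promise.all treats the plain string as an already-resolved value and
// resolves with the results in the order they were passed in.
function collectPermissions() {
  return Promise.all([
    fakeNotification.permission,
    fakePush.hasPermission()
  ]);
}
```

A caller would then write collectPermissions().then(function (perms) { ... }) and inspect perms[0] and perms[1], much as in Nikhil's proposal.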

Details aside, I think the main question is does such an interface make sense in the first place?

Marcos then says: Like I said previously, I think the only way to know is to work through some example cases with real code. Doing thought experiments can only get us so far. We would also need to find a few more example cases in the wild and then we can take those to the appropriate WGs.

Anssi: I actually already asked Nikhil, in the GH issue where this idea originated, whether he has been doing further exploration in code. If someone comes up with other experiments, please let us know.

Marcos: I'd be inclined to prototype having an attribute in Gecko. The `requestPermission()` method seems redundant to me because interfaces already implicitly or explicitly have these methods (yes, they are inconsistent across the various APIs - but in the case of Geolocation the permission request is explicitly bound to an action - watchPosition(), getCurrentPosition(), and I think that is a "good thing"[tm]).

Some other APIs - especially new ones - would likely benefit from `requestPermission()` tho, but I'm still not sure if existing APIs would.

Anssi: It appears some implementers would prefer to have hasPermission() return a Promise, at least for some APIs, see Peter Beverloo's email of 29 Apr 2014:

The Notification specification defines a static Notification.permission accessor, which returns one of {granted, denied, default}. This requires the browser to synchronously determine whether the page has permission to show notifications, whereas checking this may be an asynchronous operation. This is the case in Chrome.

Before this becomes a paradigm, could we consider having a static hasPermission() instead, returning a Promise?

I'll add a UseCounter to Blink for tracking Notification.permission usage, but it will take some time before conclusive usage data comes in.

Chrome is collecting stats on the usage of the sync .permission, so we should get some data on how widespread the usage is.
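A promise-returning check of the kind Peter Beverloo asks for could be layered over an existing synchronous attribute roughly as follows. This is a hedged sketch only: withHasPermission and fakeApi are hypothetical names introduced here, and the sketch says nothing about how Chrome or any other browser would implement such a method internally.

```javascript
// Hypothetical shim: exposes a synchronous permission attribute through a
// promise-returning hasPermission().
function withHasPermission(api) {
  return {
    hasPermission: function () {
      // Promise.resolve wraps the current attribute value; a real
      // implementation could do asynchronous work here instead.
      return Promise.resolve(api.permission);
    }
  };
}

// Stand-in for an API object such as Notification, for illustration only.
const fakeApi = { permission: 'default' };
```

Because callers only ever see a promise, an implementation would remain free to answer asynchronously without changing the calling code.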

Summary and Future Work

As the Open Web Platform expands, new capabilities are likely to require new ways of managing permissions. Some platforms, e.g. Android, ask users upfront for permission when an app is installed or first run, whereas others like iOS ask users for permission when the application is attempting to use a given capability. Rather than asking the user for permission in advance, another approach is to invite the user to continue or to cancel an action after it has occurred, i.e. asking for forgiveness rather than permission. This is the approach taken in the Fullscreen API; see the explanation by Chris Pearce. In some cases, the user's actions can be taken as implicitly granting permission; for a detailed analysis of this approach, see Roesner et al. A further approach is for users to delegate decisions on permissions to a trusted 3rd party (if it's okay by them, it's okay by me). What is needed to arrive at a consensus for a cross vendor solution?

Some questions for further study

Group 1: User Consent

Group 2: Delegated Trust

Group 3: Permission Management

Group 4: Miscellaneous Topics

An open face to face meeting is now planned for early September 2014 to bring a variety of stakeholders together to discuss trust and permissions, and to try to determine a roadmap towards a broad consensus. One possibility would be to set up a W3C Community Group to continue the discussion with a view to feeding into the standards track with a chartered work item in a W3C Working Group. If there is a strong consensus on the approach to be taken, then we could skip the Community Group step and go straight to the Working Group, or perhaps to set up a Cross Working Group Task Force. Note that we need the agreement of sufficient browser vendors to ensure that the work is widely deployed.

At the end of the meeting we would like to have a clear idea for next steps:

Some further reading:

Note: this is just a sample, and not intended to be authoritative.

Acknowledgements

This work was done with support from the European Commission under grant agreement no: 611327 (HTML5 for Apps: Closing the Gaps). Thanks also to Anssi Kostiainen, Marcos Caceres and Dominique Hazael-Massieux for their comments on an earlier draft of this document.


Appendices

This section contains material that has been moved from earlier sections. This includes cross platform frameworks where the permissions model is inherited from each of the target platforms. It also includes libraries for web applications where the permissions model is inherited from the web platform.

Cross platform frameworks allow developers to create apps using cross platform technologies, and either compile them into native apps that can run on a variety of operating systems, or provide a native run-time that can execute cross platform code. This can cut the time to deliver to multiple target platforms. Cross platform frameworks inherit the trust/permissions model from the native platform.

According to Research Guidance (see below), the most popular tools are PhoneGap and jQuery Mobile, followed by Adobe Air, Qt Creator, Unity 3D, Titanium, Marmalade, Sencha Touch, Xamarin, Unity Mobile, and Corona SDK. Cross platform tools are mainly used to develop apps for games, followed by utilities, business, education and entertainment in decreasing order of popularity.

For more in-depth reviews of cross platform frameworks, see:

Telerik

Telerik provides UI controls for HTML5 and .NET, along with the Telerik platform for mobile app development.

Appcelerator Titanium

Appcelerator is a Californian technology company that provides the Appcelerator platform for cross platform mobile development with an open source Titanium SDK and an enterprise software suite covering development, testing, deployment and analytics. Titanium supports iOS, Android, Windows Phone, BlackBerry OS and Tizen, enabling developers to create rich native mobile apps from a single JavaScript-based SDK. Titanium includes Alloy, which features a model-view-controller architecture for rapid development of UI components based upon XML markup and style sheets, as a declarative alternative to creating UI components directly from JavaScript.

Xamarin/Mono

Xamarin enables developers to create apps for iOS, Android, and Windows Phone using the C# programming language, and derives from earlier work on the Mono open source project to support Microsoft's .NET framework on the Linux operating system. Xamarin includes bindings for the native SDKs for each of the platforms it supports. Where needed, you can directly invoke Objective-C, Java, C and C++ libraries. Xamarin claims that applications can share up to 90% of their code across platforms when using the Xamarin Mobile library. Further details are given in the Xamarin developer guides.

Qt

Qt features a cross platform integrated development environment (Qt Creator) and run-time with support for C++ and the JavaScript-like QML user interface modelling language. Qt targets desktop, mobile and embedded devices, e.g. Android, BlackBerry, iOS, Linux/X11, Mac OS X, Windows and Windows CE. Qt Cloud Services provides support for application backends. More details are available on the Qt Project website.

Unity 3D/Mobile

Unity is a cross platform game engine for web plugins, desktop, consoles and mobile devices. For scripting you can use UnityScript (similar to JavaScript) or Boo (similar to Python). Supported platforms include BlackBerry, Windows, Windows Phone, Mac OS X, iOS, Android, Adobe Flash, PlayStation, Xbox, and Wii.

Corona SDK

Corona SDK enables developers to create 2D games, business, eBooks and educational apps for mobile devices including iPhone, iPad, and Android. The SDK includes a wide range of third party tools and services. Scripting is based upon the Lua programming language.

Marmalade

The Marmalade SDK targets desktop, tablets, smart phones and TVs. Developers can work with C++, Lua, HTML or Objective-C. Apps are compiled to native binaries for ARM or x86 and combined with a platform specific loader. Target platforms include Windows, Mac OS X, LG Smart TV, Roku, iOS, Android, BlackBerry, Windows Phone and Tizen.

GINGEE

GINGEE provides for cross platform development on mobile devices with a focus on games. Its Liquid UI adapts the UI to the device, allowing apps to have the same look across all devices, based upon a library of UI components. The SDK claims to offer near-native run-time speeds. GINGEE offers easy integration with social media, analytics and monetization. Targeted platforms include iOS, Android, Amazon Kindle Fire, Barnes & Noble NOOK, BlackBerry, Facebook, Smart TV and Windows PC.

Codename One

Codename One is an open source project that enables developers to use Java to create apps that have the native look and feel for iOS, Android, Windows Phone and BlackBerry.

DragonRAD

The DragonRAD Designer is a developer tool for creating enterprise apps for smart phones and tablets running BlackBerry, Android, iOS and Windows Mobile. It supports the Lua scripting language. The DragonRAD Host provides access to enterprise databases via a Windows or Linux server. The DragonRAD Client provides a native run-time that interprets and runs your application.

RunRev LiveCode

LiveCode is inspired by Apple Hypercard, with a drag and drop developer tool and a scripting language resembling Hypercard's HyperTalk; see the beginner's guide. The component model includes stacks, cards and objects, and is event driven. LiveCode stacks can be built for Windows, Mac OS, Linux, iOS and Android and inherit the native platform's look and feel.

IBM Worklight

IBM Worklight provides a framework for developing, running and managing HTML5, hybrid and native apps on smart phones and tablets. You develop with HTML5 and JavaScript and then customize the resources for each target platform. The framework can generate web and native code specific to each target environment. For hybrid apps, Worklight relies on PhoneGap for access to device APIs. You can also take advantage of third party tools, libraries and frameworks including jQuery Mobile, Sencha Touch and Dojo Mobile.

MoSync

MoSync is an open source SDK that allows you to develop mobile apps using C/C++ and HTML5/CSS/JavaScript with a native look and feel for each target platform, including Android, iOS, Windows Mobile, Windows Phone, Symbian, Java Mobile and the Moblin platform.

RhoMobile

This is an integrated framework based upon the Ruby programming language.

Motorola Solutions' Rhomobile is an open source framework for developing native apps for smartphones including iOS, Android, BlackBerry, Windows Mobile and Windows Phone. Rhohub is a service for developing apps online. Developers can use JavaScript or Ruby scripting languages with the Rhodes API for access to device level capabilities including the camera and device location. Rhomobile is similar to PHP in allowing you to create user interfaces using HTML5 markup with embedded code that can be used to tailor the user experience to the target platform.

Whoop

The Whoop Creative Studio is a drag and drop design tool for easy development of mobile apps for iOS, Android, BlackBerry and Windows Phone. It seems to have been discontinued.

WAC 2.0

The former Wholesale Applications Community (WAC), together with mobile network operators (Carriers), developed a suite of specifications for mobile web applications. These were subsequently transferred to the GSMA in July 2012 when WAC was dissolved. WAC built upon the W3C specifications for HTML5 and the BONDI project of the preceding Open Mobile Terminal Platform Ltd.

WAC had the aim of enabling developers to create packaged web apps for use on home screens and app stores. The packaging format uses W3C Widgets. Access control policies are based upon rules expressed in the XML-based XACML rule language. According to Wikipedia, policies can be set on a widget provider level (for signed widgets), on a widget level, or on an API call-by-call level for web pages.

I've asked Nick Allott for further information on the approach taken for user consent, and any feedback gathered from developers and end-users on its effectiveness.

jQuery

jQuery is a very popular cross browser JavaScript library that simplifies application development, for instance, network access, manipulating the DOM, and applying animations and effects. There is a growing set of plugins for extending jQuery. jQuery Mobile is a related cross browser JavaScript library built on top of jQuery Core, and aimed at making it easier to create responsive websites for smart phones, tablets and desktop. It supports a range of UI controls and themes. Both libraries stick within the existing browser security model.

Sencha Touch

This is a UI framework for mobile devices using a JavaScript library, and enables developers to develop mobile web applications that look and feel like native applications. Sencha Touch targets iOS, Android, Windows, Tizen and BlackBerry based devices.

Dojo Mobile

Netbiscuits Tactile

Netbiscuits Tactile is a cloud software service for cross-platform development, publication and monetization of mobile sites and apps based upon HTML5. It looks like Netbiscuits has now discontinued this service.

WidgetPad

This is an open source platform for developing HTML5 apps tailored for devices using Android, WebOS and iOS. It seems to have been discontinued.

webinos

webinos was an EU research project that focused on extending the Open Web Platform with a rich suite of APIs for accessing resources within devices that form a user's Personal Zone, e.g. a desktop PC, a tablet, a smart phone, a smart TV or a web-enabled car. The devices enrolled in a Personal Zone are subject to strong security with mutual authentication, and access control policies based upon XACML.