Protected Interactive Elements to defend against UI Redressing

This document proposes a straw-man strategy by which web applications can opt in to a new type of UI control that provides additional protection against the class of attacks known as Clickjacking or UI Redressing. The control must be implemented by the web user-agent to be effective, but sites can opt in in such a way that browsers unaware of this type of UI control still function (although their users are not protected).

The basic idea is that browsers provide a new type of control that requires a deliberate user interaction rather than just a click. This interaction might be a swipe, a scrub, or holding the mouse or touch while a timer counts down. The exact nature of the interaction is at the discretion of the user-agent and cannot be specified by the application, since different interactions may be desirable on mouse vs. touch devices, to maintain platform consistency, or as part of assistive technologies. The interaction must require user intent and action to complete successfully, and should offer a clear way for the user to abort the action without completing it.
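As an illustration of the "hold while a timer counts down" variant, the following is a minimal sketch of the interaction as a state machine. Everything in it is an assumption made for illustration: the names `HoldState`, `HoldInput` and `stepHold`, the 1500 ms duration, and the choice of Escape as the abort gesture are not part of this proposal, and a real user-agent would choose its own interaction.

```typescript
// Hypothetical model of a "hold to confirm" interaction. All names and the
// duration are illustrative assumptions, not part of the proposal.
type HoldPhase = "idle" | "holding" | "confirmed" | "aborted";

interface HoldState {
  phase: HoldPhase;
  startedAtMs: number | null; // time of the initial press, if any
}

type HoldInput =
  | { kind: "press"; atMs: number }    // pointer or touch goes down on the control
  | { kind: "tick"; atMs: number }     // countdown timer update
  | { kind: "release"; atMs: number }  // pointer or touch lifts
  | { kind: "escape"; atMs: number };  // explicit abort gesture

const HOLD_DURATION_MS = 1500; // assumed countdown length

function stepHold(state: HoldState, input: HoldInput): HoldState {
  switch (state.phase) {
    case "idle":
      // Only a press on the control starts the interaction.
      return input.kind === "press"
        ? { phase: "holding", startedAtMs: input.atMs }
        : state;
    case "holding": {
      if (input.kind === "escape") {
        return { phase: "aborted", startedAtMs: null };
      }
      const elapsed = input.atMs - (state.startedAtMs ?? input.atMs);
      if (input.kind === "release") {
        // Letting go before the countdown finishes aborts the action.
        return elapsed >= HOLD_DURATION_MS
          ? { phase: "confirmed", startedAtMs: null }
          : { phase: "aborted", startedAtMs: null };
      }
      if (input.kind === "tick" && elapsed >= HOLD_DURATION_MS) {
        return { phase: "confirmed", startedAtMs: null };
      }
      return state;
    }
    default:
      // "confirmed" and "aborted" are terminal in this sketch.
      return state;
  }
}
```

In this sketch a single click (press followed immediately by release) always ends in `aborted`, which is the property that distinguishes the control from an ordinary button.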

With the interactive control, the application specifies a set of protected context markup. While the user is interacting with the control, the context markup MUST be displayed and MUST NOT be obstructed or moved. A user-agent SHOULD dim or obscure display elements other than the protected context.

The protected context markup may be an already visible part of the page layout, or it may be displayed only when the control is interacted with. The protected context MUST contain the protected control.
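The two requirements above (the context contains the control, and the context is not obstructed) can be sketched as simple rectangle checks. This is only an illustration under assumed names: `Region`, `checkProtectedContext` and the idea of passing a list of overlay rectangles are hypothetical, and a real user-agent would work from its own layout and compositing data.

```typescript
// Hypothetical geometry check; names and inputs are illustrative assumptions.
interface Region {
  x: number;
  y: number;
  width: number;
  height: number;
}

// True if `inner` lies entirely within `outer`.
function regionContains(outer: Region, inner: Region): boolean {
  return (
    inner.x >= outer.x &&
    inner.y >= outer.y &&
    inner.x + inner.width <= outer.x + outer.width &&
    inner.y + inner.height <= outer.y + outer.height
  );
}

// True if the two regions share any interior area (touching edges do not count).
function regionsOverlap(a: Region, b: Region): boolean {
  return (
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

type ContextCheck = "ok" | "control-outside-context" | "context-obstructed";

function checkProtectedContext(
  context: Region,
  control: Region,
  overlays: Region[] // anything painted above the context
): ContextCheck {
  if (!regionContains(context, control)) return "control-outside-context";
  if (overlays.some((o) => regionsOverlap(o, context))) return "context-obstructed";
  return "ok";
}
```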

It is not important that the control and context be unspoofable; it is always possible to create a fake “submit” button today. It only matters that a genuine protected control always gives the genuine experience. A protected control should give the expected behavior even with scripting disabled in the browser, and an iframe sandbox should not be able to disable such controls. The protected context is displayed in its own topmost layout area, which is not constrained by the dimensions of the browsing context that contains it (e.g. a 200x200 protected context will display in full, even if triggered by clicking on a protected control in a 1x1 IFRAME).

Implementation Options

The control might be invoked by an application in several ways:

As its own, new element type:

Or as decoration on a legacy element type:

An application that wants backwards compatibility might use the latter; an application that only wants to receive protected clicks could use the former. Feature detection could be used in the latter case to add distinguishing parameters to the submission if the receiving application wants to treat protected and unprotected clicks differently, e.g. for fraud analytics.
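The "distinguishing parameters" idea can be sketched as follows. How a page would actually detect support for protected controls is not defined in this document, so the sketch takes the detection result as a plain boolean; the parameter name `protected_click` and the function `buildSubmission` are likewise hypothetical.

```typescript
// Hypothetical parameter name; the proposal does not define one.
const PROTECTED_FLAG = "protected_click";

// Returns a copy of the form fields with a flag recording whether the
// submission came through a protected control. `protectedControlUsed` stands
// in for the result of whatever feature detection the page performs.
function buildSubmission(
  fields: Record<string, string>,
  protectedControlUsed: boolean
): Record<string, string> {
  return { ...fields, [PROTECTED_FLAG]: protectedControlUsed ? "1" : "0" };
}
```

Because the flag is set on the client, the receiving application can only treat it as a hint, which fits the fraud-analytics use mentioned above.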

Examples

Example 1

Attack site:

Obscured target site:

When user clicks on the slider:

Example 2

Use on a legacy image-submit: an innocuous-looking “Like” button.

The misleading context is replaced by the trusted context on click, revealing what the user is really about to like.

Issues

How should the case be handled where most of the protected context would be off screen? One alternative is a system where the slider doesn’t look like a slider, and any click on it displays the protected context centered in the browsing context.

For example, if the protected context were clipped off-screen like this:

The current proposal would give the following, with no context:

Re-centering would give:

Basically, this re-creates the common “lightbox” UI experience as a protected browser built-in. The re-centering behavior could be optional, and only invoked if there was clipping of the protected context when displayed at its original cursor position. This would allow for more seamless experiences for well-designed pages, but still prevent abuse.
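The optional re-centering rule can be sketched as a small placement function: keep the context at its original position when it fits entirely, otherwise center it. The names `Extent`, `Placement` and `placeContext` are hypothetical, and the sketch assumes coordinates relative to the top-left corner of the visible area.

```typescript
// Hypothetical placement rule; names and coordinate conventions are assumptions.
interface Extent {
  width: number;
  height: number;
}

interface Placement {
  x: number;
  y: number;
  recentered: boolean;
}

// `anchor` is the top-left corner the context would have at its original position.
function placeContext(
  viewport: Extent,
  context: Extent,
  anchor: { x: number; y: number }
): Placement {
  const fits =
    anchor.x >= 0 &&
    anchor.y >= 0 &&
    anchor.x + context.width <= viewport.width &&
    anchor.y + context.height <= viewport.height;

  if (fits) {
    // No clipping: leave the page's own layout undisturbed.
    return { x: anchor.x, y: anchor.y, recentered: false };
  }

  // Clipped: fall back to a centered, lightbox-style placement.
  // Math.max keeps the top-left corner visible if the context is larger
  // than the viewport.
  return {
    x: Math.max(0, (viewport.width - context.width) / 2),
    y: Math.max(0, (viewport.height - context.height) / 2),
    recentered: true,
  };
}
```

With a 1000x800 viewport and a 200x200 context, an anchor at (100, 100) is left in place, while an anchor at (900, 700) would be clipped and is moved to (400, 300).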

Advantages over other approaches

ClearClick in NoScript uses a screenshot comparison approach, looking at a diff between the rendered view seen by the OS and what an unobstructed rendering would give. This approach can be foiled by quickly moving or changing contexts just as a user interacts, so the screenshots have to be periodic or associated with some timeout period. All of this leads to implementation complexity and background processing costs. It is also not clear how this approach would deal with clipping, or indicate which context needs to be protected, unless hints are provided.

Finally, screenshot approaches cannot protect against “fake cursor” attacks, where the user is shown a fake cursor offset from the real element they are clicking on, or against simply misleading context, such as the “like” button at the Clown Store in Example 2. Compared to the screenshot approach, the protected UI approach is simple to implement in the browser and has few ambiguities. It defends against fake cursor attacks, and it allows ordinary page layouts to provide pop-up context for protected actions as they are initiated, without lots of screen clutter.

Disadvantages compared to other approaches

Screenshot approaches do have better legacy compatibility than the protected UI approach. An HTTP header could be used to convey screenshot protection policies, but opting in to the protected UI approach will always require markup changes.

Limitations

The amount of protected context that can reasonably be provided will always be relatively small. This shouldn’t be a serious limitation in current practice, but it limits the practical set of use cases. It would be difficult to employ this approach to protect an entire control-panel style interface (e.g. the Flash interface for enabling the webcam). It is much better suited to transactional, click-oriented use cases such as pay, like, +1, follow, share, etc.

Authors

Brad Hill (bhill@paypal-inc.com) with thanks to Scott Stender of iSEC Partners