Enabling Read Access for Web Resources

W3C Working Draft 1 October 2007

This Version:: http://www.w3.org/TR/2007/WD-access-control-20071001/
Latest Version:: http://www.w3.org/TR/access-control/
Previous Versions:: http://www.w3.org/TR/2007/WD-access-control-20070618/; http://www.w3.org/TR/2007/WD-access-control-20070215/; http://www.w3.org/TR/2006/WD-access-control-20060517/; http://www.w3.org/TR/2005/NOTE-access-control-20050613/
Editor:: Anne van Kesteren (Opera Software ASA) <annevk@opera.com>

Abstract

This document defines a mechanism to selectively provide client-side cross-site access to a Web resource. Using an HTTP header, an XML processing instruction, or both, a resource can indicate that it allows read access from specified hosts (optionally using patterns). When a pattern is used, certain hosts can also be excluded: for instance, a resource can allow read access from example.org and its subdomains with the exception of public.example.org.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the 1 October 2007 Working Draft of the "Enabling Read Access for Web Resources" document. It is expected that this document will progress along the W3C Recommendation track. This document is produced by the Web Application Formats (WAF) Working Group. The WAF Working Group is part of the Rich Web Clients Activity in the W3C Interaction Domain.

Please send comments to the WAF Working Group's public mailing list public-appformats@w3.org with [access-control] at the start of the subject line. Archives of this list are available. See also W3C mailing list and archive usage guidelines.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

1. Introduction
- 1.1. Conformance Criteria
  - 1.1.1. Terminology
- 1.2. Security Considerations
2. Access Control Mechanism
- 2.1. Syntax
- 2.2. Processing Model
References
Acknowledgments

1. Introduction

The Web has a rich set of resources that can be combined to build content, applications and feature-rich Web sites. One contributor to this richness is the ability of Web sites to include references (e.g. a link or an embedded image) to resources residing in other domains.

To prevent information leakage, user agents, such as Web browsers, implement a same origin policy that allows a document (e.g. some JavaScript) to read, process, or otherwise interrogate the contents of another resource if, and only if, the other resource resides in the same domain. This policy prevents domain A, acting on behalf of the user, from getting information from domain B. For instance, it prevents a malicious site from using a technology such as XMLHttpRequest to read information from the user's intranet.

This restriction is very strict and generally appropriate. However, there are scenarios where an application needs to get data from another resource on the Web without these restrictions. For this to work, the browser's same origin policy has to be extended or eased. For example, a car reservation Web site may want to request trip itinerary data from an affiliated airline reservation Web site to streamline making a car reservation. Easing read access restrictions is particularly important for Web browsers that implement the XMLHttpRequest object and for VoiceXML 2.1 browsers that use the data element.

To facilitate clear and controlled read access to resources, this specification defines a read access control mechanism that enables a Web resource to permit access to its content from external domains when such access would otherwise be prohibited by a same origin policy. The mechanism takes effect only in conjunction with other specifications that use it to enable read access.

1.1. Conformance Criteria

This specification applies to both user agents and other specifications. It applies only in certain contexts; the specifications that define such contexts define when and how it applies.

As well as sections marked as non-normative, all diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

In this specification, the words must, must not, should, should not and may are to be interpreted as described in RFC 2119. [RFC2119]

A conformant specification is one that implements all the requirements (the must and must not statements) listed in this specification that are applicable to specifications. For instance, a specification using the access control mechanism needs to define what the requesting URI is.

A conformant user agent is one that implements all the requirements listed in this specification that are applicable to user agents, while also being consistent with the requirements listed in the specifications that use the access control read policy.

User agents may optimize any algorithm given in this specification, so long as the end result is indistinguishable from the result that would be obtained by the specification's algorithms. (The algorithms in this specification are generally written with more concern for clarity than efficiency.)

1.1.1. Terminology

Terminology is generally defined throughout the specification. However, the few definitions that did not really fit anywhere else are defined here instead.

The term ToASCII algorithm means that the ToASCII algorithm as described in RFC 3490 is applied with both the AllowUnassigned and UseSTD3ASCIIRules flags set. [RFC3490]

There is a case-insensitive match of strings s1 and s2 if after uppercasing both strings (by mapping a-z to A-Z) they are identical.
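The definition uppercases only the ASCII letters a-z, so it is not the same as a general Unicode case mapping. A minimal Python sketch, where case_insensitive_match is a hypothetical helper name rather than anything defined by this specification:

```python
import string

# Map only U+0061..U+007A (a-z) to U+0041..U+005A (A-Z); every other
# character, including non-ASCII letters, is left unchanged.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def case_insensitive_match(s1: str, s2: str) -> bool:
    """Case-insensitive match as defined in the Terminology section."""
    return s1.translate(_ASCII_UPPER) == s2.translate(_ASCII_UPPER)


print(case_insensitive_match("HTTPS", "https"))  # True
print(case_insensitive_match("é", "É"))          # False: é is outside a-z
```

Comparing s1.upper() == s2.upper() instead would, for example, treat é and É as equal, which this definition does not.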

U+0009, U+000A, U+000D and U+0020 are space characters.

A space-separated list is a string of which the items are separated by one or more space characters (in any order). The string may also be prefixed or suffixed with zero or more of those characters.

To obtain the values from a space-separated list, user agents must replace any sequence of space characters (in any order) with a single U+0020 character, drop any leading or trailing U+0020 character, and then chop the resulting string at each occurrence of a U+0020 character, dropping that character in the process.
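The steps above can be sketched in Python as follows; split_space_separated is a hypothetical name. An empty result corresponds to the case where no value is obtained.

```python
from __future__ import annotations

SPACE_CHARACTERS = "\t\n\r "  # U+0009, U+000A, U+000D and U+0020


def split_space_separated(value: str) -> list[str]:
    """Obtain the values from a space-separated list."""
    # Step 1: replace every run of space characters with a single U+0020.
    collapsed = []
    previous_was_space = False
    for ch in value:
        if ch in SPACE_CHARACTERS:
            if not previous_was_space:
                collapsed.append(" ")
            previous_was_space = True
        else:
            collapsed.append(ch)
            previous_was_space = False
    # Step 2: drop a leading or trailing U+0020.
    result = "".join(collapsed).strip(" ")
    # Step 3: chop at each U+0020; an empty string yields no values.
    return result.split(" ") if result else []
```

The explicit character set matters: Python's str.split() without arguments also splits on characters such as U+000B and U+0085, which are not space characters in this specification.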

An XML MIME type is text/xml, application/xml or any MIME type ending in +xml.
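A small Python check for the XML MIME type definition, assuming the MIME type is taken from a Content-Type header value. Ignoring parameters and comparing case-insensitively are assumptions of this sketch, not requirements stated here; is_xml_mime_type is a hypothetical name.

```python
def is_xml_mime_type(mime_type: str) -> bool:
    """True for text/xml, application/xml and any MIME type ending in +xml."""
    # Assumption: parameters such as "; charset=utf-8" are ignored and the
    # type/subtype is compared case-insensitively.
    essence = mime_type.split(";", 1)[0].strip(" \t").lower()
    return essence in ("text/xml", "application/xml") or essence.endswith("+xml")
```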

1.2. Security Considerations

The mechanism defined in this specification allows for extension of the same origin policy in contexts where the same origin policy applies.

When making requests to resources that could not be accessed before this specification was implemented, user agents should take care:

Not to reveal whether the requested resource exists until access has been granted;
Not to expose any trusted data, such as cookies and HTTP header data, inappropriately;
Not to allow the author to set HTTP header data;
Not to allow requests using an HTTP method other than GET, unless the extra steps outlined for access requests are taken.

A user agent running inside a trusted corporate network and executing untrusted content should enforce a sandboxing policy by denying access. However, it may be appropriate to relax this policy when the user agent is executing only trusted applications that require access to arbitrary resources on the local network. User agent vendors that allow this sandboxing policy to be configured are encouraged to provide guidance on the appropriate settings. It is critical that network administrators understand the security issues pertinent to their environment and configure their systems appropriately. Likewise, developers and Web server administrators need to be aware of the dangers of trusting a user agent that can be configured to disable sandboxing.

User agents that implement this specification must also take care to properly normalize Unicode and to properly interpret IDNs, in order to prevent the URI spoofing attacks outlined in RFC 3490. [RFC3490]

Application authors should be aware that content retrieved from another site is not itself trustworthy. To protect themselves against cross-site scripting attacks, authors should not render or execute retrieved content directly without first validating it.

2. Access Control Mechanism

The access control mechanism consists of the origin resource making an access request, followed by an access check on the retrieved resource. As explained in the processing model section, this becomes somewhat more complicated for non-GET requests and in the presence of redirects.

The origin resource is the resource that performs the request using a particular mechanism, such as XMLHttpRequest. A specification defining such a mechanism must define to which requests the access control mechanism applies, and how. A request to which the access control mechanism applies is called an access request. This request yields a retrieved resource, on which an access check is performed. Based on the outcome of that check, the origin resource is or is not provided with the contents of the retrieved resource.

The access control mechanism is described in the processing model section. That section details how user agents are to handle access requests and access checks.

The syntax section describes how to author a retrieved resource so that it can be accessed (or prevented from being accessed) from the origin resource. It also defines various syntactical constraints that user agents will verify as part of the processing model.

2.1. Syntax

Access to a retrieved resource can be given (or denied) using either an HTTP header or an XML processing instruction. Both share the access item syntax which is used for URI matching.

2.1.1. Access Item

An access item is either a single * character (always matches) or a domain that can contain a wildcard at the start and can optionally have a scheme and port specified. An access item must match the following EBNF:

access-item    ::= (scheme "://")? domain-pattern (":" port)? | "*"
domain-pattern ::= domain | "*." domain

scheme and port are used as defined in RFC 3986. domain is an internationalized domain name as defined in RFC 3490. [RFC3986] [RFC3490]

In addition to matching the above EBNF, the ToASCII algorithm must apply successfully (without errors) to each label of the domain in the access item.

When the access item is used as part of the Access-Control HTTP header authors must specify the result of applying the ToASCII algorithm to the internationalized domain name as HTTP does not support Unicode.

If the scheme or port is omitted, the access item matches any scheme or any port, respectively.

When * is used as part of domain-pattern, it matches one or more internationalized labels before domain. If just domain is used, it matches itself and any number of internationalized labels before domain.

Several examples of conforming access items:

*
*.example.org
https://*.example.org
https://example.org:8443

The following access items would make the user agent deny access to the resource:

https://*.*:80
*://example.org
http://example.org/
http://example.org/example
http://example.org:
http://example.org:*

The following access items are not identical:

http://example.org
http://example.org:80

The following access items would match http://foo.bar.example.org:80:

org
*.org
example.org
http://example.org
http://*.bar.example.org:80
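The following Python sketch checks an access item against the syntax above and the ToASCII requirement. It rests on several assumptions: the scheme follows the RFC 3986 pattern, a port needs at least one digit (the examples reject "http://example.org:"), a single trailing dot is tolerated because the URI Matching algorithm removes one, and ToASCII is approximated with Python's encodings.idna.ToASCII, which does not apply UseSTD3ASCIIRules, followed by a simplified letter-digit-hyphen check. The function names are hypothetical.

```python
import re
from encodings import idna

# access-item ::= (scheme "://")? domain-pattern (":" port)? | "*"
ACCESS_ITEM = re.compile(
    r"(?:(?P<scheme>[A-Za-z][A-Za-z0-9+.\-]*)://)?"  # RFC 3986 scheme
    r"(?:\*\.)?"                                     # optional leading wildcard
    r"(?P<domain>[^*/:<>\s]+)"                       # labels, checked below
    r"(?::(?P<port>[0-9]+))?"                        # at least one digit
)
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def to_ascii_label(label: str) -> str:
    """Approximate ToASCII with UseSTD3ASCIIRules set (see assumptions)."""
    ascii_label = idna.ToASCII(label).decode("ascii")  # raises UnicodeError
    if not LDH_LABEL.fullmatch(ascii_label):
        raise UnicodeError(f"label {label!r} fails the STD3 check")
    return ascii_label


def is_valid_access_item(item: str) -> bool:
    if item == "*":
        return True
    m = ACCESS_ITEM.fullmatch(item)
    if m is None:
        return False
    domain = m.group("domain")
    if domain.endswith("."):
        domain = domain[:-1]  # a single trailing dot is tolerated
    try:
        for label in domain.split("."):
            to_ascii_label(label)  # empty labels ("a..b") also fail here
    except UnicodeError:
        return False
    return True
```

All the conforming items listed in this section are accepted, and all the items listed as causing a denial are rejected.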

2.1.2. `Access-Control` HTTP Header

Retrieved resources can have one or more Access-Control headers defined. These headers must match the following EBNF:

Access-Control ::= "Access-Control" ":" LWS? ruleset
ruleset        ::= LWS? rule LWS? ("," LWS? rule LWS?)*
rule           ::= rule-type (LWS pattern)+ (LWS "exclude" (LWS pattern)+)?
rule-type      ::= "allow" | "deny"
pattern        ::= "<" access-item ">"

As permitted by RFC 2616, multiple Access-Control headers may be combined into a single header whose value is the comma-separated list of the individual values.

The syntax of access items when used in the Access-Control HTTP header is restricted to internationalized domain names to which the ToASCII algorithm has been applied as HTTP does not support Unicode.

LWS is used as defined by RFC 2616. [RFC2616]
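A Python sketch of parsing a (possibly combined) Access-Control header value into rules follows. parse_access_control and Rule are hypothetical names; access item validation is passed in as a callable so the sketch covers only the header grammar. Matching the keywords allow, deny and exclude case-sensitively is an assumption.

```python
from __future__ import annotations

import re
from typing import Callable, NamedTuple

LWS = r"(?:(?:\r\n)?[ \t]+)"  # RFC 2616: [CRLF] 1*( SP | HT )
PATTERN = r"<[^<>]*>"
# LWS? rule LWS?, where
# rule ::= rule-type (LWS pattern)+ (LWS "exclude" (LWS pattern)+)?
RULE = re.compile(
    rf"{LWS}?(?P<type>allow|deny)(?P<match>(?:{LWS}{PATTERN})+)"
    rf"(?:{LWS}exclude(?P<exclude>(?:{LWS}{PATTERN})+))?{LWS}?"
)
ITEM = re.compile(r"<([^<>]*)>")


class Rule(NamedTuple):
    rule_type: str           # "allow" or "deny"
    match_list: list[str]    # never empty
    exclude_list: list[str]  # empty when there is no "exclude" clause


def parse_access_control(value: str,
                         is_valid_item: Callable[[str], bool]) -> list[Rule] | None:
    """Parse a (possibly combined) Access-Control header value.

    Returns None if the value does not conform, in which case the
    processing model denies access.
    """
    rules = []
    for chunk in value.split(","):  # rules are separated by ","
        m = RULE.fullmatch(chunk)
        if m is None:
            return None
        match_list = ITEM.findall(m.group("match"))
        exclude_list = ITEM.findall(m.group("exclude") or "")
        if not all(is_valid_item(i) for i in match_list + exclude_list):
            return None
        rules.append(Rule(m.group("type"), match_list, exclude_list))
    return rules
```

Splitting on "," before matching each rule is safe here because no valid access item contains a comma, so a comma inside <...> makes the value fail either way.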

When the resources on a domain are not all under the control of a single person, authors can use "deny" rules to deny read access to the entire domain from external resources. Read access from other domains is disallowed by default, but individual resources on the domain could specify <?access-control?> processing instructions that allow access from other domains. A processing instruction only affects the file that contains it, whereas HTTP headers can be set across an entire server, which makes them far more effective for this purpose. The "exclude" clause can be used to list exceptions to these "deny" rules.

"allow" rules can be used to allow read access from particular domains as long as those domains don't match any of the patterns listed in "exclude".

Access-Control: allow <*.example.org> exclude <*.public.example.org>
Access-Control: allow <webmaster.public.example.org>

These headers mean that every subdomain of example.org can access the resource, including webmaster.public.example.org, but excluding all other subdomains of public.example.org.

Access-Control: allow <example.org> <*.example.org>

This header means that example.org and all its subdomains can access the resource.

2.1.3. `<?access-control?>` Processing Instruction

XML resources may include an <?access-control?> processing instruction within the XML Prolog to indicate, if the access control read policy applies, from which domains their content can be accessed. [XML]

The processing instruction takes up to three pseudo-attributes, allow, deny and exclude, each of which takes a space-separated list of access items. Either the allow or the deny pseudo-attribute must be specified, but not both. A specified pseudo-attribute must contain at least one access item.

An <?access-control?> processing instruction that is part of the XML Prolog must be parsed using the same syntax rules as described in the XML Stylesheet PI specification. <?access-control?> processing instructions outside the XML Prolog are ignored. [XMLSSPI]

The above means that the following examples would be non-conforming and would make the user agent deny access to the resource:

<?access-control?>
<?access-control x?>
<?access-control x=""?>
<?access-control allow=""?>
<?access-control allow="http://example.org" x=""?>
<?access-control allow="allow.example.org" deny="deny.example.org"?>
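The constraints above can be sketched in Python. The sketch takes the PI data, that is the text between the access-control target and ?>. Pseudo-attribute parsing is a simplified approximation of the xml-stylesheet rules (an ASCII-only Name pattern and no expansion of character references), access item syntax is not re-checked, and interpret_pi and the other names are hypothetical.

```python
from __future__ import annotations

import re

XML_SPACE = "\t\n\r "
# Simplified pseudo-attribute: Name S? '=' S? quoted value (no references).
PSEUDO_ATT = re.compile(
    r"""[ \t\r\n]*(?P<name>[A-Za-z_:][A-Za-z0-9_.:-]*)[ \t\r\n]*=[ \t\r\n]*"""
    r"""(?:"(?P<dq>[^"<&]*)"|'(?P<sq>[^'<&]*)')"""
)


def parse_pseudo_attributes(data: str) -> dict[str, str] | None:
    """Split PI data into pseudo-attributes, or return None if malformed."""
    attrs: dict[str, str] = {}
    data = data.rstrip(XML_SPACE)
    pos = 0
    while pos < len(data):
        if pos > 0 and data[pos] not in XML_SPACE:
            return None  # pseudo-attributes must be separated by white space
        m = PSEUDO_ATT.match(data, pos)
        if m is None or m.group("name") in attrs:
            return None  # syntax error or duplicate pseudo-attribute
        value = m.group("dq") if m.group("dq") is not None else m.group("sq")
        attrs[m.group("name")] = value
        pos = m.end()
    return attrs


def split_items(value: str) -> list[str]:
    return [v for v in re.split(r"[\t\n\r ]+", value) if v]


def interpret_pi(data: str) -> tuple[str, list[str], list[str]] | None:
    """Return (kind, match list, exclude list), or None to deny access."""
    attrs = parse_pseudo_attributes(data)
    if attrs is None or not set(attrs) <= {"allow", "deny", "exclude"}:
        return None
    if ("allow" in attrs) == ("deny" in attrs):
        return None  # exactly one of allow and deny is required
    kind = "allow" if "allow" in attrs else "deny"
    match_list = split_items(attrs[kind])
    exclude_list = split_items(attrs.get("exclude", ""))
    if not match_list or ("exclude" in attrs and not exclude_list):
        return None  # a specified pseudo-attribute needs at least one item
    return kind, match_list, exclude_list
```

Each of the six non-conforming examples above yields None.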

2.2. Processing Model

2.2.1. Access Request

When a user agent performs a request to which the access control mechanism applies, it performs an access request. The exact details of an access request, such as how to deal with network errors and redirects, are out of scope for this specification and must be defined by the specifications using the access control mechanism. This specification does, however, place some requirements on such access requests.

In particular, a request using a non-GET HTTP method must be preceded by a request using the GET HTTP method. This GET request includes an If-Method-Allowed header specifying the desired method. If the access check on the retrieved resource does not raise any errors and that resource specifies an Allow header listing the desired non-GET HTTP method, a subsequent request using that method must be made directly to the URI of the retrieved resource (which differs from the original URI if redirects were followed).

In addition, each access request must include a Referer-Root (sic) HTTP header which has the requesting URI as value.
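A sketch of this request sequence in Python. It only computes methods, headers and decisions and sends nothing over the network; the function names are hypothetical, and access_granted stands for the outcome of the access check on the response to the GET request.

```python
from __future__ import annotations


def initial_request(method: str, requesting_uri: str) -> tuple[str, dict[str, str]]:
    """Return the HTTP method and extra headers of the first request."""
    headers = {"Referer-Root": requesting_uri}
    if method != "GET":
        # A non-GET access request starts with a GET naming the desired method.
        headers["If-Method-Allowed"] = method
    return "GET", headers


def may_send_follow_up(method: str, access_granted: bool,
                       allow_header: str | None) -> bool:
    """Whether the non-GET request may now be sent to the retrieved resource's URI."""
    if method == "GET" or not access_granted or allow_header is None:
        return False
    # Allow = "Allow" ":" #Method (RFC 2616); method names are case-sensitive.
    allowed = {token.strip(" \t") for token in allow_header.split(",")}
    return method in allowed
```

Parsing the Allow header as a comma-separated list follows its RFC 2616 definition, and methods are compared case-sensitively because HTTP method names are case-sensitive.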

2.2.2. Access Check

When a user agent has to make an access check for a particular resource, it must associate the following with that resource:

An unordered, initially empty, HTTP access control allow list of which each list item contains a match list and an exclude list.
An unordered, initially empty, HTTP access control deny list of which each list item contains a match list and an exclude list.
An unordered, initially empty, PI access control allow list of which each list item contains a match list and an exclude list.
An unordered, initially empty, PI access control deny list of which each list item contains a match list and an exclude list.
An allow access flag which is used in the algorithms to determine at certain points whether access will be granted. The flag has two values: "true" and "false". Its initial value is "false".

The match lists and exclude lists are unordered lists of access items. The match lists are guaranteed to be non-empty and the exclude lists can be empty.

After associating the aforementioned lists and when all HTTP headers have been received the user agent must run the following algorithm (unless stated otherwise):

Parse the Access-Control headers. If any header value does not conform to the required syntax, deny access to the resource and terminate the algorithm. If the headers parse successfully, run the following steps for each rule:
1. If rule-type is "allow" append a new list item to the HTTP access control allow list where the match list is constructed of each access item following "allow" and the exclude list of each access item following "exclude". If "exclude" is not present the exclude list will be empty.
2. If rule-type is "deny" append a new list item to the HTTP access control deny list where the match list is constructed of each access item following "deny" and the exclude list of each access item following "exclude". If "exclude" is not present the exclude list will be empty.
Then run the following steps for each list item (if any) in the HTTP access control deny list:
1. If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
2. If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
3. Deny access to the resource and terminate the overall algorithm.
Run the following steps for each list item (if any) in the HTTP access control allow list:
1. If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
2. If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
3. Set the allow access flag to "true" and go to the next step in the overall set of steps.
If the requested resource has an XML MIME type go to the next step. Otherwise, if the allow access flag is "false" deny access to the resource and terminate the overall algorithm. If the allow access flag is "true" user agents should grant access to the resource and must terminate the overall algorithm.
Parse the resource as an XML document using a streaming XML parser following the rules set forth in the XML specification up to and including the root element start tag. Then process the encountered <?access-control?> processing instructions (if any).

If there is either an XML parse error or failure to parse the processing instructions deny access to the resource and terminate the overall algorithm. Otherwise, run the following steps for each <?access-control?> processing instruction:
1. If the processing instruction has any pseudo-attributes other than deny, allow and exclude, or does not have exactly one of the allow and deny pseudo-attributes, terminate the overall algorithm and deny access to the resource.
2. Let temp match list be the result of parsing the value of the allow or deny pseudo-attribute, whichever is present. If any obtained value does not match the access item syntax, or if no value was obtained, terminate the overall algorithm and deny access to the resource.
3. If there is an exclude pseudo-attribute let temp exclude list be the result of parsing the exclude pseudo-attribute value. If any obtained value does not match the access item syntax or if no value was obtained terminate the overall algorithm and deny access to the resource. If there is no such pseudo-attribute let temp exclude list be empty.
4. If there is an allow pseudo-attribute append a new list item to the PI access control allow list where the match list is temp match list and the exclude list is temp exclude list.
  
  Otherwise, there is a deny pseudo-attribute. Append a new list item to the PI access control deny list where the match list is temp match list and the exclude list is temp exclude list.
Then run the following steps for each list item (if any) in the PI access control deny list:
1. If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
2. If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
3. Deny access to the resource and terminate the overall algorithm.
Then run the following steps for each list item (if any) in the PI access control allow list:
1. If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
2. If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
3. Set the allow access flag to "true" and go to the next step in the overall set of steps.
If the allow access flag is "false" deny access to the resource. If the allow access flag is "true" user agents should grant access to the resource.
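The algorithm above can be condensed into a Python sketch. Parsing is assumed to have happened already: http_rules and pi_rules are lists of (rule type, match list, exclude list) tuples, or None when parsing failed, and matches stands for the URI Matching algorithm. All names are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# (rule type, match list, exclude list), e.g. ("allow", ["example.org"], []).
Rule = Tuple[str, Sequence[str], Sequence[str]]
# matches(requesting_uri, access_item) -> bool
Matcher = Callable[[str, str], bool]


def rule_applies(rule: Rule, origin: str, matches: Matcher) -> bool:
    _, match_list, exclude_list = rule
    if not any(matches(origin, item) for item in match_list):
        return False
    return not any(matches(origin, item) for item in exclude_list)


def run_lists(rules: Sequence[Rule], origin: str, matches: Matcher,
              allow_access: bool) -> Tuple[bool, bool]:
    """Check the deny list, then the allow list. Returns (denied, allow_access)."""
    if any(r[0] == "deny" and rule_applies(r, origin, matches) for r in rules):
        return True, allow_access
    if any(r[0] == "allow" and rule_applies(r, origin, matches) for r in rules):
        allow_access = True
    return False, allow_access


def access_check(origin: str, http_rules: Optional[Sequence[Rule]], is_xml: bool,
                 pi_rules: Optional[Sequence[Rule]], matches: Matcher) -> bool:
    """True if access should be granted; None rules mean parsing failed."""
    if http_rules is None:          # malformed Access-Control header
        return False
    denied, allow_access = run_lists(http_rules, origin, matches, False)
    if denied:
        return False
    if not is_xml:
        return allow_access
    if pi_rules is None:            # XML parse error or malformed PI
        return False
    denied, allow_access = run_lists(pi_rules, origin, matches, allow_access)
    return allow_access and not denied
```

Filtering a single rule list by type is equivalent to keeping separate deny and allow lists, because the lists are unordered and every deny rule is checked before any allow rule. A grant from the HTTP headers carries over to the XML stage, where a matching PI deny rule or a PI parse failure can still revoke it.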

2.2.3. URI Matching

The requesting URI is the scheme followed by ://, followed by the domain without any trailing U+002E (.) (if any), followed by :, followed by the port (defaulting to the default port for the scheme) of the resource from which the request originated. If the resource does not have a host-based authority (data: URI scheme for instance) the requesting URI is "null".

Specifications which use the access control mechanism must define on what the requesting URI is to be based.
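A Python sketch of computing the requesting URI from the URL of the resource from which the request originated, using urllib.parse.urlsplit. The table of default ports (only http and https) is an assumption of the sketch, IPv6 literals are not handled, and requesting_uri is a hypothetical name.

```python
from urllib.parse import urlsplit

# Assumption: only these default ports are known to this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def requesting_uri(resource_url: str) -> str:
    """Compute the requesting URI for the resource at resource_url."""
    parts = urlsplit(resource_url)
    host = parts.hostname          # None without a host-based authority
    if not host:
        return "null"              # e.g. data: URIs
    if host.endswith("."):
        host = host[:-1]           # drop a trailing U+002E (.)
    port = parts.port              # raises ValueError for a malformed port
    if port is None:
        if parts.scheme not in DEFAULT_PORTS:
            raise ValueError(f"no default port known for {parts.scheme!r}")
        port = DEFAULT_PORTS[parts.scheme]
    return f"{parts.scheme}://{host}:{port}"
```

urlsplit lowercases the scheme and hostname lowercases the host; this does not affect the outcome because the URI Matching algorithm compares both case-insensitively.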

To determine whether a requesting URI and an access item match user agents must run the following algorithm:

Let requesting URI be origin and access item be item.
If item is a single U+002A (*) there is a match. Terminate this algorithm.
If origin is "null" there is no match. Terminate this algorithm.
If item has a scheme and it does not case-insensitively match the scheme from origin there is no match. Terminate this algorithm.
If either item or origin has a scheme remove it including the :// sequence following it.
If item has a port and it does not match the port from origin there is no match. Terminate this algorithm.
If either item or origin has a port remove it including the U+003A (:) preceding it.
If item has a single U+002E (.) as its last character, remove that character from item.
Let origin list be origin split on the U+002E (.) character (dropping that character in the process) and item list be item split on the U+002E (.) character (dropping that character in the process). Ensure that the order is preserved.
Reverse the order of origin list and item list.
Now process the first list item of both origin list and item list using the following steps:
1. Let the item from origin list be origin label and the item from item list be item label.
2. If item label is a single U+002A (*) character move to the next step in the overall set of steps.
3. Apply the ToASCII algorithm to origin label and item label and store the result in those variables respectively.
4. If origin label does not case-insensitively match item label there is no match (terminate the overall algorithm).
  
  Otherwise, apply this set of steps to the next list item of both origin list and item list. If item list has no next list item, go to the next step in the overall set of steps. If item list has a next list item but origin list does not, there is no match (terminate the overall algorithm).
There is a match. Terminate this algorithm.
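A Python sketch of the matching algorithm. It reads step 4 so that running out of item labels is a match, which is what the Access Item section requires when it states that a plain domain matches itself. Further assumptions: both inputs are already well-formed, ports are compared numerically, a label on which ToASCII fails never matches, and ToASCII is Python's encodings.idna.ToASCII, which does not apply UseSTD3ASCIIRules. Because ToASCII returns bytes, bytes.upper() gives exactly the a-z to A-Z case-insensitive match. uri_matches is a hypothetical name.

```python
import re
import string
from encodings import idna

# Assumption: both inputs are already well-formed.
ITEM = re.compile(r"(?:(?P<scheme>[^:/]+)://)?(?P<host>[^:]+)(?::(?P<port>[0-9]+))?")
ORIGIN = re.compile(r"(?P<scheme>[^:/]+)://(?P<host>[^:]+):(?P<port>[0-9]+)")
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def uri_matches(origin: str, item: str) -> bool:
    if item == "*":
        return True
    if origin == "null":
        return False
    o, i = ORIGIN.fullmatch(origin), ITEM.fullmatch(item)
    if o is None or i is None:
        return False
    if i["scheme"] is not None and \
            i["scheme"].translate(_ASCII_UPPER) != o["scheme"].translate(_ASCII_UPPER):
        return False
    if i["port"] is not None and int(i["port"]) != int(o["port"]):
        return False
    item_host = i["host"][:-1] if i["host"].endswith(".") else i["host"]
    origin_labels = o["host"].split(".")[::-1]  # reversed: top-level label first
    item_labels = item_host.split(".")[::-1]
    for index, item_label in enumerate(item_labels):
        if index >= len(origin_labels):
            return False       # item has more labels than origin
        if item_label == "*":
            return True        # wildcard covers this and all remaining labels
        try:
            origin_ascii = idna.ToASCII(origin_labels[index])
            item_ascii = idna.ToASCII(item_label)
        except UnicodeError:
            return False       # assumption: a label failing ToASCII never matches
        if origin_ascii.upper() != item_ascii.upper():  # bytes.upper(): a-z only
            return False
    return True                # item exhausted: extra origin labels are allowed
```

Applied to http://foo.bar.example.org:80, this accepts every access item listed as matching it in the Access Item section.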

References

[RFC2119]: Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. IETF, March 1997.
[RFC2616]: Hypertext Transfer Protocol -- HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, editors. IETF, June 1999.
[RFC3490]: Internationalizing Domain Names in Applications (IDNA), P. Faltstrom, P. Hoffman, A. Costello. IETF, March 2003.
[RFC3986]: Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, editors. IETF, January 2005.
[XML]: Extensible Markup Language (XML) 1.0 (Fourth Edition), T. Bray et al., editors. W3C, August 2006; Namespaces in XML 1.0 (Second Edition), T. Bray et al., editors. W3C, August 2006.
[XMLSSPI]: Associating Style Sheets with XML documents, J. Clark, editor. W3C, June 1999.

Acknowledgments

The editor would like to thank the following people for their contributions to this specification (ordered by first name):

Arthur Barstow
Benjamin Hawkes-Lewis
Cameron McCormack
David Håsäther
Dean Jackson
Eric Lawrence
Frank Ellerman
Ian Hickson
Jonas Sicking
Lachlan Hunt
Maciej Stachowiak
Marc Silbey
Marcos Caceres
Mark Nottingham
Martin Dürst
Matt Womer
Mohamed Zergaoui
Sharath Udupa
Sunava Dutta
Thomas Roessler
Zhenbin Xu

Special thanks to Brad Porter, Matt Oshry and R. Auburn, who helped edit earlier versions of this document.