Copyright © 2007 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document defines a mechanism to selectively provide client-side
cross-site access to a Web resource. Using either an HTTP header or an XML
processing instruction (or both), resources can indicate that they allow read
access from specified hosts (optionally using patterns). When a pattern is
used, one can also exclude certain hosts. For instance, a resource can allow
read access from example.org and its subdomains with the exception of
public.example.org.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the 1 October 2007 Working Draft of the "Enabling Read Access for Web Resources" document. It is expected that this document will progress along the W3C Recommendation track. This document is produced by the Web Application Formats (WAF) Working Group. The WAF Working Group is part of the Rich Web Clients Activity in the W3C Interaction Domain.
Please send comments to the WAF Working Group's public mailing list public-appformats@w3.org with [access-control] at the start of the subject line. Archives of this list are available. See also W3C mailing list and archive usage guidelines.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The Web has a rich set of resources that can be combined to build content, applications and feature-rich Web sites. A contributor to this richness is Web sites including references (e.g. a link or an image inclusion) to resources residing in other domains.
To prevent information leakage, user agents, such as Web browsers,
implement a same origin policy that allows a document (e.g. some
JavaScript) to read, process, or otherwise interrogate the contents of
another resource if, and only if, the other resource resides in the same
domain. This policy prevents domain A, acting on behalf of the
user, from getting information from domain B. For instance, this
prevents a malicious site from reading information from the user's
intranet using a technology such as XMLHttpRequest.
This restriction is very strict and generally appropriate. However,
there are scenarios where an application would like to get data from
another resource on the Web without these restrictions. For this to work
the browser's same origin policy has to be extended or eased. For example,
a car reservation Web site may want to request trip itinerary data from an
affiliated airline reservation website to streamline making a car
reservation. The easing of read access restrictions is particularly
important to Web browsers that implement the XMLHttpRequest object and
VoiceXML 2.1 browsers using the data element.
To facilitate clear and controlled read access to resources, this specification defines a read access control mechanism that enables a Web resource to permit access to its content from external domains when such access would otherwise be prohibited by a same origin policy. The defined mechanism only works in conjunction with other specifications that are using the read access control mechanism to enable read access.
This specification is applicable to both user agents and other specifications. This specification will only apply in certain contexts and specifications defining such contexts will define when and how this specification applies.
As well as sections marked as non-normative, all diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
In this specification, the words must, must not, should, should not and may are to be interpreted as described in RFC 2119. [RFC2119]
A conformant specification is one that implements all the requirements (the must and must not statements) listed in this specification that are applicable to specifications. For instance, a specification using the access control mechanism needs to define what the requesting URI is.
A conformant user agent is one that implements all the requirements listed in this specification that are applicable to user agents, while also being consistent with the requirements listed in the specifications that use the access control read policy.
User agents may optimize any algorithm given in this specification, so long as the end result is indistinguishable from the result that would be obtained by the specification's algorithms. (The algorithms in this specification are generally written with more concern for clarity than efficiency.)
Terminology is generally defined throughout the specification. However, the few definitions that did not really fit anywhere else are defined here instead.
The term ToASCII algorithm means that the ToASCII algorithm as described in RFC 3490 is applied with both the AllowUnassigned and UseSTD3ASCIIRules flags set. [RFC3490]
There is a case-insensitive match of strings s1 and s2 if after uppercasing both strings (by mapping a-z to A-Z) they are identical.
U+0009, U+000A, U+000D and U+0020 are space characters.
A space-separated list is a string of which the items are separated by one or more space characters (in any order). The string may also be prefixed or suffixed with zero or more of those characters.
To obtain the values from a space-separated list user agents must replace any sequence of space characters (in any order) with a single U+0020 character, drop any leading or trailing U+0020 character, and then chop the resulting string at each occurrence of a U+0020 character, dropping that character in the process.
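The two definitions above can be sketched in Python as follows. This is an illustrative reading, not part of the specification; the function names case_insensitive_match and split_space_separated are hypothetical.

```python
import re
from typing import List

# Maps only a-z to A-Z, as the definition requires (no Unicode case folding).
_ASCII_UPPER = {ord(c): ord(c) - 32 for c in "abcdefghijklmnopqrstuvwxyz"}


def case_insensitive_match(s1: str, s2: str) -> bool:
    """Return True if s1 and s2 are identical after mapping a-z to A-Z."""
    return s1.translate(_ASCII_UPPER) == s2.translate(_ASCII_UPPER)


def split_space_separated(value: str) -> List[str]:
    """Obtain the values from a space-separated list."""
    # Replace each run of U+0009, U+000A, U+000D, U+0020 with one U+0020.
    collapsed = re.sub(r"[\t\n\r ]+", " ", value)
    # Drop leading and trailing U+0020, then chop at each remaining U+0020.
    stripped = collapsed.strip(" ")
    return stripped.split(" ") if stripped else []
```

Note that str.upper() is deliberately avoided here, because it would also change non-ASCII characters, which the definition does not ask for.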
An XML MIME type is text/xml, application/xml or any MIME type ending in +xml.
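As a minimal sketch, the XML MIME type test could look like this. The function name is hypothetical, and stripping parameters and ignoring ASCII case are assumptions that the definition above does not spell out.

```python
def is_xml_mime_type(mime_type: str) -> bool:
    """Return True for text/xml, application/xml and any type ending in +xml."""
    # Assumption: parameters such as "; charset=utf-8" are not part of the
    # type being tested, and the comparison ignores case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    return essence in ("text/xml", "application/xml") or essence.endswith("+xml")
```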
The mechanism defined in this specification allows for extension of the same origin policy in contexts where the same origin policy applies.
When making requests to resources that could not be accessed before this specification was implemented, user agents should ensure to:
GET, unless extra steps are taken as outlined for access requests.
A user agent running inside a trusted corporate network and executing untrusted content should enforce a sandboxing policy by denying access. However, it may be appropriate to relax this policy when the user agent is executing only trusted applications that require access to arbitrary resources on the local network. User agent vendors that allow this sandboxing policy to be configured are encouraged to provide guidance on the appropriate settings. It is critical that network administrators understand the security issues pertinent to their environment and configure their systems appropriately. In tandem, developers and Web server administrators are to be aware of the dangers of trusting a user agent that can be configured to disable sandboxing.
User agents which implement this specification must also take care to properly normalize Unicode and to properly interpret IDNs to prevent URI spoofing attacks as outlined in the specification. [RFC3490]
Application authors should be aware that content retrieved from another site is not itself trustable. Authors should take care to protect against exposing themselves to cross-site scripting attacks by rendering or executing the retrieved content directly without validation.
The access control mechanism is the process
of the origin resource doing an access request, followed by an access
check on the retrieved resource. As explained in the processing model section this gets slightly
more complicated for non-GET requests and in the face of redirects.
The origin resource is the resource that
performs the request using a particular mechanism, such as
XMLHttpRequest. A specification defining such a mechanism must define for which requests and how the access control
mechanism applies. A request where the access control mechanism applies is
called an access request. This request gives
a retrieved resource upon which an access check will be performed. Based on the
outcome of that check the origin resource will or will not be provided
with the contents of the retrieved resource.
The access control mechanism is described in the processing model section. That section details how user agents are to handle access requests and access checks.
The syntax section describes how to author a retrieved resource so that it can be accessed (or prevented from being accessed) from the origin resource. It also defines various syntactical constraints that user agents will verify as part of the processing model.
Access to a retrieved resource can be given (or denied) using either an HTTP header or an XML processing instruction. Both share the access item syntax which is used for URI matching.
An access item is either a single * character (always
matches) or a domain that can contain a wildcard at the start and can
optionally have a scheme and port specified. An access item must match the following EBNF:
access-item    ::= (scheme "://")? domain-pattern (":" port)? | "*"
domain-pattern ::= domain | "*." domain
scheme and port are used as defined in RFC 3986. domain is an internationalized domain name as defined in RFC 3490. [RFC3986] [RFC3490]
In addition to matching the above EBNF the ToASCII algorithm must apply successfully (without errors) to each label component of the subdomain (if any) from the access item.
When the access item is used as part of the Access-Control HTTP header authors must specify the result of applying the ToASCII algorithm to the internationalized domain name as HTTP does not support Unicode.
If the scheme or port is omitted they will match any scheme or port.
When * is used as part of domain-pattern it matches any number of internationalized labels before domain. If just domain is used it will match itself and any number of internationalized labels before domain.
Several examples of conforming access items:
*
*.example.org
https://*.example.org
https://example.org:8443
The following access items would make the user agent deny access to the resource:
https://*.*:80
*://example.org
http://example.org/
http://example.org/example
http://example.org:
http://example.org:*
The following access items are not identical:
http://example.org
http://example.org:80
The following access items would match http://foo.bar.example.org:80:
org
*.org
example.org
http://example.org
http://*.bar.example.org:80
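The syntactic part of the access item definition can be approximated with a regular expression, as sketched below. The name is_access_item is hypothetical. The sketch makes several assumptions: it requires at least one digit in the port (so that http://example.org: is rejected, as in the examples above), it tolerates a single trailing dot on the domain, it accepts any label characters other than whitespace and a few delimiters, and it does not run the ToASCII check on labels.

```python
import re

_ACCESS_ITEM = re.compile(
    r"""
    (?:(?P<scheme>[A-Za-z][A-Za-z0-9+.-]*)://)?    # optional scheme and ://
    (?P<wildcard>\*\.)?                            # optional leading wildcard
    (?P<domain>[^\s:/*.<>]+(?:\.[^\s:/*.<>]+)*)    # dot-separated labels
    \.?                                            # tolerated trailing dot
    (?::(?P<port>[0-9]+))?                         # optional colon and port
    """,
    re.VERBOSE,
)


def is_access_item(text: str) -> bool:
    """Syntax-only check of an access item; ToASCII validation is omitted."""
    return text == "*" or _ACCESS_ITEM.fullmatch(text) is not None
```

With this sketch the conforming examples listed above are accepted and the six non-conforming ones are rejected.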
Access-Control HTTP Header
Retrieved resources can have one or more Access-Control headers defined. These headers must match the following EBNF:
Access-Control ::= "Access-Control" ":" LWS? ruleset
ruleset        ::= LWS? rule LWS? ("," LWS? rule LWS?)*
rule           ::= rule-type (LWS pattern)+ (LWS "exclude" (LWS pattern)+)?
rule-type      ::= "allow" | "deny"
pattern        ::= "<" access-item ">"
As stated by RFC 2616, multiple Access-Control headers may be combined.
The syntax of access items when used in the Access-Control HTTP header is restricted to
internationalized domain names to which the ToASCII algorithm has been
applied as HTTP does not support Unicode.
LWS is used as defined by RFC 2616. [RFC2616]
In case resources on a domain are not all in the control of a single
person "deny" rules can be used by authors to deny read access from
external resources to the entire domain. Read access from other domains is
by default disallowed but individual resources on the domain could have
<?access-control?> processing instructions specified which can allow access from other
domains. Although files can contain processing instructions, HTTP headers
can be set across an entire server making them far more effective. The
"exclude" clause can be used to list exclusions to these "deny" rules.
"allow" rules can be used to allow read access from particular domains as long as those domains don't match any of the patterns listed in "exclude".
Access-Control: allow <*.example.org> exclude <*.public.example.org>
Access-Control: allow <webmaster.public.example.org>
This means that every subdomain of example.org can access the
resource, including webmaster.public.example.org, but with
the exclusion of all other subdomains of public.example.org.
Access-Control: allow <example.org> <*.example.org>
This means that example.org and all its subdomains can access
the resource.
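To illustrate the header grammar, the following sketch parses an Access-Control header value (the part after the colon) into rules. The names Rule and parse_access_control are hypothetical. The sketch treats any whitespace as LWS, assumes the "allow", "deny" and "exclude" keywords are case-sensitive, and does not validate the access items themselves.

```python
import re
from typing import List, NamedTuple


class Rule(NamedTuple):
    rule_type: str            # "allow" or "deny"
    match_list: List[str]     # access items following the rule type
    exclude_list: List[str]   # access items following "exclude", if any


_PATTERNS = r"((?:\s+<[^<>\s]+>)+)"
_RULE = re.compile(r"(allow|deny)" + _PATTERNS + r"(?:\s+exclude" + _PATTERNS + r")?")
_ITEM = re.compile(r"<([^<>\s]+)>")


def parse_access_control(value: str) -> List[Rule]:
    """Parse a ruleset; raise ValueError if any rule is malformed."""
    rules = []
    for part in value.split(","):
        m = _RULE.fullmatch(part.strip())
        if m is None:
            raise ValueError("malformed rule: %r" % part)
        rules.append(Rule(m.group(1),
                          _ITEM.findall(m.group(2)),
                          _ITEM.findall(m.group(3) or "")))
    return rules
```

For the first example header above, this yields one "allow" rule with match list ["*.example.org"] and exclude list ["*.public.example.org"].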
<?access-control?> Processing Instruction
XML resources may include an <?access-control?> processing
instruction within the XML Prolog to indicate, if the access control
read policy applies, from which domains their content can be
accessed. [XML]
The processing instruction takes three pseudo-attributes, each of which takes
a space-separated list of access items. These
pseudo-attributes are allow, deny and exclude. Either the allow or deny
pseudo-attribute must be specified. allow and deny must not be specified at the same
time. If a pseudo-attribute is specified it must contain at least one access item.
An <?access-control?> processing instruction that is part of the XML Prolog must be parsed using the same syntax rules as described in
the XML Stylesheet PI specification. <?access-control?> processing
instructions outside the XML Prolog are ignored. [XMLSSPI]
The above means that the following examples would be non-conforming and would make the user agent deny access to the resource:
<?access-control?>
<?access-control x?>
<?access-control x=""?>
<?access-control allow=""?>
<?access-control allow="http://example.org" x=""?>
<?access-control allow="allow.example.org" deny="deny.example.org"?>
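A rough sketch of the pseudo-attribute constraints follows. It takes the data of the processing instruction (the text between the target and the closing ?>) and is not a full implementation of the XML Stylesheet PI syntax: entity references are not handled and whitespace between pseudo-attributes is not enforced. The function name is hypothetical, and the sketch assumes that exclude is optional.

```python
import re
from typing import Dict, List

_PSEUDO_ATTR = re.compile(r"""\s*([A-Za-z_][\w.-]*)\s*=\s*("[^"<]*"|'[^'<]*')""")
_KNOWN = ("allow", "deny", "exclude")


def parse_access_control_pi(data: str) -> Dict[str, List[str]]:
    """Return the access items per pseudo-attribute or raise ValueError."""
    attrs: Dict[str, List[str]] = {}
    data = data.strip()
    pos = 0
    while pos < len(data):
        m = _PSEUDO_ATTR.match(data, pos)
        if m is None:
            raise ValueError("malformed pseudo-attribute")
        name, value = m.group(1), m.group(2)[1:-1]
        if name not in _KNOWN or name in attrs:
            raise ValueError("unexpected pseudo-attribute: " + name)
        # Split on the space characters U+0009, U+000A, U+000D, U+0020.
        items = [item for item in re.split(r"[\t\n\r ]+", value) if item]
        if not items:
            raise ValueError("empty pseudo-attribute: " + name)
        attrs[name] = items
        pos = m.end()
    # Exactly one of allow and deny has to be present.
    if ("allow" in attrs) == ("deny" in attrs):
        raise ValueError("specify either allow or deny, not both or neither")
    return attrs
```

Each of the six non-conforming examples above makes this function raise ValueError, which a caller would translate into denying access.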
When a user agent performs a request to which the access control mechanism applies it performs an access request. The exact details of an access request, such as how to deal with network errors, redirects, et cetera, are out of scope of this specification and must be defined by specifications using the access control mechanism. This specification does make some requirements on such access requests though.
Namely, requests using a non-GET HTTP method must be preceded by a request using the GET HTTP method. This request includes an If-Method-Allowed header specifying the desired method. If an access check does not raise any errors on the retrieved resource and that resource specifies an Allow header that lists the desired non-GET HTTP method, a subsequent request must be made directly to the URI of the retrieved resource (in case redirects were followed) using this HTTP method.
In addition, each access request must include a Referer-Root (sic) HTTP header which has the requesting URI as value.
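The header requirements above might be sketched as two small helpers. Both function names are hypothetical. The exact value format of If-Method-Allowed is assumed to be the bare method name, and the Allow header is assumed to be a comma-separated list of methods as in RFC 2616.

```python
from typing import Dict


def preliminary_get_headers(method: str, requesting_uri: str) -> Dict[str, str]:
    """Headers for the GET request that starts an access request."""
    headers = {"Referer-Root": requesting_uri}
    if method != "GET":
        # A non-GET method is first announced on a preceding GET request.
        headers["If-Method-Allowed"] = method
    return headers


def allow_header_lists(allow_value: str, method: str) -> bool:
    """Return True if the Allow header value lists the desired method."""
    return method in [m.strip() for m in allow_value.split(",")]
```

A caller would send the subsequent non-GET request only if the access check on the retrieved resource succeeded and allow_header_lists returned True for the desired method.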
When a user agent has to make an access check for a particular resource it must then associate the following with that resource:
An unordered, initially empty, HTTP access control allow list of which each list item contains a match list and an exclude list.
An unordered, initially empty, HTTP access control deny list of which each list item contains a match list and an exclude list.
An unordered, initially empty, PI access control allow list of which each list item contains a match list and an exclude list.
An unordered, initially empty, PI access control deny list of which each list item contains a match list and an exclude list.
An allow access flag which is used in the algorithms to determine at certain points whether access will be granted. The flag has two values: "true" and "false". Its initial value is "false".
The match lists and exclude lists are unordered lists of access items. The match lists are guaranteed to be non-empty and the exclude lists can be empty.
After associating the aforementioned lists and when all HTTP headers have been received the user agent must run the following algorithm (unless stated otherwise):
Parse the Access-Control headers. If any value does not conform to the syntax required deny access to the resource and terminate the algorithm. If parsed successfully then for each rule run the following steps:
If rule-type is "allow" append a new list item to the HTTP access control allow list where the match list is constructed of each access item following "allow" and the exclude list of each access item following "exclude". If "exclude" is not present the exclude list will be empty.
If rule-type is "deny" append a new list item to the HTTP access control deny list where the match list is constructed of each access item following "deny" and the exclude list of each access item following "exclude". If "exclude" is not present the exclude list will be empty.
Then run the following steps for each list item (if any) in the HTTP access control deny list:
If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
Deny access to the resource and terminate the overall algorithm.
Run the following steps for each list item (if any) in the HTTP access control allow list:
If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
Set the allow access flag to "true" and go to the next step in the overall set of steps.
If the requested resource has an XML MIME type go to the next step. Otherwise, if the allow access flag is "false" deny access to the resource and terminate the overall algorithm. If the allow access flag is "true" user agents should grant access to the resource and must terminate the overall algorithm.
Parse the resource as an XML document using a streaming XML parser following the rules set forth in the XML specification up to and including the root element start tag. Then process the encountered <?access-control?> processing instructions (if any).
If there is either an XML parse error or failure to parse the processing instructions deny access to the resource and terminate the overall algorithm. Otherwise, run the following steps for each <?access-control?> processing instruction:
If the processing instruction has any other pseudo-attributes than deny, allow and exclude, has not exactly two pseudo-attributes or has both deny and allow specified terminate the overall algorithm and deny access to the resource.
Let temp match list be the result of parsing the allow or deny pseudo-attribute value, whichever is present. If any obtained value does not match the access item syntax or if no value was obtained terminate the overall algorithm and deny access to the resource.
If there is an exclude pseudo-attribute let temp exclude list be the result of parsing the exclude pseudo-attribute value. If any obtained value does not match the access item syntax or if no value was obtained terminate the overall algorithm and deny access to the resource. If there is no such pseudo-attribute let temp exclude list be empty.
If there is an allow pseudo-attribute append a new list item to the PI access control allow list where the match list is temp match list and the exclude list is temp exclude list.
Otherwise, there is a deny pseudo-attribute. Append a new list item to the PI access control deny list where the match list is temp match list and the exclude list is temp exclude list.
Then run the following steps for each list item (if any) in the PI access control deny list:
If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
Deny access to the resource and terminate the overall algorithm.
Then run the following steps for each list item (if any) in the PI access control allow list:
If there is no match for any access item from the match list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
If the exclude list is non-empty and there is a match for any access item from the exclude list against the requesting URI process the next list item. If there is no next list item go to the next step in the overall set of steps.
Set the allow access flag to "true" and go to the next step in the overall set of steps.
If the allow access flag is "false" deny access to the resource. If the allow access flag is "true" user agents should grant access to the resource.
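Once the four lists have been built, the decision logic of the algorithm above can be condensed into the sketch below. All names are hypothetical. Each list item is represented as a (match list, exclude list) pair, and the matching of a requesting URI against an access item is passed in as the matches callable so that the sketch stays independent of any particular matching implementation. Parsing and its error cases are left out.

```python
from typing import Callable, List, Sequence, Tuple

ListItem = Tuple[Sequence[str], Sequence[str]]  # (match list, exclude list)
Matcher = Callable[[str, str], bool]            # (requesting URI, access item)


def _applies(item: ListItem, uri: str, matches: Matcher) -> bool:
    """A list item applies if a match entry matches and no exclude entry does."""
    match_list, exclude_list = item
    if not any(matches(uri, entry) for entry in match_list):
        return False
    return not any(matches(uri, entry) for entry in exclude_list)


def access_check(http_deny: List[ListItem], http_allow: List[ListItem],
                 pi_deny: List[ListItem], pi_allow: List[ListItem],
                 is_xml: bool, uri: str, matches: Matcher) -> bool:
    """Return True if access may be granted, False if it is denied."""
    if any(_applies(item, uri, matches) for item in http_deny):
        return False
    allow_access = any(_applies(item, uri, matches) for item in http_allow)
    if not is_xml:
        return allow_access
    if any(_applies(item, uri, matches) for item in pi_deny):
        return False
    return allow_access or any(_applies(item, uri, matches) for item in pi_allow)
```

The sketch reflects two properties of the algorithm: a matching deny rule always wins over allow rules, and an allow access flag set by an HTTP header stays set while processing instructions are evaluated.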
The requesting URI is the scheme followed by ://, followed by the domain without any trailing U+002E (.) (if any), followed by :, followed by the port (defaulting to the default port for the scheme) of the resource from which the request originated. If the resource does not have a host-based authority (data: URI scheme for instance) the requesting URI is "null".
Specifications which use the access control mechanism must define on what the requesting URI is to be based.
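As an illustration, a requesting URI could be derived from the URL of the originating resource as sketched below. The function name and the DEFAULT_PORTS table are assumptions; only the default ports for http and https are included, and other schemes without an explicit port are rejected.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes need a default port in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def requesting_uri(url: str) -> str:
    """Build scheme://domain:port, or "null" without a host-based authority."""
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        return "null"
    host = host.rstrip(".")  # drop any trailing U+002E
    port = parts.port
    if port is None:
        if parts.scheme not in DEFAULT_PORTS:
            raise ValueError("no default port known for scheme " + parts.scheme)
        port = DEFAULT_PORTS[parts.scheme]
    return "%s://%s:%d" % (parts.scheme, host, port)
```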
To determine whether a requesting URI and an access item match user agents must run the following algorithm:
Let requesting URI be origin and access item be item.
If item is a single U+002A (*) there is a match. Terminate this algorithm.
If origin is "null" there is no match. Terminate this algorithm.
If item has a scheme and it does not case-insensitively match the scheme from origin there is no match. Terminate this algorithm.
If either item or origin has a scheme remove it including the :// sequence following it.
If item has a port and it does not match the port from origin there is no match. Terminate this algorithm.
If either item or origin has a port remove it including the U+003A (:) preceding it.
If item has a single U+002E (.) as last character remove that character from item.
Let origin list be origin split on the U+002E (.) character (dropping that character in the process) and item list be item split on the U+002E (.) character (dropping that character in the process). Ensure that the order is preserved.
Reverse the order of origin list and item list.
Now process the first list item of both origin list and item list using the following steps:
Let the item from origin list be origin label and the item from item list be item label.
If item label is a single U+002A (*) character move to the next step in the overall set of steps.
Apply the ToASCII algorithm to origin label and item label and store the result in those variables respectively.
If origin label does not case-insensitively match item label there is no match (terminate the overall algorithm).
Otherwise, apply this set of steps to the next list item of both origin list and item list. If the origin list has no next list item there is no match (terminate the overall algorithm). If the item list has no next list item go to the next step in the overall set of steps.
There is a match. Terminate this algorithm.
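The matching steps can be sketched in Python as follows. The function names are hypothetical, and the access item is assumed to be syntactically valid already. Two caveats apply. First, Python's encodings.idna.ToASCII does not set the UseSTD3ASCIIRules flag, so it only approximates the ToASCII algorithm as defined in this specification. Second, the sketch checks whether the item list is exhausted before checking the origin list, on the assumption that a plain domain is meant to match itself; the steps as written test the origin list first.

```python
from encodings import idna
from typing import Optional, Tuple


def _split(value: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split scheme://host:port into parts; scheme and port may be absent."""
    scheme, sep, rest = value.partition("://")
    if not sep:
        scheme, rest = None, value
    host, sep, port = rest.rpartition(":")
    if not sep:
        host, port = rest, None
    return scheme, host, port


def _label_key(label: str) -> bytes:
    # bytes.upper() maps only a-z to A-Z, giving the case-insensitive match.
    return idna.ToASCII(label).upper()


def matches(origin: str, item: str) -> bool:
    """Return True if the requesting URI origin matches the access item."""
    if item == "*":
        return True
    if origin == "null":
        return False
    o_scheme, o_host, o_port = _split(origin)
    i_scheme, i_host, i_port = _split(item)
    if i_scheme is not None and (o_scheme is None
                                 or i_scheme.upper() != o_scheme.upper()):
        return False
    if i_port is not None and i_port != o_port:
        return False
    if i_host.endswith("."):
        i_host = i_host[:-1]
    # Compare labels from the right-most (top-level) label towards the left.
    origin_labels = o_host.split(".")[::-1]
    item_labels = i_host.split(".")[::-1]
    for index, item_label in enumerate(item_labels):
        if index >= len(origin_labels):
            return False  # origin has no label left for this item label
        if item_label == "*":
            return True
        try:
            if _label_key(origin_labels[index]) != _label_key(item_label):
                return False
        except UnicodeError:
            return False  # ToASCII failed on one of the labels
    return True
```

Under these assumptions, each of the access items listed earlier as matching http://foo.bar.example.org:80 makes matches return True, while *.example.org does not match http://example.org:80 because the wildcard needs at least one label to stand for.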
The editor would like to thank the following people for their contributions to this specification (ordered by first name):
Special thanks to Brad Porter, Matt Oshry and R. Auburn who helped editing earlier versions of this document.