XML representations of presentation markup and data are widely available to web browsers over HTTP. Web browsers often run with a higher privilege level than the applications running in those browsers. In order to prevent applications from accessing privileged content, browsers restrict applications to only read XML resources from the application's domain (e.g. LSParser in [DOM3LS] or the <data> element in [VXML21]). This limitation restricts the universe of XML content available to an application and precludes the open sharing of public XML data between applications.
This document describes one mechanism in use by voice browser vendors to allow XML content providers to specify which application domains can access their XML content. For example, the National Oceanic and Atmospheric Administration (NOAA) may declare that their XML weather data can be accessed by any application, while a stock ticker provider can allow access to individual partner applications that have licensed that data.
This document is based on the W3C's 13 June 2005 Working Group Note Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0 [AC-NOTE].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the 17 May 2006 Working Draft of the Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction, the first publication of this specification. This document is produced by a Task Force of the Voice Browser, Web APIs and Web Application Formats (WAF) Working Groups under the auspices of the WAF Working Group. The Web API and WAF Working Groups are part of the Rich Web Clients Activity and the Voice Browser Working Group is part of the Voice Browser Activity. Both of these Activities are within the W3C's Interaction Domain.
The W3C has not analyzed the security problems which motivated the publication of this document. This document only addresses a subset of the security issues involved in exposing XML data over HTTP. It describes an existing practice used under certain circumstances, but in no way implies that the technique would be appropriate or secure to protect document access under all circumstances. Implementors should perform their own security analysis.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
A plethora of applications and data are exposed as XML over HTTP. User agents such as Voice and Web browsers fetch and execute applications but restrict the XML content accessible to those applications merely to the URLs located in the same domain as the application. To take advantage of the rich XML content available on the Web, application developers must resort to proxying the content through the domain hosting their application thereby increasing overhead and limiting scalability.
This note describes a mechanism being used in the industry that allows a content provider to use a processing instruction embedded within the XML prolog to specify the access policy of that content. In this model a user agent can safely extend the sandbox in which it has restricted the application to include access to the XML content if and only if the specified policy grants permission.
The processing instruction is designed explicitly to enable extending the sandbox and is not designed as a restriction mechanism. The expectation is that the user agent's default policy is stricter. Therefore, it is always safe to fall back to the default policy in the event of an error.
ISSUE: The Task Force would like to enable this mechanism as an HTTP header (e.g. Content-Access-Control). We expect to apply this change to a later draft.
Before allowing an application executing in the context of a user agent to manipulate external XML content, a user agent validates that the host requesting the content is allowed to access the content. This validation is performed by comparing the hostname and IP address of the document server from which the requesting application was fetched to the list of hostnames, hostname suffixes, and IP addresses listed in the <?access-control?> processing instruction included in the prolog of the XML content to be fetched. When comparing hostnames, the user agent must perform a case-insensitive match as specified in [RFC2616].
If the XML prolog specifies one or more <?access-control?> processing instructions, access to the content is allowed based on the following algorithm:
If the IP address of the requesting application matches a value in the deny attribute, access is denied, and the search algorithm is stopped.
If the IP address of the requesting application matches a value in the allow attribute, access is allowed, and the search algorithm is stopped.
If the fully qualified domain name of the requesting application exactly matches a value in the deny attribute, access is denied, and the search algorithm is stopped.
If the fully qualified domain name of the requesting application exactly matches a value in the allow attribute, access is allowed, and the search algorithm is stopped.
The user agent then searches for the best match using wildcards on the domain name. Best match is defined as the closest match using the wildcards (e.g. "bert.evil.example.com" matches "*.evil.example.com" more closely than "*.example.com").
If a best match occurs in the deny attribute, access is denied, and the search algorithm is stopped.
If a best match occurs in the allow attribute, access is allowed, and the search algorithm is stopped.
If there is no match on any of the allow or deny attributes, the search algorithm is stopped and the user agent should apply its own security policy.
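The steps above can be sketched as follows. This is a minimal, non-normative illustration in Python. The names `check_access` and `_wildcard_specificity` are hypothetical, the allow and deny lists are assumed to have been split into individual values already, and wildcard values are assumed to take the form "*.domain" (or a bare "*") and to match only on label boundaries.

```python
import ipaddress
from typing import List, Optional


def _wildcard_specificity(pattern: str, host: str) -> int:
    """Length of the matched domain suffix, or -1 if the pattern does not match."""
    if pattern == "*":
        return 0  # matches every host, but is the least specific match
    if not pattern.startswith("*."):
        return -1  # not a wildcard value
    suffix = pattern[2:]
    if host == suffix or host.endswith("." + suffix):
        return len(suffix)
    return -1


def _ip_in(ip: str, values: List[str]) -> bool:
    """True if ip equals any value that parses as an IP address."""
    try:
        target = ipaddress.ip_address(ip)
    except ValueError:
        return False
    for value in values:
        try:
            if ipaddress.ip_address(value) == target:
                return True
        except ValueError:
            continue  # hostnames and wildcards are handled elsewhere
    return False


def check_access(host: str, ip: str,
                 allow: List[str], deny: List[str]) -> Optional[bool]:
    """Return True (allowed), False (denied), or None (no match:
    the user agent applies its own security policy)."""
    host = host.lower()  # hostname comparison is case-insensitive
    allow = [v.lower() for v in allow]
    deny = [v.lower() for v in deny]

    # IP address matches, deny before allow.
    if _ip_in(ip, deny):
        return False
    if _ip_in(ip, allow):
        return True
    # Exact fully qualified domain name matches, deny before allow.
    if host in deny:
        return False
    if host in allow:
        return True
    # Best wildcard match; deny wins when both lists match equally closely.
    best_deny = max((_wildcard_specificity(p, host) for p in deny), default=-1)
    best_allow = max((_wildcard_specificity(p, host) for p in allow), default=-1)
    if max(best_deny, best_allow) < 0:
        return None
    return best_allow > best_deny
```

In this sketch "bert.evil.example.com" is denied when allow is "*.example.com" and deny is "*.evil.example.com", because the deny value is the closer match.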
ISSUE: There are two choices. One is to assume that if a PI exists, the PI defines the universe of access allowed. The other is to let the user agent default to its own policy, as currently specified, when a PI exists but does not explicitly allow or deny a host such as example.net. It may be better to say that if a PI exists but does not make a statement about example.net, then example.net must be denied access.
Processing instructions outside the XML prolog must be ignored.
If the user agent encounters multiple <?access-control?> processing instructions in the XML prolog of the retrieved XML content, the user agent must collect the attributes from each instruction and create a merged ruleset in document order before executing the access algorithm. For example, the following two blocks are equivalent:
<?access-control allow="*.example.com" deny="*.visitors.example.com"?>
<?access-control allow="www.example.org" deny="www.example.net"?>
<?access-control allow="*.example.com www.example.org" deny="*.visitors.example.com www.example.net"?>
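The merge described above can be illustrated with a short, non-normative Python sketch. The function name `merge_rules` and the representation of each processing instruction as a dictionary of attribute strings are assumptions made for this example only.

```python
from typing import Dict, List


def merge_rules(instructions: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Merge the allow and deny attributes of several processing
    instructions, preserving document order."""
    merged: Dict[str, List[str]] = {"allow": [], "deny": []}
    for attrs in instructions:  # instructions are given in document order
        for name in ("allow", "deny"):
            # Attribute values are whitespace-separated lists.
            merged[name].extend(attrs.get(name, "").split())
    return merged


rules = merge_rules([
    {"allow": "*.example.com", "deny": "*.visitors.example.com"},
    {"allow": "www.example.org", "deny": "www.example.net"},
])
print(" ".join(rules["allow"]))
print(" ".join(rules["deny"]))
```

The two joined strings correspond to the allow and deny values of the single merged instruction shown above.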
ISSUE: This combination algorithm is under review as there are many options with different intended results. Feedback and use cases are greatly appreciated.
The require-secure attribute is optional. If not specified, the default is false. If the attribute require-secure is specified to be true, the user agent is responsible for ensuring that the requesting application's host has been validated using a secure protocol (e.g. HTTPS).
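As a rough, non-normative illustration, a user agent might gate access on the scheme of the URL from which the requesting application was fetched. The function name `satisfies_require_secure` is hypothetical, and treating only the literal value "true" as enabling the check, and only the "https" scheme as secure, are assumptions of this sketch. A real user agent would rely on its HTTP stack to have validated the host.

```python
from typing import Optional
from urllib.parse import urlsplit


def satisfies_require_secure(require_secure: Optional[str], app_url: str) -> bool:
    """Return True if the require-secure constraint, if any, is met."""
    # Absent attribute defaults to false: nothing to enforce.
    if require_secure is None or require_secure.strip().lower() != "true":
        return True
    # Assumption: HTTPS is the only secure protocol considered here.
    return urlsplit(app_url).scheme == "https"
```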
If the XML prolog does not contain an <?access-control?> processing instruction, access to the XML content is dependent on the user agent's security environment. Similarly, if the requesting application is from a non-remote protocol (e.g. file://) access to the XML content is dependent on the user agent's security environment.
If the processing instruction is not well-formed for any reason, the user agent should ignore all processing instructions and default to its security policy.
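The prolog handling and the fallback on error can be sketched as follows. This is a non-normative Python illustration using the standard `xml.dom.minidom` module. The names `read_access_control` and `parse_pi_data` are hypothetical, and the set of accepted attribute names (allow, deny, require-secure) and the quoted pseudo-attribute syntax are assumptions of this sketch rather than a restatement of the grammar.

```python
import re
from typing import Dict, List, Optional
from xml.dom import minidom

_ATTR = re.compile(r"""\s*([A-Za-z][\w-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')""")
_KNOWN = ("allow", "deny", "require-secure")


def parse_pi_data(data: str) -> Optional[Dict[str, str]]:
    """Parse name="value" pairs; return None if the data is not well-formed."""
    attrs: Dict[str, str] = {}
    data = data.strip()
    pos = 0
    while pos < len(data):
        m = _ATTR.match(data, pos)
        if m is None:
            return None
        name = m.group(1)
        value = m.group(2) if m.group(2) is not None else m.group(3)
        if name not in _KNOWN or name in attrs:
            return None  # unknown or repeated attribute
        attrs[name] = value
        pos = m.end()
    return attrs


def read_access_control(xml_text: str) -> Optional[List[Dict[str, str]]]:
    """Collect <?access-control?> instructions from the XML prolog.

    Returns None if any of them is ill-formed, in which case all of
    them are ignored and the user agent applies its default policy.
    XML-level parse errors are left to propagate to the caller."""
    doc = minidom.parseString(xml_text)
    rules: List[Dict[str, str]] = []
    for node in doc.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            break  # the prolog ends at the root element
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "access-control"):
            attrs = parse_pi_data(node.data)
            if attrs is None:
                return None
            rules.append(attrs)
    return rules
```

An empty list means that no processing instruction was found in the prolog. In both the empty and the `None` case, access depends on the user agent's own security policy.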
The following grammar describes the syntax for the <?access-control?> processing instruction to be embedded in the XML content retrieved by the user agent. The grammar is specified using Extended Backus-Naur Form (EBNF) notation. For more information on this syntax, see section 6, Notation, in [XML]. For definitions of the HostName, IPv4address, and IPv6address productions, see [RFC3986].
ISSUE: The working group would like to move to using URIs in combination with or instead of hosts, domains, and IP addresses. This might also eliminate the need for the require-secure attribute and any special handling for non-remote URI schemes.
On 5/6/2006 Charles McCathieNevile wrote: Current implementations are based on IP numbers, or domain names, allowing for wild cards. There are alternatives, such as the rules for the scope attribute of P3P's HINT element. Although those are similar, they allow a greater granularity, providing for scheme (e.g. file:, http: etc) and port number constraints, as follows:
[quote cite=http://www.w3.org/TR/P3P/#hints]
... the host part of the authority MAY begin with a wildcard, as defined in Section 126.96.36.199.2. The scope attribute MUST NOT contain a wildcard in any other position, MUST be encoded according to the conventions in Section 188.8.131.52.2, and MUST NOT contain a path, query or fragment URI component. Additionally, if the authority is a server, it SHOULD NOT contain a userinfo part. For example, legal values for scope include:
http://www.example.com
http://www.example.com:81
http://*.example.com
ftp://ftp.example.org
The following are illegal values for the scope attribute:
http://www.*.com ; the wildcard can only be at the start
http://www.example.com/ ; the trailing slash is not allowed
www.example.com ; the scheme must be stated
*://www.example.com ; the scheme cannot contain a wildcard
http://www.example.com:* ; the port cannot contain a wildcard
The path attribute is used to locate the policy reference file on the hinted site. It is a relative URI whose base is the URI scheme and authority matched in the scope attribute. The path attribute MUST NOT be an absolute URI, so that the policy reference file is always retrieved from the same site that it is applied to.
[/quote]
Since this syntax is almost compatible with that already used, but provides greater granularity, I think we could readily use it. In order to provide for backwards compatibility I think we should allow domains to be specified without schemes (although in the general case it could raise more security issues, which should be noted in the relevant section of the spec).
Should we adapt it to allow for wildcarding IP ranges? There is another approach taken by the RDF-CL XG, which allows for specifying a base pattern, and exceptions, covering particular files or groups of files. Although this is clearly useful, it goes some way beyond the current approach and requires a significant and incompatible change of syntax. I suggest that is beyond the scope of the current work, although it could be used by future work, and might be worth pointing to as an informative reference. It is already used to provide access control, so it does seem a natural complementary approach that we should at least mention.
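To make the quoted scope rules concrete, the following non-normative Python sketch checks a value against them with a regular expression. The name `is_legal_scope` is hypothetical. The sketch assumes ASCII host names only, does not handle IPv6 literals or percent-encoding, and rejects any userinfo part outright even though the quoted text only says SHOULD NOT.

```python
import re

_SCOPE = re.compile(
    r"[A-Za-z][A-Za-z0-9+.-]*://"        # scheme must be stated, no wildcard
    r"(\*\.)?"                           # wildcard only at the start of the host
    r"[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*"   # host labels
    r"(:\d+)?"                           # optional numeric port
)


def is_legal_scope(value: str) -> bool:
    """True if value is scheme://host[:port] with no path, query or fragment."""
    return _SCOPE.fullmatch(value) is not None


for scope in ("http://*.example.com", "http://www.*.com", "www.example.com"):
    print(scope, is_legal_scope(scope))
```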
In the following example, the hosts named "voice.example.com" and "voice.example.org" are allowed access to the XML content. An XML request from an application located on all other hosts (e.g. "www.example.com") will fail.
<?access-control allow="voice.example.com voice.example.org"?>
Numerous hosts within a domain may require XML content access, and listing them all is impractical. For this reason, the user agent should support wildcard matching through the use of an asterisk ('*') at the beginning of a domain name. Use of the '*' will provide access to any and all applications hosted in that domain. For example, "*.example.com" will allow applications from voice.example.com, www.example.com, mail.example.com, example.com, and any other host ending in example.com to access that data.
In the following example, all applications hosted within the "example.org" and "example.com" domains are allowed access to the XML content containing the processing instruction:
<?access-control allow="*.example.org *.example.com"?>
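A minimal, non-normative Python sketch of this wildcard matching is shown below. The function name `host_matches` is hypothetical. The sketch assumes that the asterisk is always followed by a dot and that a host must match on a label boundary, so that "badexample.com" is not treated as part of "example.com".

```python
def host_matches(pattern: str, host: str) -> bool:
    """True if host matches pattern, where pattern may start with '*.'."""
    pattern, host = pattern.lower(), host.lower()  # case-insensitive match
    if pattern.startswith("*."):
        domain = pattern[2:]
        # The bare domain and any host beneath it both match.
        return host == domain or host.endswith("." + domain)
    return host == pattern


for host in ("voice.example.com", "example.com", "www.example.net"):
    allowed = any(host_matches(p, host) for p in ("*.example.org", "*.example.com"))
    print(host, allowed)
```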
To allow any application hosted in any domain to access the XML content, set the value of allow to a single asterisk ('*') as shown in the following example:
<?access-control allow="*"?>
To allow any application hosted in the "example.com" domain with the exception of applications hosted within the "visitors.example.com" domain to access the XML content, set the value of allow to "*.example.com" and the value of deny to "*.visitors.example.com" as shown in the following example:
<?access-control allow="*.example.com" deny="*.visitors.example.com"?>
The processing instruction is designed explicitly to enable extending the sandbox for "read" access to XML content. It is not designed to be used to enforce sandboxing restrictions itself or to provide generalized trust validation. The expectation is that the user agent's default sandboxing policy is stricter. Therefore, it is always safe to fall back to the default policy in the event of an error.
A user agent running inside a trusted corporate network and executing untrusted content should enforce a sandboxing policy by denying access. In contrast, it may be appropriate to relax this policy when the user agent is executing only trusted applications that require access to arbitrary XML feeds on the local network. User agent vendors that allow this sandboxing policy to be configured are encouraged to provide guidance on the appropriate settings. It is critical that network administrators understand the security issues pertinent to their environment and configure their systems appropriately. In tandem, developers and web server administrators must be aware of the dangers of trusting a user agent that can be configured to disable sandboxing.
User agents which implement this capability should take care not to expose other trusted data (cookies, HTTP header data) inappropriately. The access-control processing instruction is only designed to enable access to the XML content.
User agents which implement this capability should also take care to properly normalize Unicode and to properly interpret IDNs to prevent URL spoofing attacks.
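One way to approach this, shown here as a non-normative Python sketch, is to convert every hostname to a single canonical ASCII form before comparing it with the values in the processing instruction. The function name `normalize_host` is hypothetical. The sketch relies on Python's built-in "idna" codec, which implements the IDNA 2003 rules, and a production user agent may need a more recent IDNA profile.

```python
def normalize_host(host: str) -> str:
    """Return a lowercase ASCII (A-label) form of host for comparison."""
    # The codec applies nameprep and Punycode to non-ASCII labels;
    # ASCII labels pass through unchanged, so lowercase afterwards.
    return host.encode("idna").decode("ascii").lower()


print(normalize_host("B\u00fccher.Example"))
```

Comparing the normalized forms of both the requesting host and the listed values avoids treating visually identical but differently encoded names as distinct.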
Application authors should be aware that XML content retrieved from another site is not itself trustworthy. Authors should take care not to expose themselves to cross-site scripting attacks, which can result from failing to validate the returned content or from executing the retrieved content directly.