Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0

W3C Working Group Note 13 June 2005

This version:
Latest version:
Previous version:
This is the first version.
Matt Oshry, Tellme Networks (Editor-in-Chief)
Brad Porter, Tellme Networks
RJ Auburn, Voxeo Corporation


Abstract

XML representations of presentation markup and data are widely available to web browsers over HTTP. Web browsers often run with a higher privilege level than the applications running in those browsers. To prevent applications from accessing privileged content, browsers restrict applications to reading XML resources only from the application's own domain (e.g. LSParser in [DOM3LS] or the <data> element in [VXML21]). This limitation restricts the universe of XML content available to an application and precludes the open sharing of public XML data between applications.

This Note describes one mechanism in use by voice browser vendors to allow XML content providers to specify which application domains can access their XML content. For example, the National Oceanic and Atmospheric Administration (NOAA) may declare that their XML weather data can be accessed by any application, while a stock ticker provider can allow access to individual partner applications that have licensed that data.

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is a W3C Working Group Note, made available by the W3C Voice Browser Working Group as part of the Voice Browser Activity. The authors of this document are the Voice Browser Working Group participants.

This WG Note is being published for information purposes only. The Working Group does not plan to issue updates and therefore has no current plans or process by which to handle feedback.

The W3C has not analyzed the security problems which motivated the publication of this NOTE. This NOTE only addresses a subset of the security issues involved in exposing XML data over HTTP. This NOTE documents an existing practice used under certain circumstances but in no way implies that the technique would be appropriate or secure to protect document access under all circumstances. Implementors should perform their own security analysis.

The public is invited to send comments to the Working Group's public mailing list www-voice@w3.org (archive). See W3C mailing list and archive usage guidelines. Comments received may be taken into consideration if the material in this Note is used in some form in the creation of a Recommendation-track document.

This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only).

This document was produced under the 5 February 2004 W3C Patent Policy. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing [and excluding] a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.

Publication as a Working Group Note does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Table of Contents

1 Introduction
2 <?access-control?> Processing Instruction Algorithm


A References

1 Introduction

A wealth of applications and data is exposed as XML over HTTP. User agents such as voice and Web browsers fetch and execute applications but restrict the XML content accessible to those applications to URLs located in the same domain as the application. To take advantage of the rich XML content available on the Web, application developers must resort to proxying the content through the domain hosting their application, thereby increasing overhead and limiting scalability.

This Note describes a mechanism in use in the industry that allows a content provider to specify the access policy of its XML content through a processing instruction embedded within that content. In this model, a user agent can safely extend the sandbox to which it has restricted the application to include the XML content if and only if the specified policy grants permission.

2 <?access-control?> Processing Instruction Algorithm

Before allowing an application executing in the context of a user agent to manipulate external XML content, a user agent validates that the host requesting the content is allowed to access the content. This validation is performed by comparing the hostname and IP address of the document server from which the requesting application was fetched to the list of hostnames, hostname suffixes, and IP addresses listed in the <?access-control?> processing instruction included in the XML content to be fetched. When comparing hostnames, the user agent must perform a case-insensitive match as specified in [RFC2616].

If the user agent encounters multiple <?access-control?> processing instructions in the retrieved XML content, it combines them in document order.
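
As an illustration only, the following Python sketch shows one way a user agent might collect every <?access-control?> processing instruction from a retrieved document in document order. The function name access_control_pis is hypothetical, and the use of Python's xml.dom.minidom is an assumption made for brevity; the Note does not prescribe a parser, and a production user agent would want one hardened against malicious input. How the collected instructions are then merged is left to the implementation.

```python
from xml.dom import Node, minidom


def access_control_pis(xml_text):
    """Return the data of each <?access-control?> PI, in document order.

    Hypothetical helper: walks the whole tree depth-first, so PIs in the
    prolog and PIs nested inside elements are both found.
    """
    found = []

    def walk(node):
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "access-control"):
            found.append(node.data)
        for child in node.childNodes:
            walk(child)

    walk(minidom.parseString(xml_text))
    return found
```

Each returned string is the part of the processing instruction that follows the target name, for example allow="*.example.com".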

If the XML content specifies one or more <?access-control?> processing instructions, access to the content is allowed based on the following algorithm:

  1. If the IP address of the requesting application matches a value in the deny attribute, access is denied, and the search algorithm is stopped.

  2. If the IP address of the requesting application matches a value in the allow attribute, access is allowed, and the search algorithm is stopped.

  3. If the fully qualified domain name of the requesting application exactly matches a value in the deny attribute, access is denied, and the search algorithm is stopped.

  4. If the fully qualified domain name of the requesting application exactly matches a value in the allow attribute, access is allowed, and the search algorithm is stopped.

  5. The user agent then searches for the best match using wildcards on the domain name. Best match is defined as the closest match using the wildcards (e.g. "bert.evil.example.com" matches "*.evil.example.com" more closely than "*.example.com").

  6. If a best match occurs in the deny attribute, access is denied, and the search algorithm is stopped.

  7. If a best match occurs in the allow attribute, access is allowed, and the search algorithm is stopped.

  8. If there is no match on any of the <?access-control?> processing instructions, access is denied, and the search algorithm is stopped.
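
The eight steps above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a normative implementation: the names is_access_allowed, _same_ip, and _wildcard_score are hypothetical, and the allow and deny arguments are assumed to be already-parsed lists of access items. Three points that the text leaves open are resolved by assumption: a pattern such as "*.example.com" requires at least one label before the suffix, a bare "*" is treated as the least specific wildcard, and when equally specific wildcards appear in both lists, deny wins because step 6 precedes step 7.

```python
import ipaddress


def _same_ip(item, ip):
    """True if item is an IP literal equal to ip (hostnames never match)."""
    try:
        return ipaddress.ip_address(item) == ipaddress.ip_address(ip)
    except ValueError:
        return False  # item is a hostname or wildcard, not an IP literal


def _wildcard_score(pattern, host):
    """Specificity of a wildcard match; -1 means no match (assumed metric)."""
    if pattern == "*":
        return 0  # matches everything, least specific
    if pattern.startswith("*."):
        suffix = pattern[1:].lower()  # e.g. ".example.com"
        if host.endswith(suffix) and len(host) > len(suffix):
            return len(suffix)  # longer suffix means closer match
    return -1


def is_access_allowed(host, ip, allow, deny):
    """Apply steps 1-8 to the requesting application's host and IP."""
    host = host.lower()  # hostname comparison is case-insensitive
    # Steps 1-2: IP address matches, deny first.
    if any(_same_ip(item, ip) for item in deny):
        return False
    if any(_same_ip(item, ip) for item in allow):
        return True
    # Steps 3-4: exact fully qualified domain name matches, deny first.
    if any(item.lower() == host for item in deny):
        return False
    if any(item.lower() == host for item in allow):
        return True
    # Steps 5-7: best wildcard match across both lists.
    best_deny = max((_wildcard_score(p, host) for p in deny), default=-1)
    best_allow = max((_wildcard_score(p, host) for p in allow), default=-1)
    if best_deny >= 0 and best_deny >= best_allow:
        return False
    if best_allow >= 0:
        return True
    # Step 8: nothing matched.
    return False
```

For instance, with allow=["*.example.com"] and deny=["*.evil.example.com"], the host "bert.evil.example.com" is denied because the deny pattern is the closer match, while "www.example.com" is allowed.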

If the XML content does not contain an <?access-control?> processing instruction, access to the XML content is dependent on the user agent's security environment. A user agent running inside a trusted corporate network and executing untrusted content should enforce a sandboxing policy by denying access. In contrast, it may be appropriate to relax this policy when the user agent is executing only trusted applications that require access to arbitrary XML feeds on the local network. User agent vendors that allow this sandboxing policy to be configured are encouraged to provide guidance on the appropriate settings. It is critical that network administrators understand the security issues pertinent to their environment and configure their systems appropriately. In tandem, developers and web server administrators must be aware of the dangers of trusting a user agent that can be configured to disable sandboxing.

The following grammar describes the syntax for the <?access-control?> processing instruction to be embedded in the XML content retrieved by the user agent. The grammar is specified using Extended Backus-Naur Form (EBNF) notation. For more information on this syntax, see section 6, Notation, in [XML]. For definitions of the HostName, IPv4address, and IPv6address productions, see [RFC2732].

Access Control Processing Instruction
[1]    AccessControlPI    ::=    '<?access-control' (S 'allow="'AccessList'"' | S "allow='"AccessList"'")? (S 'deny="'AccessList'"' | S "deny='"AccessList"'")? S? '?>'
[2]    AccessList    ::=    AccessItem (S AccessItem)* | '*'
[3]    AccessItem    ::=    HostName | PartialHostName | IPv4address | IPv6address
[4]    PartialHostName    ::=    '*.' HostName
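
A minimal Python sketch of a recognizer for productions [1] and [2] follows. It is illustrative only: the names _PI_PATTERN and parse_access_control are hypothetical, and the sketch only splits each AccessList on whitespace. It does not validate individual items against the HostName, IPv4address, or IPv6address productions, and it does not reject an empty list.

```python
import re

# Mirrors production [1]: optional allow, then optional deny, each quoted
# with matching single or double quotes. S is XML whitespace.
_PI_PATTERN = re.compile(
    r"""
    <\?access-control
    (?: [ \t\r\n]+ allow= (?P<aq>["']) (?P<allow>[^"']*) (?P=aq) )?
    (?: [ \t\r\n]+ deny=  (?P<dq>["']) (?P<deny>[^"']*)  (?P=dq) )?
    [ \t\r\n]* \?>
    """,
    re.VERBOSE,
)


def parse_access_control(pi_text):
    """Return (allow, deny) lists of access items from one PI string.

    Raises ValueError if the text does not match production [1].
    """
    match = _PI_PATTERN.fullmatch(pi_text.strip())
    if match is None:
        raise ValueError("not a well-formed <?access-control?> instruction")
    # A missing attribute yields an empty list.
    allow = (match.group("allow") or "").split()
    deny = (match.group("deny") or "").split()
    return allow, deny
```

Because the grammar fixes the order of the two attributes, this sketch rejects an instruction that places deny before allow.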

In the following example, the hosts named "voice.roadrunner.edu" and "voice.acme.edu" are allowed access to the XML content. An XML request from an application located on any other host (e.g. "voice.coyote.net") will fail.

<?access-control allow="voice.roadrunner.edu voice.acme.edu"?>

Numerous hosts within a domain may require XML content access, and listing them all is impractical. For this reason, the user agent should support wildcard matching through the use of an asterisk ('*') at the beginning of a domain name. In the following example, all applications hosted within the "roadrunner.edu" and "acme.edu" domains are allowed access to the XML content containing the processing instruction:

<?access-control allow="*.roadrunner.edu *.acme.edu"?>

To allow any application hosted in any domain to access the XML content, set the value of allow to a single asterisk ('*') as shown in the following example:

<?access-control allow="*"?>

To allow any application hosted in the "example.com" domain, except applications hosted within the "visitors.example.com" domain, to access the XML content, set the value of allow to "*.example.com" and the value of deny to "*.visitors.example.com" as shown in the following example:

<?access-control allow="*.example.com" deny="*.visitors.example.com"?>

A References

[DOM3LS] Document Object Model (DOM) Level 3 Load and Save Specification, ed. Johnny Stenback and Andy Heninger. W3C Recommendation, April 2004. See http://www.w3.org/TR/DOM-Level-3-LS/.
[RFC2616] Hypertext Transfer Protocol -- HTTP/1.1, ed. R. Fielding et al. IETF RFC 2616, June 1999. See http://www.ietf.org/rfc/rfc2616.txt.
[RFC2732] IPv6 Literal Addresses in URL's, ed. R. Hinden et al. IETF RFC 2732, December 1999. See http://www.ietf.org/rfc/rfc2732.txt.
[VXML21] VoiceXML 2.1, ed. Matt Oshry et al. W3C Candidate Recommendation, June 2005. See http://www.w3.org/TR/2005/CR-voicexml21-20050613/.
[XML] Extensible Markup Language (XML) 1.0, ed. Tim Bray et al. W3C Recommendation, February 2004. See http://www.w3.org/TR/2004/REC-xml-20040204/.