Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0

W3C Working Group Note 13 June 2005

This version:
Latest version:
Previous version:
This is the first version.
Matt Oshry, Tellme Networks (Editor-in-Chief)
Brad Porter, Tellme Networks
RJ Auburn, Voxeo Corporation


XML representations of presentation markup and data are widely available to web browsers over HTTP. Web browsers often run with a higher privilege level than the applications running in those browsers. To prevent applications from accessing privileged content, browsers restrict applications to reading XML resources only from the application's own domain (e.g. LSParser in [DOM3LS] or the <data> element in [VXML21]). This limitation restricts the universe of XML content available to an application and precludes the open sharing of public XML data between applications.

This Note describes one mechanism in use by voice browser vendors to allow XML content providers to specify which application domains can access their XML content. For example, the National Oceanic and Atmospheric Administration (NOAA) may declare that their XML weather data can be accessed by any application, while a stock ticker provider can allow access to individual partner applications that have licensed that data.

Table of Contents

1 Introduction
2 <?access-control?> Processing Instruction Algorithm


A References

1 Introduction

A plethora of applications and data are exposed as XML over HTTP. User agents such as voice and web browsers fetch and execute applications but restrict the XML content accessible to those applications to URLs in the same domain as the application. To take advantage of the rich XML content available on the Web, application developers must resort to proxying the content through the domain hosting their application, thereby increasing overhead and limiting scalability.

This Note describes a mechanism in use in the industry that allows a content provider to specify the access policy of XML content through a processing instruction embedded within that content. In this model, a user agent can safely extend the sandbox to which it has restricted the application to include access to the XML content if and only if the specified policy grants permission.

2 <?access-control?> Processing Instruction Algorithm

Before allowing an application executing in the context of a user agent to manipulate external XML content, a user agent validates that the host requesting the content is allowed to access the content. This validation is performed by comparing the hostname and IP address of the document server from which the requesting application was fetched to the list of hostnames, hostname suffixes, and IP addresses listed in the <?access-control?> processing instruction included in the XML content to be fetched. When comparing hostnames, the user agent must perform a case-insensitive match as specified in [RFC2616].

If the user agent encounters multiple <?access-control?> processing instructions in the retrieved XML content, it combines them in document order.
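As an illustration only, the following Python sketch gathers every <?access-control?> processing instruction from a document in document order. The function name access_control_pis and the sample weather document are hypothetical, and the sketch uses Python's xml.dom.minidom rather than any API defined by this Note. It returns the raw data of each instruction and leaves the question of how the collected instructions are then combined to the user agent.

```python
from xml.dom import Node, minidom


def access_control_pis(xml_text):
    """Return the data of each <?access-control?> PI, in document order."""
    doc = minidom.parseString(xml_text)
    found = []

    def walk(node):
        # A pre-order traversal visits nodes in document order.
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "access-control"):
            found.append(node.data)
        for child in node.childNodes:
            walk(child)

    walk(doc)
    return found


# Hypothetical document carrying two instructions.
sample = (
    '<?xml version="1.0"?>\n'
    '<?access-control allow="*.example.com"?>\n'
    '<weather>'
    '<?access-control deny="*.visitors.example.com"?>'
    '<temp>72</temp>'
    '</weather>'
)

pis = access_control_pis(sample)
```

Here pis holds the data of the two instructions, with the one from the prolog first.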

If the XML content specifies one or more <?access-control?> processing instructions, access to the content is allowed based on the following algorithm:

  1. If the IP address of the requesting application matches a value in the deny attribute, access is denied, and the search algorithm is stopped.

  2. If the IP address of the requesting application matches a value in the allow attribute, access is allowed, and the search algorithm is stopped.

  3. If the fully qualified domain name of the requesting application exactly matches a value in the deny attribute, access is denied, and the search algorithm is stopped.

  4. If the fully qualified domain name of the requesting application exactly matches a value in the allow attribute, access is allowed, and the search algorithm is stopped.

  5. The user agent then searches for the best match using wildcards on the domain name. Best match is defined as the closest match using the wildcards (e.g. "bert.evil.example.com" matches "*.evil.example.com" more closely than "*.example.com").

  6. If a best match occurs in the deny attribute, access is denied, and the search algorithm is stopped.

  7. If a best match occurs in the allow attribute, access is allowed, and the search algorithm is stopped.

  8. If there is no match on any of the <?access-control?> processing instructions, access is denied, and the search algorithm is stopped.
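The steps above can be sketched in Python roughly as follows. This is a minimal, non-normative sketch: the function check_access and its helpers are hypothetical names, the allow and deny arguments are assumed to be lists of already-parsed access items, and "closest match" is assumed to mean the longest matching wildcard suffix, with a bare "*" treated as the least specific wildcard. It also assumes that "*.example.com" does not match the bare host "example.com" and that a tie between allow and deny is resolved in favor of deny, since the deny step precedes the allow step.

```python
import ipaddress


def _matches_ip(item, ip):
    """True if item is an IP literal equal to ip (hostnames never match)."""
    try:
        return ipaddress.ip_address(item) == ipaddress.ip_address(ip)
    except ValueError:
        return False


def _best_wildcard(host, items):
    """Length of the longest wildcard suffix matching host, or -1 if none."""
    best = -1
    for item in items:
        if item == "*":
            best = max(best, 0)           # assumed: least specific wildcard
        elif item.startswith("*."):
            suffix = item[1:].lower()     # e.g. ".example.com"
            if host.endswith(suffix):
                best = max(best, len(suffix))
    return best


def check_access(host, ip, allow, deny):
    """Return True if the requesting application may read the content."""
    host = host.lower()                   # hostnames compare case-insensitively
    # Steps 1-2: IP address, deny before allow.
    if any(_matches_ip(item, ip) for item in deny):
        return False
    if any(_matches_ip(item, ip) for item in allow):
        return True
    # Steps 3-4: exact fully qualified domain name, deny before allow.
    if host in (item.lower() for item in deny):
        return False
    if host in (item.lower() for item in allow):
        return True
    # Steps 5-7: best wildcard match; deny wins a tie (assumption).
    best_deny = _best_wildcard(host, deny)
    best_allow = _best_wildcard(host, allow)
    if best_deny == -1 and best_allow == -1:
        return False                      # Step 8: no match at all
    return best_allow > best_deny
```

For instance, with allow set to ["*.example.com"] and deny set to ["*.evil.example.com"], the host "bert.evil.example.com" is denied because the deny wildcard is the closer match, while "www.example.com" is allowed.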

If the XML content does not contain an <?access-control?> processing instruction, access to the XML content is dependent on the user agent's security environment. A user agent running inside a trusted corporate network and executing untrusted content should enforce a sandboxing policy by denying access. In contrast, it may be appropriate to relax this policy when the user agent is executing only trusted applications that require access to arbitrary XML feeds on the local network. User agent vendors that allow this sandboxing policy to be configured are encouraged to provide guidance on the appropriate settings. It is critical that network administrators understand the security issues pertinent to their environment and configure their systems appropriately. In tandem, developers and web server administrators must be aware of the dangers of trusting a user agent that can be configured to disable sandboxing.

The following grammar describes the syntax for the <?access-control?> processing instruction to be embedded in the XML content retrieved by the user agent. The grammar is specified using Extended Backus-Naur Form (EBNF) notation. For more information on this syntax, see section 6, Notation, in [XML]. For definitions of the HostName, IPv4address, and IPv6address productions, see [RFC2732].

Access Control Processing Instruction
[1]    AccessControlPI    ::=    '<?access-control' (S 'allow="'AccessList'"' | S "allow='"AccessList"'")? (S 'deny="'AccessList'"' | S "deny='"AccessList"'")? S? '?>'
[2]    AccessList    ::=    AccessItem (S AccessItem)* | '*'
[3]    AccessItem    ::=    HostName | PartialHostName | IPv4address | IPv6address
[4]    PartialHostName    ::=    '*.' HostName
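One way to read the grammar above is as a small parser. The Python sketch below is illustrative only: the names parse_access_control and parse_access_list are hypothetical, the parser takes the full text of a single processing instruction, and it does not validate the HostName, IPv4address, or IPv6address productions. It is also slightly more lenient than the grammar in that it tolerates extra whitespace inside an attribute value.

```python
import re

# Productions [1]: optional allow, then optional deny, in that order.
_PI = re.compile(r"""
    <\?access-control
    (?: \s+ allow= (?P<aq>["']) (?P<allow>[^"']*) (?P=aq) )?
    (?: \s+ deny=  (?P<dq>["']) (?P<deny>[^"']*)  (?P=dq) )?
    \s* \?>
""", re.VERBOSE)


def parse_access_list(value):
    """Split an AccessList (production [2]) into its access items."""
    items = value.split()
    if not items:
        raise ValueError("AccessList must contain at least one item")
    if items == ["*"]:
        return items
    for item in items:
        # Production [4]: a wildcard is only valid as a leading "*.".
        rest = item[2:] if item.startswith("*.") else item
        if not rest or "*" in rest:
            raise ValueError("invalid access item: %r" % item)
    return items


def parse_access_control(pi_text):
    """Return (allow, deny) lists for one <?access-control ...?> PI."""
    match = _PI.fullmatch(pi_text.strip())
    if match is None:
        raise ValueError("not a valid access-control processing instruction")
    allow = match.group("allow")
    deny = match.group("deny")
    return (
        parse_access_list(allow) if allow is not None else [],
        parse_access_list(deny) if deny is not None else [],
    )
```

Because production [1] lists allow before deny, this sketch rejects an instruction that places deny first. A more forgiving user agent might accept either order, which would be a design choice outside the grammar as written.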

In the following example, the hosts named "voice.roadrunner.edu" and "voice.acme.edu" are allowed access to the XML content. An XML request from an application located on any other host (e.g. "voice.coyote.net") will fail.

<?access-control allow="voice.roadrunner.edu voice.acme.edu"?>

Numerous hosts within a domain may require XML content access, and listing them all is impractical. For this reason, the user agent should support wildcard matching through the use of an asterisk ('*') at the beginning of a domain name. In the following example, all applications hosted within the "roadrunner.edu" and "acme.edu" domains are allowed access to the XML content containing the processing instruction:

<?access-control allow="*.roadrunner.edu *.acme.edu"?>

To allow any application hosted in any domain to access the XML content, set the value of allow to a single asterisk ('*') as shown in the following example:

<?access-control allow="*"?>

To allow any application hosted in the "example.com" domain with the exception of applications hosted within the "visitors.example.com" domain to access the XML content, set the value of allow to "*.example.com" and the value of deny to "*.visitors.example.com" as shown in the following example:

<?access-control allow="*.example.com" deny="*.visitors.example.com"?>

A References

[DOM3LS] Document Object Model (DOM) Level 3 Load and Save Specification, ed. Johnny Stenback and Andy Heninger. W3C Recommendation, April 2004. See http://www.w3.org/TR/DOM-Level-3-LS/.
[RFC2616] Hypertext Transfer Protocol -- HTTP/1.1, ed. R. Fielding et al. IETF RFC 2616, June 1999. See http://www.ietf.org/rfc/rfc2616.txt.
[RFC2732] IPv6 Literal Addresses in URL's, ed. R. Hinden et al. IETF RFC 2732, December 1999. See http://www.ietf.org/rfc/rfc2732.txt.
[VXML21] VoiceXML 2.1, ed. Matt Oshry et al. W3C Candidate Recommendation, June 2005. See http://www.w3.org/TR/2005/CR-voicexml21-20050613/.
[XML] Extensible Markup Language (XML) 1.0, ed. Tim Bray et al. W3C Recommendation, February 2004. See http://www.w3.org/TR/2004/REC-xml-20040204/.