"Deep Linking" in the World Wide Web

TAG Finding 17 Feb 2003

This version:
Latest version:
http://www.w3.org/2001/tag/doc/deeplinking (XML)
Previous versions:
12 Feb TAG draft, Draft 2, Draft 1
Tim Bray, Antarctica Systems <tbray@textuality.com>


The community of Web users has been engaged in discussion and litigation concerning the practice of "deep linking." This document is designed to provide input to this discussion based on the architecture of the underlying technology.

Status of this Document

This document has been developed for discussion by the W3C Technical Architecture Group. This finding addresses issue deepLinking-25.

This finding was first accepted by the TAG at its 7 February 2003 face-to-face meeting, and then reconfirmed (with a small change) at its 17 February 2003 teleconference.

Publication of this finding does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time.

Additional TAG findings, both approved and in draft state, may also be available. The TAG expects to incorporate this and other findings into a Web Architecture Document that will be published according to the process of the W3C Recommendation Track.

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Introduction and summary
2 Deep linking background
3 The Uniform Resource Identifier and Web Architecture
4 Access Control and Accountability on the Web
5 Deep Linking by Analogy
6 Resource Access is an Issue of Public Policy
7 Conclusion
8 References

1 Introduction and summary

This finding discusses the issues raised by the controversies around deep linking. This discussion includes a survey of the usage and meaning of Web addresses and the mechanisms available to control access to resources on the Web.

The conclusion is that any attempt to forbid the practice of deep linking is based on a misunderstanding of the technology, and threatens to undermine the functioning of the Web as a whole. The two chief reasons for this are:

2 Deep linking background

People engaged in delivering information or services via the World Wide Web typically speak in terms of "Web sites" which have "home pages" or "portal pages." Deep linking is the practice of publishing a hyperlink from a page on one site to a page "inside" another site, bypassing the "home" or "portal" page.

Certain Web publishers wish to prevent or control deep linking into their sites, and wish to establish a right to exercise such control as a matter of public policy, i.e., through litigation based on existing law or by instituting new legislation.

3 The Uniform Resource Identifier and Web Architecture

This issue centers on the use of hyperlinks. The central feature of a hyperlink, and indeed a central feature of Web architecture, is the notion of a "Uniform Resource Identifier" (URI), often called a "Uniform Resource Locator" or URL, or in everyday speech a "Web address." Every object on the Web must have a URI, which is simply a string of characters that may be typed into a Web browser, read over the phone, or painted on the side of a vehicle.

The only purpose of a URI is to identify a Web resource. It is basic to the architecture of the Web that URIs may be freely interchanged, and that once one knows a URI, one may pass it on to others, publish it, and attempt to access whatever resource it identifies. There is a clear distinction between identifying a resource and accessing it. It is entirely reasonable to control access to a resource, but entirely futile to prevent it from being identified.

The formal definition of the URI, on which all of the software that successfully drives the Web is built, is in [RFC2396]. This formal definition has no notion of a "home" or "portal" page, nor does any of the vast amount of software deployed to process URIs. Thus, from the point of view of the underlying technology, all links are deep links.
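The point that URI syntax makes no distinction between "home" and "deep" addresses can be illustrated with a minimal sketch. It uses Python's standard urllib.parse module, and the two example.org URIs are hypothetical addresses invented for this illustration. Both split into the same generic components (scheme, authority, path, query, fragment), and none of those components marks a URI as a "home page."

```python
from urllib.parse import urlsplit

# Hypothetical URIs chosen for illustration; they do not come from the finding.
home = "http://example.org/"
deep = "http://example.org/catalog/2003/item42.html"


def components(uri: str) -> dict:
    """Split a URI into its generic syntactic components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


home_parts = components(home)
deep_parts = components(deep)

# Both URIs yield the same set of component names; only the values differ.
same_shape = home_parts.keys() == deep_parts.keys()
print(same_shape)
print(home_parts["path"], deep_parts["path"])
```

Software that processes these strings sees only a longer or shorter path. Any notion that one of them is an "entry point" exists in the publisher's intent, not in the identifier.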

4 Access Control and Accountability on the Web

While the Web does not limit anyone's ability to refer to any resource, it offers a rich suite of access-control facilities. The procedures by which resources may be accessed over the Web are those of the Hypertext Transfer Protocol (HTTP), which is formally defined in [RFC2616]. When any piece of software attempts to access a resource via its URI, it sends a request which typically contains a variety of information including:

When such a request is received, it may succeed, or it may fail. It may fail because there is no resource identified by the URI (the well-known "404 Not Found") or because the server refuses, based on the information available, to grant access ("401 Unauthorized" or "403 Forbidden"). A server can be programmed to deny access to any resource for a variety of reasons, including:

5 Deep Linking by Analogy

Two analogies have been proposed to help illuminate the question of deep linking through parallels in the real world.

The first analogy is with buildings, which typically have a number of doors. A building might have a policy that the public may only enter via the main front door, and only during normal working hours. People employed in the building and in making deliveries to it might use other doors as appropriate. Such a policy would be enforced by a combination of security personnel and mechanical devices such as locks and pass-cards. One would not enforce this policy by hiding some of the building entrances, nor by requesting legislation requiring the use of the front door and forbidding anyone to reveal the fact that there are other doors to the building.

The second analogy is with a library, which has a well-known street address. Each book on the shelves of this library also has an identifier, composed of its title, author, call number, shelf location, and so on. The library certainly will control access to the individual books; but it would be counterproductive to do so by forbidding the publication of their identities.

These analogies are compelling in the context of the deep linking issue. A provider of Web resources who does not make use of the built-in facilities of the Web to control access to a resource is unlikely to achieve either justice or a good business outcome by attempting to suppress information about the existence of the resource.

6 Resource Access is an Issue of Public Policy

The Web's structure includes facilities to implement nearly any imaginable set of business policies as regards access control. For example, access policies based on the "Referer" field could restrict access to requests that follow links from a "home page."
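A Referer-based policy of this kind might look roughly like the following sketch. It is not taken from any particular server. The function name status_for, the HOME_PAGE address, and the RESOURCES set are all hypothetical names introduced for illustration, and a real deployment would express the same rule in its server's own configuration language.

```python
from typing import Mapping

# Hypothetical site layout, used only for illustration.
HOME_PAGE = "http://example.org/"
RESOURCES = {"/", "/catalog/item42.html"}


def status_for(path: str, headers: Mapping[str, str]) -> int:
    """Return an HTTP status code for a GET request under a Referer-based policy."""
    # HTTP header field names are case-insensitive, so normalize them first.
    fields = {name.lower(): value for name, value in headers.items()}

    if path not in RESOURCES:
        return 404  # Not Found: no resource is identified by this path
    if path == "/":
        return 200  # the home page itself is open to every request
    if fields.get("referer") != HOME_PAGE:
        return 403  # Forbidden: the request did not follow a home-page link
    return 200


print(status_for("/catalog/item42.html", {"Referer": HOME_PAGE}))
print(status_for("/catalog/item42.html", {"Referer": "http://other.example/links.html"}))
```

Under this sketch the inner page remains freely identifiable by its URI; the policy is enforced at the moment of access, by the server, using information carried in the request.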

Unethical parties could, of course, attempt to circumvent such policies, for example by programming software to transmit false values in various request fields, or by stealing passwords, or any number of other nefarious practices. Such a situation has clearly passed from the domain of technology to that of policy. Public policy may need to be developed as to the seriousness of such attempts to subvert the system, the nature of proof required to establish a transgression, the appropriate penalties for transgressors, and so on.

7 Conclusion

Attempts at the public-policy level to limit the usage, transmission and publication of URIs are inappropriate and based on a misunderstanding of the Web's architecture. Attempts to control access to the resources identified by URIs are entirely appropriate and well-supported by Web technology.

This issue is important because attempts to limit deep linking are in fact risky for two reasons:

  1. The policy is at risk of failure. The Web is so large that any policy enforcement requires considerable automated support from software to be practical. Since a deep link looks like any other link to Web software, such automated support is not practical.

  2. The Web is at risk of damage. The hypertext architecture of the Web has brought substantial benefits to the world at large. The onset of legislation and litigation based on confusion between identification and access has the potential to impair the future development of the Web.

8 References

[RFC2396] T. Berners-Lee, R. Fielding, L. Masinter. Uniform Resource Identifiers (URI): Generic Syntax, RFC2396, August 1998. (See http://www.ietf.org/rfc/rfc2396.txt.)
[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1, RFC2616, June 1999. (See http://www.ietf.org/rfc/rfc2616.txt.)