<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN"
               "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [
  <!-- ================================================================ -->
  <!ENTITY draft.day "11">
  <!ENTITY draft.month "09">
  <!ENTITY draft.monthname "Sep">
  <!ENTITY draft.year "2003">
  <!ENTITY iso6.doc.date "&draft.year;-&draft.month;-&draft.day;">
  <!ENTITY http-ident "http://www.w3.org/2001/tag/doc/deeplinking">
]>
<spec w3c-doctype='other'>
<?CVS $Id: deeplinking.xml,v 1.8 2003/09/11 16:30:36 ijacobs Exp $?>
<header>
<title>"Deep Linking" in the World Wide Web</title>
<w3c-designation>&http-ident;-&iso6.doc.date;</w3c-designation>
<w3c-doctype>TAG Finding</w3c-doctype>
<pubdate><day>&draft.day;</day>
<month>&draft.monthname;</month>
<year>&draft.year;</year>
</pubdate>
<publoc>
<loc href="&http-ident;-&draft.year;&draft.month;&draft.day;">&http-ident;-&draft.year;&draft.month;&draft.day;</loc>
</publoc>
<latestloc><loc href="&http-ident;.html">&http-ident;</loc>
(<loc href="&http-ident;.xml">XML</loc>)
</latestloc>
<prevlocs>
<loc
href="http://www.w3.org/2001/tag/doc/deeplinking-20030217.html">17 Feb
2003 TAG
draft</loc>
</prevlocs>
<authlist>
<author><name>Tim Bray</name>
<affiliation>Antarctica Systems</affiliation>
<email href="mailto:tbray@textuality.com">tbray@textuality.com</email></author>
</authlist>
<copyright>
<p>
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice#Copyright">Copyright</loc> &#xA9; 2003
<loc href="http://www.w3.org/">W3C</loc><sup>&#xAE;</sup>
(<loc href="http://www.lcs.mit.edu/">MIT</loc>,
<loc href="http://www.ercim.org/">ERCIM</loc>,
<loc href="http://www.keio.ac.jp/">Keio</loc>),
All Rights Reserved. W3C
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice#Legal_Disclaimer">liability</loc>,
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice#W3C_Trademarks">trademark</loc>,
<loc href="http://www.w3.org/Consortium/Legal/copyright-documents">document use</loc>, and
<loc href="http://www.w3.org/Consortium/Legal/copyright-software">software licensing</loc>
rules apply.
</p></copyright>

<abstract>

<p>The community of Web users has been engaged in discussion and
litigation concerning the practice of "deep linking." This document is
designed to provide input to this discussion based on the architecture
of the underlying technology.</p>
</abstract>

<status>
<p>This document has been developed for discussion by the
<loc href="/2001/tag/">W3C Technical Architecture Group</loc>.
This finding addresses <loc href="http://www.w3.org/2001/tag/ilist#deepLinking-25">issue
deepLinking-25</loc>.</p>

<p>The only change in the &draft.day; &draft.monthname; &draft.year;
finding is the addition of a reference to a court decision in Germany
that relates to deep linking. The TAG decided to add this
to the finding at its <loc
href="http://www.w3.org/2003/07/21-tag-summary.html#july22">July
face-to-face meeting</loc>.</p>

<p>This finding was first accepted by the TAG at its <loc
href="http://www.w3.org/2003/02/06-tag-summary#deepLinking-25">7
February 2003 face-to-face meeting</loc>, and then reconfirmed (with a
small change) at its <loc
href="http://www.w3.org/2003/02/17-tag-summary.html">17 February 2003
teleconference</loc>.</p>

<p>Publication of this finding does not imply endorsement by the W3C
Membership. This is a draft document and may be updated, replaced or
obsoleted by other documents at any time.</p>

<p><loc href="/2001/tag/findings">Additional TAG findings</loc>, both
approved and in draft state, may also be available. The TAG expects to
incorporate this and other findings into a Web Architecture Document
that will be published according to the process of the <loc
href="/Consortium/Process-20010719/tr#Recs">W3C Recommendation
Track</loc>.</p>

<p>Please send comments on this finding to the publicly archived TAG
mailing list <loc
href="mailto:www-tag@w3.org">www-tag@w3.org</loc>
(<loc
href="http://lists.w3.org/Archives/Public/www-tag/">archive</loc>).</p>

</status>
<pubstmt>
<p>World Wide Web Consortium,
Draft TAG Finding, 2003.</p>
</pubstmt>
<sourcedesc>
<p>Created in electronic form.</p>
</sourcedesc>
<langusage>
<language id="EN">English</language>
</langusage>
<revisiondesc>
<slist>
<sitem>2003-02-12: Published draft</sitem>
</slist>
</revisiondesc>
</header>
<body>

<div1 id="intro">
<head>Introduction and summary</head>

<p>This finding discusses the issues raised by the controversies around
deep linking. This discussion includes a survey of the usage and
meaning of Web addresses and the mechanisms available to control
access to resources on the Web.</p>

<p>The conclusion is that any attempt to forbid the practice of deep
linking is based on a misunderstanding of the technology, and
threatens to undermine the functioning of the Web as a whole. The two
chief reasons for this are:</p>

<ulist>

<item><p>A Web address ("URI" or "URL") is just an identifier. There is a
clear distinction between identifying a resource on the Web and
accessing it; suppressing the use of identifiers is not logically
consistent.</p></item>

<item><p>It is entirely reasonable for owners of Web resources to
control access to them. The Web provides several mechanisms for doing
this, none of which rely on hiding or suppressing identifiers for
those resources.</p></item>
</ulist>
</div1>

<div1 id="background">
<head>Deep linking background</head>

<p>People engaged in delivering information or services via the World
Wide Web typically speak in terms of "Web sites" which have "home
pages" or "portal pages." Deep linking is the practice of publishing
a hyperlink from a page on one site to a page "inside" another site,
bypassing the "home" or "portal" page.</p>

<p>Certain Web publishers wish to prevent or control deep linking into
their site, and wish to establish a right to exercise such control as
a matter of public policy, i.e., through litigation based on existing
law or by instituting new legislation.</p>

</div1>

<div1 id="on-uris">
<head>The Uniform Resource Identifier and Web Architecture</head>

<p>This issue centers around the use of hyperlinks. The central
feature of a hyperlink, and indeed a central feature of Web
architecture, is the notion of a "Uniform Resource Identifier" (URI),
often called a "Uniform Resource Locator" or URL, or in everyday
speech a "Web address." Every object on the Web must have a URI, which
is simply a string of characters that may be typed into a Web browser,
read over the phone, or painted on the side of a vehicle.</p>

<p>The only purpose of a URI is to identify a Web resource. It is
basic to the architecture of the Web that URIs may be freely
interchanged, and that once one knows a URI, one may pass it on to
others, publish it, and attempt to access whatever resource it
identifies. There is a clear distinction between identifying a
resource and accessing it. It is entirely reasonable to control access
to a resource, but entirely futile to prevent it being identified.</p>

<p>The formal definition of the URI, on which all of the software that
successfully drives the Web is built, is in <bibref
ref="rfc2396"/>. This formal definition has no notion of a "home" or
"portal" page, nor does any of the vast amount of software deployed to
process URIs. Thus, from the point of view of the underlying
technology, all links are deep links.</p>
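<p>The point can be illustrated with a minimal sketch. It assumes
Python's standard <code>urllib.parse</code> module and two hypothetical
<code>example.org</code> URIs; neither the URIs nor the helper name
<code>components</code> come from <bibref ref="rfc2396"/>. Splitting a
"home page" URI and a "deep" URI yields the same generic components,
and only the value of the path differs.</p>
<eg xml:space="preserve"><![CDATA[
```python
from urllib.parse import urlsplit

# Hypothetical URIs, invented for this illustration.
home = "http://www.example.org/"
deep = "http://www.example.org/archive/2003/story.html"


def components(uri):
    """Split a URI into its generic components."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


home_parts = components(home)
deep_parts = components(deep)

# Both URIs have exactly the same kinds of component; no component
# marks one of them as a "home" page and the other as "deep".
differing = sorted(k for k in home_parts if home_parts[k] != deep_parts[k])
print(differing)  # ['path']
```
]]></eg>
<p>Software that processes these two strings treats them alike, which
is why a rule that singles out "deep" links has nothing in the syntax
to attach to.</p>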

</div1>

<div1 id="access-control">
<head>Access Control and Accountability on the Web</head>

<p>While the Web does not limit anyone's ability to refer to any
resource, it offers a rich suite of access-control facilities. The
procedures by which resources may be accessed over the Web are those
of the Hypertext Transfer Protocol (HTTP), which is formally defined
in <bibref ref="rfc2616"/>. When any piece of software attempts to
access a resource via its URI, it sends a request which typically
contains a variety of information including:</p>

<ulist>
    <item><p>The identity of the software (for example, Microsoft Internet Explorer or the Google indexing engine).</p></item>
    <item><p>The URI of the resource which contained the link which is
being followed, known as "the Referer" [sic]. This field is not compulsory but is widely provided by popular user agents such as Web browsers.</p></item>
    <item><p>Optionally, a user identification and password for the
resource being accessed.</p></item>
</ulist>
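<p>As a rough sketch of the three items above, the following builds
(but does not send) a request that carries them as the HTTP header
fields <code>User-Agent</code>, <code>Referer</code> and
<code>Authorization</code>. It assumes Python's standard
<code>urllib.request</code> module. The URIs, the agent name and the
credentials are hypothetical, and the Basic authentication scheme used
here is defined in RFC 2617 rather than in <bibref ref="rfc2616"/>.</p>
<eg xml:space="preserve"><![CDATA[
```python
import base64
from urllib.request import Request

# Hypothetical user name and password, encoded for the Basic scheme.
credentials = base64.b64encode(b"alice:secret").decode("ascii")

request = Request(
    "http://www.example.org/archive/2003/story.html",
    headers={
        # Identity of the requesting software.
        "User-Agent": "ExampleBrowser/1.0",
        # URI of the page that contained the link being followed.
        "Referer": "http://www.example.com/links.html",
        # Optional user identification and password.
        "Authorization": "Basic " + credentials,
    },
)

# urllib stores field names in capitalized form (for example
# "User-agent"); HTTP field names are case-insensitive.
for name, value in request.header_items():
    print(name + ": " + value)
```
]]></eg>
<p>None of these fields is part of the URI itself: the same identifier
can be requested with or without them, and the server decides what to
do with whatever it receives.</p>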

<p>When such a request is received, it may succeed, or it may fail. It
may fail because there is no resource identified by the URI (the
well-known "404 Not Found") or because the server refuses, based on
the information available, to grant access ("401 Unauthorized" or
"403 Forbidden"). A server can be programmed to deny access to any resource
for a variety of reasons, including:</p>

<ulist>

    <item><p>Resource access requires a username and password, and none was provided, or the username was not recognized, or the password was wrong.</p></item>
    <item><p>The server has a policy about which pages are allowed to link to this page, and the Referer field in the request was not in the approved list or was not provided.</p></item>
    <item><p>The software requesting access was not of an approved
type, for example some sites limit access to particular Web
browsers. The TAG emphasizes that the practice of 
denying access to content to someone because of their choice
of user agent is generally counter-productive (e.g., one is likely
to exclude an entire class of user agents such as those running
on millions of mobile devices) and harmful (e.g.,
because it may deny access to users with specialized user agents,
including some users with a disability).</p></item>
</ulist>
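<p>A server-side decision of this kind might be sketched as follows.
This is an illustration under stated assumptions, not a description of
any particular server: the paths, the password table, the approved
Referer list and the function name <code>decide</code> are all
hypothetical. The status codes are those of
<bibref ref="rfc2616"/> (404 Not Found, 401 Unauthorized, 403
Forbidden, 200 OK). Filtering by user agent is deliberately left out,
for the reasons given in the last item above.</p>
<eg xml:space="preserve"><![CDATA[
```python
# Hypothetical server-side policy data, invented for this sketch.
KNOWN_PATHS = {"/", "/archive/2003/story.html", "/members/report.html"}
PASSWORDS = {"alice": "secret"}
NEEDS_LOGIN = {"/members/report.html"}
NEEDS_APPROVED_REFERER = {"/archive/2003/story.html"}
APPROVED_REFERERS = {"http://www.example.org/"}


def decide(path, referer=None, user=None, password=None):
    """Return the HTTP status code this sketch would answer with."""
    if path not in KNOWN_PATHS:
        return 404  # Not Found: the URI identifies no resource here
    if path in NEEDS_LOGIN:
        if user not in PASSWORDS or PASSWORDS[user] != password:
            return 401  # Unauthorized: credentials missing or wrong
    if path in NEEDS_APPROVED_REFERER and referer not in APPROVED_REFERERS:
        return 403  # Forbidden: Referer absent or not on the approved list
    return 200  # OK: access granted


print(decide("/no/such/page.html"))
print(decide("/members/report.html"))
print(decide("/members/report.html", user="alice", password="secret"))
print(decide("/archive/2003/story.html"))
print(decide("/archive/2003/story.html", referer="http://www.example.org/"))
```
]]></eg>
<p>In every case the URI of the resource can still be known, quoted
and published; what the policy changes is only the outcome of an
attempt to access it.</p>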
</div1>

<div1 id="analogy">
<head>Deep Linking by Analogy</head>

<p>Two analogies have been proposed to help illuminate the question of
deep linking through parallels in the real world.</p>

<p>The first analogy is with buildings, which typically have a number
of doors. A building might have a policy that the public may only
enter via the main front door, and only during normal working
hours. People employed in the building and in making deliveries to it
might use other doors as appropriate. Such a policy would be enforced
by a combination of security personnel and mechanical devices such as
locks and pass-cards. One would not enforce this policy by hiding some
of the building entrances, nor by requesting legislation requiring the
use of the front door and forbidding anyone to reveal the fact that
there are other doors to the building.</p>

<p>The second analogy is with a library, which has a well-known street
address. Each book on the shelves of this library also has an
identifier, composed of its title, author, call number, shelf
location, and so on. The library certainly will exercise access
control to the individual books; but it would be counterproductive to
do so by forbidding the publication of their identities.</p>

<p>These analogies are compelling in the context of the deep linking
issue. A provider of Web resources who does not make use of the
built-in facilities of the Web to control access to a resource is
unlikely to achieve either justice or a good business outcome by
attempting to suppress information about the existence of the
resource.</p>
</div1>

<div1 id="policy">
<head>Resource Access is an Issue of Public Policy</head>

<p>The Web's structure includes facilities to implement nearly any
imaginable set of business policies as regards access control. For
example, an access policy based on the "Referer" field could grant
access only to requests that follow a link from a "home page."</p>

<p>Unethical parties could, of course, attempt to circumvent such
policies, for example by programming software to transmit false values
in various request fields, or by stealing passwords, or any number of
other nefarious practices. Such a situation has clearly passed from
the domain of technology to that of policy. Public policy may need to
be developed as to the seriousness of such attempts to subvert the
system, the nature of proof required to establish a transgression, the
appropriate penalties for transgressors, and so on.</p>
</div1>

<div1 id="conclusion">
<head>Conclusion</head>

<p>Attempts at the public-policy level to limit the usage,
transmission and publication of URIs are
inappropriate and based on a misunderstanding of the Web's
architecture. Attempts to control access to the resources identified
by URIs are entirely appropriate and well-supported by the Web
technology.</p>

<p>This issue is important because attempts to limit deep linking
are in fact risky for two reasons:</p>

<olist>
<item><p>The policy is at risk of failure.
The Web is so large that any policy enforcement requires considerable
automated support from software to be practical. Since a deep link
looks like any other link to Web software, such automated
support is not practical.</p></item>

<item><p>The Web is at risk of damage.
The hypertext architecture of the Web has brought substantial benefits
to the world at large. The onset of legislation and litigation based
on confusion between identification and access has the potential to
impair the future development of the Web.</p></item>
</olist>
</div1>

<div1 id="references">
<head>References</head>

<blist>
<bibl id="rfc2396" href="http://www.ietf.org/rfc/rfc2396.txt"
key="RFC2396">T. Berners-Lee, R. Fielding,
L. Masinter. <titleref>Uniform Resource Identifiers (URI): Generic
Syntax</titleref>, RFC2396, August 1998.</bibl>

<bibl id="rfc2616" href="http://www.ietf.org/rfc/rfc2616.txt"
key="RFC2616">R. Fielding, J. Gettys, J. Mogul, H. Frystyk,
L. Masinter, P. Leach, T. Berners-Lee. <titleref>Hypertext Transfer Protocol -- HTTP/1.1</titleref>, RFC2616, June 1999.</bibl>

</blist>

</div1>

<div1 id="policies">
<head>Policies</head>

<p>Below is an incomplete list of policies related to deep linking.</p>

<blist>
<bibl id="Bundesgerichtshof"
href="http://juris.bundesgerichtshof.de/cgi-bin/rechtsprechung/document.py?Gericht=bgh&amp;Sort=3&amp;Datum=2003&amp;Art=pm&amp;client=3&amp;Blank=1&amp;nr=26553&amp;id=1058517255.04"
key="Bundesgerichtshof"> <titleref>Internet-Suchdienst für Presseartikel
nicht rechtswidrig</titleref>, Press release from the German Federal
High Court, Nr. 96/2003, July 2003</bibl>
</blist>

</div1>

</body>
</spec>

