Publishing and Linking on the Web

TAG Draft

This version:: ...
Latest version:: ...
Previous versions:: ...
Editors:: Daniel Appelquist; Jeni Tennison

This document is also available in these non-normative formats: XML.

Abstract

The Web borrows concepts from physical media (the notion of a "page," for example) and overlays them on top of a networked infrastructure (the Internet) and a digital presentation medium (the browser software). This mapping is an abstraction that enables the Web user to interact more easily with content and applications. However, when social or legal concepts and frameworks relating to documents, publishing and speech are applied to the Web, this abstraction often does not suffice. Publishing a page on the Web is fundamentally different from printing and distributing a page in a magazine or book. Because the social conventions around these physical media are so strong, and have been reinforced through our society for hundreds of years, it is tempting to apply them to the Web even where this application may not be appropriate.

This document was written, in part, because of some legal issues that were raised to the TAG. It does not attempt to answer these legal questions, but rather it seeks to set definitions for terms which could inform future social and legal dialog and opinion around publishing and linking on the Web.

Introduction
1. Status
Publishing
Techniques
1. Originators
2. Websites
Conclusions
References
Appendix A: Linking Methods
1. Linking by Reference
2. Including
  1. Exploiting
  2. Transcluding

Introduction

The act of viewing a web page is a complex interaction between a user's browser and any number of web servers. Unlike giving someone a book, say, viewing a web page is an act of copying: the data held on the servers is copied onto the user's computer. The page itself may cause more copying to take place — of images, videos and other files, perhaps from other servers — without the user's explicit knowledge or consent.

Intermediaries such as proxies and services that combine and repackage data from other sources may also retain copies of this material, due to the user's original request for the page. These intermediaries may transform, translate or rewrite some of the material that passes through them, to enhance the user's experience of the web page.

Still other services on the web, such as search engines and archives, make copies of content as a matter of course, to provide value to their users and to the original authors of the web page (as it enables the content to be found more easily).

Licenses that describe how material may be copied and altered by others tend not to take account of this complexity. They do not, for example, distinguish between a proxy compressing a web page to make it load faster and someone editing and republishing the page on their own website. To illustrate, the Creative Commons Attribution-NoDerivs license [CCAND] defines the following terms:

Adaptation

means a work based upon the Work, or upon the Work and other pre-existing works, such as a translation, adaptation, derivative work, arrangement of music or other alterations of a literary or artistic work, or phonogram or performance and includes cinematographic adaptations or any other form in which the Work may be recast, transformed, or adapted including in any form recognizably derived from the original, except that a work that constitutes a Collection will not be considered an Adaptation for the purpose of this License. For the avoidance of doubt, where the Work is a musical work, performance or phonogram, the synchronization of the Work in timed-relation with a moving image ("synching") will be considered an Adaptation for the purpose of this License.

Distribute

means to make available to the public the original and copies of the Work through sale or other transfer of ownership.

Reproduce

means to make copies of the Work by any means including without limitation by sound or visual recordings and the right of fixation and reproducing fixations of the Work, including storage of a protected performance or phonogram in digital form or other electronic medium.

Similarly, legislation that governs the possession and distribution of illegal material (such as child pornography) often needs to exempt certain types of services, such as caching or hosting, as it would be impractical for the people running those services to police all the material that passes through their servers. An example of legislation that does this is the Coroners and Justice Act 2009 Schedule 13; from the Explanatory Notes:

Paragraphs 3 to 5 of [Schedule 13] provide exemptions for internet service providers from the offence of possession of prohibited images of children in limited circumstances, such as where they are acting as mere conduits for such material or are storing it as caches or hosts.

Examples of the kind of legal questions that have arisen are:

Is it illegal to link to a page that contains illegal material? Is it illegal to embed illegal material within a page? See http://act.demandprogress.org/sign/dhscomplaint/.
Does embedding an image that you do not have a license to copy within a web page constitute a copyright infringement? Does creating a thumbnail version of that image constitute a copyright infringement? See Perfect 10, Inc. v. Amazon.com, Inc.
If a proxy automatically rewrites scripts to combine and compress them, changes markup in the page or compresses images, are these classed as adaptations?
If a browser uses an online service to translate the text of a web page, is this classed as an adaptation?

There are many other examples on the Wikipedia page on Copyright aspects of hyperlinking and framing.

This document does not aim to address whether particular activities on the web are illegal or legal; this is outside the scope of the TAG. Instead, it aims to:

provide an explanation of the way that material is published on the web to help inform people writing licenses and legislation
provide definitions of terms related to web publishing and linking that may be useful within licenses and legislation
describe the technical measures that websites can take to back up any restrictions that they place on the use of content they make available on the web
describe the mechanisms by which websites that reuse material can ensure they meet the restrictions on the use of that material, for example through attribution

Status

This is a draft document created to aid discussion of these issues by the W3C Technical Architecture Group. It has not been reviewed or approved by anyone.

Publishing

The concept of publishing on the web has evolved as the web's ecosystem has enlarged and diversified, and as the capabilities of browsers and the web standards that they implement have developed. There is no single definition of what publishing on the web means. Instead there are a number of activities that could be viewed as publication or distribution, or something else. This section describes each of these activities and how they work.

Hosting

The basic form of publication on the web is hosting. A server hosts a representation if it stores the representation on disk or generates the representation from data that it stores, and that representation did not (to the server's knowledge) originally come from elsewhere on the web.

The presence of data on a server does not necessarily mean that the organisation that owns and maintains the server is aware that the data is present. Many websites are hosted on shared hardware owned by a service provider, which stores and serves data on behalf of other individuals and organisations who determine what data the site provides. Because of this, multiple servers may host the same representation at different URIs. For example, an artist could upload the same image to multiple servers, which then store the image and serve it to others.

There are many different types of service provider. Some may exercise practically no control over the software and data that they host but provide hardware on which code can run. Others may focus on particular types of content, such as images (e.g. Flickr) or videos (e.g. YouTube). There may be many service providers involved in the publication of data on the web: some providing hardware, others providing different kinds of publishing support.

Service providers that host particular types of material often employ automatic filters to prevent the publication of illegal material, but it is often impossible for a service provider to detect and filter out everything. As well as holding a copy of the representation, they may also automatically perform transformations on it, as a service, such as converting to alternative formats, clipping or resizing. If illegal material is not successfully filtered out, automatic processing (including the transformation) of files will still take place.

To add to the complexity, it is possible for each of the following to be in different jurisdictions:

the individual or organisation who controls the data
the service provider(s)
the physical servers that host the data

Copying and Distributing

Some servers provide access to representations that are hosted elsewhere on the web (on an origin server). These representations might be stored on the server and provided again at a later time, in which case this document terms the server a copier, or they might simply pass through the server in response to a request, in which case this document terms the server a distributor.

It is usually impossible to tell whether a server is providing a stored response or has made a new request to an origin server and is serving the results of that request. Servers commonly store the results of some requests and not others, acting as a copier some of the time and as a distributor the rest.
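
The difference between the two roles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the `fetch_from_origin` callable and the example URIs are all hypothetical, and a real server would make an HTTP request where this sketch calls a stand-in function.

```python
from typing import Callable, Dict


class IntermediateServer:
    """Serves representations that are hosted on an origin server.

    `fetch_from_origin` is a hypothetical stand-in for a real HTTP request.
    """

    def __init__(self, fetch_from_origin: Callable[[str], bytes], store: bool):
        self.fetch_from_origin = fetch_from_origin
        self.store = store                    # True: behave as a copier
        self.copies: Dict[str, bytes] = {}

    def get(self, uri: str) -> bytes:
        # Copier behaviour: answer from the stored copy when there is one.
        if self.store and uri in self.copies:
            return self.copies[uri]
        # Otherwise make a new request to the origin server.
        body = self.fetch_from_origin(uri)
        if self.store:
            self.copies[uri] = body           # keep a copy for later requests
        return body                           # a distributor keeps nothing


origin_requests = []                          # log of requests reaching the origin


def fake_origin(uri: str) -> bytes:
    """Stand-in origin server that records each request it receives."""
    origin_requests.append(uri)
    return b"representation of " + uri.encode()


copier = IntermediateServer(fake_origin, store=True)
distributor = IntermediateServer(fake_origin, store=False)
```

From the client's side both objects return the same bytes for the same URI; only the log kept by the stand-in origin shows that the copier stopped asking after its first request.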

In both cases, the representation the copying or distributing server provides may be different from the original one that it has accessed from the origin server. For example:

links within an HTML page may be rewritten so that they point to pages that are also served by the copier or distributor
JavaScript and CSS files may be combined and compressed to provide speedier access
banners may be added within an HTML page to highlight that the representation is a copy of an original somewhere else
files may be compressed or converted to different formats
wholly new documents may be created that bring together information from multiple different sources

Copiers and distributors are extremely useful on the web. There are four main types of copiers and distributors discussed here: proxies, archives, search engines and reusers. The distinctions between them are summarised in the table below.

	proxy	archive	search engine	reuser
purpose	increase network performance	maintain historical record	locate relevant information	better understand information
refreshing	based on HTTP headers	never	variable	based on HTTP headers
retrieval	on demand	proactive	proactive	usually on demand
URI use	usually uses same URI	uses new URI	uses new URI	uses new URI

Proxies

A proxy is an application that sits between a server (such as a website) and a client (such as a browser). On the web, caching proxies are often used to speed up users' experience of the web by copying pages and other resources onto the proxy server so that they can be accessed more quickly, directly from the proxy rather than from the origin server. Some proxies, particularly those used by mobile operators, perform other actions on the content that passes through them, such as rewriting or merging JavaScript to make it faster to access.

Proxies come in four general flavours:

Forward proxies serve a given community, such as a company. The browsers of the members of that community are set up to use the proxy, and whenever they fetch a resource, the request is routed through the proxy. If the proxy already has a cached copy of the resource, it returns that copy rather than forwarding the request on to the origin server. If many people within the community are requesting the same pages, this can speed up their access and lower the bandwidth use of the community. The proxy can also be used to prevent access to unsuitable sites and to carry out virus checking on the content.
Reverse proxies sit in front of a (private) server and cache responses from that server. This can reduce the load on the server and speed up responses, which is particularly important when the response to a request takes time to compute. An extreme version of this is a content-delivery network (CDN) which operates a set of proxies and directs requests to the nearest of these, saving transmission time as well as processing time.
Gateways are proxies which usually do not cache the result of a request: they simply pass on the request to an origin server and pass the response back. Gateways may be used to set up barriers to particular parts of the web, or to bypass those barriers (for example, gateways may be used to route around IP-based blocks, as the IP address of the gateway is different from the IP address of the original browser).
Transforming proxies are proxies which transform the Web content being delivered through them (often these are used to compress content, for example) [reference to Mobile Web Best Practices working group work on best practices for transforming proxies].

The use of a forward proxy, gateway or transforming proxy may be configured either on an individual machine or transparently for a particular network. Users may have no idea that their requests are channelled through a given proxy, or they may have configured their set-up to use the proxy.

Reverse proxies appear to be normal servers to users: it is impossible for a user to tell that their request is actually passed on to a completely different origin server, or where that server is. This is intentional as the origin server in this case is a private one.

To improve performance, some proxies, particularly CDNs, may pre-fetch resources that a page includes, since these resources are likely to be requested by the browser soon after the page is viewed. In other words, although generally the contents of a proxy's cache will be determined by the requests that users of that proxy have made, the proxy might also in some cases contain content that no one has ever requested.

Search Engines

Search engines aim to catalog and provide access to as many web pages as they can, so that they can direct users to appropriate information in response to a search. They use crawlers to fetch pages and other resources from the web, analyse them and store them on their own servers to support further analysis.

Search engines are most interested in indexing resources and providing links to them rather than in the content of the resource itself. They might not copy the page itself, but they always store metadata about the page, derived from it and other information on the web.

Search engines play an important role in the web in enabling people to find information, including that which would otherwise be lost or is temporarily unavailable. When a user views a stored page from a search engine, it is usually obvious that the search engine is involved (from the URI of the page and from banners or framing), that the content originally came from somewhere else, and where it came from. The links within the page are not usually rewritten.

Reusers

Data reuse is becoming more prevalent as web servers act as services to others. A server that is a reuser fetches information from one or more origin servers and either provides an alternative URI for the same representation or adds value to it by reformatting it or combining it with other data. Good examples are the BBC Wildlife Finder, which incorporates information from Wikipedia, Animal Diversity Web and other sources, and triplr.org, which converts RDF data from one format to another as a service.

Reusers that do not change the information from the origin server may be used to simplify access to the origin server (by mapping simple URLs to a more complex query) or to provide a route around the same-origin policy (as servers are not limited in where they access resources from).

Since reused information is, by design, seamlessly integrated into a page that is served from the reuser, people viewing that page will not generally be aware that the information originates from elsewhere. The URIs used for the pages will be those of the reuser, for example. The Techniques section will describe how a reuser can indicate the sources of the information on such a page for both humans and computers.

Aliasing

An alias is a URI that points the browser to another URI on an origin server. A server can automatically redirect a browser using an HTTP 3XX status code and a Location header. Web pages from a server can do the same thing using a <meta> element with an http-equiv attribute set to Refresh; this technique is often used with a slight delay to indicate to the user that they are being redirected to another page.

Aliases do not involve any of the information from the origin server passing through or being stored by the redirecting server, but the redirecting server will be able to record when a particular URI is requested.

Although it is preferable to only have one URI for a particular resource, redirections are a useful mechanism for managing change on the web. They are used within websites when the structure of the website changes, or between websites when a new website is created that supersedes the first, or to archived information when a host no longer wants to provide access to a representation itself.

Redirections are also used to provide other services. Link shorteners provide a short URI for a resource that is then redirected to the canonical URI, and are useful in locations where space is limited such as in print or on Twitter. Depending on their implementation, link-tracking services can use a similar technique to enable servers to analyse which links are followed from their site: the link tracker records the request and redirects the user to the true target page.
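
As a rough illustration of how a link shortener or link tracker combines these two properties (it redirects, and it records), consider the following Python sketch. The alias table, the short path and the target URI are hypothetical, and the choice of status code 302 is an assumption made for the example; any suitable 3XX code could be used.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical table mapping short paths to the URIs they stand for.
ALIASES: Dict[str, str] = {
    "/a1": "http://example.org/reports/2011/annual-summary.html",
}

# What the aliasing server learns: how often each alias was requested.
hits: Counter = Counter()


def handle_alias(path: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (status code, headers) for a request to the aliasing server."""
    target = ALIASES.get(path)
    if target is None:
        return 404, []                        # unknown alias
    hits[path] += 1                           # record that the alias was used
    return 302, [("Location", target)]        # send the browser to the target
```

Note that the response carries no part of the target representation: the aliasing server only ever sees the request and answers with a pointer.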

When aliasing is used, users may not be aware of the eventual target of a link, or of the involvement of an aliasing server, both of which are important. Shortened links, for example, hide the target location behind a URI that often has no visible relationship to the eventual destination of the page. Some implementations of link tracking do not change the origin URI in the link (such that the status bar on a browser shows the eventual target of the page) but instead use the onclick event to direct the user to the intermediate server.

Following a redirection, browsers change the address bar to the new location, but this is often the only indication, so users may not notice that a redirection has happened.

Including

Web pages typically rely on many resources other than the HTML in which the page is written, such as images, video, scripts, stylesheets, data and other HTML. The HTML in a web page refers to these external resources in markup, for example, an <img> element uses the src attribute to reference an image which should be shown within the page.

HTML supports several different ways of including other resources in a page, which are listed in Appendix A: Linking Methods, but they all work in basically the same way. When a user navigates to a web page, the browser typically automatically fetches all the included resources into its local cache and executes them or displays them within the page.

Inclusion is different from caching or distributing a representation because the information is never stored on, nor passes through, the server that hosts the web page doing the including. As such, although the included resources are an essential component of the page to make it appear and function as a whole, the server of the web page does not have control over their content.

It is useful to distinguish between two general types of inclusion:

transclusion embeds some content, such as an image, video or other HTML, into a web page such that it is visible to the user of the page
exploitation uses a resource, such as a stylesheet or script, in a way that may affect the presentation or behaviour of the page but does not include additional content; other examples are prefetched resources, offline applications and hidden resources included for website analytics

Users are not typically aware that exploited resources are used within a page at all. While transcluded resources are visible to a user, it won't be clear that an image or video is from a third-party website rather than the website that they are visiting unless this is explicitly indicated within the content of the page.
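
The distinction can be made concrete with a small Python sketch that scans a page's markup and sorts the resources it includes into the two groups. It is a rough approximation under stated assumptions: the set of elements treated as transcluding is a choice made for this example, the page and its URIs are hypothetical, and a hidden tracking image would be wrongly counted as transcluded because the sketch looks only at element names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class InclusionFinder(HTMLParser):
    """Collects the resources a page includes, split by kind of inclusion."""

    # Assumption for this sketch: elements whose src is shown to the user.
    TRANSCLUDING = {"img", "video", "audio", "iframe", "embed"}

    def __init__(self) -> None:
        super().__init__()
        self.transcluded: List[str] = []
        self.exploited: List[str] = []

    def handle_starttag(self, tag: str,
                        attrs: List[Tuple[str, Optional[str]]]) -> None:
        a = dict(attrs)
        if tag in self.TRANSCLUDING and a.get("src"):
            self.transcluded.append(a["src"])
        elif tag == "script" and a.get("src"):
            self.exploited.append(a["src"])
        elif tag == "link" and a.get("href"):
            rel = (a.get("rel") or "").lower().split()
            if "stylesheet" in rel or "prefetch" in rel:
                self.exploited.append(a["href"])
        # <a href> is a link, not an inclusion, so it is ignored here.


PAGE = """
<html><head>
<link rel="stylesheet" href="http://static.example.net/site.css">
<script src="http://stats.example.com/track.js"></script>
</head><body>
<img src="http://images.example.org/photo.jpg" alt="A photo">
<a href="http://example.org/other.html">a link, not an inclusion</a>
</body></html>
"""

finder = InclusionFinder()
finder.feed(PAGE)
```

After `feed` returns, `finder.transcluded` holds the image URI and `finder.exploited` holds the stylesheet and script URIs, none of which are on the same server as each other.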

A resource that is included in a popular page causes a large number of requests to the server on which the resource is published, which can be burdensome to the third party. Publishers who intend their resources to be reused in this way therefore typically have terms and conditions that apply to the reuse of those resources and may have to put in place technical barriers to prevent it.

As with normal links, included resources may or may not have the same origin as the page that includes them. Resources such as images and scripts that are embedded within the web page may be from any site. However, browsers implement a same-origin policy which generally means that third-party resources cannot be fetched and processed by scripts running on the page, for example through XMLHttpRequests [XHR] (though typically these scripts can write markup into the page which embeds such resources).
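
The comparison that underlies the same-origin policy can be sketched as follows. The sketch assumes the usual definition of an origin as the combination of scheme, host and port, which this document does not itself spell out; the function names and URIs are hypothetical.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Ports implied when a URI does not state one explicitly.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(uri: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple for a URI."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""               # already lower-cased by urlsplit
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(a: str, b: str) -> bool:
    """True when both URIs share scheme, host and port."""
    return origin(a) == origin(b)
```

Under this definition a page at http://example.org/ and a resource at http://api.example.org/ are cross-origin even though they share a registered domain, as are the http and https versions of the same host.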

Inclusion Chains

When scripts or HTML are included into web pages, the included resource may itself include other resources (which may include still more and so on). The author of the original web page has control over which resources it includes, but will not have control over which resources they go on to include.

Whatever level of remove from the origin page, the publishers of included resources may change the content of those resources at any time, possibly without warning. This has been used in cases where websites included third-party images without permission, to substitute the image with something distasteful or to redirect to a link that performed an action on the user's behalf; see Preventing MySpace Hotlinking.

Hidden Requests

Some of the resources that are included within a page may be invisible to the user. An example is a hidden image that is used for tracking purposes: each time a user navigates to the page, the hidden image is requested; the server uses the information from the request of the image to build a picture of the visitors to the site.

This facility can be used for malicious purposes. An <img> element can point to any URI (not just an image) and causes a GET request on that resource. If a website has been constructed such that GET requests cause an action to be carried out (such as logging out of a website), a page that includes this "image" will cause the action to take place.

Linking

Linking is a fundamental notion for the web. HTML pages use <a> elements to insert links to other pages on the web, with the href attribute holding the URI for the linked page. Some of the links will be to pages from the same origin; others will be cross-origin links to pages on third parties' sites that hold related information.

A user can usually tell where a link is going to take them before selecting it (clicking, tapping, etc.) through the browser UI (e.g. by "mousing over" it), or after the link is selected through the status bar in the browser, although some links are overridden by onclick event handling that takes them to a different location. Some websites, such as Wikipedia, use icons to indicate when a link is a cross-origin link and when it will take a user to a page on the same server. The use of interstitial pages or dialog boxes which warn the user they are about to leave the site in question can obscure the eventual destination of the link, as discussed in the section on Aliasing, above.

If the link is a cross-origin link (or even in some cases where it is an internal link), the publisher of the origin page will have no control over the content or access policies of the linked page. These are the responsibility of the publisher of that page; the TAG Finding on "Deep Linking" in the World Wide Web [DEEPLINKING] describes the ways in which publishers can control access to their pages and the fundamental principle that addressing (linking to) a page is distinct from accessing it.

Traditionally, a user must take a specific action in order to navigate to the linked page, such as by clicking on the link or selecting it with a keystroke or a voice command. In these cases, the linked page cannot be accessed without the user's knowledge and consent. When the linked page is accessed, the address bar in the browser changes which confirms to the user that they are now on a different website. Before clicking, the status bar of the browser shows the location of the page that is being linked to; again this is something that the user can check before they navigate to a page.

Legal issues related to linking

As described above, in the context of the architecture of the World Wide Web, in linking from one page to another, a Web page author is referring to a part of the linked-to Web site or service. These links, as accessed through and processed by Web browsers, are designed to be public identifiers. The existence of the link does not imply the right to access, and Web sites are free to use any one of many access control techniques to restrict access. Hence, linking is a “speech act”. It is the opinion of the TAG that linking should therefore enjoy the same protections enjoyed by any other type of protected speech.

Freedom of expression is a right enshrined under Article 19 of the Universal Declaration of Human Rights. Individual countries that adhere to this idea of freedom of expression implement it in differing terms. It is the TAG's opinion that this definition of expression includes the right to link.

The above is probably too strong but is intended as a placeholder for what we could helpfully say about the "link" between linking and freedom of expression.

Linking Out of Control

There are three practices used by some sites that mean that users do not necessarily have control over whether a link is followed:

Browsers can be made to navigate to another page using scripted navigation, which may simply run automatically (navigating the user to another page after a period of time, for example) and can hide the location of a link, such that users don't know where they will be navigating to. The HTML5 history API [HTML5] enables a script to change the address bar, which can mean that the address bar does not reflect the actual location of the page (although this use is limited to locations that have the same origin as the original page).
Browsers might pre-fetch resources that seem likely to be visited next, so that the target page is loaded more quickly when the user accesses it. A page can indicate which links should be pre-fetched using the prefetch link relation in a link. For example, a page might indicate that the first result in a list of search results should be fetched before the user actually follows the link.
Browsers may support offline web applications [HTML5] which direct browsers to a manifest that lists files that the browser then downloads so that the web application can be used when an internet connection is not available.

TODO: Are users informed about this in browsers that implement it?

Is Appcache a kind of link? The manifest must follow same origin policy so may not apply

It should be noted that even in the case where a link is navigated to or pre-fetched without user control, the Web site containing the linked-to page can still restrict or deny access.

Terminology

This section lists terminology used within this document.

representation: a file or object that can be retrieved on the Web, e.g. a Web page, an image, a video; defined in Architecture of the World Wide Web, Volume One [WEBARCH]
upload: to put a file on a server such that it is given an address on the web
host: to store a file and provide access to it on the web
???: to store a copy of a file that is hosted by another server
distribute: to provide access to a file hosted by another server without keeping a copy
alias: to give an alternative address for a file and redirect requests to the main address
include: to use another file, such as an image or a script, within a web page
link: to include a pointer to the address of a file within a web page

To be completed!

Techniques for Restricting Access

The description above about how information is published on the web highlights how hard it can be for end users (both human and machine) to be aware of the original source of content on the web, and the ways in which it may have been changed en route to them. It also shows that the originators of content need to be clear about how that content can be used elsewhere, both in human-readable prose and in the technical barriers that they put up that limit access.

Originators

Once material is put on the public web (that is, on the internet and unprotected by authentication barriers), it is impossible to completely limit how that material is used through technical means — HTTP headers can be faked, metadata can be ignored. However, there are a number of standard techniques that originators can use to indicate how they intend their material to be used, which good-faith reusers should pay attention to.

Controlling Access

Publishers can use HTTP to control access to resources that are not protected by authentication, by refusing or redirecting connections to particular resources based on:

the Referer HTTP header; this is useful for preventing linking to particular resources from outside a website, or preventing the inclusion of a resource in another website
the User-Agent HTTP header; this is particularly useful for preventing access from crawlers
the domain name or IP address of the client making the connection; this may be useful to prevent specific reusers from accessing material
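
A minimal Python sketch of a check combining the three criteria above might look like this. The host name, crawler token and blocked address are hypothetical values chosen for the example, and the decision to allow requests that carry no Referer header is an assumption. As noted earlier, these headers can be faked, so a check of this kind only deters good-faith clients.

```python
from typing import Mapping
from urllib.parse import urlsplit

OWN_HOST = "www.example.org"                  # hypothetical: the publisher's site
BLOCKED_AGENT_TOKENS = ("examplebot",)        # hypothetical crawler name
BLOCKED_ADDRESSES = {"192.0.2.10"}            # hypothetical reuser's IP address


def allow_request(headers: Mapping[str, str], client_ip: str) -> bool:
    """Decide whether to serve a request.

    `headers` is assumed to use lower-case header names.
    """
    # Criterion 3: the address of the client making the connection.
    if client_ip in BLOCKED_ADDRESSES:
        return False
    # Criterion 2: the User-Agent header, e.g. to turn away a crawler.
    agent = headers.get("user-agent", "").lower()
    if any(token in agent for token in BLOCKED_AGENT_TOKENS):
        return False
    # Criterion 1: the Referer header, to refuse links or inclusion from
    # other sites. A missing Referer is allowed in this sketch, since
    # browsers omit it when a user types the address directly.
    referer = headers.get("referer")
    if referer is not None and urlsplit(referer).hostname != OWN_HOST:
        return False
    return True
```

A server would typically answer a refused request with an error status or a redirect; the sketch stops at the yes-or-no decision.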

TODO: good practice not to use GET for resources that cause actions to take place, and to check the Referer in these cases (prevent your site being manipulated through embedded "images" on other sites)

Controlling Inclusion

As well as the techniques above, which can be used to control any access to pages, it's also possible to provide additional control over the inclusion of resources in a third party's web pages.

In the case of HTML pages, publishers can include a script that checks whether the document is the top document in the window, to prevent it from being embedded within a frame.

The Cross-Origin Resource Sharing Working Draft [CORS] defines a set of HTTP headers that can be used to give the publisher of the third-party resource greater control over access to their resources. These are usually used to open up cross-origin access to resources that publishers want to be reused, such as JSON or XML data exposed by APIs, by indicating to the browser that the resource can be fetched by a cross-origin script.
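
As an illustration, a server that wants to open a resource to scripts from a known set of sites could compute its response headers along the following lines. The list of allowed origins and the function name are hypothetical; Access-Control-Allow-Origin is the response header defined by [CORS] for this purpose, and adding Vary: Origin is a common precaution so that caches do not reuse a response prepared for one origin when answering another.

```python
from typing import Dict, Optional

# Hypothetical origins that the publisher is happy to share data with.
ALLOWED_ORIGINS = {"http://app.example.com", "http://partner.example.net"}


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Return the extra response headers for a cross-origin request.

    `request_origin` is the value of the request's Origin header, or None
    when the request did not carry one.
    """
    if request_origin in ALLOWED_ORIGINS:
        return {
            "Access-Control-Allow-Origin": request_origin,
            "Vary": "Origin",
        }
    # No header is added, so the browser withholds the response from
    # cross-origin scripts.
    return {}
```

The enforcement happens in the browser: the server still sends the representation, and it is the browser that declines to hand it to the script when the header is absent.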

Note: A new From-Origin or Embed-Only-From-Origin HTTP header is also currently under discussion by the Web Applications Working Group and described within the Cross-Origin Resource Embedding Restrictions Editor's Draft [CORER]. This would enable publishers to control which origins are able to embed the resources they publish into their pages.

Controlling Caching

There are a number of HTTP headers [HTTP] that enable content providers to indicate whether a proxy should cache a given representation and for how long it should keep the copy. These are described in detail within Section 13: Caching in HTTP [HTTP]. For example, a server can use the HTTP header Cache-Control: no-store to indicate that a particular resource should not be cached by an intermediate server.
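
How an intermediate server might read these headers can be sketched as below. This is a simplified reading, not a complete implementation of the caching rules in [HTTP]: it considers only the no-store and max-age directives, and its comma-splitting parser does not handle quoted values that themselves contain commas. The function names are hypothetical.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into {directive: argument}."""
    directives: Dict[str, Optional[str]] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def may_store(cache_control: str) -> bool:
    """False when the origin has asked that no copy be kept."""
    return "no-store" not in parse_cache_control(cache_control)


def max_age_seconds(cache_control: str) -> Optional[int]:
    """How long a stored copy may be reused, if the origin says so."""
    value = parse_cache_control(cache_control).get("max-age")
    if value is not None and value.isdigit():
        return int(value)
    return None
```

For a response carrying `Cache-Control: public, max-age=3600`, `may_store` allows a copy and `max_age_seconds` reports an hour; for `Cache-Control: no-store`, no copy should be kept at all.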

Publishers of websites can also indicate which pages should not be fetched or indexed by any search engine or archive through robots.txt [ROBOTS] and the robots <meta> element [META]. They can indicate other characteristics of web pages, such as how frequently they might change and their importance on the website, through sitemaps [SITEMAPS]. More sophisticated publishers may use the Automated Content Access Protocol (ACAP) extensions [ACAP] to attempt to indicate access policies.
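
To show how a cooperative crawler might honour robots.txt, the sketch below uses the robots.txt parser from Python's standard library. The robots.txt content, the crawler name and the URLs are hypothetical examples; only the User-agent and Disallow lines described in [ROBOTS] are used.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Report whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

robots.txt is advisory: it expresses the publisher's wishes but does not itself prevent a crawler from fetching a page.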

Controlling Processing

The Cache-Control: no-transform HTTP header indicates that an intermediate server must not change the original representation, nor the following headers:

Content-Encoding
Content-Range
Content-Type

For example, an intermediate server must not convert a TIFF served with Cache-Control: no-transform into a JPEG, nor rewrite links within an HTML page.
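
The sketch below suggests how an intermediate server might check these directives before storing or altering a response. The function names are hypothetical, and the parsing is deliberately simplified: it splits on commas and so does not handle quoted directive values that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header value into lower-cased directives."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, _, argument = part.strip().partition("=")
        if name:
            # Directives without a value, such as no-transform, map to None.
            directives[name.lower()] = argument.strip('"') or None
    return directives


def may_transform(cache_control: str) -> bool:
    """False when the publisher has forbidden changes to the representation."""
    return "no-transform" not in parse_cache_control(cache_control)


def may_store(cache_control: str) -> bool:
    """False when the publisher has forbidden caches from keeping a copy."""
    return "no-store" not in parse_cache_control(cache_control)
```

An image-recompressing proxy, for instance, would pass the response through unchanged whenever may_transform returns False.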

Licensing

TODO: Talk about terms & conditions pages, rel="license" and RDFa/microdata methods of indicating the sources and licenses of information on a page.

Websites

This section describes the techniques that you should use when operating a website that incorporates material from other sources.

TODO: Computer-readable and human-readable attribution
TODO: Talk about honouring Cache-Control headers
TODO: Maybe talk about browser behaviour wrt Cache-Control headers; some always request afresh.
TODO: Are there any headers that proxies might/should add to indicate they are serving a cached resource? a resource from another location?

TODO: link shortening services running on your own site
TODO: rel="canonical"
TODO: incorporating link tracking

Conclusions

TODO

References

[ACAP]: http://www.the-acap.org/Files/84/84532e47-1160-46c1-adcc-30e76757a084.pdf

Cross-Origin Resource Embedding Restrictions
Editor's Draft 28 February 2011
http://dvcs.w3.org/hg/from-origin/raw-file/tip/Overview.html
Anne van Kesteren (Opera Software ASA) <annevk@opera.com>

[CORER]: http://dvcs.w3.org/hg/from-origin/raw-file/tip/Overview.html

Cross-Origin Resource Sharing
W3C Working Draft 27 July 2010
http://www.w3.org/TR/2010/WD-cors-20100727/
http://www.w3.org/TR/cors/
Anne van Kesteren (Opera Software ASA) <annevk@opera.com>

[CORS]: http://www.w3.org/TR/cors/

"Deep Linking" in the World Wide Web
TAG Finding 11 Sep 2003
http://www.w3.org/2001/tag/doc/deeplinking-20030911
Tim Bray, Antarctica Systems <tbray@textuality.com>

[DEEPLINKING]: http://www.w3.org/2001/tag/doc/deeplinking-20030911

HTML5
A vocabulary and associated APIs for HTML and XHTML
W3C Working Draft 13 January 2011
http://www.w3.org/TR/2011/WD-html5-20110113/
http://www.w3.org/TR/html5/
Ian Hickson, Google, Inc.

[HTML]: http://www.w3.org/TR/html5/

Hypertext Transfer Protocol -- HTTP/1.1
Request for Comments: 2616, June 1999
R. Fielding, UC Irvine
J. Gettys, Compaq/W3C
J. Mogul, Compaq
H. Frystyk, W3C/MIT
L. Masinter, Xerox
P. Leach, Microsoft
T. Berners-Lee, W3C/MIT

[HTTP]: http://tools.ietf.org/html/rfc2616

The Web Robots Pages
About the Robots <META> tag

[META]: http://www.robotstxt.org/meta.html

The Web Robots Pages
About /robots.txt

[ROBOTS]: http://www.robotstxt.org/robotstxt.html

[SITEMAPS]: http://sitemaps.org/protocol.php

Architecture of the World Wide Web, Volume One
W3C Recommendation 15 December 2004
http://www.w3.org/TR/webarch/
Ian Jacobs, W3C
Norman Walsh, Sun Microsystems, Inc.

[WEBARCH]: http://www.w3.org/TR/webarch/

XMLHttpRequest
W3C Candidate Recommendation 3 August 2010
http://www.w3.org/TR/2010/CR-XMLHttpRequest-20100803/
http://www.w3.org/TR/XMLHttpRequest/
Anne van Kesteren (Opera Software ASA) <annevk@opera.com>

[XHR]: http://www.w3.org/TR/XMLHttpRequest/

Appendix A: Linking Methods

Linking by Reference

a
link

Including

Exploiting

script
style
prefetching
offline applications

Transcluding

img
frames
iframes
object
AJAX
