HTML and URLs 

Contents

  1. Uniform Resource Locators (URLs)
    1. Fragment URLs
    2. Relative URLs
    3. URLs in HTML

The World Wide Web is a network of information resources. The Web relies on three mechanisms intended to make these resources readily available to the widest possible audience:

  1. A single naming scheme, to give access to any resource on the Web in a uniform way (URLs).
  2. Protocols, to enable the exchange of named resources over the Web (HTTP).
  3. Hypertext, for easy navigation among resources (HTML).

This section of the reference manual presents minimal information about the Web topics that have an impact on HTML.

Uniform Resource Locators (URLs)

Every resource available on the Web (an HTML document, image, video clip, program, etc.) has an address that may be encoded by a Uniform Resource Locator, or "URL" (defined in [RFC1738]).

URLs typically consist of three pieces:

  1. The name of the protocol used to transfer the resource over the Web.
  2. The name of the machine hosting the resource.
  3. The name of the resource itself, given as a path.

Consider the URL that designates the current HTML specification:

http://www.w3.org/TR/WD-html4/cover.html

This URL may be read as follows: Use the HTTP protocol to transfer the data residing on the machine www.w3.org in the file /TR/WD-html4/cover.html.
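
As a minimal sketch of this reading, the three pieces can be pulled apart with the urlsplit function from Python's standard urllib.parse module. The variable names are chosen here for illustration only.

```python
from urllib.parse import urlsplit

# Split the example URL from the text into its three pieces.
parts = urlsplit("http://www.w3.org/TR/WD-html4/cover.html")

protocol = parts.scheme    # name of the protocol used to transfer the resource
machine = parts.hostname   # name of the machine hosting the resource
path = parts.path          # name of the resource itself, given as a path

print(protocol, machine, path)
```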

URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always assume that URLs are case-sensitive.
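
The following sketch shows one way a program might compare two URLs under this rule: the machine name is compared without regard to case, and everything else is compared exactly. The helper name same_url is hypothetical, and lowercasing the whole network-location part is a simplification that assumes the URL carries no user name or password.

```python
from urllib.parse import urlsplit

def same_url(a: str, b: str) -> bool:
    """Compare two URLs, ignoring case in the machine name only."""
    pa, pb = urlsplit(a), urlsplit(b)
    # Simplification: netloc is lowercased as a whole, which assumes it
    # holds just a machine name (and optional port), no user information.
    key_a = (pa.scheme, pa.netloc.lower(), pa.path, pa.query, pa.fragment)
    key_b = (pb.scheme, pb.netloc.lower(), pb.path, pb.query, pb.fragment)
    return key_a == key_b

# Differs only in the case of the machine name.
host_case = same_url("http://WWW.W3.ORG/TR/WD-html4/cover.html",
                     "http://www.w3.org/TR/WD-html4/cover.html")

# Differs in the case of the path, so the URLs are treated as distinct.
path_case = same_url("http://www.w3.org/TR/WD-html4/cover.html",
                     "http://www.w3.org/tr/wd-html4/cover.html")

print(host_case, path_case)
```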

The character set of URLs that appear in HTML is specified in [RFC1738].

Fragment URLs 

The URL specification in effect at the time of writing ([RFC1738]) offers a mechanism to refer to a resource, but not to a location within a resource. The Web community has adopted a convention called "fragment URLs" to refer to anchors within an HTML document. A fragment URL ends with "#" followed by an anchor identifier. For instance, here is a fragment URL pointing to an anchor named section_2:

http://somesite.com/html/top.html#section_2
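
To illustrate the convention, the anchor identifier can be separated from the rest of the URL with urldefrag from Python's standard urllib.parse module. This is only a sketch of how a program might split a fragment URL; the variable names are illustrative.

```python
from urllib.parse import urldefrag

# Separate the resource address from the anchor identifier after "#".
resource, anchor = urldefrag("http://somesite.com/html/top.html#section_2")

print(resource)  # address of the document
print(anchor)    # anchor identifier within the document
```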

Relative URLs 

A relative URL (defined in [RFC1808]) doesn't contain any protocol or machine information, and its path generally refers to an HTML document on the same machine as the current document. Relative URLs may contain relative path components (".." means the parent location) and may be fragment URLs.

Relative URLs may be resolved to full URLs, for example when the user attempts to follow a link from one document to another. [RFC1808] defines the normative algorithm for resolving relative URLs. The following description is for convenience only.

Briefly, a full URL is derived from a relative URL by attaching a "base" part to the relative URL. The base part is a URL that may come from any or all of the following sources:

[RFC1808] specifies the precedence among multiple sources of base information. For the purposes of this explanation, the last piece of base information takes precedence over the others and HTTP headers are considered to occur earlier than the document HEAD.

If no explicit base information accompanies the document, the base URL is that which designates the location of the current document.

Given a base URL and a relative URL (that does not begin with a slash), a full URL is derived as follows:
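
The outcome of this derivation can be illustrated with urljoin from Python's standard urllib.parse module. This is a sketch, not the normative [RFC1808] algorithm: urljoin follows its own implementation of URL resolution, which agrees with the description here for simple cases like these. The base URL and the relative URLs below are hypothetical examples.

```python
from urllib.parse import urljoin

# Hypothetical base URL: the location of the current document.
base = "http://somesite.com/html/top.html"

# A relative path is resolved against the directory of the base URL.
sibling = urljoin(base, "section/part.html")

# ".." refers to the parent location of the base URL's directory.
parent = urljoin(base, "../images/logo.gif")

# A relative fragment URL refers to an anchor in the base document itself.
fragment = urljoin(base, "#section_2")

print(sibling)
print(parent)
print(fragment)
```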

URLs in HTML 

In HTML, URLs play a role in these situations:

In each case, authors may use a full URL, a fragment URL, or a relative URL. Please consult the section on anchors for more information about links and URLs.

MAILTO URLs 

In addition to HTTP URLs, authors might want to include MAILTO URLs (see [RFC1738]) in their documents. MAILTO URLs cause email to be sent to some email address. For instance, the author might create a link that, when activated, causes the user agent to open a mail program with the destination address in the "To:" field.

MAILTO URLs have the following syntax:

mailto:email-address

User agents may support MAILTO URL extensions that are not yet Internet standards (e.g., appending subject information to a URL with the syntax "?Subject=my%20subject" where any space characters are replaced by "%20").
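
As a minimal sketch of the syntax above, a MAILTO URL can be assembled with quote from Python's standard urllib.parse module, which replaces space characters with "%20". The helper name mailto_url and the address someone@example.com are hypothetical, and the Subject extension may not be supported by every user agent.

```python
from urllib.parse import quote

def mailto_url(address: str, subject: str = "") -> str:
    """Build a MAILTO URL, optionally with the non-standard Subject extension."""
    url = "mailto:" + address
    if subject:
        # quote() escapes spaces as "%20", as described in the text.
        url += "?Subject=" + quote(subject)
    return url

plain = mailto_url("someone@example.com")
with_subject = mailto_url("someone@example.com", "my subject")

print(plain)
print(with_subject)
```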