Recommendations

This section describes the syntax for "Uniform Resource Locators" (URLs): that is, basically physical addresses of objects which are retrievable using protocols already deployed on the net. The generic syntax provides a framework for new schemes for names to be resolved using as yet undefined protocols.

The syntax is described in two parts. Firstly, we give the syntax rules of a completely specified name; secondly, we give the rules under which parts of the name may be omitted in a well-defined context.

URL syntax

A complete URL consists of a naming scheme specifier followed by a string whose format is a function of the naming scheme. For locators of information on the internet, a common syntax is used for the IP address part. A BNF description of the URL syntax is given in an a later section. The components are as follows. Fragment identifiers and partial URLs are not involved in the basic URL definition.

PrePrefix

To be a Uniform Resource Locator as currently defined by the URI working group, the whole string must start with a constant prefix "URL:". Note that to save space in this document, some URLs may have been quoted throughout without this preprefix.

Scheme

Within the URL of a object, the first element is the name of the scheme, separated from the rest of the object by a colon. The rest of the URL follows the colon in a format depending on the scheme.

Internet protocol parts

Those schemes which refer to internet protocols mostly have a common syntax for the rest of the object name. This starts with a double slash "//" to indicate its presence, and continues until the following slash "/". Within that section are
An optional user name,
if required (as it is with a few FTP servers). The password, is present, follows the user name, separated from it by a colon; the user name and optional password are followed by a commercial at sign "@". The user of user name and passwords which are public is discouraged.
The internet domain name
of the host in RFC1037 format (or, optionally and less advisably, the IP address as a set of four decimal digits)
The port number,
if it is not the default number for the protocol, is given in decimal notation after a colon.
Path
The rest of the locator is known as the "path". It may define details of how the client should communicate with the server, including information to be passed transparently to the server without any processing by the client.
The path is interpreted in a manner dependent on the scheme being used. Generally, the reserved slash "/" character (ASCII 2F hex) denotes a level in a hierarchical structure, the higher level part to the left of the slash.

Encoding prohibited characters

When a system uses a local addressing scheme, it is useful to provide a mapping from local addresses into URLs so that references to objects within the addressing scheme may be referred to globally, and possibly accessed through gateway servers.

Any mapping scheme may be defined provided it is unambiguous, reversible, and provides valid URLs. It is recommended that where hierarchical aspects to the local naming scheme exist, they be mapped onto the hierarchical URL path syntax in order to allow the partial form to be used.

The following encoding method shall be used for mapping WAIS, FTP, Prospero and Gopher addresses onto URLs. Where the local naming scheme uses octet values which are not allowed in the URL, these shall be represented in the URL by a percent sign "%" followed by two hexadecimal digits (0-9, A-F) giving the value for that octet. This specification makes no assumptions or requirements about the character sets, if any, referred to be the (decoded) octets a URL. Character codes other than those allowed by the syntax shall not be used unencoded in a URL.

The same encoding method may be used for encoding characters whose use, although technically allowed in a URL, would be unwise due to problems of corruption by imperfect gateways or misrepresentation due to the use of variant character sets, or which would simply be awkward in a given environment. Because a % sign always indicates an encoded character, a URL may be made safer simply by encoding any characters considered unsafe, while leaving already encoded characters still encoded. Similarly, in cases where a larger set of characters is acceptable, % signs can be selectively and reversibly expanded.

The reserved characters shall however never be arbitrarly encoded and decoded.