W3C Web Characterization Activity
Terminology Sheet

Editors
Jim Pitkow, Xerox PARC,
Henrik Frystyk Nielsen, W3C
Last Updated
$Date: 1999/03/23 21:29:44 $ by $Author: frystyk $

This is the terminology used by the W3C Web Characterization Activity (WCA). We do not consider this list exhaustive, nor necessarily uncontroversial with respect to other terminology sections, although we have tried to reuse the established terms defined by the URI spec, HTTP/1.1, and HTML 4.0 where applicable.

It is important to note that, in the general case, Web Characterization, just like the Web itself, is not defined in terms of HTTP and HTML. Although widely popular, these are only specific examples of how information can be exchanged on the Web. Rather, the Web is defined in terms of resources and the relationships between those resources.

See also Yahoo's sections on Web FAQ and tutorials. Feedback should be sent to the WCA mailing list.

General Terminology

Resource
A resource can be anything that has identity. Note that we do not determine whether the identity is a name or an address as this is in the eye of the beholder and not an absolute property of the identity. Familiar examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. In this particular context we only consider resources that are network accessible. The resource is the conceptual mapping to an entity or set of entities, not necessarily the entity which corresponds to that mapping at any particular instance in time. Thus, a resource can remain constant even when its content – the entities to which it currently corresponds – changes over time, provided that the conceptual mapping is not changed in the process.
URI
A sequence of characters with a restricted syntax that references a resource.
Entity
A serialized representation of a resource as it looked at a specific point in time from a specific viewpoint. The relationship between a resource and an entity can be compared to that between a physical object and a photograph of that object: The former may change and look different depending on the viewpoint and point in time - the latter is a representation describing a specific state of the object. Often entities contain information describing themselves (natural language, media type, size, resolution, etc.).
Message
The basic unit of communication between two peers, typically consisting of a structured sequence of octets defined by the message type. Typically, messages are carried over a transport-layer virtual circuit established between two peers connected via a network.
Request
A message containing an atomic operation to be carried out on a resource identified by a URI. An often used operation is the request for dereferencing (or resolving) a URI identifying a resource, hence creating an entity. This is equivalent to the HTTP GET method, for example.
Response
Zero, one or more messages containing the result of an executed request. Responses may in certain cases be cached by intermediaries along the message path and served by these caches in order to fulfill other requests as long as the response is fresh and the operation allows it.
Server
An application that accepts requests and services these requests by generating responses. Requests can for example be that of resolving a URI but may potentially be any type of operation to be carried out on the resource identified by a URI.
Client
An application that issues requests, either to resolve a URI or to perform some other operation on the resource identified by a URI.
Web page
A collection of information, often organized as a structured document containing textual information, references to resources intended to be represented by value embedded in the Web page, and references to external resources intended to be represented by reference external to the Web page. Which parts are represented by value and which are represented by reference is a function of user preferences and the capabilities of the client rendering the Web page.
Web site
A collection of resources maintained and resolved by the same naming authority (indicated in the absolute URI preceded by a double slash "//" and terminated by the first slash "/" thereafter). Typically, the naming authority is a fully qualified domain name, but it may also be an IP address or some registry-based naming authority. Naming authorities that cannot readily be identified as identical by simple URI canonicalization of the name (add domain name, remove default port numbers and trailing dots ".", etc.) are considered to be independent naming authorities.
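The canonicalization mentioned in the Web site definition can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `default_domain` parameter, and the table of default ports are hypothetical and not part of the WCA terminology.

```python
from urllib.parse import urlsplit

# Assumption: only these two schemes and their default ports are considered.
DEFAULT_PORTS = {"http": 80, "https": 443}

def canonical_authority(uri, default_domain=None):
    """Return a canonical form of the naming authority of an absolute URI."""
    parts = urlsplit(uri)
    # urlsplit lowercases the host name; strip any trailing dots as well.
    host = (parts.hostname or "").rstrip(".")
    # "Add domain name": qualify a bare host name with a caller-supplied domain.
    # Hosts containing "." or ":" (qualified names, IP addresses) are left alone.
    if default_domain and host and "." not in host and ":" not in host:
        host = f"{host}.{default_domain}"
    port = parts.port
    # Drop the port when it is absent or equal to the scheme's default.
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"

def same_site(uri_a, uri_b, default_domain=None):
    """Two URIs belong to the same Web site if their canonical authorities match."""
    return (canonical_authority(uri_a, default_domain)
            == canonical_authority(uri_b, default_domain))
```

Under this sketch, `http://www.w3.org/a` and `http://WWW.W3.ORG.:80/b` fall in the same Web site, while `http://www.w3.org/` and `http://w3.org/` are treated as independent naming authorities because simple canonicalization does not make them identical.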


Same as before...

Web Client Terminology

User Session
A cohesive set of user requests across one or more Web sites. In the absence of rigorous client-side instrumentation (e.g., think-aloud protocols, video taping, controlled experimentation), most user sessions are delineated heuristically, normally via timeouts.
Temporal Session Length
The total amount of time that elapses during the course of a user session.
User Reading Time
The amount of time between user page requests, also referred to as "Active Off Times." In the absence of rigorous client side instrumentation, interruptions and multi-tasking can introduce noise into reading times. To date the amount of noise has not been quantified.
Session Path Length
The total number of clicks that occur during the course of a user session.
Site Session
The part of a user session in which the user issues requests to a single Web site, also called a "visit."
Site Reading Time
The amount of time between user page requests within a site. As with User Reading Times, interruptions and multi-tasking can introduce an undeterminable amount of noise into reading times.
Site Path Length
The number of clicks within a site. The distribution of path lengths has been modeled as an inverse Gaussian distribution, which for the typical parameters approximates the lognormal distribution.
Client Request Header Size
The number of bytes in the HTTP headers sent by a client requesting information.
Client Request Content Size
The number of bytes of content sent by a client, e.g., the content of a PUT request.
Total Client Request Size
= Client Request Content Size + Client Request Header Size
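The timeout heuristic for delineating user sessions, together with the session measures defined in this section, can be sketched as follows. This is a minimal illustration, not a prescribed method: the 30-minute timeout, the `Session` class, and the assumption that every logged request counts as one click are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass
class Session:
    timestamps: list  # request times in seconds, in ascending order

    @property
    def temporal_length(self):
        # Temporal Session Length: time elapsed from first to last request.
        return self.timestamps[-1] - self.timestamps[0]

    @property
    def path_length(self):
        # Session Path Length: assumes each logged request is one click.
        return len(self.timestamps)

    @property
    def reading_times(self):
        # User Reading Time: gaps between consecutive page requests.
        return [b - a for a, b in zip(self.timestamps, self.timestamps[1:])]

def split_sessions(timestamps, timeout=1800.0):
    """Start a new session whenever the gap between requests exceeds `timeout`."""
    sessions, current = [], []
    for t in sorted(timestamps):
        if current and t - current[-1] > timeout:
            sessions.append(Session(current))
            current = []
        current.append(t)
    if current:
        sessions.append(Session(current))
    return sessions

sessions = split_sessions([0, 60, 200, 5000, 5030])
```

Here the gap between the requests at 200 and 5000 seconds exceeds the assumed timeout, so the five requests are split into two sessions. Restricting the input to requests for a single site would give the corresponding site-level measures.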

Web Server Terminology

Server Response Content Size
The number of bytes transferred by the server delivering the requested content.
Server Response Header Size
The number of bytes transferred by the server delivering HTTP headers.
Total Server Response Size
= Server Response Content Size + Server Response Header Size
Domain
Domains are defined by an IP address. All devices sharing a common part of the IP address are said to be in the same domain.
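The server response size definitions in this section can be illustrated with a short sketch that measures a raw HTTP/1.x response. It assumes, as a choice made only for this example, that the header size includes the status line and the blank line that terminates the headers; the function name `response_sizes` is hypothetical.

```python
def response_sizes(raw: bytes):
    """Return (header size, content size, total size) in bytes for a raw response."""
    # The header section ends at the first empty line (CRLF CRLF).
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no blank line terminating the header section")
    header_size = len(head) + len(sep)   # assumption: terminator counted as header
    content_size = len(body)
    return header_size, content_size, header_size + content_size

raw = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
header_size, content_size, total_size = response_sizes(raw)
```

The same split applies to a client request, since request and response messages share the header and content layout.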

Don't know Yet

Click
The requesting of a URL/hyperlink. A "click" can be accomplished in a myriad of ways, including the selection of a hyperlink embedded in a document, the selection of a hyperlink embedded in a browser interface, or the manual entry of a URL via the keyboard. Clicks can occur as a result of a human actively requesting a URL, a human instructing an agent to request URLs (e.g., offline reading), or autonomous agents (e.g., spiders and robots) requesting URLs.
Click through Rate
How often users select a particular URL/hyperlink. Click through rate is often used to measure how well a Web advertisement is performing.
Cookie
Arbitrary information sent by a server to a client, to be stored by the client, and sent back to the server on subsequent requests. Various parameters can be set to control the information, including how long the information should reside on the client and which URLs to associate the information with. Cookies are often used to enable electronic commerce (shopping baskets), customize content (store preferences), and track the activity of individual users within a Web site (cookie counting).
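The cookie round trip described above can be sketched with Python's standard `http.cookies` module. This is a simplified illustration: the helper names and the dictionary used as a cookie jar are hypothetical, URL association is reduced to a plain path-prefix test, and the lifetime parameter is stored but not enforced.

```python
from http.cookies import SimpleCookie

def store_cookie(jar, set_cookie_value):
    """Client side: remember the cookies carried in one Set-Cookie header value."""
    parsed = SimpleCookie()
    parsed.load(set_cookie_value)
    for name, morsel in parsed.items():
        jar[name] = morsel

def cookie_header(jar, request_path):
    """Client side: build the Cookie header value for a request to `request_path`."""
    pairs = [
        f"{name}={morsel.value}"
        for name, morsel in jar.items()
        # Simplification: a cookie applies when its path is a prefix of the request path.
        if request_path.startswith(morsel["path"] or "/")
    ]
    return "; ".join(pairs)

jar = {}
store_cookie(jar, "basket=42; Max-Age=3600; Path=/shop")
```

With this jar, a request to `/shop/cart` would carry the `basket` cookie back to the server, while a request to `/about` would not.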


@(#) $Id: Terms.html,v 1.11 1999/03/23 21:29:44 frystyk Exp $