W3C Web Characterization Activity
Terminology Sheet
- Editors
-
Jim Pitkow, Xerox PARC,
Henrik Frystyk Nielsen, W3C
- Last Updated
-
$Date: 1999/03/23 21:29:44 $ by $Author: frystyk $
This is the terminology used by the W3C Web
Characterization Activity (WCA). We do not consider this list exhaustive, nor
necessarily uncontroversial with respect to other terminology sections,
although we have tried to reuse the established terms defined by the URI
spec, HTTP/1.1, and HTML 4.0 where applicable.
It is important to note that, in the general case, Web Characterization,
like the Web itself, is not defined in terms of HTTP and HTML. Although
widely popular, these are only specific examples of how information
can be exchanged on the Web. Rather, the Web is defined in terms of resources
and relationships between those resources.
See also Yahoo's
sections on Web FAQ and tutorials. Feedback should be sent to the WCA mailing list.
- Resource
-
A resource can be anything that has identity. Note that we do not determine
whether the identity is a name or an address as this is in the eye of the
beholder and not an absolute property of the identity. Familiar examples
include an electronic document, an image, a service (e.g., "today's weather
report for Los Angeles"), and a collection of other resources. In this
particular context we only consider resources that are network accessible. The
resource is the conceptual mapping to an entity or set of entities, not
necessarily the entity which corresponds to that mapping at any particular
instance in time. Thus, a resource can remain constant even when its content –
the entities to which it currently corresponds – changes over time, provided
that the conceptual mapping is not changed in the process.
- URI
-
A sequence of characters with a restricted syntax referencing a resource.
- Entity
-
A serialized representation of a resource as it looked at a specific point in
time from a specific viewpoint. The relationship between a resource and an
entity can be compared to that between a physical object and a photograph of
that object: The former may change and look different depending on the
viewpoint and point in time - the latter is a representation describing a
specific state of the object. Often entities contain information describing
themselves (natural language, media type, size, resolution, etc.).
- Message
-
The basic unit of communication between two peers, typically consisting of a
structured sequence of octets defined by the message type. Messages are
typically carried over a transport-layer virtual circuit established between
two peers connected via a network.
- Request
-
A message containing an atomic operation to be carried out on a resource
identified by a URI. An often used operation is the request for dereferencing
(or resolving) a URI identifying a resource, hence creating an entity. This is
equivalent to the HTTP GET method, for example.
- Response
-
Zero, one or more messages containing the result of an executed request.
Responses may in certain cases be cached by intermediaries along the message
path and served by these caches in order to fulfill other requests as long as
the response is fresh and the operation allows it.
- Server
-
An application that accepts requests and services these requests by generating
responses. A request may, for example, ask the server to resolve a URI, but it
may potentially be any type of operation to be carried out on the resource
identified by a URI.
- Client
-
An application that issues requests, either to resolve a URI or to perform
some other operation on the resource identified by a URI.
- Web page
-
A collection of information, often organized as a structured document
containing textual information, references to resources intended to be
represented by value embedded in the Web page, and references to external
resources intended to be represented by reference external to the Web page.
Which parts are represented by value and which are represented by reference is
a function of user preferences and the capabilities of the client rendering
the Web page.
- Web site
-
A collection of resources maintained and resolved by the same naming authority
(indicated in the absolute URI preceded by a double slash "//" and terminated
by the first slash "/" thereafter). Typically, the naming authority is a fully
qualified domain name, but it may also be an IP address or some
registry-based naming authority. Naming authorities that cannot readily
be identified as identical by simple URI canonicalization of the name (add
domain name, remove default port numbers and trailing dots ".", etc.) are
considered to be independent naming authorities.
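The canonicalization steps named above (add domain name, remove default port
numbers and trailing dots) can be sketched as follows. This is only an
illustration under stated assumptions: the function names, the
default_domain parameter, and the table of default ports are hypothetical,
and user information and IPv6 literals are not handled.

```python
from urllib.parse import urlsplit

# Assumed default ports; only HTTP and HTTPS are covered in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}


def naming_authority(uri, default_domain=None):
    """Return a canonical form of the naming authority of an absolute URI."""
    parts = urlsplit(uri)
    # urlsplit lowercases the host name; strip trailing dots here.
    host = (parts.hostname or "").rstrip(".")
    # "Add domain name": qualify a bare host name (hypothetical parameter).
    if default_domain and host and "." not in host:
        host = f"{host}.{default_domain}"
    port = parts.port
    # "Remove default port numbers": drop the port if it is the scheme default.
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"


def same_site(uri_a, uri_b, default_domain=None):
    """True if both URIs canonicalize to the same naming authority."""
    return naming_authority(uri_a, default_domain) == naming_authority(
        uri_b, default_domain
    )
```

Under this sketch, "http://www.w3.org/" and "http://w3.org/" remain independent
naming authorities, because no simple canonicalization step maps one name onto
the other.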
Same as before...
- User Session
-
A cohesive set of user requests across one or more Web sites. In the absence
of rigorous client side instrumentation (e.g., think aloud protocols, video
taping, controlled experimentation), most user sessions are delineated
heuristically, normally via timeouts.
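A minimal sketch of timeout-based session delineation is given below. The
function name and the 30-minute default timeout are assumptions chosen for
illustration; this sheet does not prescribe a timeout value.

```python
def split_sessions(timestamps, timeout=1800.0):
    """Group one user's request times (in seconds) into sessions.

    A new session starts whenever the gap since the previous request
    exceeds `timeout` (assumed default: 30 minutes).
    """
    sessions = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)  # still within the current session
        else:
            sessions.append([t])    # gap too long: start a new session
    return sessions
```

For example, requests at 0, 60, and 120 seconds followed by requests at 4000
and 4030 seconds would be split into two sessions with the default timeout.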
- Temporal Session Length
-
The total amount of time that elapses during the course of a user session.
- User Reading Time
-
The amount of time between user page requests, also referred to as "Active Off
Times." In the absence of rigorous client side instrumentation, interruptions
and multi-tasking can introduce noise into reading times. To date the amount
of noise has not been quantified.
- Session Path Length
-
The total number of clicks that occur during the course of a user session.
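The three session measures defined above can be computed from the request
times of a single session, as in the sketch below. It assumes that every
request time corresponds to one click and that the session length runs from
the first to the last request; the function name and dictionary keys are
hypothetical.

```python
def session_metrics(request_times):
    """Compute session measures from one session's request times (seconds)."""
    times = sorted(request_times)
    return {
        # Time elapsed between the first and the last request.
        "temporal_session_length": times[-1] - times[0] if times else 0,
        # Gaps between consecutive page requests.
        "reading_times": [b - a for a, b in zip(times, times[1:])],
        # One click per request (an assumption of this sketch).
        "session_path_length": len(times),
    }
```

Note that the reading time of the last page in a session cannot be observed
from request times alone, so it is absent from the result.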
- Site Session
-
The requests a user issues to a single Web site, also called a "visit."
- Site Reading Time
-
The amount of time between user page requests within a site. As with User
Reading Times, interruptions and multi-tasking can introduce an indeterminable
amount of noise into reading times.
- Site Path Length
-
The number of clicks within a site. The distribution of path lengths has been
modeled as an inverse Gaussian distribution, which for the typical parameters
approximates the lognormal distribution.
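For reference, the standard inverse Gaussian density with mean mu and shape
parameter lam can be evaluated as sketched below. The parameter names are
assumptions, no fitted values are implied, and the density is continuous
whereas path lengths are whole numbers of clicks.

```python
import math


def inverse_gaussian_pdf(x, mu, lam):
    """Density of the inverse Gaussian distribution at x.

    mu is the mean and lam the shape parameter; both must be positive.
    """
    if x <= 0:
        return 0.0
    return math.sqrt(lam / (2 * math.pi * x**3)) * math.exp(
        -lam * (x - mu) ** 2 / (2 * mu**2 * x)
    )
```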
- Client Request Header Size
-
The number of bytes in the HTTP headers sent by a client requesting
information.
- Client Request Content Size
-
The number of bytes sent by a client delivering content, e.g., the body of a
PUT request.
- Total Client Request Size
-
= Content Request + Header Request
- Server Response Content Size
-
The number of bytes transferred by the server delivering the requested content.
- Server Response Header Size
-
The number of bytes transferred by the server delivering HTTP headers.
- Total Server Response Size
-
= Content Response + Header Response
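The size measures above can be illustrated on a raw HTTP/1.x message, whether
a request or a response. The sketch assumes that the header size includes the
start line and the blank line that terminates the headers; this sheet does not
say whether they are counted, and the function name is hypothetical.

```python
def message_sizes(raw: bytes):
    """Split a raw HTTP/1.x message at the first blank line and count bytes."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    # Assumption: start line and terminating blank line count as header bytes.
    header_size = len(head) + len(sep)
    content_size = len(body)
    return {
        "header": header_size,
        "content": content_size,
        "total": header_size + content_size,
    }


# A hypothetical PUT request carrying five bytes of content.
example = (
    b"PUT /doc HTTP/1.1\r\n"
    b"Host: example.org\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
)
sizes = message_sizes(example)
```

With this convention the total always equals the length of the whole message.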
- Domain
-
Domains are defined by an IP address. All devices sharing a common part of the
IP address are said to be in the same domain.
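The definition does not say how long the shared part of the address is. The
sketch below assumes a fixed prefix length (24 bits by default), which is a
hypothetical parameter rather than part of the definition.

```python
import ipaddress


def same_domain(addr_a, addr_b, prefix_len=24):
    """True if both addresses share their first `prefix_len` bits."""
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{addr_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(addr_b) in network
```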
Don't know Yet
- Click
-
The requesting of a URL/hyperlink. A "click" can be accomplished in a myriad
of ways, including selecting a hyperlink embedded in a document, selecting a
hyperlink embedded in a browser interface, or typing a URL manually via the
keyboard. Clicks can occur as a result of a human actively requesting a URL,
a human instructing an agent to request URLs (e.g., offline reading), or
autonomous agents (e.g., spiders and robots) issuing requests.
- Click through Rate
-
How often users select a particular URL/hyperlink. Click through rate is often
used to measure how well a Web advertisement is performing.
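One common way to turn "how often" into a number is to divide the number of
clicks on a hyperlink by the number of times it was presented to users. The
notion of an impression count is an assumption of this sketch and is not
defined in this sheet.

```python
def click_through_rate(clicks, impressions):
    """Fraction of presentations of a hyperlink that resulted in a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions
```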
- Cookie
-
Arbitrary information sent by a server to a client, to be stored by the
client, and sent back to the server on subsequent requests. Various parameters
can be set to control the information, including how long the information
should reside on the client and with which URLs it is associated.
Cookies are often used to enable electronic commerce (shopping baskets),
customize content (store preferences), and track the activity of individual
users within a Web site (cookie counting).
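The exchange described above can be sketched with Python's standard
http.cookies module. The helper names and the cookie name, lifetime, and path
used in the example are hypothetical; only the general mechanism (the server
sets a value with parameters, the client sends it back) comes from the
definition.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, max_age, path):
    """Build the header line a server would send to set a cookie."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age  # how long the client should keep it
    cookie[name]["path"] = path        # which URLs it is associated with
    return cookie.output()


def read_cookie_header(header_value):
    """Parse the value of a Cookie header sent back by a client."""
    parsed = SimpleCookie()
    parsed.load(header_value)
    return {name: morsel.value for name, morsel in parsed.items()}


# Hypothetical shopping-basket cookie kept for one hour under /shop.
set_cookie_line = make_set_cookie("basket", "item42", 3600, "/shop")
returned = read_cookie_header("basket=item42")
```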
@(#) $Id: Terms.html,v 1.11 1999/03/23 21:29:44 frystyk Exp $