Web Characterization Activity
Characterization Metrics

Jim Pitkow, Xerox PARC

Last Updated:
Friday, December 18, 1998

This document contains a taxonomy of WWW-specific metrics that can be characterized. The metrics are grouped into three areas: client and proxy characterizations, server characterizations, and WWW characterizations.


Metrics for Client and Proxy Characterizations



Basic Facts

Classification of users (educational, home, ISP, or corporate)

Access method of users (LAN, modem, mobile, or wireless)

Number of users, response rate, and attrition rate

Date and duration of study

Description and rationale for any cleaning, filtering, etc. of data

Sampling methodology


Entire Study - User Centric

Files transferred per user

Unique files transferred per user

Pages transferred per user

Unique pages transferred per user

Sites visited per user

Unique sites visited per user

Reoccurrence rates for files, pages, and sites per user
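
As an illustration of the user-centric counts above, here is a minimal Python sketch. It assumes a request log reduced to (user, url) pairs, and it assumes that the reoccurrence rate means the fraction of a user's requests that repeat an item that user already requested. Neither the log shape nor that definition is fixed by this document, and the names per_user_counts and sample_log are hypothetical.

```python
from collections import defaultdict

def per_user_counts(log):
    """Summarize an iterable of (user, url) request pairs per user."""
    totals = defaultdict(int)    # all transfers per user
    uniques = defaultdict(set)   # distinct items per user
    for user, url in log:
        totals[user] += 1
        uniques[user].add(url)
    return {
        user: {
            "transferred": totals[user],
            "unique": len(uniques[user]),
            # Assumed definition: share of requests that are repeats.
            "reoccurrence_rate": 1 - len(uniques[user]) / totals[user],
        }
        for user in totals
    }

# Hypothetical sample data, not taken from any study.
sample_log = [("u1", "/a"), ("u1", "/b"), ("u1", "/a"), ("u1", "/a"), ("u2", "/a")]
stats = per_user_counts(sample_log)
```

The same function applies to files, pages, or sites, depending on what the second element of each pair identifies.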

Entire Study - Web Centric

Embedded images per page

MIME-type percentage breakdown (e.g., html, jpg, ps)

Protocol percentage breakdown (e.g., http, shttp, gopher)

Hyperlinks per page
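
The percentage breakdowns above could be computed along these lines. This is a sketch only: it assumes each transfer has already been labeled with a MIME type or protocol string, and the helper name percentage_breakdown is hypothetical.

```python
from collections import Counter

def percentage_breakdown(labels):
    """Return {label: percent of all observations} for a list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical sample of MIME types seen in a trace.
mime = percentage_breakdown(["text/html", "image/jpeg", "text/html", "image/gif"])
```

Passing protocol names instead of MIME types yields the protocol breakdown with no other change.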

Sessions - General

Sessions per user

Sessions - Temporal

Length of sessions per user

Inter-session time per user (session to session time)
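
This document does not define how a session is delimited. The sketch below assumes one common convention, an idle timeout: a new session starts whenever the gap between two consecutive requests from the same user exceeds a threshold. The 1800-second default and the function names are assumptions made for illustration.

```python
def split_sessions(timestamps, timeout=1800):
    """Group one user's sorted request times (in seconds) into sessions."""
    sessions = []
    for t in timestamps:
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)   # still within the current session
        else:
            sessions.append([t])     # idle gap exceeded: start a new session
    return sessions

def session_metrics(timestamps, timeout=1800):
    """Sessions per user, session lengths, and inter-session times."""
    sessions = split_sessions(sorted(timestamps), timeout)
    lengths = [s[-1] - s[0] for s in sessions]
    gaps = [b[0] - a[-1] for a, b in zip(sessions, sessions[1:])]
    return {"sessions": len(sessions), "lengths": lengths, "inter_session": gaps}

# Hypothetical request times for a single user.
m = session_metrics([0, 60, 300, 5000, 5100, 20000])
```

Here the inter-session time is measured from the last request of one session to the first request of the next. Measuring from session start to session start is an equally plausible reading.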

Sessions - Paths

Length of sessions per user

Stack distance per user
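
Stack distance is sketched below under the usual LRU-stack interpretation, which is an assumption here: the distance of a request is the depth of the requested item in a stack ordered by recency of use, with depth 1 meaning the item was also the previous request. First-time requests have no finite distance and are reported as None. The function name is hypothetical.

```python
def stack_distances(requests):
    """LRU stack distance for each request in one user's request stream."""
    stack = []       # least recently used first, most recent last
    distances = []
    for item in requests:
        if item in stack:
            idx = stack.index(item)
            distances.append(len(stack) - idx)  # depth counted from the top
            stack.pop(idx)
        else:
            distances.append(None)              # first reference
        stack.append(item)                      # item becomes most recent
    return distances

# Hypothetical request stream.
d = stack_distances(["a", "b", "c", "a", "a", "b"])
```

The list-based stack takes linear time per request, which is adequate for a sketch but not for long traces.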

Per Session - Temporal

Inter-request time per user (request to request time)

Intra-request time per user (request to render time)

Length of visit per site per user

Per Session - Paths

Length of visit per site per user

Metrics for Server Characterizations

Please refer to the following paper for the source of many of the metrics:

Manley, S. and Seltzer, M. (1997). Web Facts and Fantasy.
Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems,
Monterey, CA, December 1997.



Basic Facts

Domain classification of server (.com, .edu, etc.)

Description of content contained on site

Cost to access material on site (free, pay-for-view, etc.)

Type of service provider (single server, virtual hosting, etc.)

Birth and modification history of server (major revisions of content)

Date and duration of study

Description and rationale for any cleaning, filtering, etc. of data

Site Composition (one month)

Number of users

Number of files and page requests per user

Number of search engine hits

Number of files serviced

Number of pages serviced

Number of CGI/dynamic content requests serviced

Bytes transferred

Byte latency

Total number of files on server

Total number of pages on server

Documents by Traffic graph (x% of documents account for y% of traffic)
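
One point on a Documents by Traffic graph could be computed as in the sketch below. It assumes traffic is measured in bytes transferred per document and that documents are ranked from most to least traffic. The function name and sample figures are hypothetical.

```python
import math

def traffic_share(bytes_by_doc, doc_fraction):
    """Fraction of total traffic carried by the top doc_fraction of documents."""
    sizes = sorted(bytes_by_doc.values(), reverse=True)
    k = math.ceil(doc_fraction * len(sizes))   # number of top documents
    return sum(sizes[:k]) / sum(sizes)

# Hypothetical bytes transferred per document over the study period.
traffic = {"/index.html": 700, "/a.gif": 150, "/b.gif": 100, "/c.ps": 50}
share = traffic_share(traffic, 0.25)
```

Evaluating the function over a range of fractions between 0 and 1 gives the full curve.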

Growth Rates

Number of users

Number of files and page requests per user

Number of files serviced

Number of pages serviced

Number of CGI/dynamic content requests serviced

Bytes transferred

Byte latency

Number of files on server

Number of bytes on server

Doubling period for all of the above metrics
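
A doubling period can be estimated from two measurements of the same metric if growth between them is assumed to be exponential: T = elapsed * ln 2 / ln(v1 / v0). That growth model is an assumption of this sketch, not something this document prescribes, and the function name is hypothetical.

```python
import math

def doubling_period(v0, v1, elapsed):
    """Doubling period, in the units of elapsed, assuming exponential growth."""
    if v0 <= 0 or v1 <= v0:
        raise ValueError("requires 0 < v0 < v1 (a growing, positive metric)")
    return elapsed * math.log(2) / math.log(v1 / v0)

# Hypothetical: a metric grows from 1000 to 4000 over 6 months.
period = doubling_period(1000, 4000, 6)
```

In this example the metric doubles twice in 6 months, so the estimate is about 3 months.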


Entire Server - User Centric

Files transferred per user

Unique files transferred per user

Pages transferred per user

Unique pages transferred per user

Reoccurrence rates for files and pages per user (assumes longitudinal tracking capabilities)

Entire Server - Web Centric

Embedded images per page

MIME-type percentage breakdown (e.g., html, jpg, ps)

Hyperlinks per page

Longitudinal Sessions - General

Sessions per user

Longitudinal Sessions - Temporal

Length of sessions per user

Inter-session time per user (session to session time)

Longitudinal Sessions - Paths

Length of sessions per user

Per Session - Temporal

Inter-request time per user (request to request time)

Intra-request time per user (request to render time)

Length of visit at site per user

Per Session - Paths

Stack distance per user

Length of visit at site per user

Metrics for WWW Characterizations

To be completed