Web Characterization Activity
Characterization Metrics

Editor:
Jim Pitkow, Xerox PARC

Last Updated:
Friday, December 18, 1998


This document contains a taxonomy of WWW-specific metrics that can be characterized. The metrics are broken into the following areas: client and proxy characterizations, server characterizations, and WWW characterizations.

 

Metrics for Client and Proxy Characterizations

Realm

Metric


Basic Facts

Classification of users (educational, home, ISP, or corporate)

Access method of users (LAN, modem, mobile, or wireless)

Number of users, response rate, and attrition rate

Date and duration of study

Description and rationale for any cleaning, filtering, etc. of data

Sampling methodology


Distributions

Entire Study User Centric

Files transferred per user

Unique files transferred per user

Pages transferred per user

Unique pages transferred per user

Sites visited per user

Unique sites visited per user

Reoccurrence rates for files, pages, and sites per user

Entire Study Web Centric

Embedded images per page

MIME-type percentage breakdown (e.g., html, jpg, ps)

Protocol percentage breakdown (e.g., http, shttp, gopher)

Hyperlinks per page

Sessions General

Sessions per user

Sessions Temporal

Length of sessions per user

Inter-session time per user (session to session time)

Sessions Paths

Length of sessions per user

Stack distance per user

Per Session Temporal

Inter-request time per user (request to request time)

Intra-request time per user (request to render time)

Length of visit per site per user

Per Session Paths

Length of visit per site per user
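
The session metrics above presuppose a rule for splitting each user's request stream into sessions, which this taxonomy does not specify. The following is a minimal sketch in Python, assuming a hypothetical inactivity timeout (30 minutes here, an illustrative choice only) and request timestamps expressed in seconds. The names `SESSION_TIMEOUT_S`, `split_sessions`, and `session_metrics` are invented for illustration.

```python
from typing import Dict, List

# Assumed, illustrative inactivity threshold; the taxonomy does not define one.
SESSION_TIMEOUT_S = 30 * 60


def split_sessions(timestamps: List[float],
                   timeout: float = SESSION_TIMEOUT_S) -> List[List[float]]:
    """Group one user's request timestamps into sessions by inactivity gap."""
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)   # gap is small: same session
        else:
            sessions.append([t])     # first request, or gap exceeded timeout
    return sessions


def session_metrics(timestamps: List[float],
                    timeout: float = SESSION_TIMEOUT_S) -> Dict[str, object]:
    """Compute per-user session counts and temporal metrics."""
    sessions = split_sessions(timestamps, timeout)
    return {
        "sessions": len(sessions),
        # Temporal session length: last request time minus first request time.
        "session_lengths_s": [s[-1] - s[0] for s in sessions],
        # Inter-session time: start of next session minus end of previous one.
        "inter_session_times_s": [b[0] - a[-1]
                                  for a, b in zip(sessions, sessions[1:])],
        # Inter-request time: gaps between consecutive requests in a session.
        "inter_request_times_s": [[y - x for x, y in zip(s, s[1:])]
                                  for s in sessions],
    }
```

Collecting these values across all users in a study yields the per-user distributions listed above. The choice of timeout changes every session-based metric, so it belongs with the description of data cleaning and filtering under Basic Facts.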


Metrics for Server Characterizations

Please refer to the following paper for the source of many of the metrics:

Manley, S. and Seltzer, M. (1997). Web Facts and Fantasy.
Proceedings of the 1997 USENIX Symposium on Internet Technologies and Systems,
Monterey, CA, December 1997.

Realm

Metric


Basic Facts

Domain classification of server (.com, .edu, etc.)

Description of content contained on site

Cost to access material on site (free, pay-for-view, etc.)

Type of service provider (single server, virtual hosting, etc.)

Birth and modification history of server (major revisions of content)

Date and duration of study

Description and rationale for any cleaning, filtering, etc. of data


Site Composition (one month)

Number of users

Number of files and page requests per user

Number of search engine hits

Number of files serviced

Number of pages serviced

Number of CGI/dynamic content requests serviced

Bytes transferred

Byte latency

Total number of files on server

Total number of pages on server

Documents by Traffic graph (x% of documents account for y% of traffic)
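
The Documents by Traffic graph can be derived directly from a request log. The sketch below is one possible way to compute its points, assuming traffic is measured in requests (bytes could be substituted) and that only documents appearing in the log are counted. The function name `traffic_concentration` is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def traffic_concentration(requested_docs: Iterable[str]) -> List[Tuple[float, float]]:
    """Return (percent of documents, cumulative percent of requests) points,
    taking the most-requested documents first."""
    counts = Counter(requested_docs)
    total_requests = sum(counts.values())
    total_docs = len(counts)
    points: List[Tuple[float, float]] = []
    cumulative = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        cumulative += n
        points.append((100.0 * rank / total_docs,
                       100.0 * cumulative / total_requests))
    return points


# Illustrative log: one entry per request, naming the document served.
log = ["a"] * 5 + ["b"] * 3 + ["c", "d"]
print(traffic_concentration(log))
```

Each point reads as "the top x% of documents account for y% of traffic". Documents that exist on the server but were never requested do not appear in the log, so they would have to be added from a site inventory to make x relative to the total number of files on the server.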


Growth Rates

Number of users

Number of files and page requests per user

Number of files serviced

Number of pages serviced

Number of CGI/dynamic content requests serviced

Bytes transferred

Byte latency

Number of files on server

Number of bytes on server

Doubling period for all of the above metrics
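
The taxonomy does not say how a doubling period should be estimated. A minimal sketch, assuming the metric grows exponentially between two measurements, so that the period is T = elapsed × ln 2 / ln(v_end / v_start). The function name `doubling_period` is hypothetical.

```python
import math


def doubling_period(v_start: float, v_end: float, elapsed: float) -> float:
    """Doubling period in the same units as `elapsed`, assuming exponential
    growth from v_start to v_end over the elapsed interval."""
    if v_start <= 0 or v_end <= v_start or elapsed <= 0:
        raise ValueError("requires 0 < v_start < v_end and elapsed > 0")
    return elapsed * math.log(2) / math.log(v_end / v_start)
```

For example, a metric that grows from 1000 to 4000 over 6 months has doubled twice, which gives a doubling period of about 3 months. With more than two measurements, fitting a line to the logarithm of the metric over time would be a more robust estimate than this two-point formula.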


Distributions

Entire Server - User Centric

Files transferred per user

Unique files transferred per user

Pages transferred per user

Unique pages transferred per user

Reoccurrence rates for files and pages per user (assumes longitudinal tracking capabilities)

Entire Server Web Centric

Embedded images per page

MIME-type percentage breakdown (e.g., html, jpg, ps)

Hyperlinks per page

Longitudinal Sessions General

Sessions per user

Longitudinal Sessions Temporal

Length of sessions per user

Inter-session time per user (session to session time)

Longitudinal Sessions Paths

Length of sessions per user

Per Session Temporal

Inter-request time per user (request to request time)

Intra-request time per user (request to render time)

Length of visit at site per user

Per Session Paths

Stack distance per user

Length of visit at site per user
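
Stack distance is not defined in this taxonomy. One common reading, assumed here, is LRU stack distance: the depth at which a re-requested item sits in a most-recently-used stack of the user's earlier requests. The sketch below uses the hypothetical function name `stack_distances` and reports `None` for first-time requests.

```python
from typing import Hashable, Iterable, List, Optional


def stack_distances(requests: Iterable[Hashable]) -> List[Optional[int]]:
    """LRU stack distance for each request in one user's request sequence."""
    stack: List[Hashable] = []             # most recently used item at index 0
    distances: List[Optional[int]] = []
    for item in requests:
        if item in stack:
            depth = stack.index(item)
            distances.append(depth + 1)    # 1 = repeat of the latest item
            stack.pop(depth)
        else:
            distances.append(None)         # first reference: no finite distance
        stack.insert(0, item)              # item becomes most recently used
    return distances


print(stack_distances(["a", "b", "c", "a", "a", "b"]))
# prints [None, None, None, 3, 1, 3]
```

Small distances indicate that users mostly return to recently visited pages. The list-based stack keeps the sketch short at the cost of linear-time lookups, which may matter for long traces.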


Metrics for WWW Characterizations

To be completed