Briefing Package for the
Web Characterization Activity, Phase II


1. Executive Summary

This Briefing Package proposes a phase II Activity to continue the work performed by the Web Characterization Group (WCG), whose charter is nearly complete, as part of the W3C HTTP-Next Generation (NG) Project.

Since it was chartered in July 1997, the WCG has successfully completed its role within the NG Project: it has provided data to the Protocol Design Group (PDG) and developed representative testbed scenarios. Phase II builds on the experience gathered by the WCG. It will broaden the scope of Web characterization and provide the W3C Membership and the Web community in general with information and test scenarios about the Web and how it is being used now and in the near future. We believe that a better understanding of the Web will leave W3C and its Membership better positioned to evolve the Web and to ensure its long-term interoperability and robustness.

An important result of the WCG's work is the identification of three key groups in characterization work, and of how they interact:

Bulk Data Providers
The Bulk Data Providers are typically server maintainers and ISPs providing server and proxy logs. They can also be backbone providers gathering information directly from the Net, or users running instrumented Web clients. Because of privacy concerns and the sheer size of log files, data providers often prefer to run a set of characterization tools locally, so that only the boiled-down data sets and profiles are released.
The W3C Characterization Working Group
The WCG develops and maintains a set of characterization tools used by the data providers and defines the mechanism for exchanging boiled down data sets and profiles with the data providers in order to maintain confidentiality and trust. The collected data sets are used to develop characterization models and to provide characterization data to the third group, the reduced data consumers.
Reduced Data Consumers
The reduced data consumers use the profiles and data sets provided by the WCG and provide feedback and new questions to be asked. Primary data consumers are expected to be content providers, service providers, user groups, researchers and technology designers.
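To illustrate the local reduction step described under Bulk Data Providers, the following is a minimal sketch. It assumes server logs in the Common Log Format and shows how a provider could boil raw log lines down on-site so that only aggregate counts leave the site. The function names (`reduce_log`, `size_bucket`) and the chosen profile fields (request methods, status codes, coarse response-size buckets) are hypothetical illustrations, not tools defined by the WCG.

```python
import re
from collections import Counter

# Common Log Format (assumed input): host ident authuser [date] "request" status bytes
CLF = re.compile(r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)')


def size_bucket(nbytes: int) -> str:
    """Map a response size to a coarse power-of-two bucket label."""
    if nbytes <= 0:
        return "0"
    upper = 1
    while upper < nbytes:
        upper *= 2
    return f"<={upper}"


def reduce_log(lines):
    """Reduce raw CLF lines to an aggregate profile (hypothetical format).

    Host, ident, authuser, timestamp and request path are parsed but
    deliberately discarded, so the profile carries no per-user data.
    """
    profile = {
        "requests": 0,
        "malformed": 0,
        "methods": Counter(),
        "status": Counter(),
        "sizes": Counter(),
    }
    for line in lines:
        m = CLF.match(line)
        if m is None:
            profile["malformed"] += 1
            continue
        request, status, size = m.group(5), m.group(6), m.group(7)
        method = request.split(" ", 1)[0] if request else "-"
        profile["requests"] += 1
        profile["methods"][method] += 1
        profile["status"][status] += 1
        # "-" means no body was sent; count it in the zero-size bucket.
        nbytes = int(size) if size.isdigit() else 0
        profile["sizes"][size_bucket(nbytes)] += 1
    return profile
```

Only the returned profile would be released to the Working Group, and the raw log never leaves the provider's site. A real deployment would need an agreed profile format and a review of which aggregates are safe to share.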

In the proposed structure for this Activity, reduced data consumers and bulk data providers interact through an Interest Group. A new Web Characterization Working Group (WCG-II) acts as mediator, provider of analysis tools, and disseminator of characterization information.

Specifically, we propose to:

2. Current Status

The Web Characterization Group (WCG), chaired by Jim Pitkow, Xerox PARC, was originally chartered in August of 1997 as part of the HTTP-NG Project with the intent of providing a set of realistic user scenarios to be used within the HTTP-NG testbed. Specifically, the WCG aimed to fulfill four primary goals for the HTTP-NG Project:

The WCG has completed the first three goals and begun working on the fourth. The group currently consists of members from academia, industry, and W3C member organizations. The principal members are Boston University's Oceans Group, Harvard College's Vino Group, INRIA, Microsoft, Netscape, Virginia Tech's Network Research Group, and Xerox PARC's Webology Group.

While working within the scope of the HTTP-NG Project, it became clear that there is a need for, and interest in, Web characterization information within the W3C Membership and in the Web community in general. This briefing package proposes a framework in which the Web characterization work can continue while its scope broadens to include a larger group of bulk data providers as well as reduced data consumers.

3. Proposal: Web Characterization Activity

3.1 Scope of Activity

The scope of the Web Characterization Activity is to gain further understanding of how the Web is evolving and how fast changes can propagate in a globally distributed environment.

Efficient techniques for establishing and maintaining the trust and privacy of individuals and groups of people are essential for the long-term stability of the Web. However, we do not consider technical solutions for establishing privacy policies to be within the scope of this Activity. Activities such as P3P and DSig are better placed to provide them. As such technical solutions evolve, this Activity will deploy them as appropriate.

The scope of this Activity is to characterize the Web as a distributed system, not the individuals using the Web. In particular, the Activity will make no effort to identify individual users or to disclose data that can lead to the identification of individual users. Nor will it make any effort to identify groups of people by race, religion, national or ethnic classification, or political or sexual orientation (see also Section 4, IPR Issues).

3.2 Market for the Activity

The results of the Activity are expected to interest a relatively large set of groups, including but not limited to:

Groups within W3C, as well as technology designers and ISPs, are expected to draw immediate benefit from the results produced by the Activity. Advances in these markets translate directly into benefits for the Web community. Advertising and market research groups will benefit because the diverse methods and tools for measuring Web usage can now be brought into sharper focus. Academia also stands to profit by obtaining representative data, which is typically very difficult to obtain and is usually gathered on an ad hoc basis.

The output of the Activity will give the W3C Membership and the Web community in general an important feedback mechanism. It will show how new techniques and solutions propagate on the Web and how they affect the way the Web is used. It may also reveal existing performance bottlenecks, usability problems, and similar issues. Once these are identified, solutions can be more focused and stand a better chance of rapid deployment.

The true value of this proposed Activity depends on taking regular, representative samples over a relatively long period, so that the dynamics of the Web can be modelled and fed back into the evolution of the Web.

3.3 Structure of Activity

This Activity is intended to last 12 months from November 5, 1998.

3.3.1 Web Characterization Workshop

The Activity will be kicked off by the Web Characterization Workshop on November 5, 1998, in Boston, MA, which will bring together both W3C Members and Web characterization experts. The expected outcomes of the Workshop are the identification of organizations that wish to, and are able to, participate in the Working Group, and the formation of the Interest Group.

More information can be found in the Workshop invitation and program.

3.3.2 Web Characterization Working Group, Phase II (WCG-II)

The WCG-II is intended to work on a request/response model similar to the one used between the HTTP-NG PDG and the WCG. The Interest Group and W3C Activities will issue formal requests, and the WCG-II will respond with realistic timelines for when and how results can be made available.

The WCG-II will start its work by formally soliciting requests for characterization data needed by other W3C Working Groups and Activities. The solicitation process is intended to repeat at six-month intervals, which gives the Working Group enough time to understand and respond to the requests of the other W3C groups. Requests from the Interest Group will be handled case by case.

The focus of the WCG-II is expected to include the following tasks:

Provide characterization data for W3C Activities in order to better define the problem space and to improve design decisions
A process still needs to be developed for identifying and building relationships with other W3C Working Groups. The following W3C groups could benefit from WCG products: Electronic Commerce Activity, HTTP-NG, HTML-NG, XML, WAI, I18N, SMIL, CSS, XLF, and Privacy Policy.
Provide realistic user scenarios and representative load characterizations to be used in automatable test beds
Developing a representative testbed has significantly shaped the HTTP-NG WCG's views on Web server benchmarking and performance testing. State-of-the-art products typically stress test servers by issuing as many HTTP requests per second as possible. We intend to continue developing the approach based on generating realistic load.
Design scalable and automatable mechanisms for collecting representative data sets at client, proxy, server, and network access points.
Currently, acquiring data for characterization can be quite challenging because of the difficulty of instrumentation, the proprietary nature of the data, and privacy concerns.
Maintain a Web based repository of a) data collection, analysis, and modeling tools, b) publicly usable representative data sets, and c) relevant research publications.
Creating an active knowledge base of the Web has been an important part of the WCG and we expect that the demand for characterization information will increase in phase II. This work is likely to happen in collaboration with other characterization groups where similar projects are underway.
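To contrast the load-generator approach with plain stress testing, the following is a minimal sketch of generating realistic rather than maximal load. It assumes Zipf-like document popularity and heavy-tailed (Pareto) user think times, which are the kinds of distributions a SURGE-style generator models. The parameter values and the names `make_user_schedule`, `zipf_weights`, and `pareto_think_time` are illustrative assumptions, not measured data and not SURGE's actual implementation.

```python
import random
from itertools import accumulate
from typing import List, Tuple


def zipf_weights(n_docs: int, s: float = 1.0) -> List[float]:
    """Assumed Zipf-like popularity: the rank-k document has weight 1/k**s."""
    return [1.0 / (k ** s) for k in range(1, n_docs + 1)]


def pareto_think_time(rng: random.Random, alpha: float = 1.5, xmin: float = 1.0) -> float:
    """Assumed heavy-tailed think time in seconds (paretovariate returns values >= 1)."""
    return xmin * rng.paretovariate(alpha)


def make_user_schedule(n_requests: int, n_docs: int, seed: int = 0) -> List[Tuple[float, str]]:
    """Build a (time, path) request schedule for one simulated user.

    Unlike a stress test that fires requests back to back, each request
    is separated by a think time, and popular documents are requested
    far more often than unpopular ones. Paths are hypothetical.
    """
    rng = random.Random(seed)
    cum = list(accumulate(zipf_weights(n_docs)))
    t = 0.0
    schedule = []
    for _ in range(n_requests):
        doc = rng.choices(range(n_docs), cum_weights=cum, k=1)[0]
        schedule.append((round(t, 3), f"/doc{doc}.html"))
        t += pareto_think_time(rng)
    return schedule
```

A testbed would replay many such schedules concurrently against the server under test. The aggregate request rate then emerges from the modelled user behaviour rather than from the raw capacity of the client machines.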

WCG-II participation is defined in the Member Resources section.

3.3.3 Web Characterization Interest Group (WCIG)

The role of the Interest Group is to be a public discussion forum for bulk data providers and reduced data consumers, and to provide requests and feedback to the Working Group. The tools and dissemination mechanisms produced by the Working Group are expected to benefit from feedback from their immediate users, as well as from those users' continuous review.

Furthermore, the IG is intended to establish connections with the Web community as well as the Internet community, and to provide recommendations and guidelines for Web specifications and implementations. Developments like SURGE can have a direct impact on products like caches, Web servers, and proxies, as well as on measurement tools and benchmarks like WebStone, SPECweb96, and WebBench 2.0.

Participation in the Interest Group will be open to W3C Members as well as non-members. The Interest Group will help focus the Working Group, monitor its progress, provide critical review of its work, and help filter the questions and issues presented to it.

3.4 Meetings and Coordination

3.4.1 The Working Group

The Working Group will communicate using an archived mailing list and a set of Web pages. The mailing list archives will be made available to the Membership. The group will hold a teleconference twice a month, with members, editors, the Chair, and W3C staff representatives taking part. The Chair may increase the frequency of meetings, subject to the availability of resources. Minutes of teleconferences will be posted on the mailing list or on a Web page.

3.4.2 The Interest Group

The Interest Group will communicate using an archived, public mailing list and a set of Web pages. All communication between the Interest Group and the Working Group should take place on the Interest Group's Web pages and/or mailing list.

3.7 Projected Schedule

Date | Event | Format
October 7, 1998 | Advisory Committee ballot closes | -
October 21, 1998 | Director's decision | -
November 5, 1998 | W3C Workshop (Members and non-members) on Web characterizations, testbed simulators, and automatic characterization methodologies | Abstract-reviewed Workshop, attendance limited to 50 participants
November 15, 1998 | Identification and inclusion of new members in the Working Group | Milestone
November 15, 1998 | Workshop minutes | Report to the Membership
December 5, 1998 | Summary of Workshop outlining research opportunities and proposed solutions | W3C Note
December 5, 1998 | Solicitation to other W3C Working Groups | Memo to other W3C Working Groups
December 5, 1998 | Initial repository containing data sets, tools, and bibliography of Web characterization research | Initial online repository available to the public
January 5, 1999 | Initial proposal for automatic Web characterization | W3C Note
January 5, 1999 | Closing date for contracting with other W3C Working Groups | Milestone
February 5, 1999 | Initial proposal for refined testbed | W3C Note
March 25, 1999 | Fulfillment of contracts with other W3C Working Groups | Report to the Membership
April 5, 1999 | Six-month status report | Report to the Membership
April 15, 1999 | Solicitation to other W3C Working Groups | Memo to other W3C Working Groups
May 15, 1999 | Closing date for contracting with other W3C Working Groups | Milestone
August 5, 1999 | Refined testbed | W3C Note with software
September 5, 1999 | Completed repository containing data sets, tools, and bibliography of Web characterization research | Online repository available to the public
October 25, 1999 | Fulfillment of contracts with other W3C Working Groups | Report to the Membership
November 5, 1999 | Twelve-month status report | Report to the Membership

4. IPR Issues

Privacy concerns surrounding log files must be taken into account throughout this Activity. In particular, the tools developed should respect individual privacy.

The intellectual property rights (IPR), such as copyright, patents, and trademarks, of existing software should be respected. The same applies to the log files themselves, which count as databases in some jurisdictions and are therefore subject to protection. All software and other materials produced within the scope of this Activity should be subject to the existing W3C IPR policy.

Before joining the group, participants in this Activity are required to inform the Chair and the other participants of any IPR concerns, claims, and limitations they might raise.

Participants in the Interest Group, as well as the Working Group, are expected to make log files and relevant information available for analysis on their sites, and share generated data sets free of charge and on an equitable basis.

Johan Hjelm, W3C/Ericsson, Member WCG