Web Traffic Characterization and Information Delivery

Previous work on workload characterization focused on statistics such as file size and type distribution, rate of change, etc. The main objective of these efforts was to understand the system and thereby improve its performance through caching, delta encoding, and prefetching. A further objective of this line of work is to design a robust and scalable protocol that supports the activities of WWW users.

A newer problem facing WWW users is the sheer amount of information they can access and the difficulty of making sense of it. Tools for classifying and categorizing information, detecting similarities between two or more information sources, and filtering information are highly valued by today's Web users.

Our current interest is in studying communities of users, searching for patterns of behavior that will help in developing tools to support enhanced information delivery. Specifically, we are interested in metrics and tools to measure relations between accessed documents. This information will be used to build environments that support information integration, personalization and push, and data mining.

For example, relations between documents can be computed based on geographic location or subject.
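As an illustration of a subject-based relation, the sketch below scores pairs of documents by the Jaccard similarity of their subject sets and also reports whether they share a geographic tag. The choice of Jaccard similarity, the `Document` record, its `region` field, and the `threshold` parameter are all hypothetical assumptions made for this example, not a metric defined in this paper.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Document:
    url: str
    subjects: frozenset[str]  # hypothetical subject labels for the document
    region: str               # hypothetical geographic tag, e.g. a country code


def subject_similarity(a: Document, b: Document) -> float:
    """Jaccard similarity of two documents' subject sets, from 0.0 to 1.0."""
    union = a.subjects | b.subjects
    if not union:
        return 0.0
    return len(a.subjects & b.subjects) / len(union)


def related_pairs(docs, threshold=0.5):
    """Return (url_a, url_b, score, same_region) for every pair of documents
    whose subject similarity is at least `threshold` (an assumed cutoff)."""
    pairs = []
    for a, b in combinations(docs, 2):
        score = subject_similarity(a, b)
        if score >= threshold:
            pairs.append((a.url, b.url, score, a.region == b.region))
    return pairs
```

Any other relation (co-access within a session, link structure, or a distance between geographic locations) could be substituted for `subject_similarity` without changing `related_pairs`.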

In addition to caching, proxies are potential candidates to host tools that integrate external and internal information. The information will be filtered based on user profiles sent by individual clients, supporting personalization. Personalization or push at the proxy cache level is guided by the interests shared among clients within a given locality.
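A minimal sketch of how a proxy might combine the two ideas above, assuming each client profile is simply a set of topic keywords and each cached item is a `(url, topics)` pair. The function names, the keyword representation, and the `min_clients` rule for deciding what counts as a shared interest are hypothetical choices for illustration only.

```python
from collections import Counter


def shared_interests(profiles: dict[str, set[str]], min_clients: int = 2) -> set[str]:
    """Topics that appear in at least `min_clients` client profiles."""
    counts = Counter(topic for topics in profiles.values() for topic in topics)
    return {topic for topic, n in counts.items() if n >= min_clients}


def filter_for_client(items: list[tuple[str, set[str]]], profile: set[str]) -> list[str]:
    """Per-client personalization: keep URLs whose topics overlap the profile."""
    return [url for url, topics in items if topics & profile]


def push_candidates(items: list[tuple[str, set[str]]],
                    profiles: dict[str, set[str]],
                    min_clients: int = 2) -> list[str]:
    """Community-level push: keep URLs matching an interest shared by the
    proxy's clients, as determined by `shared_interests`."""
    shared = shared_interests(profiles, min_clients)
    return [url for url, topics in items if topics & shared]
```

Raising `min_clients` restricts push to interests common to more of the locality, while `filter_for_client` still serves an individual client's narrower interests.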

Our objective in participating in the workshop is to discuss metrics, tools, and methods for characterizing WWW information resources in order to support better information delivery. By understanding information resources, we can build tools that classify, group, and present information through dynamic views based on user interest.