PICS Label Grabber

Kyle Jamieson jamieson@mit.edu
Jennifer Berk jcberk@mit.edu
Terrence Poon tpoon@mit.edu

Summary

The PICS label grabber application searches the WWW for PICS labels and submits them to the label bureau of your choice. It can be used to populate a label bureau for testing or for any other purpose that requires a populated label bureau.
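
The PICS label distribution specification allows a label to be embedded in an HTML page through a META element with http-equiv="PICS-Label". The following minimal sketch shows how label text could be pulled out of such elements. It is hypothetical (PicsMetaExtractor and extractLabels are illustrative names, not part of the grabber or the PICS Standard Library) and deliberately simplified: it scans with a regular expression instead of a real PICS parser, expects http-equiv to appear before content, and ignores labels delivered in HTTP headers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls label text out of
// <META http-equiv="PICS-Label" content='...'> elements.
// Simplified: a regex scan, not the PICS Standard Library parser.
class PicsMetaExtractor {
    // http-equiv must precede content; the content value may be quoted
    // with ' or ", and group 2 captures everything up to the matching quote.
    private static final Pattern PICS_META = Pattern.compile(
            "<meta\\s+http-equiv\\s*=\\s*[\"']?PICS-Label[\"']?\\s+content\\s*=\\s*(['\"])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the raw label text of every PICS-Label META element in the page.
    static List<String> extractLabels(String html) {
        List<String> labels = new ArrayList<>();
        Matcher m = PICS_META.matcher(html);
        while (m.find()) {
            labels.add(m.group(2).trim());
        }
        return labels;
    }
}
```

Because PICS labels quote their own strings with double quotes, the content attribute is usually single-quoted, which is why the pattern accepts either quote character and matches up to the same one.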

The PICS label grabber lets the user specify three starting URLs, each paired with a number that limits how much searching the robot does from that URL. The robot performs a breadth-first search from each URL; that is, all URLs linked directly from the start URL ("child" URLs) are searched before any "grandchild" URLs. The robot never visits a page twice. When the robot visits a page, it extracts all PICS labels from that page and sends them to a label bureau in an HTTP PUT request.
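
The crawl order described above can be sketched as a queue-based breadth-first search. This is an illustration under stated assumptions, not the robot's source: the web is modelled as an in-memory map from each URL to the URLs it links to (a real robot would fetch each page over HTTP and parse out its links), the per-URL number is assumed to cap how many pages are visited, and a visited set shared across calls models the rule that no page is visited twice, even across the three start URLs. BreadthFirstCrawl and crawl are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the breadth-first crawl order. "links" stands in for
// fetching a page and extracting its hyperlinks.
class BreadthFirstCrawl {
    // Visits pages breadth-first from startUrl, stopping after maxPages pages
    // (assumed meaning of the per-URL limit). "visited" may be shared between
    // calls so that no page is visited twice across several start URLs.
    static List<String> crawl(Map<String, List<String>> links, String startUrl,
                              int maxPages, Set<String> visited) {
        List<String> order = new ArrayList<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(startUrl);
        while (!queue.isEmpty() && order.size() < maxPages) {
            String url = queue.poll();
            if (!visited.add(url)) {
                continue;                 // already visited from this or an earlier start URL
            }
            order.add(url);               // a real robot would fetch the page and extract labels here
            for (String child : links.getOrDefault(url, Collections.emptyList())) {
                if (!visited.contains(child)) {
                    queue.add(child);     // queued behind every page already waiting
                }
            }
        }
        return order;
    }
}
```

Marking a page as visited when it is dequeued, rather than when it is enqueued, means a page that was queued but not reached before the limit remains available to a later start URL.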

Features

Sends a standard PICS PUT to any label bureau.
Reads robots.txt at each site to avoid network congestion and respect the wishes of site webmasters.
100% Java; uses the PICS Standard Library.
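
Respecting robots.txt means fetching /robots.txt from each site and skipping the paths it disallows. The following is a deliberately simplified, hypothetical checker (RobotsTxt and isAllowed are illustrative names, not the grabber's own classes). It assumes the robot only needs the records that apply to "User-agent: *", treats each Disallow value as a path prefix as in the original robots exclusion convention, and ignores later extensions such as Allow lines or wildcards.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified robots.txt checker: honours only records for
// "User-agent: *" and treats each Disallow value as a path prefix.
class RobotsTxt {
    private final List<String> disallowedPrefixes = new ArrayList<>();

    RobotsTxt(String text) {
        boolean inStarRecord = false;     // current record applies to User-agent: *
        boolean previousWasAgent = false;
        for (String rawLine : text.split("\\r?\\n")) {
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;                 // blank, comment-only or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean star = value.equals("*");
                // consecutive User-agent lines belong to the same record
                inStarRecord = previousWasAgent ? (inStarRecord || star) : star;
                previousWasAgent = true;
            } else {
                // an empty Disallow value means nothing is disallowed
                if (field.equals("disallow") && inStarRecord && !value.isEmpty()) {
                    disallowedPrefixes.add(value);
                }
                previousWasAgent = false;
            }
        }
    }

    // True if the robot may fetch the given path on this site.
    boolean isAllowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```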

Installation

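  1. Install the latest version of the PICS Standard Library first; see http://www.w3.org/PICS/refcode/Parser/Overview.html for instructions.
  2. Download the PICS label grabber (Version 1.0, 1/20/98; Version 1.1, 1/30/98).
  3. Extract all files from the ZIP archive into a subdirectory named w3c\pics\robot. The path must match the Java package name w3c.pics.robot, so create it under a directory on your CLASSPATH.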
  4. Execute java w3c.pics.robot.Controller [address] [port] [resource]

where [address] is the hostname of the label bureau, [port] is its port number, and [resource] is the name of the label bureau resource.
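
Given those three arguments, the robot sends labels to the bureau with an HTTP PUT. The sketch below shows one way to do that with java.net.HttpURLConnection. It is hypothetical throughout: BureauClient, bureauUrl and putLabels are illustrative names, the URL layout http://[address]:[port]/[resource] is an assumption, and so is the request body (raw label text sent with Content-Type application/pics-labels, the MIME type PICS defines for labels). The PICS PUT specification governs the actual request format.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client for a label bureau; request details are assumptions.
class BureauClient {
    // Builds http://address:port/resource from the command-line arguments.
    static URL bureauUrl(String address, int port, String resource) {
        String path = resource.startsWith("/") ? resource : "/" + resource;
        try {
            return new URL("http", address, port, path);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Cannot build bureau URL", e);
        }
    }

    // Sends labelText to the bureau in an HTTP PUT; returns the HTTP status code.
    static int putLabels(URL bureau, String labelText) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) bureau.openConnection();
        try {
            conn.setRequestMethod("PUT");
            conn.setDoOutput(true);
            // Assumed content type: the MIME type PICS uses for labels.
            conn.setRequestProperty("Content-Type", "application/pics-labels");
            byte[] body = labelText.getBytes(StandardCharsets.UTF_8);
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, bureauUrl("bureau.example.org", 8001, "labels") yields http://bureau.example.org:8001/labels, and passing that URL to putLabels would return the bureau's HTTP status code.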

Direct all questions regarding the PICS Robot to Kyle Jamieson jamieson@mit.edu.

References

The Jigsaw Label Bureau

PICS PUT Specification

PICS Standard Library