
Push-based Web Filtering Using PICS Profiles
Download source code .zip file / .tar.gz file
Download API .zip file / .tar.gz file
Browse the API
This page is the documentation for the source code that was developed as a
part of this thesis. The Java source code available from this page is a
complete working prototype of a push-based PICS filtering system using the
RSACi rating service. Only information specific to the source code is
found on this page. For more general information about this project, see
the actual thesis document.
Installation
NOTE: This software requires a Java 1.1 Virtual Machine, such
as Sun's JDK 1.1.x.
To install this system, first download the source code (.zip file / .tar.gz file).
After uncompressing the file, you should have a directory named w3c and a
file named rsaci.rat. From the installation directory, you must create four
new directories and name them profiles, labels, csrc, and output.
After creating the four required directories, you should have the following
directory structure:
INSTALLDIR--w3c--pics--db--parser
    |
    |-------profiles
    |
    |-------labels
    |
    |-------csrc
    |
    |-------output
    |
    |-------rsaci.rat
Add the INSTALLDIR to your CLASSPATH.
You should now be ready to go!
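The directory setup described above can also be scripted. The following is a minimal sketch, not part of the distributed source: the class name InstallLayout and its method are hypothetical, and it is written for a current JDK rather than the JDK 1.1 that the prototype itself targets.

```java
import java.io.File;
import java.io.IOException;

/** Hypothetical helper; not part of the distributed source. */
class InstallLayout {
    // The four directories the installation notes ask you to create.
    static final String[] REQUIRED_DIRS = {"profiles", "labels", "csrc", "output"};

    /** Creates any missing required directory under installDir; returns how many were created. */
    static int createMissing(File installDir) throws IOException {
        int created = 0;
        for (String name : REQUIRED_DIRS) {
            File dir = new File(installDir, name);
            if (dir.isDirectory()) {
                continue; // already present, leave it alone
            }
            if (!dir.mkdir()) {
                throw new IOException("Could not create " + dir);
            }
            created++;
        }
        return created;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the installation directory is passed as the first argument,
        // defaulting to the current directory.
        File installDir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("Created " + createMissing(installDir) + " directories in " + installDir);
    }
}
```

Running it a second time creates nothing, since existing directories are skipped.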
Starting the system
The system is run through a command-line interface provided by the CDatabase
class.
To start the system, type:
java w3c.pics.db.CDatabase -service rsaci.rat
The exact command will differ slightly if you are using a Java VM other than
the JDK.
The following command line arguments are available:
- -service ratfile: The name of a .rat file to use. For
the purpose of the filtering system, this must be the file rsaci.rat. When
using the system only for the purpose of generating random labels or
profiles, any rating service(s) may be used, so long as the appropriate
ratfiles are present in the installation directory. If more than one service
description is used in this manner, each ratfile name must have its own
-service tag in the command line.
- -vm number: The number of profiles and labels to use as the
system's VM page size. If this flag is not present, the default size is 200.
Varying the size can be useful if the system appears to run out of memory.
- -without: Enables the output generators during profile conversion. By
default, output generators are disabled. When this flag is present, any
C files generated will contain extra code that results in HTML files being
constructed containing the results of the processing.
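To illustrate how the flags combine, in particular the rule that each ratfile needs its own -service tag, here is a small sketch that assembles a command line. The class CommandLineSketch, its method, and the file name other.rat are hypothetical; only the flags themselves come from the list above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper that assembles a CDatabase command line. */
class CommandLineSketch {
    /**
     * @param ratFiles    one entry per rating service file
     * @param vmPageSize  value for -vm, or null to keep the default of 200
     * @param htmlOutput  true to pass the -without flag described above
     */
    static List<String> buildArgs(List<String> ratFiles, Integer vmPageSize, boolean htmlOutput) {
        List<String> args = new ArrayList<>();
        args.add("java");
        args.add("w3c.pics.db.CDatabase");
        for (String rat : ratFiles) {
            args.add("-service"); // every ratfile gets its own -service tag
            args.add(rat);
        }
        if (vmPageSize != null) {
            args.add("-vm");
            args.add(String.valueOf(vmPageSize));
        }
        if (htmlOutput) {
            args.add("-without");
        }
        return args;
    }

    public static void main(String[] args) {
        // "other.rat" is a made-up second service file, used only for illustration.
        List<String> cmd = buildArgs(Arrays.asList("rsaci.rat", "other.rat"), 100, false);
        System.out.println(String.join(" ", cmd));
    }
}
```

With the values shown, this prints: java w3c.pics.db.CDatabase -service rsaci.rat -service other.rat -vm 100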
After invoking the CDatabase, the following commands are available:
- convert labels: Converts labels into binary form.
- convert profiles: Converts profiles into C source code.
- create labels: Creates some randomly generated labels.
- create profiles: Creates some randomly generated profiles.
- help: Prints this help information.
- quit: Quits the program.
Unless otherwise noted, all labels that are read or created by this program
must reside in the labels subdirectory, and all profiles that are read or
created by this program must reside in the profiles subdirectory. All
converted profiles and labels will be placed in the csrc subdirectory.
Convert labels
This command will prompt for two inputs: a labelcount and a basename. The
labelcount is simply the number of labels to be converted. The basename is
the base file name that the labels are stored in. For example, if the
labels are stored in the files mylabel1.txt, mylabel2.txt,
mylabel3.txt, etc., the basename is mylabel. Note that this means that the
label files must end in the extension .txt.
All converted labels will be saved as individual files in the csrc
subdirectory. The names of these files will be: labeldata0, labeldata1,
labeldata2, ... Note that these files should NEVER be edited or renamed.
They should only be read or created by other commands within this system.
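The basename convention can be made concrete with a short sketch that lists the input files a given basename and count refer to. It assumes that numbering starts at 1 with no zero padding, as the examples on this page suggest; the class BasenameFiles is hypothetical and does not appear in the source code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the basename + number + extension convention. */
class BasenameFiles {
    /** Builds basename1.ext through basenameN.ext, numbering from 1 as in the examples. */
    static List<String> names(String basename, int count, String extension) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            result.add(basename + i + "." + extension);
        }
        return result;
    }

    public static void main(String[] args) {
        // Label files use the .txt extension.
        System.out.println(names("mylabel", 3, "txt"));
    }
}
```

This prints [mylabel1.txt, mylabel2.txt, mylabel3.txt]. How each input file maps to a particular labeldata file is not stated on this page, so the sketch does not attempt to model it.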
Convert profiles
This command will prompt for three inputs: a usercount, a basename, and a
labelcount. The usercount is simply the number of profiles to be
converted. The basename is the base file name that the profiles are stored
in. For example, if the profiles are stored in the files myprofile1.rlz,
myprofile2.rlz, myprofile3.rlz, etc., the basename is myprofile. Note that
this means that the profile files must end in the extension .rlz. The
labelcount is the number of labels that are going to be processed against
these profiles. This number can be changed later, because it is placed into
the C code as a constant that is easy to edit.
All converted profiles will be saved in C source files in the csrc
subdirectory. These files should NEVER be edited or renamed.
Along with the C source file, a Makefile will be created. Compiling with this
Makefile will build the entire Profile Store, ready to be run against
converted labels. The Makefile which is created is for Windows machines. It
is compatible with the nmake utility provided with MSVC. Users of Unix
machines will have to modify this Makefile to make it work properly with
their compilers.
Create labels
This command will prompt for two inputs: a labelcount and a basename. The
labelcount is simply the number of labels to be created. The basename is
the base file name that the labels will be stored in. For example, if
three labels are constructed, with the basename of testlabel, the files
created will be: testlabel1.txt, testlabel2.txt, testlabel3.txt.
Create profiles
This command will prompt for two inputs: a usercount and a basename. The
usercount is simply the number of profiles to be created. The basename is
the base file name that the profiles will be stored in. For example, if
three profiles are constructed, with the basename of testprof, the files
created will be: testprof1.rlz, testprof2.rlz, testprof3.rlz.
Using Labels and Profiles from the Web
To use Labels and Profiles that were either created by hand or acquired from
third parties, you have to do the following:
- Place the labels in the labels directory and the profiles in the profiles
directory.
- Make sure that the names of the files follow the naming standard as
given above. Labels should be of the form: [name][number].txt. Profiles
should be of the form: [name][number].rlz. In order to prevent overwriting
existing profiles, profiles should be numbered consecutively without
reusing numbers. Label numbers can be reused, unless you want to re-evaluate
profiles against those labels again at some later time.
- Use the Convert labels/profiles command.
- If profiles were added, they must be compiled using the Makefile.
- Run main.exe.
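Before converting files gathered from elsewhere, it can help to check their names against the naming standard and to find the next unused profile number. The sketch below is one possible way to do that. The class NamingCheck and the regular expression are assumptions made for illustration, and the pattern additionally assumes that the name part does not end in a digit.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical checker for the [name][number].txt / [name][number].rlz convention. */
class NamingCheck {
    // Group 1: name (must end in a non-digit), group 2: number, group 3: extension.
    private static final Pattern NAME = Pattern.compile("(.*\\D)(\\d+)\\.(txt|rlz)");

    /** Returns the number embedded in fileName, or -1 if the name does not fit the pattern. */
    static int numberOf(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    /** Returns one more than the highest number already used for basename + N + "." + extension. */
    static int nextFreeNumber(List<String> existing, String basename, String extension) {
        int highest = 0;
        for (String f : existing) {
            Matcher m = NAME.matcher(f);
            if (m.matches() && m.group(1).equals(basename) && m.group(3).equals(extension)) {
                highest = Math.max(highest, Integer.parseInt(m.group(2)));
            }
        }
        return highest + 1;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("user1.rlz", "user2.rlz", "label7.txt", "notes.md");
        System.out.println(numberOf("label7.txt"));                 // 7
        System.out.println(numberOf("notes.md"));                   // -1
        System.out.println(nextFreeNumber(files, "user", "rlz"));   // 3
    }
}
```

Using the next free number for each newly added profile keeps existing profiles from being overwritten, which is the concern raised in the naming step above.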
Example Session
The following is an example of the series of steps that would need to be
done in order to start running this system for the first time, using labels
and profiles gathered from outside sources.
- Rename all the profiles and labels. The labels are given the names
label1.txt, label2.txt, etc. The profiles are given the names user1.rlz,
user2.rlz, etc. All profiles are placed in the profiles subdirectory. All
labels are placed in the labels subdirectory.
- The system is started with the command:
java w3c.pics.db.CDatabase -service rsaci.rat -without
This command starts the converter in the mode which will enable HTML output
for the results of processing.
- Use the convert labels command. This command will
convert all of the labels into machine readable format. When prompted, enter
the total number of labels and the base file name used for the labels (in
this example, "label").
- Use the convert profiles command. This command will
convert all of the profiles into C source code. When prompted, enter
the total number of profiles, the base file name used for the profiles (in
this example, "user"), and the total number of labels.
- Compile the C code using the Makefile that was generated in the csrc
subdirectory in the previous step.
- Run main.exe.
- The results of running the system will appear as HTML files in the output
subdirectory. Each profile will have its own HTML file. Each file will be a
list of links to the pages whose labels matched that profile's settings.
Class notes
Below is a brief description of the various Java classes included in this
software. For more details, consult the API.
- w3c.pics.db.CDatabase -- The core of the system. The main procedure of
this class is the basic interface for the entire program.
- w3c.pics.db.DataStub -- A random number generator used by the other classes.
- w3c.pics.db.ExprTable -- A table to store values of PICS expressions as
they are calculated. Specific to the RSACi rating system, but could easily
be modified to use others.
- w3c.pics.db.LabelMaker -- A random PICS label generator.
- w3c.pics.db.RandomURLGenerator -- A random URL generator, used by both
LabelMaker and RulesGenerator. Can be modified to select from specific
host names.
- w3c.pics.db.RsaciTable -- A table to store the conversions of PICS
expressions from PICSRules to C source code. Specific to the RSACi rating
system, but could easily be modified to use others.
- w3c.pics.db.RulesGenerator -- A random PICS Profiles (PICSRules) generator.
- w3c.pics.db.parser.* -- The classes in this package are specially
modified versions of the classes found in the PICS Standard Library. For
more information, see the documentation for that library.
dshapiro@w3.org
16 April 98