Log Validator - Manual - Creating new modules

The Log Validator's modular design

The Log Validator is a very flexible tool and most of its behaviour can be changed by leveraging its modular design: the only fixed thing in its behaviour is, in fact, that its input is a Web server log (or a simple list of URIs) and a configuration file. The rest is controlled by its process and output modules, and since such modules are easy to create and use, anyone with some knowledge of the Perl language is able to extend or re-use the Log Validator for a great variety of purposes.

To create new modules for the Log Validator, the first important thing is to understand its modular design.

How to create a module

  1. Download the stable code archive or check out the CVS code.
  2. If you downloaded the archive, uncompress it.
  3. Go to the samples directory inside the uncompressed archive or in your CVS checkout.
  4. Open the source code for the sample modules and start editing, while following this documentation.

Creating a process module

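The process module receives a configuration hash. From this hash it can extract, among other things, the verbosity level (verbose), the name of the temporary DB file that holds the URIs and their hit counts (tmpfile), and module-specific parameters such as MaxInvalid.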

Once all this information has been extracted, the module processes it and creates a results hash that will be passed to the core module.

We are editing the NewModule.pm module in the samples directory.


# Copyright (c) YYYY the World Wide Web Consortium :
#       Keio University,
#       European Research Consortium for Informatics and Mathematics
#       Massachusetts Institute of Technology.
# written by Firstname Lastname <your@address.mail> for W3C

Replace YYYY with the current year and add your name and your email address.

The following code is the module preamble and the beginning of the constructor. You are not supposed to modify it, apart from replacing "MyProcessModule" with the actual name of your module.

#
# $Id: Manual-Modules.html,v 1.8 2006/06/29 00:46:05 ot Exp $

package W3C::LogValidator::MyProcessModule;
use strict;
use warnings;


require Exporter;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = ( 'all' => [ qw() ] );
our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
our @EXPORT = qw();
our $VERSION = '0.1';

###########################
# usual package interface #
###########################
our $verbose = 1;
our %config;

sub new
{
        my $self  = {};
        my $proto = shift;
        my $class = ref($proto) || $proto;
	# mandatory vars for the API
	$self->{URIS}   = undef;

Here you can set up your internal parameters for this module. In the following example we set up a list of known extensions for the type of document we want to process, allowing us to filter the list of URIs before processing.

	# internal stuff here
	$self->{AUTH_EXT} = ".html .xhtml .phtml .htm /";

This is where you start processing the configuration hash passed to you by the core module. Standard parameters include verbose, which is read here:

	# don't change this
        if (@_) {%config =  %{(shift)};}
	if (exists $config{verbose}) {$verbose = $config{verbose}}

Internal parameters for this module may have been set in the configuration file, but you may want to assign default fallback values, as follows (example taken from the W3C::LogValidator::HTMLValidator code):

        $config{ValidatorHost} = "validator.w3.org" if (! exists $config{ValidatorHost});
        $config{ValidatorPort} = "80" if (!exists $config{ValidatorPort});
        $config{ValidatorString} = "/check\?uri=" if (!exists $config{ValidatorString});
        $config{ValidatorPostString} = "\;output=xml" if (!exists $config{ValidatorPostString});

The following lines end the constructor:

	bless($self, $class);
        return $self;
}
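
The fallback-value pattern used in the constructor can be tried in isolation. The following is a minimal sketch and not part of the Log Validator itself: the apply_defaults helper and the sample configuration values are hypothetical, and only the ValidatorHost and ValidatorPort keys come from the example above.

```perl
use strict;
use warnings;

# Hypothetical helper: copy a configuration hash and fill in a fallback
# value for every key that the configuration did not set.
sub apply_defaults {
    my ($config_ref, $defaults_ref) = @_;
    my %merged = %{$config_ref};
    for my $key (keys %{$defaults_ref}) {
        $merged{$key} = $defaults_ref->{$key} unless exists $merged{$key};
    }
    return %merged;
}

# Made-up configuration, as if it had been read from the configuration file.
my %config = (verbose => 0, ValidatorHost => "localhost");

my %effective = apply_defaults(
    \%config,
    { ValidatorHost => "validator.w3.org", ValidatorPort => "80" },
);

print "$_ = $effective{$_}\n" for sort keys %effective;
```

Values already present in the configuration are kept, and only the missing keys receive their fallback, which is the same behaviour as the "if (!exists ...)" tests in the constructor.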

#########################################
# Actual subroutine to check the list of uris #
#########################################

We now move on to the main subroutine of the process module. Process modules must include this process_list subroutine.

sub process_list
{
	my $self = shift;
	my $max_invalid = undef;
	if (exists $config{MaxInvalid}) {$max_invalid = $config{MaxInvalid}}

Here we have an example of getting a parameter ($config{MaxInvalid}) from the configuration hash passed to our module by the core W3C::LogValidator module.

Below is the code that handles the temporary DB file. The technique is to tie it back (read-only, only the core module is supposed to modify this) to a hash, as follows:

	print "Now Using the CHANGEME module :\n" if $verbose;
	use DB_File;
	my $tmp_file = $config{tmpfile};
	my %hits;
	# tie the temporary DB file to %hits, read-only
	tie (%hits, 'DB_File', $tmp_file, O_RDONLY) ||
	die ("Cannot open $tmp_file: $!");

You will probably want to sort the list of URIs before starting to process them. If you wish to have a sorted list by hits, you can use the following code:

	my @uris = sort { $hits{$b} <=> $hits{$a} } keys %hits;

You are now free to do whatever you want with the sorted list. Use your imagination!

You are not even limited to processing the list directly: you could send it to an external program or service on the Web, for example a spelling checker or a WAI validator.

	# do what pleases you!
	print "Done!\n" if $verbose;

   untie %hits;                                                                  

When you are done, you may untie the DB file and the URI/hits hash.

This subroutine's output is a hash, whose structure is explained below:


	my %returnhash;
	# the name of the module
	$returnhash{"name"}="CHANGEME";
	# intro string
	$returnhash{"intro"}="An intro string for the module's results";
	# headers for the results table
	@{$returnhash{"thead"}}=("Header1", "Header2", "...");
	# data for the results table: one array reference per row
	@{$returnhash{"trows"}}=
	(
	 ["data1", "data2", "..."],
	 ["etc", "etc", "etc"],
	 ["etc", "etc", "etc"],
	 ["etc", "etc", "etc"],
	);
	# outro string
	$returnhash{"outro"}="An outro string for the module's results. Usually the conclusion";
	return %returnhash;
}
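
To see how the sorted list and the results hash fit together, here is a minimal sketch that can be run on its own. It makes several assumptions: %hits is a plain in-memory hash standing in for the tied DB file, the URIs and hit counts are made up, and the "Hits" and "Address" column headers are only illustrative.

```perl
use strict;
use warnings;

# Stand-in for the tied DB file: URI => number of hits (made-up data).
my %hits = (
    "http://example.org/"           => 42,
    "http://example.org/about.html" => 7,
    "http://example.org/news.html"  => 19,
);

# Most-requested URIs first.
my @uris = sort { $hits{$b} <=> $hits{$a} } keys %hits;

# Assemble a results hash: thead holds the column headers and
# trows holds one array reference per table row.
my %returnhash = (
    name  => "CHANGEME",
    intro => "URIs ordered by number of hits",
    thead => [ "Hits", "Address" ],
    trows => [ map { [ $hits{$_}, $_ ] } @uris ],
    outro => scalar(@uris) . " URIs processed.",
);

# Show the rows as tab-separated text.
print join("\t", @{$_}), "\n" for @{ $returnhash{trows} };
```

A real module would replace the map expression with its own per-URI processing and would typically stop once a configured limit such as MaxInvalid is reached.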

You may of course create subroutines that will be called by the main subroutine process_list.

# internal routines
#sub foobar
#{
#   my $self = shift;
#   ...
#}
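
As one possible internal routine, the sketch below filters a list of URIs against the space-separated extension list stored in AUTH_EXT by the constructor. It is an illustration under stated assumptions rather than code from the Log Validator: uri_is_handled is a hypothetical name, it is written as a plain function instead of a method taking $self, the URIs are made up, and query strings are not handled.

```perl
use strict;
use warnings;

# Same space-separated list as $self->{AUTH_EXT} in the constructor.
my $auth_ext = ".html .xhtml .phtml .htm /";

# Hypothetical helper: true if the URI ends with one of the listed suffixes.
sub uri_is_handled {
    my ($uri, $ext_list) = @_;
    for my $ext (split ' ', $ext_list) {
        # \Q...\E makes the dot and slash match literally.
        return 1 if $uri =~ /\Q$ext\E$/;
    }
    return 0;
}

# Made-up sample URIs.
my @uris = (
    "http://example.org/",
    "http://example.org/index.html",
    "http://example.org/logo.png",
    "http://example.org/notes.htm",
);

my @selected = grep { uri_is_handled($_, $auth_ext) } @uris;
print "$_\n" for @selected;
```

Inside a real module the same test could be applied to the keys of %hits before sorting, so that only documents of the expected type are processed.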

End of the code proper.

package W3C::LogValidator::CHANGEME;

1;

Do not forget to replace the relevant bits in the embedded documentation for your module.

__END__

=head1 NAME

W3C::LogValidator::CHANGEME

=head1 SYNOPSIS


=head1 DESCRIPTION

This module is part of the W3C::LogValidator suite, and ....

=head1 AUTHOR

you <your@address>

=head1 SEE ALSO

W3C::LogValidator, perl(1).
Up-to-date complete info at http://www.w3.org/QA/Tools/LogValidator/

=cut

@@ add some explanation on how to include this newly coded module to an existing Log Validator installation

Creating an output module

We are editing the NewOutputModule.pm module in the samples directory.

# Copyright (c) YYYY the World Wide Web Consortium :
#       Keio University,
#       European Research Consortium for Informatics and Mathematics
#       Massachusetts Institute of Technology.
# written by Firstname Lastname <your@address.mail> for W3C

Replace YYYY with the current year, and add your name and your email address.

package W3C::LogValidator::Output::MyOutputModule;
use strict;

Above, replace "MyOutputModule" with the actual name of your output module.

Below is the standard constructor code for the module. Do not modify it unless you really know what you are doing.

###########################
# usual package interface #
#     don't modify        #
###########################

require Exporter;
our @ISA = qw(Exporter);
our %EXPORT_TAGS = ( 'all' => [ qw() ] );
our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );
our @EXPORT = qw();
our $VERSION = '0.1';


our %config;
our $verbose = 1;

sub new
{
        my $self  = {};
        my $proto = shift;
        my $class = ref($proto) || $proto;
	# configuration for this module
	if (@_) {%config =  %{(shift)};}
	if (exists $config{verbose}) {$verbose = $config{verbose}}
        bless($self, $class);
        return $self;
}

Now we are working on the part of the code you will have to modify. This code is organized into two subroutines, output and finish.

Output modules must include both of these subroutines.

#############################
# first subroutine is output #
#   create output string    #
#############################

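First subroutine: output

You create the result string by using the different entries in the results hash, including name (the name of the module), intro (text), thead (the headers of the results table), trows (the rows of the results table) and outro.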

You are free to do whatever you want in this subroutine provided you return the string. Below is an example that concatenates all the information into one string.

sub output
{
	my $self = shift;
	my %results;
	my $outputstr ="";

# you create the result string by using the different entries 
# in the results hash, including name (of the module), intro (text)
# thead (the headers of the result table), trows (rows of the result table)
# and outro

#sample code for a full-text tabbed result table below
	if (@_) {%results = %{(shift)}}
	$outputstr= "
************************************************************************
Results for module ".$results{'name'}."
************************************************************************\n";
	$outputstr= $outputstr.$results{"intro"}."\n\n" if ($results{"intro"});
	my @thead = @{$results{"thead"}};
	while (@thead)
	{
	   my $header = shift (@thead);    
	   $outputstr= $outputstr."$header   ";
	}
	$outputstr= $outputstr."\n";
	my @trows = @{$results{"trows"}};
	while (@trows)
	{
	   my @row=@{shift (@trows)};
	   my $tcell;
	   while (@row)
	   {
	       $tcell= shift (@row);   
	       chomp $tcell;
	       $outputstr= $outputstr."$tcell   ";
	   }
	   $outputstr= $outputstr."\n";
	}
	$outputstr= $outputstr."\n";
	$outputstr= $outputstr.$results{"outro"}."
************************************************************************\n\n" if ($results{"outro"});

This is the end of the example. Now we can return the string.

# the subroutine returns the output string
	return $outputstr; 
}

The finish subroutine does whatever action is needed with the output string, such as printing it or sending it as e-mail. Note that the main module already has an option for saving to a file, so in most cases you will just have to print.

################################################################
# finish does whatever action is needed with the output string #
#   like "print" or send as e-mail or whatever you like        #
# note that for saving to file, the main module has an option  #
#               for that already, just "print"                 #
################################################################

sub finish
{
# well for this output it's not too difficult :)
	my $self = shift;
	if (@_) 
	{ 
	   my $result_string = shift;
	   print $result_string;
	}
}
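
The two subroutines can be exercised together outside the Log Validator. The following is a minimal sketch under stated assumptions: My::TabbedOutput is a hypothetical, stripped-down module, the results data is made up, and the three calls at the end only imitate what the core module is assumed to do (pass a reference to the results hash to output, then hand the returned string to finish).

```perl
use strict;
use warnings;

# Hypothetical, stripped-down output module, for illustration only.
package My::TabbedOutput;

sub new {
    my $class = shift;
    return bless {}, $class;
}

# Build the output string from a reference to the results hash.
sub output {
    my ($self, $results) = @_;
    my $str = "Results for module $results->{name}\n";
    $str .= "$results->{intro}\n\n" if $results->{intro};
    $str .= join("\t", @{ $results->{thead} }) . "\n";
    $str .= join("\t", @{$_}) . "\n" for @{ $results->{trows} };
    $str .= "\n$results->{outro}\n" if $results->{outro};
    return $str;
}

# Act on the finished string; here, just print it.
sub finish {
    my ($self, $result_string) = @_;
    print $result_string if defined $result_string;
}

package main;

# Made-up results, in the shape produced by a process module.
my %results = (
    name  => "CHANGEME",
    intro => "Made-up sample data",
    thead => [ "Hits", "Address" ],
    trows => [
        [ 42, "http://example.org/" ],
        [ 7,  "http://example.org/about.html" ],
    ],
    outro => "2 URIs listed.",
);

my $module = My::TabbedOutput->new;
my $text   = $module->output(\%results);
$module->finish($text);
```

Keeping output free of side effects and leaving the printing to finish makes it easy to reuse the same string for other purposes, such as sending it by e-mail.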


Do not forget to replace the relevant bits in the embedded documentation for your module.


package W3C::LogValidator::Output::MyOutputModule;

1;

__END__

=head1 NAME

W3C::LogValidator::Output::MyOutputModule - Short Description


=head1 DESCRIPTION

This module is part of the W3C::LogValidator suite, and ...

=head1 AUTHOR

Firstname Lastname <your@mail.address>

=head1 SEE ALSO

W3C::LogValidator, perl(1).
Up-to-date complete info at http://www.w3.org/QA/Tools/LogValidator/

=cut


Created Date: 2002-07-24 by Olivier Thereaux
Last modified $Date: 2006/06/29 00:46:05 $ by $Author: ot $