From tilh%sin-co.sin-ro.DHL.COM@sinco.sin-co.sin-ro.DHL.COM  Mon Jul 17 03:24:23 1995
Date: Mon, 17 Jul 1995 09:22:32 +0800
From: Ti Lian Hwang 
Subject: Enhanced html_parser

Here's an enhanced html_parser to add to your collection of HTML Converters.
It rationalises conversions to various other printer formats.

I am attempting to standardise reports on HTML format, from which
they can be converted to other formats, e.g. for printers. Unfortunately,
most of the converters produce only PostScript.

Faced with non-PostScript printers, like the HP LaserJet, I was searching
for a tool to do the job.

I'm using the html_parser programmes by Jim Davis of dri.cornell.edu
and decided that enhancing them to work with HP-PCL would be worthwhile.

Included below are my efforts. They are Perl scripts that run under Perl 5.0.

Minimal changes have been made to the programmes; they are all
in parse-html.pl.

The strategy is this:

parse-html will get the TERM environment variable, and 'require' the
relevant file, 'h2a_$TERM.pl'. This file should contain the following
functions:

html_begin_doc ()
html_end_doc ()
begin_font ($element,$tag)
end_font ($element,$tag)

The routines should do whatever font changes are necessary for the HTML
tag.
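
As a rough illustration of the interface described above, a driver file
might look like the sketch below. It is not the h2a_pcl.pl I've included.
The %FONT table, the choice of returning the escape strings (rather than
printing them), and the idea that $element carries the bare element name
are all assumptions made for the example. The escape sequences are the
usual PCL 5 ones for stroke weight, style and underline.

```perl
use strict;
use warnings;

my $ESC = "\033";

# Assumed mapping: HTML element name => [font-on, font-off] PCL sequences.
my %FONT = (
    B => ["${ESC}(s3B", "${ESC}(s0B"],   # bold stroke weight / medium weight
    I => ["${ESC}(s1S", "${ESC}(s0S"],   # italic style / upright style
    U => ["${ESC}&d0D", "${ESC}&d\@"],   # underline on / underline off
);

# Reset the printer at the start and at the end of the document.
sub html_begin_doc { return "${ESC}E" }
sub html_end_doc   { return "${ESC}E" }

# Return the sequence that switches the font on for this element,
# or an empty string when the element needs no font change.
sub begin_font {
    my ($element, $tag) = @_;
    my $codes = $FONT{ uc $element } or return '';
    return $codes->[0];
}

# Return the sequence that undoes the change made by begin_font.
sub end_font {
    my ($element, $tag) = @_;
    my $codes = $FONT{ uc $element } or return '';
    return $codes->[1];
}

1;   # a file loaded with 'require' must end with a true value
```

A caller would print begin_font('B', '<B>'), then the text, then
end_font('B', '</B>'). Elements that are not in the table pass through
with no font change.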

If 'h2a_$TERM.pl' is not available, the file 'dummy.pl' will be used.
I've included an 'h2a_pcl.pl' file for HP LaserJets.
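
The selection and fallback just described could be sketched roughly as
follows. This is not the code in parse-html.pl. The helper name
driver_file and the directory argument are hypothetical, introduced only
for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: choose the driver file for a given terminal type.
# $dir is the directory assumed to hold the h2a_*.pl files and dummy.pl.
sub driver_file {
    my ($term, $dir) = @_;
    if (defined $term && length $term) {
        my $candidate = "$dir/h2a_$term.pl";
        return $candidate if -r $candidate;   # use it only if readable
    }
    return "$dir/dummy.pl";                   # fallback driver
}

my $driver = driver_file($ENV{TERM}, '.');
print "would load: $driver\n";
# require $driver;   # uncomment once the driver files are in place
```

With TERM set to pcl and h2a_pcl.pl in the current directory, the helper
picks ./h2a_pcl.pl; with any other value it falls back to ./dummy.pl.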

I've added the calls to begin_font and end_font in html_begin and
html_end respectively.
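
In outline, the hook-up looks something like the sketch below. The real
html_begin and html_end in parse-html.pl do more than this, and their
argument lists may differ. The @log array, the stand-in begin_font and
end_font, and the order of the calls inside each routine are assumptions
made so the example runs on its own.

```perl
use strict;
use warnings;

my @log;   # records each call so the order can be inspected

# Stand-ins for the routines a driver file would supply.
sub begin_font { my ($element, $tag) = @_; push @log, "begin_font $element" }
sub end_font   { my ($element, $tag) = @_; push @log, "end_font $element" }

# Called when an opening tag is seen: change the font, then carry on
# with the normal handling of the element (a placeholder here).
sub html_begin {
    my ($element, $tag) = @_;
    begin_font($element, $tag);
    push @log, "begin $element";
}

# Called when a closing tag is seen: finish the normal handling,
# then restore the font.
sub html_end {
    my ($element, $tag) = @_;
    push @log, "end $element";
    end_font($element, $tag);
}
```

Calling html_begin('B', '<B>') and then html_end('B', '</B>') leaves the
font change wrapped around the element's own handling.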

I noted 2 errors in the original distribution:

1) the 'U' attribute is not catered for - I've included it.

2) there is an additional html_begin_doc in 'html-ascii.pl'. I've
   removed it, and left the code there; not elegant, but it works!

I would like to have the 'CENTER' attribute working (the problem is I haven't
figured out how to integrate it into the code yet).

Regards,

email : tilh@sin-co.sin-ro.dhl.com