First of all, let me underline that I wrote this code just for fun, to learn something about HTTP during my summer holidays. The code is really flimsy: I wrote it on my PC, then brought it to a Unix system to get a printed copy of some HTML docs I needed for my work. On that system it seems to work, but when I used it I discovered that there are different interpretations of how certain HTML tags should be used, so you will probably soon find something that crashes it. Up to now I consider the code still unreleased, so I'm going to update it without creating SPRs.

If this SW works on your system, we have to thank Mr. Frans van Hoesel, who did invaluable work on it. This SW has been tested only on Unix systems.

Well, have fun and good luck.

Francesco Ruta

*****************************************************************************
INSTALLATION :

The default compiler in the makefile is gcc (ANSI) with the PROTOTYPES definition. The PROTOTYPES identifier enables function prototypes; if it is not defined, prototypes are not used.

make should generate four executables :

    htpsxups
    htpsxlst
    htpsxcps
    htpsxldu

To execute them you will need the configuration file htps.cnf included in this distribution (see below).

*****************************************************************************
htpsxups (html2ps)

This procedure dumps a URL tree to a PS file. It accepts command line arguments; if no argument is provided it will prompt for the required parameters. The procedure loads the input URL and starts parsing it; every image or URL referenced is extracted and loaded, and the process is repeated for each URL. When all the branches have been parsed, the tree is dumped to a PS file.
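As a sketch, a typical one-step run might look like this (the URL and file names here are only examples; the flags are the ones documented below):

    htpsxups http://info.cern.ch/file.html out.ps -d 2 -n 10 -I -o P

This would load http://info.cern.ch/file.html and everything it references down to depth 2 (at most 10 files), dump inline images and write a Portrait PS file to out.ps.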
If the root URL is already stored in a file, the filename can be specified with the -f parameter. The URL is still required because htpsxups will try to load the URLs referenced in this file (if the depth is greater than 1 and the max number of files is greater than 1), and the full URL may be needed to resolve partial URLs.

The behaviour of the process may be customized by defining the maximum number of URLs that may be retrieved, the maximum depth, the allowed hosts, the allowed file extensions and the allowed protocols. The PS output may be customized as well, defining whether inline images should be loaded/dumped, whether a list of anchors has to be printed, and so on. These values are specified in the configuration file (see below) but can be overridden by command line parameters.

This process is the sum of htpsxlst and htpsxcps; those two processes do the same job in two steps to allow a higher level of customization.

command line parameters :

    url            Required - root URL
    [ps_file]      Optional - output PS file name; if missing the default is
                   stdout
    [-f filename]  Optional - the file where the root URL is stored; it can
                   be used to dump a local file. The URL is still required
                   to interpret partial URLs included in the file.
    [-c config]    Optional - the name of the configuration file; if
                   missing, the processes search first for htps.cnf in the
                   local dir, then .htps_cnf in the local dir, then
                   .htps_cnf in the root dir. A configuration file is
                   required.
    [-p prefix]    Optional - the file prefix used to store the loaded URLs
                   on local disk; the default is "ht"
    [-d depth]     Optional - maximum depth (int)
    [-n maxfile]   Optional - maximum number of URL files to load (int)
    [-i|I]         Optional - dump inline images (FALSE|TRUE)
    [-l host,host] Optional - host list; the names must be separated by
                   commas, no blanks
    [-g|G]         Optional - convert to gray scale (FALSE=color|TRUE=gray)
    [-o L|P]       Optional - orientation (Landscape|Portrait)
    [-b|B]         Optional - add a box (FALSE|TRUE)
    [-t|T]         Optional - dump URL title (FALSE|TRUE)
    [-r|R]         Optional - dump URL anchors index (FALSE|TRUE)
    [-u|U]         Optional - dump unknown/misspelled tags (FALSE|TRUE)
    [-x|X]         Optional - delete loaded files when completed
                   (FALSE|TRUE)
    [-v|V]         Optional - verbose (FALSE|TRUE)

If there are no options on the command line, htpsxups will prompt for the following parameters :

    url            Required
    config_file    Default
    output_ps_file Default
    file_prefix    Default
    Max_file       Default from config_file
    Max_depth      Default from config_file

These parameters overwrite the same values found in the configuration file (see below); only the host names are added to those found in the configuration file, and only if the last entry is [,+] e.g. '*.ch,*.it,+'

*****************************************************************************
htpsxlst (html2list)

This procedure loads a URL tree and creates an index file (see below). It accepts command line arguments; if no argument is provided it will prompt for the required parameters. The procedure loads the input URL and starts parsing it; every image or URL referenced is extracted and loaded, and the process is repeated for each URL. This process loads all the URLs recursively referenced and the images, and creates an index file (see below) that will be used by htpsxcps (see below).
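The equivalent two-step route, as a sketch (again with made-up names; "my_index" stands for whatever index file name you gave or accepted at the prompt):

    htpsxlst http://info.cern.ch/file.html -d 2 -n 10 -I
    (optionally edit my_index by hand - see "index file" below)
    htpsxcps my_index out.ps -G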
If the root URL is already stored in a file, the filename can be specified with the -f parameter. The URL is still required because htpsxlst will try to load the URLs referenced in this file (if the depth is greater than 1 and the max number of files is greater than 1), and the full URL may be needed to resolve partial URLs.

The behaviour of the process may be customized by defining the maximum number of URLs that may be retrieved, the maximum depth, the allowed hosts, the allowed file extensions and the allowed protocols. These values are specified in the configuration file (see below) but can be overridden by command line parameters.

command line parameters :

    url            Required - root URL
    [-f filename]  Optional - the file where the root URL is stored; it can
                   be used to dump a local file. The URL is still required
                   to interpret partial URLs included in the file.
    [-c config]    Optional - the name of the configuration file; if
                   missing, the processes search first for htps.cnf in the
                   local dir, then .htps_cnf in the local dir, then
                   .htps_cnf in the root dir. A configuration file is
                   required.
    [-p prefix]    Optional - the file prefix used to store the loaded URLs
                   on local disk; the default is "ht"
    [-d depth]     Optional - maximum depth (int)
    [-n maxfile]   Optional - maximum number of URL files to load (int)
    [-i|I]         Optional - load inline images (FALSE|TRUE)
    [-l host,host] Optional - host list; the names must be separated by
                   commas, no blanks
    [-v|V]         Optional - verbose (FALSE|TRUE)

If there are no options on the command line, htpsxlst will prompt for the following parameters :

    url            Required
    index_file     Default
    config_file    Default
    file_prefix    Default
    Max_file       Default from config_file
    Max_depth      Default from config_file

These parameters overwrite the same values found in the configuration file (see below); only the host names are added to those found in the configuration file, and only if the last entry is [,+] e.g. '*.ch,*.it,+'

*****************************************************************************
htpsxcps (list2ps)

This procedure loads an index file (see below) and generates the PS file. It accepts command line arguments; if no argument is provided it will prompt for the required parameters. This process dumps all the files referenced in the index file. The behaviour of the process may be customized; the values are specified in the configuration file (see below) but can be overridden by command line parameters.

command line parameters :

    index_file     Required - index file name
    [ps_file]      Optional - output PS file name; if missing the default is
                   stdout
    [-c config]    Optional - the name of the configuration file; if
                   missing, the processes search first for htps.cnf in the
                   local dir, then .htps_cnf in the local dir, then
                   .htps_cnf in the root dir. A configuration file is
                   required.
    [-g|G]         Optional - convert to gray scale (FALSE=color|TRUE=gray)
    [-o L|P]       Optional - orientation (Landscape|Portrait)
    [-b|B]         Optional - add a box (FALSE|TRUE)
    [-t|T]         Optional - dump URL title (FALSE|TRUE)
    [-r|R]         Optional - dump URL anchors index (FALSE|TRUE)
    [-u|U]         Optional - dump unknown/misspelled tags (FALSE|TRUE)
    [-x|X]         Optional - delete loaded files when completed
                   (FALSE|TRUE)
    [-i|I]         Optional - dump inline images (FALSE|TRUE)
    [-v|V]         Optional - verbose (FALSE|TRUE)

If there are no options on the command line, htpsxcps will prompt for the following parameters :

    config_file    Default
    file_prefix    Default
    ps_file        Default

These parameters overwrite the same values found in the configuration file.

*****************************************************************************
configuration file

The configuration file contains the customization parameters. This file is required. If it is not specified, the processes search first for htps.cnf in the current dir, then .htps_cnf in the current dir and, as a last resort, .htps_cnf in the root dir.

The flags allowed in the configuration file are :

e-mail     = your e-mail; this is used as the password for FTP. If you use
             only your username, e.g. e-mail = Ruta, the current node is
             added.
verbose    = TRUE|FALSE
max_depth  = max depth that can be reached when parsing the URL tree;
             depth = 1 is the root URL depth, i.e. one file only is loaded.
max_files  = max number of URLs to retrieve; this value does not include
             the inline images.
protocol   = the allowed protocols; http and ftp are the only protocols
             supported. If you define protocol = http, then a URL like
             ftp://node/file is ignored.
extension  = the allowed file extensions. If you define extension = html,
             then a URL like http://node/file.txt is ignored. Only the root
             URL may have no extension, as in:
             htpsxlst http://hp835.mt.asi.it ...
host       = the allowed list of hosts. If you define host = *.ch, then a
             URL like http://info.cern.ch/file.html is accepted while
             http://hp835.mt.asi.it is ignored.
load_image = TRUE|FALSE; if this flag is set, the inline images are loaded
             and then dumped.

The following parameters define the PS formatting :

anchor_index  = TRUE|FALSE; if set, each anchor referenced in the current
                document is listed at the end of the document.
ignore_unknow = TRUE|FALSE; if set, a misspelled/unknown tag is ignored,
                otherwise it is dumped into the document as text.
page_orientation = Portrait|Landscape
page_width
page_height   = dimensions of the page
left_margin
right_margin
top_margin
bottom_margin = page borders. These values may be expressed as inches (in),
                cm (cm) or mm (mm); the default is points (1/72 inch).
start_page    = each page is numbered starting with this value; if the
                value is -1 the pages are not numbered.
page_box      = TRUE|FALSE; if set, a box is drawn around the page.
page_header   = TRUE|FALSE; if set, each document is preceded by a header
                that includes the URL, title and ident (see index file).
gray_scale    = TRUE|FALSE; if set, the GIF inline images are converted to
                gray scale.
image_res     = the resolution (pixels/inch) of the images; it is used to
                keep the same size the image has on the screen, e.g. a VGA
                is about 96 dpi.
background    = usually the images on the WWW have a gray background to
                match the Mosaic one. To keep the same aspect you can
                specify the range that has to be converted to white, e.g.
                background = [170,170,170][200,200,200] means that if a
                pixel's RGB value falls inside that range it is converted
                to white. This flag has no effect if the background flag of
                the GIF (GIF89) is set.
PS_level      = 1|2; PostScript level. It has effect only if gray_scale is
                FALSE; it defines how the color images are handled.
style         = Font,size; this entry defines the font and the size for
                each HTML style.
The font is the PostScript definition and it is case sensitive. The allowed styles are :

    default_style
    H1_style
    H2_style
    H3_style
    H4_style
    H5_style
    H6_style
    address_style
    pre_style
    blockquote_style
    tt_style
    em_style
    strong_style
    code_style
    samp_style
    kbd_style
    var_style
    dfn_style
    cite_style

e.g. default_style = Times-Roman,10

For each font type referenced in a style, the different faces must be supplied. E.g. if you have used Times-Roman for a style you must add the following lines :

    font      = Times
    base      = Times-Roman
    bold      = Times-Bold
    blditalic = Times-BoldItalic

These are used to handle the bold and italic tags. (Probably not as the HTML standard requires.)

*****************************************************************************
index file

The index file contains the references of all the anchors found in all the parsed HTML files. The structure of this file is as follows :

    image_section_header
    [img_url_entry]0..n
    document_section_header
    [doc_url_entry]0..n
    file_trailer

The two headers define the number of entries for each section.

An image entry has the following format :

    URL [filename] counter

A document entry has the following format :

    URL [paragraph id] [depth] [title] [filename]

If the filename is missing ([]), the URL will be ignored.

You may freely modify this index to customize the final document. It includes all the URLs that were found, but probably not all of them have been loaded (max depth or max number reached), so some of them have no filename. If an error has occurred, the filename is stored in the title slot (the file could contain the server response). The URLs are listed even if not loaded, to leave a clue of what has been lost. (A URL can be loaded later with htpsxldu.) If you want to remove an HTML file, delete its filename, leaving the two brackets []. Each loaded URL is characterized, besides the URL and filename, by its title and paragraph identifier; both can be modified. The title and identifier are dumped to the PS file if the page_header flag is set. The depth field is not used. You can add or move the URLs in the index file to determine the document's appearance.
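As a purely illustrative sketch, an index file might look something like the following (only the entry formats above are taken from this document; the exact header/trailer syntax and counter placement are assumptions, so check a generated index before relying on this):

    image_section_header 1
    http://host/pic.gif [ht0002.gif] 1
    document_section_header 2
    http://host/index.html [1] [1] [Home page] [ht0000.html]
    http://host/other.html [2] [2] [Other page] []
    file_trailer

Note the second document entry has an empty filename ([]), so it would be skipped when the PS file is built.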
If you add or delete records, consider that the SW is not flexible: you must respect the structure, leaving the exact number of lines for each URL, and update the counters at the top of the list.

*****************************************************************************
file prefix

When a file is loaded it is stored with the following filename :

    prefixnnnn.ttt

where prefix is the string you give (ht is the default), nnnn is a sequence number (starting from 0) and ttt is the data type extension: html for HTML, txt for plain text, and gif or xbm for images. The prefix is used to allow the retrieval of different URLs into the same dir.

***********************************************************************
htpsxcps handles only GIF and XBM (X11) images. It does not handle interlaced GIF files nor TIFF; if you find such an image, load it and convert it to GIF.

The process is obviously not able to identify a call to a cgi-script with an extra path: a URL such as /CGI/extra_path/test.html is accepted, but the server may return anything.

Up to now the process does not insert the %%Page comments that are required by some PS previewers.

***********************************************************************
Please try to save some paper by printing at least double-sided.
***********************************************************************
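The filename scheme from the "file prefix" section can be sketched in a few lines of Python; note that the four-digit zero padding of nnnn is an assumption read off the prefixnnnn.ttt pattern, not something verified against the code:

```python
def stored_name(prefix, seq, ext):
    """Build a local filename of the form prefixnnnn.ttt.

    prefix: user-supplied string ("ht" by default)
    seq:    sequence number, starting from 0
    ext:    data type extension (html, txt, gif, xbm)
    """
    # Zero-padding to 4 digits is assumed from the "nnnn" pattern.
    return "%s%04d.%s" % (prefix, seq, ext)

print(stored_name("ht", 0, "html"))   # ht0000.html
print(stored_name("ht", 12, "gif"))   # ht0012.gif
```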