From ehood@convex.com Tue Aug 2 10:13:51 1994 Article: 1895 of comp.infosystems.www.misc From: ehood@convex.com (Earl Hood) Subject: ANNOUNCE: Table of Contents Generator for HTML Date: Thu, 21 Jul 94 23:21:34 GMT-1:00 Organization: Engineering, Convex Computer Corporation, Richardson, Tx USA Announcing "htmltoc", a Perl program to generate a Table of Contents (ToC) for HTML documents. Are you tired of manually creating a ToC for your HTML documents? Would you like an automatic way to easily index your HTML files? Well "htmltoc" is what you need. htmltoc is a new addition to the perlWWW package. You can get more information here You may also grab htmltoc via ftp from: ftp.uci.edu/pub/dtd2html/perlWWW.21July1994.tar.gz This is a beta release of htmltoc. I've tested it on documents I have, but it has not been completely tested. Included below is the ASCII version of the htmltoc documentation. --ewh __________________________________________________________________________ HTMLTOC A Perl program to generate a Table of Contents (ToC) for HTML documents. _________________________________________________________________ Description htmltoc allows you to specify "significant elements" that will be hyperlinked to in a "Table of Contents" (ToC) for a given set of HTML documents. Basically, the ToC generated is a multi-level level list containing links to the significant elements. htmltoc inserts the links into the ToC to significant elements at a level specified by the user. Example: If H1s are specified as level 1, than they appear in the first level list of the ToC. If H2s are specified as a level 2, than they appear in a second level list in the ToC. See ToC Map File on how to tell htmltoc what are the significant elements and at what level they should be at. In standard operation, the created ToC is sent to standard output, or to the file specified by the -toc command-line option. For more information on controlling the contents of the created ToC, see Formating the ToC. htmltoc also supports the ability to incorporate the ToC into the HTML document itself via the -inline command-line option. This only works if a single HTML file is being processed. See Inlining the ToC for more information. In order for htmltoc to support linking to significant elements, htmltoc inserts anchors into the significant elements. Since this requires modification of the original HTML document(s), the originals are backed up with a ".org" suffix appended to the filenames. The following sections give more information on htmltoc: * Usage * ToC Map File * Formating the ToC * Inlining the ToC * Notes * Bugs _________________________________________________________________ Usage htmltoc is invoked from a Unix shell, with the following syntax: % htmltoc [options] file ... > tocfile % htmltoc -toc tocfile [options] file ... The following options are available: -footer filename Append filename, containing HTML markup, to the end of the ToC. This option is invalid when the -inline option is specified. -header filename Prepend filename, containing HTML markup, to the beginning of the ToC. The desired format of filename is slightly different if the ToC is being inlined, or is an external document. See Formating the ToC and Inlining the ToC for more information. -help Short message giving all options available to htmltoc. -inline Insert ToC into the document being processed. This options is only valid if one HTML file is being processed. See Inlining the ToC for more information on this option. -noorg htmltoc normally backs up the original HTML files with a ".org" extension. When this option is specified, htmltoc will remove the backup files when done. -ol Put level 1 ToC entries in an order list (OL). -prefix string Use string as the prefix for ID values used in NAME/HREF attributes in anchors (A) for linking ToC entries to the document(s). The default prefix is "xtocid". -textonly htmltoc by default, preserves HTML markup that exists in a significant element to appear in the ToC. This options tells htmltoc to only use the textual content of the ToC element. If this option is not specified, htmltoc will still ignore the following tags: A, HR, P. -title string Set the title, (i.e. TITLE element) of the generated ToC document to string. This option has no affect if the -header or -inline options are specified. The default title is "Table of Contents". -toc file By default, the ToC is sent to standard output (unless the -inline option is specified). This option explicitly tells htmltoc to output the ToC to file. -toclabel string Put string before ToC. This option has no effect if the -header option is specified. The default ToC label is "<H1>Table of Contents</H1>". -tocmap filename Use filename as the ToC map file. By default, htmltoc only indexes H1s and H2s. See ToC Map File for more information. Any arguments that are not part of the command-line options are treated as HTML files to be processed. _________________________________________________________________ ToC Map File The ToC map file allows you to tell htmltoc what significant elements to include in the ToC, what level they should appear in the ToC, and any text to include before and/or after the ToC entry. The format of the map file is as follows: significant_element:level:sig_element_end:before_text,after_text significant_element:level:sig_element_end:before_text,after_text ... Each line of the map file contains a series of fields separated by the `:' character. The definition of each field is as follows: significant_element The tag name of the significant element. Example values are H1, H2, H5. This field is case-insensitive. level What level the significant element occupies in the ToC. This valid must be numeric, and greater than 0. sig_element_end (optional) The tag name that signifies the termination of the significant_element. Example: The DT tag is a marker in HTML and not a container. However, one can index DT sections of a definition list by using the value DD in the sig_element_end field (this does assume that each DT has a DD following it). If the sig_element_end is empty, then the corresponding end tag of the specified significant_element is used. Example: If H1 is the significant_element, than htmltoc looks for a "</H1>" for terminating the significant_element. Caution: the sig_element_end value should not contain the `<` and `>' tag delimiters. If you want the sig_element_end to be the end tag of another element than that of the significant_element, than use "/element_name". The sig_element_end field is case-insensitive. before_text,after_text (optional) This is literal text that will be inserted before and/or after the ToC entry for the given significant_element. The before_text is separated from the after_text by the `,' character (which implies a comma cannot be contained in the before/after text). See examples following for the use of this field. In the map file, the first two fields MUST be specified. Following are a few examples to help illustrate how a ToC map file works. EXAMPLE 1 The following map file reflects the default mapping htmltoc uses if no map file is explicitly specified: # Default mapping for htmltoc # Comments can be inserted in the map file via the '#' character H1:1 # H1 are level 1 ToC entries H2:2 # H2 are level 2 ToC entries EXAMPLE 2 The following map file makes use of the before/after text fields: # A ToC map file that adds some formatting H1:1::<STRONG>,</STRONG> # Make level 1 ToC entries <STRONG> H2:2::<EM>,</EM> # Make level 2 entries <EM> H2:3 # Make level 3 entries as is EXAMPLE 3 The following map file tries to index definition terms: # A ToC map file that can work for Glossary type documents H1:1 H2:2 DT:3:DD:<EM>,</EM> # Assumes document has a DD for each DT, otherwise ToC # will get screwed up. EXAMPLE 4 The following map file demonstrates how one can bastardize the use HTML elements: # A ToC map file that wraps ToC entries in header tags. This is illegal # HTML, but it looks pretty good in Mosaic. H1:1::<H3>,</H3> H2:2::<H4>,</H4> H3:3::<H5>,</H5> _________________________________________________________________ Formatting the ToC The ToC Map File gives you control on how the ToC entries may look, but htmltoc has other options to affect the final appearance of the ToC file created. With the -header option, htmltoc will prepend the contents of the file before the generated ToC. This allows you to have introductory text, or any other text, before the ToC. Note: If you use the -header option, make sure the file specified contains the opening HTML tag, the HEAD element (containing the TITLE element), and the opening BODY tag. However, these tags/elements should not be in the header file if the -inline options is used. See Inlining the ToC for information on what the header file should contain for inlining the ToC. With the -footer option, htmltoc will append the contents of the file after the generated ToC. Note: If you use the -footer, make sure it includes the closing BODY and HTML tags. htmltoc will add the appropriate HTML markup to if either the -header or -footer option is not specified to insure a valid HTML document is created for the ToC. If you do not want/need to deal with header, and footer, files, then htmltoc allows you specify the title, -title option, of the ToC file; and it allows you to specify a heading, or label, to put before ToC entries' list, the -toclabel option. Both options have default values, see Usage for more information on each option. _________________________________________________________________ Inlining the ToC htmltoc supports the ability to incorporating the ToC directly into an HTML document via the -inline option. Inlining can only occur if one, and ONLY one, HTML file is being processed, AND the HTML file contains an opening BODY tag. The ToC generated is inserted right after the opening BODY tag, and before any other HTML markup in the file. If the -header option is specified, then the contents of the specified file are inserted after the BODY tag, but before the ToC. Otherwise, htmltoc inserts the text specified by the -toclabel option. Note: The header file should not containing the beginning HTML tag and HEAD element since the HTML file being processed should already contains these tags/elements. _________________________________________________________________ Notes * For the average user, the only thing to worry about is the -title and -toclabel options. Everything else is provided for customizing the behavior of htmltoc. * htmltoc is smart enough to detect anchors inside significant elements. If the anchor defines the NAME attribute, htmltoc uses the value. Else, it adds its own NAME attribute to the anchor. * htmltoc will not process files related to command-line options if they are also specified to be processed for ToC significant elements. Example: The command, "htmltoc -header header.html -toc toc.html *.html" will cause header.html and toc.html to be included in the HTML files to processed due to shell filename globbing of "*.html". htmltoc is smart of enough to detect this, and exempt header.html and toc.html from being processed for ToC significant elements. * The TITLE element is treated specially if specified in the ToC map file. It is illegal to insert anchors (A) into TITLE elements. Therefore, htmltoc will actually link to the filename itself instead of the TITLE element of the document. _________________________________________________________________ Bugs * Probably _________________________________________________________________ Earl Hood, ehood@convex.com htmltoc 1.0.0 beta -- Earl Hood | CONVEX Computer Corporation ehood@convex.com | 3000 Waterview Parkway Phone: (214) 497-4387 | P.O. Box 833851 FAX: (214) 497-4500 | Richardson, TX 75083-3851