From phelps@ecstasy.cs.berkeley.edu Fri Jul 28 10:11:52 1995
Article: 6946 of comp.infosystems.www.announce
From: phelps@ecstasy.cs.berkeley.edu (Tom Phelps)
Subject: SOFTWARE: RosettaMan, a manual page to HTML converter
Date: Thu, 27 Jul 95 05:48:47 GMT-1:00
Organization: University of California, Berkeley


RosettaMan is distinguished from other manual page to HTML filters in
several ways: it makes the most aggressive analysis of pages
(attempting to identify lists, for instance), it generates several
types of output including HTML, and it recognizes the man pages of
many flavors of UNIX, each of which varies in some way from the others.
RosettaMan has been around for some time and it's quite stable, though
for some reason it has never been announced in this group.

Tom

----------------

RosettaMan is a filter for UNIX manual pages.  It takes as input man
pages formatted for a variety of UNIX flavors (i.e., formatted
[tn]roff source) and produces as output a variety of file formats.
Currently RosettaMan accepts man pages as formatted by the following
flavors of UNIX: Hewlett-Packard HP-UX, AT&T System V, SunOS, Sun
Solaris, OSF/1, DEC Ultrix, SGI IRIX, Linux, SCO, FreeBSD; and
produces output for the following formats: printable ASCII only (with
page headers and footers stripped), section and subsection headers
only, TkMan, [tn]roff, Ensemble, HTML, LaTeX, RTF, Perl 5 pod.

RosettaMan improves upon other man page filters in several ways: (1) its
analysis recognizes the structural pieces of man pages, enabling high
quality output, (2) its modular structure permits easy augmentation of
output formats, (3) it accepts man pages formatted with the variant
macros of many different flavors of UNIX, and (4) it doesn't require
modification of or cooperation with any other program.

RosettaMan is a rewrite of TkMan's man page filter, called bs2tk.  (If
you haven't heard about TkMan, a hypertext man page browser written in
Tcl/Tk, you can grab it via anonymous ftp from the same place as
RosettaMan.)  Whereas bs2tk generated output only for TkMan,
RosettaMan generalizes the process so that the analysis can be
leveraged to new output formats.  A single analysis engine recognizes
section heads, subsection heads, body text, lists(!), references to
other man pages, boldface, italics, bold italics, special characters
(like bullets) and strips out page headers and footers.  The engine
sends signals to the selected output functions so that an improvement
of the engine improves the quality of output of all of them.  Output
format functions are easy to add, and thus far average about 75
lines of C code each.
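
To make the engine/output split concrete, here is a minimal sketch in C
of one analysis pass driving interchangeable output functions.  It is
not RosettaMan's actual code: the signal names, the output_fn type, and
analyze_demo() are all made up for illustration, and the "analysis" is
a hard-coded stand-in for the real heuristics.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical signal set; the real engine's signals are not listed here. */
enum signal { SIG_SECTION, SIG_BOLD, SIG_TEXT };

/* An output format is one function that renders each signal into `out`. */
typedef void (*output_fn)(enum signal sig, const char *text,
                          char *out, size_t outsz);

/* Append `s` to the NUL-terminated buffer `out`, truncating if full. */
static void append(char *out, size_t outsz, const char *s)
{
    size_t used = strlen(out);
    if (used < outsz - 1)
        strncat(out, s, outsz - 1 - used);
}

/* HTML rendering (escaping of <, > and & is omitted in this sketch). */
void html_out(enum signal sig, const char *text, char *out, size_t outsz)
{
    switch (sig) {
    case SIG_SECTION:
        append(out, outsz, "<H2>");
        append(out, outsz, text);
        append(out, outsz, "</H2>\n");
        break;
    case SIG_BOLD:
        append(out, outsz, "<B>");
        append(out, outsz, text);
        append(out, outsz, "</B>");
        break;
    case SIG_TEXT:
        append(out, outsz, text);
        break;
    }
}

/* [tn]roff rendering of the same signals. */
void roff_out(enum signal sig, const char *text, char *out, size_t outsz)
{
    switch (sig) {
    case SIG_SECTION:
        append(out, outsz, ".SH ");
        append(out, outsz, text);
        append(out, outsz, "\n");
        break;
    case SIG_BOLD:
        append(out, outsz, "\\fB");
        append(out, outsz, text);
        append(out, outsz, "\\fR");
        break;
    case SIG_TEXT:
        append(out, outsz, text);
        break;
    }
}

/* Stand-in for the analysis engine: it "recognizes" a section head and a
   bold word, then reports them to whichever output function was selected. */
void analyze_demo(output_fn emit, char *out, size_t outsz)
{
    emit(SIG_SECTION, "NAME", out, outsz);
    emit(SIG_TEXT, "see ", out, outsz);
    emit(SIG_BOLD, "ls", out, outsz);
    emit(SIG_TEXT, "\n", out, outsz);
}
```

Calling analyze_demo(html_out, buf, sizeof buf) or
analyze_demo(roff_out, buf, sizeof buf) on an empty buffer runs the same
analysis through two formats; a new format in this sketch is one more
function with the output_fn signature.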

A note for HTML consumers:  This filter does real (heuristic)
parsing--no <PRE>!  Man page references are turned into hypertext
links.  This file is an example of the quality of output produced
entirely automatically (no retouching) by RosettaMan.  Several people
have extended World Wide Web servers to format man pages on the fly.
Check the README
file in the contrib directory for a list.
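
As a rough illustration of turning man page references into hypertext
links, the C sketch below scans a line for the pattern name(section),
such as ls(1) or printf(3S), and wraps each match in an anchor.  This is
an assumption about how such a heuristic might look, not RosettaMan's
actual code; the function name link_refs() and the "name.section.html"
link target are invented for the example.

```c
#include <ctype.h>
#include <string.h>

/* Characters allowed in a man page name (an assumption for this sketch). */
static int is_name_char(int c)
{
    return isalnum(c) || c == '_' || c == '.' || c == '-' || c == '+';
}

/* Append `len` bytes of `s` to `out`, truncating rather than overflowing. */
static void put(char *out, size_t outsz, size_t *used,
                const char *s, size_t len)
{
    if (*used + len >= outsz)
        len = outsz - 1 - *used;
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
}

/* Copy `in` to `out` (outsz must be at least 1), replacing each
   name(section) reference with an HTML anchor.  A section is one digit
   followed by optional letters.  HTML escaping is omitted. */
void link_refs(const char *in, char *out, size_t outsz)
{
    size_t i = 0, used = 0;

    out[0] = '\0';
    while (in[i] != '\0') {
        size_t start = i;

        /* Collect a run of name characters, if any. */
        while (in[i] != '\0' && is_name_char((unsigned char)in[i]))
            i++;

        if (i == start) {               /* not a name: copy one character */
            put(out, outsz, &used, in + i, 1);
            i++;
            continue;
        }

        if (in[i] == '(' && isdigit((unsigned char)in[i + 1])) {
            size_t j = i + 2;           /* first char after the digit */
            while (isalpha((unsigned char)in[j]))
                j++;
            if (in[j] == ')') {         /* complete name(section) match */
                put(out, outsz, &used, "<A HREF=\"", 9);
                put(out, outsz, &used, in + start, i - start);
                put(out, outsz, &used, ".", 1);
                put(out, outsz, &used, in + i + 1, j - (i + 1));
                put(out, outsz, &used, ".html\">", 7);
                put(out, outsz, &used, in + start, j + 1 - start);
                put(out, outsz, &used, "</A>", 4);
                i = j + 1;
                continue;
            }
        }

        /* A plain word: copy it unchanged. */
        put(out, outsz, &used, in + start, i - start);
    }
}
```

Given the line "See ls(1) and printf(3S)." this sketch links both
references and leaves the rest of the text alone, while something like
foo(bar) is passed through untouched because the parenthesized part
does not start with a digit.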

CHANGES in 2.2

* in SEE ALSO sections, hyphens would confuse the man page reference
  finder, so lines are now re-broken if necessary to eliminate them (!)
  (Greg Earle & Uri Guttman)


-- 
phelps@CS.Berkeley.EDU
