From friendly@hotspur.psych.yorku.ca Mon Mar  6 09:56:48 1995
Article: 21183 of comp.infosystems.www.providers
From: friendly@hotspur.psych.yorku.ca (Michael Friendly)
Subject: Waterloo Script GML -> HTML translator
Date: Wed, 1 Mar 95 21:59:53 MET
Organization: York University, Ontario, Canada

Below is a description of the revised version of a translator for Waterloo  
Script/GML to HTML, now about 80% complete and available via LISTSERV.  It  
does not work for IBM Script and only works with GML-encoded tags, but does a  
reasonable job.  If anyone improves it, please send the result to me at the  
VM1 address at the end of this note.

Michael Friendly                                        York University

GMLHTML: A GML to HTML Translator for Waterloo Script/GML March 1, 1995

-----------------------------------------------------------------------


                                GMLHTML

A number of  schemes have been developed for  translating various docu-
ment formats to HTML.   Most of these  rely on parsing the source docu-
ment using  languages such as perl,   awk,  REXX,  etc.    For Waterloo
Script/GML documents, I have seen and tried several translation schemes
based  on REXX  and/or Xedit,   and have  not been  satisfied with  the
results.   It then occurred  to me that one could use  Script itself to
perform the translation,  by replacing  the GML macros with equivalents
which output HTML codes.

   A side benefit of  this approach is that the resulting  HTML file is
automatically formatted as  well.   A limitation is  that this approach
can only deal with GML-encoded text;   Script control words (e.g., .bd)
cause the text to be formatted,  but are not translated into HTML
equivalents.


How does it work?

The GMLHTML package consists of the  main file,  GMLHTML SCRIPT,  and a
set of subsidiary files, HML$xxxx SCRIPT, which contain the new defini-
tions of GML macros.  A design-goal of this implementation was to allow
the same Script source file to be translated to HTML or to be formatted
normally.  To do this in a general way, the GMLHTML SCRIPT file must be
imbedded in  the Script source file  just before the :GDOC  tag.   HTML
EXEC inserts the line

 .im GMLHTML  ;.* inserted by HTMLPREP

in the source file,   and runs Script to produce a  LISTING file,  then
removes the  GMLHTML imbed line  from the  source file.   (If  you have
TOUCH EXEC,  the  original time stamp of the source  file is restored.)
Finally,  the  LISTING file is  post-processed through a  REXX pipeline
filter to remove ASA control characters, and correct a few awkward fea-
tures of the translation process (such  as page breaks),  producing the
HTML file.
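
The steps above can be sketched in a few lines.  This is only an
illustration in Python, not the REXX code in HTML EXEC: the function
names insert_imbed and strip_asa are hypothetical, the control codes
handled are the standard ASA ones (blank, 0, -, 1, +), and discarding
overprint ("+") lines is an assumption about what is wanted for HTML.

```python
IMBED_LINE = ".im GMLHTML  ;.* inserted by HTMLPREP"


def insert_imbed(source_lines, imbed=IMBED_LINE):
    """Return a copy of source_lines with the imbed line just before :GDOC."""
    out = []
    inserted = False
    for line in source_lines:
        # Only the first :GDOC tag gets the imbed line in front of it.
        if not inserted and line.lstrip().upper().startswith(":GDOC"):
            out.append(imbed)
            inserted = True
        out.append(line)
    return out


def strip_asa(listing_lines):
    """Drop the ASA carriage-control column and emulate its line spacing."""
    out = []
    for line in listing_lines:
        control, text = line[:1], line[1:]
        if control == "+":
            # Overprint record: assumed to be a repeat of the previous
            # line (bold or underscore), so it is discarded here.
            continue
        if control == "0":
            out.append("")          # double space: one blank line first
        elif control == "-":
            out.extend(["", ""])    # triple space: two blank lines first
        # "1" (new page) and " " (single space) add nothing, so page
        # breaks simply disappear from the output.
        out.append(text.rstrip())
    return out
```

For example, strip_asa(["1<H1>Title</H1>", "0<P>Text"]) gives the three
lines "<H1>Title</H1>", an empty line, and "<P>Text".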


What does it do?

GMLHTML produces HTML encodings for the following GML tags:

*  Headings, H1 - H6, are mapped to <H1> ... <H6>.  Heading levels H1
   and H2 also generate an appropriate anchor (<A NAME="...">) for the
   heading.  Appendices are handled similarly.  Cross-references to
   headings (HREF tag) are not currently handled.


*  Lists:  Ordered, unordered, and definition lists use the correspond-
   ing HTML tags <OL>,  <UL>,  and <DL>.   GML glossary lists (GL)  are
   treated like DLs; simple lists (SL) use the HTML <MENU> tag.  Bibli-
   ographic lists (BL) are treated as unordered lists.   You can change
   the assignments for BL and SL by modifying lines in GMLHTML SCRIPT.

*  Highlighted phrases, HP0 - HP3, are mapped to the appropriate
   combinations of <I> and <B>.  However, note that not all browsers treat
   <I><B> ...  </B></I> cumulatively.   The FONT= attribute, often used
   as :HP0 FONT=MONO  is not treated specially,  and  disappears in the
   output.

*  Figures & tables are treated as pre-formatted text,  using the <pre>
   tag.  This works reasonably well for inline, textual display materi-
   al,  but cannot,  of course,   handle material designed for paste-in
   using the DEPTH= attribute.   Figures and tables generate an anchor,
   of the form <A NAME="Fig_xxx">,  and FIGREF/TABREF tags generate the
   appropriate links (<A HREF="#Fig_xxx">Figure nn</A>).

*  Examples,  XMP,  are treated as pre-formatted text,  using the <pre>
   tag.

*  Quotes:  Q ...  eQ  is mapped using the entity '&quot;'  for the '"'
   character.  Long quotes (LQ ... eLQ) use the <blockquote> tag.

*  Paragraphs and notes are mapped to <P>.

*  Equations:  Just a start.   Display formulas (DF)  are surrounded by
   HTML comments and treated as pre-formatted text.

    <!-- DF --><pre>
    y sub ijkl  =  mu sub ijk  +  epsilon sub ijkl ,
    </pre><!-- eDF -->

   What appears inside is whatever the formula processor produces.   If
   you use  ">" and "<" instead  of "gt" and "lt"  inside formulas,
   these will be  translated to the HTML entities  "&gt;" and "&lt;",
   respectively.   Note  that the characters  "<" and ">"  are reserved
   metacharacters in HTML,   but are used for grouping  in GML formulas
   (e.g.,  g sub <1 1>).   The post-processing carried out by HTML EXEC
   translates the characters "<", "%" (thin space), and ">" inside dis-
   play formulas to blanks.

*  Title page:  The FRONTM and TITLEP tags generate an HTML <HEAD> sec-
   tion,  with  a <TITLE>  tag and HTML  comments constructed  from the
   AUTHOR and DATE tags.   Beware:  if your document does not contain a
   FRONTM section,  the  HTML document produced may  confuse some brow-
   sers.
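
The display-formula clean-up described under Equations can be sketched
as follows.  This is a Python illustration under stated assumptions,
not the REXX filter in HTML EXEC: the function name
blank_formula_metachars is hypothetical, and the block delimiters are
taken from the DF example shown above.

```python
import re

# One display-formula block, delimited as in the DF example above.
DF_BLOCK = re.compile(r"(<!-- DF --><pre>)(.*?)(</pre><!-- eDF -->)", re.DOTALL)

# "<", "%" (thin space) and ">" each become a single blank.
TO_BLANKS = str.maketrans("<%>", "   ")


def blank_formula_metachars(html):
    """Blank out <, % and > inside display formulas only."""
    def fix(match):
        opening, body, closing = match.groups()
        return opening + body.translate(TO_BLANKS) + closing
    return DF_BLOCK.sub(fix, html)
```

Text outside the DF comments is left alone, so ordinary HTML tags such
as <P> and <B> are not affected.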


What doesn't it do?

GMLHTML does NOT currently handle the following GML tags:

*  H0 tag (should be mapped to H1)
*  Table of Contents (TOC)
*  Index (INDEX)
*  Footnotes (FN)
*  Endnotes (EN)
*  Inline formulas (F)
*  Graphic segments inlined with the Script .si control word.


Availability

The GMLHTML package may be  obtained from LISTSERV@YORKVM1 (bitnet)  or
LISTSERV@VM1.YORKU.CA (internet) by sending a mail message containing
the line:

GET GMLHTML PACKAGE



--
Michael Friendly	Internet: friendly@vm1.yorku.ca
Psychology Department	NeXTmail: friendly@hotspur.psych.yorku.ca
York University		Voice: 416 736 5118
4700 Keele Street	http://www.math.yorku.ca/SCS/friendly.html
Toronto, ONT  M3J 1P3 CANADA