HTML to RTF

From MHammond@cmutual.com.au Fri Jun 30 09:17:03 1995
Article: 1358 of comp.infosystems.www.authoring.misc
From: MHammond@cmutual.com.au (Mark Hammond)
Subject: Re: HELP> HTML to RTF
Organization: Colonial Mutual Life

In article <3rrc1q$jmu@ccuf.wlv.ac.uk>, jw@scitsc.wlv.ac.uk says:
>
>I've got "rtftohtml" and find it extremely useful.
>
>Does anyone know of anything that does the reverse.
>i.e, you input HTML and get RTF out.
>
>(I know RTF is much "richer" than HTML, but it should be 
>possible to map <H1> to "Heading 1", etc. It's just
>the generation of the rest of the RTF information that 
>offputs me, and if someone has already done it .....)

I have _nearly_ completed this.  It uses Python (www.python.org), which
has built-in HTML parsing, and a Python OLE2 extension.

It works by parsing the HTML and sending OLE commands to Word Basic.
After a document is complete, I get Word to save the file as RTF.
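
As a rough illustration of the parsing half only, here is a minimal
sketch using the html.parser module from a current Python standard
library (an assumption; it is not the parser available in 1995).  The
STYLE_FOR_TAG table, the StyleCollector class and html_to_blocks() are
hypothetical names, and the OLE calls to Word are left out entirely:
the sketch only produces the (style, text) pairs that would be sent on.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to Word style names.
STYLE_FOR_TAG = {
    "h1": "Heading 1",
    "h2": "Heading 2",
    "h3": "Heading 3",
    "p": "Normal",
}


class StyleCollector(HTMLParser):
    """Collect (style name, text) pairs for each mapped block element."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished (style, text) pairs
        self._style = None  # style of the block being read, if any
        self._text = []     # text fragments of the current block

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_FOR_TAG:
            # A new block also ends the previous one (e.g. unclosed <p>).
            self.flush_block()
            self._style = STYLE_FOR_TAG[tag]

    def handle_endtag(self, tag):
        if tag in STYLE_FOR_TAG:
            self.flush_block()

    def handle_data(self, data):
        # Text outside any mapped block is ignored in this sketch.
        if self._style is not None:
            self._text.append(data)

    def flush_block(self):
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self._text).split())
        if self._style is not None and text:
            self.blocks.append((self._style, text))
        self._style = None
        self._text = []


def html_to_blocks(html):
    """Parse an HTML string and return a list of (style, text) pairs."""
    parser = StyleCollector()
    parser.feed(html)
    parser.close()        # deliver any text still buffered by the parser
    parser.flush_block()  # keep a final block that was never closed
    return parser.blocks
```

Each pair could then be turned into whatever commands the word
processor needs; inline tags such as <B> are simply flattened to text
here.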

The tool I have is basically a "Web Crawler".  You give it a reference,
and it processes that reference, as well as any sub-references (and
their sub-references).  You will be able to exclude certain references
(e.g., only go n references deep, don't follow links off the current
host, etc.).
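
The two exclusion rules mentioned above (a depth limit and staying on
the current host) can be sketched roughly as follows.  This is an
illustration under assumptions, not the tool itself: crawl(),
LinkCollector, max_depth and same_host_only are hypothetical names, and
fetch is any caller-supplied function that returns the HTML for a URL
or None if the page cannot be read.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth=2, same_host_only=True):
    """Return the URLs processed, breadth first, starting at start_url.

    Links are followed at most max_depth steps from the start, and
    links to other hosts are skipped when same_host_only is true.
    """
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    processed = []
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreadable page: skip it
        processed.append(url)
        if depth >= max_depth:
            continue  # deep enough: do not follow this page's links
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.links:
            # Resolve relative links and drop any "#fragment" part.
            target = urldefrag(urljoin(url, href)).url
            if target in seen:
                continue
            if same_host_only and urlparse(target).netloc != start_host:
                continue
            seen.add(target)
            queue.append((target, depth + 1))
    return processed
```

Passing fetch in as a parameter keeps the traversal rules separate from
the network code, so they can be tried out against a plain dictionary
of pages (for example crawl(url, pages.get)) before pointing the
function at a real server.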

At the end, it writes an HPJ file, all ready to turn into a .HLP file.
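
For readers who have not seen one, an HPJ file is a plain-text project
file for the Windows help compiler, divided into bracketed sections.
The sketch below builds only a bare-bones one with an [OPTIONS] section
holding a title and a [FILES] section listing the RTF files.  The
function name hpj_text() and the file names in the usage note are
hypothetical, and a real project will usually need more sections and
options than this.

```python
def hpj_text(title, rtf_files):
    """Return the text of a minimal help project (HPJ) file.

    Only the [OPTIONS] and [FILES] sections are written; this is an
    illustrative subset, not a complete project file.
    """
    lines = ["[OPTIONS]", "TITLE=" + title, "", "[FILES]"]
    lines.extend(rtf_files)
    # Use DOS line endings, since the file is meant for a Windows tool.
    return "\r\n".join(lines) + "\r\n"
```

For example, hpj_text("My Pages", ["index.rtf", "about.rtf"]) returns
text that can be written to a file such as mypages.hpj.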

It should be ready in about 1-2 weeks.  Mail me then if you are 
interested.

Mark.