From MHammond@cmutual.com.au Fri Jun 30 09:17:03 1995
Article: 1358 of comp.infosystems.www.authoring.misc
From: MHammond@cmutual.com.au (Mark Hammond)
Subject: Re: HELP> HTML to RTF
Organization: Colonial Mutual Life

In article <3rrc1q$jmu@ccuf.wlv.ac.uk>, jw@scitsc.wlv.ac.uk says:
>
>I've got "rtftohtml" and find it extremely useful.
>
>Does anyone know of anything that does the reverse.
>i.e, you input HTML and get RTF out.
>
>(I know RTF is much "richer" than HTML, but it should be 
>possible to map <H1> to "Heading 1", etc.  It's just
>the generation of the rest of the RTF information that
>offputs me, and if someone has already done it .....)

I have _nearly_ completed this.  It uses Python (www.python.org), which has
built-in HTML parsing, and a Python OLE2 extension.  It works by parsing
the HTML and sending OLE commands to Word Basic.  After a document is
complete, I get Word to save the file as RTF.

The tool I have is basically a "Web Crawler".  You give it a reference,
and it processes that reference, as well as any sub-references (and their
sub-references).  You will be able to exclude certain references (e.g., only
go n references deep, don't follow links off the current host, etc.).

At the end, it writes an HPJ file, all ready to turn into a .HLP file.

It should be ready in about 1-2 weeks.  Mail me then if you are interested.

Mark.
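
For anyone wanting to picture the approach described above, here is a rough
sketch in present-day Python.  It is not the actual tool: the names
STYLE_FOR_TAG, PageParser, crawl and the fetch callback are all made up for
illustration, and the OLE / Word Basic step is left out entirely.  It only
shows the parsing side (turning heading tags into Word style names) and the
crawl limits (depth and same-host), assuming pages are supplied by a
caller-provided fetch function.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical mapping from HTML heading tags to Word style names.
STYLE_FOR_TAG = {"h1": "Heading 1", "h2": "Heading 2", "h3": "Heading 3"}


class PageParser(HTMLParser):
    """Collects (style, text) runs and the hrefs of links on one page."""

    def __init__(self):
        super().__init__()
        self.runs = []          # (style name, text) pairs, in document order
        self.links = []         # raw href values, in document order
        self._style = "Normal"  # style applied to text outside headings

    def handle_starttag(self, tag, attrs):
        if tag in STYLE_FOR_TAG:
            self._style = STYLE_FOR_TAG[tag]
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in STYLE_FOR_TAG:
            self._style = "Normal"

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((self._style, text))


def crawl(start_url, fetch, max_depth=2, same_host_only=True):
    """Process start_url and its sub-references; return {url: runs}.

    fetch(url) is an assumed callback returning the page's HTML as a
    string, or None if the page is unavailable.
    """
    start_host = urlparse(start_url).netloc
    visited = set()
    results = {}
    queue = [(start_url, 0)]
    while queue:
        url, depth = queue.pop(0)
        if url in visited:
            continue
        visited.add(url)
        html = fetch(url)
        if html is None:
            continue
        parser = PageParser()
        parser.feed(html)
        parser.close()
        results[url] = parser.runs
        if depth >= max_depth:
            continue  # "only go n references deep"
        for href in parser.links:
            target = urljoin(url, href)
            if same_host_only and urlparse(target).netloc != start_host:
                continue  # "don't follow links off the current host"
            queue.append((target, depth + 1))
    return results
```

In a real converter, each (style, text) run would be handed to the word
processor instead of being collected in a dictionary.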