Converting from HTML
Browsers
Most WWW browsers will convert HTML to plain text, for example the Linemode browser
or Lynx:
www -na "some-URL" > my-text
lynx -dump "some-URL" > my-text
(See the Lynx documentation)
Mosaic, Netscape Navigator, Internet Explorer and other browsers
will let you "save as" plain text, and in some versions also in other formats
including formatted text and PostScript.
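For cases where no browser is at hand, the same kind of plain-text dump can be roughly approximated in a few lines. The following is a minimal sketch using Python's standard html.parser module; it is not how Lynx or the Linemode browser work internally, and the names TextExtractor and html_to_text, as well as the choice of which tags start a new line, are assumptions made for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data, dropping tags and skipping script/style bodies."""

    # Assumed set of tags that should force a line break in the output.
    BLOCK_TAGS = {"p", "br", "div", "li", "tr",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    # Content of these elements is never shown to the reader.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self.skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text content of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>Hello <b>world</b> &amp; all</p>"
    print(html_to_text(sample))
```

This sketch does no table layout, link numbering or paragraph reflowing, all of which a real browser dump provides, so it is best suited to quick text extraction.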
SGML tools
Some SGML tools will allow you to convert HTML to other formats. For instance:
-
gf. A general-purpose SGML compiler.
-
HyperHelp Bridge from
Bristol Technology will convert to RTF.
-
SGML2TeX will convert SGML to TeX on the PC.
-
Fred will convert SGML to
HTML, TeX (PostScript), ASCII etc.
-
sgml2 will convert SGML to other formats.
-
instant from OSF can be used with sgmls to produce
various output formats from standard SGML inputs.
- An HTML to ICADD Transformation Service
translates HTML into the ICADD DTD, suitable for further translation to Braille,
large print or voice synthesis.
Further information is available on
SGML resources and tools.
Other tools
- PostScript
- MS Word / RTF
-
html2rtf
translates HTML to RTF. Using this program and the standard windows help
compiler, you can convert hyperlinked Web pages into Windows HLP files.
Contact: guido.krueger@itzehoe.netsurf.de
-
hh2rtf
is a set of freeware perl scripts that converts most HtmlHelp formatted
HTML to WinHelp-ready RTF. Contact: steve@blighty.com (Steve Atkins).
-
htm2rtf is another
converter for PC/DOS. Contact: sagnier@cena.dgac.fr (Yves Sagnier)
-
html2wrd.zip is a Microsoft Word Basic program to
convert HTML documents (including lists, tables and other formatting) into WinWord documents.
Contact: yura@kpmgk.kiev.ua (Yuri M. Lesiuk)
-
Here is another tool under development
- Frame
-
www_and_frame
will convert HTML to MML for FrameMaker.
See also an HTML to MIF converter and toolkit
by the same author.
See
Support Info for current situation and sources. Contact: connolly@w3.org (Dan Connolly)
- html2mif is an HTML to FrameMaker MIF
converter written in Tcl. Contact: faustus@remarque.berkeley.edu (Wayne A. Christopher)
- HTML-2-Frame is an HTML import filter for FrameMaker for Windows 95/NT from .riess gmbh (Web site in German - auf Deutsch). Contact: thomas@riess.de (Thomas Berendonck).
- WordPerfect
- LaTeX etc
-
html2latex is a program based on the
NCSA HTML parser.
Contact: Nathan.Torkington@vuw.ac.nz.
- Another html2latex
can combine several HTML files into a single LaTeX file, converting links between the
files to references. External URLs can be converted
into footnotes or into a bibliography sorted on URL.
Contact: F.J.Faase@cs.utwente.nl (Frans J. Faase)
- Another html2latex
implemented on Linux using yacc, lex and C. Also available from the
TSX-11 Linux FTP site
as nc-html2latex-0.97.tar.gz. Contact: naochan@naochan.com (Naoya Tozuka)
- htmlatex.pl
is a Perl script to do the conversion (may be moving soon).
Contact: n9146070@cc.wwu.edu (Jake Kesinger)
- There is also a
sed script
to convert HTML into LaTeX.
- Plain text and setext
- Markup Remover from
Aquatic Moon Software is a Windows application to convert HTML into plain text:
there are several output options.
- Remove-It from GME Systems
is a Windows 3.1x-based HTML tag removal utility.
- Here is a Visual Basic application to do the job.
- HTMSTRIP by Bruce
Guthrie for DOS/Windows processes and removes embedded HTML commands
from Web pages. It reflows paragraphs, processes tables, etc., and outputs straight ASCII
text.
- HTMLCon for MS-DOS converts HTML to ASCII.
- HTML Markdown
is a drag-and-drop Macintosh program that converts HTML files into
regular text files.
- An HTML parser in Perl is available which will also
convert HTML to plain text.
- Here is information about some other HTML to ASCII converters.
- The dehtml option of htmlchek will produce plain
ASCII from HTML.
- html2setext will convert HTML to setext
(structure-enhanced text), which is human-readable. It is available from
Serious Cybernetics.
Contact: xanni@aus.xanadu.com (Andrew Pam)
- Other formats
- An enhanced HTML parser handles
HP-PCL and other printer formats
- HTMLhelp converts to WinHelp
- see html2rtf for another route to WinHelp
- and hh2rtf for yet another route to WinHelp
- cphtml converts an annotated HTML file
into a Perl script as an aid to writing CGI scripts.
- HTMLDBF converts HTML pages into DBF files (dBase III+).
Check out word processor filters, some of
which work both ways, and also HTML editors.
__________________________________________________________________
MS,
CERN
19 March 1999