Document Management for Web Specs

Aaargh! Maintaining specs is a Royal Pain! We need to automate this!


Possible Solutions

Multiformat tools
This was used for the PNG spec
FrameMaker, WebMaker, ??? print-to-text tool
This is what Roy Fielding (and a lot of other folks) use.
HTML+, dsr's tools
Dave Raggett edits the HTML with a text editor (mostly BBEdit on a Mac). He's got some little tools written in C to produce plain text.
Snafu DTD, gf tools, Texi2HTML, COST, Joe English
This is what I ended up using for HTML 2.0
LaTeX, latex2html, IETF print-to-text tools
MS Word, rtf2html, ??? print-to-text tool

Ideal Solution

Source format: HTML dialect
Use a strict HTML dialect with tables, class=abstract, and possibly math.
Document Manipulation API: java interface
There are lots of web libraries for python. We could eventually specify the interfaces in ILU and use them from lots of languages (C, C++, java, scheme, Common Lisp, Modula-3), but we'd prototype and develop using python.

I've already written little tools to do things like relativize links and such. Rather than doing TOC generation, section numbering, etc. during translation, we'd do it in-place in the source, but automatically.

Changed my mind, since java has at least the potential to address the installation bugs. Plus, it looks like we can write for the java VM in scheme (see Kawa).
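Whatever the final implementation language, the idea of numbering sections in-place in the source is small enough to sketch. Below is a minimal python version (python being the prototyping language named above), assuming sections are marked with h2 through h4 elements; `number_sections` is a hypothetical name, and a regular expression stands in for a real parser, which is tolerable only because the source is a strict HTML dialect. The important property is that it can be re-run on its own output: it strips any number it inserted earlier before writing the new one.

```python
import re

# An <h2>..</h2> through <h4>..</h4> element: level, attributes, heading text.
HEADING = re.compile(r"<h([2-4])([^>]*)>(.*?)</h\1>", re.IGNORECASE | re.DOTALL)
# A section number left by a previous run, e.g. "2.1 " at the start of the text.
OLD_NUMBER = re.compile(r"^\d+(\.\d+)*\s+")


def number_sections(html: str) -> str:
    """Rewrite headings in place so they carry hierarchical numbers like 2.1."""
    counters = [0, 0, 0]  # running counts for h2, h3, h4

    def renumber(m: re.Match) -> str:
        level = int(m.group(1)) - 2
        counters[level] += 1
        # Starting a new section resets the counts of all deeper levels.
        for i in range(level + 1, len(counters)):
            counters[i] = 0
        number = ".".join(str(n) for n in counters[: level + 1])
        title = OLD_NUMBER.sub("", m.group(3).strip())
        return f"<h{m.group(1)}{m.group(2)}>{number} {title}</h{m.group(1)}>"

    return HEADING.sub(renumber, html)
```

For example, `number_sections("<h2>Intro</h2><h3>Goals</h3><h2>Design</h2>")` yields `<h2>1 Intro</h2><h3>1.1 Goals</h3><h2>2 Design</h2>`, and feeding that back in changes nothing. One caveat of this sketch: a heading whose real text starts with a number followed by a space would lose that number on the next run.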

Chunking support: python scripts
This would handle chunking many HTML documents into one for printing, and many-to-many chunking for author/reader convenience.
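The many-to-one case might look roughly like the python sketch below. It assumes each chunk is a separate file called `name.html`, that cross-references between chunks are plain relative links, and that fragment ids are unique across chunks; `combine` and the `chunk` class are hypothetical names. It keeps each document's body, wraps it in a `div` whose id is the chunk name, and turns links between chunks into links within the single combined document.

```python
import re

# Content between <body ...> and </body>; a document without a body tag is used whole.
BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
# Relative links of the form href="name.html" or href="name.html#frag".
# Excluding ':' keeps absolute URLs such as http://... from matching.
HREF = re.compile(r'href="([^"#:]+)\.html(#[^"]*)?"', re.IGNORECASE)


def combine(chunks, title):
    """Merge a list of (name, html) pairs into one HTML document for printing."""
    names = {name for name, _ in chunks}

    def local_link(m):
        name, frag = m.group(1), m.group(2)
        if name not in names:
            return m.group(0)  # links to documents outside the set are left alone
        target = frag if frag else "#" + name
        return f'href="{target}"'

    parts = [f"<html><head><title>{title}</title></head><body>"]
    for name, html in chunks:
        found = BODY.search(html)
        content = found.group(1) if found else html
        content = HREF.sub(local_link, content)
        parts.append(f'<div class="chunk" id="{name}">{content.strip()}</div>')
    parts.append("</body></html>")
    return "\n".join(parts)
```

The many-to-many direction (splitting for reader convenience) would be the inverse: cut at chosen headings and rewrite `#frag` links back into `name.html#frag`. Colliding fragment ids across chunks are not handled here.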
PostScript Output: python implementation of Mosaic print tool
This code is already written. Guido translated the postscript printing code from Mosaic into python. We could adapt things like headers/footers for our needs. This eliminates the need for a TeX installation.
PostScript Output: libwww TeX module?
Use the HTTeXGen module in libwww to generate TeX. It doesn't currently support all the features we need, but it could work. It would rely on a many-to-one HTML-to-HTML filter.
PostScript Output: html2lout?
Lout is kinda like TeX, but it was written after the dawn of PostScript, so there's less redundancy between Lout and PS than between TeX and PS. The syntax of Lout is also cleaner. Lout has table, equation, etc. packages. A clean html2lout filter should be much more reliable and hands-free than anything based on TeX.
Plain-Text output: custom python app?
There is already python code to do simple HTML-to-text formatting, but support for multiple documents, tables, etc. still needs to be added, as well as IETF style.
Plain-Text output: libwww module?
The same feature enhancements would be needed.
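For the plain-text options, the "simple HTML-to-text formatting" part can be sketched with the standard library's `html.parser` and `textwrap`. This is a minimal illustration, not the existing python code mentioned above; `TextFormatter`, `html_to_text`, and the tag sets are hypothetical choices. It treats block elements as paragraph breaks, drops title/script/style text, collapses whitespace, and fills each paragraph to a fixed width. Tables, preformatted text, and IETF page headers/footers are exactly the parts it does not handle.

```python
import textwrap
from html.parser import HTMLParser

# Elements that start or end a paragraph in the text output.
BLOCKS = {"p", "div", "h1", "h2", "h3", "h4", "li", "dt", "dd", "br", "tr"}
# Elements whose text content is not printed.
SKIP = {"title", "script", "style"}


class TextFormatter(HTMLParser):
    def __init__(self, width=72):
        super().__init__()  # convert_charrefs=True, so &amp; etc. arrive decoded
        self.width = width
        self.paragraphs = []
        self.current = []
        self.skip = 0

    def flush(self):
        # Collapse runs of whitespace, then wrap the paragraph to the page width.
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(textwrap.fill(text, self.width))
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip += 1
        elif tag in BLOCKS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP:
            self.skip = max(0, self.skip - 1)
        elif tag in BLOCKS:
            self.flush()

    def handle_data(self, data):
        if not self.skip:
            self.current.append(data)


def html_to_text(html, width=72):
    parser = TextFormatter(width)
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text not closed by a block tag
    return "\n\n".join(parser.paragraphs)
```

A libwww module would face the same design questions; the sketch just shows how little is needed before tables and page layout come into play.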

Wish list

Direct manipulation grammar editor
for SGML DTDs, RFC822 grammars in HTTP specs, etc.


Structured document conversion, in use at vuw.ac.nz. Gotta check it out...
Debiandoc-SGML markup manual
4 February 1997 Ian Jackson ijackson@gnu.ai.mit.edu.

Source is in the Debian archive under text.

Dr. Thomas F. Gordon
GMD FIT - German National Research Center for Information Technology
Research Division Artificial Intelligence
53754 Sankt Augustin, Germany
email: thomas.gordon@gmd.de; phone: (+49 2241) 14-2665
The Collection of Computer Science Bibliographies
Copyright © 1995-1996 Alf-Christian Achilles

Great for the reference section!

example of using spam to munge HTML
SGML tools
as used in The Linux Documentation Project
Re: Lout to HTML Jin S. Choi (jsc@atype.com) Wed, 13 Nov 1996 19:34:47 -0500 . Nifty thread about LOUT, SGML, DSSSL, HTML, etc. I agree!
Joe English
Eric Raymond
Linux, computational linguistics, www-html <199512221800.NAA09004@locke.ccil.org>
IETF draft guidelines

@@ I know from first-hand experience that producing multi-purpose technical specifications (e.g. IETF plain text, online hypertext, and postscript) is tricky and tedious. I try to keep track of tools that might provide solutions to this problem.

a new C++ based SGML parser by James Clark, the author of SGMLS
DTD Fragments
Another SGMLS/Perl formatter, DTD Fragments. It's not DTD-specific and outputs HTML, ASCII, and TROFF; it does require a DTD-to-generic-element mapping in Perl for any specific DTD, and comes with DocBook and Linuxdoc mappings. The next version will have RTF output, a Snafu DTD mapping, and better support for applying different styles to the output.
Ken MacLeod
Another perl5/nsgmls toolet. Includes some support for DocBook->LaTeX and HTML conversion, though that part of the code looks like a one-time shot, not a complete implementation.
An SGML DTD documentation/navigation tool by Earl Hood
This tool translates an SGML DTD into HTML, providing hypertext navigation of the document structure. Handy for learning SGML.
A GNU Emacs mode for SGML files
Getting HotMetaL by FTP
SoftQuad Inc. Panorama Press Release
Linux Doc/SGML
These guys have taken a very practical approach to SGML for technical documentation. They started with SGMLs from James Clark and the QWERTZ DTD, which mirrors LaTeX structure. Then they added down-translators for groff, HTML, and others. Looks promising.

Hmmm... on closer examination, this is something of a hack. They hacked the DTD, hacked the down-translators, etc. I like the idea of using a LaTeX-like DTD, but I think I'll wait till this matures a little more. Also: distribution archive.

GF: General SGML Formatter
Another SGMLS-based SGML-to-HTML converter supporting a few sophisticated DTDs.
Setting up PSGML and sgmls for HTML
Remote file ftp.jclark.com/pub/sp
Copenhagen SGMLs Tool -- SGMLs meets Tcl
maintained by Joe English

Dan Connolly
created 1995/12/05
last update by $Author: connolly $ on $Date: 1999/11/23 20:35:13 $