W3C

HTML to various things

An example using the PNG specification

Introduction

The PNG spec exists as a single master file from which several variant formats can be generated for distribution purposes. (The PNG extensions document is processed with the same tools.) This README file explains the format of the master file and the scripts used to generate the distribution versions.

The whole setup is a bit ad-hoc. It'd be better to make the master be in an SGML document format, and use general-purpose SGML tools to generate the output files. But so far I haven't seen any tools that are both stable and well adapted to what we need to do. Perhaps for PNG 2.0, if there ever is one, a format revision will be undertaken.

There are two major textual variants of the PNG spec, W3C and RFC. (They differ only in the introductory boilerplate statements, not in the technical content.) The master file contains all the needed source text, and the processing scripts select which text to include in a particular output file.

Formats generated

We can currently generate four output formats:

	Multi-file HTML
	Single-file HTML
	LaTeX
	nroff

In principle either textual variant could be generated in all four formats, but in practice we don't bother with all combinations.

Each output format is generated by a different Perl script: makemulti, makesingle, maketex, and makenroff respectively. Each of these scripts accepts command line switches "-rfc" and "-w3c" to tell it which optional text to include.

Master file

The master file is in HTML format with a few application conventions (in comments). The master can be viewed directly by an HTML viewer, although cross-references won't work properly. Our extensions all take the form of HTML comments so that they will be ignored by an HTML browser looking directly at the master file. Note that the scripts require each of these commands to appear on a separate line; also, the commands must be spelled exactly as shown.

The primary extension is conditional inclusion. Text and extension commands appearing between IF and ENDIF commands are ignored if not selected for output. The current scripts understand four variants of conditional inclusion for dealing with the two textual variants of the spec:

	<!-- IF RFC -->
	... text to appear only in RFC version ...
	<!-- ENDIF -->

	<!-- IF !RFC -->
	... text to appear except in RFC version ...
	<!-- ENDIF -->

	<!-- IF W3C -->
	... text to appear only in W3C version ...
	<!-- ENDIF -->

	<!-- IF !W3C -->
	... text to appear except in W3C version ...
	<!-- ENDIF -->
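The selection logic above can be sketched in a few lines. This is a Python illustration only, not the actual Perl code; the function name select_lines and the use of a stack to allow nested conditionals are assumptions, since nothing here says whether the real scripts permit nesting.

```python
import re

# Each command must sit on its own line and be spelled exactly as shown.
IF_RE = re.compile(r"^\s*<!-- IF (!?)(\w+) -->\s*$")
ENDIF_RE = re.compile(r"^\s*<!-- ENDIF -->\s*$")


def select_lines(lines, active):
    """Yield only the lines selected for the names in `active`.

    `active` is a set such as {"RFC"} or {"W3C"}. The stack permits nested
    IF blocks, which is an assumption about the real scripts.
    """
    stack = []  # one boolean per open IF: is that block selected?
    for line in lines:
        m = IF_RE.match(line)
        if m:
            negated, name = m.group(1) == "!", m.group(2)
            stack.append((name in active) != negated)
            continue
        if ENDIF_RE.match(line):
            if not stack:
                raise ValueError("ENDIF without matching IF")
            stack.pop()
            continue
        if all(stack):  # true when no IF is open
            yield line
```

Calling select_lines(master_lines, {"RFC"}) keeps the "IF RFC" and "IF !W3C" text and drops the "IF W3C" and "IF !RFC" text.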

There are also commands for including stuff only in a particular output format:

	<!-- NROFF
	... material to appear only in nroff output
	NROFF -->

	<!-- TEX
	... verbatim TeX material to appear only in LaTeX output
	TEX -->

	<!-- IF !TEX -->
	... material to appear only in non-LaTeX output
	<!-- ENDIF -->

	<!-- IF HTML -->
	... material to appear only in HTML output
	<!-- ENDIF -->

Note that in the first two cases, the conditional material need not be legal HTML material, since it is inside an HTML comment.
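As a rough illustration of the NROFF and TEX blocks, here is a small Python sketch (again, not the real Perl). The name format_blocks and the fmt argument are made up for this example. It unwraps a block when generating the matching format and drops it otherwise.

```python
def format_blocks(lines, fmt):
    """Yield lines, unwrapping `<!-- FMT ... FMT -->` blocks only for `fmt`.

    `fmt` is "NROFF" or "TEX"; any other value drops both kinds of block.
    """
    inside = None  # name of the block we are currently in, or None
    for line in lines:
        word = line.strip()
        if inside is None and word in ("<!-- NROFF", "<!-- TEX"):
            inside = word.split()[1]  # opening line: remember the format
        elif inside is not None and word == f"{inside} -->":
            inside = None  # closing line: leave the block
        elif inside is None or inside == fmt:
            yield line
```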

Another extension is used to (unconditionally) include other files; currently, this is used only to include an automatically generated table of contents. The source file contains

	<!-- INCLUDE filename -->

The specified file is included and processed in the same way as other text.
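A minimal Python sketch of INCLUDE handling follows. It is not the actual script; the name expand_includes and the base_dir parameter are assumptions for illustration. Included text is run back through the same function, which is one way to read "processed in the same way as other text".

```python
import re
from pathlib import Path

INCLUDE_RE = re.compile(r"^\s*<!-- INCLUDE (\S+) -->\s*$")


def expand_includes(lines, base_dir="."):
    """Yield lines, replacing each INCLUDE command by the named file's lines."""
    for line in lines:
        m = INCLUDE_RE.match(line)
        if m is None:
            yield line
            continue
        included = (Path(base_dir) / m.group(1)).read_text().splitlines()
        # Recurse so that included text is treated like any other text.
        yield from expand_includes(included, base_dir)
```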

This command controls page header/footer strings in formats where those are appropriate:

	<!-- HEADFOOT type string -->

where "type" is LH for left header, CH for center header, RH for right header, LF for left footer, or CF for center footer. This command is ignored by makesingle and makemulti, since the HTML output format isn't paginated. makenroff converts this command directly to a .ds nroff command. maketex uses the specified strings only in -rfc mode (otherwise it uses a more sensible two-sided format). Note this command should appear only in the <HEAD> part of the master file.

Multi-file HTML

The W3C-style multi-file HTML version of the PNG spec is generated by the command:

        make png.html

The makemulti script needs two special commands that tell it how to split the source into multiple files. These commands are ignored by the other scripts. One of these commands indicates the start of a new output file (new chapter). This is a line containing

	<!-- NEW FILE filename file-title -->

This causes makemulti to finish up the prior output file and begin writing the specified filename. The file-title is used as the title of the new HTML file. The second command, END HEADER, is described in the next section.

Single file HTML

The W3C-style single-file HTML version of the PNG spec is generated by the command:

        make png-all.html

makesingle doesn't need to worry about adding HTML headers/trailers, since the input file is already HTML. makemulti makes a separate copy of the input file header, up to a line reading

	<!-- END HEADER -->

The copied lines are emitted into each new output file when it is started. At the end of each output file except the last, the fixed text

	</BODY>
	</HTML>

is attached. (The master file is assumed to end with these lines already.) The duplicated text must include the HTML header and <BODY> command, plus any text that should be replicated at the top of each output file. It should include a <TITLE> command appearing on a separate line; makemulti replaces that line with the proper file title for each subfile. Note: maketex also recognizes END HEADER as the end of the area containing the <H1> commands that make up the LaTeX title.

makemulti also adds navigation links for "previous page", "next page", and "table of contents" to the top and bottom of each output file.
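To make the NEW FILE and END HEADER handling concrete, here is a Python sketch of the splitting step. It is not the real makemulti: the function name split_master and the first_name parameter (a file name for whatever precedes the first NEW FILE command) are assumptions, and navigation links are left out entirely.

```python
import re

NEW_FILE_RE = re.compile(r"^\s*<!-- NEW FILE (\S+) (.+?) -->\s*$")
END_HEADER_RE = re.compile(r"^\s*<!-- END HEADER -->\s*$")
TITLE_RE = re.compile(r"^\s*<TITLE>.*</TITLE>\s*$", re.IGNORECASE)
TRAILER = ["</BODY>", "</HTML>"]


def split_master(lines, first_name="index.html"):
    """Return {filename: [lines]} for a master file given as a list of lines.

    `first_name` is hypothetical: it names the output that receives the
    text before the first NEW FILE command.
    """
    header = []
    for i, line in enumerate(lines):
        if END_HEADER_RE.match(line):
            body = lines[i + 1:]
            break
        header.append(line)
    else:
        raise ValueError("no END HEADER line found")

    files = {first_name: list(header)}
    current = files[first_name]
    for line in body:
        m = NEW_FILE_RE.match(line)
        if m:
            current.extend(TRAILER)  # close the file we are leaving
            name, title = m.groups()
            # Start the new file with the header copy, swapping in its title.
            current = files[name] = [
                f"<TITLE>{title}</TITLE>" if TITLE_RE.match(h) else h
                for h in header
            ]
            continue
        current.append(line)
    # The last file gets no trailer; the master already ends with one.
    return files
```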

Nroff

Table of contents generation is done differently for each output format. The chapter headings that are to be entered into the TOC must have the format

	<H2><A NAME="ChapterName">Chapter Heading</A></H2>

(all on one line). Section and subsection headers are formatted similarly but use <H3> and <H4> tags. Note that a header without a NAME anchor will not be entered in the table of contents, nor will it get a section number.

For HTML output, the "makecontents" script generates a file which contains HTML links; this file is then simply included into the single- and multi-page HTML formats via <!-- INCLUDE -->. makecontents scans the master file for headings formatted as above, and outputs them in the format

	<LI><A HREF="ChapterName">Chapter Heading</A>

together with added <UL> and </UL> commands as needed.
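The heading scan can be sketched as follows. This is a Python illustration, not the real makecontents. The name make_contents is made up, and the sketch copies each NAME unchanged into the HREF, which matches the chapter format shown above; how the real script forms links to sections is not covered here.

```python
import re

# A TOC heading: <H2>, <H3> or <H4> wrapping a NAME anchor, all on one line.
HEADING_RE = re.compile(
    r'^\s*<H([234])><A NAME="([^"]+)">(.*?)</A></H\1>\s*$', re.IGNORECASE
)


def make_contents(lines):
    """Return TOC lines for every heading that carries a NAME anchor."""
    out = []
    depth = 0  # number of currently open <UL> levels
    for line in lines:
        m = HEADING_RE.match(line)
        if not m:
            continue  # headings without a NAME anchor are skipped
        level = int(m.group(1)) - 1  # H2 -> 1, H3 -> 2, H4 -> 3
        while depth < level:
            out.append("<UL>")
            depth += 1
        while depth > level:
            out.append("</UL>")
            depth -= 1
        out.append(f'<LI><A HREF="{m.group(2)}">{m.group(3)}</A>')
    out.extend(["</UL>"] * depth)  # close whatever is still open
    return out
```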

For nroff output, we use several scripts in succession:

  makenroff	Primary conversion of HTML master to nroff source
  makenumbers	Extract page number substitution commands from -PN output
  makedotleaders	Format table of contents with dot leaders

The -PN switch to makenroff changes the nroff output in such a way that the "section headings" in the output from nroff are actually substitution commands providing page number information, like this:

	s|!ChapterName!|!PageNumber!|i; #

These lines are extracted by makenumbers and used to create a table of contents with correct page numbers. Yup, it's a hack.

For LaTeX output, we rely on LaTeX's native table of contents facility. maketex looks for an <H2> header reading 'Table of Contents' and replaces it by a \tableofcontents command; it also skips actual source text up to the next H2 header. Note that it can take as many as three runs of LaTeX to produce correct page numbers in the table of contents.

Cross-references to other sections are represented in the HTML source by standard HTML links. For HTML output, it is necessary to adjust the link names for single-file or multi-file output. (This is the reason why some cross-references don't work when viewing the master file directly: the adjustment hasn't been done.) For non-HTML output, HTML links are optionally expanded to chapter/section numbers, so that what appears in the master file as

	See Rationale: <A HREF="R.Byte-order">Byte order</A>.

will look in the text output like

	See Rationale: Byte order (Section 12.5).

(We choose to apply this section number expansion in the single-page HTML version as well, and could do it in the multi-page version if we wanted. This is just a matter of which scripts are applied by the Makefile.)

Since we have to adjust HTML crossreferences anyway, some abbreviation capability is also provided to make the representation of cross-references a little terser.

A chapter heading anchor is represented in the source by

	NAME="ChapterName"

and cross-references to it have the format

	HREF="ChapterName"

makesingle adjusts crossreferences to the format

	HREF="#ChapterName"

makemulti adjusts crossreferences to the format

	HREF="PNG-ChapterName.html"

which is assumed to be the correct output file name for that chapter. (Hence <H2> NAMEs must agree with NEW FILE commands.)

An unabbreviated section or subsection heading anchor is represented in the source by

	NAME="SectionName"

and cross-references to it have the format

	HREF="ChapterName#SectionName"

where ChapterName is the chapter containing the section.

makesingle adjusts crossreferences to the format

	HREF="#SectionName"

makemulti adjusts crossreferences to the format

	HREF="PNG-ChapterName.html#SectionName"

Note that sections named in this fashion must be unique across the whole document, or the links won't work right in the single-page HTML format.

An abbreviated section or subsection heading anchor is represented in the source by

	NAME="C.SectionName"

where C. is an abbreviation for the chapter containing the section.

Cross-references have the format

	HREF="C.SectionName"

makesingle adjusts such crossreferences to the format

	HREF="#C.SectionName"

makemulti adjusts crossreferences to the format

	HREF="PNG-ChapterName.html#C.SectionName"

where the chapter file name is deduced from the abbreviation.

Note that sections named in this fashion need only be unique within their chapters. The currently recognized chapter abbreviations are

	DR. DataRep
	C. Chunks
	E. Encoders
	D. Decoders
	R. Rationale

Sections appearing in other chapters must have unique names.
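The three link forms and their single-file and multi-file adjustments can be summarized in one function. The following Python sketch is an illustration only, not the actual makesingle or makemulti code. The names adjust_href and adjust_line are invented, and leaving targets that contain "://" or start with "#" untouched is an assumption about how external and already-adjusted links ought to be treated.

```python
import re

# Chapter abbreviations currently recognized, as listed above.
ABBREVIATIONS = {
    "DR": "DataRep",
    "C": "Chunks",
    "E": "Encoders",
    "D": "Decoders",
    "R": "Rationale",
}

HREF_RE = re.compile(r'HREF="([^"]*)"')


def adjust_href(target, multi):
    """Map a master-file link target to its single- or multi-file form."""
    if "://" in target or target.startswith("#"):
        return target  # assumption: external or already-adjusted link
    if "#" in target:  # unabbreviated section: ChapterName#SectionName
        chapter, section = target.split("#", 1)
        return f"PNG-{chapter}.html#{section}" if multi else f"#{section}"
    prefix, dot, _ = target.partition(".")
    if dot and prefix in ABBREVIATIONS:  # abbreviated section: C.SectionName
        if multi:
            return f"PNG-{ABBREVIATIONS[prefix]}.html#{target}"
        return f"#{target}"
    # Anything else is taken to be a chapter name.
    return f"PNG-{target}.html" if multi else f"#{target}"


def adjust_line(line, multi):
    """Rewrite every HREF on one line of the master file."""
    return HREF_RE.sub(
        lambda m: f'HREF="{adjust_href(m.group(1), multi)}"', line
    )
```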

The script makeshowxrefs is used to add section numbers to cross-references. It is applied to the HTML master file and generates a Perl script of substitution commands; then that script is applied to the master file to generate a modified master in which

     "<A HREF=...>Title</A>"

is expanded to

     "<A HREF=...>Title (Section N)</A>"

(Actually, the expansion format differs depending on whether there's a right paren right after </A>; in that case it looks better to generate "<A HREF=...>Title, Section N</A>".) The modified master is then fed to subsequent scripts when we are generating an output format in which we want the section numbers to appear.
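The expansion rule, including the right-paren special case, looks roughly like this in Python. Unlike makeshowxrefs, which generates a Perl script of substitutions, this sketch rewrites the line directly. The name show_xrefs and the numbers mapping (link target to section number) are assumptions for illustration.

```python
import re

# A link, plus an optional right paren immediately after the closing tag.
LINK_RE = re.compile(r'(<A HREF="([^"]+)">)(.*?)</A>(\)?)')


def show_xrefs(line, numbers):
    """Append section numbers to link text on one line.

    `numbers` is a hypothetical dict mapping link targets to section numbers.
    """
    def expand(m):
        open_tag, target, title, paren = m.groups()
        number = numbers.get(target)
        if number is None:
            return m.group(0)  # unknown target: leave the link alone
        if paren:
            # Avoid nested parentheses when ")" follows the link.
            return f"{open_tag}{title}, Section {number}</A>{paren}"
        return f"{open_tag}{title} (Section {number})</A>"

    return LINK_RE.sub(expand, line)
```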

Note: currently, makeshowxrefs and the TOC-generating scripts do not recognize conditional inclusion or the INCLUDE command. Hence, section headings that have NAME anchors must not appear inside conditional segments or included files; otherwise section numbering will be wrong.


Chris
Last modified: $Date: 1996/12/17 16:11:54 $