15 - Dealing with Large Documents

Many classic works are available over the Internet now that their copyright has expired. Downloading such a work as a single large document is time-consuming, and a better strategy is to split it into smaller pieces. Other people have large collections of paper documents that they wish to make available electronically. While it is easy to scan these documents in, the size of the images makes them tedious to transfer over the network. Once again, time can be saved by avoiding the need to download the whole document at once. HTML+ makes it easy to do this with explicit or implicit links between the pieces that make up the complete document.

A book might have the following pieces:

Cover page
About the author
Copyright and publishing details
Table of contents
Foreword
Preface
Acknowledgement
One or more chapters
One or more appendices
Bibliography
Glossary
Index

Each of these could be held as a separate HTML+ subdocument. The table of contents should include hypertext links to the other parts of the book rather than page numbers. You can define a linear sequence through these subdocuments by including LINK elements with REL=NEXT and REL=PREVIOUS in each one. This allows readers to read through each part of the book in turn. You should also include LINKs to the table of contents (REL=CONTENTS) and to other key parts (using REL=BOOKMARK).
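As an illustration, the head of one chapter might carry explicit links like the following. This is only a sketch: the file names (chapter1.html, contents.html and so on) are invented for the example, and the exact document wrapper is omitted.

```html
<!-- chapter2.html: all file names here are hypothetical -->
<TITLE>Chapter 2</TITLE>
<!-- linear reading order through the book -->
<LINK REL="PREVIOUS" HREF="chapter1.html">
<LINK REL="NEXT" HREF="chapter3.html">
<!-- jump back to the table of contents -->
<LINK REL="CONTENTS" HREF="contents.html">
<!-- another key part of the book, offered as a bookmark -->
<LINK REL="BOOKMARK" HREF="glossary.html">
<H1>Chapter 2</H1>
```

Every subdocument needs its own set of such LINKs, which is the bookkeeping that implicit links are meant to avoid.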

Generating a hypertext version of the index may prove time-consuming, and it may be simpler to offer a full text search facility instead. The INDEX attribute can be used with many HTML+ elements to facilitate automatic generation of a conventional-looking index; see Section 13.

Implicit links are useful when you want to reuse a given subdocument in another independent book, and for non-HTML+ formats such as scanned page images. To define implicit links, you first need to create an HTML+ document such as a table of contents, and make each entry into a hypertext link using the <A> element with the attribute REL="SUBDOCUMENT". When the user follows one of these links, the browser scans the current document to locate the next <A> element with the subdocument relationship. If it reaches the end of the document, it looks for a LINK element with REL=NEXT. The result is used to imply a LINK element with REL=NEXT in the retrieved subdocument. A similar process is used to imply a LINK element with REL=PREVIOUS. The other links for the current document are simply inherited, i.e. any bookmarks, glossary or index links that hold for the table of contents also hold for the subdocument.

The browser then retrieves the subdocument and merges the implied LINKs with any that are given explicitly. If the user now presses the "Next" button on the toolbar (or menu), the browser follows the implicit link to the next subdocument. The browser then needs to look again at the parent document to find the new next subdocument. This mechanism is difficult to explain, but simple to write documents for. All that authors need to do is remember to include the subdocument relationship when defining hypertext links.
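A small worked example may make the mechanism easier to follow. The table of contents below is a sketch with invented file names; only the REL values are taken from the text above.

```html
<!-- contents.html: a hypothetical table of contents -->
<TITLE>Table of Contents</TITLE>
<!-- used once the subdocument links below run out -->
<LINK REL="NEXT" HREF="appendices.html">
<!-- inherited by every subdocument reached from this page -->
<LINK REL="BOOKMARK" HREF="glossary.html">
<H1>Table of Contents</H1>
<UL>
<LI><A REL="SUBDOCUMENT" HREF="preface.html">Preface</A>
<LI><A REL="SUBDOCUMENT" HREF="chapter1.html">Chapter 1</A>
<LI><A REL="SUBDOCUMENT" HREF="chapter2.html">Chapter 2</A>
</UL>
```

Under the procedure as described, following the link to chapter1.html would imply REL=NEXT pointing at chapter2.html and REL=PREVIOUS pointing at preface.html, and the bookmark to glossary.html would be inherited. Following the link to chapter2.html finds no later subdocument link, so the implied REL=NEXT would come from the LINK element and point at appendices.html. None of the three subdocuments needs to contain any LINK elements of its own.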

For a hundred page scanned document where each page is held as a separate file, the "table of contents" is going to be pretty dull, and there is little point creating it as an HTML+ node. Instead, you should use an HTTP server which passes the missing LINK elements as header fields for each page image. The suggested representation for these header fields uses the same attributes and syntax as the LINK element:

WWW-Link: REL="Next" HREF="http://www.w3.org/...."

There could be several WWW-Link: headers, one for each implied LINK. This idea puts the burden on the server to supply such links as appropriate to each requested document.
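For instance, the response for one page in the middle of a scanned document might look something like the sketch below. The host name, paths and status line are assumptions made for illustration; only the WWW-Link: syntax follows the suggestion above.

```http
HTTP/1.0 200 OK
Content-Type: image/gif
WWW-Link: REL="Previous" HREF="http://example.org/scans/page042.gif"
WWW-Link: REL="Next" HREF="http://example.org/scans/page044.gif"
```

The browser would treat these two headers as if the page image had contained the corresponding LINK elements with REL=PREVIOUS and REL=NEXT.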

HTML+ Discussion Document - November 8, 1993
