Document formats
The question of the format of the
contents of a node is independent of the format of all the
management information (except for the format of the anchor
position within the node content). Therefore, the hypertext
system can be largely defined without specifying the node format.
However, agreement must be reached between client and server
about how they exchange content information. Many hypertext
systems qualify as "hypermedia" systems because they handle media
other than plain text. Examples are graphics, video and sound
clips, object-oriented graphics definitions, marked-up text, etc.
Most hypermedia
systems on the market today have the same application program
responsible for the hypertext navigation and for the browsing. It
would be safer to separate these features as much as possible:
otherwise, in defining a universal hypertext system, one is
burdened with defining a universal multimedia browser. This would
certainly not stand the test of time. Node content must be left
free to evolve. This implies that format conversion facilities
must be available to allow simple browsers to access data which
is stored in a sophisticated format. Such conversion facilities
tend to exist in many applications, though not, in general, in
hypertext applications.
The format of the content of a node should be as flexible as
possible. Having more than one format is not useful from the
user's point of view -- only from the point of view of an
evolving system. I suggest the following rules:
1. Basic formats
There is a set of formats which every
client must be able to handle. These include 80-column text and
basic hypertext (HTML).
2. Conversion
A server providing a format which is not in
the basic set of formats required for a client must be able to
generate some sort of conversion for a client which cannot handle
it (if necessary, merely an apology for non-conversion, as in the
case of graphics to text). This ensures universal readability the
world over.
3. Negotiation
For every format, there must be a set of
other possible formats which the server can convert it into, and
the most desirable format is selected by negotiation between the
two parties. The negotiation must take into account:
- the expected translation time, including current load
factors
- the expected data degradation
- the expected transmission time (?!!)
One could assume that these times will be roughly proportional
to the length of the document, or at least linear in it.
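As a rough illustration of rule 3, the negotiation could be sketched as a cost comparison over the formats the client accepts. Everything named here (the Offer class, the negotiate function, the per-kilobyte rates and the degradation weight) is hypothetical and chosen only for the example. The sketch assumes that translation and transmission times are simply proportional to document length and that degradation can be expressed as a single number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """One format the server could deliver (all figures are illustrative)."""
    fmt: str
    translation_s_per_kb: float   # expected conversion time per kilobyte
    transmission_s_per_kb: float  # expected transfer time per kilobyte
    degradation: float            # 0.0 = lossless, 1.0 = content entirely lost


def negotiate(offers, client_formats, size_kb,
              load_factor=1.0, degradation_weight=10.0):
    """Return the acceptable offer with the lowest combined cost, or None.

    The cost model is an assumption: times are linear in document size,
    translation time is scaled by the server's current load, and
    degradation is turned into a time-like penalty by a fixed weight.
    """
    acceptable = [o for o in offers if o.fmt in client_formats]
    if not acceptable:
        return None  # the server would fall back to its default conversion

    def cost(offer):
        translation = offer.translation_s_per_kb * size_kb * load_factor
        transmission = offer.transmission_s_per_kb * size_kb
        return translation + transmission + degradation_weight * offer.degradation

    return min(acceptable, key=cost)


# Hypothetical offers for a document stored as Postscript.
offers = [
    Offer("postscript", 0.0, 0.5, 0.0),
    Offer("plain text", 0.02, 0.05, 0.4),
    Offer("html", 0.05, 0.08, 0.1),
]

small = negotiate(offers, {"plain text", "html"}, size_kb=10)
large = negotiate(offers, {"plain text", "html"}, size_kb=100)
```

With these made-up figures the choice depends on document length: for the short document the degradation penalty dominates, while for the long one the cheaper translation and transmission win.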
Application-specific node formats (e.g. physics event) would
allow specialized browsers to perform local processing. This is a
natural extension of the hierarchy of node formats. I would
suggest one stick to the rule that a server providing such a type
of data must provide some default conversion to a standardized
view.
An index or a keyword could be a specific node format which
would be manageable by a browser.
Examples
Examples of rich text formats which exist
already at CERN are as follows, with, in brackets after each,
other formats into which it might be convertible:
- SGML (TeX, Postscript, plain text)
- Bookmaster (Postscript, I3812, plain text)
- TeX (DVI, plain text)
- DVI (IBM 3812, Postscript, etc)
- Microsoft RTF (Postscript, plain text, NeXT "WriteNow") -
See Specs
- Postscript, Editable Postscript (IBM 3812 bitmap)
- plain text
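The list above can be read as a directed graph of possible conversions. The following is a minimal sketch, not a description of any existing CERN tool, of how a server might search that graph for a conversion chain. The table CONVERSIONS and the function conversion_path are hypothetical names. Treating "I3812" and "IBM 3812 bitmap" as the same IBM 3812 target, merging "Postscript" and "Editable Postscript" into one entry, and allowing conversions to be chained are all assumptions of the example.

```python
from collections import deque

# Direct conversions, transcribed from the list above.
CONVERSIONS = {
    "SGML": ["TeX", "Postscript", "plain text"],
    "Bookmaster": ["Postscript", "IBM 3812", "plain text"],
    "TeX": ["DVI", "plain text"],
    "DVI": ["IBM 3812", "Postscript"],
    "Microsoft RTF": ["Postscript", "plain text", "WriteNow"],
    "Postscript": ["IBM 3812"],
    "plain text": [],
}


def conversion_path(source, target):
    """Return the shortest chain of formats from source to target, or None.

    Uses a breadth-first search, so the chain found has the fewest
    conversion steps; it does not weigh time or data degradation.
    """
    if source == target:
        return [source]
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        for nxt in CONVERSIONS.get(path[-1], []):
            if nxt in seen:
                continue
            if nxt == target:
                return path + [nxt]
            seen.add(nxt)
            queue.append(path + [nxt])
    return None


tex_to_ps = conversion_path("TeX", "Postscript")
text_to_sgml = conversion_path("plain text", "SGML")
```

In this sketch TeX reaches Postscript only by way of DVI, and nothing can be converted back from plain text. Each extra step in a chain may add degradation, so a real server would presumably prefer direct conversions where they exist.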
When a server (or browser) is obliged to perform a
conversion from one format to another, one imagines that the
result would be cached so that, if the same conversion were
needed later, it would be available more rapidly. Format
conversion, like notification of new material, is something which
can be triggered either by the writer or by the browser. In many
cases, a conversion from, say, SGML into Postscript or plain text
would be made immediately on entry of the new material, and kept
until the source has been updated (see caching).
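A cache of this kind might be sketched as follows. The class ConversionCache, its get method and the idea of a per-node version number are assumptions made for the illustration; the text above only requires that a converted copy be kept until the source is updated.

```python
class ConversionCache:
    """Keep converted copies of nodes until their source changes.

    `convert` is any callable taking (source_text, target_format) and
    returning the converted content; it stands in for a real converter.
    """

    def __init__(self, convert):
        self._convert = convert
        # (node_id, target_format) -> (source_version, converted content)
        self._store = {}

    def get(self, node_id, source_version, source_text, target_format):
        key = (node_id, target_format)
        cached = self._store.get(key)
        if cached is not None and cached[0] == source_version:
            return cached[1]  # source unchanged: reuse the earlier result
        converted = self._convert(source_text, target_format)
        self._store[key] = (source_version, converted)
        return converted


# A stand-in converter that records each time it is actually invoked.
calls = []


def fake_convert(text, target_format):
    calls.append(target_format)
    return "[" + target_format + "] " + text


cache = ConversionCache(fake_convert)
first = cache.get("node-1", 1, "Hello", "plain text")
second = cache.get("node-1", 1, "Hello", "plain text")        # served from cache
third = cache.get("node-1", 2, "Hello again", "plain text")   # source updated
```

Calling get at the moment new material is entered corresponds to a conversion triggered by the writer; calling it on first request corresponds to one triggered by the browser.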
©TimBL 1991