The Languages of Thot

Vincent Quint

Translated from French by Ethan Munson

Version of January 14, 2000

© 1996 INRIA


The document model of Thot

All of the services which Thot provides to the user are based on the system's internal document representation. This representation is itself derived from the document model which underlies Thot. The model is presented here, prior to the description of the languages which permit the generic specification of documents.

The logical structure of documents

The document model of Thot is primarily designed to allow the user to operate on those entities which s/he has in mind when s/he works on a document. The model makes no assumptions about the nature of these entities. It is essentially these logical entities, such as paragraphs, sections, chapters, notes, titles, and cross-references which give a document its logical structure.

Because of this model, the author can divide the document into chapters, giving each one a title. The content of these chapters can be further divided into sections, subsections, etc. The text is organized into successive paragraphs, according to the content. In the writing phase, the lines, pages, margins, spacing, fonts, and character styles are not very important. In fact, if the system requires documents to be written in these terms, it gets in the way. So, Thot's model is primarily based on the logical aspect of documents. Creating a model of this type essentially requires defining the entities that can appear in documents and the relations between those entities.

The choice of entities to include in the model can be subtle. Some documents require chapters, while others only need various levels of sections. Certain documents contain appendices, others don't. In different documents the same logical entity may go by different names (e.g. ``Conclusion'' and ``Summary''). Certain entities which are absolutely necessary in some documents, such as clauses in a contract or the address of the recipient in a letter, are useless in most other cases.

The differences between documents result from more than just the entities that appear in them, but also from the relationships between these entities and the ways that they are linked. In certain documents, notes are spread throughout the document, for example at the bottom of the page containing the cross-reference to them, while in other documents they are collected at the end of each chapter or even at the end of the work. As another example, the introduction of some documents can contain many sections, while in other documents, the introduction is restricted to be a short sequence of paragraphs.

All of this makes it unlikely that a single model can describe any document at a relatively high level. It is obviously tempting to make up a list of widely used entities, such as chapters, sections, paragraphs, and titles, and then map all other entities onto the available choices. In this way, an introduction can be supported as a chapter and a contract clause supported as a paragraph or section. However, in trying to widen the range of usage of certain entities, their meaning can be lost and the power of the model reduced. In addition, while this widening partially solves the problem of choosing entities, it does not solve the problem of their organization: when a chapter must be composed of sections, how does one indicate that an introduction has none when it is merely another chapter? One solution is to include introductions in the list of supported entities. But then, how does one distinguish those introductions which are allowed to have sections from those which are not? Perhaps this could be done by defining two types of introduction. Clearly, this approach risks an infinite expansion of the list of widely used entities.

Generic and specific structures

Thus, it is apparently impossible to construct an exhaustive inventory of all those entities which are necessary and sufficient to precisely describe any document. It also seems impossible to specify all possible arrangements of these entities in a document. This is why Thot uses a meta-model instead, which permits the description of numerous models, each one describing a class of documents.

A class is a set of documents having very similar structure. Thus, the collection of research reports published by a laboratory constitutes a class; the set of commercial proposals by the sales department of a company constitutes another class; the set of articles published by a journal constitutes a third class. Clearly, it is not possible to enumerate every possible document class. It is also clear that new document classes must be created to satisfy new needs and applications.

To give a more rigorous definition of classes, we must introduce the ideas of generic structure and specific structure. Each document has a specific structure which organizes the various parts which comprise it. We illustrate this with the help of a simple example comparing two reports, A and B (see the figure below). Report A contains an introduction followed by three chapters and a conclusion. The first chapter contains two sections, the second contains three, and the third has none. That is the specific structure of document A. Similarly, the structure of document B is: an introduction, two chapters, a conclusion; Chapter 1 has three sections while Chapter 2 has four. The specific structures of these two documents are thus different.


        Report A                 Report B
             Introduction              Introduction
             Chapter 1                 Chapter 1
                  Section 1.1               Section 1.1
                  Section 1.2               Section 1.2
             Chapter 2                      Section 1.3
                  Section 2.1          Chapter 2
                  Section 2.2               Section 2.1
                  Section 2.3               Section 2.2
             Chapter 3                      Section 2.3
             Conclusion                     Section 2.4
                                       Conclusion

Two specific structures


The generic structure defines the ways in which specific structures can be constructed. It specifies how to generate specific structures. The reports A and B, though different, are constructed in accordance with the same generic structure, which specifies that a report contains an introduction followed by a variable number of chapters and a conclusion, with each chapter containing a variable number of sections.

There is a one-to-one correspondence between a class and a generic structure: all the documents of a class are constructed in accordance with the same generic structure. Hence the definition of the class: a class is a set of documents whose specific structure is constructed in accordance with the same generic structure. A class is characterized by its generic structure.

Thus, a generic structure can be considered to be a model at the level which interests us, but only for one class of documents. When the definition is limited to a single class of documents, it is possible to define a model which does a good job of representing the documents of the class, including the necessary entities and unencumbered by useless entities. The description of the organization of the documents in the class can then be sufficiently precise.

Logical structure and physical structure

Generic structures only describe the logical organization of documents, not their physical presentation on a screen or on sheets of paper. However, for a document to be displayed or printed, its graphic presentation must be taken into account.

An examination of current printed documents shows that the details of presentation essentially serve to bring out their logical structure. Outside of some particular domains, notably advertising, the presentation is rarely independent of the logical organization of the text. Moreover, the art of typography consists of enhancing the organization of the text being set, without catching the eye of the reader with overly pronounced effects. Thus, italic and boldface type are used to emphasize words or expressions which have greater significance than the rest of the text: keywords, new ideas, citations, book titles, etc. Other effects highlight the organization of the text: vertical space, margin changes, page breaks, centering, possibly combined with changes in the shape or weight of the characters. These effects serve to indicate the transitions between paragraphs, sections, or chapters: an object's level in the logical structure of the document is shown by the markedness of the effects.

Since the model permits the description of all of the logical structure of the document, the presentation can be derived from the model without being embedded in the document itself. It suffices to use the logical structure of the document to make the desired changes in its presentation: changes in type size, type style, spacing, margin, centering, etc.

Just as one cannot define a unique generic logical structure for all document classes, one cannot define universal presentation rules which can be applied to all document classes. For certain types of documents the chapter titles will be centered on the page and printed in large, bold type. For other documents, the same chapter titles will be printed in small, italic type and aligned on the left margin.

Therefore, it is necessary to base the presentation specifications for documents on their class. Such a specification can be very fine-grained, because the presentation is expressed in terms of the entities defined in the generic logical structure of the class. Thus, it is possible to specify a different presentation for chapter titles and section titles, and similarly to specify different presentations for section titles according to their level in the section hierarchy. The set of rules which specify the presentation of all the elements defined in a generic logical structure is called a generic presentation.

There are several advantages derived from having a presentation linked to the generic structure and described by a generic presentation. Homogeneity is the first. Since every document in a class corresponds to the same generic logical structure, a homogeneous presentation for different documents of the same class can be assured by applying the same generic presentation to all documents of the class. Homogeneity of presentation can also be found among the entities of a single document: every section heading will be presented in the same way, the first line of every paragraph of the same type will have the same indentation, etc.

Another advantage of this approach to presentation is that it facilitates changes to the graphical aspect of documents. A change to the generic presentation rules attached to each type of entity will alter the presentation of the entire document, and will do so homogeneously. In this case, the internal homogeneity of the class is no longer assured, but the way to control it is simple. It suffices to adopt a single generic presentation for the entire class.

If the presentation of the class does not have to be homogeneous, then the appearance of the document can be adapted to the way it will be used or to the device used to render it. This quality is sufficient to allow the existence of many generic presentations for the same document class. By applying one or the other of these presentations to it, the document can be seen under different graphical aspects. It must be emphasized that this type of modification of the presentation is not a change to the document itself (in its specific logical structure or its content), but only in its appearance at the time of editing or printing.
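As a rough illustration of this separation, the following Python sketch applies two generic presentations to the same document. The rule tables, property names, and the `render` function are assumptions made for the example; they do not reflect how Thot actually stores or applies presentation rules.

```python
# Hypothetical generic presentations: element type -> presentation properties.
PRINT_PRESENTATION = {
    "ChapterTitle": {"size": 18, "weight": "bold", "align": "center"},
    "Paragraph": {"size": 10, "weight": "normal", "align": "justify"},
}
SCREEN_PRESENTATION = {
    "ChapterTitle": {"size": 14, "weight": "normal", "align": "left"},
    "Paragraph": {"size": 12, "weight": "normal", "align": "left"},
}

# The document holds only typed elements and content, no presentation.
document = [
    ("ChapterTitle", "The document model"),
    ("Paragraph", "All services are based on the internal representation."),
    ("ChapterTitle", "The S language"),
]

def render(doc, presentation):
    """Pair each element's content with the rules for its type."""
    return [(text, presentation[kind]) for kind, text in doc]

printed = render(document, PRINT_PRESENTATION)
on_screen = render(document, SCREEN_PRESENTATION)
```

In this sketch every chapter title receives the same rules within one rendering, and switching from `PRINT_PRESENTATION` to `SCREEN_PRESENTATION` changes the appearance without touching `document`.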

Document structures and object structures

So far, we have only discussed the global structure of documents and have not considered the contents found in that structure. We could limit ourselves to purely textual contents by assuming that a title or a paragraph contains a simple linear text. But this model would be too restrictive. In fact, certain documents contain not only text, but also contain tables, diagrams, photographs, mathematical formulas, and program fragments. The model must permit the representation of such objects.

Just as with the whole of the document, the model takes into account the logical structure of objects of this type. Some are clearly structured, others are less so. Logical structure can be recognized in mathematical formulas, in tables, and in certain types of diagrams. On the other hand, it is difficult to define the structure of a photograph or of some drawings. But in any case, it does not seem possible to define one unique structure which can represent every one of these types of objects. The approach taken in the definition of meta-structure and document classes also applies to objects. Object classes can be defined which put together objects of similar type, constructed from the same generic logical structure.

Thus, a class of mathematical formulas can be defined and have a generic logical structure associated with it. But even if a single generic structure can represent a sufficient variety of mathematical formulas, for other objects with less rigorous structure, multiple classes must be defined. As for documents, using multiple classes assures that the model can describe the full range of objects to be presented. It also permits the system to support objects which were not initially anticipated. Moreover, this comment applies equally to mathematics: different classes of formulas can be described depending on the domain of mathematics being described.

Since objects have the same level of logical representation as documents, they gain the same advantages. In particular, it is possible to define the presentation separately from the objects themselves and attach it to the class. Thus, as for documents, objects of the same type have a uniform presentation and the presentation of every object in a given class can be changed simply by changing the generic presentation of the class. Another advantage of using this document model is that the system does not bother the user with the details of presentation, but rather allows the user to concentrate on the logical aspect of the document and the objects.

It is clear that the documents in a class do not necessarily use the same classes of objects: one technical report will contain tables while another report will have no tables but will use mathematical formulas. The usable object classes are not always mentioned in a limiting way in the generic logical structure of documents. Rather, they can be chosen freely from a large set, independent of the document class.

Thus, the object classes will be made commonplace and usable in every document. The notion of ``object'' can be enlarged to include not only non-textual elements, but also certain types of textual elements which can appear in practically every document, whatever their class. Among these textual elements, one can mention enumerations, descriptions, examples, quotations, even paragraphs.

Thus, the document model is not a single, general model describing every type of document in one place. Rather, it is a meta-model which can be used to describe many different models each of which represents either a class of similar documents or a class of similar objects which every document can include.


The S language

Document meta-structure

Since the concept of meta-structure is well suited to the task of describing documents at a high level of abstraction, this meta-structure must be precisely defined. Toward that end this section first presents the basic elements from which documents and structured objects are composed and then specifies the ways in which these basic elements are assembled into structures representing complete documents and objects.

The basic types

At the lowest level of a document's structure, the first atom considered is the character. However, since characters are seldom isolated, usually appearing as part of a linear sequence, and in order to reduce the complexity of the document structure, character strings are used as atoms and consecutive characters belonging to the same structural element are grouped in the same character string.

If the structure of a document is not refined to go down to the level of words or phrases, the contents of a simple paragraph can be considered to be a single character string. On the other hand, the title of a chapter, the title of the first section of that chapter, and the text of the first paragraph of that section constitute three different character strings, because they belong to distinct structural elements.

If, instead, a very fine-grained representation for the structure of a document is sought, character strings could be defined to contain only a single word, or even just a single character. This is the case, for example, in programs, for which one wants to retain a structure very close to the syntax of the programming language. In this case, an assignment statement initializing a simple variable to zero would be composed of two structural elements, the identifier of the variable (a short character string) and the assigned value (a string of a single character, `0').

The character string is not the only atom necessary for representing those documents that interest us. It suffices for purely textual documents, but as soon as the non-textual objects which we have considered arise, there must be other atoms; the number of objects which are to be represented determines the number of types of atoms that are necessary.

Primitive graphical elements are used for tables and figures of different types. These elements are simple geometric shapes like horizontal or vertical lines, which are sufficient for tables, or even oblique lines, arrows, rectangles, circles, polygons, and curves for use in figures. From these elements and character strings, graphical objects and tables can be constructed.

Photographs, though having very little structure, must still appear in documents. They are supported by picture elements, which are represented as matrices of pixels.

Finally, mathematical notations require certain elements which are simultaneously characters and graphical elements: the symbols. Radicals, integration signs, and large parentheses are examples of this type of atom. The size of each of these symbols is determined by its environment, that is to say, by the expression to which it is attached.

To summarize, the primitive elements which are used in the construction of documents and structured objects are:

  • character strings,
  • graphical elements,
  • pictures,
  • and mathematical symbols.

Constructed elements

A document is obviously formed from primitive elements. But the model of Thot also provides higher-level elements. Thus, in a document composed of several chapters, each chapter is an element, and in the chapters each section is also an element, and so on. A document is thus an organized set of elements.

In a document there are different sorts of elements. Each element has a type which indicates the role of the element within the document as a whole. Thus, we have, for example, the chapter and section types. The document is made up of typed elements: elements of the type chapter and elements of the type section, among others, but also character string elements and graphical elements: the primitive elements are typed elements just as well. At the other extreme, the document itself is also considered to be a typed element.

The important difference between the primitive elements and the other elements of the document is that the primitive elements are atoms (they cannot be decomposed), whereas the others, called constructed elements, are composed of other elements, which can either be primitive elements or constructed elements. A constructed element of type chapter (or more simply, ``a chapter'') is composed of sections, which are also constructed elements. A paragraph, a constructed element, can be made up of character strings, which are primitive elements, and of equations, which are constructed elements.

A document is also a constructed element. This is an important point. In particular, it allows a document to be treated as part of another document, and conversely, permits a part of a document to be treated as a complete document. Thus, an article presented in a journal is treated by its author as a document in itself, while the editor of the journal considers it to be part of an issue. A table or a figure appearing in a document can be extracted and treated as a complete document, for example to prepare transparencies for a conference.

These thoughts about types and constructed elements apply just as well to objects as they do to documents. A table is a constructed element made up of other constructed elements, rows and columns. A row is formed of cells, which are also constructed elements which contain primitive elements (character strings) and/or constructed elements like equations.
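The difference between atoms and constructed elements can be sketched as two small Python classes. The class names, the field names, and the four primitive type names below are assumptions chosen for the illustration, not identifiers taken from Thot.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Assumed names for the four primitive types listed earlier.
PRIMITIVE_TYPES = {"Text", "Graphics", "Picture", "Symbol"}

@dataclass
class Primitive:
    """An atom: it has a type and content but cannot be decomposed."""
    kind: str
    content: str = ""

    def __post_init__(self):
        if self.kind not in PRIMITIVE_TYPES:
            raise ValueError(f"{self.kind!r} is not a primitive type")

@dataclass
class Constructed:
    """A typed element composed of primitive or constructed elements."""
    kind: str
    children: List[Union["Primitive", "Constructed"]] = field(default_factory=list)

def count_atoms(element):
    """Count the primitive elements found anywhere under an element."""
    if isinstance(element, Primitive):
        return 1
    return sum(count_atoms(child) for child in element.children)

# A one-row table: the second cell mixes a character string and an equation.
equation = Constructed("Equation", [Primitive("Symbol", "radical")])
row = Constructed("Row", [
    Constructed("Cell", [Primitive("Text", "42")]),
    Constructed("Cell", [Primitive("Text", "x"), equation]),
])
table = Constructed("Table", [row])
```

Because a table is an ordinary constructed element, the same `count_atoms` function works on the whole table, on a row, or on a single cell, which mirrors the idea that any part of a document can be treated as a document in its own right.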

Logical structure constructors

Having defined the primitive elements and the constructed elements, it is now time to define the types of organization which allow the building of structures. For this, we rely on the notion of the constructor. A constructor defines a way of assembling certain elements in a structure. It resides at the level of the meta-structure: it does not describe the existing relations in a given structure, but rather defines how elements are assembled to build a structure that conforms to a model.

In defining the overall organization of documents, the first two constructors considered are the aggregate and the list.

Aggregate and List

The aggregate constructor is used to define constructed element types which are collections of a given number of other elements. These collections may or may not be ordered. The elements may be either constructed or primitive and are specified by their type. A report (that is, a constructed element of the report type) has an aggregate structure. It is formed from a title, an author's name, an introduction, a body, and a conclusion, making it a collection of five element types. This type of constructor is found in practically every document, and generally at several levels in a document.

The list constructor is used to define constructed elements which are ordered sequences of elements (constructed or primitive) having the same type. The minimum and maximum numbers of elements for the sequence can be specified in the list constructor or the number of elements can be left unconstrained. The body of a report is a list of chapters and is typically required to contain a minimum of two chapters (is a chapter useful if it is the only one in the report?). The chapter itself can contain a list of sections, each section containing a list of paragraphs. In the same way as the aggregate, the list is a very frequently used constructor in every type of document. However, these two constructors are not sufficient to describe every document structure; thus other constructors supplement them.
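A minimal Python sketch of these two constructors follows. It is not the S language and not Thot's implementation: the class names `Aggregate` and `ListOf`, the `accepts` method, and the element type names are all assumptions made for the example. Each constructor only checks the types of the immediate children of an element.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ListOf:
    """Ordered sequence of elements that all have the same type."""
    item_type: str
    minimum: int = 0
    maximum: Optional[int] = None  # None means unconstrained

    def accepts(self, child_types):
        n = len(child_types)
        if n < self.minimum:
            return False
        if self.maximum is not None and n > self.maximum:
            return False
        return all(t == self.item_type for t in child_types)

@dataclass(frozen=True)
class Aggregate:
    """Collection of a given number of elements, ordered or not."""
    component_types: Tuple[str, ...]
    ordered: bool = True

    def accepts(self, child_types):
        if self.ordered:
            return tuple(child_types) == self.component_types
        return sorted(child_types) == sorted(self.component_types)

# The report example from the text: five components, body of >= 2 chapters.
REPORT = Aggregate(("Title", "Author", "Introduction", "Body", "Conclusion"))
BODY = ListOf("Chapter", minimum=2)
```

For instance, `BODY.accepts(["Chapter"])` is false because of the minimum of two chapters, while an unordered aggregate such as `Aggregate(("A", "B"), ordered=False)` accepts its components in either order.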

Choice, Schema, and Unit

The choice constructor is used to define the structure of an element type for which one alternative is chosen from several possibilities. Thus, a paragraph can be either a simple text paragraph, or an enumeration, or a citation.

The choice constructor indicates the complete list of possible options, which can be too restrictive in certain cases, the paragraph being one such case. Two constructors, unit and schema, address this inconvenience. They allow more freedom in the choice of an element type. If a paragraph is defined by a schema constructor, it is possible to put in the place of a paragraph a table, an equation, a drawing or any other object defined by another generic logical structure. It is also possible to define a paragraph as a sequence of units, which could be character strings, symbols, or pictures. The choice constructor alone defines a generic logical structure that is relatively constrained; in contrast, using units and schemas, a very open structure can be defined.

The schema constructor represents an object defined by a generic logical structure chosen freely from among those available.

The unit constructor represents an element whose type can be either a primitive type or an element type defined as a unit in the generic logical structure of the document, or in another generic logical structure used in the document. Such an element may be used in document objects constructed according to other generic structures.

Thus, for example, if a cross-reference to a footnote is defined in the generic logical structure ``Article'' as a unit, a table (an object defined by another generic structure) can contain cross-references to footnotes, when they appear in an article. In another type of document, a table defined by the same generic structure can contain other types of elements, depending on the type of document into which the table is inserted. All that is needed is to declare, in the generic structure for tables, that the contents of cells are units. In this way, the generic structures of objects are shared among different types of documents, and the objects are able to adapt themselves to the environment into which they are inserted.
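The table example can be approximated in Python as follows. This is a simplified sketch: the set names, the type names `FootnoteRef` and `ClauseRef`, and the function `allowed_in_cell` are hypothetical, and the sketch ignores units declared by other generic structures used in the same document.

```python
# Assumed names for the primitive types.
PRIMITIVES = {"Text", "Symbol", "Picture", "Graphics"}

# Hypothetical generic structures, reduced to the element types
# that each one declares as units.
ARTICLE_UNITS = {"FootnoteRef"}
CONTRACT_UNITS = {"ClauseRef"}

def allowed_in_cell(host_units):
    """The table structure only says that cells contain units, so the
    allowed types are the primitives plus the host document's units."""
    return PRIMITIVES | host_units

in_article = allowed_in_cell(ARTICLE_UNITS)
in_contract = allowed_in_cell(CONTRACT_UNITS)
```

The same table definition thus admits footnote cross-references when it is used inside an article and a different set of elements inside another class of document, without the table structure naming either.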

Reference and Inclusion

The reference is used to define document elements that are cross-references to other elements, such as a section, a chapter, a bibliographic citation, or a figure. The reference is bidirectional. It can be used to access both the element being cross-referenced and each of the elements which make use of the cross-reference.

References can be either internal or external. That is, they can designate elements which appear in the same document or in another document.

The inclusion constructor is a special type of reference. Like the reference, it is an internal or external bidirectional link, but it is not a cross-reference. This link represents the ``live'' inclusion of the designated element; it accesses the most recent version of that element and not a ``dead'' copy, fixed in the state in which it was found at the moment the copy was made. As soon as an element is modified, all of its inclusions are automatically brought up to date. It must be noted that, in addition to inclusion, Thot permits the creation of ``dead'' copies.
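The difference between a live inclusion and a dead copy can be sketched in Python. The classes `Element` and `Inclusion` and the attribute `included_by` are assumptions made for the illustration; Thot's actual data structures are not described in this document.

```python
class Element:
    def __init__(self, kind, content):
        self.kind = kind
        self.content = content
        self.included_by = []  # back-links: the link is bidirectional

class Inclusion:
    """A live link: the content is always read from the target."""
    def __init__(self, target):
        self.target = target
        target.included_by.append(self)

    @property
    def content(self):
        return self.target.content

entry = Element("BibEntry", "Quint, 1996")
live = Inclusion(entry)                        # live inclusion
dead = Element(entry.kind, entry.content)      # dead copy, fixed at copy time

# Modifying the original is visible through the inclusion only.
entry.content = "Quint, 2000"
```

After the modification, `live.content` reflects the new text while `dead.content` keeps the old one, and `entry.included_by` gives access from the included element back to every inclusion of it.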

There are three types of inclusions: inclusions with full expansion, inclusions with partial expansion, and inclusions without expansion. During editing, inclusions without expansion are represented on the screen by the name of the included document, in a special color, while inclusions with expansion (full or partial) are represented by a copy (full or partial) of the included element (also in a special color). The on-screen representation of a partial inclusion is a ``skeleton'' image of the included document.

Inclusion with full expansion can be used to include parts of the same document or of other documents. Thus, it can be either an internal or an external link. It can be used to include certain bibliographic entries of a scientific article in another article, or to copy part of a mathematical formula into another formula of the same document, thus assuring that both copies will remain synchronized.

Inclusion without expansion or with partial expansion is used to include complete documents. It is always an external link. It is used primarily to divide very large documents into sub-documents that are easier to manipulate, especially when there are many authors. So, a book can include some chapters, where each chapter is a different document which can be edited separately. When viewing the book on the screen, it might be desirable to see only the titles of the chapters and sections. This can be achieved using inclusion with partial expansion.

During printing, inclusions without expansion or with partial expansion can be represented either as they were shown on the screen or by a complete (and up-to-date) copy of the included element or document.

The inclusion constructor, whatever its type, respects the generic structure: only those elements authorized by the generic structure can be included at a given position in a document.

Mark pairs

It is often useful to delimit certain parts of a document independently from the logical structure. For example, one might wish to attach some information (in the form of an attribute) or a particular treatment to a group of words or a set of consecutive paragraphs. Mark pairs are used to do this.

Mark pairs are elements which are always paired and are terminals in the logical structure of the document. Their position in the structure of the document is defined in the generic structure. It is important to note that when the terminals of a mark pair are extensions (see the next section), they can be used quite freely.

Restrictions and Extensions

The primitive types and the constructors presented so far permit the definition of the logical structure of documents and objects in a rigorous way. But this definition can be very cumbersome in certain cases, notably when trying to constrain or extend the authorized element types in a particular context. Restrictions and extensions are used to cope with these cases.

A restriction associates with a particular element type A, a list of those element types which elements of type A may not contain, even if the definition of type A and those of its components authorize them otherwise. This simplifies the writing of generic logical structures and allows limitations to be placed, when necessary, on the choices offered by the schema and unit constructors.

Extensions are the inverse of restrictions. An extension associates with a particular element type A a list of those element types which elements of type A may contain, even if the definition of type A and those of its components do not authorize them otherwise.
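The combined effect of restrictions and extensions can be sketched in Python. Everything here is an assumption made for the example: the dictionaries, the element type names, the function `allowed_children`, and in particular the choice that a restriction wins when a type is both extended and restricted, which the text does not specify.

```python
# Hypothetical generic structure: type -> types its definition allows inside.
GENERIC = {
    "Note": {"Paragraph"},
    "Paragraph": {"Text", "NoteRef", "Emphasis"},
    "Emphasis": {"Text", "NoteRef"},
}
# Notes may not contain note references, at any depth.
RESTRICTIONS = {"Note": {"NoteRef"}}
# Comments are permitted anywhere inside a note.
EXTENSIONS = {"Note": {"Comment"}}

def allowed_children(element_type, ancestors):
    """Types allowed under element_type, given its enclosing types."""
    context = list(ancestors) + [element_type]
    allowed = set(GENERIC.get(element_type, ()))
    for t in context:
        allowed |= EXTENSIONS.get(t, set())
    for t in context:  # assumption: restrictions override extensions
        allowed -= RESTRICTIONS.get(t, set())
    return allowed
```

A paragraph at the top level keeps everything its definition allows, whereas `allowed_children("Paragraph", ["Note"])` loses `NoteRef` and gains `Comment`, without the definition of `Paragraph` having to be duplicated for use inside notes.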

Summary

Thus, four constructors are used to construct a document:

  • the aggregate constructor (ordered or not),
  • the list constructor,
  • the choice constructor and its extensions, the unit and schema constructors,
  • the reference constructor and its variant, the inclusion.

These constructors are also sufficient for objects. Thus, these constructors provide a homogeneous meta-model which can describe both the organization of the document as a whole and that of the various types of objects which it contains. After presenting the description language for generic structures, we will present several examples which illustrate the appropriateness of the model.

The first three constructors (aggregate, list and choice) lead to tree-like structures for documents and objects, the objects being simply the subtrees of the tree of a document (or even of other objects' subtrees). The reference constructor introduces other, non-hierarchical, relations which augment those of the tree: when a paragraph makes reference to a chapter or a section, that relation leaves the purely tree-like structure. Moreover, external reference and inclusion constructors permit the establishment of links between different documents, thus creating a hypertext structure.

Associated Elements

Thanks to the list, aggregate and choice constructors, the organization of the document is specified rigorously, using constructed and primitive elements. But a document is made up of more than just its elements; it clearly also contains links between them. There exist elements whose position in the document's structure is not determinable. This is notably the case for figures and notes. A figure can be designated at many points in the same document and its place in the physical document can vary over the life of the document without any effect on the meaning or clarity of the document. At one time, it can be placed at the end of the document along with all other figures. At another time, it can appear at the top of the page which follows the first mention of the figure. The figures can be dispersed throughout the document or can be grouped together. The situation is similar for notes, which can be printed at the bottom of the page on which they are mentioned or assembled together at the end of the chapter or even the end of the work. Of course, this brings up questions of the physical position of elements in documents that are broken into pages, but this reflects the structural instability of these elements. They cannot be treated the same way as elements like paragraphs or sections, whose position in the structure is directly linked to the semantics of the document.

Those elements whose position in the structure of the document is not fixed, even though they are definitely part of the document, are called associated elements. Associated elements are themselves structures, which is to say that their content can be organized logically by the constructors from primitive and constructed elements.

It can happen that the associated elements are totally disconnected from the structure of the document, as in a commentary or appraisal of the entire work. But more often, the associated elements are linked to the content of the document by references. This is generally the case for notes and figures, among others.

Thus, associated elements introduce a new use for the reference constructor. It not only serves to create links between elements of the principal structure of the document, but also serves to link the associated elements to the primary structure.

Attributes

There remain logical aspects of documents that are not entirely described by the structure. Certain types of semantic information, which are not stated explicitly in the text, must also be taken into account. In particular, such information is shown by typographic effects which do not correspond to a change between structural elements. In fact, certain titles are set in bold or italic or are printed in a different typeface from the rest of the text in order to mark them as structurally distinct. But these same effects frequently appear in the middle of continuous text (e.g. in the interior of a paragraph). In this case, there is no change between structural elements; the effect serves to highlight a word, expression, or phrase. The notion of an attribute is used to express this type of information.

An attribute is a piece of information attached to a structural element which augments the type of the element and clarifies its function in the document. Keywords, foreign language words, and titles of other works can all be represented by character strings with attached attributes. Attributes may also be attached to constructed elements. Thus, an attribute indicating the language can be attached to a single word or to a large part of a document.

In fact, an attribute can be any piece of information which is linked to a part of a document and which can be used by agents which work on the document. For example, the language in which the document is written determines the set of characters used by an editor or formatter. It also determines the algorithm or hyphenation dictionary to be used. The attribute ``keyword'' facilitates the work of an information retrieval system. The attribute ``index word'' allows a formatter to automatically construct an index at the end of the document.

As with the types of constructed elements, the attributes and the values they can take are defined separately in each generic logical structure, not in the meta-model, according to the needs of the document class or the nature of the object.

Four types of attributes are offered: numeric, textual, reference, and enumeration:

  • Numeric attributes can take integer values (negative, positive, or zero).
  • Textual attributes have as their values character strings.
  • Reference attributes designate an element of the logical structure.
  • Enumeration attributes can take one value from a limited list of possible values, each value being a name.

In a generic structure, there is a distinction between global attributes and local attributes. A global attribute can be applied to every element type defined in the generic structure where it is specified. In contrast, a local attribute can only be applied to certain types of elements, or even to only a single type. The ``language'' attribute presented above is an example of a global attribute. An example of a local attribute is the rank of an author (principal author of the document or secondary author): this attribute can only be applied sensibly to an element of the ``author'' type.

Attributes can be assigned to the elements which make up the document in many different ways. The author can freely and dynamically place them on any part of the document in order to attach supplementary information of his/her choice. However, attributes may only be assigned in accordance with the rules of the generic structure; in particular, local attributes can only be assigned to those element types for which they are defined.
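
The assignment rule for global and local attributes can be illustrated with a short Python sketch. It is not taken from Thot: the class GenericStructure, its fields, and the method can_assign are hypothetical names, and the structure is reduced to the bare minimum needed to show the rule.

```python
from dataclasses import dataclass, field


@dataclass
class GenericStructure:
    """Hypothetical, much reduced stand-in for a generic logical structure."""
    element_types: set
    global_attrs: set = field(default_factory=set)
    # Maps a local attribute name to the element types it is defined for.
    local_attrs: dict = field(default_factory=dict)

    def can_assign(self, attr: str, element_type: str) -> bool:
        """True if the generic structure allows attr on element_type."""
        if element_type not in self.element_types:
            return False
        if attr in self.global_attrs:
            return True  # a global attribute applies to every element type
        # A local attribute applies only where it has been declared.
        return element_type in self.local_attrs.get(attr, set())
```

With a structure declaring ``Language'' as global and ``Rank'' as local to ``Author'', can_assign("Rank", "Author") holds while can_assign("Rank", "Paragraph") does not.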

In the generic structure, certain local attributes can be made mandatory for certain element types. In this case, Thot automatically associates the attribute with the elements of this type and it requires the user to provide a value for this attribute.

Attributes can also be automatically assigned, with a given value, by any application processing the document in order to systematically add a piece of information to certain predefined elements of the document. By way of example, in a report containing a French abstract and an English abstract, each of the two abstracts is defined as a sequence of paragraphs. The first abstract has a value of ``French'' for the ``language'' attribute while the second abstract's ``language'' attribute has a value of ``English''.

In the case of mark pairs, attributes are logically associated with the pair as a whole, but are actually attached to the first mark.

Discussion of the model

The notions of attribute, constructor, structured element, and associated element are used in the definition of generic logical structures of documents and objects. The problem is to assemble them to form generic structures. In fact, many types of elements and attributes can be found in a variety of generic structures. Rather than redefine them for each structure in which they appear, it is best to share them between structures. The object classes already fill this sharing function. If a mathematical class is defined, its formulas can be used in many different document classes, without redefining the structure of each class. This problem arises not only for the objects considered here; it also arises for the commonplace textual elements found in many document classes. This is the reason why the notion of object is so broad and why paragraphs and enumerations are also considered to be objects. These object classes not only permit the sharing of the structures of elements, but also of the attributes defined in the generic structures.

Structure, such as that presented here, can appear very rigid, and it is possible to imagine that a document editing system based on this model could prove very constraining to the user. This is, in fact, a common criticism of syntax-directed editors. This defect can be avoided with Thot, primarily for three reasons:

  • the generic structures are not fixed in the model itself,
  • the model takes the dynamics of documents into account,
  • the constructors offer great flexibility.

When the generic structure of a document is not predefined, but rather is constructed specifically for each document class, it can be carefully adapted to the current needs. In cases where the generic structure is inadequate for a particular document of the class, it is always possible either to create a new class with a generic structure well suited to the new case or to extend the generic structure of the existing class to take into account the specifics of the document which poses the problem. These two solutions can also be applied to objects whose structures prove to be poorly designed.

The model is sufficiently flexible to take into account all the phases of the life of the document. When a generic structure specifies that a report must contain a title, an abstract, an introduction, at least two chapters, and a conclusion, this means only that a report, upon completion, will have to contain all of these elements. When the author begins writing, none of these elements is present. Thot uses this model. Therefore, it tolerates documents which do not conform strictly to the generic structure of their class; it also considers the generic logical structure to be a way of helping the user in the construction of a complex document.

In contrast, other applications may reject a document which does not conform strictly to its generic structure. This is, for example, what is done by compilers which refuse to generate code for a program which is not syntactically correct. This might also occur when using a document application for a report which does not have an abstract or title.

The constructors of the document model bring a great flexibility to the generic structures. A choice constructor (and even more, a unit or schema constructor) can represent several, very different elements. The list constructor permits the addition of more elements of the same type. Used together, these two constructors permit any series of elements of different types. Of course, this flexibility can be reduced wherever necessary since a generic structure can limit the choices or the number of elements in a list.

Another difficulty linked to the use of structure in the document model resides in the choice of the level of the structure. The structure of a discussion could be extracted from the text itself via linguistic analysis. Some studies are exploring this approach, but the model of Thot excludes this type of structure. It only takes into account the logical structure provided explicitly by the author.

However, the level of structure of the model is not imposed. Each generic structure defines its own level of structure, adapted to the document class or object and to the ways in which it will be processed. If it will only be edited and printed, a relatively simple structure suffices. If more specialized processing will be applied to it, the structure must represent the element types on which this processing must act. By way of example, a simple structure is sufficient for printing formulas, but a more complex structure is required to perform symbolic or numeric calculations on the mathematical expressions. The document model of Thot allows both types of structure.

The definition language for generic structures

Generic structures, which form the basis of the document model of Thot, are specified using a special language. This definition language, called S, is described in this section.

Each generic structure, which defines a class of documents or objects, is specified by a file, written in the S language, which is called a structure schema. Structure schemas are compiled into tables, called structure tables, which are used by Thot and which determine its behavior.

Writing Conventions

The grammar of S, like those of the languages P and T presented later, is described using the meta-language M, derived from the Backus-Naur Form (BNF).

In this meta-language each rule of the grammar is composed of a grammar symbol followed by an equals sign (`=') and the right part of the rule. The equals sign plays the same role as the traditional `::=' of BNF: it indicates that the right part defines the symbol of the left part. In the right part,

  • concatenation is shown by the juxtaposition of symbols;
  • character strings between apostrophes (') represent terminal symbols, that is, keywords in the language defined. Keywords are written here in upper-case letters, but can be written in any combination of upper and lower-case letters. For example, the keyword DEFPRES of S can also be written as defpres or DefPres;
  • material between brackets (`[' and `]') is optional;
  • material between angle brackets (`<' and `>') can be repeated many times or omitted;
  • the slash (`/') indicates an alternative, a choice between the options separated by the slash character;
  • the period marks the end of a rule;
  • text between braces (`{' and `}') is simply a comment.

The M meta-language also uses the concepts of identifiers, strings, and integers:

NAME
represents an identifier, a sequence of letters (upper or lower-case), digits, and underline characters (`_'), beginning with a letter. The sequence of characters `\nnn', where nnn is the ISO Latin-1 code of a letter in octal, is also considered a letter. It is thus possible to use accented letters in identifiers. The maximum length of identifiers is fixed by the compiler. It is normally 31 characters.

Unlike keywords, upper and lower-case letters are distinct in identifiers. Thus, Title, TITLE, and title are considered different identifiers.

STRING
represents a string. This is a string of characters delimited by apostrophes. If an apostrophe must appear in a string, it is doubled. As with identifiers, strings can contain characters represented by their octal code (after a backslash). As with apostrophes, if a backslash must appear in a string, it is doubled.
NUMBER
represents a positive integer or zero (without a sign), or said another way, a sequence of decimal digits.
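
The lexical conventions above can be illustrated with a short Python sketch. It is not part of Thot: the function names is_name, same_keyword, and decode_string are hypothetical, the 31-character limit is only the ``normal'' value quoted above, and the sketch assumes that an octal escape is always written with exactly three digits and counts as a single letter.

```python
import re

MAX_NAME_LENGTH = 31  # "normally 31 characters"; the real limit is set by the compiler

# A "letter" is an ASCII letter or an octal escape \nnn (assumed: three digits).
_LETTER = r"(?:[A-Za-z]|\\[0-7]{3})"
_NAME_RE = re.compile(rf"{_LETTER}(?:{_LETTER}|[0-9_])*")
_ESCAPE_RE = re.compile(r"\\([0-7]{3})")


def is_name(token: str) -> bool:
    """Return True if token is a well-formed NAME (identifier)."""
    if _NAME_RE.fullmatch(token) is None:
        return False
    # Each \nnn escape is counted as one character (an assumption).
    return len(_ESCAPE_RE.sub("x", token)) <= MAX_NAME_LENGTH


def same_keyword(token: str, keyword: str) -> bool:
    """Keywords, unlike identifiers, are compared without regard to case."""
    return token.upper() == keyword.upper()


def decode_string(literal: str) -> str:
    """Decode a STRING literal written between apostrophes."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("a STRING must be delimited by apostrophes")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        escape = _ESCAPE_RE.match(body, i)
        if body.startswith("''", i):        # doubled apostrophe
            out.append("'")
            i += 2
        elif body.startswith("\\\\", i):    # doubled backslash
            out.append("\\")
            i += 2
        elif escape:                        # octal ISO Latin-1 code
            out.append(chr(int(escape.group(1), 8)))
            i += 4
        elif body[i] in "'\\":
            raise ValueError("lone apostrophe or backslash in STRING")
        else:
            out.append(body[i])
            i += 1
    return "".join(out)
```

For instance, is_name accepts Title and Section_2 but rejects 2nd, and decode_string turns the literal 'it''s' into the three letters and apostrophe of ``it's''.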

The M language can be used to define itself as follows:

{ Any text between braces is a comment. }
Grammar      = Rule < Rule > 'END' .
               { The < and > signs indicate zero }
               { or more repetitions. }
               { END marks the end of the grammar. }
Rule         = Ident '=' RightPart '.' .
               { The period indicates the end of a rule }
RightPart    = RtTerminal / RtIntermed .
               { The slash indicates a choice }
RtTerminal   ='NAME' / 'STRING' / 'NUMBER' .
               { Right part of a terminal rule }
RtIntermed   = Possibility < '/' Possibility > .
               { Right part of an intermediate rule }
Possibility  = ElemOpt < ElemOpt > .
ElemOpt      = Element / '[' Element < Element > ']' /
              '<' Element < Element > '>'  .
               { Brackets delimit optional parts }
Element      = Ident / KeyWord .
Ident        = NAME .
               { Identifier, sequence of characters }
KeyWord      = STRING .
               { Character string delimited by apostrophes }
END

Extension schemas

A structure schema defines the generic logical structure of a class of documents or objects, independent of the operations which can be performed on the documents. However, certain applications may require particular information to be represented by the structure for the documents that they operate on. Thus a document version manager will need to indicate in the document the parts which belong to one version or another. An indexing system will add highly-structured index tables as well as the links between these tables and the rest of the document.

Thus, many applications need to extend the generic structure of the documents on which they operate to introduce new attributes, associated elements or element types. These additions are specific to each application and must be able to be applied to any generic structure: users will want to manage versions or construct indices for many types of documents. Extension schemas fulfill this role: they define attributes, elements, associated elements, units, etc., but they can only be used jointly with a structure schema that they complete. Conversely, structure schemas can always be used without these extensions when the corresponding applications are not available.

The general organization of structure schemas

Every structure schema begins with the keyword STRUCTURE and ends with the keyword END. The keyword STRUCTURE is followed by the keyword EXTENSION in the case where the schema defines an extension, then by the name of the generic structure which the schema defines (the name of the document or object class). The name of the structure is followed by a semicolon.

In the case of a complete schema (that is, a schema which is not an extension), the definition of the name of the structure is followed by the declarations of the default presentation schema, the global attributes, the parameters, the structure rules, the associated elements, the units, the skeleton elements and the exceptions. Only the definition of the structure rules is required. Each series of declarations begins with a keyword: DEFPRES, ATTR, PARAM, STRUCT, ASSOC, UNITS, EXPORT, EXCEPT.

In the case of an extension schema, there are neither parameters nor skeleton elements and the STRUCT section is optional, while that section is required in a schema that is not an extension. On the other hand, extension schemas can contain an EXTENS section, which must not appear in a schema which is not an extension; this section defines the complements to attach to the rules found in the schema to which the extension will be added. The sections ATTR, STRUCT, ASSOC, and UNITS define new attributes, new elements, new associated elements, and new units which add their definitions to the principal schema.

     StructSchema ='STRUCTURE' ElemID ';'
                   'DEFPRES' PresID ';'
                 [ 'ATTR' AttrSeq ]
                 [ 'PARAM' RulesSeq ]
                   'STRUCT' RulesSeq
                 [ 'ASSOC' RulesSeq ]
                 [ 'UNITS' RulesSeq ]
                 [ 'EXPORT' SkeletonSeq ]
                 [ 'EXCEPT' ExceptSeq ]
                   'END' .
     ElemID       = NAME .

or

     ExtensSchema ='STRUCTURE' 'EXTENSION' ElemID ';'
                   'DEFPRES' PresID ';'
                 [ 'ATTR' AttrSeq ]
                 [ 'STRUCT' RulesSeq ]
                 [ 'EXTENS' ExtensRuleSeq ]
                 [ 'ASSOC' RulesSeq ]
                 [ 'UNITS' RulesSeq ]
                 [ 'EXCEPT' ExceptSeq ]
                   'END' .
     ElemID       = NAME .
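
The ordering of sections imposed by these two rules can be checked mechanically. The following Python sketch is only an illustration: check_sections and the two tables are hypothetical names, the input is assumed to be the list of section keywords already extracted from a schema, and the sketch follows the grammar rules literally, so DEFPRES is treated as required because it is not placed between brackets there.

```python
# Section keywords in the order given by the two grammar rules above;
# True marks a section that the grammar does not place between brackets.
COMPLETE_SECTIONS = [
    ("DEFPRES", True), ("ATTR", False), ("PARAM", False), ("STRUCT", True),
    ("ASSOC", False), ("UNITS", False), ("EXPORT", False), ("EXCEPT", False),
]
EXTENSION_SECTIONS = [
    ("DEFPRES", True), ("ATTR", False), ("STRUCT", False), ("EXTENS", False),
    ("ASSOC", False), ("UNITS", False), ("EXCEPT", False),
]


def check_sections(found, is_extension=False):
    """Return a list of error messages for the section keywords found, in order."""
    expected = EXTENSION_SECTIONS if is_extension else COMPLETE_SECTIONS
    order = [name for name, _ in expected]
    errors = []
    position = -1
    seen = set()
    for raw in found:
        keyword = raw.upper()  # keywords are case-insensitive
        if keyword not in order:
            errors.append(f"{keyword} is not allowed in this kind of schema")
            continue
        if keyword in seen:
            errors.append(f"{keyword} appears more than once")
            continue
        seen.add(keyword)
        index = order.index(keyword)
        if index < position:
            errors.append(f"{keyword} is out of order")
        position = max(position, index)
    for name, required in expected:
        if required and name not in seen:
            errors.append(f"{name} is missing")
    return errors
```

An empty result means that the sections respect the grammar; for example, check_sections(["DEFPRES", "STRUCT", "ATTR"]) reports that ATTR is out of order.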

The default presentation

It was shown above that many different presentations are possible for documents and objects of the same class. The structure schema defines a preferred presentation for the class, called the default presentation. Like generic structures, presentations are described by programs, called presentation schemas, which are written in a specific language, P, presented later in this document. The name appearing after the keyword DEFPRES is the name of the default presentation schema. When a new document is created, Thot will use this presentation schema by default, but the user remains free to choose another if s/he wishes.

     PresID = NAME .

Global Attributes

If the generic structure includes global attributes of its own, they are declared after the keyword ATTR. Each global attribute is defined by its name, followed by an equals sign and the definition of its type. The declaration of a global attribute is terminated by a semicolon.

For attributes of the numeric, textual, or reference types, the type is indicated by a keyword, INTEGER, TEXT, or REFERENCE respectively.

In the case of a reference attribute, the keyword REFERENCE is followed by the type of the referenced element in parentheses. It can refer to any type at all, specified by using the keyword ANY, or to a specific type. In the latter case, the element type designated by the reference can be defined either in the STRUCT section of the same structure schema or in the STRUCT section of another structure schema. When the type is defined in another schema, the element type is followed by the name of the structure schema (within parentheses) in which it is defined. The name of the designated element type can be preceded by the keyword First or Second, but only in the case where the type is defined as a pair. These keywords indicate whether the attribute must designate the first mark of the pair or the second. If the reference refers to a pair and neither of these two keywords is present, then the first mark is used.

In the case of an enumeration attribute, the equals sign is followed by the list of names representing the possible values of the attribute, the names being separated from each other by commas. An enumeration attribute has at least one possible value; the maximum number of values is defined by the compiler for the S language.

     AttrSeq   = Attribute < Attribute > .
     Attribute = AttrID '=' AttrType  ';' .
     AttrType  = 'INTEGER' / 'TEXT' /
                 'REFERENCE' '(' RefType ')' /
                 ValueSeq .
     RefType   = 'ANY' / [ FirstSec ] ElemID [ ExtStruct ] .
     FirstSec  = 'First' / 'Second' .
     ExtStruct = '(' ElemID ')' .
     ValueSeq  = AttrVal < ',' AttrVal > .
     AttrID    = NAME .
     AttrVal   = NAME .

There is a predefined global text attribute, the language, which is automatically added to every Thot structure schema. This attribute allows Thot to perform certain actions, such as hyphenation and spell-checking, which cannot be performed without knowing the language in which each part of the document is written. This attribute can be used just like any explicitly declared attribute: the system acts as if every structure schema contains

ATTR
   Language = TEXT;

Example:

The following specification defines the global enumeration attribute WordType.

ATTR
   WordType = Definition, IndexWord, DocumentTitle;
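
To make the attribute grammar more concrete, here is a minimal Python sketch that classifies the declarations following the ATTR keyword. It is an illustration only, not the S compiler: parse_attr_section and parse_attr_type are hypothetical names, the result format is invented for the example, and octal escapes in names as well as comments between braces are deliberately ignored.

```python
import re

_NAME = r"[A-Za-z][A-Za-z0-9_]*"  # octal escapes in names are ignored here
_DECL_RE = re.compile(rf"\s*({_NAME})\s*=\s*([^;]*?)\s*;")
_REF_RE = re.compile(
    rf"REFERENCE\s*\(\s*(?:(First|Second)\s+)?({_NAME})\s*"
    rf"(?:\(\s*({_NAME})\s*\))?\s*\)",
    re.IGNORECASE,
)


def parse_attr_type(text):
    """Classify the right-hand side of one attribute declaration."""
    upper = text.upper()
    if upper == "INTEGER":
        return {"kind": "numeric"}
    if upper == "TEXT":
        return {"kind": "text"}
    ref = _REF_RE.fullmatch(text)
    if ref:
        mark, target, schema = ref.groups()
        if mark is None and schema is None and target.upper() == "ANY":
            return {"kind": "reference", "target": "ANY"}
        return {"kind": "reference", "target": target,
                "schema": schema, "mark": mark}
    # Anything else must be a comma-separated list of value names.
    values = [value.strip() for value in text.split(",")]
    if all(re.fullmatch(_NAME, value) for value in values):
        return {"kind": "enumeration", "values": values}
    raise ValueError(f"unrecognized attribute type: {text!r}")


def parse_attr_section(source):
    """Parse the declarations that follow the ATTR keyword into a dict."""
    attributes = {}
    pos = 0
    while source[pos:].strip():
        match = _DECL_RE.match(source, pos)
        if match is None:
            raise ValueError(f"malformed declaration near {source[pos:pos + 20]!r}")
        attributes[match.group(1)] = parse_attr_type(match.group(2))
        pos = match.end()
    return attributes
```

Applied to the WordType declaration above, the sketch yields an enumeration with the three values Definition, IndexWord, and DocumentTitle.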

Parameters

A parameter is a document element which can appear many times in the document, but always has the same value. This value can only be modified in a controlled way by certain applications. For example, in an advertising circular, the name of the recipient may appear in the address part and in the text of the circular. If the recipient's name were a parameter, it might only be able to be changed by a ``mail-merge'' application.

Parameters are not needed for every document class, but if the schema includes parameters they are declared after the keyword PARAM. Each parameter declaration is made in the same way as a structure element declaration.

During editing, Thot permits the insertion of parameters wherever the structure schema allows; it also permits the removal of parameters which are already in the document but does not allow the modification of the parameter's content in any way. The content is generated automatically by Thot during the creation of the parameter, based on the value of the parameter in the document.

Structured elements

The rules for defining structured elements are required, except in an extension schema: they constitute the core of a structure schema, since they define the structure of the different types of elements that occur in a document or object of the class defined by the schema.

The first structure rule after the keyword STRUCT must define the structure of the class whose name appears in the first instruction (STRUCTURE) of the schema. This is the root rule of the schema, defining the root of the document tree or object tree.

The remaining rules may be placed in any order, since the language permits the definition of element types before or after their use, or even in the same instruction in which they are used. This last case allows the definition of recursive structures.

Each rule is composed of a name (the name of the element type whose structure is being defined) followed by an equals sign and a structure definition.

If any local attributes are associated with the element type defined by the rule, they appear between parentheses after the type name and before the equals sign. The parentheses contain, first, the keyword ATTR, then the list of local attributes, separated by semicolons. Each local attribute is composed of the name of the attribute followed by an equals sign and the definition of the attribute's type, just as in the definition of global attributes. The name of the attribute can be preceded by an exclamation point to indicate that the attribute must always be present for this element type. The same attribute, identified by its name, can be defined as a local attribute for multiple element types. In this case, the equals sign and definition of the attribute type need only appear in the first occurrence of the attribute. It should be noted that global attributes cannot also be defined as local attributes.

If any extensions are defined for this element type, a plus sign follows the structure definition and the names of the extension element types appear between parentheses after the plus. If there are multiple extensions, they are separated by commas. These types can either be defined in the same schema, defined in other schemas, or they may be base types identified by the keywords TEXT, GRAPHICS, SYMBOL, or PICTURE.

Restrictions are indicated in the same manner as extensions, but they are introduced by a minus sign and they come after the extensions, or if there are no extensions, after the structure definition.

If the values of attributes must be attached systematically to this element type, they are introduced by the keyword WITH and declared in the form of a list of fixed-value attributes. When such definitions of fixed attribute values appear, they are always the last part of the rule.

The rule is terminated by a semicolon.

  RulesSeq      = Rule < Rule > .
  Rule          = ElemID [ LocAttrSeq ] '=' DefWithAttr ';'.
  LocAttrSeq    = '(' 'ATTR' LocAttr < ';' LocAttr > ')' .
  LocAttr       = [ '!' ] AttrID [ '=' AttrType ] .
  DefWithAttr   = Definition
                  [ '+' '(' ExtensionSeq ')' ]
                  [ '-' '(' RestrictSeq ')' ]
                  [ 'WITH' FixedAttrSeq ] .
  ExtensionSeq  = ExtensionElem < ',' ExtensionElem > .
  ExtensionElem = ElemID / 'TEXT' / 'GRAPHICS' /
                  'SYMBOL' / 'PICTURE' .
  RestrictSeq   = RestrictElem < ',' RestrictElem > .
  RestrictElem  = ElemID / 'TEXT' / 'GRAPHICS' /
                  'SYMBOL' / 'PICTURE' .

The list of fixed-value attributes is composed of a sequence of attribute-value pairs separated by commas. Each pair contains the name of the attribute and the fixed value for this element type, the two being separated by an equals sign. If the sign is preceded by a question mark the given value is only an initial value that may be modified later rather than a value fixed for all time. Reference attributes are an exception to this norm. They cannot be assigned a fixed value, but when the name of such an attribute appears this indicates that this element type must have a valid value for the attribute. For the other attribute types, the fixed value is indicated by a signed integer (numeric attributes), a character string between apostrophes (textual attributes) or the name of a value (enumeration attributes).

Fixed-value attributes can either be global or local to the element type for which they are fixed, but they must be declared before they are used.

    FixedAttrSeq    = FixedAttr < ',' FixedAttr > .
    FixedAttr       = AttrID [ FixedOrModifVal ] .
    FixedOrModifVal = [ '?' ] '=' FixedValue .
    FixedValue      = [ '-' ] NumValue / TextVal / AttrVal .
    NumValue        = NUMBER .
    TextVal         = STRING .
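
The three forms that a fixed-value attribute can take may be easier to see in a short Python sketch. The function parse_fixed_attr and the mode names ``fixed'', ``initial'', and ``required'' are hypothetical; the sketch handles a single FixedAttr item, does not split a comma-separated sequence, and does not decode octal escapes in strings.

```python
import re

_NAME = r"[A-Za-z][A-Za-z0-9_]*"
_FIXED_RE = re.compile(rf"\s*({_NAME})\s*(?:(\?)?\s*=\s*(.+?))?\s*")


def parse_fixed_attr(text):
    """Return (attribute name, mode, value) for one FixedAttr item."""
    match = _FIXED_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"malformed fixed attribute: {text!r}")
    name, question, raw = match.groups()
    if raw is None:
        # Name alone: the form described above for reference attributes.
        return (name, "required", None)
    mode = "initial" if question else "fixed"
    if re.fullmatch(r"-?\d+", raw):
        value = int(raw)                          # numeric attribute
    elif len(raw) >= 2 and raw[0] == "'" and raw[-1] == "'":
        value = raw[1:-1].replace("''", "'")      # textual attribute
    elif re.fullmatch(_NAME, raw):
        value = raw                               # enumeration value name
    else:
        raise ValueError(f"malformed value: {raw!r}")
    return (name, mode, value)
```

Thus an item written with `?=' is reported as an initial value that may be modified later, an item written with `=' as a fixed value, and a bare attribute name as a requirement that a valid value be present.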

Structure definitions

The structure of an element type can be a simple base type or a constructed type.

For constructed types, it is frequently the case that similar structures appear in many places in a document. For example the contents of the abstract, of the introduction, and of a section can have the same structure, that of a sequence of paragraphs. In this case, a single, common structure can be defined (the paragraph sequence in this example), and the schema is written to indicate that each element type possesses this structure, as follows:

     Abstract           = Paragraph_sequence;
     Introduction       = Paragraph_sequence;
     Section_contents   = Paragraph_sequence;

The equals sign means ``has the same structure as''.

If the element type defined is a simple base type, this is indicated by one of the keywords TEXT, GRAPHICS, SYMBOL, or PICTURE. If some local attributes must be associated with a base type, the keyword of the base type is followed by the declaration of the local attributes using the syntax presented above.

In the case of an open choice, the type is indicated by the keyword UNIT for units or the keyword NATURE for objects having a structure defined by any other schema.

A unit represents one of the two following categories:

  • a base type: text, graphical element, symbol, picture,
  • an element whose type is chosen from among the types defined as units in the UNITS section of the document's structure schema. It can also be chosen from among the types defined as units in the UNITS section of the structure schemas that define the ancestors of the element to which the rule is applied.

Before the creation of an element defined as a unit, Thot asks the user to choose between the categories of elements.

Thus, the contents of a paragraph can be specified as a sequence of units, which will permit the inclusion in the paragraphs of character strings, symbols, and various elements, such as cross-references, if these are defined as units.

A schema object (keyword NATURE) represents an object defined by a structure schema freely chosen from among the available schemas; in this case the element type is defined by the first rule (the root rule) of the chosen schema.

If the element type defined is a constructed type, the list, aggregate, choice, and reference constructors are used. In this case the definition begins with a keyword identifying the constructor. This keyword is followed by a syntax specific to each constructor.

The local attribute definitions appear after the name of the element type being defined, if this element type has local attributes.

   Definition = BaseType [ LocAttrSeq ] / Constr / Element .
   BaseType   = 'TEXT' / 'GRAPHICS' / 'SYMBOL' / 'PICTURE' /
                'UNIT' / 'NATURE' .
   Element    = ElemID [ ExtOrDef ] .
   ExtOrDef   = 'EXTERN' / 'INCLUDED' / 
                [ LocAttrSeq ] '=' Definition .
   Constr     = 'LIST' [ '[' min '..' max ']' ] 'OF'
                       '(' DefWithAttr ')' /
                'BEGIN' DefOptSeq 'END' /
                'AGGREGATE' DefOptSeq 'END' /
                'CASE' 'OF' DefSeq 'END' /
                'REFERENCE' '(' RefType ')' /
                'PAIR' .

List

The list constructor permits the definition of an element type composed of a list of elements, all of the same type. A list definition begins with the LIST keyword followed by an optional range, the keyword OF, and the definition, between parentheses, of the element type which must compose the list. The optional range is composed of the minimum and maximum number of elements for the list separated by two periods and enclosed by brackets. If the range is not present, the number of list elements is unconstrained. When only one of the two bounds of the range is unconstrained, it is represented by a star ('*') character. Even when both bounds are unconstrained, they can be specified by [*..*], but it is simpler not to specify any bound.

               'LIST' [ '[' min '..' max ']' ]
               'OF' '(' DefWithAttr ')'
     min     = Integer / '*' .
     max     = Integer / '*' .
     Integer = NUMBER .

Before the document is edited, Thot creates the minimum number of elements for the list. If no minimum was given, it creates a single element. If a maximum number of elements is given and that number is attained, the editor refuses to create new elements for the list.

Example:

The following two instructions define the body of a document as a sequence of at least two chapters and the contents of a section as a sequence of paragraphs. A single paragraph can be the entire contents of a section.

Body             = LIST [2..*] OF (Chapter);
Section_contents = LIST OF (Paragraph);
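
The behavior of the editor with respect to list bounds can be summarized by a small Python sketch. The names parse_range, initial_count, and may_add are hypothetical, and None is used here to stand for an unconstrained bound (either a star or an absent range); treating a star as ``no minimum given'' is an assumption of the sketch.

```python
import re
from typing import Optional, Tuple

_RANGE_RE = re.compile(r"\[\s*(\d+|\*)\s*\.\.\s*(\d+|\*)\s*\]")


def parse_range(text: str) -> Tuple[Optional[int], Optional[int]]:
    """Turn a range such as '[2..*]' into (minimum, maximum)."""
    match = _RANGE_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"malformed range: {text!r}")
    low, high = (None if bound == "*" else int(bound) for bound in match.groups())
    return (low, high)


def initial_count(minimum: Optional[int]) -> int:
    """Number of elements created for a new list."""
    return 1 if minimum is None else minimum


def may_add(current: int, maximum: Optional[int]) -> bool:
    """True if the editor would accept one more element in the list."""
    return maximum is None or current < maximum
```

For the Body rule above, parse_range("[2..*]") gives a minimum of 2 and no maximum, so two chapters are created initially and further chapters can always be added.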

Aggregate

The aggregate constructor is used to define an element type as a collection of sub-elements, each having a fixed type. The collection may be ordered or unordered. The elements composing the collection are called components. In the definition of an aggregate, a keyword indicates whether or not the aggregate is ordered: BEGIN for an ordered aggregate, AGGREGATE for an unordered aggregate. This keyword is followed by the list of component type definitions which is terminated by the END keyword. The component type definitions are separated by semicolons.

When it creates an aggregate, the Thot editor creates all the aggregate's components in the order they appear in the structure schema, even for unordered aggregates. However, unlike those of ordered aggregates, the components of an unordered aggregate may be rearranged using operations of the Thot editor. The exceptions to this rule are any components whose name is preceded by a question mark character ('?'). These components, which are optional, can be created by explicit request, possibly at the time the aggregate is created, but they are not created automatically along with the aggregate.

                 'BEGIN' DefOptSeq 'END'
     DefOptSeq = DefOpt ';' < DefOpt ';' > .
     DefOpt    = [ '?' ] DefWithAttr .

Example:

In a bilingual document, each paragraph has an English version and a French version. In certain cases, the translator wants to add a marginal note, but this note is present in very few paragraphs. Thus, it must not be created systematically for every paragraph. A bilingual paragraph of this type is declared:

Bilingual_paragraph = BEGIN
                      French_paragraph  = TEXT;
                      English_paragraph = TEXT;
                      ? Note            = TEXT;
                      END;
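
For comparison, here is a sketch of an unordered aggregate, written with the AGGREGATE keyword described above in place of BEGIN. The type names are hypothetical. The editor would create Street and City in this order, the user could then rearrange them, and Country would be created only on request.

Postal_address = AGGREGATE
                 Street    = TEXT;
                 City      = TEXT;
                 ? Country = TEXT;
                 END;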

Choice

The choice constructor permits the definition of an element type which is chosen from among a set of possible types. The keywords CASE and OF are followed by a list of definitions of possible types, which are separated by semicolons and terminated by the END keyword.

               'CASE' 'OF' DefSeq 'END'
     DefSeq = DefWithAttr ';' < DefWithAttr ';' > .

Before the creation of an element defined as a choice, the Thot editor presents the list of possible types for the element to the user. The user has only to select the element type that s/he wants to create from this list.

The order of the type declarations is important. It determines the order of the list presented to the user before the creation of the element. Also, when a Choice element is being created automatically, the first type in the list is used. In addition, when an empty Choice element is selected in the Thot editor, it is possible to enter its text directly from the keyboard. In this case, the editor uses the first element type which can contain an atom of the character string type.

The two special cases of the choice constructor, the schema and the unit, are discussed elsewhere.

Example:

It is common in documents to treat a variety of objects as if they were ordinary paragraphs. Thus, a ``Paragraph'' might actually be composed of a block of text (an ordinary paragraph), or a mathematical formula whose structure is defined by another structure schema named Math, or a table, also defined by another structure schema. Here is a definition of such a paragraph:

Paragraph = CASE OF
              Simple_text = TEXT;
              Formula     = Math;
              Table_para  = Table;
              END;

Reference

Like all elements in Thot, references are typed. An element type defined as a reference is a cross-reference to an element of some other given type. The keyword REFERENCE is followed by the name of a type enclosed in parentheses. When the type which is being cross-referenced is defined in another structure schema, the type name is itself followed by the name of the external structure schema in which it is defined.

When the designated element type is a mark pair, it can be preceded by a FIRST or SECOND keyword. These keywords indicate whether the reference points to the first or second mark of the pair. If the reference points to a pair and neither of these two keywords is present, the reference is considered to point to the first mark of the pair.

There is an exception to the principle of typed references: it is possible to define a reference which designates an element of any type, which can either be in the same document or another document. In this case, it suffices to put the keyword ANY in the parentheses which indicate the referenced element type.

             'REFERENCE' '(' RefType ')'
   RefType = 'ANY' / [ FirstSec ] ElemID [ ExtStruct ] .
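
As a minimal sketch of the untyped form (Ref_any is a hypothetical name), the following rule defines a reference which can designate an element of any type, in the same document or in another one:

Ref_any = REFERENCE (ANY);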

When defining an inclusion, the REFERENCE keyword is not used. Inclusions with complete expansion are not declared as such in the structure schemas, since any element defined in a structure schema can be replaced by an element of the same type. In contrast, inclusions without expansion or with partial expansion must be declared explicitly whenever they will include a complete object (and not a part of an object). In this case, the object type to be included (that is, the name of its structure schema) is followed by a keyword: EXTERN for inclusion without expansion and INCLUDED for partial expansion.

Before creating a cross-reference or an inclusion, the Thot editor asks the user to choose, from the document images displayed, the referenced or included element.

Example:

If the types Note and Section are defined in the Article structure schema, it is possible to define, in the same structure schema, a reference to a note and a reference to a section in this manner:

Ref_note    = REFERENCE (Note);
Ref_section = REFERENCE (Section);

It is also possible to define the generic structure of a collection of articles, which includes (with partial expansion) objects of the Article class and which has an introduction that may include cross-references to sections of the included articles. In the Collection structure schema, the definitions are:

Collection = BEGIN
             Collection_title = TEXT;
             Introduction = LIST OF (Elem = CASE OF
                                           TEXT;
                                           Ref_sect;
                                           END);
             Body = LIST OF (Article INCLUDED);
             END;
Ref_sect   = REFERENCE (Section (Article));

Here we define a Folder document class which has a title and includes documents of different types, particularly Folders:

Folder   = BEGIN
           Folder_title    = TEXT;
           Folder_contents = LIST OF (Document);
           END;

Document = CASE OF
              Article EXTERN;
              Collection EXTERN;
              Folder EXTERN;
              END;

Under this definition, Folder represents either an aggregate which contains a folder title and the list of included documents or an included folder. To resolve this ambiguity, in the P language, the placement of a star character in front of the type name (here, Folder) indicates an included document.

Mark pairs

Like other elements, mark pairs are typed. The two marks of the pair have the same type, but there exist two predefined subtypes which apply to all mark pairs: the first mark of the pair (called First in the P and T languages) and the second mark (called Second).

In the S language, a mark pair is noted simply by the PAIR keyword.

In the Thot editor, marks are always moved or destroyed together. The two marks of a pair have the same identifier, unique within the document, which permits intertwining mark pairs without risk of ambiguity.
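
Example:

As an illustration, assuming hypothetical type names, the following rules declare a mark pair that could delimit a revised passage, together with two references. The first reference designates the first mark of the pair by default; the second explicitly designates the second mark.

Revision      = PAIR;
Ref_rev_start = REFERENCE (Revision);
Ref_rev_end   = REFERENCE (SECOND Revision);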

Imports

Because of schema constructors, it is possible, before editing a document, to use classes defined by other structure schemas whenever they are needed. It is also possible to assign specific document classes to certain element types. In this case, these classes are simply designated by their name. In fact, if a type name is not defined in the structure schema, it is assumed that it specifies a structure defined by another structure schema.

Example:

If the types Math and Table don't appear in the left part of a structure rule in the schema, the following two rules indicate that a formula has the structure of an object defined by the structure schema Math and that a table element has the structure of an object defined by the Table schema.

Formula    = Math;
Table_elem = Table;

Extension rules

The EXTENS section, which can only appear in an extension schema, defines complements to the rules in the primary schema (i.e. the structure schema to which the extension schema will be applied). More precisely, this section permits the addition to an existing type of local attributes, extensions, restrictions and fixed-value attributes.

These additions can be applied to the root rule of the primary schema, designated by the keyword Root, or to any other explicitly named rule.

Extension rules are separated from each other by a semicolon and each extension rule has the same syntax as a structure rule, but the part which defines the constructor is absent.

     ExtenRuleSeq = ExtensRule ';' < ExtensRule ';' > .
     ExtensRule   = RootOrElem [ LocAttrSeq ]
                    [ '+' '(' ExtensionSeq ')' ]
                    [ '-' '(' RestrictSeq ')' ]
                    [ 'WITH' FixedAttrSeq ] .
     RootOrElem   = 'Root' / ElemID .
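
Example:

The following sketch shows what an EXTENS section might look like in an extension schema applied to the Article schema. The attribute Review_status and the element type Review_comment are hypothetical; Review_comment is assumed to be defined in the STRUCT section of the extension schema. The first rule adds a local attribute to the root of the primary schema, the second allows review comments as extensions within sections, and the third forbids them within notes.

EXTENS
   Root (ATTR Review_status = Draft, Approved);
   Section + (Review_comment);
   Note - (Review_comment);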

Associated elements

If associated elements are necessary, they must be declared in a specific section of the structure schema, introduced by the keyword ASSOC. Each associated element type is specified like any other structured element. However, these types must not appear in any other element types of the schema, except in REFERENCE rules.

Units

The UNITS section of the structure schema contains the declarations of the element types which can be used in the external objects making up parts of the document or in objects of the class defined by the schema. As with associated elements, these element types are defined just like other structured element types. They can be used in the other element types of the schema, but they can also be used in any other rule of the schema.

Example:

If references to notes are declared as units:

UNITS
   Ref_note = REFERENCE (Note);

then it is possible to use references to notes in a cell of a table, even when Table is an external structure schema. The Table schema must declare a cell to be a sequence of units, which can then be base element types (text, for example) or references to notes in the document.

Cell = LIST OF (UNITS);

Skeleton elements

When editing a document which contains or must contain external references to several other documents, it may be necessary to load a large number of documents, simply to see the parts designated by the external references of the document while editing, or to access the source of included elements. In this case, the external documents are not modified and it is only necessary to see the elements of these documents which could be referenced. Because of this, the editor will suggest that the documents be loaded in ``skeleton'' form. This form contains only the elements of the document explicitly mentioned in the EXPORT section of their structure schema and, for these elements, only the part of the contents specified in that section. This form has the advantage of being very compact, thus requiring very few resources from the editor. This is also the skeleton form which constitutes the expanded form of inclusions with partial expansion.

Skeleton elements must be declared explicitly in the EXPORT section of the structure schema that defines them. This section begins with the keyword EXPORT followed by a comma-separated list of the element types which must appear in the skeleton form and ending with a semicolon. These types must have been previously declared in the schema.

For each skeleton element type, the part of the contents which is loaded by the editor, and therefore displayable, can be specified by putting the keyword WITH and the name of the contained element type to be loaded after the name of the skeleton element type. In this case only that named element, among all the elements contained in the exportable element type, will be loaded. If the WITH is absent, the entire contents of the skeleton element will be loaded by the editor. If instead, it is better that the skeleton form not load the contents of a particular element type, the keyword WITH must be followed by the word Nothing.

                [ 'EXPORT' SkeletonSeq ]

     SkeletonSeq = SkelElem < ',' SkelElem > ';' .
     SkelElem    = ElemID [ 'WITH' Contents ] .
     Contents    = 'Nothing' / ElemID [ ExtStruct ] .

Example:

Suppose that, in documents of the article class, the element types Article_title, Figure, Section, Paragraph, and Biblio should appear in the skeleton form in order to make it easier to create external references to them from other documents. When loading an article in its skeleton form, all of these element types will be loaded except for paragraphs, but only the article title will be loaded in its entirety. For figures, the caption will be loaded, while for sections, the title will be loaded, and for bibliographic entries, only the title that they contain will be loaded. Note that bibliographic elements are defined in another structure schema, RefBib. To produce this result, the following declarations should be placed in the Article structure schema:

EXPORT
   Article_title,
   Figure WITH Caption,
   Section WITH Section_title,
   Paragraph WITH Nothing,
   Biblio WITH Biblio_title(RefBib);

Exceptions

The behavior of the Thot editor and the actions that it performs are determined by the structure schemas. These actions are applied to all document and object types in accordance with their generic structure. For certain object types, such as tables and graphics, these actions are not sufficient or are poorly adapted and some special actions must be added to or substituted for certain standard actions. These special actions are called exceptions.

Exceptions only inhibit or modify certain standard actions, but they can be used freely in every structure schema.

Each structure schema can contain a section defining exceptions. It begins with the keyword EXCEPT and is composed of a sequence of exception declarations, separated by semicolons. Each declaration of an exception begins with the name of an element type or attribute followed by a colon. This indicates the element type or attribute to which the following exceptions apply. When the given element type name is a mark pair, and only in this case, the type name can be preceded by the keyword First or Second, to indicate if the exceptions which follow are associated with the first mark of the pair or the second. In the absence of this keyword, the first mark is used.

When placed in an extension schema, the keyword EXTERN indicates that the type name which follows is found in the principal schema (the schema being extended by the extension schema). The exceptions are indicated by name and are separated by commas.

                  [ 'EXCEPT' ExceptSeq ]

     ExceptSeq     = Except ';' < Except ';' > .
     Except        = [ 'EXTERN' ] [ FirstSec ] ExcTypeOrAttr
                     ':' ExcValSeq .
     ExcTypeOrAttr = ElemID / AttrID .
     ExcValSeq     = ExcValue < ',' ExcValue > .
     ExcValue      ='NoCut' / 'NoCreate' / 'NoHMove' / 
                    'NoVMove' / 'NoHResize' / 'NoVResize' /
                    'NoMove' / 'NoResize' / 'MoveResize' /
                    'NewWidth' / 'NewHeight' / 'NewHPos' /
                    'NewVPos' / 'Invisible' /
                    'NoSelect' / 'NoSpellCheck' /
                    'Hidden' / 'ActiveRef' /
                    'ImportLine' / 'ImportParagraph' /
                    'NoPaginate' / 'ParagraphBreak' /
                    'PageBreak' / 'PageBreakAllowed' / 'PageBreakPlace' /
                    'PageBreakRepetition' / 'PageBreakRepBefore' /
                    'HighlightChildren' / 'ExtendedSelection' /
                    'ReturnCreateNL' / 'IsDraw' / 'IsTable' /
                    'IsRow' / 'IsColHead' / 'IsCell' /
                    'NewPercentWidth' / 'ColRef' / 'ColSpan' /
                    'RowSpan' / 'SaveDocument' / 'Shadow' .

The following are the available exceptions:

NoCut
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be deleted by the editor.
NoCreate
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be created by ordinary commands for creating new elements. These elements are usually created by special actions associated with other exceptions.
NoHMove
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be moved horizontally with the mouse. Their children elements cannot be moved either.
NoVMove
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be moved vertically with the mouse. Their children elements cannot be moved either.
NoMove
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be moved in any direction with the mouse. Their children elements cannot be moved either.
NoHResize
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be resized horizontally with the mouse. Their children elements cannot be resized either.
NoVResize
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be resized vertically with the mouse. Their children elements cannot be resized either.
NoResize
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be resized in any direction with the mouse. Their children elements cannot be resized either.
MoveResize
This exception can only be applied to element types. Elements of a type to which this exception is applied can be moved and resized in any direction with the mouse, even if one of their ancestor elements has an exception that prevents moving or resizing. Their children elements can also be resized or moved.
NewWidth
This exception can only be applied to numeric attributes. If the width of an element which has this attribute is modified with the mouse, the value of the new width will be assigned to the attribute.
NewHeight
This exception can only be applied to numeric attributes. If the height of an element which has this attribute is modified with the mouse, the value of the new height will be assigned to the attribute.
NewHPos
This exception can only be applied to numeric attributes. If the horizontal position of an element which has this attribute is modified with the mouse, the value of the new horizontal position will be assigned to the attribute.
NewVPos
This exception can only be applied to numeric attributes. If the vertical position of an element which has this attribute is modified with the mouse, the value of the new vertical position will be assigned to the attribute.
Invisible
This exception can only be applied to attributes, but can be applied to all attribute types. It indicates that the attribute must not be seen by the user and that its value must not be changed directly. This exception is usually used when another exception manipulates the value of an attribute.
NoSelect
This exception can only be applied to element types. Elements of a type to which this exception is applied cannot be selected directly with the mouse, but they can be selected by other methods provided by the editor.
NoSpellCheck
This exception can only be applied to element types. Elements of a type to which this exception is applied are not taken into account by the spell checker.
Hidden
This exception can only be applied to element types. It indicates that elements of this type, although present in the document's structure, must not be shown to the user of the editor. In particular, the creation menus must not propose this type and the selection message must not pick it.
ActiveRef
This exception can only be applied to attributes of the reference type. It indicates that when the user of the editor double-clicks on an element which possesses a reference attribute having this exception, the element designated by the reference attribute will be selected.
ImportLine
This exception can only be applied to element types. It indicates that elements of this type should receive the content of imported text files. An element is created for each line of the imported file. A structure schema cannot contain more than one ImportLine exception and, if it contains one, it should not contain any ImportParagraph exception.
ImportParagraph
This exception can only be applied to element types. It indicates that elements of this type should receive the content of imported text files. An element is created for each paragraph of the imported file. A paragraph is a sequence of lines without any empty line. A structure schema cannot contain more than one ImportParagraph exception and, if it contains one, it should not contain any ImportLine exception.
NoPaginate
This exception can only be applied to the root element, i.e. the name that appears after the keyword STRUCTURE at the beginning of the structure schema. It indicates that the editor should not allow the user to paginate documents of that type.
ParagraphBreak
This exception can only be applied to element types. When the caret is within an element of a type to which this exception is applied, it is that element that will be split when the user hits the Return key.
ReturnCreateNL
This exception can only be applied to element types. When the caret is within an element of a type to which this exception is applied, the Return key simply inserts a New line character (code \212) at the current position. The Return key does not create a new element; it does not split the current element either.
HighlightChildren
This exception can only be applied to element types. Elements of a type to which this exception is applied are not highlighted themselves when they are selected, but all their children are highlighted instead.
ExtendedSelection
This exception can only be applied to element types. The selection extension command (middle button of the mouse) only adds the clicked element (if it has that exception) to the current selection, without selecting other elements between the current selection and the clicked element.
IsDraw, IsTable, IsColHead, IsRow, IsCell
These exceptions can only be applied to element types. Elements of a type to which these exceptions are applied are identified as Draws, Tables, Colheads, Rows or Cells and specific processing is applied to them.
ColRef
This exception can only be applied to attributes of the reference type. It indicates that this attribute refers to the column head (see exception IsColHead) which the element belongs to.
ColSpan, RowSpan
These exceptions can only be applied to numeric attributes of cells. They indicate that attribute values give how many columns or rows the element spans.
Shadow
This exception can only be applied to element types. The text of elements of a type to which this exception is applied is displayed and printed as a set of stars ('*').

Example:

Consider a structure schema for object-style graphics which defines the Graphics_object element type with the associated Height and Width numeric attributes. Suppose that we want documents of this class to have the following qualities:

  • Whenever the width or height of an object is changed using the mouse, the new values are stored in the object's Width and Height attributes.
  • The user should not be able to change the values of the Width and Height attributes via the Attributes menu of the Thot editor.

The following exceptions will produce this effect.

STRUCT
...
   Graphics_object (ATTR Height = Integer; Width = Integer)
       = GRAPHICS WITH Height ?= 10, Width ?= 10;
...
EXCEPT
   Height: NewHeight, Invisible;
   Width: NewWidth, Invisible;
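
As a second, hypothetical sketch, the following exceptions could be added to the Article schema presented in the next section, assuming that schema has no other EXCEPT section. With them, articles cannot be paginated, the title cannot be deleted, and each paragraph of an imported text file produces a Paragraph element.

EXCEPT
   Article: NoPaginate;
   Title: NoCut;
   Paragraph: ImportParagraph;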

Some examples

In order to illustrate the principles of the document model and the syntax of the S language, this section presents two examples of structure schemas. One defines a class of documents, the other defines a class of objects.

A class of documents: articles

This example shows a possible structure for articles published in a journal. Text between braces is a comment.

STRUCTURE Article;  { This schema defines the Article class }
DEFPRES ArticleP;   { The default presentation schema is
                      ArticleP }
ATTR                { Global attribute definitions }
   WordType = Definition, IndexWord, DocumentTitle;
   { A single global attribute is defined, with three values }
STRUCT              { Definition of the generic structure }
   Article = BEGIN  { The Article class has an aggregate
                      structure }
             Title = BEGIN   { The title is an aggregate }
                     French_title = 
                         Text WITH Language='Fran\347ais';
                     English_title =
                         Text WITH Language='English';
                     END;
             Authors = 
               LIST OF (Author
                 (ATTR Author_type=principal,secondary)
                 { The Author type has a local attribute }
                 = BEGIN
                   Author_name = Text;
                   Info = Paragraphs ;
                   { Paragraphs is defined later }
                   Address    = Text;
                   END
                 );
             Keywords = Text;
             { The journal's editor introduces the article
               with a short introduction, in French and
               in English }
             Introduction = 
                 BEGIN
                 French_intr  = Paragraphs WITH
                                Language='Fran\347ais';
                 English_intr = Paragraphs WITH
                                Language='English';
                 END;
             Body = Sections; { Sections are defined later }
                   { Appendixes are only created on demand }
           ? Appendices = 
                 LIST OF (Appendix =
                          BEGIN
                          Appendix_Title    = Text;
                          Appendix_Contents = Paragraphs;
                          END
                         );
             END;      { End of the Article aggregate }

    Sections = LIST [2..*] OF (
                 Section = { At least 2 sections }
                 BEGIN
                 Section_title   = Text;
                 Section_contents =
                   BEGIN
                   Paragraphs;
                   Sections; { Sections at a lower level }
                   END;
                 END
                 );

    Paragraphs = LIST OF (Paragraph = CASE OF
                               Enumeration = 
                                   LIST [2..*] OF
                                       (Item = Paragraphs);
                               Isolated_formula = Formula;
                               LIST OF (UNIT);
                               END
                          );

ASSOC         { Associated elements definitions }

   Figure = BEGIN
            Figure_caption  = Text;
            Illustration   = NATURE;
            END;

   Biblio_citation = CASE OF
                        Ref_Article =
                           BEGIN
                           Authors_Bib   = Text;
                           Article_Title = Text;
                           Journal       = Text;
                           Page_Numbers  = Text;
                           Date          = Text;
                           END;
                        Ref_Livre =
                           BEGIN
                           Authors_Bib; { Defined above }
                           Book_Title   = Text;
                           Editor       = Text;
                           Date;        { Defined above }
                           END;
                       END;

   Note =  Paragraphs - (Ref_note);

UNITS      { Elements which can be used in objects }

   Ref_note    = REFERENCE (Note);
   Ref_biblio  = REFERENCE (Biblio_citation);
   Ref_figure  = REFERENCE (Figure);
   Ref_formula = REFERENCE (Isolated_formula);

EXPORT     { Skeleton elements }

   Title,
   Figure WITH Figure_caption,
   Section WITH Section_title;

END           { End of the structure schema }

This schema is very complete since it defines both paragraphs and bibliographic citations. These element types could just as well be defined in other structure schemas, as is the case with the Formula class. All sorts of other elements can be inserted into an article, since a paragraph can contain any type of unit. Similarly, figures can be any class of document or object that the user chooses.

Generally, an article doesn't contain appendices, but it is possible to add them on explicit request: this is the effect of the question mark before the word Appendices.

The Figure, Biblio_citation and Note elements are associated elements. Thus, they are only used in REFERENCE statements.

Various types of cross-references can be put in paragraphs. They can also be placed in the objects which are part of the article, since the cross-references are defined as units (UNITS).

There is a single restriction to prevent the creation of Ref_note elements within notes.

It is worth noting that the S language permits the definition of recursive structures like sections: a section can contain other sections (which are thus at the next lower level of the document tree). Paragraphs are also recursive elements, since a paragraph can contain an enumeration in which each element (Item) is composed of paragraphs.

A class of objects: mathematical formulas

The example below defines the Formula class which is used in Article documents. This class represents mathematical formulas with a rather simple structure, but sufficient to produce a correct rendition on the screen or printer. To support more elaborate operations (formal or numeric calculations), a finer structure should be defined. This class doesn't use any other class and doesn't define any associated elements or units.

STRUCTURE Formula;
DEFPRES FormulaP;

ATTR
   String_type = Function_name, Variable_name;

STRUCT
   Formula      = Expression;
   Expression   = LIST OF (Construction);
   Construction = CASE OF
                  TEXT;         { Simple character string }
                  Index    = Expression;
                  Exponent = Expression;
                  Fraction =
                        BEGIN
                        Numerator   = Expression;
                        Denominator = Expression;
                        END;
                  Root = 
                        BEGIN
                      ? Order = TEXT;
                        Root_Contents = Expression;
                        END;
                  Integral =
                        BEGIN
                        Integration_Symbol = SYMBOL;
                        Lower_Bound        = Expression;
                        Upper_Bound        = Expression;
                        END;
                  Triple =
                        BEGIN
                        Princ_Expression = Expression;
                        Lower_Expression = Expression;
                        Upper_Expression = Expression;
                        END;
                  Column = LIST [2..*] OF 
                              (Element = Expression);
                  Parentheses_Block =
                        BEGIN
                        Opening  = SYMBOL;
                        Contents = Expression;
                        Closing  = SYMBOL;
                        END;
                  END;       { End of Choice Constructor }
END                          { End of Structure Schema }

This schema defines a single global attribute which allows functions and variables to be distinguished. In the presentation schema, this attribute can be used to choose between roman (for functions) and italic characters (for variables).

A formula's structure is that of a mathematical expression, which is itself a sequence of mathematical constructions. A mathematical construction can be either a simple character string, an index, an exponent, a fraction, a root, etc. Each of these mathematical constructions has a sensible structure which generally includes one or more expressions, thus making the formula class's structure definition recursive.

In most cases, the roots which appear in the formulas are square roots and their order (2) is not specified. This is why the Order component is marked optional by a question mark. When explicitly requested, it is possible to add an order to a root, for example for cube roots (order = 3).

An integral is formed by an integration symbol, chosen by the user (simple integral, double, curvilinear, etc.), and two bounds. A more fine-grained schema would add components for the integrand and the integration variable. Similarly, the Parentheses_Block construction leaves the choice of opening and closing symbols to the user. They can be brackets, braces, parentheses, etc.


The P Language

Document presentation

Because of the model adopted for Thot, the presentation of documents is clearly separated from their structure and content. After having presented the logical structure of documents, we now detail the principles implemented for their presentation. The concept of presentation encompasses what is often called the page layout, the composition, or the document style. It is the set of operations which display the document on the screen or print it on paper. Like logical structure, document presentation is defined generically with the help of a language, called P.

Two levels of presentation

The link between structure and presentation is clear: the logical organization of a document is used to carry out its presentation, since the purpose of the presentation is to make evident the organization of the document. But the presentation is equally dependent on the device used to render the document. Certain presentation effects, notably changes of font or character set, cannot be performed on all printers or on all screens. This is why Thot uses a two-level approach, where the presentation is first described in abstract terms, without taking into account each particular device, and then the presentation is realized within the constraints of a given device.

Thus, presentation is only described as a function of the structure of the documents and the image that would be produced on an idealized device. For this reason, presentation descriptions do not refer to any device characteristics: they describe abstract presentations which can be concretized on different devices.

A presentation description also defines a generic presentation, since it describes the appearance of a class of documents or objects. This generic presentation must also be applied to document and object instances, each conforming to its generic logical structure, but with all the allowances mentioned above: missing elements, constructed elements with other logical structures, etc.

In order to preserve the homogeneity between documents and objects, presentation is described with a single set of tools which support the layout of a large document as well as the composition of objects like a graphical figure or mathematical formula. This unity of presentation description tools contrasts with the traditional approach, which focuses more on documents than objects and thus is based on the usual typographic conventions, such as the placement of margins, indentations, vertical spaces, line lengths, justification, font changes, etc.

Boxes

To ensure the homogeneity of tools, all presentation in Thot, for documents as well as for the objects which they contain, is based on the notion of the box, as implemented in TeX.

Corresponding to each element of the document is a box, which is the rectangle enclosing the element on the display device (screen or sheet of paper); the outline of this rectangle is not visible, except when a ShowBox rule applies to the element. The sides of the box are parallel to the sides of the screen or the sheet of paper. By way of example, a box is associated with a character string, a line of text, a page, a paragraph, a title, a mathematical formula, or a table cell.

Whatever element it corresponds to, each box possesses four sides and four axes, which we designate as follows (see figure):

Top
    the upper side,
Bottom
    the lower side,
Left
    the left side,
Right
    the right side,
VMiddle
    the vertical axis passing through the center of the box,
HMiddle
    the horizontal axis passing through the center of the box,
VRef
    the vertical reference axis,
HRef
    the horizontal reference axis.

        Left   VRef  VMiddle        Right
                 :      :
    Top   -----------------------------
          |      :      :             |
          |      :      :             |
          |      :      :             |
          |      :      :             |
          |      :      :             |
HMiddle ..|...........................|..
          |      :      :             |
          |      :      :             |
   HRef ..|...........................|..
          |      :      :             |
          |      :      :             |
  Bottom  -----------------------------
                 :      :

The sides and axes of boxes
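
The eight names above can be read as derived quantities of a rectangle. The Python sketch below is only an illustration and not Thot's internal representation: the `Box` class, its field names, and the convention that x grows rightwards and y grows downwards are all assumptions made for this example. The two middle axes follow from the extent of the box, whereas the two reference axes are stored as offsets because their position depends on the content.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A rectangle exposing the four sides and four axes named in the text."""
    x: float                   # position of the Left side
    y: float                   # position of the Top side
    width: float
    height: float
    vref_offset: float = 0.0   # assumed: distance from Left to the VRef axis
    href_offset: float = 0.0   # assumed: distance from Top to the HRef axis

    @property
    def left(self) -> float:
        return self.x

    @property
    def right(self) -> float:
        return self.x + self.width

    @property
    def top(self) -> float:
        return self.y

    @property
    def bottom(self) -> float:
        return self.y + self.height

    @property
    def vmiddle(self) -> float:
        """Vertical axis through the center: halfway between Left and Right."""
        return self.x + self.width / 2

    @property
    def hmiddle(self) -> float:
        """Horizontal axis through the center: halfway between Top and Bottom."""
        return self.y + self.height / 2

    @property
    def vref(self) -> float:
        return self.x + self.vref_offset

    @property
    def href(self) -> float:
        return self.y + self.href_offset


box = Box(x=10, y=20, width=100, height=40, vref_offset=25, href_offset=30)
```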


The principal role of boxes is to set the extent and position of the images of the different elements of a document with respect to each other on the reproduction device. This is done by defining relations between the boxes of different elements which give relative extents and positions to these boxes.
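
As a rough illustration of such relations, the following Python sketch places one box relative to another. It is a simplified model, not the P language and not Thot's algorithm: the `Box` class and the helpers `place_below` and `match_width` are hypothetical names, and coordinates are assumed to grow downwards and to the right.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Minimal box: position of the Top and Left sides plus extent."""
    top: float = 0.0
    left: float = 0.0
    width: float = 0.0
    height: float = 0.0

    @property
    def bottom(self) -> float:
        return self.top + self.height


def place_below(box: Box, reference: Box, gap: float = 0.0) -> None:
    """Relation on position: Top of `box` = Bottom of `reference` + gap."""
    box.top = reference.bottom + gap
    box.left = reference.left


def match_width(box: Box, reference: Box) -> None:
    """Relation on extent: `box` takes the width of `reference`."""
    box.width = reference.width


title = Box(top=10, left=5, width=200, height=20)
paragraph = Box(height=60)          # only its own height is known in advance
match_width(paragraph, title)       # extent given relative to the title
place_below(paragraph, title, gap=4)  # position given relative to the title
```

Neither box is given an absolute layout of its own beyond the title's starting point; the paragraph's width and position follow entirely from the relations.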

There are three types of boxes:

  • boxes corresponding to structural elements of the document,
  • presentation boxes,
  • page layout boxes.

Boxes corresponding to structural elements of the document are those which are linked to each of the elements (base or structured) of the logical structure of the document. Such a box contains all the contents of the element to which it corresponds (there is an exception: see rules VertOverflow and HorizOverflow). These boxes form a tree-like structure, identical to that of the structural elements to which they correspond. This tree expresses the inclusion relationships between the boxes: a box includes all the boxes of its subtree. On the other hand, there are no predefined rules for the relative positions of the included boxes. If they are at the same level, they can overlap, be contiguous, or be disjoint. The rules expressed in the generic presentation specify their relative positions.

Presentation boxes represent elements which are not found in the logical structure of the document but which are added to meet the needs of presentation. These boxes are linked to the elements of the logical structure that are best suited to bringing them out. For example, they are used to add the character string ``Summary:'' before the summary in the presentation of a report, to represent the fraction bar in a formula, or to display the title of a field in a form. These elements have no role in the logical structure of the document: the presence of a Summary element in the document does not require the creation of another structural object to hold the word ``Summary''. Similarly, if a Fraction element contains both a Numerator element and a Denominator element, the fraction bar has no purpose structurally. On the other hand, these elements of the presentation are important for the reader of the reproduced document or for the user of an editor. This is why they must appear in the document's image. It is the generic presentation which specifies the presentation boxes to add by indicating their content (a base element for which the value is specified) and the position that they must take in the tree of boxes. During editing, these boxes cannot be modified by the user.
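
The ``Summary:'' example can be sketched in a few lines of Python. This is only a model of the idea, not Thot's mechanism: the `Element` class, the `CREATE_BEFORE` table and the function `boxes_for` are hypothetical, and only the case of a box created before an element is handled. The logical tree is left untouched; the extra box appears only in the generated list of boxes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Element:
    """A node of the logical structure."""
    type_name: str
    children: List["Element"] = field(default_factory=list)


# Hypothetical generic presentation: element type -> text of a box created before it.
CREATE_BEFORE: Dict[str, str] = {"Summary": "Summary:"}


def boxes_for(element: Element,
              rules: Dict[str, str] = CREATE_BEFORE) -> List[Tuple[str, str]]:
    """List the boxes of a subtree in document order, as (kind, label) pairs."""
    result: List[Tuple[str, str]] = []
    if element.type_name in rules:
        # The presentation box exists only in the image, not in the structure.
        result.append(("presentation", rules[element.type_name]))
    result.append(("structural", element.type_name))
    for child in element.children:
        result.extend(boxes_for(child, rules))
    return result


report = Element("Report", [
    Element("Title"),
    Element("Summary", [Element("Paragraph")]),
])
boxes = boxes_for(report)
```

The resulting list holds one structural box per element of `report`, plus a single presentation box placed immediately before the box of the Summary element.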

Page layout boxes are boxes created implicitly by the page layout rules. These rules indicate how the contents of a structured element must be broken into lines and pages. In contrast to presentation boxes, these line and page boxes do not depend on the logical structure of the document, but rather on the physical constraints of the output devices: character size, height and width of the window on the screen or of the sheet of paper.

Views and visibility

One of the operations that one might wish to perform on a document is to view it in different ways. For this reason, it is possible to define several views for the same document, or better yet, for all documents of the same class. A view is not a different presentation of the document, but rather a filter which only allows the display of certain parts of the document. For example, it might be desirable to see only the titles of chapters and sections in order to be able to move rapidly through the document. Such a view could be called a ``table of contents''. It might also be desirable to see only the mathematical formulas of a document in order to avoid being distracted by the non-mathematical aspects of the document. A ``mathematics'' view could provide this service.

Views, like presentation, are based on the generic logical structure. Each document class, and each generic presentation, can be provided with views which are particularly useful for that class or presentation. For each view, the visibility of elements is defined, indicating whether or not the elements must be presented to the user. The visibility is calculated as a function of the type of the elements or their hierarchical position in the structure of the document. Thus, for a table of contents, all the ``Chapter Title'' and ``Section Title'' elements are made visible. However, the hierarchical level could be used to make the section titles invisible below a certain threshold level. By varying this threshold, the granularity of the view can be varied. In the ``mathematics'' view, only Formula elements would be made visible, no matter what their hierarchical level.
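
The table of contents view with a threshold can be modelled as a filter over the logical tree. The Python sketch below is an illustration under assumptions, not the way Thot computes visibility: the `Element` class, the type names `Chapter_Title` and `Section_Title`, and the function `visible_titles` are hypothetical, and the hierarchical level is taken to be the number of enclosing Chapter and Section elements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """A node of the logical structure."""
    type_name: str
    text: str = ""
    children: List["Element"] = field(default_factory=list)


TITLE_TYPES = {"Chapter_Title", "Section_Title"}   # types shown in this view
CONTAINER_TYPES = {"Chapter", "Section"}           # types that add one level


def visible_titles(element: Element, max_level: int, level: int = 0) -> List[str]:
    """Return the titles visible in the view, hiding those deeper than max_level."""
    if element.type_name in CONTAINER_TYPES:
        level += 1
    found: List[str] = []
    if element.type_name in TITLE_TYPES and level <= max_level:
        found.append(element.text)
    for child in element.children:
        found.extend(visible_titles(child, max_level, level))
    return found


doc = Element("Document", children=[
    Element("Chapter", children=[
        Element("Chapter_Title", "Boxes"),
        Element("Paragraph", "Body text, never shown in this view."),
        Element("Section", children=[
            Element("Section_Title", "Axes"),
            Element("Section", children=[
                Element("Section_Title", "Reference axes"),
            ]),
        ]),
    ]),
])
```

Calling `visible_titles(doc, 3)` keeps all three titles, while lowering the threshold to 2 or 1 drops the deepest titles first, which is the change of granularity described above.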

Because views are especially useful for producing a synthetic image of the document, it is necessary to adapt the presentation of the elements to the view in which they appear. For example, it is inappropriate to have a page break before every chapter title in the table of contents. Thus, generic presentations take into account the possible views and permit each element type's presentation to vary according to the view in which its image appears.

Views are also used, when editing documents, to display the associated elements. So, in addition to the primary view of the document, there can be a ``notes'' view and a ``figures'' view which contain, respectively, the associated elements of the Note and Figure types. In this way, it is possible to see simultaneously the text which refers to these elements and the elements themselves, even if they will be separated when printed.

Pages

Presentation schemas can be defined which display the document as a long scroll, without page breaks. This type of schema is particularly well-suited to the initial phase of work on a document, where jumps from page to page would hinder composing and reading the document on a screen. In this case, the associated elements (such as notes), which are normally displayed in the page footer, are presented in a separate window. But, once the document is written, it may be desirable to display the document on the screen in the same manner in which it will be printed. So, the presentation schema must define pages.

The P language permits the specification of the dimensions of pages as well as their composition. It is possible to generate running titles, page numbers, zones at the bottom of the page for notes, etc. The editor follows this model and inserts page break marks in the document which are used during printing, ensuring that the pages on paper are the same as on the screen.

Once a document has been edited with a presentation schema defining pages, it contains page marks. But it is always possible to edit the document using a schema without pages. In this case, the page marks are simply ignored by the editor. They are considered again as soon as a schema with pages is used. Thus, the user is free to choose between schemas with and without pages.

Thot treats the page break, rather than the page itself, as a box. This page break box contains all the elements of one page's footer, a rule marking the edge of this page, and all the elements of the next page's header. The elements of the header and footer can be running titles, page numbers, associated elements (notes, for example), etc. All these elements, as well as their content and graphical appearance, are defined by the generic presentation.

Numbering

Many elements are numbered in documents: pages, chapters, sections, formulas, theorems, notes, figures, bibliographic references, exercises, examples, lemmas, etc. Because Thot has a notion of logical structure, all of these numbers (with the exception of pages) are redundant with information implicit in the logical structure of the document. Such numbers are simply a way to make the structure of the document more visible. So, they are part of the document's presentation and are calculated by the editor from the logical structure. The structure does not contain numbers as such; it only defines relative structural positio