Date: Fri, 03 Sep 1993 08:13:39 -0400 (EDT)
From: KLENSIN@infoods.mit.edu
(background)

SGML micro-tutorial/introduction

John Klensin

SGML comes out of a text-processing/formatting community. However, it is neither a page description language (like PostScript (tm)) nor a format-specifying language (like TeX or nroff), but a "generic markup" language, used to identify fairly high-level concepts in a "document". Where nroff would be used to specify "put a blank line here and fill what follows", SGML simply identifies the beginning and end of a paragraph, leaving "how to format a paragraph" to the interpretation of the application. The same goes for "sections", "chapters", and so on; hence the need for nesting.

Almost as soon as the standard was becoming final, people realized that it applied to more than text formatting: it provides a way to identify text elements in a "name and value" style, even when structures are deeply nested. This made it useful for the storage and organization of very complex documents, for hypertext, and for scientific database applications that include a great deal of descriptive information.

The generalization comes not because of the extreme generality of the SGML language (although it has that too), but because people tend to organize things in mostly-repetitive hierarchical structures. Rolodex (tm) cards, library catalogues, file cabinets and folders -- most of the metaphors we come up with when talking about data and file systems -- are examples of the general model. Hence, perhaps, the almost instant and intuitive appeal of hypertext documents.

Ignoring how one defines "document type definitions", changes syntax rules, etc., SGML breaks text up into "elements". An element consists of a start-tag, usually an end-tag, and some stuff in between. The stuff in between may be either "data" (text strings of one sort or another) or more elements. Elements are usually thought of as having names; an element's name is the "generic identifier" that is the main (usually only) part of its tags. The start-tag and end-tag of an element are usually identical, except for a bit of syntax.

For example, we might have

    <foo>   drivel, and more drivel </foo>
       \          \                   \
        \          \ data               \ end tag
         \ start tag

In general, line breaks and similar things don't count, so the above could be written as

    <foo>
         drivel, and more drivel
    </foo>
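
To make the point concrete, here is a rough sketch in Python (not part of SGML, and not a real SGML parser) that pulls the generic identifier and the data out of one simple element and shows that both ways of writing the example above come out the same. The helper name split_element and the pattern ELEMENT are made up for illustration. Collapsing every run of whitespace is a simplifying assumption; the actual rules in the standard for line breaks are more detailed than this.

```python
import re

# Hypothetical pattern: one element written as <name> ... </name>,
# where the end-tag repeats the start-tag's generic identifier.
ELEMENT = re.compile(r"\s*<(\w+)>(.*?)</\1>\s*", re.DOTALL)

def split_element(text):
    """Return (generic identifier, data) for a single simple element."""
    match = ELEMENT.fullmatch(text)
    if match is None:
        raise ValueError("not a single simple element")
    name, data = match.groups()
    # Simplifying assumption: treat any run of spaces, tabs, or line
    # breaks as one space and drop it at the edges of the data.
    return name, " ".join(data.split())

one_line = "<foo>   drivel, and more drivel </foo>"
multi_line = """
<foo>
     drivel, and more drivel
</foo>
"""

print(split_element(one_line))
print(split_element(multi_line))
```

Both calls return the same pair, since the two inputs differ only in spacing and line breaks. Any elements nested inside the data are not parsed further by this sketch.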

Real examples are more useful; they follow.