Getting Started with &SGML;/&XML;

$Date: 2005/03/25 21:54:04 $ $Revision: 1.1 $

This chapter is intended to provide a quick introduction to structured markup (&SGML; and &XML;). If you're already familiar with &SGML; or &XML;, you only need to skim this chapter.

To work with DocBook, you need to understand a few basic concepts of structured editing in general, and of DocBook in particular. That's covered here. You also need some concrete experience with the way a DocBook document is structured. That's covered in the next chapter.

&HTML; and &SGML; vs. &XML;

This chapter doesn't assume that you know what &HTML; is, but if you do, you have a starting point for understanding structured markup. &HTML; (Hypertext Markup Language) is a way of marking up text and graphics so that the most popular web browsers can interpret them. &HTML; consists of a set of markup tags with specific meanings. Moreover, &HTML; is a very basic type of &SGML; markup that is easy to learn and easy for computer applications to generate. But the simplicity of &HTML; is both its virtue and its weakness. Because of &HTML;'s limitations, web users and programmers have had to extend and enhance it by a series of customizations and revisions that still fall short of accommodating current, to say nothing of future, needs.

&SGML;, on the other hand, is an international standard that describes how markup languages are defined. &SGML; does not consist of particular tags or the rules for their usage. &HTML; is an example of a markup language defined in &SGML;.

&XML; promises an intelligent improvement over &HTML;, and compatibility with it is already being built into the most popular web browsers. &XML; is not a new markup language designed to compete with &HTML;, and it's not designed to create conversion headaches for people with tons of &HTML; documents.
&XML; is intended to alleviate compatibility problems with browser software; it's a new, easier version of the standard rules that govern the markup itself, or, in other words, a new version of &SGML;. The rules of &XML; are designed to make it easier to write both applications that interpret its type of markup and applications that generate its markup. &XML; was developed by a team of &SGML; experts who understood and sought to correct the problems of learning and implementing &SGML;. &XML; is also extensible markup, which means that it is customizable. A browser or word processor that is &XML;-capable will be able to read any &XML;-based markup language that an individual user defines.

In this book, we tend to describe things in terms of &SGML;, but where there are differences between &SGML; and &XML; (and there are only a few), we point them out. For our purposes, it doesn't really matter whether you use &SGML; or &XML;. During the coming months, we anticipate that &XML;-aware web browsers and other tools will become available. Nevertheless, it's not unreasonable to do your authoring in &SGML; and your online publishing in &XML; or &HTML;. By the same token, it's not unreasonable to do your authoring in &XML;.

Basic &SGML;/&XML; Concepts

Here are the basic &SGML;/&XML; concepts you need to grasp:

  - structured, semantic markup
  - elements
  - attributes
  - entities

Structured and Semantic Markup

An essential characteristic of structured markup is that it explicitly distinguishes (and accordingly “marks up” within a document) the structure and semantic content of a document. It does not mark up the way in which the document will appear to the reader, in print or otherwise.

In the days before word processors it was common for a typed manuscript to be submitted to a publisher.
The manuscript identified the logical structures of the document (chapters, section titles, and so on), but said nothing about its appearance. Working independently of the author, a designer then developed a specification for the appearance of the document, and a typesetter marked up and applied the designer's format to the document.

Because presentation or appearance is usually based on structure and content, &SGML; markup logically precedes and generally determines the way a document will look to a reader. If you are familiar with strict, simple &HTML; markup, you know that a given document that is structurally the same can also look different on different computers. That's because the markup does not specify many aspects of a document's appearance, although it does specify many aspects of a document's structure.

Many writers type their text into a word processor, line-by-line and word-for-word, italicizing technical terms, underlining words for emphasis, or setting section headers in a font complementary to the body text, and finally, setting the headers off with a few carriage returns fore and aft. The format such a writer imposes on the words on the screen imparts structure to the document by changing its appearance in ways that a reader can more or less reliably decode. The reliability depends on how consistently and unambiguously the changes in type and layout are made.

By contrast, an &SGML;/&XML; markup of a section header explicitly specifies that a specific piece of text is a section header. This assertion does not specify the presentation or appearance of the section header, but it makes the fact that the text is a section header completely unambiguous.

&SGML; and &XML; use named elements, delimited by angle brackets (“<” and “>”), to identify the markup in a document. In DocBook, a top-level section is sect1, so the title of a top-level section named My First-Level Header would be identified like this:

    <sect1><title>My First-Level Header</title>

Note the following features of this markup:

  - Clarity: A title begins with <title> and ends with </title>. The sect1 also has an ending </sect1>, but we haven't shown the whole section so it's not visible.
  - Hierarchy: “My First-Level Header” is the title of a top-level section because it occurs inside a title in a sect1. A title element occurring somewhere else, say in a Chapter element, would be the title of the chapter.
  - Plain text: &SGML; documents can have varying character sets, but most are ASCII. &XML; documents use the Unicode character set. This makes &SGML; and &XML; documents highly portable across systems and tools.

In an &SGML; document, there is no obligatory difference between the size or face of the type in a first-level section header and the title of a book in a footnote or the first sentence of a body paragraph. All &SGML; files are simple text files without font changes or special characters. (Some structured editors apply style to the document while it's being edited, using fonts and color to make the editing task easier, but this stylistic information is not stored in the actual &SGML;/&XML; document. Instead, it is provided by the editing application.) Similarly, an &SGML; document does not specify the words in a text that are to be set in italic, bold, or roman type. Instead, &SGML; marks certain kinds of texts for their semantic content.
For example, if a particular word is the name of a file, then the tags around it should specify that it is a filename:

    Many mail programs read configuration information from the user's <filename>.mailrc</filename> file.

If the meaning of a phrase is particularly audacious, it might get tagged for boldness of thought instead of appearance. An &SGML; document contains all the information that a typesetter needs to lay out and typeset a printed page in the most effective and consistent way, but it does not specify the layout or the type. (The distinction between appearance or presentation and structure or content is essential to &SGML;, but there is a way to specify the appearance of an &SGML; document: attach a stylesheet to it. There are several standards for such stylesheets: CSS, XSL, FOSIs, and DSSSL.)

Not only is the structure of an &SGML;/&XML; document explicit, but it is also carefully controlled. An &SGML; document makes reference to a set of declarations, a document type definition (&DTD;), that contains an inventory of tag names and specifies the combination rules for the various structural and semantic features that make up a document. What the distinctive features are and how they should be combined is “arbitrary” in the sense that almost any selection of features and rules of composition is theoretically possible. The DocBook &DTD; chooses a particular set of features and rules for its users.

Here is a specific example of how the DocBook &DTD; works.
DocBook specifies that a third-level section can follow a second-level section but cannot follow a first-level section without an intervening second-level section. This is valid:

    <sect1><title>...</title>
      <sect2><title>...</title>
        <sect3><title>...</title>
          ...
        </sect3>
      </sect2>
    </sect1>

This is not:

    <sect1><title>...</title>
      <sect3><title>...</title>
        ...
      </sect3>
    </sect1>

Because an &SGML;/&XML; document has an associated &DTD; that describes the valid, logical structures of the document, you can test the logical structure of any particular document against the &DTD;. This process is performed by a parser. An &SGML; processor must begin by parsing the document and determining if it is valid, that is, if it conforms to the rules specified in the &DTD;. &XML; processors are not required to check for validity, but it's always a good idea to check for validity when authoring. Because you can test and validate the structure of an &SGML;/&XML; document with software, a DocBook document containing a first-level section followed immediately by a third-level section will be identified as invalid, meaning that it's not a valid instance or example of a document defined by the DocBook &DTD;.

Presumably, a document with a logical structure won't normally jump from a first- to a third-level section, so the rule is a safeguard (but not a guarantee) of good writing, or at the very least, reasonable structure. A parser also verifies that the names of the tags are correct and that tags requiring an ending tag have one. This means that a valid document is also one that should format correctly, without runs of paragraphs incorrectly appearing in bold type or similar monstrosities that everyone has seen in print at one time or another. &SGML;/&XML; parsers are discussed in more detail elsewhere in this book.
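To make the validity rule described above concrete, here is a minimal sketch of a complete document that a validating parser should accept. It assumes DocBook &XML; V4.1.2 and an article as the root element, neither of which the discussion above prescribes; the titles and text are placeholders.

```xml
<?xml version="1.0"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<!-- Illustrative only: the root element, titles, and text are assumptions. -->
<article>
  <title>Validation Example</title>
  <sect1>
    <title>First-level section</title>
    <para>Introductory text.</para>
    <sect2>
      <title>Second-level section</title>
      <para>More detail.</para>
      <!-- A sect3 is allowed here because it sits inside a sect2. -->
      <sect3>
        <title>Third-level section</title>
        <para>Still more detail.</para>
      </sect3>
    </sect2>
  </sect1>
</article>
```

Moving the sect3 so that it sits directly inside the sect1 would leave the document well-formed &XML;, but a validating parser should then report it as invalid. With libxml2's xmllint, for instance, a check along the lines of `xmllint --valid --noout example.xml` (the filename is hypothetical) is one way to try this.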

In general, adherence to the explicit rules of structure and markup in a &DTD; is a useful and reassuring guarantee of consistency and reliability within documents, across document sets, and over time. This makes &SGML;/&XML; markup particularly desirable to corporations or governments that have large sets of documents to manage, but it is a boon to the individual writer as well.

How can this markup help you?

Semantic markup makes your documents more amenable to interpretation by software, especially publishing software. You can publish a white paper, authored as a DocBook Article, in the following formats:

  - On the Web in &HTML;
  - As a standalone document on 8½×11 paper
  - As part of a quarterly journal, in a 6×9 format
  - In Braille
  - In audio

You can produce each of these publications from exactly the same source document using the presentational techniques best suited to both the content of the document and the presentation medium. This versatility also frees the author to concentrate on the document content. For example, as we write this book, we don't know exactly how O'Reilly will choose to present chapter headings, bulleted lists, &SGML; terms, or any of the other semantic features. And we don't care. It's irrelevant; whatever presentation is chosen, the &SGML; sources will be transformed automatically into that style.

Semantic markup can relieve the author of other, more significant burdens as well (after all, careful use of paragraph and character styles in a word processor document theoretically allows us to change the presentation independently from the document). Using semantic markup opens up your documents to a world of possibilities. Documents become, in a loose sense, databases of information. Programs can compile, retrieve, and otherwise manipulate the documents in predictable, useful ways.

Consider the online version of this book: almost every element name (Article, Book, and so on) is a hyperlink to the reference page that describes that element. Maintaining these links by hand would be tedious and might be unreliable, as well. Instead, every element name is marked as an element using SGMLTag: a Book is a <sgmltag>Book</sgmltag>.

Because each element name in this book is tagged semantically, the program that produces the online version can determine which occurrences of the word “book” in the text are actually references to the Book element. The program can then automatically generate the appropriate hyperlink when it should.

There's one last point to make about the versatility of &SGML; documents: how much you have depends on the &DTD;. If you take a good photo with a high resolution lens, you can print it and copy it and scan it and put it on the Web, and it will look good. If you start with a low-resolution picture it will not survive those transformations so well. DocBook &SGML;/&XML; has this advantage over, say, &HTML;: because DocBook has specific and unambiguous semantic and structural markup, you can convert its documents with ease into other presentational forms, and search them more precisely. If you start with &HTML;, whose markup is at a lower resolution than DocBook's, your versatility and searchability are substantially restricted and cannot be improved.

What are the shortcomings of structured authoring?

There are a few significant shortcomings to structured authoring:

  - It requires a significant change in the authoring process. Writing structured documents is very different from writing with a typical word processor, and change is difficult. In particular, authors don't like giving up control over the appearance of their words, especially now that they have acquired it with the advent of word processors. But many publishing companies need authors to relinquish that control, because book design and production remains their job, not their authors'.
  - Because semantics are separate from appearance, in order to publish an &SGML;/&XML; document, a stylesheet or other tool must create the presentational form from the structural form. Writing stylesheets is a skill in its own right, and though not every author among a group of authors has to learn how to write them, someone has to.
  - Authoring tools for &SGML; documents are generally expensive. While it's not entirely unreasonable to edit &SGML;/&XML; documents with a simple text editor, it's a bit tedious to do so. However, there are a few free tools that are &SGML;-aware. The widespread interest in &XML; may well produce new, clever, and less expensive &XML; editing tools.

Elements and Attributes

&SGML;/&XML; markup consists primarily of elements, attributes, and entities. Elements are the terms we have been speaking about most, like sect1, that describe a document's content and structure. Most elements are represented by pairs of tags and mark the start and end of the construct they surround. For example, the &SGML; source for this particular paragraph begins with a <para> tag and ends with a </para> tag. Some elements are “empty” (such as DocBook's cross-reference element, xref) and require no end tag. (In &XML;, this is written as <xref/>.)

Elements can, but don't necessarily, include one or more attributes, which are additional terms that extend the function or refine the content of a given element.
For instance, in DocBook a sect1 start tag can contain an identifier (an id attribute) that will ultimately allow the writer to cross-reference it or enable a reader to retrieve it. End tags cannot contain attributes. A sect1 element with an id attribute looks like this:

    <sect1 id="idvalue">

In &SGML;, the catalog of attributes that can occur on an element is predefined. You cannot add arbitrary attribute names to an element. Similarly, the values allowed for each attribute are predefined. In &XML;, the use of namespaces may allow you to add additional attributes to an element, but as of this writing, there's no way to perform validation on those attributes.

The id attribute is one half of a cross reference. An idref attribute on another element, for example <xref linkend="idvalue">, provides the other half. These attributes provide whatever application might process the &SGML; source with the data needed either to make a hypertext link or to substitute a named and/or numbered cross reference in place of the xref.

Another use for attributes is to specify subclasses of certain elements. For instance, you can subdivide DocBook's systemitem into URLs and email addresses by making the content of the role attribute the distinction between them, as in <systemitem role="URL"> versus <systemitem role="emailaddr">.

Entities

Entities are a fundamental concept in &SGML; and &XML;, and can be somewhat daunting at first. They serve a number of related, but slightly different functions, and this makes them a little bit complicated. In the most general terms, entities allow you to assign a name to some chunk of data, and use that name to refer to that data.
The complexity arises because there are two different contexts in which you can use entities (in the &DTD; and in your documents), two types of entities (parsed and unparsed), and two or three different ways in which the entities can point to the chunk of data that they name. In the rest of this section, we'll describe each of the commonly encountered entity types. If you find the material in this section confusing, feel free to skip over it now and come back to it later. We'll refer to the different types of entities as the need arises in our discussion of DocBook. Come back to this section when you're looking for more detail.

Entities can be divided into two broad categories, general entities and parameter entities. Parameter entities are most often used in the &DTD;, not in documents, so we'll describe them last. Before you can use any type of entity, it must be formally declared. This is typically done in the document prologue, which is explained elsewhere in this book, but we will show you how to declare each of the entities discussed here.

General Entities

In use, general entities are introduced with an ampersand (&) and end with a semicolon (;). Within the category of general entities, there are two types: internal general entities and external general entities.

Internal general entities

With internal entities, you can associate an essentially arbitrary piece of text (which may have other markup, including references to other entities) with a name. You can then include that text by referring to its name. For example, if your document frequently refers to, say, “O'Reilly & Associates,” you might declare it as an entity:

    <!ENTITY ora "O'Reilly &amp; Associates">

Then, instead of typing it out each time, you can insert it as needed in your document with the entity reference &ora;, simply to save time.
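As a sketch of how the ora entity might look in context, the fragment below assumes the declaration is placed in the internal subset of the document type declaration. The surrounding para and its wording are invented here only to give the reference somewhere to appear.

```xml
<!DOCTYPE para [
  <!-- Declare the internal general entity "ora". The ampersand in the
       replacement text is itself written as an entity reference. -->
  <!ENTITY ora "O'Reilly &amp; Associates">
]>
<!-- The reference &ora; is replaced by the declared text when parsed. -->
<para>This book is published by &ora;.</para>
```

Keeping the text in one declaration also means that a later change to the name has to be made in only one place.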
Note that this entity declaration includes another entity reference within it. That's perfectly valid as long as the reference isn't directly or indirectly recursive.

If you find that you use a number of entities across many documents, you can add them directly to the &DTD; and avoid having to include the declarations in each document. See the discussion of dbgenent.mod elsewhere in this book.

External general entities

With external entities, you can reference other documents from within your document. If these entities contain document text (&SGML; or &XML;), then references to them cause the parser to insert the text of the external file directly into your document (these are called parsed entities). In this way, you can use entities to divide your single, logical document into physically distinct chunks. For example, you might break your document into four chapters and store them in separate files. At the top of your document, you would include entity declarations to reference the four files. Your Book now consists simply of references to the entities:

    <book>
    &ch01;
    &ch02;
    &ch03;
    &ch04;
    </book>

Sometimes it's useful to reference external files that don't contain document text. For example, you might want to reference an external graphic. You can do this with entities by declaring the type of data that's in the entity using a notation (these are called unparsed entities). For example, you could declare the entity tree as an encapsulated PostScript image.

Entities declared this way cannot be inserted directly into your document. Instead, they must be used as the values of entity attributes on elements. Conversely, you cannot use entities declared without a notation as the value of an entity attribute.
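The external entity declarations described above can be sketched as follows. The filenames (ch01.xml and so on, tree.eps), the notation name EPS, the book title, and the choice of DocBook &XML; V4.1.2 are all assumptions made for illustration; use whatever names and versions your own files and &DTD; provide.

```xml
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
  "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
  <!-- Parsed external entities: each names a file of document text.
       The filenames are hypothetical. -->
  <!ENTITY ch01 SYSTEM "ch01.xml">
  <!ENTITY ch02 SYSTEM "ch02.xml">
  <!ENTITY ch03 SYSTEM "ch03.xml">
  <!ENTITY ch04 SYSTEM "ch04.xml">
  <!-- Unparsed external entity: NDATA names the notation of the data.
       This assumes the DTD declares a notation called EPS. -->
  <!ENTITY tree SYSTEM "tree.eps" NDATA EPS>
]>
<book>
  <title>Example Book</title>
  &ch01;
  &ch02;
  &ch03;
  &ch04;
</book>
<!-- Within a chapter file, the unparsed entity is named in an entity
     attribute rather than referenced with an ampersand:
       <graphic entityref="tree"/>
-->
```

Each chapter file in this sketch would hold a single chapter element and carry no document type declaration of its own, because the parser inserts its text into the book at the point of reference.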

Special characters

In order for the parser to recognize markup in your document, it must be able to distinguish markup from content. It does this with two special characters: “<”, which identifies the beginning of a start or end tag, and “&”, which identifies the beginning of an entity reference.

In &XML;, these characters are fixed. In &SGML;, it is possible to change the markup start characters, but we won't consider that case here. If you change the markup start characters, you know what you're doing.

While we're on the subject, in &SGML;, these characters only have their special meaning if they are followed by a name character. It is, in fact, valid in an &SGML; (but not an &XML;) document to write “O'Reilly & Associates” because the ampersand is not followed by a name character. Don't do this, however.

If you want these characters to have their literal value, they must be encoded as entity references in your document. The entity reference &lt; produces a left angle bracket; &amp; produces the ampersand. The sequence of characters that ends a marked section, ]]>, must also be encoded with at least one entity reference if it is not being used to end a marked section. For this purpose, you can use the entity reference &gt; for the final right angle bracket.

If you do not encode each of these as their respective entity references, then an &SGML; parser or application is likely to interpret them as characters introducing elements or entities (an &XML; parser will always interpret them this way); consequently, they won't appear as you intended.
If you wish to cite text that contains literal ampersands and less-than signs, you need to transform these two characters into entity references before they are included in a DocBook document. The only other alternative is to incorporate text that includes them in your document through some process that avoids the parser.

In &SGML;, character entities are frequently declared using a third entity category (one that we deliberately chose to overlook), called data entities. In &XML;, these are declared using numeric character references. Numeric character references resemble entity references, but technically aren't the same. They have the form &#999;, in which “999” is the numeric character number.

In &XML;, the numeric character number is always the Unicode character number. In addition, &XML; allows hexadecimal numeric character references of the form &#xhhhh;. In &SGML;, the numeric character number is a number from the document character set that's declared in the &SGML; declaration.

Character entities are also used to give a name to special characters that can't otherwise be typed or are not portable across applications and operating systems. You can then include these characters in your document by referring to their entity name. Instead of using the often obscure and inconsistent key combinations of your particular word processor to type, say, an uppercase letter U with an umlaut (Ü), you type in an entity for it instead. For instance, the entity for an uppercase letter U with an umlaut has been defined as the entity Uuml, so you would type in &Uuml; to reference it instead of the actual character. The &SGML; application that eventually processes your document for presentation will match the entity to your platform's handling of special characters in order to render it appropriately.
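The encodings discussed in this section can be sketched in a single &XML; paragraph. The sentence text is invented for illustration. The sketch uses only the entities that &XML; predefines (lt, gt, and amp) and numeric character references, because a named entity such as Uuml works only when the &DTD; or the document declares it.

```xml
<para>
  <!-- Literal markup characters written as entity references. -->
  Use &lt;para&gt; for paragraphs, and write O'Reilly &amp; Associates
  with an encoded ampersand. The sequence ]]&gt; is safe in content when
  its final character is encoded.
  <!-- The same character, an uppercase U with an umlaut (U+00DC), written
       as a decimal and as a hexadecimal numeric character reference. -->
  &#220;bung and &#xDC;bung begin with the same letter.
</para>
```

In an &SGML; document the decimal number would instead be interpreted against the document character set named in the &SGML; declaration, so the value 220 is an assumption that holds for &XML; and for Latin-1 based &SGML; declarations only.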

Parameter Entities

Parameter entities are only recognized in markup declarations (in the &DTD;, for example). Instead of beginning with an ampersand, they begin with a percent sign. Parameter entities are most frequently used to customize the &DTD;. That topic is discussed in detail elsewhere in this book. Following are some other uses for them.

Marked sections

You might use a parameter entity reference in an &SGML; document in a marked section. Marking sections is a mechanism for indicating that special processing should apply to a particular block of text. Marked sections are introduced by the special sequence <![keyword[ and end with ]]>. In &SGML;, marked sections can appear in both &DTD;s and document instances. In &XML;, they're only allowed in the &DTD;. (Actually, CDATA marked sections are allowed in an &XML; document, but the keyword cannot be a parameter entity, and it must be typed literally.)

The most common keywords are INCLUDE, which indicates that the text in the marked section should be included in the document; IGNORE, which indicates that the text in the marked section should be ignored (it completely disappears from the parsed document); and CDATA, which indicates that all markup characters within that section should be ignored except for the closing characters ]]>.

In &SGML;, these keywords can be parameter entities. For example, you might declare the following parameter entity in your document:

    <!ENTITY % draft "INCLUDE">

Then you could put the sections of the document that are only applicable in a draft within marked sections:

    <![%draft;[
    <para>
    This paragraph only appears in the draft version.
    </para>
    ]]>

When you're ready to print the final version, simply change the draft parameter entity declaration:

    <!ENTITY % draft "IGNORE">

and publish the document. None of the draft sections will appear.

How Does DocBook Fit In?

DocBook is a very popular set of tags for describing books, articles, and other prose documents, particularly technical documentation. DocBook is defined using the native &DTD; syntax of &SGML; and &XML;. Like &HTML;, DocBook is an example of a markup language defined in &SGML;/&XML;.

A Short DocBook History

DocBook is almost 10 years old. It began in 1991 as a joint project of HaL Computer Systems and O'Reilly. Its popularity grew, and eventually it spawned its own maintenance organization, the Davenport Group. In mid-1998, it became a Technical Committee (TC) of the Organization for the Advancement of Structured Information Standards (OASIS).

The HaL and O'Reilly era

The DocBook &DTD; was originally designed and implemented by HaL Computer Systems and O'Reilly & Associates around 1991. It was developed primarily to facilitate the exchange of &UNIX; documentation originally marked up in troff. Its design appears to have been based partly on input from &SGML; interchange projects conducted by the Unix International and Open Software Foundation consortia.

When DocBook V1.1 was published, discussion about its revision and maintenance began in earnest in the Davenport Group, a forum created by O'Reilly for computer documentation producers. Version 1.2 was influenced strongly by Novell and Digital.

In 1994, the Davenport Group became an officially chartered entity responsible for DocBook's maintenance. DocBook V1.2.2 was published simultaneously.
The founding sponsors of this incarnation of Davenport include the following people:

  - Jon Bosak, Novell
  - Dale Dougherty, O'Reilly & Associates
  - Ralph Ferris, Fujitsu OSSI
  - Dave Hollander, Hewlett-Packard
  - Eve Maler, Digital Equipment Corporation
  - Murray Maloney, SCO
  - Conleth O'Connell, HaL Computer Systems
  - Nancy Paisner, Hitachi Computer Products
  - Mike Rogers, SunSoft
  - Jean Tappan, Unisys

The Davenport era

Under the auspices of the Davenport Group, the DocBook &DTD; began to widen its scope. It was now being used by a much wider audience, and for new purposes, such as direct authoring with &SGML;-aware tools, and publishing directly to paper. As the largest users of DocBook, Novell and Sun had a heavy influence on its design.

In order to help users manage change, the new Davenport charter established the following rules for DocBook releases:

  - Minor versions (point releases such as V2.2) could add to the markup model, but could not change it in a backward-incompatible way. For example, a new kind of list element could be added, but it would not be acceptable for the existing itemized-list model to start requiring two list items inside it instead of only one. Thus, any document conforming to version n.0 would also conform to n.m.
  - Major versions (such as V3.0) could both add to the markup model and make backward-incompatible changes. However, the changes would have to be announced in the last major release.
  - Major-version introductions must be separated by at least a year.

V3.0 was released in January 1997. After that time, although DocBook's audience continued to grow, many of the Davenport Group stalwarts became involved in the &XML; effort, and development slowed dramatically. The idea of creating an official &XML;-compliant version of DocBook was discussed, but not implemented.
(More detailed information about DocBook V3.0 and plans for subsequent versions appears elsewhere in this book.)

The sponsors wanted to close out Davenport in an orderly way to ensure that DocBook users would be supported. It was suggested that OASIS become DocBook's new home. An OASIS DocBook Technical Committee was formed in July 1998, with Eduardo Gutentag of Sun Microsystems as chair.

The OASIS era

The DocBook Technical Committee is continuing the work started by the Davenport Group. The transition from Davenport to OASIS has been very smooth, in part because the core design team consists of essentially the same individuals (we all just changed hats).

DocBook V3.1, published in February 1999, was the first OASIS release. It integrated a number of changes that had been in the wings for some time.

In February of 2001, OASIS made DocBook SGML V4.1 and DocBook XML V4.1.2 official OASIS Specifications.

Version 4.2 of the DocBook &DTD;, for both &SGML; and &XML;, was released in July 2002.

The committee continues new DocBook development to ensure that the &DTD; continues to meet the needs of its users. Forthcoming and experimental work includes:

  - A V5.0 DTD projected for release no earlier than the end of 2002.
  - Experimental RELAX NG schemas available.
  - Experimental W3C XML Schema versions available.
  - Experimental RELAX schemas available.
  - Experimental TREX schemas available.