From HTML WG Wiki
Defining terms, variables and abbreviations
Corresponds to issue number 44
This page describes a solution to a common problem in writing technical documents: the need to define and deploy specialized terms and abbreviations. To address the problem, this proposal recommends adding a few new elements and several related attributes. The proposal establishes elements to explicitly mark up a definition and to associate that definition with the terms, variables, and proper nouns it defines. The proposal also calls for using the existing DFN element to mean not only the defining instance of a term but also the definition of an abbreviation’s expanded word or phrase. Attributes allow DEFINE and DFN elements to be associated with the appropriate phrase elements. Other attributes allow the pronunciation of a term, abbreviation, or proper noun to be established once and used throughout the document. Override pronunciations may also be set to define a variant pronunciation or the pronunciation of terms and abbreviations with no associated definition.
Problem statement / use cases
- Authors need a way to define terms, abbreviations, variables, and proper nouns that may have a specialized meaning only within a particular academic discipline, a specific collection of documents, a specific document, or even just a section or article within a document.
- For documents making use of these specialized terms, abbreviations, variables and proper nouns, authors and users want to be able to provide custom styling to these semantics.
- Users of documents with specialized terminology want an easy mechanism for interactively discovering the meaning and pronunciation of terms, as well as the expansion of abbreviations.
- Authors and users of documents with specialized terminology want an easy way to generate a terminology index, a glossary, a people index, and so on.
- Currently HTML requires excessive repetition of information in a document to achieve these results.
- Users of speaking applications, screen readers and other audible UAs may require authors to convey pronunciation information for specialized and newly coined abbreviations and terms.
Add new elements, attributes, and UA behavior
The semantics currently available in HTML provide some rudimentary capabilities to address these use cases. Unfortunately, HTML stops just shy of providing a complete solution. Adding a few new elements and some new attributes will round out HTML and allow authors to properly specify these semantics with minimal additional markup.
To round out HTML’s specialized terminology mechanism, this proposal recommends adding the following elements and attributes. The elements fall into two categories:
- definition elements (DEFINE, DD, and DFN) and
- definition referencing elements (DFN, T, VAR, PN, ABBR, DT)
Note that DFN serves a dual role as either a definition element or a definition referencing element or both. Note also that authors pair a DT definition referencing element with a DD definition element within definition lists (the DL element).
- DFN element: contains the first usage of a term within a particular scope and is typically contained in or near a DEFINE element.
- @type(QName): allows authors to specify the type of term defined. UAs may use the DFN type to segregate terms into different indexes. The type _homograph is reserved and allows the matching of a DFN containing a homograph with all matching strings in the document regardless of whether they are marked up or not. This implies that the secondary homograph must be marked up if more than one homograph from a group of homographs appears in the same document. The DFN element can also be used to relay associated terms, variables and proper names to definitions located in remote resources.
- @variantof(string): used for matching with a DEFINE element’s @word attribute string value when the contents of the DFN element are a variant of the term
- DEFINE element (or DD element in this context): contains a definition or the expansion of an abbreviation. Authors must include the term defined by the DEFINE element in a DFN element. Authors may include the DFN element containing the term within the DEFINE element or in the text preceding or trailing the DEFINE element. By including the defined term before or after the DEFINE element, authors facilitate easier generation of a glossary for a document. Authors must include either a valid id attribute value or a valid @word attribute value on each DEFINE element. For referencing reusable DEFINE elements from other documents, authors must include an id attribute on the DEFINE element.
Attributes for both the DFN and DEFINE elements
- @declare(BOOL): indicates the element is only to declare a definition and normally should not be displayed in the normal flow of the document
- @word(STRING): the variable, term, proper name/noun, or the abbreviation for the abbreviation expansion defined in the element.
- @casesensitive(BOOL): indicates that UAs must match the @word attribute’s value with variables, terms, and proper names/nouns only in a case sensitive manner
- @apply2onlymarkedup (BOOL): indicates that the @word attribute value for the DEFINE or DFN element should only be applied to words matching the string that are marked up in a VAR, T, PN, or DFN element. Without this boolean attribute, the UA will attempt to match every word in the text contents of the document (words as defined by Unicode) with the @word attribute value of the DEFINE or DFN element.
- @type(QName): allows authors to indicate the type of the variable, term, or abbreviation defined. Authors may use the prefix "a-" to indicate their own types for T, PN, and VAR elements but not ABBR elements. This attribute can be overridden by including a separate type attribute value on definition referencing elements.
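How @word, @casesensitive, and @apply2onlymarkedup might interact can be sketched in Python. This is a minimal sketch under stated assumptions, not part of the proposal: the `Definition` class and the `find_matches` function are hypothetical names, matching is assumed to be case-insensitive unless @casesensitive is set, and a simple `\w+` regular expression stands in for Unicode word segmentation.

```python
import re
from dataclasses import dataclass


@dataclass
class Definition:
    """Hypothetical stand-in for a DEFINE or DFN element."""
    word: str                          # value of the @word attribute
    casesensitive: bool = False        # @casesensitive
    apply2onlymarkedup: bool = False   # @apply2onlymarkedup


def same_word(definition, candidate):
    # Assumption: without @casesensitive, comparison ignores case.
    if definition.casesensitive:
        return definition.word == candidate
    return definition.word.casefold() == candidate.casefold()


def find_matches(definition, plain_words, marked_up_words):
    """Return the words the definition applies to.

    plain_words: words from ordinary text outside any markup.
    marked_up_words: text contents of VAR, T, PN, or DFN elements.
    """
    candidates = list(marked_up_words)
    if not definition.apply2onlymarkedup:
        candidates += plain_words
    return [w for w in candidates if same_word(definition, w)]


# Rough word split; a real UA would use Unicode word boundaries.
plain = re.findall(r"\w+", "Wind the banner before the wind rises.")
```

With `Definition("wind")`, both the marked-up word and the plain-text occurrences match; setting `apply2onlymarkedup=True` restricts the result to the marked-up word. Multi-word @word values are not handled in this sketch.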
Definition referencing elements
- ABBR element: contains an abbreviated form that may be associated with an expansion of the abbreviated form in a DEFINE element.
- @type(QName): allows authors to indicate the type of abbreviated form. Since only six QName values are included for ABBR and authors are discouraged from using their own types on ABBR elements, these QNames serve the same purpose as enumerated values: abbreviation | short | acronym | initialism | camelcase-abbr | alpha-numeric
- VAR element: contains a variable. The variable can be further defined through the DEFINE element or by indicating it is a variable of a certain type by using the type attribute
- @type(QName): allows authors to more precisely specify the type of variable the VAR element represents. UAs which perform document processing relevant to the VAR element’s type should treat all matching VAR elements the same as the type of the prior VAR element that has a value indicated for the type attribute
- T and DFN elements: contain a term that either does not commonly appear in a dictionary or is used in the present document in a specialized or technical way. Uses for the term element include: neologisms, technical terms, terms for indexing, specialized term usage, or homographs
- @type(QName): allows authors to indicate the type of the term. Including the type attribute on a definition referencing element overrides the definition element type if any
- PN element: contains a proper name (a proper noun) that may be further defined through a DEFINE element.
- @iriref(IRI): since proper names make persistent reference to specific entities, the PN element includes this attribute to permit authors to include a precise reference through an IRI that persistently and uniquely identifies the entity the proper name refers to
- @type(QName): allows authors to indicate the type of the variable, term, or abbreviation defined. Including the type attribute on a definition referencing element overrides the definition element type if any
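The rule that a VAR element without a type takes the type of the prior matching VAR element that declares one can be sketched as follows. This is an illustration only: `propagate_var_types` is a hypothetical name, and "matching" is assumed here to mean VAR elements with identical text contents.

```python
def propagate_var_types(vars_in_order):
    """Resolve a type for each VAR occurrence.

    vars_in_order: list of (name, declared_type_or_None) in document order.
    Returns a list of (name, resolved_type_or_None) in the same order.
    """
    last_type = {}   # most recent declared type per variable name
    resolved = []
    for name, declared in vars_in_order:
        if declared is not None:
            last_type[name] = declared
        # Untyped occurrences take the most recent earlier declaration, if any.
        resolved.append((name, last_type.get(name)))
    return resolved


doc_vars = [
    ("x", "float"),
    ("n", None),       # no earlier typed "n", so it stays untyped
    ("x", None),       # takes "float" from the first "x"
    ("n", "integer"),
    ("x", "integer"),  # a later declaration replaces the earlier type
    ("x", None),
]
```

Note that a VAR appearing before any typed occurrence stays untyped in this sketch; the proposal does not say whether a later declaration should apply retroactively.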
Attributes common to ABBR, VAR, T, DFN, and PN
- variantof(string): used for matching with a DEFINE element’s @word attribute string value when the contents of the DFN element are a variant of the term and no match can be found for the wordfor attribute
- define(string): allows authors to include a definition within an attribute when a document fragment is either unnecessary or unwanted and the author does not need the definition to appear in the normal flow of the document without using CSS
- wordfor(URI or IDREF hash): allows the element to be associated with a definition by a specific URI/IDREF association. Authors should use either the variantof attribute or the wordfor attribute, but not both. For matching VAR, T, PN, and DFN elements on one hand with a define attribute value, or a DFN or DEFINE element on the other hand, UAs must treat the order of precedence as: 1) the define attribute, 2) the wordfor attribute, 3) the variantof attribute, 4) the element’s innertext contents
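The order of precedence above can be sketched as a small selection function. This is a hedged illustration, not a normative algorithm: `lookup_key` is a hypothetical name, attributes are modeled as a plain dictionary, and an empty string is assumed to count as an absent attribute.

```python
# Attribute names in the order of precedence given by the proposal.
PRECEDENCE = ("define", "wordfor", "variantof")


def lookup_key(attrs, inner_text):
    """Pick the source a UA would use to find an element's definition.

    attrs: dict of attribute name to value for the referencing element.
    inner_text: the element's text contents.
    Returns a (source, value) pair.
    """
    for name in PRECEDENCE:
        value = attrs.get(name)
        if value:  # assumption: missing and empty values are both skipped
            return name, value
    # Nothing else is set, so fall back to the element's text contents.
    return "innertext", inner_text
```

For example, an element carrying both wordfor and variantof resolves through wordfor, and an element with no attributes resolves through its own text.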
The type attribute
Aside from ABBR elements, a leading use for the type attribute is to separate terms, variables, and proper names into categories for indexes or other listings.
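As a rough illustration of this use, the sketch below groups referenced terms into per-type indexes, letting a type on the referencing element override the type on the definition element. The function names, the tuple layout, and the "untyped" bucket are assumptions made for the example.

```python
from collections import defaultdict


def effective_type(ref_type, definition_type):
    # A type on the referencing element overrides the definition's type.
    return ref_type or definition_type


def build_indexes(references):
    """Group terms by effective type.

    references: iterable of (text, ref_type, definition_type) tuples,
    where either type may be None.
    Returns a dict mapping each type to a sorted list of unique terms.
    """
    indexes = defaultdict(set)
    for text, ref_type, definition_type in references:
        # "untyped" is a hypothetical bucket for terms with no type at all.
        kind = effective_type(ref_type, definition_type) or "untyped"
        indexes[kind].add(text)
    return {kind: sorted(terms) for kind, terms in indexes.items()}


refs = [
    ("Ada Lovelace", None, "person"),
    ("float", "a-datatype", "term"),   # the reference type wins over "term"
    ("overdetermine", None, None),
    ("Alan Turing", "person", None),
]
```

A UA or authoring tool could render each resulting list as a separate index, such as a people index next to a terminology index.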
Pronunciation attributes common to all these elements (see PronunciationSemantics: separate issue):
- expressed-as(enumerated value: characters|word|phrase) (indicates the author’s preference for expanded or abbreviated pronunciation)
The use of the pronunciation attributes on the DEFINE element permits authors to use the attributes once and have them reused throughout the document. However, authors can also use these attributes on any separate definition referencing element to override the pronunciation attributes included on the DEFINE element.
Before performing matches, UAs must retrieve each resource referenced by a LINK element with the rel attribute set to a value of 'glossary'. For each resource so retrieved, the following matching algorithm must be performed as if the linked document were part of the present document, except for the purpose of the wordfor attribute, which specifies only URIs or, alternatively, IDREF hashes to document fragments in the present document.
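The first step, finding the glossary resources a document links to, might look like the following sketch. It uses Python's standard `html.parser` module; the `GlossaryLinkFinder` class and `glossary_links` function are hypothetical names, and the sketch only collects the URLs without retrieving them.

```python
from html.parser import HTMLParser


class GlossaryLinkFinder(HTMLParser):
    """Collect href values of LINK elements whose rel includes 'glossary'."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # rel is a space-separated list of tokens; compare case-insensitively.
        rels = (attr.get("rel") or "").lower().split()
        if "glossary" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def glossary_links(html):
    finder = GlossaryLinkFinder()
    finder.feed(html)
    return finder.hrefs


sample = (
    "<head>"
    "<link rel='stylesheet' href='style.css'>"
    "<link rel='glossary' "
    "href='http://www.example.com/glossaries/important-terms.html'>"
    "</head>"
)
```

A UA would then retrieve each returned URL and feed the definitions it finds into the matching steps described next.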
If a definition referencing element has a define attribute set to a non-null string, UAs must use that attribute value as the element’s definition. UAs providing terms processing must match each definition referencing element (DFN, T, VAR, PN, ABBR) that has no non-null define attribute value with a corresponding definition element (DFN and DEFINE), if possible, following the appropriate order of precedence. The UA should first use the wordfor attribute to locate a corresponding DEFINE or DFN element, either in the present document or in another document. If no match is found or the resource cannot be retrieved, the UA must proceed as follows: 1) if the definition referencing element has a variantof attribute value other than null, match the variantof string with the @word attribute value of a DFN element; 2) if no matching @word attribute is found on a DFN element, match the variantof string with the @word attribute value of a DEFINE, DD, or LI element (in that order); 3) if the element has no variantof attribute, match the innertext string contents of the element with the @word attribute value of a DFN element; 4) finally, if the UA cannot match a DFN element’s @word attribute value, the UA should match the innertext string contents of the element with the @word attribute value of a DEFINE, DD, or LI element (in that order).
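The fallback steps that apply once wordfor has failed or is absent can be sketched as below. This is a simplified illustration under stated assumptions: `find_definition` is a hypothetical name, definition elements are modeled as (element name, @word value) pairs, string comparison is exact, and duplicate matches are not resolved.

```python
# Element kinds are tried in this order: DFN first, then DEFINE, DD, LI.
ELEMENT_PRIORITY = ("dfn", "define", "dd", "li")


def find_definition(ref, definitions):
    """Match a referencing element against definition elements by @word.

    ref: dict with key 'text' (the element's text contents) and an
         optional 'variantof' value.
    definitions: list of (element_name, word) pairs.
    Returns the matching pair, or None when nothing matches.
    """
    # Steps 1 and 2 use variantof when present; steps 3 and 4 use the text.
    key = ref.get("variantof") or ref["text"]
    for element in ELEMENT_PRIORITY:
        for name, word in definitions:
            if name == element and word == key:
                return name, word
    return None


defs = [("define", "fly"), ("dfn", "fly"), ("dd", "wind")]
```

Here a reference with variantof='fly' resolves to the DFN rather than the DEFINE, because DFN elements are checked first.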
For definition referencing elements for which no matching definition element can be found, the UA should provisionally treat the title attribute of the element as its definition (or its expansion in the case of an ABBR element).
Finally, for speaking UAs encountering an ambiguous homograph, the UA should match the homograph with any DFN elements with type='_homograph' indicated. Authors can therefore use a DFN element with type='_homograph' and phonetic='<string>', or accompanying CSS, to define the pronunciation of such homographs.
For DT definition referencing elements, the following DD element always contains the associated definition. UAs must ignore any 'define', 'wordfor', or 'variantof' attributes on DT elements.
Duplicate matching @word strings: If more than one definition element (DD, DEFINE or DFN) meets the criteria to match a definition referencing element, UAs must match the last such definition element which appears before the definition referencing element according to document order.
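The duplicate rule can be sketched as picking the candidate with the greatest document position that still precedes the reference. The function name and the (position, definition) pair layout are assumptions for illustration; returning `None` when no candidate precedes the reference is also an assumption, since the proposal does not cover that case.

```python
def resolve_duplicate(candidates, ref_position):
    """Choose among several definitions that match the same reference.

    candidates: list of (position, definition) pairs, where position is
                the element's index in document order.
    ref_position: document-order index of the referencing element.
    Returns the last definition that appears before the reference,
    or None if no candidate precedes it.
    """
    preceding = [c for c in candidates if c[0] < ref_position]
    if not preceding:
        return None
    # The latest preceding definition wins.
    return max(preceding, key=lambda c: c[0])[1]


matches = [(3, "first definition"), (10, "second definition"), (25, "third definition")]
```

A reference at position 12 would therefore pick up the second definition, while one at position 30 would pick up the third.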
Minimal markup and maximal reuse
By allowing authors to specify all of the semantics for terms on a single defining element (DFN, DD, or DEFINE), authors need only provide semantics once in a document or even once in a large collection of documents by making reference to an external glossary. Therefore, authors also gain extensive reusability of their own custom definitions or of definitions provided externally or through various authoring communities (e.g., microformat communities). Authors may also use the "relay" aspects of the DFN element to move any number of attributes to the local document or even override the semantics provided by external documents and other defining element document fragments. Similarly, by using multiple DFN elements all pointing to a single DEFINE element, authors can explicitly express the many variants of a word (e.g., “fly” and “flying”), if needed, instead of relying on the variantof attribute.
Using the wordfor(URI) attribute, authors may even use major online dictionaries such as Wiktionary to reference term definitions (especially for homographs, for example) [does Wiktionary have any kind of stability policy, especially for homograph ordering; probably not].
To promote maximum reuse and still permit some local resource data, the DFN element serves double duty as both a definition element and a definition referencing element.
<p>In this article, I use the term <dfn wordfor='http://www.example.com/glossaries/important-terms.html#overdetermine' word='overdetermine' phonetic='ˌōvərdiˈtərmən' >overdetermine</dfn> in the usual manner... </p>
⋮
<p>However, some things may not be <t variantof='overdetermine' >overdetermined</t></p>
⋮
<p>When too many of these <t>overdetermine</t> the conditions of... </p>
This example shows how to explicitly specify differentiated pronunciations for homographs. The first in the set of homographs (“wind”, pronounced “wind” not “wined”) is merely declared so that it doesn't appear in the normal flow of the document (CSS display property set to 'none'). Since this term will seldom be used in this document, the @word attribute value is set to wind2, implying that it will need markup for the few subsequent appearances of the term in the present document. The second of the homographs (“wind”, pronounced “wined”) is marked up with the word itself as the @word attribute’s string value, enabling its repeated use throughout the document without any other markup around the term.
<h1>Properly winding the banner</h1>
⋮
<p><dfn declare='declare' type='_homograph' word='wind2' phonetic='|wind|' ></dfn> Remember each evening to <dfn type='_homograph' word='wind' phonetic='|waɪnd|' >wind</dfn> the banner around its standard to protect it from the heavy overnight <t variantof='wind2' >wind</t>. It is best to wind the banner clockwise to...</p>
- algorithms for generating a glossary, and indexes for terms, abbreviations, names and variables.
- UA norm for interactive display of definitions, abbreviation expansions, and pronunciation hints.
- audible UA norm for using phonetic and pronunciation attributes for speech synthesis pronunciation of abbreviations and terms. Also, when encountering new terms and abbreviations within a document, UAs may prompt the user to add these terms and abbreviations to the user’s own dictionary.
QNames for @type attribute on VAR element
These type NCNames would come from standard academic and professional disciplines that make use of variables:
- Computer science native data types, for example:
- A) float
- B) integer
- C) decimal
- D) string
- E) char
- F) array
- G) structure
- H) function
- I) method
- Computer science OOP, for example:
- A) class
- B) instance
- Computer science markup, for example:
- A) element
- B) attribute
- C) content model
- D) parameter entity reference
- E) entity reference
- F) markup schema
- G) property
- Mathematics and formal logic, for example:
- A) proposition
- B) set
- C) member
- D) real number
- E) complex number
- F) rational number
Liaison with CSS WG
This proposal provides methods for authors to include definitions and abbreviation expansions that UAs can present interactively. The proposal also permits authors to explicitly mark up glossaries using the definition list elements and allows extensive reuse of the semantic properties of those glossaries. However, authors may also want to provide definitions and abbreviation expansions "inline" throughout the document and then have content generated automatically as part of presentation of the innate HTML markup.
The markup in this proposal allows authors to encode in HTML documents complete data to generate:
- a glossary
- a table of authority
- index (or indexes)
Discussion and Evaluation
- Allows authors to include specialized terms, abbreviations, variables and proper nouns with a minimal amount of additional markup or text
- Allows authors and users to style the definitions and the references to the definitions (the terms and proper nouns)
- Facilitates the automatic generation of term, name, variable and abbreviation indexes
- Facilitates the automatic generation of glossaries.
- Facilitates interactive user discovery of meaning for specialized terminology (e.g., users skimming document can easily catch up on the meaning of earlier defined terms).
- Degrades gracefully.
- Does not work in pre-HTML5 UAs (this would need to be supplemented with the verbose techniques currently used).
- Rob Burns (July 2007). Proposes much of this approach as part of a review of the phrase elements draft section.
- Ben Boyle (July 2007). Proposes much of this approach regarding abbreviations as part of a review of the phrase elements draft section.
- Olivier GENDRIN (August 2007) Raises the issue of trying to handle multiple languages using the current title attribute abbreviation expansion practice.
- Ben Boyle (January 2008). Rambling email on abbreviations.
- AbbrAndInitialisms#head-0ae1fbcaa09fb665752dddcaeb312cd3a765c335: HTML needs an initialism element (Solution demonstrated for ABBR)
- Proposals facilitating the automatic generation of lists of document data
- Subtext: footnotes and endnotes from subtext markup
- AttrtibuCitaQuotationReferencing: bibliographic source reference list from CITE, Q, BLOCKQUOTE, A AND OL@type='references' markup
- glossary and indexes from VAR, T, PN and ABBR markup (this proposal)
- outline or table of contents from H and SECTION markup
- SemanticPresentationLegendCSS: presentation / semantic legend through CSS properties and supplemental HTML