 
PDF Techniques
for Web Content Accessibility Guidelines 1.0 and 2.0
W3C Internal Working Draft 24 July 2001
  - This version:
      http://www.w3.org/WAI/GL/WCAG-PDF-TECHS-20010724
- Latest version:
      http://www.w3.org/WAI/GL/WCAG-PDF-TECHS/
- Editors:
- Loretta Guarino Reid, Adobe Systems
- Katie Haritos-Shea, Paradigm Solutions Corp. & NTIS - US Dept. of Commerce
- Wendy Chisholm, W3C
 
Status
 
Introduction
This paper is one in a series of techniques documents for
authoring accessible Web content. For information about the other documents
in the techniques series, please refer to the 
Techniques for Web Content Accessibility Guidelines 1.0 document.
In this PDF Techniques document we describe how to create accessible Adobe
Portable Document Format (PDF) files, as described in the PDF Reference Manual, Second Edition, Version 1.3
(HTML Version) (PDF Version),
and in the Tagged PDF Document
(HTML Version) (PDF Version),
which explains the changes between PDF 1.3 and 1.4.
This document is also intended to demonstrate:
  - How a web author can determine whether a PDF file is accessible,
  and
- How developers can create tools that generate accessible PDF, and how
    authors can use tools to alter PDF content to improve accessibility.
Because PDF is a Page
Description Language, it is usually not intended to be edited directly by
authors. These techniques are therefore intended particularly for the
developers of authoring tools that
generate PDF as an output format.
Developers should also see the Authoring
Tool Accessibility Guidelines 1.0 and the Authoring
Tool Accessibility Guidelines "Wombat" Draft documents.
However, since authors routinely deliver PDF documents as web content, it
is important that they too understand what constitutes an accessible PDF
file. Webmasters and other web authors should also see the Web
Content Accessibility Guidelines 1.0 and Web
Content Accessibility Guidelines 2.0 documents. The most reliable way to
generate accessible PDF is to create Tagged
PDF files.
Tagged PDF is a stylized use of PDF that allows reliable recovery of text,
graphics, and images in PDF documents, with no ambiguity about the contents
or their ordering. A Tagged PDF file is page oriented: each page of a Tagged PDF
document contains its text, graphics, and images in reading order, as determined by the
authoring application. A Tagged PDF is a Logical Structured PDF. Logical
Structure carries the information needed to support tagging for access and content extraction, as well as the styling
properties needed for access, reflow, and
content extraction. It also identifies the article flows in the cross-page environment, for access
and content extraction.
@@ A short paragraph is needed here relating PDF to people
and their different devices, for example
text-to-speech for voice activation (phones reading a PDF file aloud
online?) and reflowing text for small PDAs. William, in
support of device independence, would you be willing to draft this for
us? @@
To promote continuity across WAI documents and to assist in
understanding accessibility principles, we have chosen to place each PDF
checkpoint under the most appropriate one of the four basic WCAG 2.0
Guidelines.
For each technique, we identify the version of PDF in which the language
support is first available. Where no version is specified, the technique can
be applied in all versions of PDF.
 
 
 
How This Document is Organised
  - WCAG Guidelines: refer to the main Web
    Content Accessibility Guidelines 2.0 document.
- PDF Checkpoints: refer to this PDF
    Techniques for Web Content Accessibility Guidelines document. These are
    the points that must be satisfied to claim conformance if you wish to
    make your PDF documents accessible to everyone.
- PDF Techniques and Examples: some
    ideas to help you achieve that goal.
- Acrobat 5.0 Tips and Examples: some
    ways to use Acrobat 5.0 to fix errors.
 
 
PDF Checkpoints
 
 
WCAG Guideline 1: PRESENTATION
Design content that allows presentation according to the user's needs and
preferences
The PDF Techniques Checkpoints that are covered under this WCAG 2.0
Guideline 1 are . . . . . . .
 
 
  PDF Checkpoint 1 Ensure
  That the Text of the Document is Accessible 
  
  Ensure that the text of your document can be extracted reliably in
  logical reading order.
  
   
  
    - PDF Checkpoint 1.1
 Render Characters and Words in Reading Order within the Page
- Render words, and characters within words, in reading order within
        the page-content stream.
        The ReverseChars marked content
        may be used when rendering right-to-left text that will be typeset
        left-to-right (PDF 1.4).
 
  
      - Correct PDF Technique:
 
 @@
 
  
      - Deprecated PDF Techniques:
        (What you should NOT do)
 
 Some PDF authoring applications save space by rendering all the
        characters in one font at a time. Hence, the PDF file may render all
        the bold characters on a page, then all the normal weight characters.
        Generally, this will cause characters not to be rendered in reading
        order.
 A PDF page may render all characters left-to-right, top-to-bottom.
        For a multi-column document, this does not render the words in
        reading order, since the first line of the second column will be
        rendered before the second line of the first column.
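The multi-column problem can be made concrete with a minimal Python sketch. It assumes a hypothetical `TextRun` record that stores each line's position and the column it belongs to; a real generator would take the column assignment from its own layout data. Sorting purely by page geometry interleaves the two columns, while emitting content column by column yields reading order.

```python
from typing import NamedTuple


class TextRun(NamedTuple):
    """Hypothetical model of one line of text placed on a page."""
    x: float      # left edge of the run, in points
    y: float      # baseline, in points (larger y is higher on the page)
    column: int   # 0 for the left column, 1 for the right column
    text: str


def geometric_order(runs):
    """Deprecated: strictly top-to-bottom, then left-to-right across the page."""
    return [r.text for r in sorted(runs, key=lambda r: (-r.y, r.x))]


def reading_order(runs):
    """Column by column, then top-to-bottom within each column."""
    return [r.text for r in sorted(runs, key=lambda r: (r.column, -r.y))]


runs = [
    TextRun(72, 700, 0, "Left column, line 1."),
    TextRun(72, 686, 0, "Left column, line 2."),
    TextRun(320, 700, 1, "Right column, line 1."),
    TextRun(320, 686, 1, "Right column, line 2."),
]

print(geometric_order(runs))  # interleaves the two columns
print(reading_order(runs))    # finishes the left column first
```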
 
   
  
    - PDF Checkpoint 1.2
 Separate words explicitly with spacing characters.
- Separate words explicitly with spacing characters. Do not rely
        on the location of the characters or the division of characters into
        showstring operations to
        indicate word breaks. Note that this implies that lines of text for
        western languages usually end with a trailing space
      character.
 
  
    - Consider rendering the two-line example:
 
- Now is the winter
 of our discontent.
      - Correct PDF Technique:
 
 
          - Position to the beginning of line 1
- Show String ("Now is the winter ")
- Position to the beginning of line 2
- Show String ("of ")
- Position to the beginning of "our"
- Show String ("our ")
- Position to the beginning of "discontent"
- Show String ("discontent. ")
 
      - Deprecated PDF Technique:
        (What you should NOT do)
 
 
          - Position to the beginning of line 1
- Show String ("Now is the winter")
- Position to the beginning of line 2
- Show String ("of")
- Position to the beginning of "our"
- Show String ("our")
- Position to the beginning of "discontent"
- Show String ("discontent.")
 
 
      - Note: In the deprecated example above, there are
        no spacing characters at the end of either line, and there are no
        spacing characters between the words of the second line.
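The two listings can be sketched in Python to show what a consumer recovers. The coordinates and the font resource name `/F1` are hypothetical, and `naive_extract` models a consumer that relies only on the characters actually present rather than guessing word breaks from positions (the heuristics this checkpoint says not to rely on). The `pdf_string` helper assumes plain ASCII text.

```python
def pdf_string(text):
    """Return ASCII text as a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def text_object(placements, font="/F1", size=12):
    """Build a BT ... ET text object; each placement is (x, y, show_string)."""
    ops = ["BT", f"{font} {size} Tf"]
    for x, y, text in placements:
        ops.append(f"1 0 0 1 {x} {y} Tm")     # position: set the text matrix
        ops.append(f"{pdf_string(text)} Tj")  # show string
    ops.append("ET")
    return "\n".join(ops)


def naive_extract(placements):
    """Concatenate show strings exactly, without inferring word breaks."""
    return "".join(text for _, _, text in placements)


correct = [(72, 700, "Now is the winter "), (72, 686, "of "),
           (90, 686, "our "), (112, 686, "discontent. ")]
deprecated = [(x, y, t.rstrip(" ")) for x, y, t in correct]

print(text_object(correct))
print(repr(naive_extract(correct)))     # 'Now is the winter of our discontent. '
print(repr(naive_extract(deprecated)))  # 'Now is the winterofourdiscontent.'
```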
 
   
   
  
    - PDF Checkpoint 1.3
 Use soft hyphens and hard hyphens appropriately.
- Use a soft hyphen, identified by
        a character that maps to the Unicode
        value U+00AD or 173 decimal, when a line-break hyphen is
        introduced into the middle of a word.
 
  
      - Correct PDF Technique:
 
 If the word "father-in-law" is hyphenated after the first syllable
        ("fa-ther-in-law"), the first hyphen should be a soft hyphen, and the
        second and third hyphens should be hard hyphens.
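Why the distinction matters to a consumer can be sketched in Python. Assuming each extracted line already ends with its trailing space (Checkpoint 1.2), a text extractor can drop a soft hyphen at the end of a line and rejoin the word, while a hard hyphen is an ordinary character that must be kept. The function name is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, 173 decimal


def join_extracted_lines(lines):
    """Join lines of extracted text.

    A line ending in a soft hyphen was broken inside a word, so the soft
    hyphen is dropped and the word continues on the next line. Hard hyphens
    are ordinary characters and are kept.
    """
    parts = []
    for line in lines:
        if line.endswith(SOFT_HYPHEN):
            parts.append(line[:-1])
        else:
            parts.append(line)
    return "".join(parts)


# Broken after the first syllable, using a soft hyphen:
print(repr(join_extracted_lines(["my fa\u00ad", "ther-in-law "])))  # 'my father-in-law '
# Broken at an existing hard hyphen, which must survive:
print(repr(join_extracted_lines(["my father-", "in-law "])))        # 'my father-in-law '
```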
 
   
   
  
    - PDF Checkpoint 1.4
 Use the ActualText attribute.
- If text characters are not rendered using the show string
        operation, they must be marked in the logical structure tree with
        ActualText attributes containing the
        derived Unicode values.
 
  
      - Correct PDF Technique:
 
 Suppose the word "Arthur" is rendered using an illuminated A.
 The structure subtree for this word might contain
 <Figure> graphics or image for illuminated A
 <Span> "rthur"
 The <Figure> structural element should have the ActualText
        attribute, with value "A", so the word "Arthur" could be
      extracted.
- @@Loretta will come up with a
        better example here: 7-20-2001@@
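The "Arthur" example can be sketched in Python over a hypothetical in-memory model of the structure subtree. The `StructElem` class and `extract_text` function are illustrations, not a PDF library API; the sketch only shows that an ActualText value stands in for the element's content during extraction, and it renders just the /S and /ActualText entries of the element dictionary. The `pdf_string` helper assumes ASCII text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def pdf_string(text):
    """Return ASCII text as a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


@dataclass
class StructElem:
    """Hypothetical in-memory model of a structure element."""
    struct_type: str                    # the /S entry, e.g. "Span" or "Figure"
    text: str = ""                      # text recovered from show strings
    actual_text: Optional[str] = None   # the /ActualText entry, if any
    kids: List["StructElem"] = field(default_factory=list)


def extract_text(elem):
    """Return the element's text; ActualText replaces the element's whole content."""
    if elem.actual_text is not None:
        return elem.actual_text
    return elem.text + "".join(extract_text(kid) for kid in elem.kids)


def dict_entries(elem):
    """Render only the /S and /ActualText entries (other entries omitted)."""
    entries = f"/S /{elem.struct_type}"
    if elem.actual_text is not None:
        entries += f" /ActualText {pdf_string(elem.actual_text)}"
    return entries


word = StructElem("Span", kids=[
    StructElem("Figure", actual_text="A"),  # illuminated A, drawn as an image
    StructElem("Span", text="rthur"),
])
print(dict_entries(word.kids[0]))  # /S /Figure /ActualText (A)
print(extract_text(word))          # Arthur
```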
 
   
   
  
    - PDF Checkpoint 1.5:
 Ensure that all character codes map reliably to Unicode.
- Within PDF, show string operations operate on a sequence of character codes with associated
        fonts. The character code indicates which of the font's glyphs to
        render on the page. However, some fonts do not provide information
        about the Unicode character
        that corresponds to each glyph. Without
        this information, the text cannot be reconstructed.
- Every such sequence of character codes must therefore map unambiguously into a sequence of Unicode
        code points. Mapping is described:
 @@ put link here
        to section 2.4 on page 102 of Tagged pdf doc, and create or get from
        LOretta html page as well.
 
  
      - PDF Example:
 
 @@
 
      - PDF Example:
 
 @@
 
  
      - Rationale for the importance of Unicode use: 
 The Unicode character set is a widely accepted international standard
        for encoding text. It is non-proprietary and supports a rich
        repertoire of characters. Used throughout the world in Internet and
        network protocols, documents of all
        kinds, assistive
        technology devices, user agents,
        and many other areas, Unicode meets an important
        need in local as well as international communication. It is also the
        basis on which the present architecture of the World Wide Web is built,
        and on which its future architecture will be built. It is therefore the best existing
        system for text encoding when interoperability across systems, devices, and
        protocols matters.
 @@ KHS.......Is this accurate? We
        need help here....@@
 
- @@ Also add more info and
        motivation on why we need to map to Unicode. @@
 See 
        http://www.w3.org/International/ and 
        http://www.unicode.org/
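One mechanism a generator can use to make character codes map to Unicode is a ToUnicode CMap attached to the font (available since PDF 1.2). The Python sketch below is an assumption-laden illustration, not a definitive implementation: it handles only simple fonts with one-byte codes, uses the boilerplate header commonly written by generators, and writes each destination as big-endian UTF-16 in hex. A code may map to several characters, as for an "fi" ligature glyph. Check the exact CMap syntax against the PDF Reference before relying on it.

```python
def to_unicode_cmap(mapping):
    """Build a minimal ToUnicode CMap for a font with one-byte character codes.

    mapping: dict from character code (0-255) to the Unicode text that code
    represents; a ligature glyph may map to several characters.
    """
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<00> <FF>",
        "endcodespacerange",
    ]
    items = sorted(mapping.items())
    # Each bfchar section may hold at most 100 mappings.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        for code, text in chunk:
            # Destination values are big-endian UTF-16, written in hex.
            lines.append(f"<{code:02X}> <{text.encode('utf-16-be').hex().upper()}>")
        lines.append("endbfchar")
    lines += ["endcmap", "CMapName currentdict /CMap defineresource pop", "end", "end"]
    return "\n".join(lines)


cmap = to_unicode_cmap({0x01: "fi", 0x41: "A"})
print(cmap)
```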
 
 
 
 
  
  PDF Checkpoint 2: Provide
  Text Alternatives for Images and Graphics 
  
  
Alternative Text is additional or descriptive text that can be used to
  describe an image, formula, or other item in the document that does not
  translate naturally into text.
  
  
    - Since Tagged PDF enables many consumers to recover the contents of
    PDF pages, a Tagged PDF file should use the alternative text (Alt)
    and actual text (ActualText) facilities.
 
  
      - PDF Example:
 
 A pie chart may be described by giving the values of its sectors,
        which provides visually impaired users with more detailed information
        than a single caption generally offers.
 
      - PDF Example:
 
 
 
 
 
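A Tagged PDF checker could walk the structure tree and report figures that carry no text alternative. The Python sketch below does so over a hypothetical dictionary model of the tree that mirrors the /S, /Alt, /ActualText and /K entries; kids are either child elements or integer marked-content identifiers. It ignores the role map, so custom structure types mapped to Figure would need to be resolved first.

```python
def figures_missing_alt(elem, path=()):
    """Yield the path of every Figure element with neither /Alt nor /ActualText.

    elem is a hypothetical dict model of a structure element: "S" holds the
    structure type, "Alt" and "ActualText" the optional text, and "K" a list
    of kids that are either child elements (dicts) or marked-content ids (ints).
    """
    here = path + (elem["S"],)
    if elem["S"] == "Figure" and not (elem.get("Alt") or elem.get("ActualText")):
        yield "/".join(here)
    for kid in elem.get("K", []):
        if isinstance(kid, dict):
            yield from figures_missing_alt(kid, here)


tree = {"S": "Document", "K": [
    {"S": "P", "K": [0]},
    {"S": "Figure", "Alt": "Pie chart (illustrative): sector A 40%, sector B 60%.", "K": [1]},
    {"S": "Sect", "K": [{"S": "Figure", "K": [2]}]},
]}
print(list(figures_missing_alt(tree)))  # ['Document/Sect/Figure']
```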
 
  
  PDF Checkpoint 3: Provide
  Structural Grouping  
  
    - PDF Checkpoint 3.1
 Provide logical structure (PDF Reference Manual Section 8.4.3) for the
    document.
- 
          - Map structure types to the standard structure types described
            in Adobe Technical Note #5401. (PDF 1.3)
- Suggested reading: Section 3 of the Tagged PDF document.
            @@ Put link here @@
 
 
  
      - PDF Example:
 
 @@
 
      - PDF Example:
 
 @@
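Mapping an authoring tool's own structure types to standard ones is done through the role map of the structure tree root (/RoleMap). A minimal Python sketch of how a consumer might resolve a type follows; the set of standard types is a subset chosen for illustration (consult the Tagged PDF documentation for the full list), and the role map contents are hypothetical. The sketch follows chained mappings and rejects types that never reach a standard type, including cycles.

```python
# Assumption: a subset of the standard structure types.
STANDARD_TYPES = {
    "Document", "Part", "Art", "Sect", "Div", "BlockQuote", "Caption",
    "TOC", "TOCI", "Index", "NonStruct", "Private",
    "P", "H", "H1", "H2", "H3", "H4", "H5", "H6",
    "L", "LI", "Lbl", "LBody", "Table", "TR", "TH", "TD",
    "Span", "Quote", "Note", "Reference", "BibEntry", "Code", "Link",
    "Figure", "Formula", "Form",
}


def resolve_role(struct_type, role_map):
    """Follow the role map from struct_type until a standard type is reached."""
    seen = set()
    current = struct_type
    while current not in STANDARD_TYPES:
        if current in seen or current not in role_map:
            raise ValueError(f"{struct_type!r} does not map to a standard type")
        seen.add(current)
        current = role_map[current]
    return current


# Hypothetical role map an authoring tool might write for its own style names.
role_map = {"Title": "Heading1", "Heading1": "H1", "BodyText": "P", "Sidebar": "Callout"}
print(resolve_role("Title", role_map))     # H1
print(resolve_role("BodyText", role_map))  # P
```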
 
   
  
    - PDF Checkpoint 3.2
 Tag artifacts in the page contents with the /Artifact marked content
- This is so that users can control how and whether artifacts are
        included in the contents of the document.
    - Artifacts are one of the following:
- Artifacts of the printing process, such as crop-box markings or the
        document file name.
- Artifacts of the pagination of the
        document, that is, elements that would be absent, or present in a very
        different form, if the document were always one big page, such as running headers and page
      numbers.
- Artifacts of the layout process and typographic style, such as a
        horizontal rule above a footnote.
 
  
      - PDF Example:
 
 @@
 
      - PDF Example:
 
 @@
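Wrapping artifacts in marked content can be sketched in Python as string generation. This is an illustration under stated assumptions: `as_artifact` is a hypothetical helper, the coordinates and font resource `/F1` are invented, and the optional property list uses the /Type values Pagination and Layout, which correspond to the pagination and layout categories above (check the PDF Reference for the complete set of values). The `pdf_string` helper assumes ASCII text.

```python
def pdf_string(text):
    """Return ASCII text as a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def as_artifact(content_ops, artifact_type=None):
    """Wrap page-content operators in /Artifact marked content.

    artifact_type, if given, is written as the /Type entry of the property
    list, e.g. "Pagination" for a running header or "Layout" for a rule.
    """
    if artifact_type is None:
        begin = "/Artifact BMC"
    else:
        begin = f"/Artifact << /Type /{artifact_type} >> BDC"
    return "\n".join([begin, *content_ops, "EMC"])


running_header = as_artifact(
    ["BT", "/F1 9 Tf", "72 760 Td", f"{pdf_string('Chapter 3 (continued)')} Tj", "ET"],
    artifact_type="Pagination",
)
footnote_rule = as_artifact(["72 120 m", "216 120 l", "S"], artifact_type="Layout")
print(running_header)
print(footnote_rule)
```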
 
   
 
 
 
  
  PDF Checkpoint 4: Design
  for User Control of Color and Contrast 
  Acrobat lets the user control foreground and background color of a
  document. Text in the original document will use the foreground color.
  Background elements will use the background color. A background element is
  defined to be any rectangle that is aligned with the edges of the page (not
  skewed or rotated) and that covers 50% or more of the page.
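The background-element rule can be modelled as a short Python predicate. This is a sketch of the rule as stated here, not Acrobat's implementation; the `Rect` type, the `skewed_or_rotated` flag (which a real tool would derive from the current transformation matrix), and clipping the rectangle to the page before measuring coverage are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Axis-aligned rectangle in default user space (points)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self):
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)


def is_background(rect, page, skewed_or_rotated=False):
    """An unrotated, unskewed rectangle covering 50% or more of the page."""
    if skewed_or_rotated:
        return False
    # Assumption: only the part of the rectangle that lies on the page counts.
    on_page = Rect(max(rect.x0, page.x0), max(rect.y0, page.y0),
                   min(rect.x1, page.x1), min(rect.y1, page.y1))
    return on_page.area() >= 0.5 * page.area()


letter = Rect(0, 0, 612, 792)
print(is_background(Rect(0, 0, 612, 792), letter))     # True: full-page fill
print(is_background(Rect(72, 700, 540, 740), letter))  # False: a title bar
```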
   
   
  
    - PDF Checkpoint 4.1
 Avoid drawing rectangles behind text that are not background
    elements.
- Even if the rectangle color matches the background color, when
        the user changes the background color, the rectangle will not change
        and may cause contrast problems with
        the new foreground color. 
 
  
      - PDF Example:
 
 A document with a white background may draw a title by placing
        characters on a pink rectangle. Since the pink rectangle will not be
        recognized as part of the background, when the user sets the
        foreground color to white and the background color to black, the
        white characters will be hard to see on the pink background.
 
   
  
    - PDF Checkpoint 4.2
 Text should not be placed on top of images.
- Images are not affected by the background and foreground color
        settings, so when text is placed on top of images and the user
        changes the foreground color, the text may be difficult to read. 
 
  
      - PDF Example:
 
 Black text will be visible on a pastel-colored background image, but
        yellow text may not be, and changing the background color to black
        will not change the colors in the image.
 
   
  
    - PDF Checkpoint 4.3
 Avoid the use of Color as the Only Means to Convey Information
- Does the document avoid using color-coding as the only means of
        conveying information, indicating an action, prompting a response, or
        distinguishing a visual element?
 @@New checkpoint: Use WCAG
        wording here. KHS and LGR@@
 
  
      - PDF Example:
 
 
 
 
 
 
  
  PDF Checkpoint 5: Identify the Natural Language of all Text in the
  Document 
  Specification of the language used for text in a PDF document can
  increase the accessibility of that document for disabled users. For
  example, with the language correctly identified, text-to-speech engines can
  properly vocalize the text, either via a screen reader or some more direct
  invocation of a text-to-speech engine.
  
   
  
    - PDF Checkpoint 5.1
 Identify the document's primary language.
- Use the language tagging facilities (Lang) to specify 
          the natural language  of all text in the document. (PDF 1.4)
      
 
      - Acrobat 5.0 Tip:
 
     - To view or edit the language element Properties 
            in Acrobat 5.0:
       - Select an element in the Tags panel
- Choose > Properties
- Enter or change the language property
- Click OK
 
 
     
      - Correct PDF Technique:
 
 
      - The language may be marked with an in-stream marking of the
      form:
 /Span << /Lang (en-us) >> BDC (Text to be interpreted as US English.) Tj EMC
  
      - Note that the expansion text and language tagging can be
      combined:
 /Span << /Lang (en-us) /E (replacement) >> BDC (abbrev.) Tj EMC
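A generator might emit these markings with a small helper. The Python sketch below reproduces the two forms shown above; `lang_span` is a hypothetical name, and the `pdf_string` helper only escapes backslashes and parentheses, assuming ASCII text.

```python
def pdf_string(text):
    """Return ASCII text as a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def lang_span(text, lang, expansion=None):
    """Emit a /Span marked-content sequence with /Lang and, optionally, /E."""
    props = [f"/Lang {pdf_string(lang)}"]
    if expansion is not None:
        props.append(f"/E {pdf_string(expansion)}")
    return f"/Span << {' '.join(props)} >> BDC {pdf_string(text)} Tj EMC"


print(lang_span("Text to be interpreted as US English.", "en-us"))
print(lang_span("abbrev.", "en-us", expansion="replacement"))
```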
 
   
  
    - PDF Checkpoint 5.2
 Identify when a language change occurs on the page.
- In addition to identifying the primary language as above,
         mark individual elements whose content is in a language other than
         the main document language to indicate that a
         language change has occurred on the page. This signals
         screen readers to switch to an alternate pronunciation scheme,
         and lets applications identify alternative hyphenation schemes for
         the various languages.
 
      - Acrobat 5.0 Tip:
 
     - To view or edit the language element Properties 
            in Acrobat 5.0:
       - Select an element in the Tags panel
- Choose > Properties
- Enter or change the language property
- Click OK
 
 
  
      - Note: If it is not empty, the value of the Lang key is
        a language identifier as defined by [IETF RFC 1766], “Tags for the
        Identification of Languages,” as described in section 2.12 of the XML 1.0
        Specification, http://www.w3.org/TR/REC-xml.
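Both points can be sketched in Python: checking that a Lang value follows RFC 1766 syntax (a primary tag and optional subtags, each of 1 to 8 ASCII letters, separated by hyphens), and marking only the runs whose language differs from the document's primary language. The run list and function names are hypothetical; the regular expression checks RFC 1766 syntax only, not whether a tag is registered, and comparison is case-insensitive because language tags are.

```python
import re

# RFC 1766: Language-Tag = Primary-tag *( "-" Subtag ), where the primary
# tag and every subtag are 1 to 8 ASCII letters.
_RFC1766_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")


def is_valid_lang(value):
    """True if value is empty or a syntactically valid RFC 1766 language tag."""
    return value == "" or _RFC1766_TAG.fullmatch(value) is not None


def pdf_string(text):
    """Return ASCII text as a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def mark_language_changes(runs, primary):
    """Show each (text, lang) run, wrapping only the runs whose language
    differs from the primary language in a /Span carrying /Lang."""
    ops = []
    for text, lang in runs:
        if lang.lower() == primary.lower():
            ops.append(f"{pdf_string(text)} Tj")
        elif is_valid_lang(lang):
            ops.append(f"/Span << /Lang {pdf_string(lang)} >> BDC {pdf_string(text)} Tj EMC")
        else:
            raise ValueError(f"not an RFC 1766 language tag: {lang!r}")
    return ops


ops = mark_language_changes(
    [("The French say ", "en-us"), ("bonjour", "fr"), (". ", "en-us")], "en-US")
print("\n".join(ops))
```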
 
   
 
 
 
WCAG Guideline 2: INTERACTION
Design content that allows interaction according to the user's needs and
preferences
The PDF Techniques Checkpoints that are covered under this WCAG 2.0
Guideline 2 are . . . . . . .
 
 
  
  PDF Checkpoint 6: Document Navigation 
  
 
    - PDF Checkpoint 6.1
 Use bookmarks to provide navigation aids
    into a document.
- @@ Here? @@ Let the user jump to important locations at any time using the 
        Bookmarks panel.
  
  
      - PDF Example:
 
 Provide bookmarks for:
                   - Table of Contents
- Beginning of Chapters
- Index
 
  
      - Acrobat 5.0 Tip:
 
 @@ Or here? @@ Let the user jump to important locations at any time using the
         Bookmarks panel.
  
  
   
  
    - PDF Checkpoint 6.2
 Use links within a document.
- 
When the content of the document might lead the reader to consult another
document or another location within the same document, provide a link.
  
  
      - PDF Example:
 
 @@
  
      
  
  
   
   
  
    - PDF Checkpoint 6.3
 If the value of the link does not describe the target clearly and
    accurately, provide Alt attributes.
- @@
  
  
      - PDF Example:
 
 @@
  
      - PDF Example:
 
 @@
  
   
   
  
    - PDF Checkpoint 6.4
 Provide a clear, descriptive name for all form fields.
- In Acrobat, this field of the Form dialog is called the 
        Short Description; the PDF Manual refers to it as the field's
        "user name". The field dictionary defines a field in an
        interactive form. It may contain the optional /TU key, which defines
        the description used in messages about the field.
        
  
  
      - Acrobat 5.0 Tip:
 
 Provide a user name (/TU key) for all form fields
  
      - PDF Example:
 
 @@
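A generator or checker could handle /TU as sketched below in Python. The helper names, the field name `txtLast`, and the dict model of fields are hypothetical. `field_dict` emits only the naming entries (/FT, /T, /TU); a real field also needs widget annotation entries such as /Subtype and /Rect, which are omitted. The `pdf_string` helper assumes ASCII text.

```python
def pdf_string(text):
    """Return ASCII text as a PDF literal string, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def field_dict(partial_name, description, field_type="Tx"):
    """Emit the naming entries of a field dictionary.

    /T is the partial field name used internally; /TU is the user-facing
    description (the "Short Description" in Acrobat).
    """
    return (f"<< /FT /{field_type} /T {pdf_string(partial_name)} "
            f"/TU {pdf_string(description)} >>")


def fields_missing_tu(fields):
    """Return the /T names of fields (modelled as dicts) lacking a non-empty /TU."""
    return [f.get("T", "(unnamed)") for f in fields if not f.get("TU")]


print(field_dict("txtLast", "Last name"))
print(fields_missing_tu([{"T": "zip"}, {"T": "email", "TU": "Email address"}]))
```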
  
  
   
 
 
 
WCAG Guideline 3: COMPREHENSION
Make it as easy as possible to use and understand
The PDF Techniques Checkpoints that are covered under this WCAG 2.0
Guideline 3 are . . . . . . .
 
 
  
  PDF Checkpoint 7: Provide
  Expansions for Acronyms and Abbreviations 
  Defining key terms and specialized language will help people who are 
     not familiar with the topic you are presenting. Providing the expansion of 
     abbreviations and acronyms not only helps people who are not familiar with 
     the abbreviation or acronym but can clarify which meaning of an abbreviation 
     or acronym is appropriate to use.
  
  
      - PDF Example:
 
 The acronym "ADA" stands for both the Americans
           with Disabilities Act and the American Dental
           Association.
  
      - PDF Example:
 
 @@
  
   
 
 
 
WCAG Guideline 4: TECHNOLOGY CONSIDERATIONS
Design for compatibility and interoperability
The PDF Techniques Checkpoints that are covered under this WCAG 2.0
Guideline 4 are . . . . . . .
 
 
  
  PDF Checkpoint 8: Set
  document protections to permit access 
  Set the data access restrictions on the document to permit the contents
  to be accessed by assistive technologies.
  
    - When using 40-bit
      encryption:
    Permit the text and graphics in the document to be copied. 
    
    - When using 128-bit
      encryption:
    Set the accessibility permission for the document. 128-bit encryption was
    introduced with PDF 1.4 and cannot be read by versions of Acrobat
    earlier than 5.0.
  
  
      - PDF Example:
 
 @@
  
      - PDF Example:
 
 @@
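How a consumer might test these permissions can be sketched in Python. The sketch assumes the standard security handler of the PDF 1.4 Reference, in which 40-bit encryption corresponds to handler revision 2 and 128-bit encryption to revision 3; permission bits in the /P integer are numbered from 1 at the low-order end. Under revision 2, bit 5 (copy text and graphics) governs extraction, including for accessibility; under revision 3, bit 10 grants extraction for accessibility even when bit 5 is clear. The function and constant names are hypothetical.

```python
def permission_bit(n):
    """Mask for bit n of the /P entry; bit 1 is the low-order bit."""
    return 1 << (n - 1)


COPY_TEXT_AND_GRAPHICS = permission_bit(5)       # governs access under revision 2
EXTRACT_FOR_ACCESSIBILITY = permission_bit(10)   # revision 3 and later


def allows_assistive_access(p, revision):
    """Return True if the /P value lets assistive technology extract content.

    p is the signed 32-bit integer stored in the encryption dictionary.
    """
    if revision >= 3:
        return bool(p & EXTRACT_FOR_ACCESSIBILITY)
    return bool(p & COPY_TEXT_AND_GRAPHICS)


all_allowed = -4  # every permission bit set except reserved bits 1-2
no_copy = all_allowed & ~COPY_TEXT_AND_GRAPHICS
print(allows_assistive_access(no_copy, revision=2))  # False
print(allows_assistive_access(no_copy, revision=3))  # True: bit 10 still set
```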
  
   
   
 
 
 
PDF Glossary
 
 
  - 40 bit
  encryption
- @@
- 128 bit
  encryption
- @@
- Accessibility
  permission
- A PDF file can be encrypted (PDF 1.1) to protect its
      contents from unauthorized access. PDF's standard security handler
      defines a set of access privileges for a document, including privileges
      such as modifying the document's contents, copying text and graphics
      from the document, and printing the document. In PDF 1.4, this set
      includes accessibility permission, which controls whether the contents
      of the document are available via standard accessibility APIs to screen
      readers and other assistive technology.
- ActualText
  value
- Sometimes characters are rendered by graphics
      commands other than showstring. For instance, an illuminated character
      may be rendered by an image or a series of graphics commands. In these
      situations, the ActualText property is used to identify the character
      being rendered. This character may be concatenated with adjoining
      text to form a word.
- Adobe glyph
  name
- The name of a character in the Adobe standard
      character encodings, in Appendix D of the PDF 1.3 Reference Manual. The
      encodings list characters, character names, and character codes used in
      platform standard encodings.
- Article
  flows
- @@
- Artifacts
- A page element that is a side effect of rendering,
      rather than an intrinsic part of the document or story. For example,
      artifacts of the printing process might include crop-box markings or
      the document file name printed outside the crop box. Artifacts of the
      pagination of a document are elements that would be absent (or present
      in a different form) if the document were always one very big page, so
      pagination artifacts include running headers and page numbers. A horizontal
      rule above a footnote would be an artifact of the layout process and
      typographic style.
- Assistive
  Technology Devices
- @@
- Authoring
  Tools
- @@
- Bookmarks
- @@
- Characters
- A character is a printable symbol having phonetic or
      pictographic meaning and usually forming part of a word of text,
      depicting a numeral, or expressing grammatical punctuation. A character
      is generally one of a limited number of symbols, including the letters
      of a particular language's alphabet, the numerals in the decimal number
      system, and certain special symbols such as the ampersand.
- Character
  codes
- A show string is the encoded representation of a
      sequence of non-negative integers. Each of those integers is a
      character code. The interpretation of a show string depends on the
      associated font: some fonts imply a one-byte representation while
      others imply a multi-byte representation. Note: This is the PDF
      definition of character code and is somewhat contrary to normal usage.
      The same letter, for example a capital A, can have many different
      character codes on the same page.
- Character
  Encoding
- Character encoding is a table in a font or a
      computer operating system that maps character codes to glyphs in a
      font. Most operating systems today represent character codes with an
      8-bit unit of data known as a byte. Thus, character encoding tables
      today are restricted to at most 256 character codes.
- Not all operating system manufacturers use the same
      character encoding. For example, the Macintosh platform uses the
      standard Macintosh character set as defined by Apple Computer, Inc.,
      while the Windows operating system uses another encoding entirely, as
      defined by Microsoft. Fortunately, standard Type 1 fonts contain all
      the glyphs needed for both these encodings, so they work correctly not
      only with these two systems, but others as well.
- Column
  headers
- @@
- CMap
- A CMap specifies the mapping from character codes to
      character selectors (CIDs, character names, or character codes) in one
      or more associated fonts or CIDFonts. It serves a function analogous to
      the Encoding dictionary for a simple font. A CMap also specifies the
      writing mode (horizontal or vertical) for any CIDFont with which the
      CMap is combined.
 Also a CMap (character map) file specifies the correspondence between
      character codes and the CID (character identifier) numbers used to
      identify characters. For composite (Type 0) fonts, it is the equivalent
      to the concept of an encoding in a simple font. A CMap can describe a
      mapping from multiple-byte codes to thousands of characters in a large
      CID-keyed font.
- Concatenate
- To combine character strings, to join together two
      or more files or lists to form one big one. Example: The Unix cat
      command can be used to concatenate files.
- @@
- Contrast
- A subjective feeling that graphic elements (such as
      fonts) are different but work together well. This gives a feeling of
      variety without losing harmony. Within a particular font, contrast also
      refers to the variety of stroke thicknesses that make up the
      characters. Helvetica has low contrast and Bodoni has high
    contrast.
- Crop box
- The crop box defines the region to which the contents of the page are
      to be clipped (cropped) when displayed or printed.
- Crop box
  markings
- @@
- Cross page
  environment
- @@
- Data tables
- @@
- Encoding
- Mapping from a character set definition to the
      actual code units used to represent the data. See also Character encoding.
- Expansion
- @@
- Form fields
- @@
- Glyph
- An image used in the visual representation of characters; roughly speaking, how a
      character looks. A font is a set of glyphs. In the simple case, for a
      given font (typeface and size), each character corresponds to a single
      glyph but this is not always the case, especially in a language with a
      large alphabet where one character may correspond to several glyphs or
      several characters to one glyph (a character encoding). A glyph can be
      an alphabetic or numeric font or some other symbol that pictures an
      encoded character. The following quote is from a document written as
      background for the Unicode character set standard: "An ideal
      characterization of characters and glyphs and their relationship may be
      stated as follows: A character conveys distinctions in meaning or
      sounds. A character has no intrinsic appearance. A glyph conveys
      distinctions in form. A glyph has no intrinsic meaning. One or more
      characters may be depicted by one or more glyph representations
      (instances of an abstract glyph) in a possibly context-dependent
      fashion." Glyph is from a Greek word for "carving."
- Indexing
  value
- @@
- Line-break hyphen or Hard
  hyphen
- Hyphens that you add explicitly by entering the dash
      character are called line-break or hard hyphens. A hard hyphen is
      always set; for example, the hyphen in "cost-effective." A soft hyphen,
      by contrast, will only be set when a word that is not normally
      hyphenated falls at the end of a line, and must be broken for proper
      type spacing. Word processors use two basic techniques to perform
      hyphenation. The first employs an internal dictionary of words that
      indicates where hyphens may be inserted. The second uses a set of
      logical formulas to make hyphenation decisions. The dictionary method
      is more accurate but is usually slower. The most sophisticated programs
      use a combination of both methods. Most word processors allow you to
      override their own hyphenation rules and define yourself where a word
      should be divided.
- Link text
- The rendered text content of a link.
- Logical structure
  tree
- A PDF facility that allows the structure of a PDF
      file to be expressed via a Logical Structure. Ref: PDF Reference,
      section 8.4.3. The Logical Structure is separable from Tagged PDF,
      although Tagged PDF requires the use of Logical Structure. Every Tagged
      PDF is a Logical Structured PDF, but not all Logical Structured PDF
      files are Tagged PDFs.
- MacRomanEncoding,
  MacExpertEncoding, or WinAnsiEncoding
- The regular font encodings used for Latin-text fonts
      on Mac OS and Windows systems are named MacRomanEncoding and
      WinAnsiEncoding, respectively. Additionally, an encoding named
      MacExpertEncoding is used with "expert" fonts that contain additional
      characters useful for sophisticated typography. Complete details of
      these encodings and the characters present in typical fonts are found
      in Appendix D of the PDF Version 1.3 Reference Manual.
- Map,
  mapped
- @@.
- Markup
- @@
- Non-proprietary
- @@
- Objects
- An object is an identifiable, encapsulated entity
      that provides one or more services requested by a client. Objects can
      refer to the objects in OOP (object-oriented programming) or the
      objects in OLE (Object Linking and Embedding). In object-oriented
      programming, objects are the things you think about first in designing
      a program and they are also the units of code that are eventually
      derived from the process. In between, each object is made into a
      generic class of object and even more generic classes are defined so
      that objects can share models and reuse the class definitions in their
      code. Each object is an instance of a particular class or subclass with
      the class's own methods or procedures and data variables. An object is
      what actually runs in the computer. An object can be a spell checker or
      a piece of a graphics program used to draw squares or circles. Do you
      remember the crazy story people used to try to tell about a word
      processor where you could pick all of your favorite pieces (favorite
      spell checker, grammar checker, text editor, font manager, etc.) and
      piece them together to form the ultimate customizable word processor?
      Well, those pieces are objects. In OLE, an object is a piece of a
      document, a graphic, or some multimedia. In general multimedia terms,
      an object is a stored data element, such as a video clip, an audio
      file, or a graphic representation of an object.
- Output
  Format
- @@
- Output
  method
- @@
- Page-content
  stream
- A page's content stream contains operands and
      operators used to place "paint" on a page in selected areas. By
      executing the actions described in the page content stream, an
      application builds up the image of the page described by the
    stream.
- Page
  oriented
- @@
- @@
- Protocols
- A formal description of message formats and the
      rules two computers must follow to exchange those messages. Protocols
      can describe low-level details of machine-to-machine interfaces (e.g.,
      the order in which bits and bytes are sent across a wire) or high-level
      exchanges between application programs (e.g., the way in which two
      programs transfer a file across the Internet).
- Reading
  order
- @@
- Reflow
- @@
- ReverseChars
- Font characteristics may suggest that right-to-left
      text be typeset left-to-right. The ReverseChars marked content
      indicates that the show strings within the marked content are
      individually reversed in reading order.
- @@.
- Showstring
- (a la Loretta) The strings that are the arguments to
      the PDF and Postscript text-showing operators that show text on a page.
      The show string is interpreted as a sequence of character codes
      identifying the glyphs to be painted.
- Showstring
  operations
- @@.
- Spacing
  characters
- @@.
- Soft hyphen
- A character used to mark conditional hyphenation
      points: Unicode and ISO Latin-1 (ISO 8859-1) code point 0xAD.
 A soft hyphen is displayed only if the word falls at the end of a line
      that is too long and must be broken there. Hyphens inserted
      automatically by a hyphenation utility are called discretionary or
      soft hyphens. Word processors use two basic techniques to perform
      hyphenation. The first employs an internal dictionary of words that
      indicates where hyphens may be inserted. The second uses a set of
      logical formulas to make hyphenation decisions. The dictionary method
      is more accurate but usually slower. The most sophisticated programs
      combine both methods. Most word processors also let you override their
      hyphenation rules and define where a word should be divided.
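A minimal Python sketch of the two behaviors described above: soft hyphens are invisible in running text (so they should be dropped when text is searched or spoken), and one becomes a visible hyphen only when the word is broken there. The function names are hypothetical, and measuring line width in characters rather than glyph widths is a simplifying assumption.

```python
SOFT_HYPHEN = "\u00ad"


def searchable_text(text: str) -> str:
    """Remove soft hyphens so that words can be searched or spoken whole."""
    return text.replace(SOFT_HYPHEN, "")


def break_word(word: str, max_chars: int) -> tuple[str, str]:
    """Break `word` at the last soft hyphen where the first part plus a
    visible hyphen fits in `max_chars` characters.

    Returns (text for this line, text carried to the next line). If no
    soft hyphen fits, the word is returned unbroken.
    """
    best = None
    for i, ch in enumerate(word):
        if ch == SOFT_HYPHEN and len(searchable_text(word[:i])) + 1 <= max_chars:
            best = i
    if best is None:
        return searchable_text(word), ""
    return searchable_text(word[:best]) + "-", searchable_text(word[best + 1:])


word = "hy\u00adphen\u00ada\u00adtion"
print(searchable_text(word))   # hyphenation
print(break_word(word, 7))     # ('hyphen-', 'ation')
```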
- Tagged PDF
- A form of PDF that provides structure and ordering
      information so that PDF documents can be read by screen readers and
      reflowed to fit different display screen sizes. To accomplish this,
      Tagged PDF marks, or tags, the various elements that make up a
      page.
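The following sketch suggests why structure and ordering information matters: a screen reader can walk the structure tree in logical order instead of the order in which content was painted. `StructElem` is a hypothetical in-memory model, not a real library class; its fields only mirror the /S (structure type), /K (kids), and /Alt (alternate description) entries of a structure element.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class StructElem:
    """Hypothetical stand-in for a structure element.

    `type` mirrors /S (e.g. "H1", "P", "Figure"), `kids` mirrors /K
    (simplified to child StructElem objects or plain text), and `alt`
    mirrors /Alt.
    """
    type: str
    kids: list = field(default_factory=list)
    alt: Optional[str] = None


def reading_order(elem: StructElem) -> Iterator[tuple[str, str]]:
    """Yield (structure type, text) pairs in logical order, regardless of
    where on the page the content was painted."""
    if elem.alt is not None:          # e.g. a Figure: read its alternate text
        yield elem.type, elem.alt
        return
    for kid in elem.kids:
        if isinstance(kid, str):
            yield elem.type, kid
        else:
            yield from reading_order(kid)


doc = StructElem("Document", [
    StructElem("H1", ["Annual report"]),
    StructElem("Figure", alt="Bar chart of sales by quarter"),
    StructElem("P", ["Sales rose in every quarter."]),
])
for kind, text in reading_order(doc):
    print(f"{kind}: {text}")
```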
- Tagging
- The association of attributes of text with a point
      or range of the primary text. The value of a particular tag is not
      generally considered to be a part of the "content" of the text. A
      typical example of tagging is to mark the language or the font for a
      portion of text.
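A minimal sketch of tagging as defined above: attributes such as language are stored as ranges alongside the primary text rather than inside it. The `Tag` class and `attribute_at` function are hypothetical, and the rule that the most recently added overlapping tag wins (modelling a nested span) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tag:
    """An attribute attached to the range [start, end) of the primary text."""
    start: int
    end: int
    key: str
    value: str


def attribute_at(tags: list[Tag], offset: int, key: str) -> Optional[str]:
    """Return the value of `key` at `offset`, or None.

    When ranges overlap, the most recently added tag wins (an assumption
    of this sketch).
    """
    for tag in reversed(tags):
        if tag.key == key and tag.start <= offset < tag.end:
            return tag.value
    return None


text = "Hello, bonjour"                    # the content itself is never modified
tags = [Tag(0, len(text), "Lang", "en-US"),
        Tag(7, len(text), "Lang", "fr")]
print(attribute_at(tags, 0, "Lang"))       # en-US
print(attribute_at(tags, 8, "Lang"))       # fr
```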
- Trailing space
  character
- A white space character inserted into the text for a
      page after the last word on a line. A trailing space character is not
      needed to produce the correct page image, but is important for
      determining word breaks in the text of the page.
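The effect of a missing trailing space can be shown with a naive extractor that simply concatenates the text of each line and splits on white space. The function `extract_words` is hypothetical; real extractors may use glyph positions as well.

```python
def extract_words(line_texts: list[str]) -> list[str]:
    """Concatenate the text of each line, as a naive extractor might,
    then split the result on white space."""
    return "".join(line_texts).split()


# The same two printed lines, without and with a trailing space character.
print(extract_words(["The quick brown", "fox jumps"]))   # ['The', 'quick', 'brownfox', 'jumps']
print(extract_words(["The quick brown ", "fox jumps"]))  # ['The', 'quick', 'brown', 'fox', 'jumps']
```

Both versions paint an identical page image, but only the second yields the correct word break between "brown" and "fox".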
- Type 0 font, Type 1
  font
- Type 0 font: a composite font, that is, a font
      composed of other fonts, organized hierarchically.
 Type 1 font: a font represented using the Adobe Type 1 Font Format. A
      Type 1 font program is a stylized PostScript program that describes
      glyph shapes.
- Typographic
  style
- @@.
- Unicode
- A character coding scheme, originally designed to use
      16 bits for each character, that extends the capabilities of ASCII,
      which uses seven bits. Nearly all letters and symbols in all languages
      can be represented in a standard way with Unicode. The first 128
      characters of Unicode are identical to those in standard ASCII.
 Unicode was a new approach to assigning binary codes to text and script
      characters. Officially called the Unicode Worldwide Character Standard,
      it is a system for "the interchange, processing, and display of the
      written texts of the diverse languages of the modern world." It also
      supports many classical and historical texts in a number of languages.
      Currently, the Unicode standard contains 57709 distinct coded
      characters derived from 24 supported language scripts. These characters
      cover the principal written languages of the world.
 Originally, Unicode was designed to be universal, unique, and uniform:
      the code was to cover all major modern written languages (universal),
      each character was to have exactly one encoding (unique), and each
      character was to be represented by a fixed width in bits (uniform).
      In parallel with the development of Unicode, an ISO/IEC standard
      (ISO/IEC 10646) was being developed that put a strong emphasis on
      compatibility with existing character codes such as ASCII and ISO
      Latin-1. To avoid having two competing 16-bit standards, in 1992 the two
      teams agreed on a common character code, known as Unicode and, in
      ISO/IEC 10646, as the Basic Multilingual Plane (BMP). Since the merger
      the character codes have been the same, but the two standards are not
      identical: the ISO/IEC standard covers only the coding, while Unicode
      includes additional specifications that help implementation.
 Unicode is not a glyph encoding. The same character can be displayed as
      a variety of glyphs, depending not only on the font and style but also
      on the adjacent characters. A sequence of characters can be displayed as
      a single glyph, or a character can be displayed as a sequence of glyphs.
      Which case applies is often font dependent.
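Python's standard `unicodedata` module can illustrate two points from this entry: the first 128 code points coincide with ASCII, and characters are not glyphs. The "fi" ligature is a single character that compatibility normalization (NFKC) expands into two letters, while U+00E9 is canonically equivalent to a base letter plus a combining accent, which a font may draw as one glyph or two. The helper name `is_ascii_range` is hypothetical.

```python
import unicodedata


def is_ascii_range(ch: str) -> bool:
    """True for the first 128 code points, which coincide with 7-bit ASCII."""
    return ord(ch) < 128


# One glyph standing for several letters: the "fi" ligature.
fi_ligature = "\ufb01"
fi_expanded = unicodedata.normalize("NFKC", fi_ligature)

# One accented letter that can also be written as a base letter plus a
# combining mark; a font may render either form as one glyph or as two.
e_acute = "\u00e9"
e_decomposed = unicodedata.normalize("NFD", e_acute)

print(unicodedata.name(fi_ligature), "->", repr(fi_expanded))  # LATIN SMALL LIGATURE FI -> 'fi'
print([unicodedata.name(c) for c in e_decomposed])            # ['LATIN SMALL LETTER E', 'COMBINING ACUTE ACCENT']
```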
- Unicode
  value
- Unicode value or code point: The Unicode Consortium
      defined a set of sixteen-bit code points, 57709 of which are currently
      assigned and named Unicode characters. The lowest 65536 code points in
      ISO/IEC 10646-1:1993 are identical to the Unicode Standard and are
      sometimes called the Basic Multilingual Plane. See http://www.unicode.org
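Code points are conventionally written as "U+" followed by at least four hexadecimal digits. The small Python sketch below formats a code point that way and tests whether it lies among the lowest 65536 code points (the Basic Multilingual Plane). Both function names are hypothetical.

```python
def code_point_label(ch: str) -> str:
    """Format a character's code point in the conventional U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def in_basic_multilingual_plane(ch: str) -> bool:
    """True if the code point is one of the lowest 65536 (U+0000 to U+FFFF)."""
    return ord(ch) <= 0xFFFF


print(code_point_label("\u00ad"))             # U+00AD (soft hyphen)
print(in_basic_multilingual_plane("\u00ad"))  # True
```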
- Unicode
  character
- @@
- User agents
- Software to access Web content, including desktop
      graphical browsers, text browsers, voice browsers, mobile phones,
      multimedia players, plug-ins, and some software assistive technologies
      used in conjunction with browsers such as screen readers, screen
      magnifiers, and voice recognition software.
- User name (/TU
  key)
- Any interactive form field may contain the optional
      /TU entry in its dictionary. This entry, known as the user name or
      short description, is used to identify this field when generating an
      error message or naming the field to a screen reader.
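A minimal sketch of how an application might choose the name to announce for a form field. The field dictionary is modelled as a plain Python dict keyed by PDF key names without the leading slash, which is a hypothetical representation. Preferring /TU and falling back to the partial field name /T when /TU is absent is an assumption of this sketch.

```python
from typing import Optional


def announced_field_name(field: dict) -> Optional[str]:
    """Return the name an assistive technology should use for a form field.

    Prefers the /TU user name; falling back to the partial field name /T
    is an assumption of this sketch.
    """
    return field.get("TU") or field.get("T")


zip_field = {"FT": "Tx", "T": "zip", "TU": "Postal code"}
print(announced_field_name(zip_field))      # Postal code
print(announced_field_name({"T": "zip"}))   # zip
```

Without a /TU entry, a screen reader user would hear only an internal name such as "zip", which is why authoring tools should supply /TU for every field.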
- Word breaks
- Applications divide the text of a page into words;
      word breaks are the points in the text stream that separate adjoining
      words. Different applications may use different rules for defining
      words; for example, one application may consider everything between
      white space characters to be a word. Another application may not
      include leading or trailing punctuation as part of a word.