PDF Techniques
for Web Content Accessibility Guidelines 1.0 and 2.0

W3C Internal Working Draft 13 September 2001

This version:
http://www.w3.org/WAI/GL/WCAG-PDF-TECHS-20010913
Latest version:
http://www.w3.org/WAI/GL/WCAG-PDF-TECHS/
Editors:
Loretta Guarino Reid, Adobe Systems
Katie Haritos-Shea
Wendy Chisholm, W3C

Copyright © 1999-2001 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.


Status


Introduction

This paper is just one in a series of techniques documents designed for authoring Accessible Web Content. For information about the other documents in the techniques series, please refer to the Techniques for Web Content Accessibility Guidelines 1.0 document.

In this PDF Techniques document we describe how to create accessible Adobe Portable Document Format (PDF) files,
as defined in the PDF Reference Manual, Second Edition, Version 1.3 (available in HTML and PDF versions),
and in the Tagged PDF Document (HTML version not available yet; PDF version),
which explains the changes between PDF 1.3 and 1.4.

This document is also intended to demonstrate:

  1. How a web author can determine whether a PDF file is accessible, and
  2. How developers can create tools that generate accessible PDF, and how authors can use tools to alter PDF content to improve accessibility.

Because PDF is a page description language, it is usually not intended to be edited directly by authors. These techniques are therefore intended particularly for the developers of authoring tools that generate PDF as an output format. Developers should also see the Authoring Tool Accessibility Guidelines 1.0 and the Authoring Tool Accessibility Guidelines "Wombat" Draft.

However, since authors routinely deliver PDF documents as web content, it is important that they too understand what constitutes an accessible PDF file. Webmasters and other web authors should also see the Web Content Accessibility Guidelines 1.0 and the Web Content Accessibility Guidelines 2.0 Working Draft. The most reliable way to generate accessible PDF is to create Tagged PDF files.

Tagged PDF is a stylized use of PDF that allows reliable recovery of text, graphics, and images in PDF documents, with no ambiguity about the contents or the ordering of the contents. A Tagged PDF file is page oriented. Each page of a Tagged PDF document contains the text, graphics, and images in reading order, as determined by the authoring application. A Tagged PDF is a Logical Structured PDF. Logical Structure is used to carry information necessary to support tagging for access and content extraction, as well as styling properties needed for access, reflow, and content extraction. It also provides the identification of the article flows in the cross-page environment for access and content extraction.

A short paragraph here making pdf relevant to people and their different devices. Examples of these devices could be Text-to-Speech for Voice Activation (phones.....reading aloud a pdf file online?) and for the Reflowing of Text to/for small PDA's. William, in support of device independence, would you be willing to create this for us??

To promote continuity across WAI documents and to assist in understanding accessibility principles, we have chosen to place each PDF checkpoint under the most appropriate one of the four WCAG 2.0 Guidelines.

For each technique, we identify the version of PDF in which the language support is first available. Where no version is specified, the technique can be applied in all versions of PDF.

 


Table of Contents

Status

Introduction

How This Document is Organized

Assess PDF for Accessibility

PDF Glossary

PDF References Not available yet

WCAG 2.0 Guidelines

WCAG Guideline 1: PRESENTATION
WCAG Guideline 2: INTERACTION
WCAG Guideline 3: COMPREHENSION
WCAG Guideline 4: TECHNOLOGY CONSIDERATIONS

 

PDF Checkpoints

WCAG Guideline 1: PRESENTATION

PDF Checkpoint 1:
Ensure That The Text of the Document is Accessible

Checkpoint 1.1: Render Characters and Words in Reading Order within the Page
Checkpoint 1.2: Separate words explicitly with spacing characters
Checkpoint 1.3: Use soft hyphens and hard hyphens appropriately
Checkpoint 1.4: Use the ActualText attribute
Checkpoint 1.5: Ensure that all character codes map reliably to Unicode

PDF Checkpoint 2:
Provide Text Alternatives for Images and Graphics

PDF Checkpoint 3:
Provide Structural Grouping

Checkpoint 3.1: Provide logical structure
Checkpoint 3.2: Tag artifacts in the page contents with /Artifact

PDF Checkpoint 4:
Design for User Control of Color and Contrast
Checkpoint 4.1: Avoid drawing rectangles behind text that are not background elements
Checkpoint 4.2: Text should not be placed on top of images
Checkpoint 4.3: Avoid the use of Color as the Only Means to Convey Information

PDF Checkpoint 5:
Identify the Natural Language of all Text in the Document
Checkpoint 5.1: Identify the document's primary language
Checkpoint 5.2: Identify when a language change occurs on the page

WCAG Guideline 2: INTERACTION

PDF Checkpoint 6:
Document Navigation
Checkpoint 6.1: Use bookmarks to provide navigation aids into a document.
Checkpoint 6.2: Use links within a document
Checkpoint 6.3: If the value of the link does not describe the target clearly and accurately, provide Alt attributes
Checkpoint 6.4: Provide a clear descriptive name for all form fields

WCAG Guideline 3: COMPREHENSION

PDF Checkpoint 7:
Provide Expansions for Acronyms and Abbreviations

WCAG Guideline 4: TECHNOLOGY CONSIDERATIONS

PDF Checkpoint 8:
Set Document Protections to Permit Access

 


How This Document is Organized

 


PDF Checkpoints

 


WCAG Guideline 1: PRESENTATION
Design content that allows presentation according to the user's needs and preferences

The PDF Techniques Checkpoints that are covered under this WCAG 2.0 Guideline 1 are . . . . . . .

 


PDF Checkpoint 1 Ensure That the Text of the Document is Accessible

Ensure that the text of your document can be extracted reliably in logical reading order.


PDF Checkpoint 1.1
Render Characters and Words in Reading Order within the Page

Render words, and characters within words, in reading order within the page-content stream. The ReverseChars marked content may be used when rendering right-to-left text that will be typeset left-to-right (PDF 1.4).

 

Correct PDF Technique:

@@

 

Deprecated PDF Techniques: (What you should NOT do)

Some PDF authoring applications save space by rendering all the characters in one font at a time. Hence, the PDF file may render all the bold characters on a page, then all the normal weight characters. Generally, this will cause characters not to be rendered in reading order.

A PDF page may render all characters left-to-right, top-to-bottom. For a multi-column document, this does not render the words in reading order, since the first line of the second column will be rendered before the second line of the first column.
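
The multi-column problem can be illustrated with a short Python sketch. The line records, their coordinates, and the function names are hypothetical and are not taken from any PDF library; the sketch only contrasts a naive position sort with a column-aware ordering.

```python
# Hypothetical line records: (column, x, y, text). In PDF user space, y grows upward.
LINES = [
    (0, 72, 700, "col1-line1"),
    (1, 320, 700, "col2-line1"),
    (0, 72, 686, "col1-line2"),
    (1, 320, 686, "col2-line2"),
]


def naive_order(lines):
    """Top-to-bottom, then left-to-right: interleaves the two columns."""
    return [text for _, _, _, text in sorted(lines, key=lambda l: (-l[2], l[1]))]


def reading_order(lines):
    """Column by column, top-to-bottom within each column."""
    return [text for _, _, _, text in sorted(lines, key=lambda l: (l[0], -l[2]))]
```

With these records, naive_order(LINES) places the first line of the second column before the second line of the first column, which is the failure described above.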

 

PDF Checkpoint 1.2
Separate words explicitly with spacing characters.

Separate words explicitly with spacing characters. Do not rely on the location of the characters or the division of characters into showstring operations to indicate word breaks. Note that this implies that lines of text for western languages usually end with a trailing space character.

 

Consider rendering the two line example:

Now is the winter
of our discontent.

Correct PDF Technique:

  • Position to the beginning of line 1
  • Show String ("Now is the winter ")
  • Position to the beginning of line 2
  • Show String ("of ")
  • Position to the beginning of "our"
  • Show String ("our ")
  • Position to the beginning of "discontent"
  • Show String ("discontent. ")


This is the PDF content stream for this example:
/P <</MCID 0 >>BDC
/CS0 cs 0 0 0 scn
1 i /RelativeColorimetric ri
/GS0 gs
BT
/TT0 1 Tf                 // set the font
0.0004 Tc -0.0004 Tw      // set character and word spacing
12 0 0 12 90 708.96 Tm    // set text matrix
(Now is the winter )Tj    // note trailing space
0.0005 Tc 0 Tw            // set character and word spacing
0 -1.15 TD                // move to the next line
(of )Tj                   // note trailing space
[-200 (our )]TJ           // note trailing space
[-200 (discontent. )]TJ   // note trailing space
ET
EMC


Deprecated PDF Technique: (What you should NOT do)

  • Position to the beginning of line 1
  • Show String ("Now is the winter")
  • Position to the beginning of line 2
  • Show String ("of")
  • Position to the beginning of "our"
  • Show String ("our")
  • Position to the beginning of "discontent"
  • Show String ("discontent.")


This is the PDF content stream for this example:
/P <</MCID 0 >>BDC
/CS0 cs 0 0 0 scn
1 i
/RelativeColorimetric ri
/GS0 gs
BT
/TT0 1 Tf                   // set the font
0.0004 Tc -0.0004 Tw        // set character and word spacing
12 0 0 12 90 708.96 Tm      // set text matrix
(Now is the winter)Tj       // note no trailing space
0.0005 Tc 0 Tw              // set character and word spacing
0 -1.15 TD                  // move to the next line
(of)Tj                      // note no trailing space
[-300 (our)]TJ              // note no trailing space
[-250 (discontent.)]TJ      // note no trailing space
ET
EMC


 

Note: In the deprecated example above, there are no spacing characters at the end of either line, and there are no spacing characters between the words of the second line.
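
The effect on text extraction can be sketched in Python. This is a simplified illustration, not a PDF parser: it assumes a consumer that ignores positioning and simply concatenates the literal strings passed to the show string operators, and it assumes the strings contain no escaped or nested parentheses. The two stream constants reproduce only the show string operators from the examples above.

```python
import re

# Simplified streams: only the show-string operators from the two examples.
CORRECT = "(Now is the winter )Tj (of )Tj [-200 (our )]TJ [-200 (discontent. )]TJ"
DEPRECATED = "(Now is the winter)Tj (of)Tj [-300 (our)]TJ [-250 (discontent.)]TJ"


def extract_text(stream):
    """Concatenate every literal string in the stream, ignoring positioning."""
    return "".join(re.findall(r"\(([^()]*)\)", stream))
```

Applied to the deprecated stream, the words of the second line run together with the end of the first line, because nothing in the strings themselves marks the word breaks.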

 

 

PDF Checkpoint 1.3
Use soft hyphens and hard hyphens appropriately.

Use a soft hyphen, identified by a character that maps to the Unicode value U+00AD or 173 decimal, when a line-break hyphen is introduced into the middle of a word.

 

Correct PDF Technique:

If the word "father-in-law" is hyphenated after the first syllable ("fa-ther-in-law"), the first hyphen should be a soft hyphen, and the second and third hyphens should be hard hyphens.


This is the PDF content stream for this example:
BT
/TT0 1 Tf                   // set the font
0.0003 Tc -0.0003 Tw        // set character and word spacing
12 0 0 12 90 708.96 Tm      // set text matrix
(fa)Tj                      // "fa"
/T1_0 1 Tf                  // switch to a font containing soft hyphen
(\255)Tj                    // the soft hyphen is at location \255 of the font
/TT0 1 Tf                   // switch font back
0 Tw 0 -1.15 TD             // position to next line
[(ther-)3(i)-2(n-)3(law )]TJ// the rest of the word
ET
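
A consumer that rebuilds words from extracted lines can then tell the two kinds of hyphen apart. The following Python sketch is a minimal illustration under the assumption that each extracted line is available as a Unicode string; the function name is hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD: decimal 173, octal 255


def join_lines(lines):
    """Rejoin lines, dropping a soft hyphen at a line end and keeping hard hyphens."""
    parts = []
    for line in lines:
        if line.endswith(SOFT_HYPHEN):
            parts.append(line[:-1])  # the word continues on the next line
        else:
            parts.append(line)
    return "".join(parts)
```

For the example above, join_lines(["fa\u00ad", "ther-in-law "]) recovers the original word with both hard hyphens intact.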

 

 

PDF Checkpoint 1.4
Use the ActualText attribute.

If text characters are not rendered using the show string operation, they must be marked in the logical structure tree with ActualText attributes containing the derived Unicode values.

 

Correct PDF Technique:

Suppose the word "Arthur" is rendered using an illuminated A.
The structure subtree for this word might contain
<Figure> graphics or image for illuminated A
<Span> "rthur"
The <Figure> structural element should have the ActualText attribute, with value "A", so the word "Arthur" could be extracted.
@@Loretta will come up with a better example here: 7-20-2001@@
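
How a consumer might use ActualText during extraction can be sketched as follows. The Elem class and the extract function are hypothetical stand-ins for a structure tree and are not part of any PDF library; the sketch assumes that ActualText, when present, replaces the whole content of its element.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Elem:
    tag: str
    text: str = ""                     # text rendered with show string operations
    actual_text: Optional[str] = None  # ActualText attribute, if any
    kids: List["Elem"] = field(default_factory=list)


def extract(elem):
    """Return the element's text, preferring ActualText when it is present."""
    if elem.actual_text is not None:
        return elem.actual_text
    return elem.text + "".join(extract(kid) for kid in elem.kids)


# The structure subtree from the example: an illuminated "A" followed by "rthur".
word = Elem("Span", kids=[Elem("Figure", actual_text="A"), Elem("Span", text="rthur")])
```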

 

 

PDF Checkpoint 1.5:
Ensure that all character codes map reliably to Unicode.

Within PDF, show string operations operate on a sequence of character codes with associated fonts. The character code indicates which of the font's glyphs to render on the page. However, some fonts do not provide information about the Unicode character that corresponds to the glyph. Without this information, the text cannot be reconstructed.
Every such sequence of character codes must therefore map unambiguously into a sequence of Unicode code points. Mapping is described:
@@ put link here to section 2.4 on page 102 of Tagged pdf doc, and create or get from LOretta html page as well.
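
The requirement can be pictured with a small Python sketch. The table below is a hypothetical per-font mapping from character codes to Unicode text; its codes and contents are invented for illustration and do not come from any real font.

```python
# Hypothetical per-font table from character code to Unicode text.
TO_UNICODE = {0x01: "N", 0x02: "o", 0x03: "w"}


def decode(codes, table):
    """Return the Unicode text, or raise if any character code has no mapping."""
    missing = [code for code in codes if code not in table]
    if missing:
        raise ValueError(f"unmapped character codes: {missing}")
    return "".join(table[code] for code in codes)
```

A show string whose codes are all in the table decodes cleanly; a single unmapped code is enough to make the text unrecoverable, which is why the sketch raises an error rather than guessing.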

 

PDF Example:

@@

 

PDF Example:

@@

 

Rationale for the importance of Unicode use:
The Unicode character set is a widely accepted international standard for encoding text. It is non-proprietary, and allows for a richness of character codes. Utilized the world over in internet and network protocols, documents of all kinds, assistive technology devices, user agents, and in an abundance of other areas, Unicode fulfills an important need in local, as well as international communication. It is also the basis on which the World Wide Web's present architecture is, and future architecture will be, built. Therefore it is the best existing system to use for ease of system, device and protocol interoperability for text encoding.
@@ KHS.......Is this accurate? We need help here....@@

 

@@ Also add more info and motivation on why we need to map to Unicode. @@
See http://www.w3.org/International/ and http://www.unicode.org/

 

 


PDF Checkpoint 2: Provide Text Alternatives for Images and Graphics

Alternative Text is additional or descriptive text that can be used to describe an image, formula, or other item in the document that does not translate naturally into text.

Since Tagged PDF enables many consumers to recover the contents of PDF pages, a Tagged PDF file should use the alternative text (Alt) and actual text (ActualText) facilities.

 

PDF Example:

A pie chart may be described giving the values of the various sectors to provide visually impaired users with more detailed information than generally available from a single caption.

 

PDF Example:

 

 


PDF Checkpoint 3: Provide Structural Grouping

PDF Checkpoint 3.1
Provide logical structure (PDF Reference Manual Section 8.4.3) for the document.

  • Map structure types to the standard structure types described in Adobe Technical Note #5401. (PDF 1.3)
  • Suggested reading Section 3 of Tagged pdf doc @@ Put link here @@

 

PDF Example:

@@

 

PDF Example:

@@

 

PDF Checkpoint 3.2
Tag artifacts in the page contents with the /Artifact marked content

This is so that users can control how and whether they are included in the contents of the document.


Artifacts are one of the following:
  • Artifacts of the printing process, such as crop-box markings or the document file name.
  • Artifacts of the pagination of the document, that is, elements that would be absent or present in a much different form if the document were always one big page, such as running headers and page numbers.
  • Artifacts of the layout process and typographic style, such as a horizontal rule above a footnote.
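
Once artifacts are tagged, a consumer can offer the user the choice described above. The Python sketch below assumes a hypothetical, already-parsed list of page items, each carrying its marked-content tag; the sample text is invented for illustration.

```python
# Hypothetical page items: (marked-content tag, text).
PAGE = [
    ("Artifact", "Annual Report "),                       # running header
    ("P", "Revenue grew in the third quarter. "),
    ("Artifact", "Page 7 "),                              # page number
]


def content_text(items, include_artifacts=False):
    """Join the text of page items, skipping artifacts unless the user wants them."""
    return "".join(
        text for tag, text in items if include_artifacts or tag != "Artifact"
    )
```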

 

PDF Example:

@@

 

PDF Example:

@@

 

 

 


PDF Checkpoint 4: Design for User Control of Color and Contrast

Acrobat lets the user control foreground and background color of a document. Text in the original document will use the foreground color. Background elements will use the background color. A background element is defined to be any rectangle that is aligned with the edges of the page (not skewed or rotated) and that covers 50% or more of the page.
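
The definition of a background element can be written out as a short Python sketch. This is an illustration of the rule as stated above, not Acrobat's actual implementation; the function name, the box representation, and the decision to clip the rectangle to the page before measuring coverage are all assumptions.

```python
def is_background_element(rect, page, rotated=False):
    """rect and page are (x0, y0, x1, y1) boxes in the same coordinate system."""
    if rotated:  # skewed or rotated rectangles never qualify
        return False
    rx0, ry0, rx1, ry1 = rect
    px0, py0, px1, py1 = page
    # Clip the rectangle to the page before measuring coverage (an assumption).
    width = max(0.0, min(rx1, px1) - max(rx0, px0))
    height = max(0.0, min(ry1, py1) - max(ry0, py0))
    page_area = (px1 - px0) * (py1 - py0)
    return page_area > 0 and (width * height) / page_area >= 0.5
```

A full-page rectangle qualifies, while a small rectangle drawn behind a title does not, so the title rectangle keeps its original color when the user changes the background.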

 

PDF Checkpoint 4.1
Avoid drawing rectangles behind text that are not background elements.

Even if the rectangle color matches the background color, when the user changes the background color, the rectangle will not change and may cause contrast problems with the new foreground color.

 

PDF Example:

A document with a white background may draw a title by placing characters on a pink rectangle. Since the pink rectangle will not be recognized as part of the background, when the user sets the foreground color to white and the background color to black, the white characters will be hard to see on the pink background.

 

PDF Checkpoint 4.2
Text should not be placed on top of images.

Images are not affected by the background and foreground color settings, so when text is placed on top of images and the user changes the foreground color, the text may be difficult to read.

 

PDF Example:

Black text will be visible on a pastel-colored background image, but yellow text may not be, and changing the background color to black will not change the colors in the image.

 

PDF Checkpoint 4.3
Avoid the use of Color as the Only Means to Convey Information

Does the document avoid using color-coding as the only means of conveying information, indicating an action, prompting a response, or distinguishing a visual element?
@@New checkpoint: Use WCAG wording here. KHS and LGR@@

 

PDF Example:

 

 


PDF Checkpoint 5: Identify the Natural Language of all Text in the Document

Specifying the language used for text in a PDF document can increase the accessibility of that document for disabled users. For example, with the language correctly identified, text-to-speech engines can properly vocalize the text, either via a screen reader or some more direct invocation of a text-to-speech engine.

PDF Checkpoint 5.1
Identify the document's primary language.

Use the language tagging facilities (Lang) to specify the natural language of all text in the document. (PDF 1.4)

 

Acrobat 5.0 Tip:

To view or edit the language element Properties in Acrobat 5.0:

  1. Select an element in the Tags panel
  2. Choose > Properties
  3. Enter or change the language property
  4. Click OK

 

Correct PDF Technique:

The language may be marked with an instream marking of the form:
/Span << /Lang (en-us) >> BDC (Text to be interpreted as U.S. English.) Tj EMC

 

 

PDF Checkpoint 5.2
Identify when a language change occurs on the page.

In addition to identifying the primary language as above, individual elements containing content in a language other than the main document language should be marked to indicate that a language change has occurred on the page. This signals a screen reader to switch to an alternate pronunciation scheme, and lets other software apply the hyphenation scheme appropriate to that language.

 

Acrobat 5.0 Tip:

To view or edit the language element Properties in Acrobat 5.0:

  1. Select an element in the Tags panel
  2. Choose > Properties
  3. Enter or change the language property
  4. Click OK

 

Note: If it is not empty, the value of the Lang key is a language identifier as defined by [IETF RFC 1766], "Tags for the Identification of Languages," described in section 2.12 of the XML 1.0 Specification, http://www.w3.org/TR/REC-xml.
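
How a consumer might determine the language in effect for a piece of content can be sketched in Python. The sketch assumes that an element without its own Lang value takes the language of its nearest enclosing element, and finally the document's primary language; the dictionary layout and the function name are hypothetical.

```python
def effective_lang(element, document_lang):
    """Walk up the parents until a Lang value is found; fall back to the document."""
    node = element
    while node is not None:
        if node.get("Lang"):
            return node["Lang"]
        node = node.get("parent")
    return document_lang


# A paragraph in the document language containing a French quotation.
paragraph = {"Lang": None, "parent": None}
quotation = {"Lang": "fr", "parent": paragraph}
word = {"parent": quotation}
```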

 

 

 


WCAG Guideline 2: INTERACTION
Design content that allows interaction according to the user's needs and preferences

The PDF Techniques Checkpoints that are covered under this WCAG 2.0 Guideline 2 are . . . . . . .

 


PDF Checkpoint 6: Document Navigation


PDF Checkpoint 6.1
Use bookmarks to provide navigation aids into a document.

A PDF file may contain a document outline, allowing the user to navigate interactively from one part of the document to another. The outline consists of a tree-structure hierarchy of outline items, sometimes called bookmarks, which permit the user to go to significant locations in the document.
The root of a document's outline hierarchy is an outline dictionary specified by the Outlines entry in the document catalog. Each outline item is defined by an outline item dictionary. The items at each level of the hierarchy are chained together through the Prev and Next entries.

 

Provide bookmarks for these Outline items:
  • Table of Contents
  • Beginning of Chapters
  • Index

 

PDF Examples:

Here are examples of a typical outline dictionary and a typical outline item dictionary:


This is the PDF object for this Outline dictionary:
// Outline dictionary
21 0 obj
  << /Count 6
    /First 22 0 R
    /Last 29 0 R
  >>
endobj
                                   


This is the PDF object for this Outline item dictionary:
//Outline item dictionary
22 0 obj
  << /Title (Chapter 1)
    /Parent 21 0 R
    /Next 26 0 R
    /First 23 0 R
    /Last 25 0 R
    /Count 3
    /Dest [3 0 R /XYZ 0 792 0]
  >>
endobj
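
The way a viewer or assistive technology might traverse the outline can be sketched in Python. The object table below is a hypothetical, simplified stand-in for parsed PDF objects: indirect references are plain integers, and the titles other than "Chapter 1" are invented for illustration.

```python
# Hypothetical, simplified objects keyed by object number.
OBJECTS = {
    21: {"First": 22, "Last": 26},
    22: {"Title": "Chapter 1", "Parent": 21, "Next": 26, "First": 23, "Last": 23},
    23: {"Title": "Section 1.1", "Parent": 22},
    26: {"Title": "Chapter 2", "Parent": 21, "Prev": 22},
}


def walk(objects, parent, depth=0):
    """Yield (depth, title) for each item under parent, following First and Next."""
    number = objects[parent].get("First")
    while number is not None:
        item = objects[number]
        yield depth, item["Title"]
        if "First" in item:
            yield from walk(objects, number, depth + 1)
        number = item.get("Next")
```

Calling list(walk(OBJECTS, 21)) visits the bookmarks in the order a user would see them in the Bookmarks panel, with nested items listed under their parent.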
 

 

Acrobat 5.0 Tip:

Users can access bookmarks in the Bookmarks Panel in the Navigation Pane. The Acrobat Help documentation, starting on page 91, describes how to work with Bookmarks, creating, editing, and deleting them.

 


PDF Checkpoint 6.2
Use links within a document.

When the content of the document might lead the reader to consult another document, or another location within the same document, provide a link.

 

PDF Example:

@@

 

 

Acrobat 5.0 Tip:

The Acrobat Help document, starting on page 98, describes how to create and work with links. "Advanced Techniques for Creating Accessible Adobe PDF Files" at http://www.adobe.com/products/acrobat/pdfs/CreateAccessibleAdvanced.pdf discusses how to use the Tags panel to add links to the logical structure tree.

 

 

PDF Checkpoint 6.3
If the value of the link does not describe the target clearly and accurately, provide Alt attributes.

@@

 

PDF Example:

@@

 

PDF Example:

@@

 

 

PDF Checkpoint 6.4
Provide a clear, descriptive name for all form fields

In Acrobat, this field of the Form dialog is called the Short Description; the PDF Reference Manual refers to it as the field's "user name". The field dictionary defines a field in an interactive form. It may contain the optional /TU key, which supplies the description used in messages about the field.

 

Acrobat 5.0 Tip:

Provide a user name (/TU key) for all form fields

 

PDF Example:

This is a demonstration of the short description attached to a text form field named f1-4


This is the PDF field dictionary for this example:
304 0 obj <<
/T (f1-4)                              // field name
/Kids [ 1038 0 R ]                     // children of this field
/FT /Tx                                // field type is Text
/DA (/HeBo 9 Tf 0 0 0.627 rg)          // default appearance
/AA << >>                              // no additional actions for this field
/TU (your first name and initial)      // short description of field
>> endobj

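A tool that checks this technique could look for fields that lack a usable /TU entry. The Python sketch below assumes the field dictionaries have already been parsed into plain dictionaries; the second field and the function name are hypothetical.

```python
FIELDS = [
    {"T": "f1-4", "FT": "Tx", "TU": "your first name and initial"},
    {"T": "f1-5", "FT": "Tx"},  # hypothetical field with no short description
]


def missing_descriptions(fields):
    """Return the names of fields whose TU entry is absent or blank."""
    return [f["T"] for f in fields if not f.get("TU", "").strip()]
```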

 


 

 


WCAG Guideline 3: COMPREHENSION
Make it as easy as possible to use and understand

The PDF Techniques Checkpoints that are covered under this WCAG 2.0 Guideline 3 are . . . . . . .

 


PDF Checkpoint 7: Provide Expansions for Acronyms and Abbreviations

Defining key terms and specialized language will help people who are not familiar with the topic you are presenting. Providing the expansion of abbreviations and acronyms not only helps people who are not familiar with the abbreviation or acronym but can clarify which meaning of an abbreviation or acronym is appropriate to use.

 

PDF Example:

The acronym "ADA" stands for both the Americans with Disabilities Act and the American Dental Association.

 

PDF Example:

@@

 

 

 


WCAG Guideline 4: TECHNOLOGY CONSIDERATIONS
Design for compatibility and interoperability

The PDF Techniques Checkpoints that are covered under this WCAG 2.0 Guideline 4 are . . . . . . .

 


PDF Checkpoint 8: Set document protections to permit access

Set the data access restrictions on the document to permit the contents to be accessed by assistive technologies.

  • When using 40-bit encryption:
      • Permit the text and graphics in the document to be copied.
  • When using 128-bit encryption:
      • Set accessibility permission for the document. 128-bit encryption was introduced with PDF 1.4, and cannot be read with versions of Acrobat earlier than 5.0.
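
The two cases can be expressed as a check on the document's permission flags. The Python sketch below is an illustration only: the bit positions reflect our reading of the standard security handler's permission flags in the PDF Reference and should be verified against it, and the function name is hypothetical.

```python
# Assumed bit positions (bit 1 is the low-order bit); verify against the PDF Reference.
COPY_BIT = 1 << 4           # bit 5: copy or extract text and graphics
ACCESSIBILITY_BIT = 1 << 9  # bit 10: extract content for accessibility (PDF 1.4)


def permits_assistive_access(flags, key_bits):
    """flags is the permission flags integer; key_bits is 40 or 128."""
    if key_bits == 40:
        return bool(flags & COPY_BIT)
    if key_bits == 128:
        return bool(flags & ACCESSIBILITY_BIT)
    raise ValueError("unsupported key length")
```

Under these assumptions, a 40-bit document is accessible only if copying is allowed, while a 128-bit document can forbid copying and still grant access to assistive technology.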

 

PDF Example:

@@

 

PDF Example:

@@

 

 

 


Assessing the Accessibility of PDF

@@Auto checks: fill in with list of tools that automatically provide results to these checkpoints. Also, in manual provide list of tools that will give warning for possible problem.

WCAG 1.0 checkpoints that are not easily identifiable in WCAG 2.0 checkpoints:

WCAG 2.0 checkpoint | Manual checks | Auto checks | Examples and Techniques
1.1 Provide a text equivalent for all non-text content. Generated content
1.2 Synchronize text equivalents with multimedia presentations. N/A N/A Refer to client-side scripting techniques
1.3 Synchronize a description of the essential visual information in multimedia presentations. N/A N/A Refer to client-side scripting techniques
1.4 Use markup or a data model to provide the logical structure of content. Generate outline and determine if structure is good
1.5 Separate content and structure from presentation. Increase font size in the browser settings. does all the text increase in size? HTML Validator
2.1 Provide more than one path or method to find content. N/A N/A Refer to Core techniques
2.2 Provide consistent responses to user actions. N/A N/A Refer to client-side scripting techniques
2.3 Give users control of mechanisms that cause extreme changes in context. N/A N/A Refer to client-side scripting techniques
2.4 Give users control over how long they can spend reading or interacting with content. N/A N/A Refer to client-side scripting techniques
2.5 Use device-independent event handlers. N/A N/A Refer to client-side scripting techniques
3.1 Use consistent presentation. Check for use of external style sheets.

Check that classes are used for function rather than presentation (?)

3.2 Emphasize structure through presentation, positioning, and labels. Print on paper using only black and white ink. Is all the info present? i.e., is info presented not in color alone?
3.3 Write clearly and simply. N/A N/A Refer to Core techniques
3.4 Use multimedia to illustrate concepts.
3.5 Summarize complex information. N/A N/A Refer to Core techniques
3.6 Define key terms, abbreviations, acronyms, and specialized language. N/A N/A Refer to Core techniques
3.7 Divide information into smaller, more manageable units. N/A N/A Refer to Core techniques
4.1 Choose languages, API's, and protocols that support the use of these guidelines. Use CSS1 or CSS2 with HTML 4.01 or XHTML 1.1
4.2 Use languages, API's, and protocols according to specification.

check that link element in head of page references an external style sheet.

ensure that no style attributes exist on elements,

ensure that font element is not used.

etc.

W3C CSS Validator

W3C HTML Validator

4.3 Design assistive-technology compatible user interfaces. Does your site work with a variety of assistive technologies, such as screen readers, magnifiers, on screen keyboards, etc.?
4.4 Design content so that when presentation effects are turned off or not supported the content is still usable. Use a browser that does not support or that will turn off CSS to determine if content is still usable.

 


PDF Glossary

 


 

40 bit encryption
@@
128 bit encryption
@@
Accessibility permission
A PDF file can be encrypted (PDF 1.1) to protect its contents from unauthorized access. PDF's standard security handler defines a set of access privileges for a document, including privileges such as modifying the document's contents, copying text and graphics from the document, and printing the document. In PDF 1.4, this set includes accessibility permission, which controls whether the contents of the document are available via standard accessibility APIs to screen readers and other assistive technology.
ActualText value
Sometimes characters are rendered by graphics commands other than show string. For instance, an illuminated character may be rendered by an image or a series of graphics commands. In these situations, the ActualText property is used to identify the character being rendered. This character may be concatenated with adjoining text to form a word.
Adobe glyph name
The name of a character in the Adobe standard character encodings, in Appendix D of the PDF 1.3 Reference Manual. The encodings list characters, character names, and character codes used in platform standard encodings.
Article flows
Some documents may contain sequences of content items that are logically connected but not physically sequential. For example, a news story may begin on the first page of a newsletter and run over onto one or more nonconsecutive interior pages. A PDF document may define an article to represent such a sequence. The article flow is the order of content within the article.
Artifacts
A page element that is a side effect of rendering, rather than an intrinsic part of the document or story. For example, artifacts of the printing process might include crop-box markings or the document file name printed outside the crop box. Artifacts of the pagination of a document are elements that would be absent (or present in a different form) if the document were always one very big page, so pagination artifacts include running headers and page numbers. A horizontal rule above a footnote would be an artifact of the layout process and typographic style.
Assistive Technology Devices
@@
Authoring Tools
@@
Bookmark
A bookmark is an item in the tree-structured document outline for a PDF document. When a bookmark is activated, it causes the PDF viewer to jump to a destination or trigger an action associated with the bookmark.
Characters
A character is a printable symbol having phonetic or pictographic meaning and usually forming part of a word of text, depicting a numeral, or expressing grammatical punctuation. A character is generally one of a limited number of symbols, including the letters of a particular language's alphabet, the numerals in the decimal number system, and certain special symbols such as the ampersand.
Character codes
A show string is the encoded representation of a sequence of non-negative integers. Each of those integers is a Character Code. The interpretation of a show string depends on the associated font: some fonts imply a one-byte representation while others imply a multi-byte representation. Note: This is the PDF definition of character code and somewhat contrary to normal usage. The same letter, for example a capital A, can have many different character codes on the same page.
Character Encoding
Character encoding is a table in a font or a computer operating system that maps character codes to glyphs in a font. Most operating systems today represent character codes with an 8-bit unit of data known as a byte. Thus, character encoding tables today are restricted to at most 256 character codes.
Not all operating system manufacturers use the same character encoding. For example, the Macintosh platform uses the standard Macintosh character set as defined by Apple Computer, Inc., while the Windows operating system uses another encoding entirely, as defined by Microsoft. Fortunately, standard Type 1 fonts contain all the glyphs needed for both these encodings, so they work correctly not only with these two systems, but others as well.
Column headers
@@
CMap
A CMap specifies the mapping from character codes to character selectors (CIDs, character names, or character codes) in one or more associated fonts or CIDFonts. It serves a function analogous to the Encoding dictionary for a simple font. A CMap also specifies the writing mode (horizontal or vertical) for any CIDFont with which the CMap is combined.
Also a CMap (character map) file specifies the correspondence between character codes and the CID (character identifier) numbers used to identify characters. For composite (Type 0) fonts, it is the equivalent to the concept of an encoding in a simple font. A CMap can describe a mapping from multiple-byte codes to thousands of characters in a large CID-keyed font.
Concatenate
To combine character strings, to join together two or more files or lists to form one big one. Example: The Unix cat command can be used to concatenate files.
Content extraction
Content extraction is the conversion of the content of a PDF document into a different representation, such as text or XML.
Contrast
A subjective feeling that graphic elements (such as fonts) are different but work together well. This gives a feeling of variety without losing harmony. Within a particular font, contrast also refers to the variety of stroke thicknesses that make up the characters. Helvetica has low contrast and Bodoni has high contrast.
Crop box
The crop box defines the region to which the contents of the page are to be clipped (cropped) when displayed or printed.
Crop box markings
Crop box markings are cross-hairs marking the corners of the crop box for a page.
Cross page environment
@@
Data tables
@@
Encoding
Mapping from a character set definition to the actual code units used to represent the data. See also character encoding.
Expansion
An expansion is the full representation of text that has only a partial representation in the document content. Abbreviations and acronyms are examples of such partial representations.
Form fields
A form field is an object in a document for gathering information interactively from the user. Examples include check boxes, radio buttons, text fields, list boxes and digital signatures.
Glyph
An image used in the visual representation of characters; roughly speaking, how a character looks. A font is a set of glyphs. In the simple case, for a given font (typeface and size), each character corresponds to a single glyph, but this is not always so, especially in a language with a large alphabet, where one character may correspond to several glyphs or several characters to one glyph. A glyph can be the shape of a letter or numeral, or some other symbol that pictures an encoded character. The following is quoted from a document written as background for the Unicode character set standard: An ideal characterization of characters and glyphs and their relationship may be stated as follows: A character conveys distinctions in meaning or sounds. A character has no intrinsic appearance. A glyph conveys distinctions in form. A glyph has no intrinsic meaning. One or more characters may be depicted by one or more glyph representations (instances of an abstract glyph) in a possibly context dependent fashion. Glyph is from a Greek word for "carving."
Indexing value
@@
Line-break hyphen or Hard hyphen
Hyphens that you add explicitly by entering the dash character are called line-break or hard hyphens. A hard hyphen is always set; for example, the hyphen in "cost-effective." A soft hyphen, by contrast, is set only when a word that is not normally hyphenated falls at the end of a line and must be broken for proper type spacing. Word processors use two basic techniques to perform hyphenation. The first employs an internal dictionary of words that indicates where hyphens may be inserted. The second uses a set of logical formulas to make hyphenation decisions. The dictionary method is more accurate but is usually slower. The most sophisticated programs use a combination of both methods. Most word processors allow you to override their hyphenation rules and define for yourself where a word should be divided.
Link text
@@
Logical structure tree
A PDF facility that allows the structure of a PDF file to be expressed via a Logical Structure. Ref: PDF Reference, section 8.4.3. The Logical Structure is separable from Tagged PDF, although Tagged PDF requires the use of Logical Structure. Every Tagged PDF is a Logical Structured PDF, but not all Logical Structured PDF files are Tagged PDFs.
MacRomanEncoding, MacExpertEncoding, or WinAnsiEncoding
The regular font encodings used for Latin-text fonts on Mac OS and Windows systems are named MacRomanEncoding and WinAnsiEncoding, respectively. Additionally, an encoding named MacExpertEncoding is used with "expert" fonts that contain additional characters useful for sophisticated typography. Complete details of these encodings and the characters present in typical fonts are found in Appendix D of the PDF Version 1.3 Reference Manual.
Map, mapped Needs Work
To define a function that yields a unique output value for every input value.
Markup
@@
Non-proprietary Needs Work
Not held in private ownership.
Objects
An object is an identifiable, encapsulated entity that provides one or more services requested by a client. Objects can refer to the objects in OOP (object-oriented programming) or the objects in OLE (Object Linking and Embedding). In object-oriented programming, objects are the things you think about first in designing a program and they are also the units of code that are eventually derived from the process. In between, each object is made into a generic class of object and even more generic classes are defined so that objects can share models and reuse the class definitions in their code. Each object is an instance of a particular class or subclass with the class's own methods or procedures and data variables. An object is what actually runs in the computer. An object can be a spell checker or a piece of a graphics program used to draw squares or circles. Do you remember the crazy story people used to try to tell about a word processor where you could pick all of your favorite pieces (favorite spell checker, grammar checker, text editor, font manager, etc.) and piece them together to form the ultimate customizable word processor? Well, those pieces are objects. In OLE, an object is a piece of a document, a graphic, or some multimedia. In general multimedia terms, an object is a stored data element, such as a video clip, an audio file, or a graphic representation of an object.
Output Format
The representation of a document.
Output method
@@
Page-content stream
A page's content stream contains operands and operators used to place "paint" on a page in selected areas. By executing the actions described in the page content stream, an application builds up the image of the page described by the stream.
Page oriented
Viewing a document as a sequence of pages of information, rather than as an unbroken sequence of information.
Pagination
The formatting of a document as a sequence of pages.
Protocols
A formal description of message formats and the rules two computers must follow to exchange those messages. Protocols can describe low-level details of machine-to-machine interfaces (e.g., the order in which bits and bytes are sent across a wire) or high-level exchanges between application programs (e.g., the way in which two programs transfer a file across the Internet).
Reflow
The reformatting of words on a page so that the words on each line fill that line as completely as possible.
ReverseChars
Font characteristics may suggest that right-to-left text be typeset left-to-right. The ReverseChars marked content indicates that the show strings within the marked content are individually reversed in reading order.
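A consumer that meets ReverseChars marked content could recover reading order roughly as sketched below. The function name `restore_reading_order` is hypothetical, and the sketch assumes that each show string has already been decoded to text and that the order of the show strings relative to one another is left unchanged. Reversing by code point is also a simplification: it would separate combining marks from their base characters.

```python
from typing import List


def restore_reading_order(show_strings: List[str]) -> List[str]:
    """Reverse each show string individually, keeping the strings in order."""
    return [s[::-1] for s in show_strings]


# Latin placeholders stand in for right-to-left text typeset left-to-right.
typeset = ["cba", "fed"]
reading_order = restore_reading_order(typeset)
```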
Running headers
Information that is repeated at the top of most or all of the pages in a document. Headers often contain information such as the page number, the name of the document, or the name of the chapter.
Showstring, Showstring operations
The PDF operation that shows text on a page. This includes both the showstring operator and the sequence of character codes that is its argument.
Show strings are the arguments to the PDF and PostScript text-showing operators that show text on a page. A show string is interpreted as a sequence of character codes identifying the glyphs to be painted.
Spacing characters
@@.
Soft hyphen
(a la Loretta) A character that is used to mark conditional hyphenation points. Unicode and ISO_Latin-1 code-point 0xAD.
A hyphen that is set only if the word falls at the end of a line that is too long and has to be broken. Hyphens inserted automatically by a hyphenation utility are called discretionary or soft hyphens. Word processors use two basic techniques to perform hyphenation. The first employs an internal dictionary of words that indicates where hyphens may be inserted. The second uses a set of logical formulas to make hyphenation decisions. The dictionary method is more accurate but is usually slower. The most sophisticated programs use a combination of both methods. Most word processors allow you to override their hyphenation rules and define for yourself where a word should be divided.
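One practical consequence of the distinction between soft and hard hyphens: a tool that extracts text can drop soft hyphens (code point 0xAD) to recover the unbroken word, while leaving hard hyphens in place. The sketch below assumes the extracted text is already a Unicode string; the name `strip_soft_hyphens` is introduced here for illustration.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode and ISO Latin-1 code point 0xAD


def strip_soft_hyphens(text: str) -> str:
    """Remove conditional hyphenation points; hard hyphens are untouched."""
    return text.replace(SOFT_HYPHEN, "")


extracted = "hyphen\u00adation is cost-effective"
cleaned = strip_soft_hyphens(extracted)
```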
Tagged PDF
A version of PDF that provides structure and ordering information to allow PDF documents to be read by screen readers and to be reflowed to fit different display screen sizes. To accomplish this, Tagged PDF marks, or tags, the various elements that make up a page.
Tagging
The association of attributes of text with a point or range of the primary text. The value of a particular tag is not generally considered to be a part of the "content" of the text. A typical example of tagging is to mark the language or the font for a portion of text.
Trailing space character
A white space character inserted into the text for a page after the last word on a line. A trailing space character is not needed to produce the correct page image, but is important for determining word breaks in the text of the page.
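The effect of a missing trailing space can be sketched as follows. The helper `words_on_page` is hypothetical and assumes the text of a page arrives as one string per line, joined with nothing in between, which is how a naive extractor might treat it.

```python
from typing import List


def words_on_page(lines: List[str]) -> List[str]:
    """Join the lines exactly as given, then split on white space."""
    return "".join(lines).split()


# With a trailing space after "quick", the word break survives the join.
with_trailing_space = words_on_page(["the quick ", "brown fox"])

# Without it, the last word of one line fuses with the first word of the next.
without_trailing_space = words_on_page(["the quick", "brown fox"])
```

The page image is the same in both cases; only the recovered word list differs.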
Type 0 font, Type 1 font
Type 0 font: a composite font, that is, a font composed of other fonts, organized hierarchically.
Type 1 font: a font represented using the Adobe Type 1 Font Format. A Type 1 font program is a stylized PostScript program that describes glyph shapes.
Typographic style
The design decisions and conventions about how to represent information in a paginated document.
Unicode
A character coding scheme that uses 16 bits for each character, designed to extend the capabilities of ASCII, which uses seven bits. Nearly all letters and symbols in all languages can be represented in a standard way with Unicode. The first 128 characters of Unicode are identical to those in standard ASCII. Unicode is an entirely new idea in setting up binary codes for text or script characters. Officially called the Unicode Worldwide Character Standard, it is a system for "the interchange, processing, and display of the written texts of the diverse languages of the modern world." It also supports many classical and historical texts in a number of languages. Currently, the Unicode standard contains 57709 distinct coded characters derived from 24 supported language scripts. These characters cover the principal written languages of the world. Originally, Unicode was designed to be universal, unique, and uniform; that is, the code was to cover all major modern written languages (universal), each character was to have exactly one encoding (unique), and each character was to be represented by a fixed width in bits (uniform). In parallel with the development of Unicode, an ISO/IEC standard was being worked on that put a large emphasis on being compatible with existing character codes such as ASCII or ISO Latin 1. To avoid having two competing 16-bit standards, in 1992 the two teams compromised to define a common character code standard, known both as Unicode and BMP. Since the merger the character codes are the same, but the two standards are not identical. The ISO/IEC standard covers only coding, while Unicode includes additional specifications that help implementation. Unicode is not a glyph encoding. The same character can be displayed as a variety of glyphs, depending not only on the font and style, but also on the adjacent characters. A sequence of characters can be displayed as a single glyph, or a character can be displayed as a sequence of glyphs; which of these applies is often font dependent.
Unicode value, Unicode character
Unicode value or code point: The Unicode Consortium defined a set of sixteen-bit code points, 57709 of which are currently assigned and named Unicode Characters. The lowest 65536 code points in ISO 10646-1 1993 are identical to the Unicode Standard and are sometimes called the Basic Multilingual Plane. See http://www.unicode.org
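A few of these properties can be checked directly in Python, whose `ord` and `chr` built-ins convert between characters and code points. This is a small sketch; the sample characters are chosen here only for illustration.

```python
# A Unicode value (code point) is an integer assigned to a character.
code_point_a = ord("A")

# The first 128 code points coincide with standard ASCII.
ascii_matches = all(chr(i).encode("ascii") == bytes([i]) for i in range(128))

# A character in the Basic Multilingual Plane fits in one sixteen-bit unit.
cjk = "\u4e2d"
in_bmp = ord(cjk) <= 0xFFFF
utf16_units = cjk.encode("utf-16-be")
```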
User agents
Software to access Web content, including desktop graphical browsers, text browsers, voice browsers, mobile phones, multimedia players, plug-ins, and some software assistive technologies used in conjunction with browsers such as screen readers, screen magnifiers, and voice recognition software.
User name (/TU key)
Any interactive form field may contain the optional /TU entry in its dictionary. This entry, known as the user name or short description, is used to identify this field when generating an error message or naming the field to a screen reader.
Word breaks
Applications divide the text of a page into words; word breaks are the points in the text stream that separate adjoining words. Different applications may use different rules for defining words; for example, one application may consider everything between white space characters to be a word. Another application may not include leading or trailing punctuation as part of a word.
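The two word-break rules mentioned above can be sketched as follows. Both function names and the regular expression are assumptions made for this example, not rules taken from any particular application.

```python
import re
from typing import List


def words_by_whitespace(text: str) -> List[str]:
    """Rule 1: everything between white space characters is a word."""
    return text.split()


def words_without_punctuation(text: str) -> List[str]:
    """Rule 2: leading and trailing punctuation is not part of a word.

    Hyphens and apostrophes between word characters are kept.
    """
    return re.findall(r"\w+(?:[-']\w+)*", text)


sample = 'He said, "cost-effective!"'
rule_1 = words_by_whitespace(sample)
rule_2 = words_without_punctuation(sample)
```

The first rule keeps the comma, quotation marks, and exclamation mark attached to their words; the second strips them but leaves the hard hyphen inside "cost-effective".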

 

 


PDF References


 

 

 

Last Updated: $Date: 2001/11/04 21:57:54 $ by: Katie Haritos-Shea