Techniques for WCAG 2.0

Skip to Content (Press Enter)

PDF Technology Notes

Introduction

The Portable Document Format (PDF) is a file format for representing documents in a manner independent of the application software, hardware, and operating system used to create them, as well as of the output device on which they are to be displayed or printed. PDF files specify the appearance of pages in a document in a reliable, device-independent manner. The PDF specification was introduced by Adobe Systems in 1993 as a publicly available standard. In January 2008, PDF 1.7 became an ISO standard (ISO 32000-1).

Of note for accessibility is PDF/UA (Universal Accessibility) which became an ISO Standard in July 2012 (ISO 14289-1:2012 (See PDF/UA (ISO 14289-1:2012).) The scope of PDF/UA is not meant to be a techniques (how-to) specification, but rather a set of guidelines for creating more accessible PDF. The specification describes the required and prohibited components and the conditions governing their inclusion in or exclusion from a PDF file in order for the file to be available to the widest possible audience, including those with disabilities. The mechanisms for including the components in the PDF stream are left to the discretion of the individual developer, PDF generator, or PDF viewing agent. PDF/UA also specifies the rules governing the behavior for a conforming reader.

PDF Accessibility Support

PDF includes several features in support of accessibility of documents to users with disabilities. The core of this support lies in the ability to determine the logical order of content in a PDF document, independently of the content's appearance or layout, through logical structure and Tagged PDF. Applications can extract the content of a document for presentation to users with disabilities by traversing the structure hierarchy and presenting the contents of each node. For this reason, producers of PDF files must ensure that all information in a document is reachable by means of the structure hierarchy.

Logical Structure

PDF's logical structure features (introduced in PDF 1.3) provide a mechanism for incorporating structural information about a document's content into a PDF file. Such information might include, for example, the organization of the document into chapters, headings, paragraphs and sections or the identification of special elements such as figures, tables, and footnotes. The logical structure features are extensible, allowing applications that produce PDF files to choose what structural information to include and how to represent it, while enabling PDF consumers to navigate a file without knowing the producer's structural conventions.

PDF logical structure shares basic features with standard document markup languages such as HTML, SGML, and XML. A document's logical structure is expressed as a hierarchy of structure elements, each represented by a dictionary object. Like their counterparts in other markup languages, PDF structure elements can have content and attributes. In PDF, rendered document content takes over the role occupied by text in HTML, SGML, and XML.

A PDF document's logical structure is stored separately from its visible content, with pointers from each to the other. This separation allows the ordering and nesting of logical elements to be entirely independent of the order and location of graphics objects on the document's pages.

The logical structure of a document is described by a hierarchy of objects called the structure hierarchy or structure tree. At the root of the hierarchy is a dictionary object called the structure tree root, located by means of the StructTreeRoot entry in the document catalog. See Section 14.7.2, ("Structure Hierarchy") in PDF 1.7 (ISO 32000-1): Table 322 shows the entries in the structure tree root dictionary. The K entry specifies the immediate children of the structure tree root, which are structure elements.

Tagged PDF

Tagged PDF (PDF 1.4) is a stylized use of PDF that builds on PDF's logical structure framework. It defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes. It is intended for use by tools that perform the following types of operations:

PDF File Production and Accessibility

PDF files may be produced either directly by application programs or indirectly by conversion from other file formats or imaging models. In addition, tools exist for inspecting, checking, and repairing PDF files for accessibility. The following sections provide representative lists of applications and tools typically used for these functions.

These notes do not, and cannot, provide an exhaustive list, nor do they endorse particular applications and tools. Rather they provide a snapshot of tools in fairly wide use at the time the WCAG Working Group undertook to review and publish techniques for producing PDF documents. As with any software, application support for PDF accessibility will vary with different versions, with the formatting requirements of specific PDF documents, and with actual usage of the application. That is, the tools can be used properly to produce appropriate tags, etc..

In general, newer tools will provide greater support than earlier ones. The tools' vendors are the source of authoritative information about their support for PDF accessibility.

Generating PDF Files

Many applications can generate PDF files directly, and some can import them as well. This direct approach is preferable, since it gives the application access to the full capabilities of PDF, including the imaging model and the interactive and document interchange features. Alternatively, applications that do not generate PDF directly can produce PDF output indirectly. There are two principal indirect methods:

Although these indirect strategies are often the easiest way to obtain PDF output from an existing application, the resulting PDF files may not make the best use of the high-level PDF imaging model relied upon to expose the semantics of the document. This is because the information embodied in the application's API calls or in the intermediate output file often describes the desired results at too low a level. Any higher-level information maintained by the original application has been lost and is not available to the printer driver or translator.

For example, since the printer driver or translator targets correct visual output, information about the semantics of the document may not be sent at all or may be ignored when creating the PDF output. As a result, headings may not be tagged as such, or link text may not be associated with its link object. Check with the vendor of any PDF authoring tool in order to understand how to use the tool in a way that produces the best tagged output.

PDF Authoring Tools that Provide Accessibility Support

Note: Care should be taken when choosing PDF creation tools from the many available, as some may not support creation of tagged PDF files.

Accessibility Checking and Repair

Adobe Acrobat Pro. Adobe Acrobat Pro is an application that creates and edits PDF files. It has a number of tools for evaluating and repairing the accessibility of PDF files, including access to the structure root through the tags panel, the ability to directly manipulate the reading order through the order panel, a built-in accessibility checker, and the Touch Up Reading Order tool which provides a graphical mechanism for assessing and repairing the accessibility of a PDF document.

Commonlook™ PDF. Commonlook PDF. Commonlook PDF is a plug-in for Adobe Acrobat Pro from Netcentric Technologies. CommonLook PDF helps identify, report and correct the most common accessibility problems, including the proper tagging of images, tables, forms and other non-textual objects.

PAC - the PDF Accessibility Checker. PAC is a free tool developed and distributed by the «Access for all» Foundation to evaluate the accessibility of PDF documents and PDF forms. PAC offers the added possibility of displaying a preview of the structured PDF document in a web browser. The PAC preview shows which tags are included in the PDF document and presents the accessible elements in the same way as they would be interpreted by assistive technologies (such as screen readers). PAC also provides an accessibility report which lists the detected accessibility errors. Clicking the links in the report displays the most probable source of the error within the document.

API Inspection Tools
  • aDesigner - a disability simulator from the Eclipse Foundation that helps designers ensure that content is accessible and usable by visually impaired users.

  • inspect32 - part of the Microsoft Windows Software Development Kit (SDK) that allows developers and testers to examine the accessible properties of UI components.

  • PDDOMView - part of Acrobat_Accessibility_9.1.zip which contains files that can be used by Windows clients of the accessibility interfaces described in the Accessibility API Reference document.

  • UISpy - part of the Microsoft Windows Software Development Kit (SDK) that allows developers and testers to view and interact with the user interface (UI) elements of an application.

User Agents

PDF User Agents with accessibility support include:

Adobe Reader and Acrobat Support for Accessibility APIs

Adobe provides methods to make the content of a PDF file available to assistive technology such as screen readers:

The DOM and MSAA models are related, and developers can use either or both. Acrobat issues notifications to accessibility clients about interesting events occurring in the PDF file window and responds to requests from such clients. Recent versions of Acrobat and Reader have enhanced the support for accessibility interfaces:

Assistive Technology Support

Related References