Charter PDF Open Data W3C Community Group

This Charter:
Previous Charter:
Start Date: 7 April 2017
Last Modified: 7 April 2017


The group will work to improve the ability of publishers of PDF documents to include data in those documents in a way that lets consumers easily extract it with readily available tools.

Scope of Work

The group will focus on these tasks:

  • Gather and document use cases and requirements for publishers of PDFs who wish to support open linked data access, and for data users who wish to access data in PDFs produced by those publishers.
  • Gather and produce test cases and example files.
  • Explore the capabilities of available tools, libraries, services, and software to accomplish these tasks.
  • Document findings and suggest best practices.

Specifically out of scope for now (to keep focus) are:

  • Changes to PDF or XMP beyond PDF 2.0 (ISO 32000-2:2017).
  • Techniques for extracting structured data from image-only or unstructured PDFs (“OCR”).
  • Focus on any specific implementation.
  • Comparison or evaluation of formats other than PDF.

These topics are interesting and useful, and can be included by changing the charter. However, we think we can make good progress on the narrower topic and improve the situation for the substantial community whose needs can be satisfied by the narrower scope.


The primary deliverable will be one or more Specifications describing ways of representing data within PDF files so that the data can be readily extracted or linked to, using the capabilities of current or earlier versions of PDF.

The group may produce other Community Group Reports within the scope of this charter that are not Specifications, for instance use cases, requirements, or white papers.

Test Suites and Other Software

Test suites and example implementations (produced within the scope of the project) may be submitted as Contributions.

Specifications created in the Group must use the W3C Software and Document License. All other documents produced by the group should use that License where possible; exceptions must be stated clearly.

Contributions may come in many forms (email, Google Docs, GitHub) and should come with a license in the manner appropriate for that form.


Chairs are chosen from those who have volunteered, if there is general agreement. If there’s disagreement, we’ll follow the process in under “Chair Selection”.

Group Process

The group will not publish Specifications on topics other than those covered under Scope above.

Substantive Contributions to Specifications can only be made by Participants who have agreed to the W3C Community Contributor License Agreement (CLA).

Participants are encouraged to make all technical work and discussion public.

The Chairs moderate discussion, schedule meetings, set agendas, and ensure that the decision process is fair, respects the consensus of the CG, and does not unreasonably favor or discriminate against any group participant or their employer.

This charter may be amended if the chair(s) agree and there is no objection from a group member within 7 days of notice of a complete proposal. If there are significant objections but the chair(s) AND at least five independent group members agree to the charter change, the objections are overruled.

Schedule, Meetings, Process

These are discussed in a separate document, which can be changed without a change of charter: