Page Source Identification

Final Community Group Report

This version:
https://www.w3.org/community/reports/publishingcg/CG-FINAL-page-source-id-20230314/
Latest published version:
https://www.w3.org/publishing/a11y/page-source-id/
Latest editor's draft:
https://w3c.github.io/publ-a11y/drafts/page-source-id/
Editor:
Matt Garrish (DAISY Consortium)
Feedback:
GitHub w3c/publ-a11y (pull requests, new issue, open issues)

Abstract

This proposal defines the pageBreakSource property to identify the source of page markers and the page list in EPUB publications.

Status of This Document

This specification was published by the Publishing Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Final Specification Agreement (FSA) other conditions apply. Learn more about W3C Community and Business Groups.

GitHub Issues are preferred for discussion of this specification.

1. Introduction

1.1 Background

Providing navigation to static page break markers is a key accessibility feature for digital publications that are used in both print and digital formats in the same environment (e.g., classrooms). But without a means of identifying what edition of a static work the page navigation corresponds to, it is impossible for users to determine if the publication will be sufficient for their needs. For example, if a class uses a softcover edition of a book and the EPUB publication pagination corresponds to the hardcover, digital users will not be able to access the same page break locations.

How to identify the source of pagination has been a continuing problem in the EPUB 3 metadata. The original idea was to use a dc:source element [epub-3] to specify the source of the pagination. This method proved unreliable both for machine verification that the EPUB creator had identified the source and for extracting and presenting the information to users. Publishers sometimes specify multiple sources for their publications in multiple dc:source elements, making it impossible for a machine to determine which one identifies the source of the pagination.

To address this problem, the specification then introduced a "refinement" [epub-3] property called source-of [epub-3] whose only purpose was to indicate which dc:source property identified the source.

Since its addition, however, two additional problems have surfaced with this approach:

  1. It was defined in a way that makes it unique to EPUB 3's metadata format. When the refines attribute was nearly dropped in EPUB 3.1, it exposed that there was still no other way to express this information. Consequently, future formats cannot rely on the source-of property.
  2. Relying on dc:source confuses how to state that a digital-only publication does not have a source for its markers. EPUB creators have resorted to identifying the current publication as the source of itself or to stating that the pagination has a source of nothing. Neither approach makes much logical sense, and both are at best hacks of the metadata. Omitting a dc:source, while accurate in this situation, makes validation difficult as it cannot be determined whether the EPUB creator simply forgot to specify the source.

The pageBreakSource property proposed in this document is intended to provide a simple and reliable solution to these problems moving forward.

1.2 Terminology

This specification uses terminology defined in EPUB 3 [epub-3].

Note

Only the first instance of a term in a section links to its definition.

1.3 Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key word MUST in this document is to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, it appears in all capitals, as shown here.

2. The pageBreakSource property

2.1 Definition

Name: pageBreakSource
Description:

Provides a unique identifier for the source of the page break markers in an EPUB publication.

The identifier should be expressed as a URN when the value conforms to a recognized scheme such as an ISBN.

If a unique identifier does not exist for the source, EPUB creators should use a text description that identifies the source as clearly as possible (e.g., the title of a word processing document).

If the page break markers are unique to the EPUB publication (e.g., for a digital-only edition), EPUB creators MUST specify the value "none".

Allowed value(s): xsd:string
Cardinality: Exactly one when the publication includes a page list and/or page break markers, otherwise 0.
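
The following non-normative sketch illustrates how a checking tool might apply the cardinality rule and distinguish the three kinds of value described above. It assumes the property is expressed in a package document meta element in the OPF namespace, which this document does not itself state. The function names, the has_pagination flag (supplied by the caller, since detecting a page list or page break markers requires inspecting content documents) and the sample ISBN are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the property is carried on a package document <meta> element
# in the OPF namespace, like other EPUB 3 package metadata properties.
OPF_NS = "http://www.idpf.org/2007/opf"


def page_break_sources(package_xml: str) -> list[str]:
    """Collect the text of every meta element whose property is pageBreakSource."""
    root = ET.fromstring(package_xml)
    return [
        (meta.text or "").strip()
        for meta in root.iter(f"{{{OPF_NS}}}meta")
        if meta.get("property") == "pageBreakSource"
    ]


def classify(value: str) -> str:
    """Label a value as 'none', 'urn' or 'description' (hypothetical labels)."""
    if value == "none":
        return "none"  # page break markers are unique to the EPUB publication
    if value.lower().startswith("urn:"):
        return "urn"  # identifier in a recognized scheme, e.g. an ISBN
    return "description"  # free-text description of the source


def check_cardinality(package_xml: str, has_pagination: bool) -> list[str]:
    """Return a list of problems; an empty list means the rule is satisfied."""
    count = len(page_break_sources(package_xml))
    problems = []
    if has_pagination and count != 1:
        problems.append(f"expected exactly one pageBreakSource, found {count}")
    if not has_pagination and count != 0:
        problems.append(f"expected no pageBreakSource, found {count}")
    return problems


# Hypothetical package document fragment with a made-up ISBN.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="pageBreakSource">urn:isbn:9780000000001</meta>
  </metadata>
</package>"""

values = page_break_sources(SAMPLE)
print(values, [classify(v) for v in values])
print(check_cardinality(SAMPLE, has_pagination=True))
```

The check only counts occurrences. It does not verify that a URN is well formed or that a text description is meaningful to users.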

2.2 Examples
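
As a non-normative illustration of the three cases covered by the definition, the sketch below generates one metadata entry for each: a source with a unique identifier, a source with only a text description, and a digital-only edition. The meta element serialization is an assumption based on how EPUB 3 package metadata properties are commonly expressed. The helper name, the ISBN and the description text are invented for illustration.

```python
from xml.sax.saxutils import escape


def page_break_source_meta(value: str) -> str:
    """Serialize a pageBreakSource value as a meta element (assumed form)."""
    return f'<meta property="pageBreakSource">{escape(value)}</meta>'


# Hypothetical values, one for each case in the definition.
examples = {
    "source identified by ISBN": "urn:isbn:9780000000001",
    "source without a unique identifier": "Final manuscript (word processing document)",
    "digital-only edition": "none",
}

for label, value in examples.items():
    print(f"{label}: {page_break_source_meta(value)}")
```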

A. References

A.1 Normative references

[epub-3]
EPUB 3. W3C. URL: https://www.w3.org/TR/epub/
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174