online & offline, original “canonicals” and individual copies
bounded package of media
a folder or archive, “internally complete”
in web-standard formats
HTML5, CSS, JS, images, video, audio...
accessible by standard Web protocols
and consumable by standard Web tools.
Web based user agents & apps—including browsers—based on them
This document describes the use cases that correspond to the requirements for a Portable Web Publication (PWP). It provides the basis for the technical considerations in the companion document, “Portable Web Publications for the Open Web Platform” [pwp].
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This is work in progress. The final version of this document is planned to be published as an Interest Group Note in a few months. The current version is the first Public Working Draft.
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The Web emerged in 1994, based on a model of individual pages loosely joined by hyperlinks. Clustering within domains and with explicit navigation elements built into them, webpages evolved into websites. This model inherited very little from an existing, powerful and much older page-based medium: books.
Over centuries, “books” have assumed many forms: journals, magazines, pamphlets of long-form articles and essays, newspapers, atlases, comics, notebooks, albums of all sorts. We can define these different manifestations as “publications”: bound editions of meaningful media, made public.
We believe there is great value in combining this older tradition of portable, bounded publications with the pervasive accessibility, addressability, and interconnectedness of the Open Web Platform (OWP). New models of economic sustainability, innovative experiences of knowledge and invigorated socio-cultural engagement depend on this.
It is the task of this W3C Digital Publishing Interest Group to explore the uniqueness, desirability, and feasibility of bringing these two great models of publishing together. This document explores requirements based on examples of real-world use cases and scenarios. The fundamental, baseline requirements that form the heart of what is expected from a PWP are described first, followed by requirements and use cases that describe additional, strongly desired scenarios. The complete list of requirements is also collected in a separate table in A. List of Requirements.
The terms “online” and “offline” are used in this document, but the borderline between these is not always clear cut. For example, a PWP can be on a local disc but accessed through a web server running on that machine (i.e., through an http://localhost URL). The behavior of the client in this case is identical to the situation when the publication is genuinely online, although, technically, the publication is clearly offline. Similarly, a remote file system can be mounted as a local disc, in which case a PWP can be accessed as a file though technically online. A more precise terminology would use terms like “protocol” and “file” states for what is colloquially called “online” and “offline” in this document.
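The distinction between “protocol” and “file” states can be illustrated with a minimal sketch. It assumes that the state can be read off the URL scheme alone; the function name and the state labels are hypothetical and are not defined by this document.

```python
from urllib.parse import urlparse


def access_state(url: str) -> str:
    """Classify a locator as 'protocol' or 'file' state (hypothetical labels)."""
    scheme = urlparse(url).scheme.lower()
    if scheme in ("http", "https"):
        # Served through a web server, even if that server runs on localhost.
        return "protocol"
    if scheme == "file":
        # Read from a file system, even if that file system is mounted remotely.
        return "file"
    return "unknown"
```

Under this sketch, `access_state("http://localhost/book/")` is classified as a protocol state although the publication sits on the local disc, which is exactly the borderline case described above.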
2. PWP Fundamentals - Authors and Readers
2.1 Browser Readability
Req. 1: The publication should be readable in a browser.
Reading publications of any length should not be restricted to specific devices or applications; publications should be equally available in a browser.
Bob has some very specific plugins in his browser that he enjoys using for a variety of purposes, including creating annotations using his own annotation server, or quickly copying and publishing extracts in a specialized social service. He wants to be able to read his textbook in this environment, rather than being forced into a special application for the purpose of that specific publication.
Alice reads a scholarly paper that refers to large datasets. The data can be presented interactively through some very specialized data visualization services relying on custom hardware on her computer. These visualization services are incorporated into her browser and can be used from within a web page. Alice wants to make use of these while reading her paper to gain a better understanding of the content.
2.2 Open Web Platform
Req. 2: PWPs should be able to make use of all facilities offered by the OWP.
There is a formidable development of visualization systems, interactive tools, and other powerful facilities that are built on top of the OWP, including access to external services like Wolfram Alpha. These tools have traditionally been developed for browsers, and provide possibilities that traditional publications, such as books, magazines, scholarly papers, and educational materials, should also benefit from. That requires publications to become first-class citizens of the web platform.
A large, multidisciplinary, web-based journal relies on traditional web technologies like HTML and CSS for its content. The journal, responding to the evolving expectations of its audience, is increasingly using additional media such as video, audio, animated graphics, and very large images; the trend is to consider these as integral parts of the scientific output. As a result, the journal needs access to the latest visualization and data management tools that the OWP can offer.
Educational materials are increasingly making use of OWP facilities. They include runnable program snippets and interactive testing tools (linked to online evaluation facilities); in many respects, the borderline between these publications and web applications is becoming fuzzy.
BigBoxCo, a large technology company, maintains extensive "in-house" technical and user documentation for its various products and administrative processes, and develops all this material in digital-only formats. The quantity of documentation makes it impractical to produce these documents in print. Instead, the company publishes them on the company intranet, and/or provides them to its employees and contractors via specialized mobile devices. The production of these documents poses challenges similar to those of scholarly publications, including issues around accessibility, portability of annotations, and the possible inclusion of complex media.
Req. 3: It should be possible to see the publication in a “paginated” view.
Whereas a “scrolling” view is the dominant approach in Web browsers, publications must provide the possibility to switch to a paginated view if the user so desires or as the author suggests. Pagination may automatically adapt page sizes to the device’s or the browser’s viewport, and may contain separate headers, footers, and/or page numbers.
For more detailed requirements on pagination, see here.
Ann reads War and Peace which, when printed, is over 1200 pages. In order to have a better sense of her progress in the book and to make navigation within the book easier (i.e., to support usability), she decides to switch her reading environment to paged view.
Susan uses CSS, images, flexboxes, and more to create a rich, interactive publication on the history of a city. Each major historical milestone is defined as a standalone unit that would be a single page when printed, with a timeline with the main events in the footer area of the page.
Fatima wants to choose between a scrolled view and a paginated view of content that extends across multiple HTML documents.
IndyPublisher wants to provide transition effects between pages, both within and across content documents.
Mr. Oayia, a classroom teacher, says, “turn to page 137 of your textbook.” Students with both print and digital editions need to find the same text.
2.4 Online and Offline
Req. 4: The same PWP should be available both online and offline.
The same content of the PWP should be accessible offline, if circumstances so dictate, without requiring the reader to take any particular technical action.
Omo, a student in a remote Nigerian village, is taking classes online. Connectivity to the village is unreliable and intermittent. Omo needs to have his textbooks available regardless of actual connectivity.
Public School 148 is part of a school system in a low-income area. While it has a computer lab for teachers and students to access material online, the school does not have the resources to update the equipment, either in terms of hardware or software. The publications being accessed must be available on older browsers.
Heather, a frequent international traveller, enjoys reading books and tour guides on her portable device, regardless of her physical location on any given day. Because of the high roaming charges on her mobile network, she tends to download as much of her reading material as possible wherever she can avoid those additional charges.
Gemma, a private collector of digital publications, is building a private collection of publications that she expects to be available to her whether online or offline, over the public Internet, or within a private local area network (LAN).
“In house” documents may have to be accessed both online and offline, depending on the access point. While online access might be beneficial when done from the work floor (e.g., at an airplane production line), the same documents may need reliable offline access (e.g., in the cockpit).
2.5 State Transitions
Req. 5: There should be a smooth transition between offline and online states of the same publication.
Accessing a document online or offline should not require conversions from one storage format to another. The transition should be as transparent as possible to the reader, requiring only very minimal (ideally no) interaction on her part.
Gyongyi, selected as a peer reviewer for the Journal of Scholarly Publications, only has time to review her assigned publication while commuting on the train to her university. Her review process includes the creation of annotations, notes, highlights, and possibly changes on the content itself. These annotations and changes must be smoothly transferred back to the server of the journal when she is back online.
Klaas is reading an in-depth biography of Catherine the Great. As this is a lengthy document, it takes Klaas several weeks to finish reading the material. During this time, Klaas frequently alternates between reading on his portable device, which may or may not be online, and reading on his desktop. Ensuring a smooth transition also means that bookmarks, current reading position, personal notes, and more, are stored without the need for a central server, which may lead to privacy concerns.
2.6 Distribution of a Single Resource Unit
Req. 6: It should be possible to create and distribute a PWP as a uniquely identified single resource unit.
A PWP, no matter how many pieces it comprises, must be distributable to readers or consumers as a single unit so that users can consume the necessary content that is identified by the PWP.
HA, Ltd, a publisher of legal briefs, needs to distribute content in a consumable format to its clients.
Dalia, a patent lawyer, wants to consume content on a multitude of devices, some of which may not always have connectivity. Having all required content grouped in a logical structure that can be easily transferred will meet her expectations.
Andreas is working on his first collaborative research paper with a fellow student. He wants to share a relevant publication that includes content, diagrams, and data sets with his writing partner. He does not have time to learn how to share each component so that his partner can access it all without much effort; he expects to be able to share this material as a single unit.
Req. 7: A publication may consist of a collection of resources.
Acme Publishing Company works with multiple authors to create an anthology and uses resources from different locations on the web. Following the current practice on the web, the publication consists of many different resources, including HTML, SVG, and CSS. Some of the HTML contents are formatted by different CSS files. The publisher needs the collection of all the resources as a unit to include into its business workflow.
Acme Publishing has published a book on wines that can be read from A-Z, or personalized to only read about red wines or wines from a specific region. Each wine may be a resource or a small chunk of data that is used to generate the corresponding HTML content on the fly.
Elin has access to materials only through an old computer in her local library. While she has time to read the entire copy of War and Peace, the system is unable to display the entire resource as one huge HTML file.
Motonori is reading a Japanese history book. This book includes components where the writing mode of the root element is vertical-rl and others where it is horizontal-tb. These root element styles must be preserved.
2.8 Multimedia PWPs
Req. 8: The notion of a PWP should enable specific publications like audio books, graphics books, and mixed media.
All concepts and structures related to a PWP should enable the creation and/or production of video or audio rendering; all the audio, video, and graphics content must be treated with the same attention as all other content.
Faye, a busy mother of five, wants to access audio books while commuting, jogging, doing dishes, or when otherwise unable to use her eyes or hands.
Khoudia, a librarian focusing on the children's section of her local library, is looking exclusively for material rich in audio and video components so as to reach a wider age bracket.
BigBox Publisher must fulfill its legal requirements in creating an accessible Math textbook. The inclusion of math equation examples in both Braille and MathML is desired since not all reading systems fully support MathML. Nemeth Braille is primarily used, so separate preformatted Nemeth Braille examples have been added to the publication in addition to the MathML.
James, a struggling blind artist, requires that the musical score within a publication be rendered in braille music. James uses freely available assistive technology that does not have braille music translation built in, so he needs the score to be already converted into braille music notation.
Lisa is a visually impaired child in a class of sighted peers, who are reading a book in which poems are presented in the form of visual shapes, such as a Christmas tree, to provide context to the children. She needs to access a replica of the book in preformatted braille with equivalent precision, to acquire the knowledge in the same way as her sighted classmates.
Req. 9: The reader must have the possibility to personalize his or her reading experience. This may include, for example, controlling such features as font size, choice of fonts, background and foreground color, tone of voice, etc. This should be done via a proper interactive dialogue and/or a choice among pre-defined possibilities.
Olga, a dyslexic student, downloads a textbook and proceeds to personalize the material with larger font and different contrast.
When reading a book in the sun, Fanta adjusts the background color to a dark color so that she can see the text.
While reading a book on computer programming, Ransheed wants to change the font into a local font. However, the code samples within the text should remain in a fixed-width font.
Mia chooses to read in night mode because she finds it easier to sleep when she puts the PWP aside.
Req. 11: There should be a way to control versioning and revisioning.
There should be the capability for providing revisions or new versions to users. It should be possible to keep the online and offline versions in sync.
The publisher may have pushed a new version of a publication while Alice was accessing the offline version. When getting back online, Alice has the option of automatically synchronizing her offline version with the update.
Req. 12: There should be a way to differentiate between essential and non-essential resources.
Preserving essential content is the job of the reading system. However, a clear indication within the publication format of which items are critical, and which instead need a fallback for limited connectivity or storage situations, would provide critical input to the reading system to do its job. This would also give more control to the publisher, allowing them to ensure a consistent user experience while the publication is consumed. When changing the state of a PWP from, e.g., online to offline, an implementation then knows which PWP resources are essential for the display of the content, and therefore must be included in the offline version, and which may be skipped (see 2.4 Online and Offline and 2.5 State Transitions).
Non-essential content, which is not required to be available in certain states, should have a predefined fallback that will allow the user to continue consumption (even in a potentially degraded, but author-controlled manner) (see also 4. States of a PWP).
Nick is reading a long-form narrative: a publication filled with text, images, sounds, and multimedia files. Nick is also a multi-device user who wishes to consume the publication on multiple devices. Some of those devices have limited storage, and some of them have limited connectivity. Nick also rides the subway, where he loses Internet connectivity frequently and without warning for long stretches of time. During offline or low-storage situations, there are still critical parts of the publication that are consumable, mainly the text (and possibly images). Having a reasonable fallback for video, such as a poster image or placeholder image, would allow Nick to read the content while offline or on a device with limited storage.
Nick knows he is going to be in a no-connectivity situation and may want to locally store the entire (even non-essential) contents of his textbooks. He will save space by only storing essential content for his graphic novels. It is up to the reading system to support different behaviors for essential versus non-essential content, ensuring that an entire package is only downloaded when necessary.
Gösta is reading a treatise on the theory of functions. A mathematical font is essential for the proper display of the mathematical formulas in this publication. It is essential to add that font to the set of resources that are made available offline.
A font used for aesthetic purposes only in display of a book may not be essential and an implementation may decide not to embed it for offline use.
Mary is unable to view images; she requires extended image descriptions to make sense of the material. These descriptions are essential for her, whereas they might not be for Nick, who has indicated that the images are, to him, non-essential.
While reading an article on a new spam analysis algorithm, Lars is primarily interested in the findings of the research. Since the research was funded by a government agency, the dataset, consisting of millions of anonymized log files, is also available. Lars does not consider the dataset to be essential for conveying the results of the paper and therefore indicates it should be skipped when he reads the publication offline.
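The distinction between essential and non-essential resources can be sketched as follows. This is only an illustration: the `Resource` fields, the `plan_offline` function, and the file names are hypothetical, and this document does not define any manifest format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    url: str
    essential: bool
    fallback: Optional[str] = None  # lighter stand-in, e.g. a poster image


def plan_offline(resources, include_non_essential=False):
    """Return (urls to store offline, {skipped url: fallback url or None})."""
    keep = {}            # dict used as an insertion-ordered set
    substitutions = {}
    for r in resources:
        if r.essential or include_non_essential:
            keep[r.url] = True
        else:
            # Skip the resource itself, but store its fallback if it has one.
            substitutions[r.url] = r.fallback
            if r.fallback is not None:
                keep[r.fallback] = True
    return list(keep), substitutions


BOOK = [
    Resource("chapter1.html", essential=True),
    Resource("math-font.woff2", essential=True),
    Resource("trailer.mp4", essential=False, fallback="trailer-poster.jpg"),
    Resource("dataset.zip", essential=False),
]
```

Calling `plan_offline(BOOK)` keeps the text, the mathematical font, and the poster image, while `plan_offline(BOOK, include_non_essential=True)` corresponds to a reader who wants to store the entire publication before losing connectivity.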
2.13 Access Control and Write Protections
Req. 13: A PWP should allow for access control and write protections of the resource.
A library may loan the publication for two weeks; the PWP should provide means for its environment to control the period of the loan.
A university may make a textbook available for its students for the course of the year.
Alice is working on potentially Nobel Prize-winning research, and has drafted her paper describing her discoveries. She asks Bob to review the paper, but needs to make sure that the PWP retains specific protections, regardless of whether it is read online or offline.
3. PWP Fundamentals - Implementation Requirements
3.1 Horizontal Dependencies
Req. 14: The publication should conform to all the requirements of horizontal dependencies.
Web content has to be consumed under different circumstances: it must be available to the largest possible audience in a secure manner, providing the necessary protection of the reader’s privacy. Publication content must be able to answer to a number of principles like accessibility, internationalization, device independence, security, or privacy. (These are usually referred to, in the W3C context, as “horizontal” dependencies.) These principles are, in general terms:
Accessibility: People with disabilities should be able to access the content; they should be able to perceive, understand, navigate, and interact with it, as well as contribute to it. Accessibility encompasses all disabilities that affect access to the content, including visual, auditory, physical, speech, cognitive, and neurological disabilities.
Internationalization: Publications should be well adapted to any language, writing system, region, or culture. This includes the usage, when appropriate, of left-to-right, right-to-left, horizontal or vertical writing; item numbering, or interactive forms specific to local cultures; usage of the right character sets and of local typographic conventions.
Device independence: Content should be usable on a large number of devices with very different device characteristics: different screen types and sizes, various input modalities, varying levels of processing power, etc. Adaptation to these different affordances should be automatic, with no, or very little, user intervention.
Security: Publications should prevent malicious attacks, data theft, and other security incidents, even if they contain secure mashups, access to distributed services, etc. Consuming the content should not jeopardize the integrity of the underlying data, users’ privacy, or machine operations.
Privacy: Content should maintain and support user privacy, in spite of the fact that the evolution of online technologies has increased the possibility for the collection and processing of personal, and possibly sensitive, data, as well as the ability to ubiquitously track a user's online activity.
These principles correspond to technical requirements on the underlying technologies (i.e., OWP, and its possible extension for PWP) insofar as the technologies must empower the authors (writers, editors, publishers, etc.) to produce content that follows them. Whether authors use the possibilities of these technologies or not is not addressed in this document.
All these constraints are usually formalized in the context of the usage on the Web, but they are also valid for publications in general regardless of whether they are online or offline, or whether the publication is distributed as a single unit or not. In some cases, for example due to legislative reasons, the demands on digital publications may be more stringent than for generic web sites. The use cases below provide some examples for the publication-specific situations.
(On Accessibility) Asmait uses a dedicated braille e-reader that supports standard OWP technologies and can render any WCAG-compliant content. She needs to be able to read her newspaper and contribute to the interactive discussion components on her dedicated braille reader, not just on a standard browser and computer combination.
(On Accessibility) Legal Publishing Ltd. publishes all the official texts, as issued by the government of its country. Per local legislation, the publication must be accessible, following W3C’s WCAG Level AA requirements, to serve as official references in courts.
(On Privacy, Accessibility) EducationPublishing Ltd. publishes digital textbooks to cover BigUniversity’s curricula. These (digital) educational publications also include access to interactive tests via specialized services on the web that regularly assess the student’s progress. The privacy and the integrity of the student’s test data must be preserved (whether they are embedded in the publication or online). In addition, digital textbooks must abide by WCAG Level AA requirements in terms of accessibility. EducationPublishing is liable if these requirements are not fulfilled.
(On Internationalization) PublicationInternational SA. publishes literary works all over the world and in many languages. In order to continue its business in Japan, it must be able to produce digital publications with right-to-left and vertical writing that follow Japanese typesetting traditions, because that is the only way those publications are accepted by local customers.
(On Device Independence) Vera has several devices she uses to read her books, depending on where she is. She uses her tablet with a larger screen when sitting in an armchair at home, her phone when commuting to work by train, and her laptop when adding more complex annotations to the book. The book is authored in a way that, while maintaining the integrity of data like bookmarks and annotations, it is automatically adapted to a different screen size, or to the fact that her phone does not have a keyboard.
(On Security) BigAirline uses digital versions of the airline manuals. This means that the manuals are put “online” to keep them up-to-date, and the pilots have access to the latest version even when they are flying, i.e., when they are offline. The way the manuals are authored and managed ensures that the integrity of the manuals and the relevant data is maintained at all times, to ensure that the pilots operate based on correct information only.
3.2 Single Unit
Req. 15: User agents must treat a PWP, regardless of the number of components, as a single unit as opposed to individual documents.
A user agent must be able to search a complete PWP as one unit.
A user agent must be able to maintain counters (page counters, section numbering, footnotes, endnotes) across the PWP (as opposed to individual components being numbered separately).
User preferences apply to the complete PWP (users must be able to adjust display, e.g., font selection, font-size adjustment, background color).
Assistive Technology such as screen readers or voice dictation control needs to access a complete PWP.
A user should be able to access all linear content through simple UI gestures, independent of the document structure.
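Two of the points above, searching the complete PWP and numbering across components, can be sketched in a few lines. The sketch assumes that the text of each component is already available as a string in reading order; the function names and the sample data are hypothetical.

```python
def search_publication(components, term):
    """Search every component and return (component name, character offset) hits.

    `components` is a list of (name, text) pairs in reading order.
    The search is case-sensitive to keep offsets exact.
    """
    hits = []
    for name, text in components:
        start = text.find(term)
        while start != -1:
            hits.append((name, start))
            start = text.find(term, start + 1)
    return hits


def first_footnote_numbers(footnote_counts):
    """Number footnotes continuously across components.

    `footnote_counts` is a list of (name, number of footnotes) pairs in
    reading order; the result maps each name to its first footnote number.
    """
    numbering = {}
    next_number = 1
    for name, count in footnote_counts:
        numbering[name] = next_number
        next_number += count
    return numbering
```

With `[("ch1", 3), ("ch2", 2)]` as input, the second chapter starts at footnote 4 rather than restarting at 1, which is the behavior the requirement asks for.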
3.3 Constituent Resources
Req. 16: The information regarding the constituent resources of a PWP must be easily discovered.
A PWP will likely be composed of multiple web documents. A more complicated PWP may have many more components, meaning that extracting in advance all the references to other constituent resources may be prohibitive. It is therefore necessary for the reading system to have easy access to the list of constituent resources, and some of their characteristics like their media types or sizes.
The system needs to identify which components must be downloaded to a local user’s device to support offline reading.
The system needs to preload some document components in order to provide a more responsive reading experience.
When creating a packaged publication, it must be clear which documents should or should not be added to the distribution package.
The size of the components must be known in advance.
The reading system needs to know if a publication contains/requires support for a specified media type (without processing the complete PWP).
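A minimal sketch of how a list of constituent resources could answer these questions without fetching any content is shown below. The structure of `MANIFEST`, its field names, and the sample entries are assumptions made for illustration; this document does not prescribe a format.

```python
# Hypothetical list of constituent resources with media type and size in bytes.
MANIFEST = [
    {"url": "index.html", "type": "text/html", "size": 12_000},
    {"url": "style.css", "type": "text/css", "size": 3_000},
    {"url": "figure.svg", "type": "image/svg+xml", "size": 45_000},
    {"url": "interview.mp4", "type": "video/mp4", "size": 8_000_000},
]


def total_size(manifest):
    """Size of the complete publication, known before any download starts."""
    return sum(entry["size"] for entry in manifest)


def unsupported_types(manifest, supported):
    """Media types the publication uses that the reading system cannot handle."""
    used = {entry["type"] for entry in manifest}
    return sorted(used - set(supported))
```

A reading system that only supports HTML, CSS, and SVG could call `unsupported_types(MANIFEST, {"text/html", "text/css", "image/svg+xml"})` and learn that video support is required before opening a single document.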
3.4 Default Reading Order
Req. 17: It should be easy to find the (default) reading order of the resources of a PWP.
A user agent needs to know the sequence in which to present components of a PWP to the user. A PWP will likely be composed of multiple web documents. A typical simple PWP will contain anywhere from one to fifteen HTML documents and several image files, in one location or many. A more complicated PWP may have many more components, meaning that extracting the exact order from within the resources (i.e., parsing them in advance to extract the information) may be prohibitive. It is therefore necessary for the reading system to have easy access to the reading order of the constituent resources. In particular, the user agent should also know what the starting point of the publication rendering is.
Moby Dick contains 136 chapters. Each chapter is a separate HTML document. The user agent must present the chapters sequentially, starting with the first chapter.
The Encyclopedia of Stuff includes 1348 articles, each one in a unique HTML document. The user agent must present the articles in order (alphabetically, by title).
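The two use cases above can be sketched with an explicit reading order stored alongside the resources. The dictionary keys (`start`, `reading_order`), the file naming scheme, and the function names are hypothetical choices for this illustration only.

```python
def chaptered_book(chapter_count):
    """Build a hypothetical manifest with one HTML document per chapter."""
    order = [f"chapter-{n:03d}.html" for n in range(1, chapter_count + 1)]
    return {"start": order[0], "reading_order": order}


def next_resource(manifest, current):
    """Return the resource that follows `current`, or None at the end."""
    order = manifest["reading_order"]
    position = order.index(current)
    return order[position + 1] if position + 1 < len(order) else None


def alphabetical_order(articles):
    """Order encyclopedia articles by title, ignoring case.

    `articles` is a list of {"title": ..., "url": ...} dictionaries.
    """
    ranked = sorted(articles, key=lambda article: article["title"].casefold())
    return [article["url"] for article in ranked]
```

Because the order is explicit, the user agent can find the starting point and the next document without parsing any of the chapters themselves.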
3.5 Uniquely Identifying a PWP
Req. 18: There should be a way to uniquely identify a publication regardless of its state.
A unique identification of a specific publication, regardless of whether it is online or offline, or whether it is part of a web site or a single (packaged) file is essential. This unique identification should be mapped onto the “real” location of the publication smoothly, without requiring the author’s interaction.
Digital publications make use of interactive cross references within the publication (references to indices, images, etc). These references should be done once by the author and should work regardless of whether the publication is online or offline, for example.
Scholarly references demand a unique identification of the publication and, possibly, its internal structure. That unique identification must be available as a web link, to make it possible for other publications and other sites (e.g., the authors’ institutional sites) to unambiguously link to the publication. These features are essential in the scholarly community to make, for example, the assessment of individual researchers possible.
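One way to picture the mapping from a state-independent identifier to a “real” location is a lookup table keyed by state, as in the sketch below. All identifiers, URLs, state names, and the `resolve` function are hypothetical; how such a mapping would actually be maintained is not specified in this document.

```python
from urllib.parse import urljoin

# Hypothetical canonical identifier mapped to one base location per state.
LOCATIONS = {
    "https://example.org/pwp/moby-dick": {
        "online": "https://example.org/books/moby-dick/",
        "offline": "file:///home/reader/books/moby-dick/",
    },
}


def resolve(identifier, state, reference=""):
    """Turn a canonical identifier plus a relative reference into a location."""
    base = LOCATIONS[identifier][state]
    return urljoin(base, reference)
```

The author writes a cross reference such as `chapter-001.html` once; the same reference resolves to a different concrete location depending on the state, while the canonical identifier that other publications link to stays the same.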
4. States of a PWP
During the consumption of a publication, a user may change the “state” of their PWP. The state of a PWP reflects whether the document is online or offline, or whether it is packed (i.e., all constituents are packaged, for example, in a ZIP file) or not. These different states require different behavior from the user agent, while some of the characteristics of the publication may be invariant across states. The table below shows the same publication (PWP) in the most typical states:
A user starts reading on an Internet-enabled PC, moves to an Internet-enabled portable device, which may enter offline mode at some point, and then finishes up on another PC.
See also Locating the Same Publication Across States, and State-Independent and State-Dependent Locators
4.1 Dynamic Content
Req. 19: The PWP needs to have an explicit “offline mode” alternative.
It is possible that certain items will be dynamic, requiring the use of an external resource (or server) to provide the data. In offline mode, the user may want to be alerted that content could not be obtained, or be shown some fallback set of data. In this case, being able to specify an explicit “no-connectivity” or "offline-mode" alternative would allow the publication author to have more control over the user's experience and replace a potential error-display with a limited subset of a good experience.
Nick is reading a publication about the stock exchange. The current state of the stock exchange is fetched when Nick opens the publication. However, when Nick is on the train one week later and opens the stock exchange book, he sees the state of the stock exchange as it was the last time he opened his publication. A small alert is shown denoting that this data is a week old.
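The behavior in Nick's use case can be sketched as a fetch with an explicit offline alternative. The function name, the shape of the cache, and the wording of the notices are assumptions for illustration; `fetch` stands for any callable that raises `OSError` (for example `ConnectionError`) when there is no connectivity.

```python
from datetime import datetime


def load_dynamic(fetch, cache, now):
    """Return (value, notice) for a dynamic item.

    `cache` is None or a dict with "value" and "fetched_at" (a datetime).
    `notice` is None when fresh data could be obtained.
    """
    try:
        return fetch(), None
    except OSError:
        if cache is None:
            # No stored data: alert the user instead of showing an error page.
            return None, "This content is not available offline."
        age_in_days = (now - cache["fetched_at"]).days
        return cache["value"], f"Showing data from {age_in_days} day(s) ago."
```

When the fetch fails one week after the last successful one, the reader still sees the cached value together with a short notice about its age, rather than an error display.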
5. Distribution, Sharing and External Resources
5.1 Distribution and Versioning
Req. 20: The distribution of a PWP should not affect its versioning.
Simply distributing or sharing a PWP to multiple destinations and devices should not result in multiple versions of the PWP. Those items should not be different versions of the source PWP unless they contain modifications that make them different PWPs.
Publisher Corp. Inc. publishes a new PWP and sends it to its distributors and customers. This PWP is downloaded to devices, or synced across several devices, or made available to a customer-specific cloud. Customers can access this file from different retailers, through different applications, either directly or downloaded from a private cloud. Thus, the PWP is duplicated many times, resulting in a huge number of items. There is one source manifestation, therefore one canonical identifier for a large number of items spread across devices and buyers.
Mary creates a PWP and sends it to Dave and Kristin, who each send it to two friends, who then send it on to their friends. Ultimately there are many copies of Mary's PWP, but they are not different versions unless they have been altered; one can say that they are all “the same” PWP.
5.2 Distribution Process
Req. 21: The distribution of PWPs should conform to the standard processes and expectations of commercial publishing channels.
Ahmed acquires a PWP on an e-commerce platform. He expects to be able to receive the PWP as a file (rather than only having access to it) and to be able to load it onto his different reading devices.
Bridget buys (rather than subscribes to) a PWP. She expects to have permanent access to that PWP and to be able to make reasonable use of it.
Alice acquires a PWP through a subscription service and downloads it. When, later on, she decides to unsubscribe from the service, this PWP becomes unavailable to her.
Bill acquires a PWP through a rental service. After a certain period, this PWP becomes unavailable to him.
Slicendice Publishing publishes many PWPs, some of which are different versions or subsets or combinations of others. Slicendice needs not only to be able to uniquely identify each unique PWP but also to identify each "copy" or "delivery" ("item") of each of those PWPs so that it can track "what" has been sold and "how many" of each one have been sold. (See also "There should be a way to uniquely identify a publication regardless of its state")
Cobble-It Publishing creates PWPs from a mix of internal content and content licensed from other parties. Using PWP metadata, Cobble-It is able to keep track of the rights associated with the different components so that it can realize income from its own IP and appropriately compensate the parties whose IP it has licensed.
Req. 22: PWPs should support cross-references that can be resolved locally or externally.
A user should have an option to access their local copy of a PWP when there is a choice between a local copy and an external source.
Writer Annie writes a dissertation. She references her Master's thesis, which is published on the university website. Her colleague Bob has read her Master's thesis before and has annotated his copy. When he clicks the reference in Annie's dissertation, he gets redirected to his local copy of Annie's Master's thesis. Her friend, Charlie, hasn't read her Master's thesis before. Charlie needs to be online when clicking the reference in order to access Annie's Master's thesis on the university website.
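A minimal sketch of the resolution step in this use case, assuming the user agent keeps a lookup table from canonical URLs to local copies. The function name, the URL, and the file path are hypothetical. The sketch always prefers the local copy, whereas the requirement only asks that the user have that option.

```python
from typing import Dict, Optional


def resolve_reference(canonical_url: str,
                      local_copies: Dict[str, str],
                      online: bool) -> Optional[str]:
    """Return where a cross-reference should be opened, or None if unreachable."""
    local_path = local_copies.get(canonical_url)
    if local_path is not None:
        return local_path      # the reader's own (possibly annotated) copy
    if online:
        return canonical_url   # fall back to the external source
    return None                # offline and no local copy available


THESIS = "https://university.example/theses/annie-masters"     # hypothetical URL
bob_library = {THESIS: "/home/bob/library/annie-masters.pwp"}  # hypothetical path
charlie_library: Dict[str, str] = {}

print(resolve_reference(THESIS, bob_library, online=False))
print(resolve_reference(THESIS, charlie_library, online=True))
```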
5.4 Sharing External Resources
Req. 23: Several PWPs may share external resources.
In order to make serial publications lighter and speed up processing, a PWP should support the injection of external resources.
EsteemedPublisher creates a software application to distribute daily journals to readers. These journals share script libraries, CSS files, images and other resources, which are downloaded periodically by the reading application and cached.
5.5 External Data
Req. 24: PWPs should be able to access external data.
This is related to Req. 12: There should be a way to differentiate between essential and non-essential resources. That requirement states that essential resources must be included in the offline version; non-essential resources may be either included, too, or accessed online when possible. It is up to the packaging software to decide which resources will be included, and which will not. This requirement adds the possibility to specify that some data must stay external to the packaged publication.
Rosa has submitted an article to EsteemedJournal and provided her research data in CSV format. She and EsteemedJournal wish to provide users access to the CSVs when accessing her article, but they decide to exclude the research data from the packaged publication. Albert reads Rosa’s article from a packaged version. When online, he can access the research data, but when offline, he cannot access it and the reading system fails gracefully.
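The interaction between Req. 12 and this requirement could be sketched as a packaging filter. The three policy labels and the function name below are assumptions made for illustration; this document does not define how such policies are expressed.

```python
from typing import Dict, List

# Hypothetical policy labels for the resources of a publication
ESSENTIAL, OPTIONAL, EXTERNAL = "essential", "optional", "external"


def select_for_package(resources: Dict[str, str], include_optional: bool) -> List[str]:
    """Pick the resources a packager would copy into the packed publication."""
    selected = []
    for href, policy in resources.items():
        if policy == ESSENTIAL or (policy == OPTIONAL and include_optional):
            selected.append(href)
        # EXTERNAL resources are never packaged, whatever the packager prefers
    return selected


article = {
    "article.html": ESSENTIAL,
    "figures/plot.png": OPTIONAL,
    "data/results.csv": EXTERNAL,  # Rosa's research data stays online only
}

print(select_for_package(article, include_optional=True))
```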
In this document, 'manifest' refers to an abstract place, typically one or several files, that contains information necessary for the proper management, rendering, and so on, of the publication. This is opposed to metadata, which contain information on the content of the publication, like author, publication date, and so on.
Some fundamental use cases and requirements already imply the usage of manifests. For example:
Req. 25: Manifests should include the technical and descriptive metadata, and basic characteristics of the constituent resources.
A user agent requires information about the package and its components in order to process it. For example, performance and memory requirements may prevent a user agent from parsing a large number of content documents in order to discover the necessary components and their relationships. A user agent may need to make some decisions about how to present content before displaying it.
Some necessary features are listed among the fundamental features of a PWP (see 3.1 Horizontal Dependencies); the use cases below provide a (non-exhaustive) list of further information.
A user agent may not be able to process files beyond a certain size due to bandwidth, storage or memory limitations, or processing time expectations; this information should be available in advance.
A user agent should be able to process permissions and rights information for each component identified in the package. For example, it needs to know whether it has the rights to place an image of the Mona Lisa owned by the Louvre in a package to be downloaded offline.
A user agent should be able to determine if captions are available for multimedia resources within the publication.
For further processing the user agent needs to have access to additional information including:
the accessibility of each component (see also "Accessible Metadata");
whether it needs additional processing instructions, such as with MathML;
the title, author(s), cover image, etc., to display the publication on a shelf without downloading all its content (see also "Discoverability", "Accessible Metadata");
whether the content is unaltered (see also "Security");
the origin of the publication.
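To make the use cases above more concrete, here is a sketch of the kind of information a manifest might carry and how a user agent could consult it before downloading any content. Every field name (`title`, `resources`, `bytes`, `captions`, and so on) and every value is a placeholder; this document does not define a manifest format.

```python
from typing import Any, Dict, List

# Hypothetical manifest; all field names and values are placeholders
manifest: Dict[str, Any] = {
    "title": "Example Publication",
    "authors": ["Example Author"],
    "cover": "images/cover.jpg",
    "resources": [
        {"href": "article1.html", "bytes": 48_000, "type": "text/html"},
        {"href": "media/lab.mp4", "bytes": 250_000_000, "type": "video/mp4",
         "captions": "media/lab.vtt"},
        {"href": "media/intro.mp4", "bytes": 90_000_000, "type": "video/mp4"},
    ],
}


def shelf_entry(m: Dict[str, Any]) -> Dict[str, Any]:
    """Information needed to show the publication on a shelf."""
    return {key: m[key] for key in ("title", "authors", "cover")}


def too_large(m: Dict[str, Any], limit_bytes: int) -> List[str]:
    """Resources the user agent may decline to fetch because of their size."""
    return [r["href"] for r in m["resources"] if r["bytes"] > limit_bytes]


def uncaptioned_video(m: Dict[str, Any]) -> List[str]:
    """Video resources for which the manifest lists no captions."""
    return [r["href"] for r in m["resources"]
            if r["type"].startswith("video/") and "captions" not in r]


print(shelf_entry(manifest)["title"])
print(too_large(manifest, limit_bytes=100_000_000))
print(uncaptioned_video(manifest))
```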
6.2 Streamlining a PWP
Req. 26: The manifest should make it possible to provide streamlined access to disjoint parts of the publication.
It should be possible for the author to convey several potential reading orders that may go beyond the “default” for the content of the publication. Such an alternative reading order may include only specific parts of the publication rather than the full content of the PWP.
EsteemedJournalPublisher would like to offer the users of the EsteemedJournal of Chemistry App the opportunity to read only the abstracts of the journals in the app. The App Package must offer the user a list (table of contents) of abstracts (disjoint objects in the package with semantic information or metadata informing the package of the nature of the object).
The publisher wants to provide “teasers” for a book by providing a series of extracts that are meant to give an overview of the book without the necessity to read the whole publication. This can be typically used by a reseller allowing for a prospective client to access part of the publication free of charge.
6.3 Change Notices
Req. 27: The manifest should include information about new content.
The manifest should include information that makes it possible for a user agent to find out whether specific content has changed since the last access. This may or may not directly reflect versioning (as referred to in the requirement on versioning), insofar as the granularity may be different.
Shoshana is an organic chemist. She has purchased the Esteemed Journal of Chemistry App. She downloads Organic Chem Quarterly in her lab and reads the first article over lunch. Shoshana begins the book reviews during office hours but must tend to her students' questions, so she closes the app. Shoshana opens the app on the train ride home to resume reading the book reviews. She is happy to find that the app opens to the exact location and opens quickly because most of the material does not need to be downloaded a second time.
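One way a user agent could use such information is to compare per-resource modification times from the manifest against the time of the last access, and re-download only what is newer. This is a sketch under that assumption; the timestamp format (ISO 8601 without a time zone), the function name, and the sample data are hypothetical.

```python
from datetime import datetime
from typing import Dict, List


def changed_since(modified: Dict[str, str], last_access: str) -> List[str]:
    """Return the resources whose manifest timestamp is later than last_access."""
    since = datetime.fromisoformat(last_access)
    return sorted(href for href, stamp in modified.items()
                  if datetime.fromisoformat(stamp) > since)


# Hypothetical "last modified" entries taken from a manifest
modified_times = {
    "article1.html": "2016-03-01T09:00:00",
    "book-reviews.html": "2016-03-02T18:30:00",
}

# Only resources updated after the reader's last visit need to be fetched again
print(changed_since(modified_times, last_access="2016-03-02T12:00:00"))
```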
Req. 28: The manifest should include the means to link to resources, regardless of location.
The fundamental requirement on identification already states that there should be a way to uniquely identify a publication (see 3.5 Uniquely Identifying a PWP). This requirement should be extended to resources within a publication, such as a chapter, an image, or a mathematical formula. To achieve this, the manifest should contain the necessary information to make the mapping of URIs possible.
A scholarly mathematical publication refers to a specific theorem in another publication, and that reference must be unique, stable and retrievable on the Web. That reference should not depend on whether the publisher issues a new version of the target publication (thereby possibly changing the section numbering) or whether the publication is now available for offline use.
An archival service wants to harvest (spider) a PWP, and expects to find in the manifest what it needs to make sure it gets all the pieces of the PWP that must be archived, even if they are on separate servers.
6.5 Alternative Reading Orders
Req. 29: The manifest may include alternative reading orders.
One of the fundamental requirements is that the default reading order should be made available to the user agent (see 3.4 Default Reading Order). In addition, the manifest should also provide means to define alternative reading orders. It is up to the user agent how this alternative reading order is conveyed to the reader (via some suitable user interface techniques).
EducationalPublisher publishes a complex textbook. The textbook is created in such a way that it can be used at both beginner and advanced levels. The default reading order corresponds to beginners, but the goal is that advanced students can follow a different path through the material, corresponding to their level of knowledge. EducationalPublisher therefore adds alternative reading orders to the publication that advanced users can follow.
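As an illustration only, alternative reading orders could be represented as named lists of resources next to the default one. The structure, the order names, and the file names below are assumptions and are not part of any defined manifest format.

```python
from typing import Dict, List

# Hypothetical manifest fragment: one default order plus a named alternative
reading_orders: Dict[str, List[str]] = {
    "default": ["intro.html", "basics.html", "practice.html", "advanced.html"],
    "advanced": ["intro.html", "advanced.html"],
}


def reading_order(orders: Dict[str, List[str]], name: str = "default") -> List[str]:
    """Return the requested reading order, falling back to the default one."""
    return orders.get(name, orders["default"])


print(reading_order(reading_orders, "advanced"))
print(reading_order(reading_orders, "unknown-level"))
```

Falling back to the default order keeps the publication readable in user agents that do not know, or do not expose, a given alternative.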
6.6 Multiple Access Options
Req. 30: The access methods for retrieving a manifest should allow for significant flexibility.
A manifest may be a file in some predefined format, such as XML or JSON. There should be several ways to get hold of that manifest: sourcing the file from a well-known location, using an HTML link element, etc. A PWP should provide a high level of flexibility, including possible alternatives on how that manifest should be found.
BigRetailer receives a publication from EsteemedPublisher that it intends to add to its catalogue. BigRetailer wants to change or extend some general information (typically contained in the manifest) regarding the publication, like the local bibliographic metadata, or its own “teaser” in terms of an alternative reading order (see 6.5 Alternative Reading Orders). To achieve that, BigRetailer provides its own version of the publication manifest, and makes sure that the user agent takes that one as the publication’s manifest (e.g., by providing a reference to the alternative manifest via an HTTP Link header).
6.7 Combining Manifests
Req. 31: There should be a possibility to combine manifests from several origins.
The default approach for a user agent is to get hold of the manifest (file) as one unit via some flexible means (see 6.6 Multiple Access Options). However, an even more flexible approach is to allow several manifest files that the user agent would combine, following some rules, to yield the final, overall manifest for the publication.
The Interest Group recognizes that the proper definition/implementation of this requirement may lead to major technical complications and therefore may not be fulfilled.
The simple solution for the use case involving BigRetailer requires the creation of a complete manifest, which involves duplicating the information provided by EsteemedPublisher. This is obviously error-prone if the publication’s manifest is large. If it were possible to combine manifests, BigRetailer could provide only its own additional (and possibly replacement) information and let the user agent combine the manifests of BigRetailer and of EsteemedPublisher to yield the final manifest.
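The combination rules are left open by this document. Purely as an illustration, the sketch below assumes one possible rule: nested structures are merged recursively, and where both manifests define the same entry, the retailer's value replaces the publisher's. The manifest contents and field names are hypothetical.

```python
from typing import Any, Dict


def combine_manifests(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Merge overlay into a copy of base; overlay wins on conflicting entries."""
    merged = dict(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = combine_manifests(merged[key], value)  # merge nested maps
        else:
            merged[key] = value                                  # add or replace
    return merged


publisher_manifest = {  # hypothetical content from EsteemedPublisher
    "metadata": {"publisher": "EsteemedPublisher", "language": "en"},
    "readingOrders": {"default": ["ch1.html", "ch2.html", "ch3.html"]},
}
retailer_manifest = {   # hypothetical additions from BigRetailer
    "metadata": {"catalogueId": "BR-0001"},
    "readingOrders": {"teaser": ["ch1.html"]},
}

final_manifest = combine_manifests(publisher_manifest, retailer_manifest)
print(sorted(final_manifest["readingOrders"]))
```

Even this simple rule raises questions (for example, how to delete an entry or merge lists), which hints at the technical complications the Interest Group mentions.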
7.1 State-Independent and State-Dependent Locators
Req. 32: A common, state-independent locator needs to exist. This is necessary to connect the same publication across states. This state-independent locator (also known as the “canonical” locator [rfc6596]) should be part of the PWP.
Req. 33: There must also be a separation between state-independent and state-dependent locators. It must be possible (and necessary) to use, for all cross-references, the canonical locator.
Annie sends Bob the state-independent locator to the second chapter of ‘Moby Dick’ while reading it online. Bob opens that state-independent locator while he is offline in his e-reader with his copy of ‘Moby Dick’ loaded. The e-reader can direct him to the correct chapter, with no additional online connection needed.
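What Bob's e-reader does in this use case could be sketched as a simple prefix mapping from the canonical locator to the locator of his local copy. The base URLs and the function name are hypothetical, and a real mapping may well be more involved than string substitution.

```python
from typing import Optional


def to_state_locator(locator: str, canonical_base: str, state_base: str) -> Optional[str]:
    """Map a state-independent locator onto the current state's base, if it matches."""
    if not locator.startswith(canonical_base):
        return None  # the locator does not belong to this publication
    return state_base + locator[len(canonical_base):]


CANONICAL = "https://publisher.example/moby-dick/"  # hypothetical canonical locator
BOB_LOCAL = "file:///home/bob/books/moby-dick/"     # hypothetical local copy

shared = CANONICAL + "chapter2.html"
print(to_state_locator(shared, CANONICAL, BOB_LOCAL))
```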
7.2 Relative Locators
Req. 34: It should be possible to use, in all circumstances, a relative locator to refer to content within a PWP. Relative locators are abundant on the Web; state-independent locators should not break this mechanism. A PWP processor must be able to combine a relative locator with the canonical as well as the state-dependent locators of a PWP.
Nick creates his own short story. Based on choices of the reader, the story is different (e.g.: “If you want to go into the rabbit hole, see chapter 3, otherwise, see chapter 4”). He publishes this PWP online with relative links between the different chapters. When Annie downloads this publication as a package, all the relative links still work as expected, regardless of the state the publication is in when Annie reads it.
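The combination of a relative locator with different base locators can be illustrated with standard URL resolution, here using `urljoin` from Python's standard library. The two base locators are hypothetical; the point is that the same relative link resolves correctly against either one.

```python
from urllib.parse import urljoin

# Hypothetical locators of the same chapter in two different states
ONLINE_CHAPTER = "https://publisher.example/story/chapter2.html"
UNPACKED_CHAPTER = "file:///home/annie/story/chapter2.html"


def resolve(base: str, relative: str) -> str:
    """Resolve a relative locator against the locator of the current document."""
    return urljoin(base, relative)


# The same relative link, as written by the author, works in both states
for base in (ONLINE_CHAPTER, UNPACKED_CHAPTER):
    print(resolve(base, "chapter3.html"))
```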
7.3 Location Across States
Req. 35: When providing a pointer to any or all of a publication, this should be robust across states. Locating a resource within a PWP should not depend on the PWP's state.
Some use cases, documented separately [dpub-annotation-uc] for the purpose of annotations, imply this issue:
Writer Annie has her book published on her own web space as a PWP. Reader Bob opens it online using a dedicated PWP reader, and highlights a nice quote. When Bob reads the same book later on, the highlight is retrieved, whether the PWP is opened using the same PWP reader, opened in a browser, downloaded locally, etc.
Annie sends Bob the state-independent locator to the second chapter of ‘Moby Dick’ while reading it online. Bob can use that state-independent locator for his own copy of ‘Moby Dick’, even though it is loaded from a local package.
Highlights are, in this case, Web Annotations [annotation-model], i.e., stored online with links to the highlighted text.
Req. 36: Identifiers must be persistent and usable across states, and not conflict with locators.
Identification of a publication is orthogonal to the issue of locators. There are no specific requirements on identification of PWPs in this document in general; however, any such identifier must be usable across states and must not conflict with locators. Identifiers need to be persistent across PWP instances.
Zoltán publishes a scholarly article in PWP format. The publisher assigns a DOI to the publication, which becomes its unique identifier. Zoltán’s colleague, Sean, downloads a copy of the article to his own user space; the copy receives a new locator corresponding to Sean’s Web space. However, the paper’s DOI remains unchanged, and will be used as an unambiguous reference.
We take for granted the relative durability of print artifacts, many of which have survived with little more than benign neglect. In contrast, digital documents are unlikely to persist without more active interventions, such as making copies, monitoring software dependencies, and validating integrity. Since future consumers of publications represent the most open-ended user group, it is desirable that digital documents be instilled with more of the inherent durability that characterizes print artifacts. The PWP offers this potential, by making it easier for archiving services to locate, harvest, update, and describe digital publications. Long-term preservation of digital publications ensures that they may continue to be accessible, beyond the tenure of individual authors, file formats, publishers, or publishing platforms.
8.1 Archival Discovery
Req. 37: The locations of all PWP components should be discoverable.
An archiving service needs a reliable way to learn where all of the components that constitute a PWP are located in order to be able to archive it. Without such a mechanism, the archiving service will have to develop and maintain publisher- and/or platform-specific heuristics for packaging collections of interlinked resources into discrete publications, making archiving more expensive and error-prone.
An archiving service sets out to conduct an initial harvest of a book. It is able to locate all of the images, markup, scripts, and style and layout instructions that constitute the object. The archiving service retrieves these resources and packages them into a logical archival unit for ingest into a preservation repository.
See also Discovery
8.2 Newly Published Versions
Req. 38: There should be a way to discover that new versions of one or more PWP components have been published.
In order to be able to archive a PWP, an archiving service needs a reliable way to learn that new versions of one or more PWP components have been published at the same locations as previously published. Without such a mechanism, the archiving service will need to periodically re-download and re-checksum all PWP components to determine whether any updates have occurred, unnecessarily increasing the load on archiving service and publisher servers and delaying the archiving of updated PWP components.
An archiving service regularly polls for changes to an article that it has already archived. One such poll indicates that several resources that constitute the object have changed. The archiving service retrieves these resources and stores them as incremental updates to the appropriate archival unit in a preservation repository.
8.3 New Components
Req. 39: There should be a way to discover that one or more new components have been added to a PWP.
An archiving service needs a reliable way to learn that one or more PWP components have been added to a PWP in order to be able to archive them. Without such a mechanism, there is a possibility that the archiving service will not know that the new component belongs to the PWP, because the publisher- and/or platform-specific heuristics have not been updated.
An archiving service regularly polls for changes to an article that it has already archived. One such poll indicates that several resources have been added to the object. The archiving service retrieves these resources and stores them as incremental updates to the appropriate archival unit in a preservation repository.
8.4 Deleted Components
Req. 40: There should be a way to discover that one or more PWP components have been removed from a PWP.
An archiving service needs a reliable way to learn that one or more PWP components have been removed from a PWP in order to be able to propagate this change to the archive. Without such a mechanism, it is possible that the archiving service will mistakenly make PWP components accessible that should not be.
A publisher issues a retraction for a published article, resulting in the addition of new resources to the object (i.e., the retraction notice) and the removal of others (i.e., the article content). An archiving service regularly polls for changes to this article, which it has already archived, and discovers the retraction. The archiving service retrieves the new resources and records those that are no longer accessible, carrying over the cumulative updates to a preservation repository.
A copyright dispute results in the takedown of a published book. An archiving service regularly polls for changes to this book, which it has already archived, and discovers that it has been taken down. It records that the resources that constitute the object are no longer accessible and propagates this update to a preservation repository.
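Requirements 38 to 40 could all be served by comparing two listings of a PWP's components. The sketch below assumes that each listing maps a component locator to some content fingerprint (for example, a checksum published in the manifest). The type and function names and the sample data are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class Changes(NamedTuple):
    added: Set[str]    # components new to the PWP (Req. 39)
    removed: Set[str]  # components no longer part of the PWP (Req. 40)
    updated: Set[str]  # components republished at the same location (Req. 38)


def diff_components(archived: Dict[str, str], published: Dict[str, str]) -> Changes:
    """Compare the archived listing with the currently published one."""
    added = set(published.keys() - archived.keys())
    removed = set(archived.keys() - published.keys())
    updated = {href for href in archived.keys() & published.keys()
               if archived[href] != published[href]}
    return Changes(added, removed, updated)


# Hypothetical listings before and after a retraction
archived_listing = {"article.html": "digest-1", "style.css": "digest-2"}
published_listing = {"style.css": "digest-2", "retraction.html": "digest-3"}

changes = diff_components(archived_listing, published_listing)
print(changes)
```

With such a listing available, the archiving service would not have to re-download and re-checksum every component on each poll.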
8.5 Embedded Metadata
Req. 41: There should be a way to indicate whether one or more PWP components contain structured descriptive metadata.
An archiving service needs a reliable way to determine which, if any, PWP components contain structured descriptive metadata. Without such a mechanism, the archiving service will have to develop and maintain publisher- and/or platform-specific heuristics for locating or parsing out descriptive metadata, making archiving more expensive and decreasing the reliability of reporting.
An archiving service sets out to conduct an initial harvest of an article. Along with the images, markup, scripts, and style and layout instructions that constitute the object, it is able to locate a file containing descriptive metadata. The archiving service retrieves these resources and packages them into a logical archival unit for ingest into a preservation repository. A related process identifies and parses the descriptive metadata structure and saves its contents into an associated management database.
Some fundamental use cases and requirements already imply the usage of accessibility. For example:
Req. 42: Accessibility of a PWP must be discoverable. This ensures that users know how (or whether) a PWP or any of its parts is accessible. Granular accessibility of a PWP must be discoverable so that users know how (or whether) a specific chapter or element within a PWP is accessible. See also 6.1 Manifests, Metadata, and Resources.
Ferdous wants to buy a book about a museum exhibit and wants to guarantee before he does that his screen reader will describe the images to him in great detail.
Alejandra must ensure that the equation editors in her textbook have non-mouse functionality before she purchases the textbook.
A university professor is developing a course and the professor knows that he is required to use accessible digital materials. The professor uses the search capabilities of available publications to determine which titles are accessible and therefore suitable for his use.
Anita is a school student who knows only uncontracted Hindi Braille. Her parents should be able to discover the language and the braille codes of the publications before purchasing the publications for her.
Jessie is taking a class and the professor has assigned Chapter 5 in an art textbook for review, but before Jessie purchases the book she wants to ensure that Chapter 5 is accessible.
9.2 Adding Alternative Media
Req. 43: A PWP must support the ability to include multiple renditions of a publication. Within their publication, in addition to the print rendition, a publisher may include a fully narrated rendition, or a video with described audio and captioning.
BigBox Publisher needs to add content such as a braille style sheet, image descriptions, or video captioning (text/descriptive audio) to a PWP previously published by a third party.
Frank has dyslexia and finds reading printed books a very frustrating experience. Having a book read to him with synchronized highlighting of the text not only allows him to comprehend the book but helps him become a better reader.
Julia has ADHD and is frequently distracted when trying to read. Her teachers determined that she is a visual learner and scores better when the material is presented in video format.
Jacob has poor depth perception and is more of a tactile learner. As such, he has a difficult time conceptualizing three dimensional objects. Having an embedded three dimensional image available to be printed alongside the visual image will greatly improve his understanding of the object being presented.
9.3 Building a Custom PWP
Req. 44: A PWP needs to support the ability to construct a limited package with only a subset of the necessary content.
Buffy is Deafblind. Every morning she downloads her daily newspaper. Like most news sites, it provides many rich multimedia presentations. As a high-quality, accessible news site, its multimedia presentations come with captions and transcripts. Buffy does not want to waste her data plan on the useless-to-her audio and video content. She'd like to build a PWP which contains the captions and transcripts, but not the data-heavy videos.
Smitha has limited hand mobility, and controls her e-reader by voice. She subscribes to her monthly comics online. The comics contain interactive dialogue sections where, instead of advancing pages by a more conventional method, readers click-and-drag the panels around the page. The fallback method to make this accessible is the more conventional page-turning method. Smitha wants to download her monthly comics in the smaller fallback format, rather than download packages that are unusable to her and that she is then required to reconfigure individually.
Req. 45: A PWP needs to support both time-based media and text.
A PWP needs to support time-based media, such as synchronized video, audio, captions or transcript, or sign language interpretation. A PWP must enable a synchronized media experience while navigating through the book, with sufficient level of granularity.
Illyés has a cognitive disability and uses accommodated texts in the classroom, to help learn the content while improving his reading. His assistive technology uses combined audio and text, turning the page for him while reading along in sync with the page currently open.
Bogi has auditory processing disorder. He is reading an annotated screenplay showing the author's original notes, along with synchronized video clips. Bogi needs to be able to view the captions on the video clips, even though they are associated with snippets of the screenplay.
9.5 Accessible Annotations
Req. 46: When annotations are distributed and associated with a PWP, the content of the annotation must be compatible with assistive technology.
A teacher annotates a PWP used in the classroom with specific text items the students should focus upon. Once distributed to the students and associated with the PWP, all students must be able to read the annotations.
A person wants to annotate in the margin of a PWP and share it with others. A digital pen is used for the annotation. The handwriting is captured as an image to be displayed back to the writer.
10.1 Limiting a PWP’s features
Req. 47: User agents must be allowed to limit the capabilities of a PWP.
The compatibility and interoperability requirements of the specification must not prevent user agents from taking measures to protect the security, safety, or privacy of the user. Security-conscious systems that interlink an unusually large number of important services and have an unusually large attack surface (such as web browsers or web services) have stricter security requirements than standalone apps that are siloed from the web. They must be allowed to continue to fulfill their pre-existing security requirements as they implement support for the PWP format.
Alice opens a malicious PWP in her browser. The malicious PWP tries to exploit a flaw in a social media service to take over her account but is blocked by her browser’s security measures.
Joe, a student, has completed his classroom assignment which was to use PWP annotations to complete the text’s quizzes, and uploads the document to the school’s classroom management web service. The teacher can safely read the PWP on the web because the service has automatically stripped the document of all untrusted scripts.
10.2 Capabilities Discovery
Req. 48: It should be possible to discover the capabilities a PWP will have access to. A document’s access to features and APIs will vary from platform to platform, app to app. Document authors benefit from being able to discover these capabilities.
Luke has written a nonfiction book on web development. To discover which APIs he can demonstrate directly in the PWP and which he cannot, he uses a permissions API combined with feature detection to let the text discover what capabilities it has access to at each read. Other possible approaches include standardizing an "always available" safe subset of the platform's capabilities, or documenting specific methods of progressive enhancement or graceful degradation that the author should be able to use to safely create their documents.
Jill has written a detailed data analysis report for her manager that uses complex interactive graphs to display the results. These graphs work in the manager’s user agent because the script driving them uses a previously specified ‘always safe’ API subset that user agents have committed to enabling even when the capabilities of the document are otherwise limited, e.g. for security.
Jack has written an interactive children’s book. Because he has followed published guidelines on how to make texts that progressively enhance or gracefully degrade, all of his readers can experience the text, even if they don’t have access to all the features, all the time.
10.3 Preserving Integrity
Req. 49: PWP authors should be able to embed guidance policies in their documents that inform the user agent of their preferences as to how the integrity and security of the document itself should be preserved. Indeed, scripted documents are dynamic by nature; long-lived authored documents are vulnerable to alteration by a variety of external factors. This mechanism should be based on the pre-existing Content Security Policy (CSP) [csp2] and Subresource Integrity [sri] specifications and not be a new invention incompatible with web browser CSP implementations.
Jill’s data analysis report used an externally hosted script to generate her interactive graphs. To ensure that these scripts do not violate her manager’s privacy, she includes a CSP whitelist in the document that tells the User Agent that it should only load that external script and not any of the third party services the script wants to load as well.
After Jill’s report has been in the archive for a year, Simon, the manager, wants to revisit the analysis for his own end-of-year report. Because the domain for the external web service has lapsed, it has been taken over by a malicious actor who replaced its scripted graphs with graphs that display random numbers and nonsense. However, because Jill’s authoring system used Subresource Integrity attributes, the User Agent refuses to load the malware, and the document instead degrades gracefully to showing static (and correct) graphs.
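The check that protects Simon in this use case can be sketched as follows. Subresource Integrity metadata takes the form of a hash algorithm name, a hyphen, and the base64-encoded digest of the resource. The function names and the sample script contents are hypothetical; the sketch only shows the comparison a user agent performs, not a full implementation of the specification.

```python
import base64
import hashlib


def sri_digest(content: bytes, algorithm: str = "sha384") -> str:
    """Compute an integrity value such as 'sha384-<base64 digest>' for content."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def integrity_matches(content: bytes, integrity: str) -> bool:
    """Check fetched content against the integrity value recorded by the author."""
    algorithm, _, _ = integrity.partition("-")
    return sri_digest(content, algorithm) == integrity


# Hypothetical script contents at authoring time and a year later
original_script = b"drawGraphs(reportData);"
tampered_script = b"drawGraphs(randomNumbers());"

recorded = sri_digest(original_script)              # stored in the document by Jill's tool
print(integrity_matches(original_script, recorded))  # unchanged script is accepted
print(integrity_matches(tampered_script, recorded))  # replaced script is refused
```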
10.4 Escalating trust
Req. 50: User agents may provide a method for escalating trust. By providing such methods, user agents may regain access to more capabilities, while otherwise the agent would impose limitations for security reasons. Platform vendors have sometimes offered methods for otherwise untrusted local scripts to become trusted and regain API privileges that they had lost while untrusted.
Luke has written another book, this time using all of the capabilities of the web platform that he can think of. He submits the book for review to a PWP retail platform. After review, the book is listed in the retailer’s store. When bought, the book runs with the full capabilities of the platform because it has been code signed by the platform vendor.
The following people have been instrumental in providing thoughts, feedback, reviews, content, criticism, and input in the creation of this document:
Boris Anthony (Rebus Foundation), Luc Audrain (Hachette Livre), Nick Barreto (Invited Expert), Baldur Bjarnason (Rebus Foundation), Timothy Cole (University of Illinois at Urbana-Champaign), Garth Conboy (Google), Dave Cramer (Hachette Livre), Romain Deltour (DAISY Consortium), Brady Duga (Google), Heather Flanagan (Invited Expert), Markus Gylling (IDPF), Ivan Herman (W3C), Deborah Kaplan (Invited Expert), Bill Kasdorf (BISG), George Kerscher (DAISY Consortium), Peter Krautzberger (Invited Expert), Charles LaPierre (Benetech), Laurent Le Meur (EDRLab), Vladimir Levantovsky (Monotype), Mia Lipner (Pearson), Christofer Maden (University of Illinois at Urbana-Champaign), Shane McCarron (Spec-Ops), William McCoy (IDPF), Hugh McGuire (Rebus Foundation), Ben De Meester (iMinds), Liam Quin (W3C), Leonard Rosenthol (Adobe), Nicholas Ruffilo (Invited Expert), Rob Sanderson (Stanford University), Avneesh Singh (DAISY Consortium), Alan Stearns (Adobe), Ayla Stein (University of Illinois at Urbana-Champaign), Tzviya Siegman (Wiley), Nicholas Taylor (Stanford University), Benjamin Young (Wiley), and Daniel Weck (DAISY Consortium)