Publications, from corporate memos to newsletters to electronic books to scholarly journal articles, must be considered first-class content on the Web, equal to the more common forms of Web pages available today. This document describes use cases that highlight the problems users and publishers face when such publications are used in a digital Web environment. The requirements derived from those use cases provide the basis for the technical considerations in a companion document, currently entitled “Web Publications” [pwp].
Status of This Document
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This is a work in progress. The final version of this document is planned to be published as an Interest Group Note.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The Web emerged in 1994, based on a model of individual pages loosely joined by hyperlinks. Clustered within domains and given built-in navigation elements, webpages evolved into websites. This model inherited very little from an existing, powerful, and much older page-based medium: books.
Over centuries, “books” have assumed many forms: journals, magazines, pamphlets of long-form articles and essays, newspapers, atlases, comics, notebooks, albums of all sorts. We can define these different manifestations as “publications”: bound editions of meaningful media, made public.
Another form of publication with a long history in both the print and the digital worlds is the document. Documents are written and distributed in a more ad hoc manner; examples include legal briefs, corporate memos, and even standards specifications, such as the one currently being read.
We believe there is great value in combining this older tradition of portable, bounded publications with the pervasive accessibility, addressability, and interconnectedness of the Open Web Platform (OWP). New models of economic sustainability, innovative experiences of knowledge, and invigorated socio-cultural engagement depend on this.
It is the task of the W3C Digital Publishing Interest Group to explore the uniqueness, desirability, and feasibility of bringing these two great models of publishing together. This document explores requirements based on examples of real-world use cases and scenarios. Requirements for publications on the Web are explored first, without reference to any packaging aspects that would correspond to current practices such as EPUB. These are followed by the requirements for packaging, treated as a structure on top of purely Web-based distribution. The complete list of requirements is also collected in a separate table in A. List of Requirements.
This document uses the term user agent, as used by the Web community; see, for example, the WAI glossary entry. The publishing community often uses the term “reading system” for roughly the same notion; while there may be subtle differences between the two, this document uses a single term throughout.
A Web Publication (WP) is a collection of one or more constituent resources, organized together in a uniquely identifiable grouping, and presented using standard Open Web Platform technologies.
A Packaged Web Publication (PWP) is a Web Publication whose constituent resources are combined into a single distributable file, using some standard packaging format.
In this document, manifest refers to an abstract means of containing the information necessary for the proper management, rendering, etc., of a publication. This is distinct from metadata, which contains information about the content of the publication, such as its author or publication date. The precise format in which such a manifest is stored is not considered in this document.
This section describes the use cases specific to Web Publications and the Open Web Platform, including associated publishing needs, and discoverability.
Req. 1: Web Publications should be able to make use of all features offered by the OWP.
A remarkable range of tools and frameworks has been built on top of the OWP, making it possible to develop powerful interactive layers. These include, for example, data visualization systems (e.g., d3, built on top of SVG), access to external services like Wolfram Alpha, and tools to create and store annotations (possibly as part of the publication). These tools have traditionally been developed around browsers, and publications should also benefit from the possibilities they offer. This requires Web Publications to become first-class citizens of the Web platform.
A large, multidisciplinary, Web-based journal relies on traditional Web technologies like HTML and CSS for its content. Responding to the evolving expectations of its audience, the journal increasingly uses additional media such as video, audio, animated graphics, and very large images; the trend is to consider these integral parts of the scientific output. As a result, the journal needs access to the latest visualization and data management tools that the OWP offers.
Educational publications are increasingly making use of OWP features. They include interactive exams (possibly linked to online evaluation facilities), visualization of data or of algorithms, built-in interpreters for various languages (e.g., for courses on programming); in many respects, the borderline between these publications and Web applications is becoming fuzzy.
BigBoxCo, a large technology company with extensive “in-house” technical and user documentation for its various products and administrative processes, develops all this material in digital-only formats. The quantity of documentation makes it impractical to produce these documents in print. Instead, the company publishes them on the company intranet and/or provides them to its employees and contractors via specialized mobile applications. These documents, as a type of publication, require accessibility, portability of annotations, and the possible inclusion of complex media.
Req. 2: A Web Publication should conform to the requirements of all horizontal dependencies.
Web content is consumed under many different circumstances: it must be available to the largest possible audience in a secure manner, while providing the necessary protection of the reader’s privacy. Publication content must be able to satisfy a number of principles, such as accessibility, internationalization, device independence, security, and privacy. (In the W3C context, these are usually referred to as “horizontal” dependencies.) In general terms, these principles are:
Accessibility:
People with disabilities should be able to access the content of a publication. They should be able to perceive, understand, navigate, and interact with it, as well as contribute to it. Accessibility encompasses all disabilities that affect access to the content, including visual, auditory, physical, speech, cognitive, and neurological disabilities.
Internationalization:
Publications should be well adapted to any language, writing system, region, or culture. This includes the use, where appropriate, of left-to-right, right-to-left, horizontal, or vertical writing; of item numbering or interactive forms specific to local cultures; and of the right character sets and local typographic conventions.
Device Independence:
The content in a publication should be usable on a large number of devices with very different characteristics: different screen types and sizes, various input modalities, varying levels of processing power, etc. Adaptation to these differences should be automatic, with no or very little user intervention.
Security:
Publications should be presented by a user agent using a security model that is at least as secure as the standard Web security model. Doing so prevents publications that contain malicious attacks, data theft, and other security incidents from harming users by jeopardizing the integrity of the underlying data or machine operations.
Privacy:
The content in a publication should maintain and support user privacy, in spite of the fact that the evolution of online technologies has increased the possibility for the collection and processing of personal, and possibly sensitive, data. However, since a publication may use any part of the OWP, it may choose to use functionality such as the ability to track a user's activity within the publication.
These principles correspond to technical requirements on the underlying technologies (i.e., the OWP, and its possible extension to Web Publications) insofar as the technologies must empower authors (writers, editors, publishers, etc.) to produce content that follows them. Whether authors actually use these possibilities is not addressed in this document.
These constraints are formalized in the context of the Web and, by extension, of Web Publications; this means that they are valid for publications in general. In some cases, for example for legislative reasons, the demands on publications may be more stringent than those on generic Web sites. The use cases below provide some examples of publication-specific situations. Note also that some aspects of horizontal dependencies (e.g., accessibility or security) are also the subject of further use cases and requirements elsewhere in this document.
(On Accessibility) Legal Publishing Ltd. publishes all the official texts as issued by the government of its country. Per local legislation, the publication must be accessible, following W3C’s WCAG Level AA requirements, to serve as official references in courts.
(On Privacy, Accessibility) EducationPublishing Ltd. publishes digital textbooks to cover BigUniversity’s curricula. These (digital) educational publications also include access to interactive tests via specialized services on the Web that regularly assess the student’s progress. The privacy and integrity of the student’s test data must be preserved. Because of this, and because digital textbooks must also abide by WCAG Level AA accessibility requirements, EducationPublishing may be liable if these requirements are not fulfilled.
(On Internationalization) PublicationInternational SA publishes literary works all over the world and in many languages. In order to continue its business in Japan, it must be able to produce digital publications with right-to-left and vertical writing that follow Japanese typesetting traditions, because that is the only way those publications are accepted by local customers.
(On Privacy) Thomas has written a pamphlet advocating a government overthrow. The government has decreed that the author of the pamphlet as well as its readers shall be jailed. Thomas needs to distribute the pamphlet in ways that preserve his anonymity and allow the public to read it without fear of the government cyber-police.
(On Device Independence) Yoshio usually reads a book on his tablet when he is at home, but he does not carry his tablet around while commuting on the train. Instead, he prefers to use his phone to continue reading. Publications must be able to adapt to the consumption environment, so as to provide a good reading experience regardless of the device.
(On Security) LocalLibrary receives publications from a variety of sources and then makes them available to its members. It is imperative that none of these publications can cause any damage to the library’s own systems or to those of its members.
Req. 3: The notion of a Web Publication should enable specific publications like audio books, graphic books, and mixed media.
All concepts and structures related to a Web Publication should enable the creation and/or production of alternative renderings for visual and auditory content.
Faye, a busy mother of five, wants to access audio books while commuting, jogging, doing dishes, or otherwise unable to use her eyes or hands.
Khoudia, a librarian focusing on the children's section of her local library, is looking exclusively for material rich in audio and video components so as to reach a wider age bracket.
James, a musician, requires that the musical score within a publication come preformatted in braille music notation in order to read it, as he uses freely available assistive technology which does not have braille music translations built in.
Req. 4: A Web Publication needs to support both time-based media and text.
A Web Publication needs to support time-based media, such as synchronized video, audio, captions or transcript, or sign language interpretation. A Web Publication must also be able to enable a synchronized media experience while navigating through the publication, with a sufficient level of granularity.
Illyés has a cognitive disability and uses accommodated texts in the classroom to help him learn the content while improving his reading. His assistive technology combines audio with highlighted text, obtained from the UA through information provided in the Web Publication, and turns the page for him while reading along in sync with the page currently open.
Req. 5: Web Publications should be able to include data as resources, just as they do text, images, etc.
Rosa has submitted an article to EsteemedJournal and provided her research data in CSV format. She and EsteemedJournal give readers access to the CSV files in any situation by including the CSV data, as well as the JavaScript library that displays the data in a human-friendly form, as part of the Web Publication.
Req. 6: A Web Publication should also be available offline.
The content of a Web Publication should be accessible offline, if circumstances so dictate, without the user having to take any particular technical action.
Omo, a student in a remote Nigerian village, is taking classes online. Connectivity to the village is unreliable and intermittent. Omo needs to have his textbooks available regardless of actual connectivity.
Heather, a frequent international traveller, enjoys reading books and tour guides on her portable device, regardless of where she happens to be on any given day. Due to high roaming charges on her mobile network, she tends to download as much of her reading material as possible wherever she can avoid those additional charges.
Gemma, a private collector of digital publications, is building a private collection of publications that she expects to be available to her whether online or offline, over the public Internet or within a private local area network (LAN).
“In-house” documents may have to be accessed both online and offline, depending on the access point. While online access may be beneficial on the work floor (e.g., on an airplane production line), the same documents may need reliable offline access (e.g., in the cockpit).
Gyöngyi, selected as a peer reviewer for the Journal of Scholarly Publications, only has time to review her assigned publication while commuting by train to her university, and she has no connectivity on the train. Since her review process includes the creation of annotations, notes, highlights, and possibly changes to the content itself, it is important that these changes be smoothly transferred back to the journal’s server when she is back online.
Req. 7: User agents must treat a Web Publication as a single logical resource with its own URL, beyond the references to individual, constituent resources.
Marwin wants to search for a term in the publication. As a reader, he does not know the internal structure of the book, i.e., whether the content is one or several HTML files; he wants the search to be executed on the whole (logical) content, regardless of its internal representation.
Svetlana sets her preferences for font selection and size, background color, etc., for a particular book. She wants those preferences to apply automatically to all chapters of the book.
User agents that support value counters (page counters, section numbering, footnotes, endnotes) should do so across the entire Web Publication (as opposed to numbering individual components separately).
Assistive technology such as screen readers or voice dictation control needs to have the Web Publication presented to it as if it were a single unit.
Req. 8: There should be a way to uniquely identify a Web Publication.
A unique identification of a specific Web Publication is essential. If it is not expressed as a URL, there should be a way to map this unique identification onto a Web address.
Scholarly references demand a unique identification of the publication and, possibly, its internal structure. That unique identification must be available as a Web link, to make it possible for other publications and other sites (e.g., the authors’ institutional sites) to unambiguously link to the publication. These features are essential in the scholarly community to make, for example, the assessment of individual researchers possible.
2.1.9 Uniquely Identify the Constituent Resources
Req. 9: All constituent resources, and their contents, should be uniquely identifiable.
The requirement on identification already states that there should be a way to uniquely identify a publication (see 2.1.8 Uniquely Identify a Web Publication). This requirement can easily be extended to the constituent resources of a Web Publication, as well as to the fragments, parts, sections, etc., of those resources. These identifications should be stable and resilient to changes and new iterations of the publication.
Markus refers to a specific mathematical theorem in a publication. That reference must be unique, stable, and retrievable on the Web, and it should not depend on whether the publisher issues a new iteration of the target publication (thereby possibly changing the section numbering).
Judit uses an annotation tool to comment on a publication authored by Pablo. She puts an annotation against a sentence in a particular paragraph, anchoring that annotation to the sentence using a reliable way of identifying it. That identification should not be invalidated by a subsequent change of the document by Pablo (unless he, e.g., removes that sentence).
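To make the stability requirement concrete, one existing way of identifying a passage is to record the quoted text together with some surrounding context, as in the TextQuoteSelector (`exact`, `prefix`, `suffix`) of the W3C Web Annotation Data Model. The `resolve_quote` function below is a hypothetical, simplified sketch of re-anchoring such a selector after the text has changed; it is not taken from that specification or from any particular annotation tool, and the sample text is invented for illustration.

```python
from typing import Optional


def resolve_quote(text: str, exact: str, prefix: str = "", suffix: str = "") -> Optional[int]:
    """Locate a quote-based anchor in (possibly revised) text.

    Hypothetical, simplified resolver inspired by the exact/prefix/suffix
    fields of a Web Annotation TextQuoteSelector. Every occurrence of
    `exact` is a candidate; candidates whose surrounding text matches the
    recorded prefix and suffix score higher. Returns the start offset of
    the best candidate, or None if the quoted text no longer exists.
    """
    best: Optional[int] = None
    best_score = -1
    start = text.find(exact)
    while start != -1:
        score = int(text[:start].endswith(prefix))
        score += int(text[start + len(exact):].startswith(suffix))
        if score > best_score:  # ties keep the earliest occurrence
            best, best_score = start, score
        start = text.find(exact, start + 1)
    return best


# Judit's anchor: the sentence "x = 1." that follows "Proof: " (invented text).
revised = "Note: x = 1. Proof: x = 1. Done."
pos = resolve_quote(revised, "x = 1.", prefix="Proof: ", suffix=" Done.")
print(pos)                                        # 20
print(resolve_quote("Proof removed.", "x = 1."))  # None
```

The context fields let the anchor survive edits elsewhere in the document and disambiguate repeated sentences, while the anchor correctly fails (returns `None`) once the sentence itself is removed, as in Judit's scenario.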
Req. 10: It should be possible to see the Web Publication in a “paginated” view.
Whereas a “scrolling” view is the dominant approach in Web browsers, a user or author may wish to view a publication in a paginated view. It should therefore be possible for an individual publication or user agent to offer the ability to switch to a paginated view. This pagination may automatically adapt page sizes to the device’s or the browser’s viewport, and may contain separate headers, footers, and/or page numbers.
This is distinct from the need to retain original page numbering (often from the print edition) which must be available on demand and must be usable to discover specific locations in the publication.
For more detailed requirements on pagination, see here.
Ann is reading War and Peace, which is over 1200 pages long in print. In order to have a better sense of her progress in the book and to make navigation within the book easier (i.e., to support usability), she decides to switch her reading environment to a paged view.
Susan uses a flexible CSS layout that includes images to create a rich, interactive publication on the history of a city. Each major historical milestone is defined as a standalone unit that would be a single page when printed, with a timeline of the main events in the footer area of the page.
IndyPublisher wants to provide transition effects between pages, both within and across content documents.
Mr. Oayia, a classroom teacher, says, “turn to page 137 of your textbook.” Regardless of layout and font size, students reading digital editions need to find the same location in the textbook as one another and as students reading the print edition.
Req. 11: The user must have the possibility to personalize his or her reading experience. This may include, for example, controlling such features as font size, choice of fonts, background and foreground color, tone of audio, etc.
Olga, a dyslexic student, downloads a textbook and personalizes the material with a larger and/or specialized dyslexia-friendly font, as well as a different contrast that, in her particular case, makes the text easier to read.
When reading a book in the sun, Mia adjusts the background color to allow for a stronger contrast so that she can see the text.
While reading a book on computer programming, Ransheed wants to change the font to a local font. However, the code samples within the text should remain in a fixed-width font.
Buffy is Deafblind. Every morning she downloads her daily newspaper. Like most news sites, it provides many rich multimedia presentations. The site is high-quality and accessible, so its multimedia presentations come with captions and transcripts. Buffy does not want to waste her data plan on audio and video content that is useless to her, so she instructs her user agent to ignore it.
Req. 12: There should be a means to indicate the author’s preferred navigation structure among the resources of a Web Publication.
A user agent needs to know the sequence in which to present components of a Web Publication to the user, including the starting point.
Moby Dick contains 136 chapters. Each chapter is a separate HTML document, with a logical order for reading them. It should be possible for the publication to inform the user agent that the proper order for consuming the HTML documents is sequential, starting with the first chapter.
The Encyclopedia of Stuff includes 1348 articles, each one in a unique HTML document. The publication must be able to indicate to the user agent that the standard way to consume the articles is alphabetical order, by title.
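As an illustration only, a default reading order with a starting point can be modelled as an ordered list of resource URLs. The sketch below assumes a hypothetical `Manifest` class with `reading_order` and `start` fields, and invented file names; none of these come from any specification, and this document does not define how a manifest is stored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Manifest:
    """Hypothetical in-memory manifest; not a proposed serialization format."""
    reading_order: List[str]      # constituent resource URLs, in reading order
    start: Optional[str] = None   # explicit starting point, if any

    def starting_point(self) -> str:
        # Fall back to the first resource when no starting point is given.
        return self.start if self.start is not None else self.reading_order[0]

    def next_resource(self, current: str) -> Optional[str]:
        # Return the resource that follows `current`, or None at the end.
        i = self.reading_order.index(current)
        if i + 1 < len(self.reading_order):
            return self.reading_order[i + 1]
        return None


# Moby Dick: 136 chapters, one HTML document each, read sequentially.
moby_dick = Manifest([f"chapter-{n:03d}.html" for n in range(1, 137)])

# Encyclopedia of Stuff: the order is derived from article titles
# (three hypothetical articles stand in for the 1348 real ones).
titles = {"a17.html": "Zebras", "a02.html": "Anchors", "a09.html": "Maps"}
encyclopedia = Manifest(sorted(titles, key=lambda url: titles[url]))

print(moby_dick.starting_point())   # chapter-001.html
print(encyclopedia.reading_order)   # ['a02.html', 'a09.html', 'a17.html']
```

In both cases the user agent only needs the ordered list; how the publisher derived it (chapter sequence, alphabetical titles) stays out of the user agent's concern.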
Req. 13: Authors of a Web Publication should be able to provide the user agent with information to access random parts of the publication.
It should be possible for the author to convey several potential reading orders beyond the “default” one for the content of the publication. Such alternative reading orders may include only specific parts of the publication rather than its full content.
A user agent should be able to access the resources of the publication in whatever order it chooses—beyond the order provided by the publication itself.
EsteemedJournalPublisher would like to offer the users of the EsteemedJournal of Chemistry App the opportunity to read only the abstracts of the journals in the app. The publication would therefore provide the user a list (table of contents) of abstracts (disjoint objects in the package with semantic information or metadata informing the package of the nature of the object).
A publisher wants to provide “teasers” for a book by offering a series of extracts that give an overview of the book without the need to read the whole publication. A reseller, for example, can use these extracts to let a prospective client access part of the publication free of charge.
EducationalPublisher publishes a complex textbook. The textbook is created in such a way that it can be used at both beginner and advanced levels. The default reading order corresponds to beginners, but the goal is that advanced students can follow a different path through the material, corresponding to their level of knowledge. EducationalPublisher therefore adds alternative reading orders to the publication that advanced users can follow.
Acme Publishing has published a book on wines that can be read from A-Z, or personalized to only read about red wines or wines from a specific region.
A specialized user agent wishes to find all images in a publication that do not already have alternative text and automatically provide it using an image identification service such as LabelMe.
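As an illustration only, alternative reading orders can be modelled as named lists that may cover just part of the publication, while the full list of constituent resources remains available to the user agent for any other access pattern. The `Publication` class, its method names, and the file names below are hypothetical and serve only to mirror the Acme Publishing wine book example.

```python
from typing import Dict, List


class Publication:
    """Hypothetical model: constituent resources plus named reading orders."""

    def __init__(self, resources: List[str], default_order: List[str]) -> None:
        self.resources = list(resources)  # every constituent resource
        self.orders: Dict[str, List[str]] = {}
        self.add_order("default", default_order)

    def add_order(self, name: str, order: List[str]) -> None:
        # An alternative order may cover only part of the publication,
        # but it must not point outside the publication.
        unknown = [r for r in order if r not in self.resources]
        if unknown:
            raise ValueError(f"not constituent resources: {unknown}")
        self.orders[name] = list(order)

    def reading_order(self, name: str = "default") -> List[str]:
        return self.orders[name]


# Acme Publishing's wine book (hypothetical file names).
chapters = ["intro.html", "red-bordeaux.html", "white-loire.html", "red-rioja.html"]
wines = Publication(chapters, default_order=chapters)
wines.add_order("red-wines", [c for c in chapters if c.startswith("red-")])

print(wines.reading_order("red-wines"))  # ['red-bordeaux.html', 'red-rioja.html']
```

Keeping `resources` separate from the declared orders reflects the point that a user agent, such as the image-labelling one above, may traverse the publication in whatever order it chooses.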
Req. 14: The information regarding the constituent resources of a Web Publication must be easily discovered and there should be a way to differentiate between essential and non-essential resources.
A Web Publication will likely be composed of multiple Web documents and their resources. A more complicated Web Publication may have many resources, some of which are essential and some of which are not. Because of this complexity, extracting in advance all the references to some or all constituent resources may be prohibitive. It is therefore necessary for the user agent to have easy access to the list of constituent resources and some of their characteristics, such as whether they are essential, their media types, or their sizes.
In a publication, some content is essential for the user to be able to consume it, while other content could either be absent or have a fallback for situations such as limited connectivity or storage. This information, provided by the author or publisher of the Web Publication, would enable a user agent to provide a better experience to the user. For example, the user agent can ensure that essential resources are made available when offline (see 2.1.6 Going Offline).
Nick is reading a long-form narrative on a device with limited storage: a publication filled with text, images, sounds, and multimedia files. Nick also rides the subway, where he loses Internet connectivity frequently and without warning for long stretches of time. During offline or low-storage situations, there are still critical parts of the publication that are consumable, mainly the text (and possibly images). Having a reasonable fallback for video, such as a poster image or placeholder image, would allow Nick to read the content while offline or on a device with limited storage.
Gösta is reading a treatise on the theory of functions. A mathematical font is essential for the proper display of the mathematical formulas in this publication, so the author has marked the font as essential so that it is made available offline.
While reading an article on a new spam analysis algorithm, Lars is primarily interested in the findings of the research. Since the research was funded by a government agency, the dataset, consisting of millions of anonymized log files, is also available. Because of its size, the researchers have marked the dataset as non-essential for conveying the results of the paper, thereby indicating that it can be skipped when reading the publication offline.
Sarah is reading a publication about the stock exchange. The current value of the stock is fetched (from a remote resource) when she opens the publication. However, when she is on the train (without a connection) one week later and opens the stock exchange publication, she will continue to see the value of the stock as it was the last time she opened the publication. It should be possible for either the content itself or the user agent to provide some user experience that notifies her that the currently presented data is a week old.
Risha publishes an article that includes an interactive component accessing a database, which is exposed to the Web via a RESTful API. The interactive component is implemented as a JavaScript library. Such data cannot be included in a packaged publication, and the interactive module is of no use without it. Risha therefore marks the component of the Web Publication that corresponds to the interactive module as not relevant when offline.
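The distinction between essential and non-essential resources can be illustrated with a small user agent policy for deciding what to store offline. The sketch assumes hypothetical per-resource fields (`essential`, `usable_offline`, a size in bytes, and a media type) of the kind this requirement asks the publication to expose; the file names and sizes are invented, and the greedy policy is only one possible user agent strategy, not something this document prescribes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Resource:
    """Hypothetical per-resource entry; field names are illustrative only."""
    url: str
    media_type: str
    size: int                    # bytes
    essential: bool = False
    usable_offline: bool = True  # False for, e.g., live API-backed modules


def plan_offline_cache(resources: List[Resource], budget: int) -> List[str]:
    """Pick resources to store offline: essential ones always, then
    non-essential ones in listed order while the storage budget allows."""
    essential = [r for r in resources if r.essential]
    used = sum(r.size for r in essential)
    if used > budget:
        raise ValueError("essential resources exceed the storage budget")
    chosen = [r.url for r in essential]
    for r in resources:
        if r.essential or not r.usable_offline:
            continue  # already chosen, or useless without a connection
        if used + r.size <= budget:
            chosen.append(r.url)
            used += r.size
    return chosen


# Loosely modelled on the Nick, Gösta, Lars, and Risha examples above.
manifest = [
    Resource("text.html", "text/html", 200_000, essential=True),
    Resource("math.woff2", "font/woff2", 80_000, essential=True),
    Resource("poster.jpg", "image/jpeg", 150_000),
    Resource("video.mp4", "video/mp4", 50_000_000),
    Resource("logs.csv", "text/csv", 2_000_000_000),
    Resource("live-chart.js", "text/javascript", 30_000, usable_offline=False),
]
print(plan_offline_cache(manifest, budget=1_000_000))
# ['text.html', 'math.woff2', 'poster.jpg']
```

The point of the example is that none of this requires parsing the content documents: the user agent decides from the resource list alone, which is exactly why the requirement asks for that list to be easily discoverable.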
Req. 15: A Web Publication should be able to express the access control and write protections of the publication.
A library may loan the publication for two weeks or a university may make a textbook available for its students for the course of the year. A Web Publication should provide a means to inform user agents about the availability period to enable the UA to control access accordingly.
Alice is working on potentially Nobel prize winning research, and has drafted her paper describing her discoveries. She asks Bob to review the paper, but needs to make sure that the Web Publication retains specific protections on what Bob is able to do with the publication and its content.
Req. 16: Web Publications should include technical and descriptive metadata as well as any additional characteristics of the constituent resources.
A user agent may require information about the publication and its components in order to process it. For example, performance and memory requirements may prevent a user agent from parsing a large number of content documents in order to discover the necessary components and their relationships. A user agent may need to make some decisions about how to present content before displaying it.
A Web Publication should be able to include additional information that the user agent can use, such as:
the rights of various components identified in the package;
captions belonging to multimedia resources within the publication;
whether there is a need for additional processing, such as with MathML;
the title, author(s), cover image, etc., to display the publication on a shelf without downloading all its content.
Marla is writing an art book and wishes to include the rights information relevant to the image of the Mona Lisa owned by the Louvre, so that a user agent will know whether it is permissible to download the image for offline use.
Ferdous wants to buy a book about a museum exhibit, but before he does so, he wants to make sure that the images and videos of the exhibits have detailed descriptions, so that he will be able to read the book with a screen reader or refreshable Braille display. This is possible because the publisher has provided that information as part of the metadata of the Web Publication.
A university professor is developing a course and knows that he is required by the university's policy to use digital materials that conform to WCAG 2.0 level AA. The professor searches to determine which titles are accessible and therefore suitable for his use. This is possible because the publishers have added the Schema.org Accessibility Metadata to the Web Publication, describing the accessibility characteristics of each constituent resource.
Req. 17: It should be possible to create and distribute a Web Publication as a single unit over different protocols or physical media. This can be done through the use of Packaged Web Publications.
HA, Ltd, a publisher of legal briefs, needs to distribute content in a consumable format to its clients via secure email.
Dalia, a patent lawyer, wants to consume content on a multitude of devices, some of which may not always have connectivity. In order to meet her expectations, it is necessary to have all required content grouped in a logical structure that can be easily transferred between devices.
Andreas is working on his first collaborative research paper with a fellow student. He wants to share a relevant publication that includes content, diagrams, and data sets with his writing partner. He does not have time to learn how to share each component so that his partner can access it all without much effort; he expects to be able to share this material as a single unit via the chatting system that they use to collaborate.
Dave is reading Moby Dick on his tablet (at home with network connectivity). He then jumps on a plane with his good friend Tzviya. After having finished reading the book, he wants to lend it to Tzviya, so that she can start reading on her own tablet. They are both offline, but can exchange data with SD cards or Bluetooth.
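As a rough illustration of "a single unit", the following sketch bundles a manifest and its resources into one ZIP archive in memory and reads them back. ZIP as the container format and the `manifest.json` entry name are assumptions made here for illustration (EPUB, for example, also uses a ZIP container); this document does not prescribe a packaging format.

```python
import io
import json
import zipfile
from typing import Dict


def pack(resources: Dict[str, bytes], manifest: dict) -> bytes:
    """Bundle a manifest and its resources into one ZIP archive (in memory)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # The manifest entry name is a hypothetical convention, not a defined one.
        zf.writestr("manifest.json", json.dumps(manifest))
        for path, data in resources.items():
            zf.writestr(path, data)
    return buf.getvalue()


def unpack(package: bytes) -> Dict[str, bytes]:
    """Read every entry of a package back into a path -> bytes mapping."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```

The resulting bytes can be copied to an SD card, sent over Bluetooth, or attached to a message, and the recipient's user agent can recover every resource from the single file.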
Req. 18: The distribution of a Packaged Web Publication should not create new iterations of it.
Simply distributing or sharing a Packaged Web Publication to multiple destinations and devices should not result in (technically) different iterations of the Web Publication unless they contain modifications that make them different Web Publications.
Publisher Corp. Inc. publishes a new Packaged Web Publication and sends it to its distributors and customers. This Packaged Web Publication is downloaded to devices or made available to a customer-specific cloud. Customers can access this file from different retailers, through different applications, either directly or downloaded from a private cloud. Thus, the Web Publication is duplicated many times, resulting in a huge number of copies. There remains a single source manifestation, and therefore one canonical identifier for all of the items spread across devices and buyers.
Mary creates a Packaged Web Publication and sends it to Dave and Kristin. Kristin simply forwards it to two other friends, but Dave first adds some comments to his copy before sending it to two friends. By doing so, Dave has created a new Web Publication with its own canonical identifier, while the version used by Mary, Kristin, and her friends remains the same as the original.
Slicendice Publishing publishes many Packaged Web Publications, some of which are different iterations or subsets or combinations of others. Slicendice needs not only to be able to uniquely identify each unique Web Publication but also to identify each “copy” or “delivery” (“item”) of each of those Web Publications so that it can track “what” has been sold and “how many” of each one have been sold.
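The distinction between a publication and each delivered copy can be sketched as two levels of identifiers: every delivery receives its own item identifier, while still pointing back to the canonical identifier of the Web Publication. The `SalesLedger` class, the use of `urn:uuid:` item identifiers, and the example work identifiers are hypothetical choices for illustration.

```python
import uuid
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SalesLedger:
    """Track deliveries ("items") of each Web Publication ("work")."""
    deliveries: Dict[str, str] = field(default_factory=dict)  # item id -> work id

    def deliver(self, work_id: str) -> str:
        # Each delivered copy gets a fresh item identifier; the work identifier
        # stays the same for every copy of the same Web Publication.
        item_id = f"urn:uuid:{uuid.uuid4()}"
        self.deliveries[item_id] = work_id
        return item_id

    def copies_sold(self) -> Counter:
        """How many items have been delivered for each work."""
        return Counter(self.deliveries.values())
```

Answering "what" has been sold is a lookup from item to work; answering "how many" is a count per work.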
BigRetailer receives a Web Publication from EsteemedPublisher that it intends to add to its catalogue. BigRetailer wants to add its own “teaser” via an alternative reading order. To achieve that, BigRetailer provides its own version of the publication’s manifest, which the user agent will use instead of the publisher’s manifest.
Req. 19: The distribution of Packaged Web Publications should respect the existing processes and expectations of professional publishing channels as well as ad-hoc methods of distribution (e.g., email).
Ahmed acquires a Packaged Web Publication on an e-commerce platform. He expects to be able to receive the Web Publication as a file (rather than only having access to it) and to be able to load it onto his different reading devices.
Alice acquires a Packaged Web Publication through a subscription service and downloads it. When, later on, she decides to unsubscribe from the service, this Web Publication becomes unavailable to her.
Leila has just written a report for school as a Web Publication, but she is required to email it to her teacher. She takes advantage of the fact that it is possible to package up a publication and send it off.
We take for granted the relative durability of print artifacts, many of which have survived with little more than benign neglect. In contrast, digital documents are unlikely to persist without more active interventions, such as making copies, monitoring software dependencies, and validating integrity. Since future consumers of publications represent the most open-ended user group, it is desirable that digital documents be instilled with more of the inherent durability that characterizes print artifacts. Packaged Web Publications offer this potential, by making it easier for archiving services to locate, harvest, update, and describe digital publications. Long-term preservation of digital publications ensures that they may continue to be accessible, beyond the tenure of individual authors, file formats, publishers, or publishing platforms.
The fundamental use cases and requirements already address many of our archiving needs. For example:
Req. 20: There should be a way to indicate whether one or more Packaged Web Publication components contain (embedded) descriptive metadata.
An archiving service needs a reliable way to determine which, if any, Web Publication components contain descriptive metadata, such as that described in metadata and resources. Without such a mechanism, the archiving service will have to develop and maintain publisher- and/or platform-specific heuristics for locating or parsing out descriptive metadata, making archiving more expensive and decreasing the reliability of reporting.
An archiving service sets out to conduct an initial harvest of an article. Along with the images, markup, scripts, and style and layout instructions that constitute the object, it is able to locate a file containing descriptive metadata. The archiving service retrieves these resources and packages them into a logical archival unit for ingest into a preservation repository. A related process identifies and parses the descriptive metadata and saves its contents into an associated management database.
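One possible way for a manifest to flag metadata-bearing components is a relation on each listed resource, which an archiving service can then filter on instead of relying on publisher-specific heuristics. The sketch below assumes a hypothetical manifest layout in which each resource carries a `rel` list and the IANA link relation `describedby` marks descriptive metadata; neither the layout nor the relation value is defined by this document.

```python
from typing import List

# Hypothetical manifest: each resource lists its relations to the publication.
MANIFEST = {
    "resources": [
        {"url": "article.html", "rel": []},
        {"url": "figure1.png", "rel": []},
        {"url": "metadata.jsonld", "rel": ["describedby"]},
    ]
}


def metadata_components(manifest: dict) -> List[str]:
    """Return URLs of resources flagged as carrying descriptive metadata."""
    return [
        res["url"]
        for res in manifest.get("resources", [])
        if "describedby" in res.get("rel", [])
    ]
```

The archiving service would harvest every listed resource into the archival unit, and additionally hand the URLs returned here to the process that parses metadata into its management database.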
Req. 21: There should be a way to discover that one or more new components have been added to or deleted from a Web Publication.
An archiving service needs a reliable way to learn that one or more Packaged Web Publication components have been added to or removed from a Packaged Web Publication in order to be able to update the associated archive of the publication.
An archiving service regularly polls for changes to an article that it has already archived. One such poll indicates that several resources have been added to the object. The archiving service retrieves these resources and stores them as incremental updates to the appropriate archival unit in a preservation repository.
A publisher issues a retraction for a published article, resulting in the addition of new resources to the object (i.e., the retraction notice) and the removal of others (i.e., the article content). An archiving service regularly polls for changes to this article, which it has already archived, and discovers the retraction. The archiving service retrieves the new resources and records those that are no longer accessible, carrying over the cumulative updates to a preservation repository.
A copyright dispute results in the takedown of a published book. An archiving service regularly polls for changes to this book, which it has already archived, and discovers that it has been taken down. It records that the resources that constitute the object are no longer accessible and propagates this update to a preservation repository.
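Assuming the archiving service can obtain the current list of components at each poll (for instance from the publication's manifest; how that list is exposed is not specified here), detecting additions and removals reduces to a set difference against the list recorded at the previous harvest. The function name and the result keys are illustrative.

```python
from typing import Dict, Iterable, Set


def diff_components(archived: Iterable[str], current: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare component URLs from the last harvest with those seen now."""
    before, after = set(archived), set(current)
    return {
        "added": after - before,    # to retrieve and store as incremental updates
        "removed": before - after,  # to record as no longer accessible
    }
```

In the retraction case, the notice appears under `added` and the withdrawn article content under `removed`; in the takedown case, `current` is empty and every archived component is reported as removed.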
Req. 22: A PWP should include a means to map the identification of a constituent resource between the Web and its equivalent in a package.
In order to allow a Web Publication to be packaged without any changes to the content, it may be necessary to provide a mapping from the (absolute) URLs present in the publication to URLs that point to the constituent resources inside the package.
An archival service wants to harvest (spider) a Web Publication without having to modify the OWP content during the process. To achieve that goal, the manifest would incorporate a mapping from the URIs present in the OWP content to their new locations inside the archive.
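The following is a minimal sketch of how a user agent might apply such a mapping when it encounters an absolute URL in unmodified content. It assumes the mapping is available as a simple dictionary from absolute URLs to package-relative paths; that representation and the `example.org` URLs are hypothetical. Fragment identifiers are carried over, and unmapped URLs are left pointing at the live Web.

```python
from typing import Dict
from urllib.parse import urldefrag


def resolve_in_package(url: str, url_map: Dict[str, str]) -> str:
    """Map an absolute URL used in the content to its path inside the package."""
    base, fragment = urldefrag(url)  # split off any "#fragment"
    local = url_map.get(base)
    if local is None:
        return url  # not packaged: keep the original Web URL
    return f"{local}#{fragment}" if fragment else local
```

Because the lookup happens at resolution time, the HTML inside the package can keep its original absolute links unchanged.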
Req. 23: The publisher should be able to provide information in a Portable Web Publication proving that the publication has not been tampered with during delivery.
LegalPublisher Ltd. regularly publishes the official legal texts and regulations as decided by the local government. Michael, who is a lawyer, has access to these documents via his law firm and uses them for his cases; to do so, he must be 100% sure that the publication he accesses faithfully reproduces the latest governmental decisions. This can be done because LegalPublisher Ltd. adds the necessary cryptographic information to the Web Publication that becomes invalid if any resource of the Web Publication changes.
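One common form of such cryptographic information is a list of per-resource digests. The sketch below computes SHA-256 digests for each constituent resource and reports resources that are missing, unexpected, or modified. SHA-256 and the path-to-digest layout are assumptions for illustration; on its own, a digest list only detects tampering if the list itself is protected (for example, signed by the publisher), which this sketch does not cover.

```python
import hashlib
from typing import Dict, List


def digest_resources(resources: Dict[str, bytes]) -> Dict[str, str]:
    """SHA-256 hex digest of each constituent resource, keyed by path."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in resources.items()}


def tampered(resources: Dict[str, bytes], expected: Dict[str, str]) -> List[str]:
    """Paths that are missing, unexpected, or whose content no longer matches."""
    actual = digest_resources(resources)
    missing = [p for p in expected if p not in actual]
    unexpected = [p for p in actual if p not in expected]
    modified = [p for p in expected if p in actual and actual[p] != expected[p]]
    return sorted(missing + unexpected + modified)
```

The publisher would compute `digest_resources` at publication time and ship the result with the Web Publication; Michael's user agent would call `tampered` on receipt and warn him if the returned list is not empty.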
Req. 24: The publisher should be able to provide information in a Portable Web Publication that can be used to check the origin of the publication and its authenticity.
Michael, a lawyer who uses the publications of LegalPublisher Ltd., must be 100% sure that the publication he uses for his case has indeed been published by LegalPublisher Ltd. and not by a third party. This can be done because LegalPublisher Ltd. adds the necessary cryptographic information to the Web Publication proving its own identity.
Req. 25: User agents may provide a method for escalating trust for a specific publication. Some publications may require additional capabilities (for example, access to the camera or geolocation) that a user agent would not normally enable. Today, some platform and UA vendors offer methods for otherwise untrusted local scripts to become trusted and regain API privileges; a similar ability needs to exist for publications as well.
Luke has written another book, this time using all of the capabilities of the Open Web Platform that he can think of, including using the reader's location to adapt the content. He submits the book for review to a Web Publication retail platform, where the book is signed by the publisher. When the book is purchased, the UA detects that it came from a trusted source and has not been modified, and therefore allows it to use the full capabilities of the Web platform.
The information regarding the constituent resources of a Web Publication must be easily discovered, and there should be a way to differentiate between essential and non-essential resources.
The distribution of Packaged Web Publications should respect the existing processes and expectations of professional publishing channels as well as ad-hoc methods of distribution (e.g., email).
The publisher should be able to provide information in a Portable Web Publication proving that the publication has not been tampered with during delivery.
The publisher should be able to provide information in a Portable Web Publication that can be used to check the origin of the publication and its authenticity.
Ensuring accessibility is a strong requirement for publishers. Instead of dealing with accessibility features as separate requirements and use cases, this document includes accessibility use cases under the relevant requirements wherever appropriate. This table lists those requirements, together with the number(s) of the use cases within each requirement's section that are related to accessibility.
The following people have been instrumental in providing thoughts, feedback, reviews, content, criticism, and input in the creation of this document:
Boris Anthony (Rebus Foundation), Luc Audrain (Hachette Livre), Nick Barreto (Canelo, Invited Expert), Baldur Bjarnason (Rebus Foundation), Marcos Caceres (Mozilla), Timothy Cole (University of Illinois at Urbana-Champaign), Garth Conboy (Google), Dave Cramer (Hachette Livre), Romain Deltour (DAISY Consortium), Brady Duga (Google), Heather Flanagan (IETF, Invited Expert), Hadrien Gardeur (Feedbooks), Markus Gylling (IDPF), Eric Hellman, Ivan Herman (W3C), Deborah Kaplan (Invited Expert), Bill Kasdorf (BISG), George Kerscher (DAISY Consortium), Peter Krautzberger (MathJax, Invited Expert), Charles LaPierre (Benetech), Laurent Le Meur (EDRLab), Vladimir Levantovsky (Monotype), Mia Lipner (Pearson), Christofer Maden (University of Illinois at Urbana-Champaign), Shane McCarron (Spec-Ops), William McCoy (IDPF), Hugh McGuire (Rebus Foundation), Ben De Meester (iMinds), Liam Quin (W3C), Leonard Rosenthol (Adobe), Nicholas Ruffilo (Ingram, Invited Expert), Rob Sanderson (Stanford University), Avneesh Singh (DAISY Consortium), Mike Smith (W3C), Alan Stearns (Adobe), Ayla Stein (University of Illinois at Urbana-Champaign), Tzviya Siegman (Wiley), Nicholas Taylor (Stanford University), Daniel Weck (DAISY Consortium), and Benjamin Young (Wiley).