DPUB IG Telco, 2015-02-23: Identifiers, packaging, & manifests

(Meta comment: the W3C Digital Publishing IG has weekly teleconferences. The minutes of the meetings, as well as short summaries, are available online. However, to give them greater visibility, from now on these summaries will be published on this blog rather than only on the wiki.)

The meeting mostly concentrated on some technical issues around the EPUB-WEB vision. See the minutes online for a more detailed record of the discussions.

Metadata Task force and identifiers

Some of the crucial issues related to EPUB-WEB are around identifiers, fragments, etc. It was suggested that the former Metadata Task Force concentrate on these, identifying use cases and requirements primarily in the area of fragment identifiers. While the problem area around fragments is relatively clear, the issues around identifiers, and how they would affect EPUB-WEB, are more complex. Indeed, many identifiers in use today are based on registries and are only loosely coupled with HTTP URIs; also, many discussions in that space are happening outside this group. The way forward is probably to “reset” the Metadata Task Force, essentially by creating a new task force to make the intentions clear.

(There are some very initial thoughts on identifiers and EPUB-WEB on the epubweb wiki.)

Overview of the Web Packaging draft

The W3C Web Packaging draft was discussed to see how it would fit in the EPUB-WEB vision (as a possible alternative to ZIP). Ivan Herman has prepared some notes on the document on a wiki page.

Three main areas of attention in the draft are:

  1. Packaging itself, based on (essentially) a multipart MIME approach. The important point is that, conceptually, a package is a concatenation of HTTP responses, including HTTP headers, for specific resources into one package resource; the package itself may also have its own HTTP headers. This approach brings the package very close to current Web technologies, and allows rich metadata on each resource, as defined in the HTTP standard. (E.g., an EPUB “spine” can be implemented through these headers.)
  2. Fragment identifiers, as defined in the document, are resolved in three steps:
    1. define a set of “candidate” parts within the package (by listing a set of possible URLs, for example);
    2. choose among the candidates using some filters (essentially content negotiation based on media type or language);
    3. use a fragment as defined for that specific media type; i.e., EPUB-WEB can rely on existing and evolving fragment identifiers for different media types without having to invent its own.
  3. “Link relations”, either in the form of an HTTP Link header or an HTML <link> element. These provide a suitable entry point to an EPUB-WEB document: e.g., a landing page refers to the package (i.e., the possibly offline document).
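
To make the first two points more concrete, here is a minimal sketch in Python. It uses the standard library's MIME tooling as a stand-in for the packaging format; it does not follow the actual syntax of the Web Packaging draft. The `multipart/package` subtype, the file names, and the helper names `build_package` and `select_parts` are assumptions made for illustration only.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_package(parts):
    """Bundle (location, language, html_body) tuples into one multipart message.

    Each part carries its own headers, mimicking the idea of a package
    as a concatenation of HTTP responses with per-resource metadata.
    """
    package = MIMEMultipart("package")  # hypothetical subtype, for illustration
    for location, lang, body in parts:
        part = MIMEText(body, "html")
        part["Content-Location"] = location
        part["Content-Language"] = lang
        package.attach(part)
    return package


def select_parts(package, content_type=None, lang=None):
    """Filter the candidate parts by media type and/or language."""
    candidates = package.get_payload()  # list of sub-parts of a multipart message
    return [
        p for p in candidates
        if (content_type is None or p.get_content_type() == content_type)
        and (lang is None or p["Content-Language"] == lang)
    ]


if __name__ == "__main__":
    pkg = build_package([
        ("chapter1.html", "en", "<p>Call me Ishmael.</p>"),
        ("chapter1.fr.html", "fr", "<p>Appelez-moi Ishmael.</p>"),
    ])
    # The serialized package is plain text and can be read in a text editor.
    text = pkg.as_string()
    for part in select_parts(message_from_string(text), lang="fr"):
        print(part["Content-Location"])
```

Once a part has been selected, a fragment would be interpreted according to that part's own media type (step 3 above); the sketch stops short of that step.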

Subsequent discussions looked at the question of where such packaging would be advantageous compared to ZIP. The document mentions streaming, tooling support, and richer per-part metadata; the feeling on the call was that the last argument is the strongest in favor of Web Packaging (although the availability of HTTP-related tooling for handling the content of a package was also deemed important).

It is worth mentioning that Dave Cramer ran a test to see what the (ubiquitous) Moby Dick could look like as a package. The package can be downloaded from the Web (note that it is distributed as a “ZIP” file only to make it smaller to send by email; the package itself can be inspected in a text editor).

It was emphasized that the Digital Publishing community is in a unique position to strongly influence the evolution of Web Packaging, because the work is in its starting phase; there is a window of opportunity right now to join the relevant Working Group, possibly in an editor role.

Overview of the Manifest draft

The W3C Manifest draft was also discussed to see its relations to EPUB-WEB. Tzviya Siegman has prepared some notes on the document on a wiki page.

The question, from the EPUB-WEB point of view, is whether that manifest format can be used as a manifest for EPUB-WEB documents.

The manifest is a JSON file that can be associated with a resource via a specific <link> element. It has a number of metadata terms that are currently aimed at web applications (icons with their sizes, display formats, etc.). Three specific issues were brought forward:

  1. The manifest has a notion of “scope”: a URL that represents the set of URLs that can be navigated to within the same context (note that Web Packaging also has a notion of “scope”). It is not clear whether that functionality is enough to help EPUB-WEB with identification.
  2. Display mode: this is one of the terms defined by the manifest and may be very important for personalization.
  3. Openness (or closedness) of the manifest terms: is it possible to add or define additional terms that are more important to the publishing community? It was felt that some sort of extension structure, whereby various communities could add their own terms, would be a way forward, rather than casting a specific set of terms in concrete.
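
As a rough illustration of these three issues, the Python sketch below builds a small manifest and checks whether a URL falls within its scope. The `name`, `start_url`, `scope`, and `display` members come from the Manifest draft; the `x_publishing` member is a purely hypothetical extension term, and `within_scope` is a simplified assumption (same origin plus path prefix), not the draft's normative algorithm. The URLs are placeholders.

```python
import json
from urllib.parse import urlsplit

manifest = {
    "name": "Moby Dick",
    "start_url": "https://example.org/moby-dick/index.html",
    "scope": "https://example.org/moby-dick/",
    "display": "standalone",
    # Hypothetical community extension; not defined by the Manifest draft.
    "x_publishing": {"reading_order": ["chapter1.html", "chapter2.html"]},
}


def within_scope(url, scope_url):
    """Simplified check: same origin, and the path starts with the scope path."""
    target, scope = urlsplit(url), urlsplit(scope_url)
    same_origin = (target.scheme, target.netloc) == (scope.scheme, scope.netloc)
    return same_origin and target.path.startswith(scope.path)


if __name__ == "__main__":
    # The manifest is serialized as plain JSON.
    print(json.dumps(manifest, indent=2))
    print(within_scope("https://example.org/moby-dick/chapter1.html", manifest["scope"]))
```

An extension mechanism along these lines would let a reading system ignore terms it does not understand while still honoring the core members.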
