W3C

Apps or Documents? Manifests, JSON, and the Future of Publications

21 Sep 2016

See also: IRC log

Attendees

Present
dauwhe, garth, tzviya, mike, ivan, leonard, kenneth, glazou, romain, heather, brady, benjamin, boris
Regrets

Chair
Tzviya
Scribe
HeatherF


dauwhe: suggested the session last week. See summary: https://www.w3.org/wiki/TPAC2016/SessionIdeas#What.27s_new_in_pubrules_and_automated_publishing

correction: https://www.w3.org/wiki/TPAC2016/SessionIdeas#Apps_or_Documents.3F_Manifests.2C_JSON.2C_and_the_Future_of_Publications

dauwhe: how should the web address a discrete collection of things?

<tzviya> https://www.w3.org/wiki/TPAC/2016/SessionIdeas#Apps_or_Documents.3F_Manifests.2C_JSON.2C_and_the_Future_of_Publications

dauwhe: higher level than the doc object? Is a web app manifest the way?
... this has implications for digital publishing, and more than that
... may want to bundle web content

<leonardr> https://mikewest.github.io/origin-policy/#app-manifest

leonard: you also talked about metadata over a series of docs.
... "origin policy and origin policy manifest" which describes a way to define a set of metadata (or anything), common response headers,
... so it is a concept that might be tweaked for some of our needs
... leveraging a collection-wide metadata mechanism

dauwhe: is there a possibility of having a collection DOM element, above the document object?

mike: the group that's working on problems around publications

tzviya: DPUB is working on something for web publications. A publication is a collection of web documents which, for our purposes today, is published on the web
... a digital book may have many elements--video, images, text--and these may be actually separate files. In a book, you have a table of contents (TOC)
... and can easily jump from TOC to page $foo
... we talk about books, journals, magazines, self-published newsletters, etc

mike: at a high level, we should assume that we can make the web do what's needed for these problems
... making the user experience of books on the web a better user experience
... arguably, books are not a first class citizen of the web, and we want users on the web to have the best reading experience we can, even of long format content

<dauwhe> http://www.clickhole.com/blogpost/time-i-spent-commercial-whaling-ship-totally-chang-768

mike: in thinking in terms of scripting, or programmatically, what's possibly lacking that we could think about further is some representation in the DOM above document
... the highest level of representation in DOM is document
... for fulltext search, we have a bounded set of documents, and we don't have a great story about how to do this. We need to optimize for full text search against a collection of docs.
... aside: the word collection is not great; we deprecated objects in the DOM with the word "collection" in their names. Currently developers are using "sequence", which is ordered
... I think you want things to be ordered; a book is an ordered set of documents (usually)
... this makes the case different from a website, which may not need to be ordered
... another case that's important is TOCs. We don't have a good mechanism to easily generate a TOC/outline. We have the outline algorithm in the HTML spec, but browsers don't implement it
... it is good for accessibility, and would be good for other things. Regardless, that's the single doc case
... we need a way in many books to generate an outline for more than h1 to h6; that's what the outline algorithm is for
... but many books aren't a single doc; how do you generate an outline for a sequence of documents? There are ways of doing these things, but we don't have a standard way to do it
... we need a standard way to represent a book in the DOM
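
A minimal sketch of the multi-document outline idea, assuming a book is given as an ordered list of (url, html) pairs. It ranks headings by element name (h1 to h6) only; it is not the HTML outline algorithm, and the names HeadingCollector and outline_for_sequence are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for the h1-h6 elements of one document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open heading.
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline_for_sequence(documents):
    """Return (url, level, text) entries for an ordered sequence of documents."""
    outline = []
    for url, html in documents:
        collector = HeadingCollector()
        collector.feed(html)
        collector.close()
        outline.extend((url, level, text) for level, text in collector.headings)
    return outline


# Hypothetical two-chapter book, printed as an indented outline.
book = [
    ("ch1.html", "<h1>Loomings</h1><p>Call me Ishmael.</p><h2>The sea</h2>"),
    ("ch2.html", "<h1>The Carpet-Bag</h1>"),
]
for url, level, text in outline_for_sequence(book):
    print("  " * (level - 1) + f"{text} ({url})")
```

Because the documents are processed in order, the outline keeps the reading order of the sequence, which is the property that distinguishes a book from an unordered website in the discussion above.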

dauwhe: aside: while generating an outline from a sequence of documents, there is also value in having an additional navigation document
... EPUB does something like this and it comes close to what we're looking for

leonardr: about human curation, and extending the idea further, while this also needs to show up in the DOM, there needs to be a declarative model for that organization/sequence as well as the outline and subsections
... these are two of the aspects we look at when we talk about manifests
... Another point, it's not just the structure and navigation, but also to represent additional aspects
... e.g., here are elements in the reference and why they are important so a user agent doesn't have to parse the entire document; those items are called out up front (fetch this early, make sure this is cached)

duga: Do you intend that an entire pub or book would be loaded in the DOM at once?

Mike: no. You have these docs already, so no, you don't want to construct a DOM object for the whole book

duga: so how would this work?

Mike: magic (we don't know yet)
... a big component that needs to still happen is the offline case. The case where the user doesn't have a network but they want to continue to read the same book without losing content
... that's a solved problem with Service Worker. We will have implementations in all user agents soon.
... outside the document sequence idea, it would be a good idea to think of solutions in terms of building on top of SW
... we're already assuming that the solution involves using HTML, CSS, and JavaScript, so we should also assume it will involve SW

ivan: the usage of SW is one of the reasons why our community considered that listing all the resources a publication may need...

???: you can already do that in the SW; there is tooling that goes through all the resources

Kennethe: you do this when you create the app, it discovers the resources you need, what you need in what order
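
A rough sketch of the kind of build-time tooling mentioned here, assuming the publication is available as an ordered list of (url, html) pairs. It only looks at src attributes and link href; real tooling would cover more cases. The resulting list is the sort of thing that could be written into a service worker's resource array; ResourceCollector and list_resources are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Records absolute URLs referenced via src attributes and <link href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name == "src" or (tag == "link" and name == "href"):
                self.resources.append(urljoin(self.base_url, value))


def list_resources(documents):
    """Return each document URL followed by the resources it references.

    Order of first appearance is kept and duplicates are dropped.
    """
    seen = []
    for url, html in documents:
        if url not in seen:
            seen.append(url)
        collector = ResourceCollector(url)
        collector.feed(html)
        collector.close()
        for resource in collector.resources:
            if resource not in seen:
                seen.append(resource)
    return seen


# Hypothetical publication with a shared stylesheet.
docs = [
    ("https://example.org/book/ch1.html",
     '<link rel="stylesheet" href="style.css"><img src="img/whale.png">'),
    ("https://example.org/book/ch2.html",
     '<link rel="stylesheet" href="style.css">'),
]
for resource in list_resources(docs):
    print(resource)
```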

???: these need to be fetched, these need to be installed

ivan: it's good that you say that, because we may have some terminology issues to clear up
... when I as an author create a book, at that point, I have a place where I put the list for the SW to use

Kennethe: it is an array in the SW file

ivan: what we have been talking about up until now is that this is info in the manifest

Kennethe: marcos is almost convinced that you'll have a SW in a manifest

ivan: to clarify, there was some email from marcos this morning in which he referred to a SW API in the manifest - is that it?

Kennethe: yes. For a long time there was a discussion about whether the SW should be included in the manifest.
... installing a SW means downloading the SW and running an install step
... if that succeeds, then the pieces are installed/downloaded
... this is not yet in the editor's draft.
... SW is more stable now, and there are several ways you can call it. So, now the next step is to do this with manifest as well.

tzviya: there is an experimental SW reading system; it needs attention, but maybe more than one person should work on this. Any volunteers with experience with manifests and SWs would be appreciated!!!

Dave and Kennethe - FTW!

rdeltour: want to make sure we are done with SW? yes, so web client
... it was promising, but it failed. It made assumptions on the level of the headings based on hierarchical position instead of name
... was one of the few people that tried to implement it, but it has some value
... using the TOC generation will be even harder in a multidoc concept. Since browser vendors have moved on, is it even worth pursuing?

<tzviya> https://www.w3.org/TR/html51/sections.html#the-h1-h2-h3-h4-h5-and-h6-elements

Mike: the outline algorithm is a consequence of our changing HTML to allow h1 elements to be arbitrarily nested. As long as we have the HTML section and article elements, we're stuck.
... we could say "don't use h1 anymore"
... what they fixed in HTML5.1 is to say "don't nest h1"
... on the UA processing side, they still have to deal with the nested h1 case
... accessibility software doesn't use the outline algorithm either
... if all you've used in your doc is h1, then the screen reader won't see any structure

rdeltour: should we just move on, or should we put more effort into this?

Mike: don't know how to move on from this; in the multi-doc case, this doesn't change anything.

rdeltour: just ignore automatically generated TOC and manually create it?

Mike: that's what we are doing now

dauwhe: while talking about the collection DOM element, this might be a mechanism for solving another problem in the multidoc space
... There is info, esp CSS stuff, that needs to persist past doc boundaries, such as counters
... we need to resume from last known value, and there's no place to keep track of that info. If there's a higher level object, that might allow counter values to persist
... The other question is, if there is such an element in the DOM, can we have an element in meatspace? If there is an element in a doc, should it point to a file entity that instantiates this doc element?

ivan: to add to the list of things we don't understand, doesn't that mean that the CSS processing should also have this notion of multidoc?
... if we have to have list counters that jump from one doc to another, then this goes the same way. It's not only the DOM in HTML, but also the way CSS processes things?

astearns: I don't think so; we have longform docs in multiple chapters, and people figure out how to get list counters to persist. Let's keep doing that.

leonardr: a conversation this morning about storage and security, this is an interesting problem for a security perspective
... If we assume that we want each publication to be unique (unique origin) because as we add rich scripts, we don't want them to be able to influence other publications
... that then influences how we want their storage to go (both temporary and persistent storage)
... things like cookies, local storage, things we can do today - there's the simple case of local storage, but now span that to a collection of books (e.g., a collection of Harry Potter where you'd want to have info available across the collection)
... whatever we do has to abide by the security model of the web. In our design, we need to ensure we are designing around the security model of the web.
... some of these questions, esp. as we think about collections, become interesting and complex to fit into the security model of the web

tzviya: what we're talking about today is that we have ten files; you are talking about files of files

duga: please don't make us rely on previous chapter's counters to render current chapters. Don't want to have to render the whole book just to know this is list item #74
... it's a lot of work.

tzviya: if I am a publisher, if I have 3k footnotes in a book, please don't make me manually number them

duga: if you spread 500 footnotes per file, just note at the start of each file that it starts at 500/1000/etc
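
A small sketch of this suggestion, assuming the number of footnotes in each file is known at build time. It computes the first footnote number for each file so that no file depends on rendering the previous ones; the function name counter_starts is hypothetical, and how the value is recorded in each file (for example as a CSS counter-reset value) is left open.

```python
def counter_starts(footnotes_per_file, first=1):
    """Return the first footnote number for each file, in reading order."""
    starts = []
    next_number = first
    for count in footnotes_per_file:
        starts.append(next_number)
        next_number += count  # the next file continues where this one ends
    return starts


# Hypothetical book: three files with 500 footnotes each.
for index, start in enumerate(counter_starts([500, 500, 500]), start=1):
    print(f"file {index} starts at footnote {start}")
```

Re-running the computation whenever the counts change keeps the starting values current without manual renumbering.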

tzviya: my author may change the footnote numbers the day before publication

Kennethe: why footnote numbers? why not identifiers?

tzviya: that would be lovely, but that's not how the scholarly publishing community functions (and we can't change that, as much as we'd like to)

ivan: this is market reality
... we can't say "change how you've done things for centuries"

dauwhe: there are also human usability things here. If I'm reading Moby Dick in print, I get a visceral sense of where I am in the ocean of text
... as humans, we like having guideposts to things

<astearns> A reference number in printed material will be useful much longer than a URL will be, given current standards of web archivability

dauwhe: our reading systems have created analogs to these things (e.g., character counts, progress bars)

<astearns> so the conservative academic community might be on to something, at least for now

dauwhe: having a one dimensional indication of progress through a thing is of value
... what the best way to do that in an Internet environment is, not sure
... the system digesting the content is going to have to do some figuring out of how to do this

leonardr: yes and no. You cannot realistically consume an entire large book. There are cases where the current state of reading is not over the entire thing; it might just be the current piece you are viewing
... maybe you started in the middle of the content.
... Current models of navigation are evolving.

dauwhe: just saying there has to be a model, but saying that it's hard for computers is a cop-out.
... it will have to take into account the entire environment

leonardr: strongly disagree

glazou: listening to everything so far, long ago we wanted to have multi views, now we want multi doc, single view.
... we have to look at what remains constant. What is constant is the browsing context and the viewport.
... it could be a way to preserve data across the rendering of the documents, because it is about the rendering
... the browsing context and the viewport have the concept of scrolling, which takes into account the "size" of the book. This can be rendered in different ways
... it's worth investigating if we can glue something here to solve the collection problem, the CSS persistency problem
... earlier I heard a need for search, manipulation, and materializing a collection into a document instance. That's probably an HTML document, but the content is still TBD

Kennethe: if you want to use the manifest, it would probably make sense to have an entry called the TOC, and it refers to the doc that has an HTML element - where you start
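
A minimal sketch of what such an entry might look like, assuming a hypothetical "toc" member whose value is a URL (with a fragment naming the element to start from), resolved relative to the manifest's own URL. The Web App Manifest does not define "toc"; the member name, the example URLs, and resolve_toc are all assumptions for illustration.

```python
import json
from urllib.parse import urljoin

# Hypothetical manifest location and content. "name" and "start_url" are
# existing Web App Manifest members; "toc" is an assumed extension.
MANIFEST_URL = "https://example.org/moby-dick/manifest.json"
MANIFEST_TEXT = """
{
  "name": "Moby-Dick",
  "start_url": "index.html",
  "toc": "toc.html#contents"
}
"""


def resolve_toc(manifest_text, manifest_url):
    """Return the absolute URL of the TOC entry, or None if there is none."""
    manifest = json.loads(manifest_text)
    toc = manifest.get("toc")
    if toc is None:
        return None
    # Resolve against the manifest URL; the fragment is preserved.
    return urljoin(manifest_url, toc)


print(resolve_toc(MANIFEST_TEXT, MANIFEST_URL))
```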

tzviya: this is what we have now in the EPUB space
... any other thoughts from the silent observers in the room?

*crickets*

*brief rabble rousing moment*

<leonardr> https://w3c.github.io/manifest/

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.143 (CVS log)
$Date: 2016/09/21 15:15:31 $