W3C

– DRAFT –
EPUB 3 Locators Task Force

09 June 2021

Attendees

Present
avneeshsingh, dauwhe, dlazin, ivan, Laurent, laurent__, pilarw, pilarw2000
Regrets
tzviya
Chair
dauwhe
Scribe
dauwhe

Meeting minutes

dlazin: we're working on what might be reasonable to solve
… we appear to be heading towards a defined fall-back for page numbers
… a way to ensure pagelist is always present, even if not defined by the author
… we do want to encourage a pagelist in all EPUBs
… but have something to fall back to that is easy to implement
… Laurent and Hadrien had good comments about page numbers via email
… we don't think CFI is a bad idea, but we think it's a solution for a different set of problems
… these problems are around traditional page numbers

avneeshsingh: CFI will also be a problem in the future for HTML-based EPUB
… these are quite broad use cases
… did you decide to put some of them out of scope?

dlazin: there's a spreadsheet for the use cases
… the gdoc is out of date

<dlazin> https://docs.google.com/spreadsheets/d/1KO-HyLGUUw36F-ruAARHNiPO1aUJCCNeTv3zxGtjuHw/edit#gid=0

dlazin: column I shows whether things are in scope
… those are the page number things

pilarw2000: Dan, were you there last week?
… I missed it
… I did not see this in the minutes: Mary and I had been at an indexing conference
… we asked what to call page numbers in ebooks that don't have page numbers
… one suggestion was "pointer"
… the thing a reader would click on to get to content
… the thing in the index that might look like a section/page/para number that you click
… "pointer" is a good word for that
… we'd talked about translating a CFI into something readable; that would be a pointer

dlazin: don't forget that word :)
… anything that we name has to be worldwide
… users are familiar with pages
… the concept is functionally a page
… if we were to give it some other name, that might lead to irrelevancy
… the email said, I'm in favor of calling these things pages
… and encouraging reading systems to call their auto-numbering "screens"
… pages don't change as you change font size, but the screens do move

pilarw2000: would we call them screen numbers?

dlazin: we would do page numbers; RSs would do screens

laurent__: First, we cannot ignore the Kindle
… which has locations
… but it's not simple
… in Readium we call it position
… we didn't want to call it page
… I would never speak about screens
… screens is what you see, it's what's in the viewport
… no two experiences are the same
… to find something transportable, screens is not good
… we could keep the name page but there is confusion with pagelist
… we have to choose a name that is adequate

dlazin: we don't bother with the queues here :)
… to clarify, what we're trying to introduce is not dependent on viewport or screen size
… the problem is that reading systems already have a variable thing, and they won't drop those
… we do want to call the fixed thing pages
… for one thing, if there's a pagelist we'll use that
… and we want the same functionality
… and we're recognizing there's an existing variable thing
… I do believe screen is a bad name
… because of a11y

laurent__: RMSDK has been using positions with a basic calculation
… size of zipped resource in bytes, divided by 1024
… because they are using zipped html resource
… it has no real semantic meaning
… but it is easy to calculate
… we are wondering about using zipped vs unzipped HTML resource
… if we want to keep with RMSDK and the reading toolkits
… this is simple and adequate: zipped size divided by 1024
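A minimal sketch of the RMSDK-style heuristic Laurent describes, assuming positions are counted per spine resource from the compressed size stored in the ZIP directory (`ZipInfo.compress_size`). Rounding up and giving every resource at least one position are assumptions, not confirmed RMSDK behavior; `spine_hrefs` is a hypothetical input assumed to be already resolved from the OPF spine.

```python
import math
import zipfile

# Bucket size from the discussion: 1024 compressed bytes per position.
COMPRESSED_BYTES_PER_POSITION = 1024


def positions_from_compressed_size(compressed_bytes: int) -> int:
    """Position count for one resource (ceiling and minimum of 1 are assumptions)."""
    return max(1, math.ceil(compressed_bytes / COMPRESSED_BYTES_PER_POSITION))


def rmsdk_style_positions(epub_file, spine_hrefs):
    """Map each spine resource (path inside the ZIP container) to its position count.

    epub_file may be a filesystem path or a binary file object.
    spine_hrefs (hypothetical) lists container paths in reading order.
    """
    counts = {}
    with zipfile.ZipFile(epub_file) as zf:
        for href in spine_hrefs:
            info = zf.getinfo(href)  # raises KeyError if the entry is missing
            counts[href] = positions_from_compressed_size(info.compress_size)
    return counts
```

As Laurent notes, the number has no semantic meaning; its appeal is that it needs only the ZIP directory, not parsing or layout.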

dlazin: Brady said Google does something similar: Unicode code points divided by 1000
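A sketch of the code-point variant attributed to Google, assuming the count is taken over a plain string and rounded up; the discussion does not say which text is counted (markup or text content) or how remainders are rounded. In Python, `len()` on a `str` counts code points, so "日本語" is 3 code points even though it is 9 bytes in UTF-8.

```python
import math

CODE_POINTS_PER_POSITION = 1000  # figure reported in the meeting


def code_point_positions(text: str) -> int:
    """Positions for a string: code points / 1000, rounded up (rounding is an assumption)."""
    # len() on a Python str counts Unicode code points, not bytes or UTF-16 code units.
    return max(1, math.ceil(len(text) / CODE_POINTS_PER_POSITION))


def position_of_offset(offset: int) -> int:
    """1-based position containing the code point at a 0-based offset."""
    return offset // CODE_POINTS_PER_POSITION + 1
```

Unlike the byte-based rule, this result does not depend on how the file was compressed, although it does depend on Unicode normalization (a precomposed "é" is one code point, a decomposed one is two).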
… once we have a recommendation, we should ask a RS to prototype a few things

laurent__: we have tried uncompressed size / 2500, but it is about the same as compressed / 1024
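Why the two rules come out about the same: uncompressed / 2500 equals compressed / 1024 exactly when a resource compresses to 1024 / 2500 (about 41%) of its original size, so the rules drift apart as a file's compression ratio moves away from that value. A small sketch comparing them on one resource, with sizes as reported by `ZipInfo.file_size` and `ZipInfo.compress_size`; ceiling rounding is an assumption.

```python
import math


def ceil_div_positions(size: int, divisor: int) -> int:
    """Positions under a bytes/divisor rule (ceiling and minimum of 1 are assumptions)."""
    return max(1, math.ceil(size / divisor))


def compare_position_rules(uncompressed_bytes: int, compressed_bytes: int) -> tuple:
    """Return (uncompressed / 2500, compressed / 1024) position counts for one resource."""
    return (
        ceil_div_positions(uncompressed_bytes, 2500),
        ceil_div_positions(compressed_bytes, 1024),
    )


# The rules agree when compressed / uncompressed is close to this ratio.
BREAK_EVEN_RATIO = 1024 / 2500  # 0.4096
```

For example, a 25,000-byte chapter that deflates to 10,240 bytes gets 10 positions under either rule; if it deflates to 5,000 bytes, the compressed rule gives only 5.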
… you won't have interop until we agree
… if we try we can take everything

dlazin: I think we agree it's arbitrary
… and we're trying to set a standard

ivan: Laurent, could your team and the Google team write down their algos
… so we could compare
… I don't know the result of this task force
… to have something documented as a first step is good
… I don't know what other reading systems do

laurent__: Apple recalculates every time the font size or viewport changes

ivan: it depends on screen size, font size, etc
… that's not what we're looking for

laurent__: exactly

dlazin: if you provide a pagelist, Apple will use your pagelist and the screen number; you used to be able to toggle
… it will have both variable and fixed

ivan: who else can we contact? Kobo?

dauwhe: it may even depend on which Kobo implementation

dlazin: I can't get info from Google for a bit, as Brady and Garth are out

laurent__: yes, we can write down our heuristic algo

ivan: perfect

ivan: what about Bluefire?

laurent__: old Bluefire is RMSDK, new Bluefire is Readium

ivan: can someone ask about Japanese reading systems on Thursday night?

laurent__: I have to leave now for a readium call

dlazin: the time for this is arbitrary, we're open to moving it

ivan: the epub call is 4pm CET / 10AM ET
… it's simpler for me if that slot is taken by EPUB

dlazin: we are proposing to move to Friday 10AM ET

ivan: the first would be June 25

dlazin: we'll propose it

pilarw2000: are we still doing Wednesday nights?

ivan: yes, but Laurent and I can't participate

dlazin: we won't move those

ivan: we'll try and see what others say

dlazin: with Dave and Ivan here, it's a good time to talk about strategy
… what should we be producing
… and how do we get adopted by reading systems?
… let me lead the discussion
… maybe it's an experimental thing where we need adoption
… it's an attempt not a spec

ivan: there is an intermediate thing
… to be administrative for a moment
… a WG can publish WG notes
… about any subject
… sometimes it could be a design that is not final
… in this case, my suggestion would be to publish a doc as a W3C Note on the locator issue in general
… describing the problems and solutions
… and talk about the role of EPUB CFI
… and republish CFI with a note
… and the same note would document this algo
… this would not be a w3c recommendation
… would raise more interest; more people would look at it than a CG doc

pilarw2000: that's what Wendy has said

dauwhe: this sounds like incubation, and we could sell this

avneeshsingh: is the ultimate objective the algo?

pilarw2000: are we?

dlazin: I think so? It's a part of what we're providing. It's like an appendix.
… we are striving for an agreement that all RSs should have a fallback
… we need to decide about backward compatibility
… should we address all the existing epubs in the world
… I don't think we need to
… I think it's acceptable to be an initiative for the future
… it would be OK for us to handle backwards compat

ivan: what do you mean by backward compatibility?
… today, if author provides a page list, it's clear what the RS should do
… if an EPUB does not have a pagelist, then the RS may do something, but there is no incompatibility

dlazin: do we need to support all existing EPUBs?
… is this in authoring, ingestion, or reading system?

ivan: no author will calculate the bytes in the zip
… from my perspective, ingestion and RS is identical

dlazin: I think RS makes the most sense
… you could do it in authoring
… it could be every 1k characters
… and InDesign could implement it

avneeshsingh: as far as algo is concerned
… why do we worry about this?
… if there is an algo, it could be used anywhere

dlazin: some are only practical in RS

avneeshsingh: there are complexities in vertical writing
… how would that work?

pilarw2000: indexing is done and embedded in the doc; that's part of authoring
… I'm not sure I'd agree it's RS-dependent

dlazin: if you have an index in a book, and the index header is "carrot"
… 12, 38, 44
… indexer needs to put links in

pilarw2000: it's flipped around
… I've embedded the index entry in the text
… and then some process results in the pointer
… the page number comes after

ivan: this means that whatever algo we come up with
… should be oblivious to whether it is done in the RS or in an authoring tool
… so it shouldn't be the length of the zip file
… it could be the number of Unicode characters
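A sketch of a packaging-independent count along the lines Ivan suggests: count the Unicode characters of the text content, so the result is the same whether it is computed on loose XHTML in an authoring tool or on the files inside the EPUB ZIP. Skipping `head`, `script`, and `style` and ignoring whitespace (so pretty-printing does not change the count) are assumptions, and 1000 characters per position is only a placeholder borrowed from the Google figure mentioned earlier.

```python
import math
from html.parser import HTMLParser

# Elements whose content is not counted (an assumption for this sketch).
SKIPPED_ELEMENTS = {"head", "script", "style"}


class _TextCounter(HTMLParser):
    """Counts non-whitespace code points of text content outside skipped elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # "&amp;" arrives as "&"
        self.skip_depth = 0
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_ELEMENTS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_ELEMENTS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.count += sum(1 for ch in data if not ch.isspace())


def text_code_points(xhtml: str) -> int:
    counter = _TextCounter()
    counter.feed(xhtml)
    counter.close()  # flush any buffered text
    return counter.count


def positions_for_document(xhtml: str, chars_per_position: int = 1000) -> int:
    """Positions for one content document (ceiling and minimum of 1 are assumptions)."""
    return max(1, math.ceil(text_code_points(xhtml) / chars_per_position))
```

Because the input is the document text itself, an authoring tool and a reading system running the same rule should land on the same numbers; how this behaves for vertical writing, which Avneesh raises, is not addressed here.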

dlazin: there is another alternative
… that's hard, though, for reasons Avneesh explains

ivan: I don't think avneeshsingh is right here
… unicode is there; how they are displayed is a different question

avneeshsingh: it is different
… may be a small number of chars between pages

dlazin: you need page numbers for xref or index
… so we could say
… the pagelist is primary, algo is fallback
… you can only do indexing and xrefs with a pagelist
… that lets the algo be deferred to the RS
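A sketch of the "pagelist is primary, algo is fallback" rule: if the author's page-list has any entries they are used as-is; otherwise positions from some fallback algorithm are numbered sequentially through the spine. The `PageTarget` type, its fields, and the input shapes are hypothetical. The `from_pagelist` flag reflects the point made earlier that a reading system may want to label author pages and generated positions differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageTarget:
    label: str                                  # text shown to the reader
    href: str                                   # resource path (with fragment if from a pagelist)
    position_in_resource: Optional[int] = None  # 1-based index for fallback positions
    from_pagelist: bool = True                  # False for algorithm-generated positions


def page_targets(pagelist, spine_positions):
    """pagelist: (label, href) pairs from the author's page-list, possibly empty.
    spine_positions: dict of spine href -> position count from a fallback algorithm,
    in reading order (dicts preserve insertion order)."""
    if pagelist:
        # Author-provided page numbers always take precedence.
        return [PageTarget(label, href) for label, href in pagelist]
    targets = []
    number = 0
    for href, count in spine_positions.items():
        for index in range(1, count + 1):
            number += 1
            targets.append(PageTarget(str(number), href, index, from_pagelist=False))
    return targets
```

Only the fallback branch needs a shared algorithm; indexes and cross-references would still point at pagelist entries, as discussed.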

pilarw2000: you're also talking about TOC, endnotes

dlazin: TOC doesn't need page numbers
… more important for index

ivan: I'm not sure I follow
… the term authoring is too broad

pilarw2000: I'm working with the book right before it's final

ivan: the content is already there

dlazin: if Pilar finds a book without a pagelist, I'd expect her to request a new version with a pagelist or make one herself
… there could be a fallback at authoring time

ivan: that's the algo doable for both author and RS

dlazin: if Pilar is inserting page numbers, she can use arbitrary numbers

ivan: if indexing is not there, and RS still does it
… then you solve the use cases around shared location
… if the algo covers several use cases, that's important

pilarw2000: I don't generate the page numbers
… the publisher creates the page numbers for all the use cases
… InDesign is great for that

dlazin: I still don't agree
… there's no problem if we use the same algo, but I don't think it's necessary
… one case is a compiled ebook we don't touch, reading system generates positions
… the other case is where the page numbers are written into the files

ivan: is it better if it's the same algo?

dlazin: the algo is an implementation of the goals. It is an appendix.
… we derive the algo from our goals and requirements
… we want it to be useable at both authoring time and RS time
… and needs to work for all languages
… but doesn't need to work identically

avneeshsingh: should be consistent within a language

pilarw2000: as we go from spanish to english

avneeshsingh: if you have ten books in that language it should work consistently

dlazin: from strategy perspective
… we need to remember to reach out to Adobe and other authoring systems
… this will be vastly more useful
… the best case is for every EPUB to have a pagelist
… and for that we need InDesign, Scrivener, and Word

ivan: yeah
… I have a script that produces EPUBs of W3C docs
… I start with HTML
… for me to generate a pagelist doesn't work
… I'd need a script

dlazin: that's not the primary case

pilarw2000: when I'm indexing, instead of using page numbers I have distinct text for every hit under a concept
… every instance has a unique chunk of text

ivan: yes

pilarw2000: we do need to tell people that's what you should do

dlazin: next time, we should start working on the note
… we have things to write down

ivan: is that all?

avneeshsingh: thanks for helping me catch up

dlazin: thanks for coming!

Minutes manually created (not a transcript), formatted by scribe.perl version 136 (Thu May 27 13:50:24 2021 UTC).
