EPUB 3 Working Group Telco — Minutes

Date: 2021-11-19

See also the Agenda and the IRC Log

Attendees

Present: Ivan Herman, Rick Johnson, Avneesh Singh, George Kerscher, Toshiaki Koike, Masakazu Kitahara, Dave Cramer, Matthew Chan, Romain Deltour, Aimee Ubbink, Victor Lopes, ben, Victoria Lee, Ben Schroeter, Gregorio Pellegrino, Matt Garrish, Murata Makoto, Dan Lazin, Wendy Reid, Bill Kasdorf

Regrets:

Guests:

Chair: Wendy Reid

Scribe(s): Matthew Chan

Content:


Avneesh Singh: one of the best scribes has taken up!.

Dave Cramer: any new member introductions?.

Rick Johnson: rick johnson, vitalsource.

Murata Makoto: Welcome back!.

Ivan Herman: +1.

Dave Cramer: rickj has been major player in epub for a long time, glad to have him back.

1. Proposal for base URLs to be used for URL parsing (pr epub-specs#1898)

See github pull request epub-specs#1898.

Romain Deltour: basically the intent of the PR is to clarify how URLs are passed to epub.
… basic idea is to more clearly define root URL of the OCF container, and based on that precisely say that all URLs are to be passed relative to this root.
… one addition is that we do not define the root URL of the container, we say it is implementation specific.
… we just say RS must implement it such that it has certain properties.

Murata Makoto: :-) I know that it’s difficult..

Dave Cramer: we want certain behaviours to come out of this definition, but we don’t want to control RS implementation.

Ivan Herman: implication of this PR is that the problem of relative URIs being able to leak out of the container is gone.
… this PR defines a hard stop at the top of the container.

Murata Makoto: Good point.

Romain Deltour: to be clear, this is enforced in the RS spec.
… RS must implement the root URL in such a way that whatever the root URL looks like, it is something inside the container.
… so far we say that any relative URL is valid.
… we may limit this in the next agenda item.
… re. these properties we say must be defined on the root URL, it is not certain that all RS are currently done this way, so this PR may make some existing RS non-conforming.

Dave Cramer: can we test for this?.

Romain Deltour: I have created a test epub that uses js to display the URL of a content document, and based on this we may be able to infer the RS’s root URL for the container.
… based on this naive test (done on Readium, iBooks, and ADE), ibooks and Readium do not implement the root URL internally the way we have it defined in the PR.
… not yet applied this test to other RS.
… to answer the question, no, it’s not really testable.
… we can only try to infer it.

Ivan Herman: I think we do the right thing if we show there is a discrepancy between RS, that’s why we’re here, that’s what CR is for.
… also, a warning for the wg, if we get a green light to merge today, there will be some follow up issues raised afterwards, mostly editorial issues.
… these may require further discussion, but they are mostly minor and editorial.

Romain Deltour: yes, i agree.

Ivan Herman: so don’t be alarmed if you see follow up issues, it is intentional.

Dave Cramer: agree. Better to split the issue into manageable pieces.

Romain Deltour: in this PR we use one way to characterize the root URL, there may be something more minimal that would result in more RS being conforming.
… but we haven’t heard for that many RS folks.
… the benefit of this PR is that it clarifies a lot of things. We may need to lose some of this precision later..

Ivan Herman: we had one RS contribute in the PR.

Dave Cramer: any other comments?.

Proposed resolution: Merge #1898. (Dave Cramer)

Ivan Herman: +1.

Gregorio Pellegrino: +1.

Romain Deltour: +1.

Matthew Chan: +1.

Toshiaki Koike: +1.

Dan Lazin: +1.

Murata Makoto: +1.

Masakazu Kitahara: +1.

Dave Cramer: +1.

Murata Makoto: +i.

Ben Schroeter: +1.

Rick Johnson: +1.

Resolution #1: Merge #1898.

Ivan Herman: we should thank Romain for this. This situation has existed for a long time in the spec before Romain proposed a solution.

Dave Cramer: thank you Romain.

2. Allow characters in the emoji tag sequence in file names (pr epub-specs#1899)

See github pull request epub-specs#1899.

Dave Cramer: historically epub has been focused on interop and we’ve had some limits on characters allowed in file names.
… makes epubs portable across different OSes.
… as part of i18n review we’ve gotten a lot of feedback about allowing authors to use their own languages in file names.
… this has exposed more edge cases, e.g. allowing unicode chars in the emoji tag sequence.
… a sequence of unicode characters that renders out as the icon of a flag.
… this PR allows this, while still excluding some even more problematic characters.
… tested this with a file named as the Welsh flag, and ADE wasn’t happy with the result.

Matt Garrish: wasn’t just emoji characters, it was languages with variation selectors (e.g. mongolian script).
… not sure how often authors use this, but if its possible to author it i suppose we should allow it.
… after consulting with i18n group, we’ve identified 2 chars that are deprecated and which we still exclude.

Dan Lazin: seems like the primary concern here is that we want to support this, but we’re wary that its not supported today and we don’t want to give authors bad advice.
… physical readers either can’t or in practice don’t get updates, its not practical for an author to make an ebook using these emojis in file names.
… but then there’s the issue of these characters displaying in the stores.
… even if they work in RS.
… recommend that we make MUST statement in RS spec, SHOULD NOT statement in core, with the possibility that we change to MAY in core in future.

Murata Makoto: We might want to have a look at https://unicode.org/reports/tr51/#EmojiVersions.

Ivan Herman: how would we test this?.

Dan Lazin: in practice i think RS will support UTF-8 or not, but expecting that UTF-8 support will exist throughout the store ecosystem and legacy readers is hard.
… we can test, but we might not get 100%.

Matt Garrish: are we restricting this because these file names might not work in certain North American stores? Can’t the stores themselves decide their own policies for what they will allow?.

Ben Schroeter: +1 to Matt.

Dave Cramer: agree, let’s let unicode be unicode.

Romain Deltour: there are quite a few standards/api that define file names, some only exist as editor drafts or cg documents.
… the web generally lacks a unified model of a file system, but that means we don;.
… don’t have precedent on the web to rely on.
… and most of these APIs don’t restrict file names a lot.
… just say path specific characters can’t be part of the file name.
… to be safe, until we have a unified file system model for the web, we should loosen the restriction by changing the MUST NOT to SHOULD NOT, at least in the authoring system spec.

Romain Deltour: for the record, some of the standards I looked at:

Rick Johnson: we seem to be saying this is a supply chain issue, can we pass this over to the business group? Meanwhile we let unicode be unicode.

Avneesh Singh: after getting such nice feedback from i18n, I think this is a sign that we should not be restrictive here. Maybe a note that these characters are now allowed, but that some RS may not support it. At least for this revision..

Matt Garrish: i’m almost positive that there’s a note about zip tools that authors should stay within the ASCII range.
… to avneeshsingh’s point, perhaps we could generalize this note.

Murata Makoto: mgarrish where is this note you just referred to?.

Matt Garrish: it’s bottom of 6.1.3.

Dave Cramer: i think we should merge the PR. This part is uncontroversial. It satisfies i18n and keeps with our philosophy.
… do we need an additional note about the supply chain?.

Matt Garrish: or can we just expand the existing note we were talking about just now?.

Dan Lazin: i think we need a note that says caution when using unicode characters.
… if you’re distributing to only one store and you know that your store supports it, then go ahead, otherwise stay away.

Murata Makoto: i don’t like that. It discourages non-ASCII characters.

Dan Lazin: i want to encourage the use, just not sure it is safe to do so today.

Murata Makoto: i’ve heard that argument for 20 years, haha. That argument endangers the use of non-ASCII characters.

Matt Garrish: if we change the restriction from MUST NOT to SHOULD NOT, would that work MURATA?.

Murata Makoto: this issue is about emoji characters, if we start talking about non-ASCII we may be opening a can of worms.

Matt Garrish: i think the issue is just about what is allowed in file naming, and how restrictive the spec should be.

Dave Cramer: The current note says “Some commercial ZIP tools do not support the full Unicode range for File Names. EPUB Creators who want to use ZIP tools that have these restrictions may find it best to restrict their File Names to the [US-ASCII] range.”.

Dave Cramer: can we re-write above to avoid referring to US ASCII?.

Murata Makoto: this issue is about emoji only, so why are we writing a note about non-ASCII.

Proposed resolution: merge #1899. (Dave Cramer)

Dave Cramer: +1.

Ivan Herman: +1.

Dan Lazin: +1.

Romain Deltour: +1.

Matthew Chan: +1.

Matt Garrish: +1.

Avneesh Singh: +1.

Wendy Reid: +1.

Ben Schroeter: +1.

Victoria Lee: +1.

Rick Johnson: +1.

Dan Lazin: #1899 just changes “don’t use wide range of unicode” to “don’t use the two deprecated characters”.
… not controversial, I don’t think.

Murata Makoto: +1.

Resolution #2: merge #1899.

Dave Cramer: mgarrish do you want to try to reword that note a little?.

Matt Garrish: we’ll open an issue about this note?.

Dave Cramer: yes, please.

3. should we disallow “out-of-container” relative URLs? (issue epub-specs#1912)

See github issue epub-specs#1912.

Dave Cramer: there is a related problem of ../ repeated until it leaks out of the epub.
… should we just disallow such URLs?.
… i think yes.

Ivan Herman: isn’t it correct that we don’t need to do anything about this problem because what we just merged avoids any sort of security issue.
… this sounds like adding a “good practice” to the text, but its an extra measure to guard against RS that don’t follow what we merged in #1898.

Romain Deltour: this would add an authoring conformance requirement for URLs.
… such leaky URLs will likely create interop issues with legacy/non-conforming RS. Indeed, the result of #1898 is that these two are identical:

<item href="../../../../EPUB/content.xhtml"/>
<item href="EPUB/content.xhtml"/>

Romain Deltour: but to avoid conforming but non-interop friendly epubs, we just ban such URLs.
… #1898 makes the two URLs above equivalent, but i’m doubtful that in practice all RS will handle this properly.

Matt Garrish: an epub2 RS must still be able to open epub3. And because of that it makes sense to keep things consistent from an authoring perspective.

Rick Johnson: seems like this obligates a change in epubcheck? How does that change happen?.

Romain Deltour: epubcheck will be updated to implement epub 33, don’t worry.

Rick Johnson: thank you!

Dave Cramer: alignment of epub 33 spec and epubcheck will be smooth, thanks to mgarrish and romain.

Ivan Herman: there is already an alpha version of epubcheck for epub spec 33.

Romain Deltour: See Epubcheck alpha release for 3.3.

Dave Cramer: i think we just disallow out of container URLs.

Romain Deltour: in the issue i propose an algorithm to identify what is an out of container URL string.
… please comment in issue if you spot error there.

Proposed resolution: disallow “out-of-container” relative URLs. (Dave Cramer)

Dave Cramer: +1.

Rick Johnson: +1.

Ben Schroeter: +1.

Matt Garrish: +1.

Dan Lazin: +1.

Victoria Lee: +1.

Matthew Chan: +1.

Ivan Herman: +1.

Romain Deltour: +1.

Masakazu Kitahara: +1.

Bill Kasdorf: +1.

Toshiaki Koike: +1.

Wendy Reid: +1.

Murata Makoto: +1.

Resolution #3: disallow “out-of-container” relative URLs.

4. How to handle deprecation during CR.

Dave Cramer: i propose we push this to a different meeting given time restriction.
… it’s a complex topic.

Wendy Reid: can we outline the issue and give people time to think during the adjournment?.
… as part of our CR prep, we have to have a plan for what to do when we discover features that are problematic.
… once we go through the testing process we will likely discover some of these.
… it’s up to us what to do about these, but generally you deprecate these.
… however, we have an obligation to backwards compatibility.
… so what is our plan for these at risk features?.
… we’ve said “deprecate” in the past, but HTML spec uses various flavors of “obsolete”.

Dave Cramer: See HTML’s definition of “obsolete”.

Wendy Reid: we’re thinking of a similar model - deprecated features that might still have some life in them vs deprecated features that are pretty much done.

Ivan Herman: usually, in other WGs, if a feature doesn’t have the required number of implementations, that feature is just cut from spec.
… we are saying we can’t do that because of our backwards compatibility in the charter, and we have to convince the director of an alternative.
… re. vocab, personally my preference would be to just adopt the HTML spec’s vocab and definition.
… the content in the epub is HTML anyway, so being compatible seems to make a lot of sense.
… that’s something that we’d have to review in detail. It might just be an editorial change, maybe more..
… also, we have to think about how that maps to report from epubcheck.

Wendy Reid: okay, we can discuss this in the next meeting.

Dave Cramer: okay, thank you all.

Murata Makoto: Bye now!.

Ivan Herman: happy thanksgiving to all us friends!.


5. Resolutions