EPUB 3 Working Group Telco — Minutes

Date: 2021-04-23

Attendees

Present: Ivan Herman, Matthew Chan, Masakazu Kitahara, Toshiaki Koike, Dave Cramer, Ben Schroeter, Wendy Reid, Matt Garrish, Brady Duga, Charles LaPierre, Deborah Kaplan, Rachel Osolen, Gregorio Pellegrino, Dan Lazin, Garth Conboy, Hadrien Gardeur, George Kerscher

Regrets: Tzviya Siegman, Avneesh Singh

Guests:

Chair: Dave Cramer

Scribe(s): Matthew Chan

Content:

1. i18n Issues
2. xpointer shorthand for fragment identifiers too restrictive?
3. TF review
4. Resolutions

1. i18n Issues

Dave Cramer: https://github.com/w3c/epub-specs/issues/1628

1.1. CSS files encoding

See github issue #1628.

See github pull request #1645.

Dave Cramer: first, encoding of CSS files
… we say CSS files must be encoded in UTF8 or UTF16
… web is leaning towards UTF8 for everything
… i18n WG suggests we use UTF8 for everything
… but some existing epubs might already use UTF16
… proposal would deprecate UTF16 CSS files
… issue is the epubcheck would be obligated to display warning if it found UTF16 CSS
… not sure how big of a thing this is
… CSS WG has recommended UTF16 in past
… i’m okay with deprecating UTF16
… the proposed phrasing is in the issue

Ivan Herman: not sure how much books currently use UTF16, but epubcheck having to warn about this means that some changes will have to be made to epubcheck
… but this isn’t too big an obstacle
… there is already a PR
… ready to go
… it keeps the current sentence but adds “but UTF16 is deprecated”
… also, the exact same thing is happening in the spec for XML char encoding
… so the PR makes the conforming change to XML too

Matt Garrish: i’m not sure how much epubcheck even looks at CSS
… we’ve talked about deprecating UTF16 before, but we opted to keep it at the time for backwards compatibility reasons

Brady Duga: are there enough positive benefits to go ahead with this?
… also, deprecation feels like it should be a statement of intent (i.e. that we will remove in future), but the truth is that we intend to keep this compatibility forever

Dave Cramer: we’re trying to work with i18n WG

Brady Duga: is it deprecated in CSS?

Dave Cramer: they say you SHOULD use UTF8

Brady Duga: i would say “you should use UTF8”, but i’m not sure about deprecating UTF16

Ivan Herman: i suspect there are tools and scripts which simply do not work with UTF16
… so we minimize the chance of people trying to use UTF16 CSS epubs with these tools

Brady Duga: i’d be surprised if there are a lot of tools out there that don’t support both encodings

Matt Garrish: we are trying to align with the web, and what the other specs are saying
… i.e. guiding people towards only using UTF8

Dave Cramer: i understand the concerns with deprecation
… we could say CSS must be either UTF8 or UTF16, but it should be UTF8

Matt Garrish: that puts us more or less back in the same place, there’s going to be an epubcheck warning

Dave Cramer: and then people also have to go look up the definition of “deprecated”, so maybe not saying that is more straightforward

Ivan Herman: for sake of argument, Rust for example, which is gaining popularity, only uses UTF8

Proposed resolution: Change the PR from ‘deprecated’ to ‘should’ and then merge (Ivan Herman)

Charles LaPierre: Brackets is also only UTF8, complains it can’t read UTF16 files.

Ivan Herman: but yes, let’s change the PR

Matt Garrish: +1

Wendy Reid: +1

Ivan Herman: +1

Charles LaPierre: +1

Matthew Chan: +1

Dave Cramer: +1

Deborah Kaplan: +1

Ben Schroeter: +1

Masakazu Kitahara: 0

Brady Duga: +1

Toshiaki Koike: +1

Gregorio Pellegrino: 0

Resolution #1: Change the PR from ‘deprecated’ to ‘should’ and then merge

1.2. Length of file and path names

See github issue #1629.

Dave Cramer: file names must not exceed 255B, path name not more than 65535B
… so what happens if these limits are exceeded?
… my initial reaction is to say if you have some massively long file name and the RS knows what to do, then I don’t think we should ask RS to fail artificially

Dan Lazin: so maybe, “file names must abide, if not RS must truncate”?

Matt Garrish: we’ve talked about this in the past
… truncation leads to having to modify all the content
… but then how we enforce?
… maybe we could point to a method of truncation that RS can use
… but at best this is note territory
… we shouldn’t specify normative stuff to address this

Dave Cramer: basically if you go outside of what the rules are, things may break
… if the file names are truncated, the RS may not know what to do, may not be able to identify the file

Charles LaPierre: as long as epubcheck throws an error for long file names, most distributors won’t accept those epubs
… most people won’t see those books

Ivan Herman: a few weeks ago we had a discussion like this about content document processing (see spec intro for content docs)
… i think we should use the same text for the package document processing
… this is all the same sort of thing, where spec says what you should do, and we’re concerned with what RS should do if content does not conform
… if we change our mind here, then we should go back and change what we did for content document processing too

Dave Cramer: can we just say that if RS encounters file name or path exceeding limit that it should try to process it?
… and if it fails then it doesn’t have to do anything else?
… i.e. no obligation to truncate, but continue to process if it can

Brady Duga: we don’t say what RS should do, so whether I try to continue to process or not, I’d be in spec
… let’s just explicitly say that RS should try

Matt Garrish: should there be a note as well that if you try truncation, then you may have to go into the content as well?
… should we note the problems with truncation?

Dave Cramer: i think we’re kind of dealing with that by advising people not to use long file names in the first place
… but no strong feelings, either way

Wendy Reid: okay, so we’re leaving in the requirement about file name size

Ivan Herman: in the issue, Fuxiao gave i18n link that says spec should not try to limit length of names

Brady Duga: good point
… if you are JP using UTF8, that’s really not a lot of characters
… maybe we should just follow whatever the zip specification says and not add extra requirements

Matt Garrish: these requirements exist to try to maintain cross-compat
… we don’t want to be creating OS-based compatibility issues
… i’m concerned about dropping these requirements

Brady Duga: i think the 255B number is based on some legacy system (e.g. Pascal)
… somewhat arbitrary to enforce this today

Dave Cramer: isn’t this a limit of Windows? Not a limit of winzip or the zip spec?

Brady Duga: Mac paths apparently have a limit of 260 char, not byte

Dave Cramer: i am fine with keeping the limit in order to make epubs more universal
… maybe just add something that says RS should still try to process document if it encounters file name or path exceeding limits

Proposed resolution: Add a note to clarify that a RS should do what it can with filenames that exceed the limit (Wendy Reid)

Ivan Herman: +1

Dave Cramer: +255

Deborah Kaplan: +1

Matthew Chan: +1

Toshiaki Koike: +1

Masakazu Kitahara: +1

Ben Schroeter: +1

Brady Duga: +1

Matt Garrish: +1

Wendy Reid: +1

Gregorio Pellegrino: +1

Charles LaPierre: +1

Matt Garrish: it will be a SHOULD, not a note. It will be normative.

Resolution #2: Add a note to clarify that a RS should do what it can with filenames that exceed the limit

1.3. Case folding of the file names

See github issue #1631.

Dave Cramer: “file names must be unique following case normalization”

Matt Garrish: i think Addison’s note (in issue) about case folding is fine
… he also asked why we have a SHOULD for normalization, but a MUST for case folding
… not sure how to deal with these

Dave Cramer: i think we can have MUST for both?
… would we have something about which algorithm we’re using?

Proposed resolution: Change requirement for case normalization to a MUST and specify which algorithm we will use (Wendy Reid)

Brady Duga: +1

Dave Cramer: +1

Wendy Reid: +1

Ivan Herman: +1

Toshiaki Koike: +1

Matt Garrish: +1

Deborah Kaplan: +1

Masakazu Kitahara: +1

Ben Schroeter: 0

Garth Conboy: +1

Resolution #3: Change requirement for case normalization to a MUST and specify which algorithm we will use

1.4. Character restrictions on the file names

See github issue #1632.

Dave Cramer: character restrictions on file names
… there were typos
… already a PR in to fix those
… i think all we have to do is resolve to accept the PR here

Matt Garrish: the typo PR was already merged, actually

Dave Cramer: as far as i know, we now ban all non-characters

Matt Garrish: yes
… Fuxiao is asking why we banned certain parts of the Arabic Presentation Forms, but not other unassigned chars in other blocks

Dave Cramer: and I answered that the Arabic Presentation Forms thing was because those were non-characters
… we haven’t said anything about unassigned character blocks

Ivan Herman: we should ping Fuxiao to ask if it is okay to close the issue now

Dave Cramer: do we need to resolve to close issue?

Ivan Herman: I think we should, because we asked i18n WG

Proposed resolution: Close issue 1632 (Wendy Reid)

Deborah Kaplan: +1

Ivan Herman: +1

Garth Conboy: +1

Wendy Reid: +1

Matt Garrish: +1

Brady Duga: +1

Matthew Chan: +1

Toshiaki Koike: +1

Masakazu Kitahara: +1

Resolution #4: Close issue 1632

Dave Cramer: +1FFFF

2. xpointer shorthand for fragment identifiers too restrictive?

See github issue #1586.

Dave Cramer: does anybody know why we referenced xpointer in the first place?

Matt Garrish: maybe they were trying to make sure it was a CFI?
… but we’ve since thought about allowing CFIs

Brady Duga: does XHTML still require NCNames?

Matt Garrish: no
… can follow up with Marisa and Daniel?
… to clarify intent here

Dave Cramer: agree

Ivan Herman: I propose we table the container constrained scripts to next time

3. TF review

Wendy Reid: FXL TF, ken put together notes on review of Daisy KB
… also started working on outline for best practices doc
… virtual locators have reviewed CFI as solution
… looking at use-cases to make documentation more clear
… CFI seems to do most of what we want, but we also need CFIs to be something more human readable

Ivan Herman: for both TF, are we heading at some features that might influence the normative specs? Or are we producing additional documents?

Wendy Reid: for FXL, we’re looking at producing a WG note, and then updating things in Daisy KB
… for virtual locators, we’re not currently making normative changes, but we’ve not taken that off the table
… maybe as simple as elevating CFI to WG level

Ivan Herman: is the FXL being just a note good enough for the a11y community compared to all the normative things we have in the normative document

Wendy Reid: none of the best practice statements are going to be normative
… we’re just reiterating best practices from existing spec (maybe with references to those spec)

Ivan Herman: still kind of unclear about FXL being on the side when we have a separate a11y spec on rec track

Matt Garrish: its like the a11y techniques document, which will be a note

Ivan Herman: that means that whatever discussion they have on FXL will not influence any statement in core a11y spec?

Matt Garrish: yes

Ivan Herman: so, why not merge this with the a11y techniques document?

Matt Garrish: i assume this would be more focused on FXL, rather than having it mixed up with general a11y best practice

Ivan Herman: concerned that this will confuse users

Matt Garrish: we can clarify in the documents

Charles LaPierre: Sending Avneesh prayers for a speedy recovery for himself and his entire family as India is getting hit hard by COVID.

4. Resolutions

Resolution #1: Change the PR from ‘deprecated’ to ‘should’ and then merge
Resolution #2: Add a note to clarify that a RS should do what it can with filenames that exceed the limit
Resolution #3: Change requirement for case normalization to a MUST and specify which algorithm we will use
Resolution #4: Close issue 1632