Publishing Working Group Telco — Minutes

Date: 2019-04-15

See also the Agenda and the IRC Log

Attendees

Present: Ivan Herman, Tzviya Siegman, Deborah Kaplan, Mateus Teixeira, George Kerscher, Dave Cramer, Wendy Reid, Matt Garrish, Gregorio Pellegrino, Ben Schroeter, Laurent Le Meur, Avneesh Singh, Marisa DeMeglio, Joshua Pyle, Franco Alvarado, Brady Duga, Bill Kasdorf, Tim Cole, Garth Conboy, Jun Gamou, Ric Wright

Regrets: Luc Audrain, Benjamin Young, Romain Deltour

Guests:

Chair: Wendy Reid

Scribe(s): Dave Cramer

Content:


Wendy Reid: let’s get started

Wendy Reid: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2019/2019-04-08-pwg

Wendy Reid: last week’s minutes
… any comments?
… minutes approved

Resolution #1: last week’s minutes approved

1. publishing CG

Wendy Reid: we now have a Publishing Community Group!
… chaired by Mateus and Jeff Xu
… doing incubation
… please join

Wendy Reid: we’re going to talk about audio again

2. audiobook issues

2.1. duration

Wendy Reid: Issue: https://github.com/w3c/wpub/issues/420

Wendy Reid: Open Pull Request: https://github.com/w3c/wpub/pull/421

Wendy Reid: should duration be required?
… duration will become part of the core spec

Ivan Herman: there are 2 issues
… one is, if we take duration for one resource, the question that came up was what is the format for the value?
… one possibility would be to use the ISO 8601 Value, which is used by schema
… or we could use RFC 7826
… which is used in the media world
… the majority seem to favor the RFC value, as it’s more readable than ISO
… we had an issue #307 a while ago where that was decided, but it wasn’t clean in the doc
… we can reinforce this decision
… this is one part of it

Wendy Reid: I’ve tried to talk to danbri about this, no response yet
… the RFC value fits better with what we want to do, especially if we also want to reference media fragments
… can we merge the PR and close this issue?

Ivan Herman: do we want to make a new resolution, or the already decided one?

Wendy Reid: let’s stick with the NPT?

Geoff Jukes: I’m confused about the intent
… specifying the duration of the resource
… it’s the file, effectively
… that duration is only specified in seconds
… never anything else
… my concern is that putting in media fragments at the resource level doesn’t make sense
… if the intent is to conform to schema.org, then we should just use ISO
… so why NPT instead of using a double?
… I don’t know why it’s a consideration

Wendy Reid: media fragments will be a thing, although maybe more in TOC etc than in resources

Geoff Jukes: that’s not describing a resource, but metadata

Wendy Reid: we don’t want two different formats for these things

Geoff Jukes: I disagree
… and I’m having trouble with these very long discussions
… I think of a media fragment as a different thing than a resource

Deborah Kaplan: geoffjukes: +1 for calling out our confusing conversations as confusing. Thanks.

Ivan Herman: the NPT format is defined in such a way that the value can be just a number, which is seconds
… the author may choose to use raw seconds

Tzviya Siegman: i missed last week. is it possible to summarize the discussion?

Ivan Herman: in a way we jumped ahead
… we have to make a choice between ISO and RFC
… then during the discussion a third option became possible, just taking the number of seconds
… those are the three options
… so the question is which of the three?
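
For illustration only (not presented in the meeting), the three candidate value formats can be compared on one example duration. This is a minimal sketch: the function names are hypothetical, precision is limited to milliseconds, and the NPT output follows the hours:minutes:seconds form of RFC 7826.

```python
def _split(seconds: float) -> tuple:
    """Split a duration into hours, minutes, seconds and milliseconds."""
    total_ms = round(seconds * 1000)  # work in integers to avoid float drift
    whole, ms = divmod(total_ms, 1000)
    hours, rem = divmod(whole, 3600)
    minutes, secs = divmod(rem, 60)
    return hours, minutes, secs, ms


def _fraction(ms: int) -> str:
    """Render milliseconds as a decimal fraction without trailing zeros."""
    return f".{ms:03d}".rstrip("0") if ms else ""


def to_iso8601(seconds: float) -> str:
    """ISO 8601 duration (the form schema.org uses), e.g. PT1H2M3.5S."""
    hours, minutes, secs, ms = _split(seconds)
    out = "PT"
    if hours:
        out += f"{hours}H"
    if minutes:
        out += f"{minutes}M"
    if secs or ms or out == "PT":
        out += f"{secs}{_fraction(ms)}S"
    return out


def to_npt(seconds: float) -> str:
    """NPT in hours:minutes:seconds form, e.g. 1:02:03.5."""
    hours, minutes, secs, ms = _split(seconds)
    return f"{hours}:{minutes:02d}:{secs:02d}{_fraction(ms)}"


example = 3723.5  # option 3: the raw number of seconds
print(to_iso8601(example))  # option 1: PT1H2M3.5S
print(to_npt(example))      # option 2: 1:02:03.5
```

A bare number such as 3723.5 is also a valid NPT value, which is why raw seconds can be seen as a subset of the RFC format but not of the ISO format.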

Geoff Jukes: in addition to that, what is the desire to conform to schema? Is that a design principle?

Ivan Herman: we want the contents of the manifest to be accessible to the knowledge graph
… it’s mostly important for bibliographic metadata

Geoff Jukes: the desire to conform to schema is high, so we can obtain cross-vendor parsing capability
… is that correct?

Ivan Herman: yes

Laurent Le Meur: here we are speaking of the duration of a resource, not the duration of an audiobook. It’s not a property of a book.
… so it’s not tied to a need to express audiobook metadata for schema
… so we could use seconds, with a name other than duration (like runtime or length)
… and the audiobook industry would be happy with that

Geoff Jukes: I’d be happy with a new thing called length or whatever, that’s just a double
… it’s what we already do

Ivan Herman: to be clear I am just a messenger
… whatever the group decides is fine

Tim Cole: the decision could be made that in this community we would use duration but constrain the value to seconds
… i think this is OK
… it could be enforced via a context document
… we could also define our own property, and connect it to duration
… there are ways to express constrained versions of other properties

Ivan Herman: I don’t think that works
… schema uses the ISO format, and it doesn’t allow a simple number
… a number can be a subset of RFC, but not of ISO

Wendy Reid: the reason we were leaning on RFC is that it has only two ways to express time, including only seconds

Ivan Herman: I would propose to move on
… we define that property to have a value being a float consisting of the number of seconds
… with a new term like length

Proposed resolution: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource. (Wendy Reid)

Ivan Herman: +1

Laurent Le Meur: +1

Dave Cramer: +1

Franco Alvarado: +1

Tim Cole: +1

Geoff Jukes: +1

Joshua Pyle: +1

Marisa DeMeglio: 0

Avneesh Singh: +-0

Avneesh Singh: no strong opinion :)
… waiting for feedback from media sync people

Wendy Reid: does this impact sync media?

Marisa DeMeglio: I don’t think so… this is just properties of resources

Ben Schroeter: +1

Brady Duga: Abstain (don’t plan to use the value)

Marisa DeMeglio: this issue doesn’t need to get more complicated

Resolution #2: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource.
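
As a sketch of what Resolution #2 could look like in practice: the property was left unnamed, so the name "length" below is purely an assumption (one of the terms floated in the discussion), and the "readingOrder" and "url" keys and the file names are illustrative, not taken from the resolved text.

```python
import json

# Hypothetical manifest fragment; "length" stands in for the unnamed property.
MANIFEST_FRAGMENT = """
{
  "readingOrder": [
    {"url": "chapter1.mp3", "length": 1834.5},
    {"url": "chapter2.mp3", "length": 2210}
  ]
}
"""


def resource_lengths(manifest_text: str) -> list:
    """Return each resource's length in seconds, rejecting non-numeric values."""
    manifest = json.loads(manifest_text)
    lengths = []
    for resource in manifest["readingOrder"]:
        value = resource["length"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value < 0:
            raise ValueError(f"length must be a non-negative number: {value!r}")
        lengths.append(float(value))
    return lengths


lengths = resource_lengths(MANIFEST_FRAGMENT)
total_seconds = sum(lengths)
print(lengths)        # [1834.5, 2210.0]
print(total_seconds)  # 4044.5
```

Because the value is a plain number, a user agent can read it with an ordinary JSON parser and needs no duration grammar at the resource level.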

Ivan Herman: the other issue that came up is more controversial
… there may be a notion of duration of the whole audiobook
… it turned out that having that as book-level metadata is something that implementors may ignore
… they may deduce that from the individual resources
… but it may be helpful as a hint to the user, as a value in the catalog etc
… what I did, mostly to generate discussion, was to
… put a global property there, with the same format
… primarily defined for a user interface
… do we need this, or should we remove it from the PR doc?

Laurent Le Meur: I would say that the audiobook schema.org object supports the duration with ISO 8601
… it’s there and it’s optional, and it is what we want
… it’s descriptive metadata
… we could just adopt this and move on

Ivan Herman: +1 to laurent

Geoff Jukes: +1 to laurent

Wendy Reid: simple descriptive metadata

Proposed resolution: Schema.org’s Duration will be a required metadata descriptor for audiobooks (Wendy Reid)

Ivan Herman: +1

Ben Schroeter: +1

Laurent Le Meur: I thought the idea was to keep it optional, as in schema.org

Proposed resolution: Schema.org’s Duration will be a recommended metadata descriptor for audiobooks (Wendy Reid)

Laurent Le Meur: and it’s a ‘duration’ property (of type ‘Duration’)

Ivan Herman: +1

Laurent Le Meur: +1

Marisa DeMeglio: is this a different property?

Ivan Herman: yes

Marisa DeMeglio: -1

Deborah Kaplan: +1

Tim Cole: +1

Joshua Pyle: +1

Wendy Reid: this is schema.org duration descriptor for audio book, the length of the entire work, the sum of all the parts

Laurent Le Meur: see https://schema.org/Audiobook

Laurent Le Meur: and https://schema.org/duration

Geoff Jukes: it’s not the same concept
… it might be the sum of all resources, but it might be different, for example if there are non-book audio resources

Ivan Herman: +1 to geoffjukes

Geoff Jukes: +1

Geoff Jukes: so it’s ok to have a different name and format, and it’s good for it to be in schema.org so it’s universally digestible

Wendy Reid: it would be called duration, it would be the total length of the book, provided by the publisher

Deborah Kaplan: are we voting on making this required?

Ivan Herman: there is a mess-up
… there are 2 things here
… one is, what is the global descriptive metadata, and what value it takes
… and the only resolution we are proposing is to use duration with ISO as in schema.org
… and then there’s the question of whether this metadata item is required

Geoff Jukes: I would be happy for it to be required
… we have to send it to our publishers/distributors

Wendy Reid: when I said required I meant for the audiobook profile

Laurent Le Meur: q for geoffjukes. Why is it required?

Geoff Jukes: when we send ONIX we include runtime
… and they like to cross-reference to make sure they got the right book
… if it’s not required we’ll supply it anyway

Tzviya Siegman: +1 to limited metadata!

Dave Cramer: The web platform requires very little metadata. We should require the important things (title, author); this does not seem like required metadata
… I suggest we make it optional

Ivan Herman: +1 to dauwhe

Laurent Le Meur: +1 to dauwhe

Wendy Reid: for an audiobook it’s almost as important as title
… for a user to understand what they’re getting into
… to find out if it’s abridged or unabridged
… or if my phone will keep it
… I think it should be required

Ivan Herman: there is no requirement to provide metadata for the number of book pages
… but the same argument applies, ish
… I agree it is recommended
… but “must” is too far

Tzviya Siegman: I hate to prolong this discussion
… when we were deciding on EPUB metadata, lots of people said that title should be required
… but then you get into lots of nuance with what title means, but most systems don’t pay attention
… we should look into how systems work with information about length
… and how this will play out in the real world
… maybe the implementors can tell us more about how this information is used

Dave Cramer: It strikes me that many of the arguments for the utility of the information are about file size, not chronological duration. This information can be useful, but requiring it is not traditionally how the web works
… we run into issues of validation
… are we then going to get to a point where validators take the values and compare them
… requiring this is complicated

Laurent Le Meur: I agree on principle we shouldn’t require descriptive metadata
… and we should keep properties required for user agent functioning or content identification
… so we should recommend this, underlining all the advantages of using this

Brady Duga: when considering required metadata, we should ask if it’s impossible to create a book without this metadata.
… if it’s not impossible, we shouldn’t require it

Wendy Reid: I’m OK with recommended, even though y’all are completely wrong :)

Bill Kasdorf: vendors can still require it

Ivan Herman: we have 2 resolutions to take
… we never closed the previous resolution

Proposed resolution: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional. (Ivan Herman)

Ivan Herman: +1

Tim Cole: +1

Laurent Le Meur: +1

Bill Kasdorf: +1

Deborah Kaplan: +1

Geoff Jukes: +1

Ben Schroeter: +1

Dave Cramer: +1

Brady Duga: +1

Garth Conboy: +1

George Kerscher: +1

Resolution #3: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional.

Joshua Pyle: +1

Proposed resolution: Schema.org duration value is recommended metadata for the audiobooks profile (Wendy Reid)

Ivan Herman: +1

Laurent Le Meur: +1

Marisa DeMeglio: +1

Tim Cole: +1

Bill Kasdorf: +1

Ben Schroeter: +1

Deborah Kaplan: +1

Resolution #4: Schema.org duration value is _recommended_ metadata for the audiobooks profile
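
To illustrate the book-level value covered by Resolutions #3 and #4, here is a minimal sketch of how a user agent or catalog might read an ISO 8601 duration such as the one schema.org uses. The function name is hypothetical, and the parser deliberately accepts only the day, hour, minute and second designators, since years and months have no fixed length in seconds.

```python
import re

# Accepts forms such as PT30M, PT1H2M3.5S or P1DT2H (assumed subset of ISO 8601).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def iso8601_to_seconds(value: str) -> float:
    """Convert a restricted ISO 8601 duration string to seconds."""
    match = _DURATION.match(value)
    # A bare "P" or "PT" matches the pattern but carries no components.
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported ISO 8601 duration: {value!r}")
    parts = {k: float(v) if v else 0.0 for k, v in match.groupdict().items()}
    return (
        parts["days"] * 86400
        + parts["hours"] * 3600
        + parts["minutes"] * 60
        + parts["seconds"]
    )


print(iso8601_to_seconds("PT1H2M3.5S"))  # 3723.5
```

As noted in the discussion, this publisher-supplied value is descriptive and need not equal the sum of the per-resource values, so the sketch parses it for display and does not compare the two.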

Wendy Reid: can we move on and never speak of this again?

Ivan Herman: no
… I will make the edits according to the resolutions, is it OK to then merge?

Wendy Reid: +1

Ivan Herman: and then the PR and the issue can be closed then?

everyone: YES

2.2. file hashes

Wendy Reid: https://github.com/w3c/wpub/issues/398

Laurent Le Meur: we just need a name for the resource level property …

Wendy Reid: the issue is around file hashes, so content creators can provide identifiable hashes to individual resources
… the proposal is to use SRI

Ivan Herman: what term should we use
… this is not in schema, so we need to pick a term

Dave Cramer: Garth brought up the question of requirements on reading systems. It’s a problem in RSs: EPUB has signatures but RSs don’t always understand them
… if an integrity hash is present, the UA must check it and terminate processing if it does not pass

Brady Duga: hashes are great. If you want to pretend that these have anything to do with security or integrity I object.
… they do not provide this at all.
… they do not provide security.

Laurent Le Meur: I agree with the objection about security. I think it says something about integrity.
… I’m worried that some user agents might not be able to deal with any algorithm that is expressed
… is there a closed list of algorithms?

Dave Cramer: Can someone educate me as to why the SRI spec exists?

Ivan Herman: the big difference with SRI in HTML is that there it is mainly used for the JS you bring in when you use external JS
… I can’t really answer brady’s concerns
… if what I get from a URL as JS has the same hash that I expected, then I can believe it’s the correct JS
… but it may be different for audio files

Garth Conboy: I was going to disagree with Dave. I have no objection to this, but don’t want user agents to have to deal with this.

Geoff Jukes: it doesn’t provide security or integrity. we use it to communicate to our apps that a file was downloaded completely.
… we just use it to detect bad downloads.

Wendy Reid: do we want to include this?

Ivan Herman: how important is this?

Geoff Jukes: our apps rely on this utterly. We deliver to cellphones. Not everyone has 5G. We have to deal with unreliable delivery. We’re OK with this in the spec and optional.
… we will use this

Wendy Reid: this sounds like something that a distributor/reading system can handle on its own
… perhaps we ask other distributors/UAs?

Ivan Herman: isn’t that the definition of an optional thing?
… we know someone uses it.
… is it important to have a standard format?

Proposed resolution: add the optional integrity property for linked resources, using the subresource integrity format (Ivan Herman)

Wendy Reid: let’s add it as optional

Wendy Reid: +1

Garth Conboy: +1

Brady Duga: +1

Geoff Jukes: +1

Laurent Le Meur: +1

Ivan Herman: +1

Bill Kasdorf: +1

Tzviya Siegman: +1 (i think)

Joshua Pyle: +1

Tim Cole: +1

Resolution #5: add the optional integrity property for linked resources, using the subresource integrity format
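
As an illustration of the use case raised in the discussion (detecting incomplete downloads), here is a minimal sketch of producing and checking a value in the Subresource Integrity format, which is an algorithm name, a hyphen, and the base64-encoded digest. The function names are hypothetical, and the sketch handles a single hash only, whereas the SRI format also allows several space-separated values.

```python
import base64
import hashlib

# The SRI specification defines these three hash algorithms.
_SRI_ALGORITHMS = {
    "sha256": hashlib.sha256,
    "sha384": hashlib.sha384,
    "sha512": hashlib.sha512,
}


def make_integrity(data: bytes, algorithm: str = "sha256") -> str:
    """Build an SRI-style string such as 'sha256-<base64 digest>'."""
    digest = _SRI_ALGORITHMS[algorithm](data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def check_integrity(data: bytes, integrity: str) -> bool:
    """Return True if the data matches a single-hash integrity value."""
    algorithm, _, _expected = integrity.partition("-")
    if algorithm not in _SRI_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    return make_integrity(data, algorithm) == integrity


audio = b"stand-in bytes for a downloaded audio file"
integrity = make_integrity(audio)

print(check_integrity(audio, integrity))       # True: complete download
print(check_integrity(audio[:-5], integrity))  # False: truncated download
```

A check like this only shows that the bytes received match the bytes the manifest describes; as several participants pointed out, it says nothing about who produced either of them.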

Geoff Jukes: thanks everyone


3. Resolutions