Publishing Working Group Telco — Minutes
Date: 2019-04-15
See also the Agenda and the IRC Log
Attendees
Present: Ivan Herman, Tzviya Siegman, Deborah Kaplan, Mateus Teixeira, George Kerscher, Dave Cramer, Wendy Reid, Matt Garrish, Gregorio Pellegrino, Ben Schroeter, Laurent Le Meur, Avneesh Singh, Marisa DeMeglio, Joshua Pyle, Franco Alvarado, Brady Duga, Bill Kasdorf, Tim Cole, Garth Conboy, Jun Gamou, Ric Wright
Regrets: Luc Audrain, Benjamin Young, Romain Deltour
Guests:
Chair: Wendy Reid
Scribe(s): Dave Cramer
Content:
Wendy Reid: let’s get started
Wendy Reid: https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2019/2019-04-08-pwg
Wendy Reid: last week’s minutes
… any comments?
… minutes approved
Resolution #1: last week’s minutes approved
1. publishing CG
Wendy Reid: we now have a Publishing Community Group!
… chaired by Mateus and Jeff Xu
… doing incubation
… please join
Wendy Reid: we’re going to talk about audio again
2. audiobook issues
2.1. duration
Wendy Reid: Issue: https://github.com/w3c/wpub/issues/420
Wendy Reid: Open Pull Request: https://github.com/w3c/wpub/pull/421
Wendy Reid: should duration be required?
… duration will become part of the core spec
Ivan Herman: there are 2 issues
… one is, if we take duration for one resource, the question that came up was what is the format for the value?
… one possibility would be to use the ISO 8601 Value, which is used by schema
… or we could use RFC 7826
… which is used in the media world
… the majority seem to favor the RFC value, as it’s more readable than ISO
… we had an issue #307 a while ago where that was decided, but it wasn’t clear in the doc
… we can reinforce this decision
… this is one part of it
Wendy Reid: I’ve tried to talk to danbri about this, no response yet
… the RFC value fits better with what we want to do, especially if we also want to reference media fragments
… can we merge the PR and close this issue?
Ivan Herman: do we want to make a new resolution, or the already decided one?
Wendy Reid: let’s stick with the NPT?
Geoff Jukes: I’m confused about the intent
… specifying the duration of the resource
… it’s the file, effectively
… that duration is only specified in seconds
… never anything else
… my concern is that putting in media fragments at the resource level doesn’t make sense
… if the intent is to conform to schema.org, then we should just use ISO
… so why NPT instead of using a double?
… I don’t know why it’s a consideration
Wendy Reid: media fragments will be a thing, although maybe more in TOC etc than in resources
Geoff Jukes: that’s not describing a resource, but metadata
Wendy Reid: we don’t want two different formats for these things
Geoff Jukes: I disagree
… and I’m having trouble with these very long discussions
… I think of a media fragment as a different thing than a resource
Deborah Kaplan: geoffjukes: +1 for calling out our confusing conversations as confusing. Thanks.
Ivan Herman: the NPT format is defined in a way that it can have only a number, which is seconds
… the author may choose to use raw seconds
Tzviya Siegman: i missed last week. is it possible to summarize the discussion?
Ivan Herman: in a way we jumped ahead
… we have to make a choice between ISO and RFC
… then during the discussion a third option became possible, just taking the number of seconds
… those are the three options
… so the question is which of the three?
Geoff Jukes: in addition to that, what is the desire to conform to schema? Is that a design principle?
Ivan Herman: we want the contents of the manifest to be accessible to the knowledge graph
… it’s mostly important for bibliographic metadata
Geoff Jukes: the desire to conform to schema is high, so we can obtain cross-vendor parsing capability
… is that correct?
Ivan Herman: yes
Laurent Le Meur: here we are speaking on duration of resource, not duration of audiobook. It’s not a property of a book.
… so it’s not tied to a need to express audiobook metadata for schema
… so we could use seconds, with a name other than duration (like runtime or length)
… and the audiobook industry would be happy with that
Geoff Jukes: I’d be happy with a new thing called length or whatever, that’s just a double
… it’s what we already do
Ivan Herman: to be clear I am just a messenger
… whatever the group decides is fine
Tim Cole: the decision could be made that in this community we would use duration but constrain the value to seconds
… i think this is OK
… it could be enforced via a context document
… we could also define our own property, and connect it to duration
… there are ways to express constrained versions of other properties
Ivan Herman: I don’t think that works
… schema uses the ISO format, and it doesn’t allow a simple number
… a number can be a subset of RFC, but not of ISO
Wendy Reid: the reason we were leaning on RFC is that it has only two ways to express time, including only seconds
Ivan Herman: I would propose to move on
… we define that property to have a value being a float consisting of the number of seconds
… with a new term like length
Proposed resolution: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource. (Wendy Reid)
Ivan Herman: +1
Laurent Le Meur: +1
Dave Cramer: +1
Franco Alvarado: +1
Tim Cole: +1
Geoff Jukes: +1
Joshua Pyle: +1
Marisa DeMeglio: 0
Avneesh Singh: +-0
Avneesh Singh: no strong opinion :)
… waiting for feedback from media sync people
Wendy Reid: does this impact sync media
Marisa DeMeglio: I don’t think so… this is just properties of resources
Ben Schroeter: +1
Brady Duga: Abstain (don’t plan to use the value)
Marisa DeMeglio: this issue doesn’t need to get more complicated
Resolution #2: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource.
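For illustration, a minimal sketch of the point made during the discussion that a plain number of seconds is already a valid subset of the RFC 7826 NPT syntax, so the resolved float value loses nothing relative to NPT. The parser below is simplified (it handles only the seconds form and the hh:mm:ss form), and the property name `length` is a placeholder, since the resolution leaves the property unnamed.

```python
import re

# Simplified NPT forms: "5400.5" (seconds) or "1:30:00.5" (hh:mm:ss).
_NPT_SEC = re.compile(r"^\d+(\.\d+)?$")
_NPT_HHMMSS = re.compile(r"^(\d+):(\d{1,2}):(\d{1,2})(\.\d+)?$")


def npt_to_seconds(value: str) -> float:
    """Convert a simplified NPT time value to a number of seconds."""
    if _NPT_SEC.match(value):
        # The seconds-only form is already the float the resolution asks for.
        return float(value)
    match = _NPT_HHMMSS.match(value)
    if not match:
        raise ValueError(f"not a recognised NPT value: {value!r}")
    hours, minutes, seconds = (int(match.group(i)) for i in (1, 2, 3))
    if minutes > 59 or seconds > 59:
        raise ValueError(f"minutes and seconds must be below 60: {value!r}")
    fraction = float(match.group(4)) if match.group(4) else 0.0
    return hours * 3600 + minutes * 60 + seconds + fraction


# Hypothetical resource entry; "length" stands in for the unnamed property.
resource = {"url": "chapter1.mp3", "length": 5400.5}
assert npt_to_seconds("1:30:00.5") == resource["length"]
```

An ISO 8601 value such as `PT1H30M` is rejected by this parser, which reflects the remark that a bare number can be a subset of the RFC format but not of the ISO format.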
Ivan Herman: the other issue that came up is more controversial
… there may be a notion of duration of the whole audiobook
… it turned out that having that as book-level metadata is something that implementors may ignore
… they may deduce that from the individual resources
… but it may be helpful as a hint to the user, as a value in the catalog etc
… what I did, mostly to generate discussion, was to
… put a global property there, with the same format
… primarily defined for a user interface
… do we need this, or should we remove it from the PR doc?
Laurent Le Meur: I would say that the audiobook schema.org object supports the duration with ISO 8601
… it’s there and it’s optional, and it is what we want
… it’s descriptive metadata
… we could just adopt this and move on
Ivan Herman: +1 to laurent
Geoff Jukes: +1 to laurent
Wendy Reid: simple descriptive metadata
Proposed resolution: Schema.org’s Duration will be a required metadata descriptor for audiobooks (Wendy Reid)
Ivan Herman: +1
Ben Schroeter: +1
Laurent Le Meur: I thought the idea was to keep it optional, as in schema.org
Proposed resolution: Schema.org’s Duration will be a recommended metadata descriptor for audiobooks (Wendy Reid)
Laurent Le Meur: and it’s a ‘duration’ property (of type ‘Duration’)
Ivan Herman: +1
Laurent Le Meur: +1
Marisa DeMeglio: is this a different property?
Ivan Herman: yes
Marisa DeMeglio: -1
Deborah Kaplan: +1
Tim Cole: +1
Joshua Pyle: +1
Wendy Reid: this is schema.org duration descriptor for audio book, the length of the entire work, the sum of all the parts
Laurent Le Meur: see https://schema.org/Audiobook
Laurent Le Meur: and https://schema.org/duration
Geoff Jukes: it’s not the same concept
… it might be the sum of all resources, but it might be different, for example if there are non-book audio resources
Ivan Herman: +1 to geoffjukes
Geoff Jukes: +1
Geoff Jukes: so it’s ok to have a different name and format, and it’s good for it to be in schema.org so it’s universally digestible
Wendy Reid: it would be called duration, it would be the total length of the book, provided by the publisher
Deborah Kaplan: are we voting on making this required?
Ivan Herman: there is a mess-up
… there are 2 things here
… one is, what is the global descriptive metadata, and what value it takes
… and the only resolution we are proposing is to use duration with ISO as in schema.org
… and then there’s the question of whether this metadata item is required
Geoff Jukes: I would be happy for it to be required
… we have to send it to our publishers/distributors
Wendy Reid: when I said required I meant for the audiobook profile
Laurent Le Meur: q for geoffjukes. Why is it required?
Geoff Jukes: when we send ONIX we include runtime
… and they like to cross-reference to make sure they got the right book
… if it’s not required we’ll supply it anyway
Tzviya Siegman: +1 to limited metadata!
Dave Cramer: The web platform requires very little metadata, we should require the important things (title, author), this does not seem like required metadata
… I suggest we make it optional
Ivan Herman: +1 to dauwhe
Laurent Le Meur: +1 to dauwhe
Wendy Reid: for an audiobook it’s almost as important as title
… for a user to understand what they’re getting into
… to find out if it’s abridged or unabridged
… or if my phone will keep it
… I think it should be required
Ivan Herman: there is no requirement to provide metadata for the number of book pages
… but the same argument applies, ish
… I agree it is recommended
… but “must” is too far
Tzviya Siegman: I hate to prolong this discussion
… when we were deciding on EPUB metadata, lots of people said that title should be required
… but then you get into lots of nuance with what title means, but most systems don’t pay attention
… we should look into how systems work with information about length
… and how this will play out in the real world
… maybe the implementors can tell us more about how this information is used
Dave Cramer: It strikes me that many of the arguments for the utility of the information are about file size, not chronological duration. This information can be useful, but requiring it is not traditionally how the web works
… we run into issues of validation
… are we then going to get to a point where validators take the values and compare them?
… requiring this is complicated
Laurent Le Meur: I agree on principle we shouldn’t require descriptive metadata
… and we should keep properties required for user agent functioning or content identification
… so we should recommend this, underlining all the advantages of using this
Brady Duga: when considering required metadata, we should ask if it’s impossible to create a book without this metadata.
… if it’s not impossible, we shouldn’t require it
Wendy Reid: I’m OK with recommended, even though y’all are completely wrong :)
Bill Kasdorf: vendors can still require it
Ivan Herman: we have 2 resolutions to take
… we never closed the previous resolution
Proposed resolution: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional. (Ivan Herman)
Ivan Herman: +1
Tim Cole: +1
Laurent Le Meur: +1
Bill Kasdorf: +1
Deborah Kaplan: +1
Geoff Jukes: +1
Ben Schroeter: +1
Dave Cramer: +1
Brady Duga: +1
Garth Conboy: +1
George Kerscher: +1
Resolution #3: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional.
Joshua Pyle: +1
Proposed resolution: Schema.org duration value is recommended metadata for the audiobooks profile (Wendy Reid)
Ivan Herman: +1
Laurent Le Meur: +1
Marisa DeMeglio: +1
Tim Cole: +1
Bill Kasdorf: +1
Ben Schroeter: +1
Deborah Kaplan: +1
Resolution #4: Schema.org duration value is _recommended_ metadata for the audiobooks profile
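As a sketch of what Resolutions #3 and #4 imply for an author, the function below formats a number of seconds as an ISO 8601 duration string of the kind schema.org's `duration` property expects. It is an assumption-laden illustration: fractional seconds are truncated, only hours, minutes, and seconds are emitted, and the manifest fragment is hypothetical apart from the `Audiobook` type and `duration` property named in the discussion.

```python
def seconds_to_iso8601(total_seconds: float) -> str:
    """Format a non-negative number of seconds as an ISO 8601 duration."""
    if total_seconds < 0:
        raise ValueError("duration cannot be negative")
    whole = int(total_seconds)  # truncates any fractional part
    hours, remainder = divmod(whole, 3600)
    minutes, seconds = divmod(remainder, 60)
    result = "PT"
    if hours:
        result += f"{hours}H"
    if minutes:
        result += f"{minutes}M"
    # Always emit at least one component so that zero becomes "PT0S".
    if seconds or result == "PT":
        result += f"{seconds}S"
    return result


# Hypothetical book-level metadata, with the value supplied by the publisher.
manifest = {"type": "Audiobook", "duration": seconds_to_iso8601(5400)}
assert manifest["duration"] == "PT1H30M"
```

As noted in the discussion, the book-level value is provided by the publisher and need not equal the sum of the per-resource values, so this sketch covers only the format and not how the total is obtained.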
Wendy Reid: can we move on and never speak of this again?
Ivan Herman: no
… I will make the edits according to the resolutions, is it OK to then merge?
Wendy Reid: +1
Ivan Herman: and then the PR and the issue can be closed then?
everyone: YES
2.2. file hashes
Wendy Reid: https://github.com/w3c/wpub/issues/398
Laurent Le Meur: we just need a name for the resource level property …
Wendy Reid: the issue is around file hashes, so content creators can provide identifiable hashes to individual resources
… the proposal is to use SRI
Ivan Herman: what term should we use
… this is not in schema, so we need to pick a term
Dave Cramer: Garth brought up the question of requirements on reading systems, it’s a problem in RSs, EPUB has signatures but RSs don’t always understand them
… if an integrity hash is present, the UA must check it and terminate processing if it does not pass
Brady Duga: hashes are great. If you want to pretend that these have anything to do with security or integrity I object.
… they do not provide this at all.
… they do not provide security.
Laurent Le Meur: I agree with the objection about security. I think it says something about integrity.
… I’m worried that some user agents might not be able to deal with any algorithm that is expressed
… is there a closed list of algorithms?
Dave Cramer: Can someone educate me as to why the SRI spec exists?
Ivan Herman: the big difference with SRI in HTML is that there it is mainly used for external JS that you bring in
… I can’t really answer brady’s concerns
… if what I get from a URL as JS has the same hash that I expected, then I can believe it’s the correct JS
… but it may be different for audio files
Garth Conboy: I was going to disagree with Dave. I have no objection to this, but don’t want user agents to have to deal with this.
Geoff Jukes: it doesn’t provide security or integrity. we use it to communicate to our apps that a file was downloaded completely.
… we just use it to detect bad downloads.
Wendy Reid: do we want to include this?
Ivan Herman: how important is this?
Geoff Jukes: our apps rely on this utterly. We deliver to cellphones. Not everyone has 5G. We have to deal with unreliable delivery. We’re OK with this in the spec and optional.
… we will use this
Wendy Reid: this sounds like something that a distributor/reading system can handle on its own
… perhaps we ask other distributors/UAs?
Ivan Herman: isn’t that the definition of an optional thing?
… we know someone uses it.
… is it important to have a standard format?
Proposed resolution: add the optional integrity property for linked resources, using the subresource integrity format (Ivan Herman)
Wendy Reid: let’s add it as optional
Wendy Reid: +1
Garth Conboy: +1
Brady Duga: +1
Geoff Jukes: +1
Laurent Le Meur: +1
Ivan Herman: +1
Bill Kasdorf: +1
Tzviya Siegman: +1 (i think)
Joshua Pyle: +1
Tim Cole: +1
Resolution #5: add the optional integrity property for linked resources, using the subresource integrity format
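A minimal sketch of the use case described in the discussion (detecting incomplete downloads) using the Subresource Integrity value format: an algorithm name, a hyphen, and the base64-encoded digest. The function names are hypothetical, and the sketch assumes a single hash per value, whereas SRI metadata may list several separated by spaces. The three algorithms accepted here are the ones the SRI specification requires user agents to support.

```python
import base64
import hashlib

SUPPORTED_ALGORITHMS = ("sha256", "sha384", "sha512")


def sri_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Return an SRI-style value such as 'sha256-<base64 digest>'."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def download_is_complete(data: bytes, integrity: str) -> bool:
    """Check downloaded bytes against a single-hash integrity value."""
    algorithm, _, _ = integrity.partition("-")
    return sri_hash(data, algorithm) == integrity


# Hypothetical resource entry carrying the optional integrity property.
audio = b"stand-in bytes for an audio file"
resource = {"url": "chapter1.mp3", "integrity": sri_hash(audio)}

assert download_is_complete(audio, resource["integrity"])
assert not download_is_complete(audio[:-1], resource["integrity"])  # truncated
```

In line with the objections raised, a matching hash here only indicates that the bytes received are the bytes described in the manifest. It says nothing about whether the manifest itself can be trusted.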
Geoff Jukes: thanks everyone
3. Resolutions
- Resolution #1: last week’s minutes approved
- Resolution #2: The time length property (unnamed) will only use a float consisting of the number of seconds of the resource.
- Resolution #3: duration is a descriptive metadata for WP, whose value is the ISO format (as used in schema.org). It is optional.
- Resolution #4: Schema.org duration value is recommended metadata for the audiobooks profile
- Resolution #5: add the optional integrity property for linked resources, using the subresource integrity format