<NickRuffilo> scribenick: nickruffilo
Tzviya: A number of regrets - so
a change in the agenda. We thought we would have Josh and
Franco to talk about use cases and requirements. Franco is on
vacation and Josh sent regrets. The 2nd half of the meeting will be
Wendy following up on audiobook issues.
... Let's approve the minutes from the Asia-friendly timezone meeting,
as well as our meeting from 2 weeks ago
<mateus> I'll scribe if needed
<tzviya> https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2019/2019-02-08-pwg
<tzviya> https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2019/2019-02-11-pwg
Tzviya: Any comments? I hope you read the summary, but minutes approved
RESOLUTION: meeting of the 8th (asia) approved
Tzviya: Any comments on the meeting from 2 weeks ago? Minutes approved
RESOLUTION: meeting of the 11th approved
<tzviya> https://github.com/w3c/wpub/issues/376
Tzviya: Next item on the agenda
is the TOC. We need to come to a resolution on this issue.
GitHub link posted. This is about whether the TOC is directly
encoded in the manifest. We came close to a resolution but
didn't quite get there. Continued on GitHub (and
Twitter...)
... the last point of discussion in our meeting was that - with
audiobooks they don't necessarily have a TOC, but we don't make
our specification audiobook specific or exclusive. So if
serializing HTML or JSON were really very different...
... then some discussion about if the TOC is required. There's
a link in the github comments... If the TOC isn't required then
it doesn't matter if it's HTML or JSON. I believe what we're
coming around to is the TOC is required to be in HTML but it
still remains optional
<garth> Proposal: for AudioBooks, ToC is optional; serialized in HTML.
Tzviya: the proposal is for WP, not just audiobooks
Garth: we can have different rules for different profiles.
Wendy: I want to state, we
discussed this last time, and I believe it was Geoff - for
audiobooks, we're having the opposite problem. They don't want
HTML. In the current environment for audiobooks, no one is using
HTML. None of the readers are using HTML for audiobooks, so
introducing it could cause a lot of issues.
... it's still good that different profiles will have a
different implementation, I just want to say that it shouldn't
have to be in HTML - Audiobooks have TOCs, but requiring HTML
is the issue.
Ivan: the fact that the TOC is optional is already in the document.
Tzviya: thank you for clarifying
Brady: Is JSON already used by audiobook publishers? For reading systems/listening systems?
Wendy: Blackstone and Kobo use JSON.
Garth: I was under the impression
that the proposal I put in was where we were in agreement. If
it's optional, then we don't really care about the
serialization - so any system ingesting audiobooks can read any
TOC. And we'd have rules as to how to serialize a TOC in
HTML...
... I am not in favor of widening it to additional
serialization
Geoff: We don't use HTML for TOC,
we use JSON - the main reason is that we don't want our
publishers to define how things are displayed. It's optional
and its use is optional - so we don't mind it being part of
the spec. If the TOC is optional, but having it requires we use
it as such, that will not be OK.
... our applications are not EPUB 3, they are audiobooks, and
different.
... It would require additional application development for us
to deserialize the HTML. It would also provide no value to us
or our customers, so it seems crazy to think about.
Tzviya: I will point out that one
of the things that comes up when working on specs is that we
have to make compromises. It might require changes to the tool
chains and how they operate. What we're hearing is that Google
creates audiobooks in one way and Blackstone another. One will
have to change.
... The value of creating the table of contents in HTML is that
it would be useful in more than 1 implementation, not just
audiobooks. All types of publications would have to allow for
that possibility.
... instead of one publisher saying "I have to adjust my tool
chain" all might have to think about the options. One
possibility is we make a change that affects only audiobooks -
but we're trying to make things generic if we can.
<mateus> +1 duga
<bigbluehat> +1 to HTML has more (i.e. i18n, a11y, etc)--and that's good
Brady: This has nothing to do with existing tool chain - we can parse HTML. If we adopt JSON it's trivial. My note is that HTML is a better representation because it can do things like ruby which allows for more languages
Laurent: the reading system would have to handle both - just to clarify, not the publishers.
Tzviya: we do have to make a decision. Consensus is difficult, but compromise is key
<garth> +1 Ivan
Ivan: What we can try to do as a way forward, in general, for web publication and the various profiles: we do not define a TOC in JSON. Some profiles may do something specific to their profile. The Audiobook profile could say: "for audiobooks, it's possible to do it in JSON or HTML. It's up to the publishers which way to go." What I do not want to see is that Audiobooks say: 'you must not do it in HTML, you must do it in JSON'. That will really [CUT]
Wendy: I agree with Ivan. I understand why HTML is important. It would be interesting to see audiobooks adopt a rich-text experience. In terms of adoption, the industry as it stands today would be better served with JSON: a JSON TOC within the manifest, then an HTML document with a rich version of the TOC.
<laudrain> +1 to wendyreid
Wendy: I know giving publishers options can create issues, but at the very least - if the publisher wants to create a rich-text version, they can.
Ivan: What this means is that in
the audio profile, we'll have to spec out not only the format,
but also which one wins if the publisher supplies two types of
TOCs. For the general web publication, what we have
is what we have.
... We'll have to spec this out for audiobooks only, and what
it means for something to be audiobook and not audiobook.
Garth: I don't think of the HTML as a rich-text format, as I don't see it as something that is displayed. It's meant to be machine readable. What I'd propose along the lines of compromise: in Audiobook land, the TOC can be JSON or HTML. If there are both, the HTML version wins.
<Zakim> bigbluehat, you wanted to ask if metadata on LinkedResource is sufficient https://w3c.github.io/wpub/#linkedResource
<Avneesh> +1 Garth, html should win
Benjamin: Looking at linked resource, this is more an ask for Wendy and Geoff, but is that sufficient for the TOC you're producing now? Or are you needing a TOC that is more complex and doesn't map to the reading order?
Wendy: Right now, the average audiobook we receive has a very basic TOC. Chapter 1 is this, and X long; chapter 2, etc... Sometimes we don't even get chapters. Having a detailed TOC for the audio industry might be a push...
Benjamin: So that could be ranges of parts of files, right?
Wendy: Yes, it could point to timings within files.
<tzviya> scribenick: mateus
<garth> Proposal: For AudioBooks ToC may be provided as serialized in either HTML or JSON, if both are present, only the HTML will be processed by the Reading System.
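The precedence rule in the proposal above can be sketched in a few lines. This is only an illustration under assumptions: the function name `choose_toc` and the way the two ToC forms are passed in are hypothetical, and nothing here is defined by the Web Publications draft.

```python
from typing import Any, Optional, Tuple


def choose_toc(html_toc: Optional[str],
               json_toc: Optional[Any]) -> Optional[Tuple[str, Any]]:
    """Pick the ToC a reading system would process under the proposal.

    Hypothetical helper: returns ("html", toc) or ("json", toc),
    or None when the publication carries no ToC (the ToC is optional).
    """
    if html_toc is not None:
        # If both serializations are present, only the HTML one is processed.
        return ("html", html_toc)
    if json_toc is not None:
        return ("json", json_toc)
    return None
```

A reading system that internally prefers JSON could still convert the chosen HTML ToC to its own structure after this step.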
NickRuffilo: Could we have two
different designations for a TOC? Machine readable &
visual? JSON could be machine-readable, for example.
... And, forgot my other point.
<NickRuffilo> scribenick: nickruffilo
Geoff: We don't use the JSON to
define parts of resources - we don't include timestamps. Our
dataset is very minimal. It's covered by the resource duration,
the name of the file, the label - although we deal primarily in
English...
... From a business-to-business perspective, I don't have
issues passing HTML as long as it is strictly defined - as the
JSON is. My assumption is that when we tell users they can
produce HTML, they will send anything.
<ivan> Current structure for TOC in HTML https://w3c.github.io/wpub/#app-toc-structure
Geoff: if it is strictly defined, then I don't have a problem with it. For our existing applications, it's of no use, but we'd translate that from B2B to something our apps use. As long as it's strictly defined, I need it machine parsable.
<geoffjukes> I think I just got booted :(
Garth: Ivan just pasted in a relevant link for Geoff. The TOC we have spec'd in WP for HTML serialization. There are rules about pulling out anchors, chapter names, and it's purely machine processible. Either HTML or JSON, if it comes in the package with the audiobook, it's a huge improvement.
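As a rough illustration of the machine processing described here (pulling anchors and chapter names out of an HTML ToC and turning them into a structure that can be sent on as JSON), the sketch below uses only Python's standard `html.parser`. It assumes the ToC is a `nav` element marked with `role="doc-toc"`; the class and function names are hypothetical, and the real extraction rules are the ones in the linked draft, not this code.

```python
from html.parser import HTMLParser


class TocExtractor(HTMLParser):
    """Collect (href, label) pairs from anchors inside the ToC nav element."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._in_nav = False   # True while inside <nav role="doc-toc">
        self._href = None      # href of the anchor currently being read
        self._text = []        # text fragments of the current anchor

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "nav" and attrs.get("role") == "doc-toc":
            self._in_nav = True
        elif tag == "a" and self._in_nav and "href" in attrs:
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = "".join(self._text).strip()
            self.entries.append({"href": self._href, "label": label})
            self._href = None
        elif tag == "nav":
            self._in_nav = False


def extract_toc(html: str) -> list:
    """Return the ToC entries as plain dictionaries (JSON-serializable)."""
    parser = TocExtractor()
    parser.feed(html)
    parser.close()
    return parser.entries
```

Anchors outside the ToC element are ignored, so any extra markup a publisher adds elsewhere in the page does not leak into the result.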
<garth> Proposal: For AudioBooks ToC may be provided as serialized in either HTML or JSON, if both are present, only the HTML will be processed by the Reading System.
<geoffjukes> yes, I'm back in - I heard "spreadsheet" and now I'm worried
Garth: It's a big improvement
over just a collection of files with an external TOC. Either of
these - I think we'd be in pretty good shape with this.
... It is designed to be machine-processable TOC, not
necessarily a visual representation
Tzviya: In summary, our spec has very strict rules with HTML.
Garth: The spreadsheet is the worst of all possible worlds, but it does exist with some large publishers.
Brady: We have experience using
machine readable TOCs. We don't have a huge issue with
publishers going to town with the HTML. We convert it to
something else. It can be converted to JSON and sent to
clients.
... I wanted to add: I would rather have an ASCII text based
format than 2 formats. I don't want to have JSON and HTML. I
just want one. I prefer the HTML.
Tzviya: Geoff, we very much want
to include the existing toolchain. I want to make it clear what
our intent is with HTML. It's a restriction on what is allowed
in HTML - and it is very limited. I'm hoping we can come around
to some agreement.
... We want this to work for the audiobook retailers that exist
today. To comment on Nick's proposal for rendered vs machine
readable: that existed in EPUB 2 - but we fought to get rid of
it, so we aren't going back. Nav was created to address
that...
<garth> Proposal: For AudioBooks ToC may be provided as serialized in either HTML or JSON, if both are present, only the HTML will be processed by the Reading System.
Tzviya: Let's just review what Garth proposed, that way we can continue to review that and come to resolution on this.
Avneesh: All the good things about HTML have already been said. The 2nd part - B2B and B2C - two use cases. Are we targeting B2B? It may shape up the specification in a different way. Our focus seems to be B2C
Geoff: We deal with studios
sending us raw audio, publishers sending us produced audio, and
a consumer website. We send quite a bit of audio to other
publishers as well - Kobo, etc. When I'm looking at this, I'm
trying to find one system to rule them all...
... The end-user scenario is different, but the more that we
can bake in, as early as possible, makes sense to me. If we can
get publishers/partners to produce things early... We lose data
as part of the process, because publishers do not send us
standard data...
... publishers don't send things with the same file naming
conventions, etc. The more data we can standardize at the
beginning, the better. I'm still new and trying to get caught
up.
... As I can see it right now, an HTML TOC provides some
options that we would use for our enhanced audiobooks, and it
would be a natural lead-in. HTML does not seem like a good way
to include the extended information we'd like to get from
publishers, like hashes or filenames...
... that we can confirm the data we have is in the right order,
etc. I'm a fan of JSON because it's strictly structured, there
are strict definitions for resources...
Ivan: I would propose, at this
point Geoff - on the one hand it's extremely important to have
you on board with what we do. On the other hand, it's clear
that there are lots of things that need to be caught up on.
Some of the remarks are solvable with what we have now...
... I would be happy if we had a separate call - you, Wendy, me,
Matt, etc. - to look at some of the technical details to see
where your problems are and if what we're proposing is
answering your concerns. In a sense help you to catch up on 1
1/2 years of WG work.
<Avneesh> +1 to Ivan
Ivan: I think it would be important to have it down. I don't want to leave you behind or make a resolution while this is still missing.
Tzviya: Wendy has been working with the audiobook publishers to try to get a meeting, but it's difficult to get on the agenda. I like Ivan's suggestion. A year or so ago we had a meeting with newer members. Not sure how many new members we have, but maybe it's time.
<geoffjukes> +1
<geoffjukes> me alone :)
<laurent_> +1
<laurent_> oh, -1
Tzviya: Maybe we'll talk about
that at the chair's meeting...
... I agree with Ivan, that we should not have a resolution, as
we have incomplete information.
Wendy: Some of this is complicated by the TOC situation. We have 6 open issues that are tagged as audio issues.
<wendyreid> https://github.com/w3c/wpub/issues/322
Wendy: There was a discussion
about the reading order - there is no restriction as to what
can be in the reading order. We were wondering if in
audiobooks, since it has a use case of being mainly audio, but
sometimes it contains supplemental info (pictures, charts,
etc)...
... would that be considered a resource or part of the reading
order? It's innovative/wishful thinking. Based on the
discussion, the likely best resolution is that it should be
considered as not-part-of-the-reading-order
Ivan: I don't fully understand as I was not part of the discussion. Why do we need to specify? Why isn't it something that the author can supply/note?
Wendy: It's a decision we have to
make. I'm pro having it in the reading order, but the issue
could come down to the reading order. The user agent could end
up not showing it. That is another option we have to
consider.
... It might be easier to say: "these are all resources"
Benjamin: I don't like sub-typing. I prefer to keep things consistent - so readers with varying skills can play and handle only what they know. If you have a screen, it can show you what you want, otherwise not. As a spec, we already accommodate it
Laurent: I would like to have
authors coming back to developers - they say: "your system
doesn't work as it doesn't show HTML when playing audiobooks"
We have to set expectations. There is no expectation that a
classic audiobook will render HTML.
... I understand that some special reading agents would be able
to read something else. Standard one would not. We should have
expectation that it SHOULD be audio files, and it's expected
that a reading system can handle that, and everything else
should be in resources. We can let the JSON schema be open to
any resource...
... but expectations should be clearly set.
<garth> q
Marisa: I don't have a strong opinion if audiobooks should include other resources, but is this issue about having no playback model for non audio files?
Wendy: There is no support, at least not for now...
Marisa: A resource in the middle of playing audio - you don't know how long you should keep it up on the field...
Garth: There is a class of
audiobooks today - that come with a PDF or bunch of PDFs that
are ancillary material. There's no expectation for them to
render along with the content, or that a reading system is to
show it, but the content is to be bundled...
... so a user's action is to look at the supplemental content.
We need to have the ability for those types of audiobooks
to be produced, so we should allow supplementary material
Ivan: I'm trying to combine those
issues with others that are around - there may be other
connections. Suppose we say that an audiobook is defined such that
everything in the reading order must be an audio file, which is
fine - it's a sensible definition for a well-existing
market...
... if we do that, nevertheless, there is a need for books that
have a mixture of the various media for a bunch of other
reasons. They are web publications, but they are not
audiobooks. Audio still has items open about items in the
manifest - read by, etc.. All those properties should not be in
the audiobook profile...
<wendyreid> https://github.com/w3c/wpub/issues/351
Ivan: they renamed general properties that are relevant to audio. If we are strictly audiobooks, then everything related to describing the audio items, they need to be generally available, because they could describe items generally available.
<Zakim> tzviya, you wanted to ask what happens today with extra resources
Tzviya: Based on Garth and some email descriptions and from HC - I'm a bit confused about supplemental materials. If the method is a URL that the publisher provides, how does the user see it? Is it HTML? I don't know what Google does, but I know it's a huge problem.
Wendy: Publishers are providing a PDF file - I've yet to see anything but PDFs. Right now, if a customer asks for it, we can send it, but we don't have a good way to deliver it. It's a very awkward system if you're using a reading system or an app. There's no way to surface a URL.
Tzviya: If we have a method of conveying HTML to a user - we have a solution for ToC
Garth: I agree with what Wendy
said, on our initial implementation - a user contacted us and
we sent the PDF. Right now our listening system surfaces in the
player "open supplemental" and we deliver the PDF behind the
scenes so the client can click to open...
... it works very well. What we need to do - capability wise -
is that we need to allow the supplemental PDF to tag along with
the content. If the reading system has a way to deal with it...
Publishers need to be able to package the material and it's up
to the reading system to implement
<mateus> NickRuffilo: I agree with a lot of what's been said. I think if we go with a web publication with audio and other stuff interspersed, versus just audio... If we give the user the option to download the supplements and letting them know... I'm failing to see why that would be a bad thing.
<scribe> scribenick: nickruffilo
<geoffjukes> phone issues again... :(
Tzviya: It sounds like with the existing workflows - this isn't something that exists today. Maybe we should hold off on spec'ing out. It's a problem, but maybe it's something we defer until we understand the scope of the issue.
Wendy: To clarify, in every discussion the audiobook group has had, the view is that audiobooks should include supplemental content, knowing that not all reading systems will render it. So the question is whether the supplemental content should be in the TOC or resource only
<wendyreid> https://github.com/w3c/wpub/issues/351
<garth> +1 Wendy
Wendy: so if we use the type specifically, the audiobook TYPE would only see audio files in the TOC.
<Zakim> bigbluehat, you wanted to point out that Atom and RSS don't have this limitation and distribute massive amounts of audio + html (+ whatever) resources successfully
Benjamin: I'm not sure what the
distinction is. Why split the hair? A reading system has the ability to
skip a file that it doesn't understand. If you stick in a
resource that it doesn't understand, it either fails, or sees
what is next and keeps going - possibly with mention that it
had to skip.
... RSS and Atom feeds that distribute podcasts have a very
similar datamodel. Those can mix and do often mix. You'll have
a text blog, media blog, etc. The podcast consumer will only
process the audio ones. A richer RSS reader will process
all...
<Zakim> garth, you wanted to answer that we must support the ability of a publisher to include such supplemental materials with their packaged audiobook.
Benjamin: We shouldn't limit the resources, but include resources and information that gives implementors what to do with the files, but we should try not to limit ourselves and paint ourselves into a corner.
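The skip-what-you-cannot-play behaviour Benjamin describes, which mirrors how a podcast client treats a mixed feed, might look roughly like the sketch below. It assumes each reading-order item is a dictionary carrying a media type under an `encodingFormat` key; that key, the function name, and the sample data are illustrative assumptions, not text from the spec.

```python
def split_reading_order(reading_order, supported_prefixes=("audio/",)):
    """Separate items a player can handle from those it has to skip.

    Hypothetical helper: an audio-only player passes ("audio/",), while a
    richer reading system could pass ("audio/", "text/html", "image/").
    """
    playable, skipped = [], []
    for item in reading_order:
        media_type = item.get("encodingFormat", "")
        # str.startswith accepts a tuple of prefixes.
        if media_type.startswith(supported_prefixes):
            playable.append(item)
        else:
            # Keep going rather than fail; the caller may report the skip.
            skipped.append(item)
    return playable, skipped


reading_order = [
    {"url": "chapter1.mp3", "encodingFormat": "audio/mpeg"},
    {"url": "map.pdf", "encodingFormat": "application/pdf"},
    {"url": "chapter2.mp3", "encodingFormat": "audio/mpeg"},
]
playable, skipped = split_reading_order(reading_order)
```

With this shape the same manifest serves both kinds of player, and the list of skipped items gives an implementation a place to surface supplemental material if it chooses to.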
Garth: I agree with what Wendy said before - we have to allow for supplemental materials, whether they are in the reading order or other resources. I lean towards other resources, but it's just an opinion. To harken back to early EPUB discussion, it can always just skip resources.
Avneesh: This has some TOC dependency. Reading order and TOC are different items - they may or may not have overlap. It's possible the default reading order keeps just audio files and the HTML TOC has all
<geoffjukes> Apologies - I had to leave. Audio quality was atrocious. I'll just catch up with the scribe
George: I get that novels are
traditionally what is done today, but we should be open to a wider
range of materials. Magazines - yes there's a reading order,
but more likely I want to go to article 6 etc. A book of poetry
- human narration of poetry is much better than machine read
poetry...
... I don't want to read poetry book cover to cover, but to
navigate to the poem, bookmark, etc. If we ignore this market
of different types of publishing, then it gets relegated to
someone figuring out how to put together a bunch of
podcasts.
<mateus> +1 George... I've even seen requests for audio-textbooks
George: so we want to come up with something that does all these types of publications.
+1 to mateus with the audio-textbooks
Wendy: we'll talk about this next time we have audio issues. Tzviya, anything else?
Tzviya: Hopefully we'll do
implementations of use cases in the document - so bring your
implementation hats. If you can take a look at the use cases in
the use case document and think about that for next week
... and a discussion with new members, we'll set that up
shortly
Present: tzviya simon_collinson wendyreid laudrain NickRuffilo bigbluehat gpellegrino Avneesh romain mateus ivan marisa BenSchroeter Rachel CharlesL duga Garth
Regrets: matt billk dkaplan3 Josh vlad franco dave dauwhe
Scribes: nickruffilo, mateus
Agenda: https://lists.w3.org/Archives/Public/public-publ-wg/2019Feb/0013.html
Date: 2019-02-25