Timed Text Working Group Teleconference

Meeting minutes

This meeting

Nigel: Today, we have a topic Gary requested, about handling live delivery of TTML.
… We also have 2 issues on TTML2, which perhaps we can make progress on.
… I have kept the IMSC HRM issue about spans on the agenda in case there is anything to discuss.

Pierre: On the HRM thing, I haven't made much progress but I think we should take 10 minutes to talk about strategy.
… How do we propagate HRM changes through IMSC 1.0.1, 1.1 and 1.2?
… Rather than going through the issues themselves.

Nigel: OK, good idea.
… Then we have an IMSC Test issue/pull request.
… In AOB we have TPAC 2021.
… Any other points to discuss or make sure we cover?

group: [none]

How live delivery is handled in TTML/IMSC

Nigel: This was asked by Gary - perhaps we should work out if there is a quick answer or if we need a longer session.

Gary: This comes from the unbounded cue discussion last week.
… Partly because of your mention, Nigel, one of the issues is how unbounded cues work with live captioning.
… Segmented captioning is done with WebVTT and HLS, including for live. I have an understanding of how that works,
… but not how TTML and IMSC do live. I figured it would be similar, but wanted a sense of it so that if there are any specific
… problems then we don't repeat them.

Nigel: The first thing to note with TTML is that live delivery is an application layer on top of TTML
… You probably need constructs for streaming delivery around TTML, provided by other things
… I know of 4 ways of doing this: MPEG-2, MP4 (DASH/HLS streaming), RTP, and the EBU-TT Live extensions
… They all provide some sort of time windowing of the TTML document, and they send a sequence of TTML documents
… The only one that doesn't use an external wrapper is EBU-TT Live
… The wrapper defines a time window with a beginning and a duration that signals: from this point in time onwards, this single TTML document is active
… As a consequence of that, any previously active TTML document is no longer active
… The last piece of the puzzle is to align the timelines between the TTML payload content and the external timing
… There are established ways to do that, defined in the wrapper
… There could be an external timeline, e.g., an epoch such as 1 Jan 1970 and times are relative to that. Or each TTML document starts at time zero relative to when document playback begins
… The question then is how to do it for live. Live captioning is captioning of an audio stream that also has a separate mechanism for encoding, packaging, and distribution
… The requirement, from an audience perspective, is to get the captions in a way that's aligned with the audio. They could be a little bit late, or co-timed if delays are added on the server side
… We generally don't have completely unbounded stuff. It is not possible to issue documents for encoding, packaging, and distribution until you get to the end of the time they apply to
… You could have a single subtitle that begins at some time, with no end time. It would have no end or dur attribute in the document
… The application semantics would say that the document stops being active and a new document becomes active
… If the TTML document appears unbounded in that case, the application applies a bound
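As a rough illustration of the two timeline alignment approaches described above (times relative to an external epoch, or each document starting at time zero when it becomes active), here is a minimal Python sketch. The function name to_media_time and its parameters are hypothetical and not taken from any wrapper specification; it assumes time expressions have already been parsed into seconds, and each real wrapper defines its own alignment rules.

```python
def to_media_time(doc_time: float, scheme: str, *,
                  media_origin_epoch: float = 0.0,
                  window_begin: float = 0.0) -> float:
    """Map a TTML document time (seconds) onto the media timeline (seconds).

    Hypothetical helper for illustration only.
    """
    if scheme == "epoch":
        # Document times count seconds since an external epoch
        # (e.g. 1 Jan 1970 UTC); media_origin_epoch is the epoch time
        # at which media time zero occurs.
        return doc_time - media_origin_epoch
    if scheme == "document":
        # Document times start at zero when this document becomes active,
        # which the wrapper signals as window_begin on the media timeline.
        return window_begin + doc_time
    raise ValueError(f"unknown alignment scheme: {scheme!r}")
```

For example, a cue at document time 2.0 s in a document whose window begins at media time 30.0 s maps to 32.0 s under the "document" scheme.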

Pierre: The way modern packaging and streaming formats work, the playlist is at a higher level than the TTML document. The playlist sets the bounds

Gary: That's helpful. That aligns with what I figured, which also applies to segmented WebVTT
… A question would be: do unbounded cues make sense for TTML too?

Nigel: If you want a semantic that says within the document there's no end time for an element, it can already be done. Simply omitting the end time does that

Gary: The end time is defined by how it's delivered to the application. A cue with no end time or duration would be forcefully bounded by the media segment it's embedded in, e.g., MP4

Nigel: Yes
… The last document stays active until you activate another one. In a segmented MP4 context (DASH), when segments end, i.e., what the segment durations will be, is generally predefined
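The windowing model discussed above (each document active for a wrapper-signalled window, a new document ending the previous one, and an element with no end time bounded by its document's window) can be sketched in Python as follows. Window, active_document and bounded_interval are hypothetical names; the sketch assumes windows are sorted by begin time and that element times are already on the media timeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Window:
    """Hypothetical model of the activation window a wrapper signals for one TTML document."""
    begin: float  # media time (s) at which the document becomes active
    dur: float    # window duration (s), e.g. a segment duration

    @property
    def end(self) -> float:
        return self.begin + self.dur


def active_document(windows: List[Window], t: float) -> Optional[int]:
    """Index of the document active at media time t, or None.

    Windows are assumed sorted by begin; a later document supersedes an earlier one.
    """
    active = None
    for i, w in enumerate(windows):
        if w.begin <= t < w.end:
            active = i
    return active


def bounded_interval(windows: List[Window], i: int, elem_begin: float,
                     elem_end: Optional[float] = None) -> Optional[Tuple[float, float]]:
    """Presentation interval of an element from document i, in media time.

    elem_end=None models an element with no end or dur attribute: the
    application bounds it by the window, or by the next document's activation.
    """
    w = windows[i]
    bound = w.end
    if i + 1 < len(windows):
        bound = min(bound, windows[i + 1].begin)
    begin = max(elem_begin, w.begin)
    end = bound if elem_end is None else min(elem_end, bound)
    return (begin, end) if begin < end else None
```

With two consecutive 4 s windows starting at 0.0 and 4.0, an element beginning at 1.0 s with no end time in the first document is presented from 1.0 s to 4.0 s.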

Pierre: I've not seen a requirement for unbounded cues yet

Gary: We'll continue with the WebVTT use cases and deriving requirements. If it makes sense for live captioning to have unbounded cues in WebVTT, we could maybe also talk about applying that in TTML
… It's still early. Not clear that having unbounded cues is a requirement we want to proceed with

Nigel: Are we asking the wrong question? The conversation about bounded/unbounded cues starts from an assumption that a cue is a semantic object in its own right that a user can interact with
… In the schemes I've talked about, there's not a requirement to semantically identify a single piece of content as it changes over time
… Instead, the focus is on delivering the right presentation by delivering the documents
… If a client wants to do some analysis to identify duplicates (for example), it's up to the application
… Having one thing that gets updated is a different semantic model

Pierre: People talk about subtitle cues; there's a good mapping with pop-on captions, e.g., on a DVD. It breaks down when you talk about progressive subtitling, where words appear additively (paint-on)
… Or where lines appear in the same region. In those scenarios the concept of a subtitle doesn't work at all
… There's no such thing as "a subtitle". The TTML model is text flowed in a region

Gary: I think that for the document that Chris started we're trying to separate high-level use cases (e.g., live captioning) from requirements, e.g., create unbounded cues so we can deliver earlier
… So we want the right use cases. The individual cues are less important than knowing that we're capturing spoken word

Nigel: Have we answered the question?

Gary: It does for me

Chris: The question I have is around other kinds of metadata. In WebVTT I think it's possible
… that you can annotate chapter points, or denote segmentation of the content.
… In that case if you're starting a new chapter, which says "this news segment just started"
… and it starts now and we don't know the end time,
… is there any equivalent model in TTML for that kind of use case?

Pierre: Yes, absolutely. It's possible in TTML to specify that an element has an undefined end time.

Chris: And then it becomes application specific how to interpret that.

Pierre: What to do with it, yes. The interpretation in TTML is pretty unambiguous.
… Then the question is whether you leave it undefined or clip the presentation to some value.

Nigel: There's nothing to stop you adding your own metadata to an element, e.g., to indicate which chapter you're in
… That segmentation applies to content rather than there being some other "thing" that has start or end time signalled
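To make the idea of annotating content with application metadata concrete, here is a minimal Python sketch that builds a TTML document in which a div starts a chapter at a known time but has no end or dur attribute, so its end is undefined within the document. The namespace urn:example:chapters and the ex:chapter attribute are hypothetical, application-defined additions, not part of TTML; how a player interprets the undefined end is left to the application.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"
# Hypothetical application-defined namespace; not defined by TTML.
EX_NS = "urn:example:chapters"

# Note: register_namespace affects serialization globally in this process.
ET.register_namespace("", TT_NS)
ET.register_namespace("ex", EX_NS)


def build_document(chapter_id: str, begin: str, text: str) -> str:
    """Serialize a small TTML document with one open-ended, chapter-tagged div."""
    tt = ET.Element(f"{{{TT_NS}}}tt", {f"{{{XML_NS}}}lang": "en"})
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    # The div has a begin time but deliberately no end or dur attribute.
    div = ET.SubElement(body, f"{{{TT_NS}}}div",
                        {"begin": begin, f"{{{EX_NS}}}chapter": chapter_id})
    p = ET.SubElement(div, f"{{{TT_NS}}}p")
    p.text = text
    return ET.tostring(tt, encoding="unicode")


print(build_document("news-1", "00:10:00.000", "Headlines"))
```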

Pierre: As I understand it, WebVTT came from SRT, which came from DVD subtitles. That's a really specific use case, subtitles for translation. It's all pop-on

Gary: It has other mechanisms

Pierre: People say "a cue" or "a subtitle", a model that only works if it's pop-on subtitles or captions

Nigel: Any other questions?

Chris: No

Shear calculations and origin of coordinate system. w3c/ttml2#1199

Glenn: Status update on what I've been doing.
… We recently finalised our implementation of line shear and block shear (tts:lineShear and tts:shear)
… in the TTPE package. It's checked into a branch right now, and may already be merged into the main branch.
… We were able to verify the correct origin and orientation of the axes for both line shear and block shear
… in all of the writing modes in combination with different default paragraph-level bidi levels.
… That looks good.
… One of the things we wanted to do was to resolve an issue Nigel had brought up
… regarding processing of tts:shear semantics because in order to compute the adjustment
… to the inline progression dimension (ipd) for doing line breaking, it is necessary to know
… the value of the block progression dimension (bpd) that will be used for that adjustment.
… The value of bpd may depend on having performed line breaking, so there is a potential
… recursive process to resolve what the value of the bpd might be.
… However after analysing the TTML specification semantics we realised that
… bpd on a block area such as a paragraph is always defined in the sense that it has an initial value
… which is auto, and at the present time, auto is defined such that it maps to 100%,
… which means that the container area bpd in which the paragraph will be fitted constrains the
… maximum value of the bpd, and in fact fixes it, because in all cases we can map
… that back up to some region which is definite in its height and width and therefore bounded.
… The long and short of it is that bpd = auto = 100% = bpd of the container area constrained by region size.
… It can be no larger than the bpd of the region in which that p is placed.
… The default semantics for doing shear calculation of the ipd can be determined ahead
… of time when bpd = auto.
… If bpd is set to some other value, e.g. an explicit length, or minContent, maxContent or fitContent,
… which are defined in TTML2 but not used in IMSC, then other processes can be used to
… determine the value of bpd, which can then be plugged into the shear calculation to get the adjustment to ipd.
… We were able to verify that and check that into our codebase,
… and have entered it into the implementation report as having been implemented.
… We added an expectation file in ttml2-tests for the TTPE output, so we
… view that as having been resolved.
… The next step is to update the spec as necessary.
… Cyril has mentioned a couple:
… Change sin theta to tan theta.
… Add information about the origin and orientation of the axes for the purpose of performing the skew transformation.
… I've started work on creating that update.
… I plan to generate some SVG visuals that can go into the spec that
… show the origin and axis for the different writing mode combinations wrt the paragraph directionality.
… I expect that in the next few weeks.
… We're trying to get all of the TTML2 tests implemented and checked into TTPE so that we have
… resolved any issues in the tests and that will allow us to at least have one implementation
… of every test that is listed in the implementation report.
… Right now there are 3 tests left for us to complete, which should take 2-4 weeks approximately.
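The order of operations Glenn describes for the shear calculation (resolve bpd first, without line breaking, then use it to adjust the ipd available for line breaking) can be sketched in Python as below. This is an illustration under stated assumptions, not TTPE's implementation: the function names are hypothetical, the shear angle is taken directly in degrees (the mapping from a tts:shear percentage to an angle is not modelled), the adjustment uses tan theta as in the proposed spec change, and "auto" is resolved as the container bpd bounded by the region bpd.

```python
import math
from typing import Union

Length = Union[str, float]


def resolve_bpd(bpd: Length, container_bpd: float, region_bpd: float) -> float:
    """Resolve a paragraph's bpd without performing line breaking.

    Assumption: auto maps to 100% of the container bpd, and the container
    is ultimately bounded by the region, so no recursion is needed.
    """
    if bpd == "auto":
        return min(container_bpd, region_bpd)
    if isinstance(bpd, (int, float)):
        return float(bpd)
    # minContent, maxContent and fitContent (TTML2, not IMSC) need a separate process.
    raise NotImplementedError(f"bpd value {bpd!r} needs a separate process")


def shear_ipd_adjustment(theta_degrees: float, bpd: float) -> float:
    """Horizontal offset between the top and bottom of a block sheared by theta."""
    if not -90.0 < theta_degrees < 90.0:
        raise ValueError("shear angle must be strictly between -90 and 90 degrees")
    return bpd * math.tan(math.radians(theta_degrees))


def available_ipd(ipd: float, bpd: Length, container_bpd: float,
                  region_bpd: float, theta_degrees: float) -> float:
    """ipd left for line breaking after the (assumed) shear adjustment."""
    resolved = resolve_bpd(bpd, container_bpd, region_bpd)
    return ipd - abs(shear_ipd_adjustment(theta_degrees, resolved))
```

With ipd 400, bpd auto, a container bpd of 200, a region bpd of 150 and a 45 degree shear, the sketch leaves roughly 250 units of ipd for line breaking.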

Nigel: Thank you Glenn. Any questions?

SUMMARY: @skynavga Glenn to continue working on specification pull request.

Clarify if the first ISD must/may be constructed when empty w3c/ttml2#1232

github: https://github.com/w3c/ttml2/issues/1232

Glenn: I added a comment to the PR


Glenn: pointing out that there is already text in the specification of the tt element that makes the equivalence between
… active document interval and root temporal extent. We have already established that;
… it is just that this particular instance in this procedure should use consistent language.
… It is not introducing anything new or different in my opinion.
… I'd like to see that move forward.

Nigel: Am I correct that you're not happy with that Pierre?

Pierre: If we are going to make that change we should rationalise the terms across the document
… and really get to the bottom of what the term root temporal extent means.
… I don't think we should make this change piecemeal.

Glenn: I think this started because the wording "active document duration" appears and it is the only place where it appears
… exactly like that. The intent here is simply to resolve that one issue.
… It is clear that's what is meant here.

Pierre: I don't think it is clear.
… The term that has been used has been duration, now we're replacing it with extent.
… I would like to know what root temporal extent means.

Glenn: That boat has sailed.

Pierre: I don't know, it's been ambiguous and we should say what it does.
… It is not defined in the document, we're trying to clarify it.

Glenn: Root temporal extent is defined as a term.

Pierre: It is a circular definition. If we're clarifying it, we should say what it means or does.

Glenn: The intent of this change is not to modify the definition of root temporal extent.

Pierre: It actually changes the interpretation though.
… My position is that we should go back and rationalise what root temporal extent means.
… We should not make piecemeal changes.

Glenn: I find that quite interesting and wouldn't discourage anyone from undertaking such a project.
… This particular issue is not predicated on reviewing the definition of root temporal extent.
… If you think it is, I would like to see the argument.

Nigel: This has been discussed before. It would be good to explain why this procedure both depends on the term
… root temporal extent and defines it, which is circular.

Pierre: The [scribe missed this - apologies]

Glenn: The root temporal extent is defined by the document processing context.

Pierre: It's never clear to me how there can be an implicit duration but no implicit begin and end.

Glenn: This goes back to the semantics of SMIL which make use of the term implicit duration in a highly technical manner.
… We have used that definition in the context of TTML.
… SMIL does not (as far as I recall) define an implicit begin or end and we did not do that.
… That sounds like a new work item/requirement that is not on our docket right now.
… I think it is inappropriate to slip it into this PR. It may be an interesting question, and we could possibly elaborate on that
… more in the definition of root temporal extent. But it is clear in the current language that we have
… an equivalence statement in the specification of the tt element, so what this change proposes is simply
… to make that usage consistent within the document because we had a case in the timing
… semantics that did not define that properly.

Pierre: By the way SMIL does define implicit end and implicit begin.

Glenn: Thank you

Pierre: Do they apply here?

Glenn: That's outside the scope of this PR in my opinion.

Pierre: That's my point, if we are tweaking or capturing the original intent of root temporal extent then we have
… to get to the bottom of this.
… My interest here is that there has been confusion about what the active duration
… of a TTML document is, and what happens if you try to render a document outside its active duration.

Glenn: Durations have a fixed usage in TTML and SMIL that is independent of the begin and end points.
… If you can resolve the begin and end then the difference is the active duration.
… I still fail to see how you can interpret the current PR as an attempt to redefine the root temporal extent,
… especially as we already have the statement that makes that equivalence.
… If the phrase was intended to mean something different in resolve timing, then I don't know what else it could be.
… "Active document duration" sounds like a shorthand for that tt element definition.
… So this change seems to make this more consistent rather than less so.

Nigel: By the way that is my position as well.

Glenn: If you think this is redefining root temporal extent I would like to see the argument for that.
… It is not the intent, and if it were true then we would have to revisit the language in the tt element as well, which is
… not in the scope of this issue.
… I have no objection to revisiting and trying to fine tune the use of the term root temporal extent.

Nigel: Thank you, we're running out of time. Anyone else have anything to add on this?
… [no] - we need to work out a way to resolve.
… I brought this to the group to try to work out how to get to consensus on the PR.

Pierre: Maybe we're closer than you think - remove the note, and take the "i.e." out, but ultimately the
… root temporal extent is application specified.

Nigel: Thank you, please could you comment on the pull request so we can end the call?

Pierre: Happy to stay on and discuss further if you have time.

SUMMARY: Nigel, Pierre and Glenn to continue discussions.

Meeting close

Nigel: Thanks everyone, let's adjourn for today. [adjourns meeting]