Timed Text Working Group Teleconference -- 16 Sep 2014

<trackbot> Date: 16 September 2014

https://www.w3.org/wiki/TimedText/geneva2014#Day_1_0900-1700

Introductions

<scribe> scribeNick: nigel

nigel: BBC

andreas: IRT

Cyril: Telecom ParisTech university; GPAC

elindstrom: Opera software

zcorpan: Opera software

tmichel: W3C, staff contact for the TTWG

pal: Movielabs

courtney: Apple

glenn: Representing various organisations over the years, currently Cox, previously Samsung and Microsoft

frans_EBU: Coordinator of EBU group on subtitling

Agenda

nigel: goes through agenda on wiki page, all happy with that.
... We need to think about how we capture our output, and who will edit the note.

courtney: I'm happy to edit the note.
... I don't have a document yet, I've been working on the code first, and have some issues to tackle, and a spreadsheet for attributes.

glenn: For browser implementations mapping direct from TTML to HTML would be more efficient
... If the purpose is for direct display then this mapping would be better, but if we want to interchange to WebVTT then that translation would still be useful.

courtney: I'm interested in captions both inside and outside browser environments so I'm not focused on HTML solely.

andreas: From the mapping we have done we will quite quickly see the overlap - maybe there's a cut and paste into HTML as glenn mentioned.

pal: Re WebVTT outside browsers?

courtney: Yes, e.g. in an ISO MP4 file that is rendered in a video player.

pal: So do we need CSS in practice? To present WebVTT in subtitles and captions?

courtney: You certainly can, but it depends on how fancy you want to be. You can do basic 608 without CSS.

andreas: you need CSS to do colours, and that's certainly required in Europe.

courtney: We define for example a simple mapping from CSS to a property list. I think the better approach is to stick with CSS and
... have a way to embed it in an MP4 file track, and also in a WebVTT file.

pal: Will the mapping we do today include that?

courtney: yes

Cyril: +1

courtney: I've been thinking that one TTML file will map to a WebVTT file + a CSS file

glenn: That's what I've been thinking, and there's a reusable overlap into HTML/CSS

nigel: I've created a wiki page at https://www.w3.org/wiki/TimedText/TTMLtoWebVTT

<zcorpan> can you paste the number in irc?

Work done so far

andreas: presents work so far
... This work has been supported by the HBB4ALL project, whose target is to roll out accessibility to IP connected devices, including subtitles, signing and audio (video) description,
... with a focus on hybrid broadcast.
... This is based on EBU-TT-BasicDE as a very restricted TTML feature set.
... In fact it's a subset of EBU-TT-D which is a subset of TTML plus a couple of small extensions.
... It has a video frame with a safe area, 10% in from each edge.
... Alignment is top or bottom only, vertically.
... Horizontally, centred, left or right.
... For Germany, it's left to right, top to bottom writing direction.
... There are 8 different text foreground colours, as in World System Teletext (WST).
... All subtitles have the same background color, font-family, font-size and line height.
... Line breaking is done manually with the br element at authoring.

glenn: How is the background padding extended on either side of the text?

andreas: That's just in the example image, it's not actually present.
... How is this mapping achieved? Positioning, Styling, Timing.
... Positioning:
... [shows video frame with image of Verona]
... In TTML and EBU-TT there's a root container. In EBU-TT it's always the height and width of the video. WebVTT uses the viewport concept,
... which I understand to be the height and width of the video also.
... For the safe area, we define the tt:region, with top-left being 10% 10% in x y as specified by the origin.
... The CSS properties are top and left.
... The extent is 80% 80%, which in CSS is the width and height of the block level element, e.g. the div.
... To place a subtitle the region is defined once in the head and then referenced by the tt:p element. This is similar to a p in html.
... The paragraph gets the width of the region, and the height is calculated by the number of lines inside the p element.
... Vertical alignment is displayAlign: bottom or top.
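
As a rough illustration of the positioning mapping described above, the following Python sketch turns a percentage-based tts:origin / tts:extent pair into CSS box properties. The function name region_to_css and the choice of absolute positioning are assumptions made for illustration and do not come from either spec; only percentage lengths are handled.

```python
def region_to_css(origin: str, extent: str) -> dict:
    """Map percentage-based tts:origin and tts:extent values to CSS properties.

    Assumes both values are "<x>% <y>%" pairs; other TTML units are rejected.
    """
    left, top = origin.split()
    width, height = extent.split()
    for value in (left, top, width, height):
        if not value.endswith("%"):
            raise ValueError(f"only percentage lengths are supported: {value!r}")
    return {
        "position": "absolute",  # assumption: the region is placed absolutely in the root container
        "left": left,
        "top": top,
        "width": width,
        "height": height,
    }


# The safe area from the presentation: 10% in from each edge.
css = region_to_css("10% 10%", "80% 80%")
print("; ".join(f"{name}: {value}" for name, value in css.items()))
```

The dictionary form keeps the mapping easy to compare against a spreadsheet of attributes before it is serialised into a stylesheet.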

nigel: Will there be CSS mappings for all of these in this presentation?

andreas: This is setting out the features to map, we should consider them in scope for our mapping later.
... I didn't use the advanced WebVTT concepts of cue alignment; I wanted something that would certainly work in current browsers.
... In WebVTT I've put the cues in for the text. For a width of 80% the cue box has size: 80%
... The height is defined by the number of lines, just like the p element.
... This is per cue, so the settings seem to need to be repeated every time. I don't know a way to define it once and have it carried through.

courtney: If you use a region you can do that.

andreas: I didn't use a region.

courtney: Then you have to repeat it.

andreas: So that's positioning. We can define the position of the box from the left of the video frame, with 10%, using position:10% align:left
... The align setting is important. It works very differently than in TTML e.g. if you set align:middle and position:10% then the reference point for the middle isn't the cue
... start but is the middle of the cue.
... So to centre the text then you have position:50% align:middle
... For vertical alignment it's a bit trickier. To come 10% up from the bottom you can set line:90% or a line number value.
... But this doesn't align the end of the cue box, but aligns the top of the cue box. So that doesn't work.
... What you actually need is position:100% - margin - height of cue-box.
... That works if you have a lot of control over the font height and can calculate the position this way.
... In most cases that's a bit risky. So then I changed to the other possibility, to use line alignment
... The first line in the cue generates the line grid, then you can position the cue box with positive line numbers from the top
... or negative line numbers starting from -1 from the bottom.
... [example shows text one line up from bottom]
... You have to have the snap to lines flag set - this happens automatically if you use line numbers.
... For one line you can have line:-2, or for a two line subtitle, line:-3. Needs a bit of calculation.
... A dirty trick possibly is always to set it to -1 and let the renderer push it up. Possibly this is not recommended but it may work.
... Styling:
... In EBU-TT-BasicDE there's a default style defined once in the head, and a div element that references the defaultStyle.
... In WebVTT you can define a general cue selector ::cue and use almost the same property names and values.
... For font-size some calculation is needed. 60% font size in TTML comes out at 5.33% of the height of the video, which is 100% in CSS.
... A separate CSS file is needed to contain the ::cue selector.
... For inline styles in TTML we set the colour attributes on a style referenced from a span.
... In CSS you can use the pseudo-selector ::cue(c.textWhite) { color: #ffffff; background-color:rgba(0,0,0,0.7); }
... Then in the VTT file the text is wrapped in a c.textWhite class span, i.e. <c.textWhite>.
... Timing:
... In TTML put a begin and end on the p element, with media timeBase, reference sync is zero. In EBU-TT-BasicDE the fractional seconds are limited to 3 digits.
... This is the same for WebVTT cues.
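
The cue settings discussed in this presentation can be sketched in Python as follows. This is only a sketch under stated assumptions: the helper names cue_settings and format_cue are hypothetical, the align:middle keyword is used as minuted (later WebVTT drafts may name it differently), the cue is always bottom-aligned using negative line numbers, and the settings are repeated on every cue because no region is used.

```python
def cue_settings(h_align: str, line_count: int, width_percent: int = 80) -> str:
    """Build a WebVTT cue settings string for a bottom-aligned subtitle.

    The line number follows the calculation in the minutes:
    one text line -> line:-2, two text lines -> line:-3.
    """
    # The reference point of "position" depends on the align setting.
    anchors = {
        "left": ("10%", "left"),      # box starts 10% in from the left edge
        "center": ("50%", "middle"),  # "middle" as minuted; the box is centred on 50%
    }
    if h_align not in anchors:
        raise ValueError(f"unsupported alignment: {h_align!r}")
    position, align = anchors[h_align]
    line = -(line_count + 1)
    return f"position:{position} align:{align} line:{line} size:{width_percent}%"


def format_cue(begin: str, end: str, text_lines: list, h_align: str = "center") -> str:
    """Serialise one cue, repeating the settings on the timing line."""
    settings = cue_settings(h_align, len(text_lines))
    return f"{begin} --> {end} {settings}\n" + "\n".join(text_lines)


print(format_cue("00:00:01.000", "00:00:03.500", ["First line", "Second line"]))
```

Computing the line number from the number of text lines avoids relying on the renderer to push the cue up from line:-1.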

pal: What are the rules for CSS styles when combined with locally set rules? Which takes precedence between author and user choices?

courtney: We would consider user choices to override author styles.

pal: If you're displaying it on a web page, then web styles taking over seems like not the right thing to do.

andreas: It's not clear to me how the CSS that applies to the web page interacts with the VTT cues. From testing there's no relationship.
... The video is a separate viewport with independent styling, from my testing anyway.

Cyril: I think that's not expected. I remember that the cues are sourced in the HTML page so the styles should be applied.

andreas: I tried it out in Opera.

zcorpan: The styling was implemented in presto - I'll put together a quick demo and paste the link

andreas: One important point is that we put the background color just behind the text not the box. From what I read there's no possibility
... in WebVTT to put the background only on block level elements, e.g. the whole region/p/div etc.
... It only puts the background behind each glyph. I think there's a WebVTT background box concept but it doesn't seem to apply to the block level.

glenn: So TTML allows the background to be specified on the containing block and possibly differently on the span or the p within the larger block.
... So this example (showing two spans each with its own background color) wouldn't be possible?

andreas: That's right. In Europe both possibilities are in use.
... We need to be aware of this restriction in the mapping.

<zcorpan> http://w3c-test.org/webvtt/rendering/cues-with-video/processing-model/basic.html has styling

zcorpan: This shows how a stylesheet applies to WebVTT cues - the stylesheet is in the HTML page and the cues use those styles
... There's a white video behind it.

pause for 4 minutes, back at 10:33 (CET)

<zcorpan> wrt to the positioning discussion, there are open bugs on the webvtt spec for both changing how positioning works and for adding something that allows for exact positioning. https://www.w3.org/Bugs/Public/buglist.cgi?quicksearch=webvtt%20positioning&list_id=43983

<zcorpan> https://www.w3.org/Bugs/Public/show_bug.cgi?id=25632

nigel: we're reassembling...

courtney: Here's what I've discovered from writing mapping code.
... There's an issue that we don't have an official WebVTT spec yet - we're working off drafts that aren't versioned.
... When Andreas was talking he was using browser supported features. This is causing a bit of an issue. The mapping I've been doing is off the most
... current WebVTT spec version. http://dev.w3.org/html5/webvtt/
... Here are 3 categories of issue:
... 1. TTML is more hierarchical than WebVTT
... 2. The two specs define different properties implicitly vs explicitly.
... 3. The basic problem of converting units (value type conversions)
... Hierarchical vs Flat:
... WebVTT has a flat structure with no nested elements. TTML provides a hierarchical structure.
... Metadata: in TTML you can nest metadata hierarchically [shows ttm:agent holmes and Dr Watson]. In WebVTT you get a list with no relationships between them.
... Proposal for WebVTT is hierarchical metadata keys

nigel: Is that just metadata or presentation issues too?

courtney: It may be less of an issue for presentation issues but there are cases where we run into a similar problem.
... Another example: Calculating relative timings hierarchically in TTML and linearly in WebVTT.
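
To make the hierarchical versus linear timing point concrete, here is a minimal Python sketch that resolves nested begin and end offsets into absolute times, one pair per p. It assumes parallel ("par") time containers throughout, offset times written in whole or fractional seconds such as "5s", an explicit end on every p, and a namespace-free snippet; the sample document and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample with timing nested on body, div and p.
TTML = """<body begin="10s">
  <div begin="5s">
    <p begin="1s" end="3s">First</p>
    <p begin="4s" end="6s">Second</p>
  </div>
</body>"""


def seconds(value: str) -> float:
    """Parse an offset time in seconds such as "5s" (the only form handled here)."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported time expression: {value!r}")
    return float(value[:-1])


def flatten(element, parent_begin=0.0, cues=None):
    """Collect (absolute_begin, absolute_end, text) for each p element.

    With par containers, a child's begin and end are both offsets
    from the resolved begin of its parent.
    """
    if cues is None:
        cues = []
    begin = parent_begin + seconds(element.get("begin", "0s"))
    if element.tag == "p":
        end = parent_begin + seconds(element.get("end"))
        cues.append((begin, end, element.text))
    for child in element:
        flatten(child, begin, cues)
    return cues


for begin, end, text in flatten(ET.fromstring(TTML)):
    print(begin, end, text)
```

Each resulting tuple carries the absolute times that a flat WebVTT cue would need.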

Cyril: I think some profiles restrict that.

andreas: Yes, EBU-TT-D doesn't allow nested timing.

Cyril: That raises the question which profile are we looking at?

courtney: Yes, we can simplify the problem by specifying a profile.

glenn: It's useful, though it may take longer, to start from the general case and identify where in the absence of a profile there are issues.
... For example re timing and even styles we could define a mapping based on the sequence of Intermediate Synchronic Documents, to remove the timing issues.
... Just documenting these issues is useful.

nigel: We decided last week to use TTML1SE and WebVTT.

andreas: for styling there's some hierarchical structure in WebVTT too, by application of class nodes that are nested.

courtney: Yes you can have nested styles within a cue but if you want the same style for 10 cues you can't put them in a fragment and declare it at the fragment level.
... Implicit vs Explicit:
... Some functionality is explicitly described by attributes or parameters in one spec but implicitly derived in the other.
... For example, horizontal writing direction. In TTML there's a way to specify horizontal direction but in WebVTT there isn't (unless it's vertical) - it's inferred from the font.

glenn: tts:direction is designed to work in relation to the Unicode bidi control characters
... absent those you can still infer directionality based on the content of the element, though it's harder with mixed content.
... So the direction attribute in TTML doesn't really say 'write right to left' but does specify the default writing direction in the absence of bidi.

courtney: WebVTT has bidi too, and rtl and ltr entities.

andreas: In Unicode the information is already there.

glenn: You have to look at the history of Unicode - people didn't want to use nestable control codes so they wanted CSS attributes to do the same thing.

<zcorpan> http://dev.w3.org/html5/webvtt/#h4_processing-model says how to determine direction

zcorpan: The horizontal direction is taken from the text in the cue, not from the font (in WebVTT)
... You can override it with unicode bidi characters if you want.

nigel: Seems like there's no issue to log in our issues list.

<zcorpan> "Apply the Unicode Bidirectional Algorithm's Paragraph Level steps to the concatenation of the values of each WebVTT Text Object in nodes, in a pre-order, depth-first traversal, excluding WebVTT Ruby Text Objects and their descendants, to determine the paragraph embedding level of the first Unicode paragraph of the cue. [BIDI]"

glenn: TTML has the CSS features as well as the plain text.

courtney: Example 2: line breaks - need to be explicit in TTML but can be just new lines in WebVTT.

Cyril: That's due to the parser - XML requires this.

andreas: Later on we can look at xml:space attributes. From the tests I've seen with xml:space="preserve" then line breaks should be preserved.

<zcorpan> XML doesn't require it really

glenn: In XSL-FO there are 4 different properties. We define an explicit mapping of xml:space to sets of those values, in TTML. We didn't expose the full XSL-FO model.

courtney: Value Type Conversions

<glenn> tnx 4 reminder

courtney: Example 1 - times
... TTML has different time expressions, WebVTT always has hh:mm:ss.sss with fractional seconds.
... Fortunately the ttp: namespace defines all the required metadata to do the conversions.
... Though I'm not sure that's the case with lengths and position values
... Again TTML allows a broader set of units - pixels, em, cells, %ages
... I'm assuming lineHeight is sort of like em. For some TTML documents I think you need the authored video dimensions to do the mapping.
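
The time conversion can be sketched in Python as below. This covers only a subset of TTML time expressions (clock times with an optional fraction or frame count, and offset times in h, m, s, ms or f) and assumes the media time base. The frame_rate argument stands in for ttp:frameRate; ticks, sub-frames, frame rate multipliers and drop modes are deliberately left out, and the function name is hypothetical.

```python
import re
from fractions import Fraction


def to_webvtt_timestamp(expr: str, frame_rate: int = 25) -> str:
    """Convert a subset of TTML time expressions to a WebVTT hh:mm:ss.ttt timestamp."""
    clock = re.fullmatch(r"(\d{2,}):(\d{2}):(\d{2})(?:\.(\d+)|:(\d{2,}))?", expr)
    offset = re.fullmatch(r"(\d+(?:\.\d+)?)(h|m|s|ms|f)", expr)
    if clock:
        hours, minutes, secs, fraction, frames = clock.groups()
        total = Fraction(int(hours) * 3600 + int(minutes) * 60 + int(secs))
        if fraction:
            total += Fraction(int(fraction), 10 ** len(fraction))
        if frames:
            total += Fraction(int(frames), frame_rate)
    elif offset:
        scale = {
            "h": Fraction(3600),
            "m": Fraction(60),
            "s": Fraction(1),
            "ms": Fraction(1, 1000),
            "f": Fraction(1, frame_rate),
        }
        total = Fraction(offset.group(1)) * scale[offset.group(2)]
    else:
        raise ValueError(f"unsupported time expression: {expr!r}")
    # Exact rational arithmetic until the final rounding to milliseconds.
    millis = round(total * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


print(to_webvtt_timestamp("00:00:10:12"))
```

Using Fraction keeps frame-based values exact, so the only rounding is the final step to three fractional digits.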

pal: I think if you use %age or c you don't need the video dimensions. If you're going to use pixels then implementations should use tts:extent on the root as well.

glenn: By specifying extent on the root you can derive a pixel dimension - this doesn't tell you the pixel relationship to the video though.

andreas: An issue is that in general the root container pixel dimensions are not necessarily coincident with the video dimensions.
... The document has no way to specify this in TTML, in general.

pal: CFF-TT and EBU-TT-D relate the root container to the video. IMSC introduces an aspect ratio. All the profiles specify how the mapping goes.
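
The unit question raised here can be illustrated with a short Python sketch that expresses a TTML length as a percentage of the root container. It assumes the length is one that is measured against the root container (such as a region origin or extent, not a font size), uses 32 columns by 15 rows as the default cell resolution, and requires the root extent in pixels before it will convert a pixel value. The function name and parameters are hypothetical.

```python
def length_to_percent(value: str, axis: str,
                      cell_resolution=(32, 15), root_extent_px=None) -> float:
    """Express a TTML length as a percentage of the root container along one axis.

    axis is "x" (columns / width) or "y" (rows / height).
    """
    index = 0 if axis == "x" else 1
    if value.endswith("%"):
        # Assumption: the percentage is already relative to the root container.
        return float(value[:-1])
    if value.endswith("px"):
        if root_extent_px is None:
            raise ValueError("pixel lengths need tts:extent on the root element")
        return 100.0 * float(value[:-2]) / root_extent_px[index]
    if value.endswith("c"):
        return 100.0 * float(value[:-1]) / cell_resolution[index]
    raise ValueError(f"unsupported length: {value!r}")


print(length_to_percent("16c", "x"))
print(length_to_percent("192px", "x", root_extent_px=(1920, 1080)))
```

Percentages and cells convert without any knowledge of the video, while the pixel branch fails loudly when no root extent is available, which mirrors the concern about general TTML documents.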

andreas: For general TTML documents this is an issue.

courtney: Attribute mappings
... Some are straightforward.
... Though WebVTT IDs can be purely numeric, and xml:id doesn't allow that. So some modification or convention may be needed, e.g. "cue"+number.
... We could define the best practice.
... Both use BCP47 language values
... Preserve space needs further discussion.
... Styling attributes: colors, fonts etc are fairly straightforward.
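
One possible convention for the identifier issue is sketched below in Python. It only handles the case mentioned in the discussion, a cue identifier that begins with a digit, by prepending the "cue" prefix; other characters that are not valid in an xml:id (spaces, for instance) are not dealt with, and the function name is hypothetical.

```python
def cue_id_to_xml_id(cue_id: str, prefix: str = "cue") -> str:
    """Return an identifier that can start an xml:id for a WebVTT cue identifier."""
    # xml:id values may not begin with a digit, so numeric cue ids get a prefix.
    if cue_id[:1].isdigit():
        return prefix + cue_id
    return cue_id


print(cue_id_to_xml_id("12"))
print(cue_id_to_xml_id("intro"))
```

A converter going the other way would need to know the same prefix in order to strip it again, which is one reason to write the convention down as best practice.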

pal: Is there a subset of CSS that's supported for WebVTT?

<zcorpan> http://dev.w3.org/html5/webvtt/#css-extensions is the subset

andreas: In WebVTT there's a subset of properties that are permitted. E.g. padding is not allowed.

courtney: One requirement set is what's needed for CEA608. It would be useful to have a standard set of CSS classes that can be used for any CEA608 translations into WebVTT.
... There are some properties with no WebVTT equivalent: display, overflow, padding, showBackground.
... For alignment, displayAlign maps to the latest version of the WebVTT spec.

andreas: I tried it out, and it would work perfectly.

courtney: But they're not widely supported yet. The mapping is nicer at least.

<zcorpan> "the properties corresponding to the 'background' shorthand" is allowed, if that is what showBackground does

zcorpan: Any properties other than those listed in the spec will be ignored.
... I'm not sure how the TTML features map to those but there is a defined subset in the spec.

courtney: To expand on that, for things like textDecoration in TTML, you can have underline set on a cue, but for the rest you'd have to go to CSS?

zcorpan: For underline you can use CSS or the u element inside a cue.

courtney: visibility and zIndex - I can't see how to do those in WebVTT.
... extent can be done with a cue box size or a region size.
... A lot of the timing in the ttp: namespace metadata doesn't map to the WebVTT because the timing that's allowed is a lot simpler.

zcorpan: visibility and zIndex are not possible in WebVTT.

nigel: can't you do visibility with opacity?

zcorpan: yes you can do visibility.

courtney: there are also the attributes "use", "value" and "type".

glenn: Those are in the profile definition mechanism - they're not content or style based.

Cyril: does this mean they don't have to be mapped?

courtney: since there are no profiles in WebVTT I guess not.

glenn: This is all part of the TTML way to specify what a processor needs to support, based on SMIL and SVG originally.
... I think it can probably be ignored but needs more thought.

andreas: If we do not find a direct mapping between WebVTT and TTML that doesn't mean that we can rule it out for the mapping
... because there's some intent in the source document and we have to check if there's something that needs to be done.

courtney: Ruby: there's no simple mapping from WebVTT to TTML for ruby.

glenn: In TTML1 you have to do the work at authoring time and use regions to place the ruby in the right place.
... I've recently specified in TTML2 the ruby markup.

Cyril: There may be several ways to define the same thing, so we should try to use a canonical representation as the mapping source.
... For example there are several ways of expressing timing - maybe a requirement before mapping is a single syntax. I'm not sure if this is possible.

courtney: it may be an interesting way to break the problem up.

Cyril: A problem I've seen before is that when attributes need to be resolved at runtime based on context, e.g. frame rate, video size etc there's not much that can be done.
... We maybe need to classify those attributes that can be mapped offline vs those that need full context to resolve.

courtney: that's my presentation.

Cyril: There's also the question of which TTML profile to use. But also there are different classes of WebVTT: valid or not? parsable or not?
... Invalid documents may be presented okay by browsers. We should say which class we're looking at.
... Then WebVTT can represent metadata, chapters, subtitles, captions etc. so we should indicate which ones we're mapping, if not all.

Logical step through

nigel: Processing model

Cyril: how does TTML handle overlapping times?

glenn: there's arbitrary overlap permitted.
... The first step I'd advocate is to create the intermediate synchronic documents and map to WebVTT.

Cyril: In WebVTT there's the concept of cues becoming active and then bumping up existing visible cues.

some discussion of how this is handled in TTML

andreas: Formally the concept of creating the ISDs makes a lot of sense - we need to make sure everyone understands what that means.

glenn: I agree. For example one thing that may not be obvious is that style inheritance is only defined on ISDs so one has to perform the ISD creation prior to style inheritance.
... I've also added a function on the TTV tool to generate the set of ISDs.

nigel: We have a choice here to map ISDs or specific bits of cue text.
... This impacts efficiency and metadata.

pal: This depends on the use case - if we just have the goal of getting equivalent presentation then efficiency and metadata are secondary concerns.

elindstrom: from a browser perspective we're interested in accurate presentation.

courtney: I've been thinking about it the opposite way - from a TTML to WebVTT conversion preserving semantics.

andreas: Would it be possible to take Courtney's attribute list and make it a structured document, take it as a header, explain the problem scenario,
... and indicate what the options and recommendations are from the WG?
... If you try to map abstractly the logical model then it's very hard. Something more concrete may be a better start.

pal: This is a question of how complicated we want to make it - I haven't heard of anyone wanting to use WebVTT as a master/archive/mezzanine format.

glenn: There's a use case for distribution though.

pal: I can see the use case of converting the TTML experience into a WebVTT experience.

glenn: Part of this may be timing oriented in the sense that user agents may potentially add TTML renderers directly, which would reduce the future needs.
... But there may still be WebVTT-only presentation devices.

pal: The issue for me is about the non-presentation-based usage of WebVTT.

elindstrom: I don't expect that to be a huge use case.

nigel: Seems like we've been considering TTML -> WebVTT here. Does the same consideration apply the other way?

courtney: WebVTT does roll-up - I'm not sure how we do that with TTML.

glenn: we may need to consider using the set element in TTML1.

pal: When you say roll-up you mean where there's an animation displayed?

glenn: yes, gradually moving up.

pal: To do that explicitly in TTML you need animation, but what is possible is to have a region that contains line A at t=0 and at t=1 line B is added, moving line A up.
... This doesn't require any animation.

glenn: Yes correct but it doesn't do the whole 608 animation.

pal: Then the question is do we need to explicitly define the roll-up animation.

glenn: Yes, we put in a note that implementation might do that.

courtney: What about paint-on?

glenn: That's no problem.
... Does WebVTT support smooth roll-up as opposed to discrete line based roll-up?

courtney: I think it does yes, I'll have to confirm.

nigel: As a general point here we can leave it open to the converter where it's left unstated in the source spec.

courtney: There's a scroll setting on the region in WebVTT that specifies this.

nigel: Is there anything else regarding processing model that may affect how we do the conversions?
... So far we have: ISDs, smooth vs discrete scrolling.
... I guess discontinuous markerMode in TTML may be non-mappable too.

glenn: I've been thinking about this too - I think it would be modelled by playing back the related media that triggers the discontinuous smpte events and recording the
... elapsed time to make a conversion from discontinuous to continuous.
... There's also the clock based timing which is also interesting! In appendix N we mapped all the timing models to a potentially continuous timeline.

nigel: I think we should exclude discontinuous marker mode and maybe clock mode too, as being non-mappable from TTML1 to WebVTT.

glenn: I think there may be some TTML2 work that can support this.

nigel: I propose to make our mapping explicitly related to TTML1 and if there's anything that helps in TTML2 we can update it later.

glenn: Or we can simply reference the ISD creation process.

<zcorpan> "If region's text track region scroll setting is 'up' and region already has one child, set region's 'transition-property' to 'top' and 'transition-duration' to '0.433s'." - smooth rollup in webvtt with scroll:up. http://dev.w3.org/html5/webvtt/#h4_processing-model

nigel: Maybe we can do both, and reference the ISD generation process and make a note that in TTML 1 the process isn't defined in a way that facilitates
... conversion to WebVTT for discontinuous and clock mode times.

courtney: If we refer to ISD conversion rather than TTML1 what's the reference document?

glenn: I'm working on this for TTML2.

courtney: Is there a draft document to refer to?

andreas: If you make the ISD concept central to the mapping it must be fully elaborated so that everyone can understand it.

glenn: I agree but I think there's no way to avoid it other than to create an alternative flavour of the same thing.
... This is the only way to solve the timing hierarchy problem.
... It also gets around the style inheritance process.

andreas: Formally I agree but it's hard to communicate the ISD - it wouldn't be a valid TTML document. So the converter wouldn't be from TTML.

glenn: We do have examples of ISDs in the TTML1 spec, which is something I'm adding in TTML2.

andreas: ISD creation is specified in TTML1 so I think we can use what's there. Is anything else needed?

glenn: Yes, the only thing absent is the specification of a serialised form. We only used ISDs as a didactic construct for explaining the formatting model.
... In TTML2 I plan to make interchange of ISDs possible in a standard way.
... It would also be useful for this exercise. Now that I already have an implementation, those things combine to make this progressable.

pal: For mapping can we simply assume that an ISD is a valid TTML document that happens to be static?

glenn: almost - it's not quite the same because there's some transformation, e.g. the body element is copied and reparented to the region elements that are temporally active.

courtney: My feeling is that this is just trading off one set of problems for another.

pal: I was hoping that ISD could just be used to mean "the state of a TTML document between successive events".
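
The simplified reading of an ISD as the state between successive events can be sketched in Python as follows. This is not the ISD construction procedure from TTML1: it ignores regions, reparenting and style inheritance, and only slices a list of already-resolved (begin, end, text) items into non-overlapping intervals. The function name and data shape are assumptions.

```python
def slice_timeline(items):
    """Split possibly overlapping (begin, end, text) items into non-overlapping slices.

    Returns a list of (begin, end, [texts active in that interval]);
    intervals in which nothing is active are skipped.
    """
    # Every begin or end time is an event at which the displayed state may change.
    boundaries = sorted({t for begin, end, _ in items for t in (begin, end)})
    slices = []
    for start, stop in zip(boundaries, boundaries[1:]):
        active = [text for begin, end, text in items if begin < stop and end > start]
        if active:
            slices.append((start, stop, active))
    return slices


for start, stop, texts in slice_timeline([(0, 4, "A"), (2, 6, "B")]):
    print(start, stop, texts)
```

Each slice could then be emitted as one cue (or a group of cues with identical timing), at the cost of losing the original element boundaries.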

Cyril: do we have a presentation on ISDs?

glenn: No, though I could do it verbally.

andreas: Maybe if it's in the TTV software we could have a look at some simple examples?
... So we don't get stuck here, can we start on attribute mappings that have to be done either way?

courtney: I'd prefer to stick with TTML rather than ISDs and defer some of these problems.

nigel: +1. Most of the problems are just about timing.

glenn: Unfortunately that's not true - there's also the problem of associating content with regions and then performing region style inheritance.
... In the ISD document the content has been associated with individual regions and then region style inheritance, and if you don't go through the ISD process then the latter breaks.

nigel: I think you can do the style computation without making the ISD.

glenn: There's a risk of duplication of effort.

courtney: I think you can map directly.

nigel: I want to defer timing issues to ISDs and do everything else directly.

glenn: To be clear I didn't mean previously that we need to serialise the ISDs

Cyril: We talked earlier about categories - we need to think about metadata etc.

pal: I've not heard those use cases.

Cyril: Can we assume that metadata-only WebVTT files are out of scope of this?

glenn: I guess the issue is searchability - if there are use cases that need searchability e.g. characters, roles, other agents, then we might need to consider that.
... If we're strictly talking about presentation then maybe we don't need to consider that.
... In WebVTT can you use metadata to define larger classes for presentation?

courtney: The only thing I've encountered along those lines is voice, which may be one example. The approach I've taken is just to map what is possible to map.
... In the document we can describe what's well defined and note what can't be supported.

andreas: I agree - we should publish something sooner and limit certain parts to a canonical representation if there are multiple ways to express the same thing.
... We can decide on a feature by feature basis what to limit, for example.

Cyril: we didn't talk about which mapping direction we're talking about.

nigel: it's both.

andreas: Additionally there are, e.g. in Germany, cases where browsers aren't used to present content, and renderers only understand TTML.
... So we need to go both ways.

nigel: Adjourns for lunch - return at 1330 CET.

<zcorpan> i will call in 14:00. then 15:00-15:30 i will be absent again

<zcorpan> correction. i will call in now but be absent between 14:00-14:30 and 15:00-15:30

<scribe> chair: nigel

<scribe> scribeNick: nigel

Agenda

nigel: We may switch things around tomorrow due to changes to flights etc.

We will capture output at https://www.w3.org/wiki/TimedText/TTMLtoWebVTT where I enter 'wiki' in the minutes

Document Structure

courtney: Can we go through TTML elements?

Cyril: can we map the tt element to the top of a WebVTT document?

glenn: explains TTML structure down to style attributes.

Cyril: Suggests defining a style class in WebVTT corresponding to each style in TTML

glenn: yes, we can do this.

courtney: Yes. Right now the CSS document is separate, but in the future it could be embedded.
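
The suggestion of one WebVTT style class per TTML style could look roughly like the Python sketch below, which writes a ::cue rule for each style into the separate CSS document. The property table is a small illustrative subset, values are passed through unchanged (which assumes they are already valid CSS, something that does not hold for every TTML colour form), and the function name is hypothetical.

```python
# Illustrative subset of TTML style attributes and their CSS counterparts.
TTML_TO_CSS = {
    "tts:color": "color",
    "tts:backgroundColor": "background-color",
    "tts:fontFamily": "font-family",
}


def style_to_cue_rule(style_id: str, attributes: dict) -> str:
    """Build a ::cue rule for a TTML style, using the style id as the class name."""
    declarations = [
        f"{TTML_TO_CSS[name]}: {value};"
        for name, value in attributes.items()
        if name in TTML_TO_CSS  # attributes without a known mapping are skipped
    ]
    return f"::cue(c.{style_id}) {{ {' '.join(declarations)} }}"


print(style_to_cue_rule("textWhite", {
    "tts:color": "#ffffff",
    "tts:backgroundColor": "rgba(0,0,0,0.7)",
}))
```

Attributes that are silently skipped here are exactly the ones a mapping document would need to list as having no WebVTT equivalent.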

pal: Will there be feedback into WebVTT from this?

courtney: There are competing desires here - yes, in principle.

Cyril: can we go through these?

glenn: Let's keep going with structure.
... Takes group through region properties - including style attributes for origin and extent, and referential approach.
... Each region has an id. If there are no regions defined there's a default, covering 100%.

Cyril: How different is this from WebVTT regions?

courtney: WebVTT regions can not have styles, but the layout information translates pretty directly.

glenn: For example tts:opacity is a region-specific property. backgroundColor can apply to regions independently of the content in the region.
... There are a number of style properties that only apply to regions.

andreas: Can a region be compared to a div element in HTML?

glenn: yes.

andreas: So this is the only element that can be positioned absolutely within the root container.

glenn: moves to body

Cyril: Will we have an output document structure with headers and bodies, with two subsections - for styling and layout?

courtney: yes.

glenn: That's not a bad way to do it.

courtney: Part of this will describe the separate CSS and WebVTT document.

glenn: takes us down through body, div, p and span.
... div can contain div; p can not contain p; p can not contain div; div can not contain text.

Cyril: so p is equivalent to a cue?

courtney: seems that way.

glenn: Timing can be specified on body, div, p, span and br.

Cyril: cues can have nested timing in spans.

pal: is there a reason why each p can't map to a cue?

glenn: my mental model of a cue is that it is not overlapping in time with other cues. I think this makes things easier.

pal: But if we can map a p to a cue then the mapping is simpler.

courtney: What else would it map to?

glenn: Are you still assuming time has been flattened down and sliced?

pal: Yes.

glenn: So there are no overlaps. At that point content that is selected into regions is present and everything else has been filtered out.
... every piece of content is associated with a single region in TTML.

Cyril: same in WebVTT.

glenn: So the concept is to start from body, work down, and associate each piece of content with a region.
... So if there's a region we're not interested in we can filter out that content.
... So there may be multiple <p>s all mapping into a single cue.

courtney: With WebVTT you'd define regions, and for each cue reference the region id.

glenn: That's exactly how it works in TTML but with the ability to inherit region from an ancestor.
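
The region inheritance glenn describes can be sketched as a walk down from body that carries the nearest ancestor's region. This is an illustration only: it assumes toy markup without namespaces, and the function name `regions_by_paragraph` is hypothetical.

```python
import xml.etree.ElementTree as ET


def regions_by_paragraph(body):
    """Return (text, region id) for each p, inheriting region from ancestors."""
    resolved = []

    def walk(elem, inherited):
        # An element's own region attribute wins over the inherited one.
        region = elem.get("region", inherited)
        if elem.tag == "p":
            resolved.append(("".join(elem.itertext()), region))
        for child in elem:
            walk(child, region)

    walk(body, None)
    return resolved


doc = ET.fromstring(
    '<body region="r1">'
    '<div><p>a</p><p region="r2">b</p></div>'
    '<div region="r3"><p>c</p></div>'
    "</body>"
)
pairs = regions_by_paragraph(doc)
print(pairs)
```

Filtering out a region that is not of interest then amounts to dropping the pairs whose region id matches it.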

Cyril: So you can in principle flatten the TTML structure and remove the <div>s.

glenn: You can't remove the <div>s because they specify breaks and style.

Cyril: But you could propagate down.

nigel: You can paint the background of a div so if you remove it then some information is lost.

andreas: is there a layout impact of div?

glenn: It implies a breaking boundary in the line progression direction and it may contain styling.

group: discusses slicing apart divs into multiple <p>s each of which generates a cue.

Cyril: so if I start by resolving all the style references on a p, flattening out all the styles, then...

glenn: so you can now enumerate all the <p>s and <div>s and assign each to a cue.

courtney: I think we should do that in the document.

glenn: Okay but you may end up with a lot of cues all with the same timing. If there's no intrinsic limitation on that then we can go ahead.

Cyril: Layout: so div affects layout?

glenn: Yes, divs can't (spatially) overlap each other within the same region.

andreas: but the only fixed dimension defined is for the region, so the height of each p and div depends on the content flowed into them.
... So there's no difference between the block level boxes that are generated by divs and ps.

Cyril: We could create artificial regions for divs that have a background color

nigel: we may have some non-mappable functionality here, if a region, a div, and a p all have different background colors.

glenn: Also if the div contains a div and both divs contain a p, and all the background colors are different, then you end up with different background paint areas

andreas: Can a div create a space that isn't occupied by a p? If a p covers only 50% of the height of the region then its parent div will just have the height of its contained <p>s
... and not expand to the height of the region.

glenn: So it will have the same background color as the p

courtney: you can't specify an extent on a div or a p?

glenn: no that's right.

andreas: the width is defined by the region and the height by the flowed in content.

Cyril: so you can't have a div with a different background color from its child <p>s?

glenn: That's right because we don't have a margin before or after.

nigel: I think we've just resolved that <p>s map to cues (repeating Glenn's earlier joke)!

glenn: In TTML2 we have padding on content elements not just on region, which might impact this, but thinking about it, it should be okay because it's not margin.

courtney: What are content elements?

glenn: body, div, p, span, br.

Cyril: What if spans have timing that's shorter than their parent p?

glenn: If there's an explicit end on the span that makes its active end prior to the active end of its parent then it would depend on the fill mode - it's either freeze or remove.
... I'd have to check what we said about this, from SMIL.

andreas: in WebVTT you can have non-ended cues, that last until... when?

glenn: In TTML if there's an explicit end on the parent container and the child ends prior to that then there would be two ISDs, one
... covering the first period and the other covering the second period, and the span wouldn't be present in the second period.
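
A rough sketch of the slicing glenn describes, assuming every element's begin and end have already been resolved to absolute seconds and that an ended span is removed rather than frozen. The tuple layout and the name `slice_into_isds` are assumptions made for illustration.

```python
def slice_into_isds(elements):
    """Split the timeline at every begin/end and list what is active in each slice.

    elements: list of (name, begin, end) with absolute times in seconds.
    Returns a list of (start, end, active_names) for non-overlapping intervals.
    """
    boundaries = sorted({t for _, begin, end in elements for t in (begin, end)})
    isds = []
    for start, end in zip(boundaries, boundaries[1:]):
        # An element is active in a slice if the slice lies inside its interval.
        active = [name for name, b, e in elements if b <= start and end <= e]
        if active:
            isds.append((start, end, active))
    return isds


# A p active from 0s to 10s whose child span ends early, at 4s.
isds = slice_into_isds([("p", 0, 10), ("span", 0, 4)])
print(isds)
```

With these inputs the span is present only in the first of the two intervals, which matches the two-ISD outcome described above.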

nigel: +1

Cyril: so you can have a span that contains text that activates and deactivates part way through the cue.

glenn: Yes, that would be possible in TTML.

Cyril: Can we do that in WebVTT?

courtney: I don't think so - there's only styling changes part way through a cue.
... So spans with time on them - would we have to separate them into separate cues?

Cyril: I don't think that would work because they'd appear on different lines.
... You'd have to go down to the ISD level.

nigel: Can you have spans with timing?

Cyril: only to switch the text on, not off.
... So not every p is a cue, it's a bit more complicated!

glenn: If you split everything into ISDs that do not overlap then these problems can be resolved.
... We need to look more at the details and work out if there's a problem here.
... The only thing we didn't cover is animation. There's a set element in TTML1 that can also delineate ISD boundaries.
... In TTML2 we're adding continuous animation using the animate element.
... In TTML2 ISDs there may be some internal animation within the ISD.

andreas: it's also worth noting that every element can have metadata attached.

glenn: Except for the ttm:agent attribute, which can appear only on content elements and on region, and which references agent definitions in the header,
... other metadata elements are all local, not referential.

andreas: TTML also allows child elements that are not in a TTML namespace so it can be extended. A TTML processor is required to prune these out and not reject
... the document. But it doesn't have to display.

courtney: Does anyone know if we can have metadata in CSS within a style class?

andreas: you can have comments.

glenn: they're ignored in the CSS object model.

zcorpan: you can have custom properties that can be used for any purpose including metadata.

<zcorpan> http://dev.w3.org/csswg/css-variables/

nigel: Can we go through the WebVTT structure and see how that maps?

courtney: WebVTT files have a header section that starts with WEBVTT

http://dev.w3.org/html5/webvtt/

courtney: Then there can be metadata, such as language, copyright etc.

Cyril: so when you parse the file, big objects are separated by double line separators.
... Every piece of text separated by two lines is either a cue or is a comment not for display.
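
A simplified sketch of the block structure Cyril describes: split the file on blank lines, then label each block. It is not the WebVTT parsing algorithm; it assumes well-formed input, treats any block whose first line starts with NOTE as a comment, and uses a hypothetical function name.

```python
import re


def classify_blocks(vtt_text):
    """Split a WebVTT file on blank lines and label each block."""
    text = vtt_text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = [b for b in re.split(r"\n{2,}", text.strip("\n")) if b]
    labelled = []
    for index, block in enumerate(blocks):
        first_line = block.split("\n", 1)[0]
        if index == 0 and first_line.startswith("WEBVTT"):
            kind = "header"
        elif first_line == "NOTE" or first_line.startswith(("NOTE ", "NOTE\t")):
            kind = "comment"  # not for display
        elif "-->" in block:
            kind = "cue"
        else:
            kind = "other"
        labelled.append((kind, block))
    return labelled


sample = (
    "WEBVTT\n"
    "\n"
    "NOTE a comment\n"
    "\n"
    "1\n"
    "00:00:01.000 --> 00:00:02.000\n"
    "Hello\n"
)
kinds = [kind for kind, _ in classify_blocks(sample)]
print(kinds)
```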

andreas: but comments are not defined?

<zcorpan> http://dev.w3.org/html5/webvtt/#webvtt-comments comments are defined here

Cyril: no. For example in MP4 carriage you could remove it, or put it in a previous or next segment - it won't be displayed.

courtney: In the header section you can also include region definitions.

nigel: so you can't have untimed cues?

Cyril: yes. Can you in TTML?

nigel: yes you can - they have the duration of the whole document (assuming there's no inherited time from a parent time container etc)

Cyril: this is in flux in the WebVTT standard, using keywords like 'Next' for 'until the next cue'.

glenn: during the conceptual ISD mapping process every piece of content gets timed. Ultimately the active period of the related media object will determine that time,
... in the absence of any other information.

andreas: We also have to think about multiple <br>s in TTML documents, which are allowed, but shouldn't generate multiple line breaks because they wouldn't
... be displayed in WebVTT.

Cyril: so you could define line numbering or put non-breaking spaces on otherwise empty lines. I'm not sure how the backgrounds would be painted for spaces.
... records issue on wiki

andreas: You can use empty spans on each line.

courtney: Identifiers are used - each cue can have an identifier, which would show up before the begin and end time lines.
... Also regions have ids that can be referenced in cues.

Cyril: Those cue ids come from SRT - in SRT each cue has to be a monotonically increasing number with no gaps.
... it's very common to have WebVTT files with numeric identifiers.

andreas: and the ids can have spaces in between, which isn't permitted in xml:id

courtney: so we should have a convention for mapping to TTML Ids.
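
One hypothetical convention, purely as an illustration and not something the group agreed: replace characters that xml:id does not allow, prefix identifiers that do not start with a letter or underscore, and add a numeric suffix on collision. The sketch conservatively restricts output to ASCII; the function name and the "c" prefix are assumptions.

```python
import re


def cue_id_to_xml_id(cue_id, used):
    """Map a WebVTT cue identifier to a unique, conservative ASCII xml:id.

    used: set of ids already handed out; it is updated in place.
    """
    candidate = re.sub(r"[^A-Za-z0-9_.-]", "_", cue_id)
    if not re.match(r"[A-Za-z_]", candidate):
        # Numeric (SRT-style) and empty identifiers need a valid first character.
        candidate = "c" + candidate
    unique, n = candidate, 1
    while unique in used:
        n += 1
        unique = f"{candidate}-{n}"
    used.add(unique)
    return unique


seen = set()
ids = [cue_id_to_xml_id(c, seen) for c in ["1", "intro line", "intro_line"]]
print(ids)
```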

nigel: Can VTT cue ids be duplicated?

courtney: no.

nigel: the reason for mentioning it is that if we do TTML ISD -> Cue then the same TTML id may resolve to multiple cues.

courtney: there's something to think about here with slicing VTT cues into time slices.

Cyril: As long as all the spans in a p align with the end times of the p then you can keep it as a single cue.

nigel: that's a special case - think of live word by word subtitles.

Cyril: cues have to be laid out in start time order.
... Within a cue you can have internal timing values, that I think also have to be in increasing time order (I'm not sure about that).
... can you have TTML spans that display in reverse time order compared to the document order?

glenn: Yes, there are no constraints.

Cyril: what about in profiles?

pal: I haven't seen any profile that constrains that out.

glenn: if the TTML time container is a par (parallel) time container then a child can start after one of its preceding siblings.
... the order in the content will define the presentation order (spatially).

pal: IMSC 1 allows a document to be labelled progressively decodable, which forbids timing on descendants of <p>s.

courtney: So that needs to be in the document, i.e. temporal ordering within the document.

andreas: EBU-TT-D doesn't constrain this but recommends time ordering. Most legacy formats are sequentially ordered in time as well.

Cyril: even if the s were out of order in time that wouldn't be a problem, but out of order s would be a problem.

pal: But going to ISD level would avoid that.

Cyril: adds this issue to the wiki

nigel: Do we have to worry about rtl direction when sorting spans into order in WebVTT?

glenn: I would expect that when a span is active all text content of active spans is merged and then directionality is applied on the result.

courtney: let's leave the identifier mapping convention until later.

nigel: Voice spans are straightforward aren't they?

courtney: I think voice maps to agent pretty well.

nigel: +1
... What about styling based on voice cue selectors?

courtney: You could define a TTML style for each agent.
... Along those lines you can put styling directly on a span - in WebVTT I think you'd have to define CSS classes for those.

Cyril: you may not have to scan the whole document but could create a random hash for every time one is encountered.
... I'm also interested in streaming, transcoding live streams.

glenn: If it's not been converted into an ISD sequence then you can't avoid parsing the whole document (unless it's progressively decodable).
... You never know if the last markup element will be timed prior to the rest.

Cyril: WebVTT documents are always progressively decodable.
... go to example just before section 2 - this has multiple lines in the header. In this case Regions, but it could be copyright, anything else.
... So some parts of the header map to regions and others to metadata.
... continuing on document structure.
... Each cue has a timestamp for start and end, followed by optional settings.

Courtney: There are additional settings available.

Cyril: they are a combination of styling and layout.

nigel: What about at the end of the document?

Courtney: there's nothing to mark the ends of documents.

Cyril: that's a feature - you can concatenate two WebVTT files, and if the timestamps obey the time rules then it's valid.
... The second header would be ignored.
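
The ordering condition behind this can be sketched as a check that cue start times never decrease across the whole text, for example after concatenating two files. This is an assumption-laden simplification: it only looks at timing lines, ignores the rest of the WebVTT validity rules, and the names `TIMING`, `to_seconds` and `starts_in_order` are illustrative.

```python
import re

# Matches "start --> end" at the start of a line; hours are optional.
STAMP = r"(?:\d+:)?\d{2}:\d{2}\.\d{3}"
TIMING = re.compile(rf"^({STAMP})[ \t]+-->[ \t]+({STAMP})", re.MULTILINE)


def to_seconds(stamp):
    """Convert 'hh:mm:ss.ttt' or 'mm:ss.ttt' to seconds."""
    parts = stamp.split(":")
    hours = int(parts[0]) if len(parts) == 3 else 0
    return hours * 3600 + int(parts[-2]) * 60 + float(parts[-1])


def starts_in_order(vtt_text):
    """True if cue start times are non-decreasing in document order."""
    starts = [to_seconds(m.group(1)) for m in TIMING.finditer(vtt_text)]
    return all(a <= b for a, b in zip(starts, starts[1:]))


first = "WEBVTT\n\n00:01.000 --> 00:02.000\nOne\n"
second = "WEBVTT\n\n00:00:05.000 --> 00:00:06.000\nTwo\n"
print(starts_in_order(first + "\n" + second))
print(starts_in_order(second + "\n" + first))
```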

pal: what about styles?

andreas: We also need to think about error handling - processing of invalid documents.

nigel: Can we simply constrain our mapping to input documents that are valid?

Cyril: maybe not - we could consider the WebVTT to TTML mapping to do what a presentation processor would do when given an invalid document
... The behaviour is well defined.

nigel: Let's take a break until 1545...

<zcorpan> re "nigel: Can VTT cue ids be duplicated?" - yes, there is no requirement about uniqueness for cue identifiers. however region identifiers need to be unique and don't allow spaces

<zcorpan> hmm. sorry, looks like cue id requires uniqueness also. i think that changed from a few years ago

<zcorpan> looks like the spec allows a cue id to be duplicated as region id

Restarting...

Layout

andreas: We should start with the positioning of a <p> element relative to a region.

courtney: The positioning is the piece that will map into WebVTT. There are several region attributes in TTML that can not go in WebVTT.

group: discussion of xml:lang on <region> and how it may get inherited by content elements in TTML.
... discussion of style attributes on region - which must be included?

courtney: Maybe we should go through each attribute.

<tmichel> I just joined Zakim using SIP. It works for me using code ttml#

<zcorpan> i still get "this passcode is not valid"

glenn: I have a list of style attributes that apply to region.
... there are 12 in TTML1, and of those, 9 apply only to region.
... Styles that apply both to region and other content elements are backgroundColor, display and visibility.
... the ones that apply only to region in TTML1 are displayAlign, extent, opacity, origin, overflow, padding, showBackground, writingMode and zIndex.
... Note that at least one of these will be opened up to content elements in TTML2, which is padding.
... We may also open up opacity to content elements, which would allow the definition of opacity for an element and its content as a collection.

andreas: Should we rule out the attributes that will change in TTML2?

glenn: In fact opacity and padding are extended to all content elements in TTML2.
... In both cases they aren't being removed from region, so they are still applicable to region in ttml2.

courtney: So let's start with those. I believe that only 3 map to a region in WebVTT: displayAlign, extent and origin.

andreas: And they can be mapped to properties of the region?

nigel: can't you do visibility by setting a style with opacity zero?

courtney: you can do that but only on a cue, not on a region.

nigel: So another way to say the same thing is that there's no region selector for styling?

courtney: Yes.

nigel: does the lack of zIndex imply that in WebVTT overlapping regions are prohibited?

courtney: I don't think they're prohibited.

glenn: In TTML2 on this subject we have a request for expressing z ordering for content to be able to handle 3D.

pal: That sounds similar but it's a different concept.

Loretta: I'm trying to see if the magic layout algorithm applies to region as well.
... In general there's no notion of zIndex in WebVTT.

nigel: Is there an alternative way to achieve backgroundColor on regions in WebVTT?

courtney: I don't think so, you can only do it for cues.

Cyril: adds non-mappable showBackground on region and zIndex to the wiki.

courtney: overflow is always hidden for regions too.

glenn: Can wrapping be prevented so that overflow may be relevant?
... Or what happens if you put too much content into a region i.e. too many lines?
... It sounds like extent, origin and displayAlign are currently expressible. The other 9 attributes seem to be absent.
... display seems to be only worthwhile in conjunction with animate.
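
The counts in this part of the discussion can be checked with simple set arithmetic. The attribute names are the ones glenn and courtney list; the variable names are illustrative.

```python
# TTML1 style attributes that apply only to region, as listed by glenn.
REGION_ONLY = {
    "displayAlign", "extent", "opacity", "origin", "overflow",
    "padding", "showBackground", "writingMode", "zIndex",
}
# Attributes that apply to region and to other content elements.
SHARED = {"backgroundColor", "display", "visibility"}
# Attributes courtney believes map to a WebVTT region.
MAPPABLE = {"displayAlign", "extent", "origin"}

applies_to_region = REGION_ONLY | SHARED
not_expressible = applies_to_region - MAPPABLE

print(len(applies_to_region), len(not_expressible))  # 12 9
print(sorted(not_expressible))
```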

nigel: It seems that the pseudo classes past and future have some relationship to animate.

andreas: Wants to note that when we finish on the TTML attributes we should go the other way round.

courtney: Let's do the non-style attributes on a region first...
... You can put timing on a region in TTML - there's no equivalent in WebVTT. attributes begin, end, dur, timeContainer

glenn: timeContainer is on regions for the processing of animate elements that are children of region.

<Loretta> Does the cue-region pseudo-element let us apply CSS styles to regions?

nigel: What's the action on that - to add it to the non-mappable list?

Cyril: why have timing on regions?

glenn: The main reason is to provide timing for background painting when no content is active, and also to specify the timing for animate elements that are children of that region.

Cyril: I'm not sure it's not mappable - you can have empty cues applied to a region, with the equivalent times of the TTML region.
... Then that would activate the region in the same way - what happens then is a later question, e.g. background painting.

glenn: Actually the timing of a region in TTML can be used to temporally clip the flow of content into that region, so it's a bit more than that.
... The question really is: do implementations use animate?

pal: I'm going to check the examples I have.
... another thing is how do you achieve dynamic positioning for text? One way is to create one region per subtitle.
... In that case you may be tempted to put the timing on the region.

Loretta: What are you trying to do here?

pal: In TTML1 there's no per-cue positioning, e.g. of each <p>. One way to achieve that effect is to define one region per subtitle and position each region
... individually.

andreas: From the layout perspective, there's a chance that timings are put on region elements.

courtney: Shall we talk about the things that do map?
... On a WebVTT region the available settings are: width, lines, region-anchor, viewport-anchor and scroll.
... I believe that extent in TTML maps to width and lines.
... We have the dimension issues for value units, e.g. if it's in %age then it's okay but in pixels you need the size to do the unit conversion.
... I think that displayAlign and origin in TTML, in combination, map to a combination of regionAnchor and viewportAnchor in WebVTT. The two specs have
... different ways to achieve the same thing. In WebVTT you define a point within the video frame that maps to a point within the region and they don't necessarily
... have to be the same thing. Origin + displayAlign allows you to achieve the same effect.
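
A hedged sketch of the extent to width and lines idea, showing why pixel values need the frame size. It assumes a pixel extent, a known frame width and a single fixed line height, none of which the discussion settles; the function names are hypothetical.

```python
def px(value):
    """Parse a pixel length such as '640px' into a float."""
    if not value.endswith("px"):
        raise ValueError(f"expected a pixel length, got {value!r}")
    return float(value[:-2])


def extent_px_to_width_and_lines(extent, frame_width_px, line_height_px):
    """Convert a TTML extent in pixels to (width in percent, number of lines)."""
    width_px, height_px = (px(v) for v in extent.split())
    # The percentage can only be computed once the frame width is known.
    width_percent = 100.0 * width_px / frame_width_px
    # Whole lines that fit in the region height, at least one.
    lines = max(1, int(height_px // line_height_px))
    return width_percent, lines


settings = extent_px_to_width_and_lines("640px 120px", 1280, 40)
print(settings)
```

The fixed line height is the weakest assumption here, since the height of a line in a WebVTT region depends on the rendered text.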

nigel: I thought there was some freedom in WebVTT about the precise positioning, whereas in TTML there's no freedom of movement - is that right?

Loretta: I'm still wading through the WebVTT algorithm. Certainly for cues things get moved around to be as close as possible to the stated location.

nigel: Yes, I'm not sure if that applies to regions as well as cues.

Loretta: Yes, I think it may do - I'm still checking.

courtney: I think we should take that offline and research it.

andreas: I see a problem with the lines value - this defines the height of the region. A line is defined by the height of the first line of the cue, so a region does not
... always have the same height, as it depends on the first line's size. This is a hard topic to research in general, how this will resolve.

nigel: What's a concrete example of that problem?

andreas: In general the mapping from TTML to WebVTT may not be possible because for each cue selected into the same region the line height could be different,
... which will result in the region changing height.

Loretta: presumably WebVTT would expand the region to accommodate the 5 lines and TTML would clip?

glenn: That would depend on lineHeight, fontSize and overflow attributes in TTML.
... Right now we don't have an object-fitting algorithm such as in CSS.

Loretta: Is there a way of setting font-relative dimensions?

glenn: yes, they can be defined in ems or cells. Ems would be font-relative.

andreas: Why is region height important for WebVTT when no background can be drawn?

Loretta: the height is important because that determines when scrolling will start.

nigel: This seems very similar to the overflow attribute in TTML - if some lines fall out of a region, which ones should an implementation hide?

glenn: That's an implementation issue.

andreas: Can you explain the difference between the region anchor point and the viewport anchor point?

courtney: the region anchor setting defines a point that is fixed in location relative to the region, in case the region has to grow.
... the viewport anchor setting defines where in the video the region must overlap.
... It needs to be understood in relation with the display-align setting.

Loretta: right, we need two points. It's like sticking a pin through the region and in the viewport, and any changes to region size keep that point invariant.

courtney: the region viewport anchor setting has two points defined, the point within the video and the point within the region.
... Then there's an additional point that is held constant when the region is resized.
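
The pin analogy can be sketched numerically: the point at the region anchor percentage inside the region is placed on the point at the viewport anchor percentage inside the video. This is an illustration under stated assumptions (pixel sizes, percentage anchors, no further adjustment by the layout algorithm), with a hypothetical function name.

```python
def region_top_left(viewport_anchor, region_anchor, region_size, video_size):
    """Return the (left, top) pixel position of a region.

    viewport_anchor, region_anchor: (x percent, y percent)
    region_size, video_size: (width, height) in pixels
    """
    vx, vy = viewport_anchor
    rx, ry = region_anchor
    region_w, region_h = region_size
    video_w, video_h = video_size
    # Pin position in the video minus the pin's offset inside the region.
    left = video_w * vx / 100.0 - region_w * rx / 100.0
    top = video_h * vy / 100.0 - region_h * ry / 100.0
    return left, top


# Both anchors at the bottom-left corner: the region sits on the bottom edge.
small = region_top_left((0, 100), (0, 100), (400, 90), (1000, 500))
grown = region_top_left((0, 100), (0, 100), (400, 120), (1000, 500))
print(small, grown)
```

When the region grows from 90 to 120 pixels high, its top edge moves up while its bottom edge stays at 500, so the pinned point is invariant.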

ack

nigel: I think we need to understand the region mapping algorithm from WebVTT - to origin and extent, and if that's a single value or if there are multiple values,
... which in TTML we can do using set elements on the region.
... I think we need a strawman algorithm for this mapping so that we can look at it.

andreas: I propose a gist on github for example.

courtney: I'll take it as an action item to come up with a strawman proposal.

glenn: A moment ago I thought I heard something about origin being in the centre in TTML - was that the question?

Courtney: yes, would you do that with displayAlign?

glenn: origin is always top left. You can use displayAlign to define where lines are drawn from - in which direction. Right now there's no anchor mechanism in TTML.
... Sean did come up with a change proposal, which I will have to try to dig out.

courtney: It's always top left?

glenn: yes.

nigel: In scope terms, do we need to consider the placement of text within regions, and also the placement of text not in regions?

<glenn> https://www.w3.org/wiki/TTML/changeProposal015#region_anchor_points

glenn: on the prior point, change proposal 15 has a section on this.
... This is proposed for TTML2, but not implemented yet.

courtney: In WebVTT cues can have positioning - in TTML1 they don't. So in the mapping to TTML we need to translate to a region.

glenn: In TTML2 we are defining inline region definitions, so div and p in TTML2 can take a child region element, including extent and origin.

andreas: This is sometimes misused in operation!
... In mapping from WebVTT with no region and snap to lines is active, from the WebVTT spec it looks like margins need to be added top and bottom. Is that correct?
... If the first line is not to be at the top and the last line must not be at the bottom, that is.
... We need clarifications of this for accurate mapping.
... will add to the Issues list on the wiki

Summary of the day

nigel: We've looked at existing work from Andreas and Courtney, thought about the processing models and document structures,
... identified that style attributes should mostly transfer straightforwardly, thought about metadata a bit, and spent a while on layout.
... Tomorrow we have some time set aside for testing, and I suggest we combine the test case generation with the mapping algorithms.
... Thank you everyone, see you tomorrow.

adjourns meeting.
