From Web Media Text Tracks Community Group

A Rendering Box for multiple Cues

A formal specification is now available for browser implementers.


WebVTT currently deals with two layout concepts: the text area that is to render, and the video viewport. (Actually, there is a third: the line box - but let's only deal with the other two for now.)

In other caption formats (in particular in CEA-608 and CEA-708), a third concept exists: the concept of a rendering region which holds multiple cues. It is called "window" in CEA-708, but we want to call it a "region".

Currently, it is not possible to be deterministic about where cue text is positioned because positioning works like positioning of CSS background-images, which is not good for captioning.

Several of the layout & rendering requirements listed in the Caption Model can be satisfied with the introduction of a cue region.

Note that the FCC requirements for online captions talk about the "window" concept.

New possibilities

A cue region concept enables several new features:

Explicit fixed width and height of cue rendering box

Current WebVTT rendering adjusts the sizing of a cue depending on other already rendered cues, on font size, wrapping and positioning. There is no way to define a rendering box of fixed width (percentage of video width) and height (in number of lines) that the cues fill either from top to bottom or from bottom to top (for horizontal cues). Specifying an explicit rendering box allows for this.

Explicit fixed positioning of cues

(see also Bug 15859)

Current WebVTT adjusts the positioning of a cue based on other already rendered cues, based on font size, whether it will wrap etc. A fixed positioned rendering region (percentage positions of video viewport) restricts the cue to being rendered within that region, avoiding, for example, overlapping other video content (such as on-screen text).

Align same rendering position of several cues

This is similar to the align-items property in flexboxes: if you want several cues to be aligned at the same position, it is best to pack them together in one region and align them together in that box. This can be start, end, middle, or stretch alignment within this box.

Scrolling of cues

Cues that are successively rendered into the same rendering region can push each other out of the way within the restricted number of lines for which the box is defined. The scroll direction and the scroll transition can be specified explicitly. This will allow roll-up and roll-down support.

Layering of cues

Defining multiple rendering regions allows these regions to be layered (with a z-index), which determines what is on top in case of overlap, independently of when the text is rendered.

Border and background color of rendering region

With an explicit concept of a rendering region it is possible to specify a border and a background color of this region that is different from the individual cue text.

Transition effects of rendering region

Though not a very important use case, with an explicit concept of a rendering region it is possible to specify a transition on the region's appearance/disappearance (fade/wipe).

Declarative means of avoiding a certain area on screen

Bug 17273 provides for a means to define a region on screen that should not be occupied by cues. With an explicit concept of a rendering region as defined here, it would be possible to define an empty cue that relates to the region and blocks other default positioned cues from overlapping this region while it is active.

NOTE: We have removed the idea of vertical cue regions because we are not aware of any examples of scrolling, vertically rendered cues.

Inspiration for WebVTT from CEA 708

CEA-708 has the following special properties for rendering regions (see also CEA-708 feature summary; CEA-708 spec):

  • identifier
  • z-index (called "priority")
  • dimensions (size of the region given in number of columns and rows in a fixed grid)
  • size change (though I don't think this is a commonly used feature and not one that the FCC requires)
  • anchor point and anchor location, together specifying which part of the region (anchor point) is fixed on which part of the video viewport (anchor location)
  • cue justification (left, right, centered, full - see flexbox analogy above)
  • cue print direction (left2right / right2left / top2bottom / bottom2top - again similar to flexbox)
  • cue scroll direction (left2right / right2left / top2bottom / bottom2top)
  • region appear/disappear display effect (fade, wipe)
  • border type
  • background color

Specification for WebVTT

We add the ability to specify a rendering region for WebVTT cues.

At this point, we only want to be able to provide regions in the header of a WebVTT file. At a later stage we may extend this spec to allow new region definitions anywhere in a WebVTT file (to extend this feature for live captioning), but for now we want to focus on simply introducing the concept. Live captioning with WebVTT is a whole other challenge to address, in particular for in-band.

Also, at this stage we do not consider changes to the region parameters at a later stage in the WebVTT file. CEA-608/708 allow moving of rollup windows to another position (e.g. to avoid burnt-in text), but this is not currently a use case that this spec solves. At a later stage we may extend this spec to allow redefining region parameters later during the WebVTT file.


Region: id=fred width=80% height=3 regionanchor=0%,100% viewportanchor=10%,90% scroll=up   start=bottom layer=10
Region: id=bill width=50% height=4 regionanchor=50%,50% viewportanchor=50%,50% scroll=down start=top    layer=1

00:00:05.940 --> 00:00:10.610 region:fred

00:00:07.040 --> 00:00:11.700 region:fred

00:00:09.410 --> 00:00:12.910 region:bill

00:00:10.610 --> 00:00:14.100 region:fred

The region with id=fred is a box of 80%*vw width and 3 lines height. Its bottom left corner is pinned to the anchor location at x=10% and y=90% of the video viewport. It therefore grows to the right and up on overflow. Its z-index is 10. Since cue text scrolls up, the first line is rendered at the bottom line of the region and successive cues added below it, pushing it up.

The region with id=bill is a box of 50%*vw width (vw=video width) and 4 lines height. Its center is pinned to the anchor location in the center of the video viewport (at x=50%, y=50%). It therefore grows from the inside out on overflow. Its z-index is 1, so it sits below region "fred". Since cue text scrolls down, the first line is rendered at the top line of the region and successive cues added on top of it, pushing it down.
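The placement of the two example regions can be checked with a small calculation. The following is a minimal sketch, not the normative algorithm: the function name regionBox, the 640x360 viewport and the 20px line height are assumptions made for illustration (the line height really depends on the font in use).

```javascript
// Hypothetical helper: compute the pixel box of a region inside the video viewport.
// widthPct and the anchors are percentages; lines is the region height in lines.
function regionBox(region, viewport, lineHeightPx) {
  const width = viewport.width * region.widthPct / 100;
  const height = region.lines * lineHeightPx;
  // Point of the viewport that the region is pinned to (viewportanchor).
  const pinX = viewport.width * region.viewportAnchor[0] / 100;
  const pinY = viewport.height * region.viewportAnchor[1] / 100;
  // Offset of the pinned point inside the region box (regionanchor).
  const offX = width * region.regionAnchor[0] / 100;
  const offY = height * region.regionAnchor[1] / 100;
  return { left: pinX - offX, top: pinY - offY, width, height };
}

// The two regions from the example, in an assumed 640x360 viewport with 20px lines.
const viewport = { width: 640, height: 360 };
const fredBox = regionBox(
  { widthPct: 80, lines: 3, regionAnchor: [0, 100], viewportAnchor: [10, 90] },
  viewport, 20);
const billBox = regionBox(
  { widthPct: 50, lines: 4, regionAnchor: [50, 50], viewportAnchor: [50, 50] },
  viewport, 20);
```

With these assumed numbers, the bottom left corner of "fred" lands at (64, 324), i.e. 10% and 90% of the viewport, and "bill" is centered in the viewport.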

While a region is defined at the beginning of the WebVTT file, it only takes up space on the video viewport during the time intervals that a cue references it. (At other times it is display:none).
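To illustrate when a region takes up space, here is a minimal sketch that lists the regions visible at a given time for the example cues above. The helper names toSeconds and visibleRegions are hypothetical, and treating the cue end time as exclusive is an assumption.

```javascript
// Hypothetical helper: convert "hh:mm:ss.mmm" into seconds.
function toSeconds(timestamp) {
  const [h, m, s] = timestamp.split(":");
  return Number(h) * 3600 + Number(m) * 60 + Number(s);
}

// The cues from the example, reduced to their timing and region id.
const exampleCues = [
  { start: "00:00:05.940", end: "00:00:10.610", region: "fred" },
  { start: "00:00:07.040", end: "00:00:11.700", region: "fred" },
  { start: "00:00:09.410", end: "00:00:12.910", region: "bill" },
  { start: "00:00:10.610", end: "00:00:14.100", region: "fred" },
];

// A region is displayed only while at least one cue that references it is active.
// Assumption: a cue is active for start <= time < end.
function visibleRegions(cues, time) {
  const ids = new Set();
  for (const cue of cues) {
    if (toSeconds(cue.start) <= time && time < toSeconds(cue.end)) {
      ids.add(cue.region);
    }
  }
  return [...ids].sort();
}
```

For instance, visibleRegions(exampleCues, 10) contains both "bill" and "fred", while at 5 seconds no region is displayed at all.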

Region attributes

The following region attributes are proposed:

  • id : an identifier for the region which makes it possible to place a cue into this region; cues are placed into this region by an additional cue setting called "region" which states the id of the region that it is added to; since cue setting values can't have any space characters, region identifiers are not allowed to have any either.

  • width : the width of the region given as a percentage of vw (vw=video width); defaults to 100%
  • height : the height of the region given as an integer number of lines; defaults to 3 (lines) (empirical number based on experience from TV)

Note: The line height is determined by the font in use.

  • regionanchor : a tuple of two percentages that specify the point of the region box that is pinned with the first percentage measuring the x-dimension and the second percentage measuring the y-dimension from the top left corner of the region box; if no regionanchor is given, defaults to (0%,100%) (i.e. the bottom left corner)
  • viewportanchor : a tuple of two percentages that specify the point of the video viewport that the regionanchor point is anchored to, with the first percentage measuring the x-dimension and the second percentage measuring the y-dimension from the top left corner of the video viewport box; if no viewportanchor is given, defaults to (0%,100%) (i.e. the bottom left corner)

Note: For browsers, the region maps to an absolute positioned CSS box relative to the video viewport (i.e. there is a relative positioned box that represents the video viewport relative to which the regions are absolutely positioned). Overflow is hidden.

  • scroll : specifies whether cues rendered into the region are allowed to move out of their initial rendering place and in which direction they move; valid values are "up" and "down"; for horizontal regions "up" means moving towards the top of the video viewport, "down" towards the bottom; if the attribute is omitted, cues do not move from their rendered position
  • start : specifies where the first cue is being rendered when the region is empty; valid values are "top" and "bottom"; for horizontal regions "top" means the line in the region closest to the top of the video viewport, "bottom" the line closest to the bottom of the viewport; the default is "bottom"

Note: Cues are added to a region one line at a time. E.g. with scroll=up, cue lines are added below existing cue lines. When an existing rendered cue line is removed, and it was "above" another already rendered cue line, that cue line moves into its space, thus scrolling in the given direction. If there is not enough space for a new cue line to be added to a region, the top-most cue line is pushed off the visible region (thus slowly becoming invisible as it moves into overflow:hidden). This eventually makes space for the new cue line and allows it to be added.

When there is no scroll direction, cue lines are added in the empty line closest to the line in the "start" position. If no empty line is available, the oldest line is replaced.
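The roll-up behaviour for scroll=up can be modelled with a simple list of lines. This is only a sketch under simplifying assumptions: it ignores the transition, models only the scroll=up case, and the function names addLineRollUp and removeLine are hypothetical.

```javascript
// Minimal model of a scroll=up region; `lines` is ordered from top to bottom.
function addLineRollUp(lines, newLine, height) {
  const next = [...lines, newLine]; // new lines enter below the existing ones
  while (next.length > height) {
    next.shift(); // the top-most line is pushed out of the visible region
  }
  return next;
}

// Removing a line lets the lines below it move up into the freed space.
function removeLine(lines, line) {
  return lines.filter((l) => l !== line);
}

// Four lines added to a region that is 3 lines high.
let rollUp = [];
for (const line of ["a", "b", "c", "d"]) {
  rollUp = addLineRollUp(rollUp, line, 3);
}
const afterRemoval = removeLine(rollUp, "c");
```

After the fourth line is added, line "a" has been pushed off the top and the region holds "b", "c" and "d".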

Note: If a cue has an explicit "line" or "size" specification in its cue settings, the region cue setting will be ignored and the cue rendered as though it didn't belong to a region.

Note: The movement of the cues is fixed to take 0.433 seconds to complete (taken from CEA-608/708). This may be changed through a CSS transition property. The default setting is transition-duration: 0.433s; and transition-property: top.

  • layer : specifies the z-index at which the region is displayed; defaults to 0. Should be interpreted relative to the z-index of the video, i.e. if the video z-index is 10 and the layer 5, the z-index of the region should be 15.

All attributes are optional (note, however, that a region without an id is pretty useless).
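As a rough illustration of how the attributes and their defaults fit together, the following sketch parses a single Region header line. It is not a conforming parser: the function names are hypothetical, and silently ignoring unknown attributes and invalid values is an assumption.

```javascript
// Defaults as listed in the attribute descriptions above.
const REGION_DEFAULTS = {
  id: "",
  width: 100,               // percentage of the video width
  height: 3,                // number of lines
  regionanchor: [0, 100],   // bottom left corner of the region
  viewportanchor: [0, 100], // bottom left corner of the viewport
  scroll: "",               // empty: cues do not move
  start: "bottom",
  layer: 0,
};

function parsePercent(text) {
  return Number(text.replace("%", ""));
}

// Hypothetical parser for a line such as "Region: id=fred width=80% ...".
function parseRegionLine(line) {
  const region = { ...REGION_DEFAULTS };
  const body = line.replace(/^Region:\s*/, "");
  for (const token of body.split(/\s+/).filter(Boolean)) {
    const eq = token.indexOf("=");
    if (eq < 0) continue; // not a name=value pair
    const name = token.slice(0, eq);
    const value = token.slice(eq + 1);
    switch (name) {
      case "id":
        region.id = value;
        break;
      case "width":
        region.width = parsePercent(value);
        break;
      case "height":
      case "layer":
        region[name] = parseInt(value, 10);
        break;
      case "regionanchor":
      case "viewportanchor":
        region[name] = value.split(",").map(parsePercent);
        break;
      case "scroll":
        if (value === "up" || value === "down") region.scroll = value;
        break;
      case "start":
        if (value === "top" || value === "bottom") region.start = value;
        break;
      // Assumption: unknown attributes are ignored.
    }
  }
  return region;
}

const parsedFred = parseRegionLine(
  "Region: id=fred width=80% height=3 regionanchor=0%,100% viewportanchor=10%,90% scroll=up   start=bottom layer=10");
```

A line that only carries an id, such as "Region: id=solo", yields a region with all other attributes at their defaults.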

Cue settings

We introduce a new cue setting:

  • region : specifies the id of a region that the cue is added to

The region addition of a cue has side effects on other cue settings.

The region cue setting is ignored when:

  • the cue has an explicit "vertical" cue setting
  • the cue has an explicit "line" cue setting
  • the cue has an explicit "size" cue setting
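The rule above can be expressed as a small check on the cue settings. This is a minimal sketch; parseCueSettings and effectiveRegion are hypothetical names, and the settings are assumed to arrive as the whitespace-separated name:value text that follows the cue timestamps.

```javascript
// Hypothetical helper: split cue settings such as "region:fred line:10" into a map.
function parseCueSettings(text) {
  const settings = {};
  for (const token of text.trim().split(/\s+/).filter(Boolean)) {
    const colon = token.indexOf(":");
    if (colon > 0) {
      settings[token.slice(0, colon)] = token.slice(colon + 1);
    }
  }
  return settings;
}

// Returns the region id the cue is rendered into, or null if the cue is
// rendered as though it did not belong to a region.
function effectiveRegion(settings) {
  const overriding = ["vertical", "line", "size"];
  const ignored = overriding.some((name) => settings[name] !== undefined);
  if (ignored || settings.region === undefined) {
    return null;
  }
  return settings.region;
}
```

For example, effectiveRegion(parseCueSettings("region:fred")) gives "fred", while adding line:10 to the same settings gives null.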

The other cue settings are applied to the line boxes in the cue relative to the region box:

  • position : the percentage is calculated as a percentage of the region width, and the cue's lines are indented from the edge of the region at which the text begins.
  • align : configures the alignment of the cue lines within the region relative to the text direction; e.g. for left-to-right horizontal text "start" aligns with the left edge of the cue line (possibly indented by "position").

Region styling

We also introduce a means to address the region with CSS to allow styling of the region box:


For example:

::cue-region(bill) {
  background-color: red;
  border: 2px solid green;
}

This will make the region with id=bill have a red background color and a green solid 2px border.

Note that this will not inhibit the individual cues from being directly CSS addressable. It just adds the ability to style regions.

It should eventually be possible to give Regions animation and transition effects when cues inside appear/disappear. Here are examples for a fade-in and a slide-in:

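::cue-region(bill) ::cue {
  animation-name: fadein;
  animation-duration: 3s;
}

/* The opacity values are an assumption about how the fade is expressed. */
@-webkit-keyframes fadein {
  0%   { opacity: 0; }
  100% { opacity: 1; }
}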


::cue-region(bill) ::cue {
  animation-name: slidein;
  animation-duration: 3s;
}

@keyframes slidein {
  0% {
    margin-left: -100%;
  }
  100% {
    margin-left: 0%;
  }
}

Note that when region definitions are combined with inline styles, it's best to put the definition of the regions before the inline styles in the WebVTT header.

Simplified first step

To simplify the implementation, the following attributes of a region may not be part of first support for regions:

  • layer : just implement natural z-index layering given by the order in which the regions are defined
  • scroll=down : only implement scroll=up for now (to replicate the rollup feature of 608 and 708)
  • start : only implement the start=bottom behaviour
  • cue movement : only implement the fixed 0.433s change duration and leave the addition of CSS3 transitions to a later time

Trial Implementation in JavaScript

Note that paint-on captions are not supported in this demo.