IMSC Hypothetical Render Model

Abstract

This specification specifies an Hypothetical Render Model (HRM) that constrains the complexity of documents that conform to any of the TTML Profiles for Internet Media Subtitles and Captions ([IMSC]).

The model is not intended as a specification of the processing requirements for implementations. For instance, while the model defines a glyph buffer for the purpose of limiting the number of glyphs displayed at any given point in time, it neither requires the implementation of such a buffer, nor models the sub-pixel character positioning and anti-aliased glyph rendering that can be used to produce text output.

This section is non-normative.

The model illustrated in Figure 1 operates on successive Intermediate Synchronic Documents obtained from an input IMSC Document Instance, and uses a simple double buffering model: while an Intermediate Synchronic Document E_n is being painted into Presentation Buffer P_n (the "front buffer" of the model), the previous Intermediate Synchronic Document E_n-1 is available for display in Presentation Buffer P_n-1 (the "back buffer" of the model).

The model specifies a (hypothetical) time required for completely painting an Intermediate Synchronic Document as a proxy for complexity. Painting includes drawing region backgrounds, rendering and copying glyphs, and decoding and copying images. Complexity is then limited by requiring that painting of Intermediate Synchronic Document E_n completes before the end of Intermediate Synchronic Document E_n-1.

Whenever applicable, constraints are specified relative to Root Container Region dimensions, allowing subtitle sequences to be authored independently of Related Video Object resolution.

To enable scenarios where the same glyphs are used in multiple successive Intermediate Synchronic Documents, e.g. to convey a CEA-608/708-style roll-up (see [CEA-608] and [CEA-708]), the Glyph Buffers G_n and G_n-1 store rendered glyphs across Intermediate Synchronic Documents, allowing glyphs to be copied into the Presentation Buffer instead of rendered, a more costly operation.

Similarly, Decoded Image Buffers D_n and D_n-1 store decoded images across Intermediate Synchronic Documents, allowing images to be copied into the Presentation Buffer instead of decoded.

The Presentation Compositor SHALL render in Presentation Buffer P_n each successive Intermediate Synchronic Document E_n using the following steps in order:

clear the pixels of the entire Root Container Region, unless n=0 or E_n-1 is an empty ISD;
paint, according to stacking order, all background pixels for each region;
paint all pixels for background colors associated with text or image subtitle content; and
paint the text or image subtitle content.

The Presentation Compositor SHALL start rendering E_n:

at the presentation time of E_m, where E_m is the closest ISD for which 0<m<n and E_m-1 is not an empty ISD, if the presentation time of E_n minus that of E_m is less than IPD; or
at the presentation time of E_n minus IPD, otherwise.

Note

The Presentation Compositor never renders an ISD more than IPD ahead of its presentation time and treats sequences of empty ISDs as a single ISD.

ISD rendering and presentation times. — Figure 2 illustrates the rendering and presentation of Intermediate Synchronic Documents by the Presentation Compositor. The Presentation Compositor renders E_n at the presentation time of E_n-2 since E_n-1 is an empty ISD. In contrast, the Presentation Compositor renders E_n-2 at the presentation time of E_n-3 since E_n-4 is not an empty ISD. Finally, E₀ is rendered at the presentation time of E₀ minus IPD.

The duration DUR(E_n) for painting an Intermediate Synchronic Document E_n in the Presentation Buffer P_n SHALL be:

DUR(E_n) = S(E_n) / BDraw + DUR_T(E_n) + DUR_I(E_n)

where

S(E_n) is the total normalized drawing area for Intermediate Synchronic Document E_n, as specified in 7. Paint Regions;
BDraw is the normalized background drawing performance factor;
DUR_T(E_n) is the duration, in seconds, for painting the text subtitle content for Intermediate Synchronic Document E_n, as specified in Section 9. Paint Text; and
DUR_I(E_n) is the duration, in seconds, for painting the image subtitle content for Intermediate Synchronic Document E_n, as specified in Section .

The contents of the Presentation Buffer P_n SHALL be transferred instantaneously to Presentation Buffer P_n-1 at the presentation time of Intermediate Synchronic Document E_n, making the latter available for display.

Note

It is possible for the contents of Presentation Buffer P_n-1 to never be displayed. This can happen if Presentation Buffer P_n is copied twice to Presentation Buffer P_n-1 between two consecutive video frame boundaries of the Related Video Object.

It SHALL be an error for the Presentation Compositor to fail to complete painting pixels for E_n before the presentation time of E_n.

Unless specified otherwise, the following table SHALL specify values for IPD and BDraw.

Parameter	Initial value
Initial Painting Delay (IPD)	1 s
Normalized background drawing performance factor (BDraw)	12 s^-1

Note

BDraw effectively sets a limit on fillings regions - for example, assuming that the Root Container Region is ultimately rendered at 1920×1080 resolution, a BDraw of 12 s^-1 would correspond to a fill rate of 1920×1080×12/s=23.7×2²⁰pixels s^-1.

Note

IPD effectively sets a limit on the complexity of any given Intermediate Synchronic Document.

The total normalized drawing area S(E_n) for Intermediate Synchronic Document E_n SHALL be

S(E_n) = CLEAR(E_n) + PAINT(E_n )

where CLEAR(E_n) = 0 if n=0 or E_n-1 is an empty ISD, and CLEAR(E_n) = 1 otherwise.

Note

To ensure consistency of the Presentation Buffer, a new Intermediate Synchronic Document requires clearing of the Root Container Region.

PAINT(E_n) SHALL be the normalized area to be painted for all regions that are used in Intermediate Synchronic Document E_n according to:

PAINT(E_n) = ∑_{R_i∈R_p} NSIZE(R_i) ∙ NBG(R_i)

where R_p SHALL be the set of presented regions in the Intermediate Synchronic Document E_n.

NSIZE(R_i) SHALL be given by:

NSIZE(R_i) = (width of R_i ∙ height of R_i ) ÷ (Root Container Region height ∙ Root Container Region width)

Example 1

For a region R_i in with tts:extent="250px 50px" within a Root Container Region with tts:extent="1920px 1080px", NSIZE(R_i) ≈ 0.00603.

NBG(R_i) SHALL be the total number of elements within the tree rooted at region R_i that satisfy the following criteria:

the element is either a region, body, div, p or span; and
the opacity of the computed value of tts:backgroundColor is not 0.

Issue 5: span elements are included in NBG(R_i)

NBG(R_i) counts the number of tts:backgroundColor attributes specified span elements.

In a common scenario illustrated below, this results in the complexity of painting (relatively small) span backgrounds to be equal to painting the background of (relatively much larger) region that essentially fills the root container.

This can be addressed by excluding span from the NBG(R_i) computation, and instead including tts:backgroundColor in the list of glyph properties at https://www.w3.org/TR/ttml-imsc1.1/#paint-text.

Note

An element and its parent that satisfy the criteria above and share identical computed values of tts:backgroundColor are counted as two distinct elements for the purpose of computing NBG(R_i).

Note

The set element is not included in the computation of NBG(R_i). While it can affect the computed values of tts:backgroundColor, it is removed during Intermediate Synchronic Document construction.

The Presentation Compositor SHALL paint into the Presentation Buffer P_n all visible pixels of presented images of Intermediate Synchronic Document E_n.

For each presented image, the Presentation Compositor SHALL either:

if an identical image is present in Decoded Image Buffer D_n, copy the image from Decoded Image Buffer D_n to the Presentation Buffer P_n using the Image Copier; or
if an identical image is present in Decoded Image Buffer D_n-1, i.e. an identical image was present in Intermediate Synchronic Document E_n-1, copy using the Image Copier the image from Decoded Image Buffer D_n-1 to both the Decoded Image Buffer D_n and the Presentation Buffer P_n; or
otherwise, decode the image using the Image Decoder the image into the Presentation Buffer P_n and Decoded Image Buffer D_n.

Two images SHALL be identical if and only if they reference the same encoded image source.

The duration DUR_I(E_n) for painting images of an Intermediate Synchronic Document E_n in the Presentation Buffer SHALL be as follows:

DUR_I(E_n) = ∑_{I_i ∈ I_c} NRGA(I_i) / ICpy + ∑_{I_j ∈ I_d} NSIZ(I_j) / IDec

where

I_c is the set of images copied when painting Intermediate Synchronic Document E_n;
I_d is the set of images decoded when painting Intermediate Synchronic Document E_n;
IDec is the image decoding rate; and
ICpy is the normalized image copy performance factor.

NRGA(I_i) is the Normalized Image Area of presented image I_i and SHALL be equal to:

NRGA(I_i)= (width of I_i ∙ height of I_i ) ÷ ( Root Container Region height ∙ Root Container Region width )

NSIZ(I_i) SHALL be the number of pixels of presented image I_i.

The contents of the Decoded Image Buffer D_n SHALL be transferred instantaneously to Decoded Image Buffer D_n-1 at the presentation time of Intermediate Synchronic Document E_n.

The total size occupied by images stored in Decoded Image Buffers D_n or D_n-1 SHALL be the sum of their Normalized Image Area.

The size of Decoded Image Buffers D_n or D_n-1 SHALL be the Normalized Decoded Image Buffer Size (NDIBS).

Unless specified otherwise, the following table SHALL specify ICpy, IDec, and NDBIS.

Parameter	Initial value
Normalized image copy performance factor (ICpy)	6
Image Decoding rate (IDec)	1 × 2²⁰ pixels s^-1
Normalized Decoded Image Buffer Size (NDIBS)	0.9885

In the context of this section, a glyph is a tuple consisting of (i) one character and (ii) the computed values of the following style properties:

tts:color
tts:fontFamily
tts:fontSize
tts:fontStyle
tts:fontWeight
tts:textDecoration
tts:textOutline
tts:textShadow

Note

In the case where a property is prohibited in a profile of [IMSC], the computed value of the property specified in [ttml2] can be used.

Note

While one-to-one mapping between characters and typographical glyphs is generally the rule in some scripts, e.g. latin script, it is the exception in others. For instance, in arabic script, a character can yield multiple glyphs depending on its position in a word. The Hypothetical Render Model always assumes a one-to-one mapping, but reduces the performance of the glyph buffer for scripts where one-to-one mapping is not the general rule (see GCpy below).

For each glyph associated with a character in a presented region of Intermediate Synchronic Document E_n, the Presentation Compositor SHALL:

if an identical glyph is present in Glyph Buffer G_n, copy the glyph from Glyph Buffer G_n to the Presentation Buffer P_n using the Glyph Copier; or
if an identical glyph is present in Glyph Buffer G_n-1, i.e. an identical glyph was present in Intermediate Synchronic Document E_n-1, copy using the Glyph Copier the glyph from Glyph Buffer G_n-1 to both the Glyph Buffer G_n and the Presentation Buffer P_n; or
otherwise render using the Glyph Renderer the glyph into the Presentation Buffer P_n and Glyph Buffer G_n.

Figure 3 Example of Presentation Compositor Behavior for Text Rendering

The duration DUR_T(E_n) for rendering the text of an Intermediate Synchronic Document E_n in the Presentation Buffer is as follows:

DUR_T(E_n) = ∑_{g_i ∈ Γ_r} NRGA(g_i) / Ren(g_i) + ∑_{g_j ∈ Γ_c} NRGA(g_j) / GCpy

where

Γ_r is the set of glyphs rendered into the Presentation Buffer P_n using the Glyph Renderer in Intermediate Synchronic Document E_n;
Γ_c is the set of glyphs copied to the Presentation Buffer P_n using the Glyph Copier in Intermediate Synchronic Document E_n;
Ren(g_i) is the text rendering performance factor for glyph g_i; and
GCpy is the normalized glyph copy performance factor.

The Normalized Rendered Glyph Area NRGA(g_i) of a glyph g_i SHALL be equal to:

NRGA(g_i) = (fontSize of g_i as percentage of Root Container Region height)²

Note

NRGA(G_i) does not take into account decorations (e.g. underline), effects (e.g. outline) or actual typographical glyph aspect ratio. An implementation can determine an actual buffer size needs based on worst-case glyph size complexity.

The contents of the Glyph Buffer G_n SHALL be copied instantaneously to Glyph Buffer G_n-1 at the presentation time of Intermediate Synchronic Document E_n.

It SHALL be an error for the sum of NRGA(g_i) over all glyphs Glyph Buffer G_n to be larger than the Normalized Glyph Buffer Size (NGBS).

Unless specified otherwise, the following table specifies values of GCpy, Ren and NGBS.

Normalized glyph copy performance factor (GCpy)
Script property, as defined at [UAX24], for the character of g_i	GCpy
`Latin`, `Greek`, `Cyrillic`, `Hebrew` or `Common`	12
any other value	3
Text rendering performance factor Ren(G_i)
Script property, as defined at [UAX24], for the character of g_i	Ren(G_i)
`Han`, `Katakana`, `Hiragana`, `Bopomofo` or `Hangul`	0.6
any other value	1.2
Normalized Glyph Buffer Size (NGBS)
1

Note

The choice of font by the presentation processor can increase rendering complexity. For instance, a cursive font can generally result in a given character yielding different typographical glyphs depending on context, even if latin script is used. Conversely the rendering of scripts that fall in the any other value category can in practice achieve performance comparable to, say, the latin script.

This section is non-normative.

This specification has no inherent security or privacy implications.

The algorithm defined within this specification is used for static analysis of a resource. This specification does not define any protocol or interface for obtaining such a resource, and it does not define any interface for exposing the results of the analysis. No personal or sensitive information is processed as part of the algorithm, other than any such information that might happen to be part of the IMSC Document Instance being analysed. No information is exposed by the algorithm to any origin. No scripts are loaded or processed as part of the algorithm and no links to external resources are dereferenced.

Implementers of this specification should capture and meet privacy and security requirements for their intended application. For example, an implementation could, when reporting on an error encountered during processing of an IMSC Document Instance, include a section of the content of an IMSC Document Instance to elaborate the error. If that content could include sensitive or personal information, the implementation should ensure that any such output is provided using appropriately secure protocols. No such reporting is defined or required by this specification.

IMSC Hypothetical Render Model

Abstract

Status of This Document

1. Scope

2. Documentation Conventions

3. Terms and Definitions

4. Conformance

5. Overview

6. General

7. Paint Regions

8. Paint Images

9. Paint Text

A. Accessibility Considerations

A.1 Impact of non-conformance

A.2 User customisation of presentation

B. Privacy and Security Considerations

B.1 General

B.2 Implementation considerations

C. Acknowledgements

D. Summary of substantive changes

E. References

E.1 Normative references

E.2 Informative references