IMSC Hypothetical Render Model

W3C Working Draft

This version:
https://www.w3.org/TR/2022/WD-imsc-hrm-20220322/
Latest published version:
https://www.w3.org/TR/imsc-hrm/
Latest editor's draft:
https://w3c.github.io/imsc-hrm/spec/imsc-hrm.html
History:
https://www.w3.org/standards/history/imsc-hrm
Commit history
Editor:
Feedback:
GitHub w3c/imsc-hrm (pull requests, new issue, open issues)

Abstract

This specification defines a Hypothetical Render Model (HRM) that constrains the complexity of documents that conform to any of the TTML Profiles for Internet Media Subtitles and Captions ([IMSC]).

The model is not intended as a specification of the processing requirements for implementations. For instance, while the model defines a glyph buffer for the purpose of limiting the number of glyphs displayed at any given point in time, it neither requires the implementation of such a buffer, nor models the sub-pixel glyph positioning and anti-aliased glyph rendering that can be used to produce text output.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Timed Text Working Group as a Working Draft using the Recommendation track.

Publication as a Working Draft does not imply endorsement by W3C and its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 2 November 2021 W3C Process Document.

1. Scope

This specification defines a Hypothetical Render Model (HRM) that constrains the complexity of an IMSC Document Instance.

2. Documentation Conventions

This specification uses the same conventions as [IMSC].

3. Terms and Definitions

character. The character code property of a Character Information Item.

Note

The term character is for practical purposes the same as a code point.

code point. As defined by [i18n-glossary].

empty ISD. An Intermediate Synchronic Document with no presented region.

error. A failure to conform to the constraints defined by this specification.

IMSC Document Instance. A Document Instance that conforms to any profile defined in any edition of [IMSC].

4. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key word SHALL in this document is to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, it appears in all capitals, as shown here.

Unless noted otherwise, this specification applies to an IMSC Document Instance.

A sequence of consecutive Intermediate Synchronic Documents conforms to the Hypothetical Render Model if it is processed without error.

Note

Applying the Hypothetical Render Model to a Document Instance that is not an IMSC Document Instance yields results that might not reflect the complexity of the Document Instance.

5. Introduction

This section is non-normative.

5.1 Objective

The objective of the HRM is to allow subtitle and caption authors and providers to verify that the content they provide does not exceed defined complexity levels, so that playback systems can render the content synchronized with the author-specified display times.

Playback systems include desktop computers, mobile devices and home theatre devices.

Note

The HRM is not a new concept: it has been included in all versions and editions of [IMSC] and has remained substantially unchanged. It is refactored herein to simplify document maintenance.

5.2 Why limit the complexity of IMSC Document Instances?

IMSC Document Instances are typically authored by a first party and rendered by a second party. Unless both parties agree on the maximum complexity of an IMSC Document Instance, it is likely that:

The HRM prevents incomplete presentations of IMSC Document Instances.
Figure 1 The HRM allows authors and implementers of presentation processors to agree on the maximum complexity of IMSC Document Instances.

As illustrated in Figure 1, by defining a method (the HRM) to compute a proxy for the complexity of an IMSC Document Instance and specifying a complexity limit based on such proxy:

5.3 Why is the HRM needed to limit complexity?

The HRM supplements the syntactic and structural constraints imposed in [IMSC] by imposing constraints on the contents of the presentation.

Because of the temporal and spatial variability of subtitles and captions across types of content, territories and languages, it is not possible to limit the complexity of an IMSC Document Instance using only average values.

An average-based constraint of 840 characters per minute could be met in multiple ways, with different rendering complexities. Contrast two potential approaches:

In the first, 5 characters are presented for a fraction of a second, followed by 835 characters that are then presented for over 59 seconds. This generates a high rendering complexity for the 835 characters, since there is only a brief time available to paint them.

In the second, 210 characters must be painted every 15 seconds, giving 15 seconds to prepare for the next presentation. This has a much lower rendering complexity.
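The contrast between the two approaches can be checked numerically. The sketch below is illustrative only and is not part of the HRM: the event times, the 0.5 s value standing in for "a fraction of a second", and the helper names are all assumptions made for the example.

```python
# Illustrative only: each schedule is a list of (presentation_time_s, characters).
# The 0.5 s gap in the first schedule is an assumed stand-in for
# "a fraction of a second".
bursty = [(0.0, 5), (0.5, 835)]
steady = [(0.0, 210), (15.0, 210), (30.0, 210), (45.0, 210)]


def chars_per_minute(schedule):
    """Average load: total characters presented within the one-minute window."""
    return sum(chars for _, chars in schedule)


def peak_chars_per_second(schedule):
    """Worst case of characters divided by the time since the previous event."""
    rates = []
    for (prev_time, _), (time, chars) in zip(schedule, schedule[1:]):
        rates.append(chars / (time - prev_time))
    return max(rates)


for name, schedule in (("bursty", bursty), ("steady", steady)):
    print(name, chars_per_minute(schedule), peak_chars_per_second(schedule))
```

Both schedules average 840 characters per minute, yet under these assumed timings the first must prepare characters at a rate more than a hundred times higher than the second.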

The HRM achieves a more accurate representation of the complexity of an IMSC Document Instance at any given time by taking into account its past complexity in addition to its instantaneous complexity. The same approach is commonly used in video to limit bitstream complexity, e.g., the Hypothetical Reference Decoder (HRD) specified in [iso14496-10].

5.4 How does the HRM measure and limit complexity?

The HRM defines a simple model for the rendering of subtitles and captions, and uses the time it takes to render subtitles and captions according to that model as a proxy for the complexity of the subtitles and captions. Rendering includes drawing region backgrounds, rendering and copying text, and decoding and copying images. Complexity is then limited by requiring that the time to render one subtitle or caption is shorter than the time elapsed since the previous subtitle or caption.

This simple model requires only a static analysis of the IMSC Document Instance, requires no fetching of external resources and does not require the IMSC Document Instance to be actually rendered. Several simplifying assumptions are made to achieve this. For example, the model assumes that each character is drawn independently, and accounts for that assumption being, in many cases, false, by assigning different render speeds for different scripts. In general the model is not intended to capture the actual time that an implementation takes to render subtitles and captions, but rather to scale with it: a document that is twice as complex according to the model would require roughly twice as many resources to actually render.

5.5 Where is the HRM used?

The HRM is typically used prior to distribution of the IMSC Document Instance to the end-user, as an integral part of authoring and as a quality check before distribution.

When the HRM is used, the consequences of an IMSC Document Instance exceeding the HRM limits depend on the context:

The HRM is not intended to be used when the IMSC Document Instance is presented to end-users since:

6. Architecture

This section is non-normative.

Figure 2 Hypothetical Render Model

The model illustrated in Figure 2 operates on successive Intermediate Synchronic Documents obtained from an input IMSC Document Instance, and uses a simple double buffering model: while an Intermediate Synchronic Document E_n is being painted into Presentation Buffer P_n (the "front buffer" of the model), the previous Intermediate Synchronic Document E_{n-1} is available for display in Presentation Buffer P_{n-1} (the "back buffer" of the model).

The model specifies a (hypothetical) time required for completely painting an Intermediate Synchronic Document as a proxy for complexity. Painting includes drawing region backgrounds, rendering and copying glyphs, and decoding and copying images. Complexity is then limited by requiring that painting of Intermediate Synchronic Document E_n completes before the end of Intermediate Synchronic Document E_{n-1}.

Whenever applicable, constraints are specified relative to Root Container Region dimensions, allowing subtitle sequences to be authored independently of Related Video Object resolution.

To enable scenarios where the same glyphs are used in multiple successive Intermediate Synchronic Documents, e.g. to convey a CEA-608/708-style roll-up (see [CEA-608] and [CEA-708]), the Glyph Buffers G_n and G_{n-1} store rendered glyphs across Intermediate Synchronic Documents, allowing glyphs to be copied into the Presentation Buffer instead of rendered, a more costly operation.

Similarly, Decoded Image Buffers D_n and D_{n-1} store decoded images across Intermediate Synchronic Documents, allowing images to be copied into the Presentation Buffer instead of decoded.

7. General

The Presentation Compositor SHALL render in Presentation Buffer P_n each successive Intermediate Synchronic Document E_n using the following steps in order:

  1. clear the pixels of the entire Root Container Region, unless n=0 or E_{n-1} is an empty ISD;
  2. paint, according to stacking order, all background pixels for each region;
  3. paint all pixels for background colors associated with text or image subtitle content; and
  4. paint the text or image subtitle content.

The Presentation Compositor SHALL start rendering E_n:

Note

The Presentation Compositor never renders an ISD more than IPD ahead of its presentation time and treats sequences of empty ISDs as a single ISD.

ISD rendering and presentation times.
Figure 3 illustrates the rendering and presentation of Intermediate Synchronic Documents by the Presentation Compositor. The Presentation Compositor renders E_n at the presentation time of E_{n-2} since E_{n-1} is an empty ISD. In contrast, the Presentation Compositor renders E_{n-2} at the presentation time of E_{n-3} since E_{n-3} is not an empty ISD. Finally, E_0 is rendered at the presentation time of E_0 minus IPD.

The duration DUR(E_n) for painting an Intermediate Synchronic Document E_n in the Presentation Buffer P_n SHALL be:

$$\mathrm{DUR}(E_n) = \frac{S(E_n)}{\mathrm{BDraw}} + \mathrm{DUR}_T(E_n) + \mathrm{DUR}_I(E_n)$$

where

The contents of the Presentation Buffer P_n SHALL be transferred instantaneously to Presentation Buffer P_{n-1} at the presentation time of Intermediate Synchronic Document E_n, making the latter available for display.

Note

It is possible for the contents of Presentation Buffer P_{n-1} to never be displayed. This can happen if Presentation Buffer P_n is copied twice to Presentation Buffer P_{n-1} between two consecutive video frame boundaries of the Related Video Object.

It SHALL be an error for the Presentation Compositor to fail to complete painting pixels for E_n before the presentation time of E_n.

Unless specified otherwise, the following table SHALL specify values for IPD and BDraw.

| Parameter | Initial value |
| --- | --- |
| Initial Painting Delay (IPD) | 1 s |
| Normalized background drawing performance factor (BDraw) | 12 s^-1 |

Note

BDraw effectively sets a limit on filling regions. For example, assuming that the Root Container Region is ultimately rendered at 1920×1080 resolution, a BDraw of 12 s^-1 would correspond to a fill rate of 1920 × 1080 × 12 s^-1 ≈ 23.7 × 2^20 pixels s^-1.

Note

IPD effectively sets a limit on the complexity of any given Intermediate Synchronic Document.
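The painting duration and the associated error condition can be sketched as follows. This is a minimal, non-normative sketch: the function and constant names are hypothetical, the render start time is supplied by the caller rather than derived, and treating painting that finishes exactly at the presentation time as on time is an assumption.

```python
IPD = 1.0      # Initial Painting Delay, in seconds
BDRAW = 12.0   # normalized background drawing performance factor, in 1/s


def paint_duration(s_en, dur_t, dur_i, bdraw=BDRAW):
    """DUR(E_n) = S(E_n) / BDraw + DUR_T(E_n) + DUR_I(E_n), in seconds."""
    return s_en / bdraw + dur_t + dur_i


def painting_completes_in_time(render_start, presentation_time, dur):
    """True if painting started at render_start finishes by presentation_time.

    Assumption: finishing exactly at the presentation time counts as on time.
    """
    return render_start + dur <= presentation_time


def fill_rate_pixels_per_second(width, height, bdraw=BDRAW):
    """Fill rate implied by BDraw at a given Root Container Region resolution."""
    return width * height * bdraw


# E_0 starts rendering IPD before its presentation time (here 10 s).
dur = paint_duration(s_en=0.5, dur_t=0.2, dur_i=0.0)
print(painting_completes_in_time(10.0 - IPD, 10.0, dur))
print(fill_rate_pixels_per_second(1920, 1080) / 2**20)
```

The last line reproduces the fill rate quoted in the note above, roughly 23.7 × 2^20 pixels per second at 1920×1080.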

8. Paint Regions

The total normalized drawing area S(E_n) for Intermediate Synchronic Document E_n SHALL be

$$S(E_n) = \mathrm{CLEAR}(E_n) + \mathrm{PAINT}(E_n)$$

where CLEAR(E_n) = 0 if n=0 or E_{n-1} is an empty ISD, and CLEAR(E_n) = 1 otherwise.

Note

To ensure consistency of the Presentation Buffer, a new Intermediate Synchronic Document requires clearing of the Root Container Region.

PAINT(E_n) SHALL be the normalized area to be painted for all regions that are used in Intermediate Synchronic Document E_n according to:

$$\mathrm{PAINT}(E_n) = \sum_{R_i \in R_p} \mathrm{NSIZE}(R_i) \cdot \mathrm{NBG}(R_i)$$

where R_p SHALL be the set of presented regions in the Intermediate Synchronic Document E_n.

NSIZE(R_i) SHALL be given by:

$$\mathrm{NSIZE}(R_i) = \frac{\text{width of } R_i \cdot \text{height of } R_i}{\text{Root Container Region height} \cdot \text{Root Container Region width}}$$

NBG(R_i) SHALL be the total number of elements within the tree rooted at region R_i that satisfy the following criteria:

Issue 5: span elements are included in NBG(R_i)

NBG(R_i) counts the number of tts:backgroundColor attributes specified on span elements.

In a common scenario illustrated below, this makes the complexity of painting (relatively small) span backgrounds equal to that of painting the background of a (relatively much larger) region that essentially fills the root container.

image

This can be addressed by excluding span from the NBG(R_i) computation, and instead including tts:backgroundColor in the list of glyph properties at https://www.w3.org/TR/ttml-imsc1.1/#paint-text.

Note

An element and its parent that satisfy the criteria above and share identical computed values of tts:backgroundColor are counted as two distinct elements for the purpose of computing NBG(R_i).

Note

The set element is not included in the computation of NBG(R_i). While it can affect the computed values of tts:backgroundColor, it is removed during Intermediate Synchronic Document construction.
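The computation of S(E_n) can be sketched as follows. This is a non-normative sketch: the Region class and function names are hypothetical, and NBG(R_i) is assumed to have been counted by the caller, since this sketch does not inspect the element tree.

```python
from dataclasses import dataclass


@dataclass
class Region:
    width: float   # same units as the Root Container Region width
    height: float  # same units as the Root Container Region height
    nbg: int       # NBG(R_i), assumed to be counted by the caller


def nsize(region, root_width, root_height):
    """NSIZE(R_i): region area normalized to the Root Container Region area."""
    return (region.width * region.height) / (root_height * root_width)


def paint(regions, root_width, root_height):
    """PAINT(E_n): sum of NSIZE(R_i) * NBG(R_i) over the presented regions."""
    return sum(nsize(r, root_width, root_height) * r.nbg for r in regions)


def total_drawing_area(regions, root_width, root_height, n, previous_is_empty):
    """S(E_n) = CLEAR(E_n) + PAINT(E_n)."""
    clear = 0 if n == 0 or previous_is_empty else 1
    return clear + paint(regions, root_width, root_height)


# One region covering the bottom fifth of a 1920x1080 root container,
# with two elements counted towards NBG.
regions = [Region(width=1920, height=216, nbg=2)]
print(total_drawing_area(regions, 1920, 1080, n=1, previous_is_empty=False))
```

In this example the region contributes 0.2 × 2 = 0.4 and clearing the Root Container Region contributes 1, so S(E_n) is approximately 1.4.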

9. Paint Images

The Presentation Compositor SHALL paint into the Presentation Buffer P_n all visible pixels of presented images of Intermediate Synchronic Document E_n.

For each presented image, the Presentation Compositor SHALL either:

Two images SHALL be identical if and only if they reference the same encoded image source.

The duration DUR_I(E_n) for painting images of an Intermediate Synchronic Document E_n in the Presentation Buffer SHALL be as follows:

$$\mathrm{DUR}_I(E_n) = \sum_{I_i \in I_c} \frac{\mathrm{NRGA}(I_i)}{\mathrm{ICpy}} + \sum_{I_j \in I_d} \frac{\mathrm{NSIZ}(I_j)}{\mathrm{IDec}}$$

where

NRGA(I_i) is the Normalized Image Area of presented image I_i and SHALL be equal to:

$$\mathrm{NRGA}(I_i) = \frac{\text{width of } I_i \cdot \text{height of } I_i}{\text{Root Container Region height} \cdot \text{Root Container Region width}}$$

NSIZ(I_i) SHALL be the number of pixels of presented image I_i.

The contents of the Decoded Image Buffer D_n SHALL be transferred instantaneously to Decoded Image Buffer D_{n-1} at the presentation time of Intermediate Synchronic Document E_n.

The total size occupied by images stored in Decoded Image Buffers D_n or D_{n-1} SHALL be the sum of their Normalized Image Area.

The size of Decoded Image Buffers D_n or D_{n-1} SHALL be the Normalized Decoded Image Buffer Size (NDIBS).

Unless specified otherwise, the following table SHALL specify ICpy, IDec, and NDIBS.

| Parameter | Initial value |
| --- | --- |
| Normalized image copy performance factor (ICpy) | 6 |
| Image Decoding rate (IDec) | 1 × 2^20 pixels s^-1 |
| Normalized Decoded Image Buffer Size (NDIBS) | 0.9885 |
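The image painting duration can be sketched as follows. This is a non-normative sketch with hypothetical names. It assumes that I_c is the set of presented images that are copied and I_d the set that are decoded, that the caller has already classified each image accordingly, and that image and Root Container Region dimensions are both expressed in pixels.

```python
ICPY = 6.0        # normalized image copy performance factor
IDEC = 1 * 2**20  # image decoding rate, in pixels per second
NDIBS = 0.9885    # Normalized Decoded Image Buffer Size


def normalized_image_area(width, height, root_width, root_height):
    """NRGA(I_i): image area normalized to the Root Container Region area."""
    return (width * height) / (root_height * root_width)


def image_paint_duration(copied, decoded, root_width, root_height):
    """DUR_I(E_n) for images given as (width, height) tuples in pixels.

    Assumption: `copied` stands for I_c and `decoded` for I_d.
    """
    copy_time = sum(
        normalized_image_area(w, h, root_width, root_height) / ICPY
        for w, h in copied
    )
    decode_time = sum((w * h) / IDEC for w, h in decoded)
    return copy_time + decode_time


def fits_decoded_image_buffer(images, root_width, root_height):
    """True if the summed Normalized Image Area does not exceed NDIBS."""
    total = sum(
        normalized_image_area(w, h, root_width, root_height) for w, h in images
    )
    return total <= NDIBS


print(image_paint_duration([(1920, 1080)], [(1024, 1024)], 1920, 1080))
```

Under these assumptions, copying one image that fills the Root Container Region takes 1/6 s and decoding a 1024×1024 image takes 1 s.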

10. Paint Text

In the context of this section, a glyph is a tuple consisting of (i) one character and (ii) the computed values of the following style properties:

Note

In the case where a property is prohibited in a profile of [IMSC], the computed value of the property specified in [ttml2] can be used.

Note

The Hypothetical Render Model defines a one-to-one mapping between characters and glyphs (using the definition of glyph from this document). While a one-to-one mapping between code points and glyphs (using the definition of glyph from [i18n-glossary]) is common in some scripts (such as the Latin script), the actual relationship is more complex. Some scripts, such as Arabic, use different glyphs for a given character, depending on its position in a word. Some scripts require combining marks or use a sequence of code points to form a glyph. Cases exist where a given sequence of code points can have different glyph representations depending on context. This complexity is accounted for by reducing the performance of the glyph buffer for scripts where a one-to-one mapping is not the general rule (see GCpy below).

For each glyph associated with a character in a presented region of Intermediate Synchronic Document E_n, the Presentation Compositor SHALL:

Figure 4 Example of Presentation Compositor Behavior for Text Rendering

The duration DUR_T(E_n) for rendering the text of an Intermediate Synchronic Document E_n in the Presentation Buffer is as follows:

$$\mathrm{DUR}_T(E_n) = \sum_{g_i \in \Gamma_r} \frac{\mathrm{NRGA}(g_i)}{\mathrm{Ren}(g_i)} + \sum_{g_j \in \Gamma_c} \frac{\mathrm{NRGA}(g_j)}{\mathrm{GCpy}}$$

where

The Normalized Rendered Glyph Area NRGA(g_i) of a glyph g_i SHALL be equal to:

$$\mathrm{NRGA}(g_i) = (\text{fontSize of } g_i \text{ as percentage of Root Container Region height})^2$$

Note

NRGA(g_i) does not take into account decorations (e.g. underline), effects (e.g. outline) or actual typographical glyph aspect ratio. An implementation can determine its actual buffer size needs based on worst-case glyph size complexity.

The contents of the Glyph Buffer G_n SHALL be copied instantaneously to Glyph Buffer G_{n-1} at the presentation time of Intermediate Synchronic Document E_n.

It SHALL be an error for the sum of NRGA(g_i) over all glyphs in Glyph Buffer G_n to be larger than the Normalized Glyph Buffer Size (NGBS).

Unless specified otherwise, the following table specifies values of GCpy, Ren and NGBS.

Normalized glyph copy performance factor (GCpy)

| Script property, as defined at [UAX24], for the character of g_i | GCpy |
| --- | --- |
| Latin, Greek, Cyrillic, Hebrew or Common | 12 |
| any other value | 3 |

Text rendering performance factor Ren(g_i)

| Script property, as defined at [UAX24], for the character of g_i | Ren(g_i) |
| --- | --- |
| Han, Katakana, Hiragana, Bopomofo or Hangul | 0.6 |
| any other value | 1.2 |

Normalized Glyph Buffer Size (NGBS): 1

Note

While DURT(En) is not affected, the choice of font by the presentation processor can increase actual rendering complexity at time of presentation. For instance, a cursive font might select different glyphs for a given grapheme (in order to maintain joining or for the start/end of the word) even in the Latin script. Conversely the rendering of scripts that fall in the any other value category can in practice achieve performance comparable to, say, the Latin script.
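The text painting duration and the glyph buffer constraint can be sketched as follows. This is a non-normative sketch with hypothetical names. It assumes that Γ_r is the set of glyphs that are rendered and Γ_c the set that are copied, that the caller supplies the [UAX24] Script property value of each character as a string (the Python standard library does not expose it), and that fontSize is expressed as a fraction of the Root Container Region height (0.1 for 10%).

```python
GCPY_FAST_SCRIPTS = {"Latin", "Greek", "Cyrillic", "Hebrew", "Common"}
REN_SLOW_SCRIPTS = {"Han", "Katakana", "Hiragana", "Bopomofo", "Hangul"}
NGBS = 1.0  # Normalized Glyph Buffer Size


def gcpy(script):
    """Normalized glyph copy performance factor for a Script property value."""
    return 12.0 if script in GCPY_FAST_SCRIPTS else 3.0


def ren(script):
    """Text rendering performance factor for a Script property value."""
    return 0.6 if script in REN_SLOW_SCRIPTS else 1.2


def nrga(font_size_fraction):
    """NRGA(g_i), assuming fontSize is a fraction of the root container height."""
    return font_size_fraction ** 2


def text_paint_duration(rendered, copied):
    """DUR_T(E_n) for glyphs given as (script, font_size_fraction) tuples.

    Assumption: `rendered` stands for the rendered set and `copied` for the
    copied set, as classified by the caller.
    """
    render_time = sum(nrga(size) / ren(script) for script, size in rendered)
    copy_time = sum(nrga(size) / gcpy(script) for script, size in copied)
    return render_time + copy_time


def glyph_buffer_overflows(glyphs):
    """True if the summed NRGA of the buffered glyphs is larger than NGBS."""
    return sum(nrga(size) for _, size in glyphs) > NGBS


# Forty Latin glyphs at 10% of the root container height, all rendered.
print(text_paint_duration([("Latin", 0.1)] * 40, []))
```

Keeping the script lookup outside the sketch avoids depending on any particular Unicode library; an implementation would need its own source for the Script property.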

A. Accessibility Considerations

This section is non-normative.

A.1 Impact of non-conformance

In a system where IMSC Document Instances are expected to conform to the Hypothetical Render Model, an IMSC Document Instance that does not conform to the Hypothetical Render Model might negatively impact accessibility during presentation of the IMSC Document Instance and its associated content.

A.2 User customisation of presentation

This specification does not attempt to model any additional complexity for presentation processors that might arise due to the user customisation of presentation, for example as described by [media-accessibility-reqs]; such user customisation is not defined by [IMSC].

Implementers of presentation processors that support user customisation of presentation should ensure that those processors are able to present IMSC Document Instances that conform to the Hypothetical Render Model, even if the customisation effectively increases the complexity of presentation.

B. Privacy and Security Considerations

This section is non-normative.

B.1 General

This specification has no inherent security or privacy implications.

The algorithm defined within this specification is used for static analysis of a resource. This specification does not define any protocol or interface for obtaining such a resource, and it does not define any interface for exposing the results of the analysis. No personal or sensitive information is processed as part of the algorithm, other than any such information that might happen to be part of the IMSC Document Instance being analysed. No information is exposed by the algorithm to any origin. No scripts are loaded or processed as part of the algorithm and no links to external resources are dereferenced.

B.2 Implementation considerations

Implementers of this specification should capture and meet privacy and security requirements for their intended application. For example, an implementation could, when reporting on an error encountered during processing of an IMSC Document Instance, include a section of the content of an IMSC Document Instance to elaborate the error. If that content could include sensitive or personal information, the implementation should ensure that any such output is provided using appropriately secure protocols. No such reporting is defined or required by this specification.

C. Error Reporting and Exception Handling

This section is non-normative.

C.1 Error Reporting

This specification does not define how, or even if, errors should be reported.

For example, an implementation could stop on the first error encountered, or continue to process the IMSC Document Instance and report every error. Or an implementation could exit with an appropriate status code without reporting any details at all.

C.2 Exception Handling

This specification does not define any runtime exceptions, or how such exceptions should be handled.

D. Acknowledgements

This section is non-normative.

E. Summary of substantive changes

This section is non-normative.

F. References

F.1 Normative references

[i18n-glossary]
Internationalization Glossary. Richard Ishida; Addison Phillips. W3C. 11 February 2022. W3C Working Group Note. URL: https://www.w3.org/TR/i18n-glossary/
[IMSC]
TTML Profiles for Internet Media Subtitles and Captions. World Wide Web Consortium (W3C). URL: https://www.w3.org/TR/ttml-imsc/
[RFC2119]
Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
[RFC8174]
Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174
[ttml2]
Timed Text Markup Language 2 (TTML2). Glenn Adams; Cyril Concolato. W3C. 8 November 2018. W3C Recommendation. URL: https://www.w3.org/TR/2018/REC-ttml2-20181108/
[UAX24]
Unicode Script Property. Ken Whistler. Unicode Consortium. 27 August 2021. Unicode Standard Annex #24. URL: https://www.unicode.org/reports/tr24/tr24-32.html

F.2 Informative references

[CEA-608]
CTA 608-E, Line-21 Data Services. Consumer Technology Association. URL: https://www.techstreet.com/standards/cta-608-e-r2014?product_id=1815447
[CEA-708]
CTA 708-D, Digital Television (DTV) Closed Captioning. Consumer Technology Association. URL: https://www.techstreet.com/standards/cta-708-d?product_id=1815448
[iso14496-10]
Information technology — Coding of audio-visual objects — Part 10: Advanced video coding. ISO/IEC. Under development. URL: https://www.iso.org/standard/83529.html
[media-accessibility-reqs]
Media Accessibility User Requirements. Shane McCarron; Michael Cooper; Mark Sadecki. W3C. 3 December 2015. W3C Working Group Note. URL: https://www.w3.org/TR/2015/NOTE-media-accessibility-reqs-20151203/