Comparison of linear vs composited HDR pipelines
Presenter: Timo Kunkel (Dolby Laboratories)
Duration: 16 min
Thank you for joining this presentation comparing linear High Dynamic Range or HDR pipelines with composited HDR pipelines.
My name is Timo Kunkel and I'm a senior researcher with Dolby Labs in San Francisco.
Let me quickly give you the big picture of this presentation.
HDR pipelines for full screen content which typically refer to movies and video are well established.
These HDR pipelines cover all relevant parts of an HDR capable ecosystem such as content production, delivery, and display.
However, successfully facilitating HDR content that is composited in real-time, as is the case with websites or graphical user interfaces, is still in its infancy.
With this presentation, I'm comparing the differences that one is likely to encounter between the two pipeline approaches.
I also discuss the aspects that need to be considered in order to establish a successfully running pipeline for composited HDR content.
Okay, let's start by first identifying the fundamental differences between linear and composited pipelines.
Movies and video can be classified as linear content, as they are played back from a start point to a finish point.
Where the individual frames of the content are played one after the other.
The most common presentation form is where the content covers the full screen.
Here we maintain, for example, a constant image dimension, frame rate or encoding type.
This means that the fundamental content properties are known and don't change throughout the playback duration.
Nevertheless, individual movie or video clips with different properties can succeed each other.
In such a scenario, the interpretation of the content properties can change.
And with that, linear content can be called Temporally Mixed Media.
This is not necessarily the case with composited media content.
Where individual content elements are composed in a spatial manner.
Examples for this use case include web browsers, but also the graphical user interface rendering engines in operating systems or apps.
The spatial layout area is frequently called a rendering canvas.
And like on a painter's canvas, colors, text, images and other content elements are placed on the compositing canvas in an orderly fashion, resulting in the final layout.
Besides the spatial placement, temporal aspects might also be present.
For example through animations or embedded videos.
Therefore, composited content can be called Spatially and Temporally Mixed Media.
So one fundamental note on terminology.
Please be aware that concepts and approaches might have different meaning and scope depending on the field of application.
This is particularly true when fields intersect, as is the case here when comparing linear and composited pipelines.
With that in mind, I'm using terms that are intended to communicate the concepts in order to bridge between the fields.
But they might not match exactly the terms in active use in particular contexts.
Each movie frame is decoded and rendered into a frame buffer.
And this frame buffer typically covers the full display area, making the content appear full screen.
The renderer used for this task employs a set of metadata that defines the content properties for the duration of the playback.
A canvas based rendering pipeline is more complex than a typical linear pipeline.
This stems from the fact that instead of only processing a single full screen frame, as with a linear workflow, the renderer now must ingest a multitude of composition elements and place them on the canvas using layout recipes.
Which are, for example, based on HTML, CSS or XML.
It is also possible that the canvas extends beyond the full screen area that is immediately visible.
Nevertheless, layout and appearance considerations must be factored in for the whole canvas.
The active or visible area is then rendered into the frame buffer, as before with the linear approach, and sent to the display.
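The relationship between canvas, visible area and frame buffer can be sketched as follows. This is only a minimal illustration under simplifying assumptions, not how any particular browser or operating system works: the names Element, compose and visible_area are hypothetical, and each element is a flat rectangle with a single pixel value.

```python
from dataclasses import dataclass

@dataclass
class Element:
    x: int         # left edge on the canvas
    y: int         # top edge on the canvas
    width: int
    height: int
    value: float   # one flat pixel value per element, for brevity

def compose(elements, canvas_w, canvas_h, background=0.0):
    """Place elements on the canvas in order; later elements overwrite earlier ones."""
    canvas = [[background] * canvas_w for _ in range(canvas_h)]
    for el in elements:
        for row in range(max(el.y, 0), min(el.y + el.height, canvas_h)):
            for col in range(max(el.x, 0), min(el.x + el.width, canvas_w)):
                canvas[row][col] = el.value
    return canvas

def visible_area(canvas, scroll_y, screen_w, screen_h):
    """Copy the currently visible rows and columns into a frame buffer."""
    return [row[:screen_w] for row in canvas[scroll_y:scroll_y + screen_h]]

# The canvas is 8 rows tall, but only 3 rows are visible at a time.
canvas = compose([Element(0, 0, 4, 2, 1.0), Element(1, 5, 2, 2, 0.5)], 4, 8)
frame_buffer = visible_area(canvas, scroll_y=4, screen_w=4, screen_h=3)
```

The layout is computed for the whole canvas, while only the scrolled-to region ends up in the frame buffer.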
Until recently, canvas based rendering was limited to content elements that are encoded in Standard Dynamic Range or SDR.
SDR is a retronym that was established over the past 15 years to identify traditional content that uses gamma encoding, more limited luminance ranges and color gamuts such as Rec. 709 or sRGB.
Due to this limitation to SDR content only, a rendering engine can assume that all graphics elements follow SDR specifications.
And with that, they match the encoding of the canvas or can be matched using a single conversion approach.
The complexity of the rendering pipeline increases when content that offers properties that go beyond SDR is introduced.
Such content is commonly called High Dynamic Range or HDR.
Now the challenge for an HDR capable rendering engine is that it must be able to ingest content elements with varying encodings, luminance brackets and color gamuts and composite them to the canvas in a perceptually meaningful and consistent way.
Please be aware that the scope of the term HDR can entail both extended luminance and chromatic properties.
Nevertheless, it is also possible that HDR solely describes extended luminance range.
The chromatic properties going beyond a color gamut such as sRGB are then referred to as Wide Color Gamut or WCG.
This brings us to the first set of takeaways.
HDR imaging pipelines for linear content, such as movies are already more complex than pure SDR ones.
Now adding spatial canvas compositing to an HDR imaging pipeline increases complexity even further, as the renderer must facilitate content elements that are provided with varying source encodings and properties.
Now let's have a deeper look into the aspects and properties that are relevant to content elements and the canvas itself, when considering an HDR capable compositing pipeline.
Let's first look at the canvas.
In the context of linear content, the term canvas is actually not used.
But a concept that can be considered as being related is that of a content container.
Containers represent the fundamental properties of content such as a particular color volume that is maintained over certain periods.
Such as the duration of a movie.
Common properties for HDR containers include a signal nonlinearity.
Which is referred to as Electro Optical Transfer Function or EOTF.
The most common HDR EOTFs are the Absolute PQ and Relative HLG EOTFs.
Both are defined in the ITU-R Recommendation BT. 2100.
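To make the difference between an absolute and a relative EOTF concrete, here is a minimal sketch of both curves using the constants published in BT. 2100. The function names are illustrative, the HLG function only handles neutral gray signals, and display black level is ignored, so please treat it as a simplified illustration rather than a complete implementation.

```python
import math

# PQ constants (SMPTE ST 2084 / ITU-R BT.2100)
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_eotf(signal):
    """PQ signal in [0, 1] -> absolute display luminance in cd/m^2 (0 to 10000)."""
    p = signal ** (1 / M2)
    return 10000.0 * (max(p - C1, 0.0) / (C2 - C3 * p)) ** (1 / M1)

# HLG constants (ITU-R BT.2100)
A = 0.17883277
B = 1 - 4 * A
C = 0.5 - A * math.log(4 * A)

def hlg_inverse_oetf(signal):
    """HLG signal in [0, 1] -> relative scene light in [0, 1]."""
    if signal <= 0.5:
        return signal * signal / 3.0
    return (math.exp((signal - C) / A) + B) / 12.0

def hlg_eotf_gray(signal, peak_nits=1000.0):
    """Display luminance of a neutral gray HLG signal (simplification: black level ignored)."""
    # The system gamma depends on the display peak, which is what makes HLG relative.
    gamma = 1.2 + 0.42 * math.log10(peak_nits / 1000.0)
    return peak_nits * hlg_inverse_oetf(signal) ** gamma
```

A given PQ signal value always maps to the same luminance, while the luminance of an HLG signal value scales with the peak luminance of the display it is shown on.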
Similarly a Wide Color Gamut is enabled by more saturated color primaries.
These are defined, for example, by DCI-P3 with a D65 white point and by ITU-R Recommendation BT. 2100, whose color primaries match the ones from BT. 2020.
To better illustrate this concept, the example on the top right shows a conceptual movie timeline to which several individual scene clips are added.
Here in this example, the container is in PQ.
The first content piece labeled scene one, matches the properties of the container.
It therefore can typically be directly placed without conversion.
In contrast, scene two is provided as an SDR clip, as illustrated in the example on the bottom right.
In order to place it into the PQ container, it first must be converted from SDR to PQ through a dedicated transform.
Similarly, scene three is provided in the HLG HDR format.
Which also requires conversion.
And obviously if the HDR container is in HLG, then any PQ content needs to be transformed first before being placed.
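One possible shape of such an SDR to PQ transform is sketched below. Several choices here are assumptions for illustration and are not taken from the talk: the SDR signal is decoded with a pure 2.4 gamma, SDR white is placed at 203 cd/m^2 (the HDR reference white level suggested in ITU-R BT.2408), the PQ container uses BT. 2020 primaries, and the function name sdr_to_pq is hypothetical. The matrix is the BT. 709 to BT. 2020 conversion from ITU-R BT.2087.

```python
# BT.709 -> BT.2020 primaries conversion in linear light (ITU-R BT.2087)
BT709_TO_BT2020 = (
    (0.6274, 0.3293, 0.0433),
    (0.0691, 0.9195, 0.0114),
    (0.0164, 0.0880, 0.8956),
)

# PQ constants (SMPTE ST 2084 / ITU-R BT.2100)
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32

def pq_inverse_eotf(nits):
    """Absolute luminance in cd/m^2 -> PQ signal in [0, 1]."""
    y = min(max(nits / 10000.0, 0.0), 1.0) ** M1
    return ((C1 + C2 * y) / (1 + C3 * y)) ** M2

def sdr_to_pq(rgb, sdr_white_nits=203.0, gamma=2.4):
    """Convert one gamma-encoded BT.709 pixel (values in [0, 1]) to PQ with BT.2020 primaries."""
    # 1. Undo the SDR nonlinearity (assumed: pure power function).
    linear = [max(c, 0.0) ** gamma for c in rgb]
    # 2. Re-express the color in the wider container primaries.
    wide = [sum(m * c for m, c in zip(row, linear)) for row in BT709_TO_BT2020]
    # 3. Anchor SDR white at an absolute luminance and apply the PQ encoding.
    return [pq_inverse_eotf(c * sdr_white_nits) for c in wide]
```

With these assumptions, SDR white (1.0, 1.0, 1.0) lands at a PQ signal of roughly 0.58 in all three channels. A different white anchor or decoding gamma would give a different but equally valid placement, which is part of why this mapping is hard to standardize.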
Similar to ingesting scenes originating from different formats and encodings, as described on the previous slide, canvas based approaches need to handle, and potentially convert, different content elements in order to place them on a single format canvas.
In addition to the aforementioned PQ and HLG encodings, linear light encodings can also be a realistic option here.
For example through a 16 bit floating point representation.
And please note that linear here refers to the luminous intensity and not time.
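As a small illustration of a linear light canvas, the sketch below blends two elements directly in cd/m^2 and stores the result with 16 bit floating point precision. The helper names are hypothetical, and Python's struct format "e" is used only to emulate half-float storage; a real renderer would use GPU formats instead.

```python
import struct

def to_half(value):
    """Round a Python float to the nearest 16 bit float, as a half-float canvas would store it."""
    return struct.unpack("e", struct.pack("e", value))[0]

def blend_over(src_nits, dst_nits, alpha):
    """Source-over blend of two linear light values (cd/m^2), stored as a half float."""
    return to_half(alpha * src_nits + (1.0 - alpha) * dst_nits)

# A 1000 cd/m^2 highlight at 50% opacity over a 200 cd/m^2 background.
blended = blend_over(1000.0, 200.0, 0.5)
```

Because the values are proportional to light, the blend is a plain weighted average. The half-float format covers a large range, but its precision is limited, so a value such as 0.1 is stored only approximately.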
An additional consideration is to identify or define particularities such as universal global luminance or signal levels that represent particularly meaningful colors.
One example is graphics white.
Which can, for example, describe a common website background or the luminance of text used for closed captioning.
Now, after we have discussed the canvas properties.
Let's look closer at the content elements that are to be placed on the canvas.
As mentioned earlier, content elements with potentially varying properties such as EOTF nonlinearity and color volume must be accurately ingested and placed on a canvas.
This includes considerations on how to map SDR or different HDR elements onto a single HDR canvas by considering additional properties, such as luminance anchors or other scaling properties.
There are standards such as ITU-R Recommendations BT. 709 and BT. 2100 that provide some approaches for conversion amongst formats.
However, even for linear HDR pipelines there are currently no standards prescribing all properties of how to map, composite, and render SDR and different HDR content elements.
The properties of individual content elements also depend on the content creation context.
Such as the mastering display or if color adjustments were carried out in a dark or bright environment.
Considering all these properties makes mapping and compositing content from diverse sources a challenging task.
This brings us to the second set of takeaways.
Today the properties of content elements, such as color volume or mastering conditions are often unknown.
But even if they are known, the encoding of the elements varies widely.
For example through the nonlinearity or EOTF, whether the signal is absolute or not, or in which color gamut the signal is represented.
This makes mapping and compositing content from diverse sources challenging.
To remedy this situation, providing more information about source and target display, content and environment is highly beneficial.
And also metadata can likely help but it requires implementation into the ecosystem.
Examples of metadata standards include SMPTE standard 2086, CTA 861, ITU-T Recommendation H.273 and the display EDID standard.
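As a rough idea of what such metadata could look like to a compositor, the sketch below combines a few code points from ITU-T H.273 with mastering display luminance values of the kind carried by SMPTE ST 2086. The class and function names are hypothetical, only a small subset of the code points is listed, and how the values reach the renderer is left open.

```python
from dataclasses import dataclass
from typing import Optional

# Subset of ITU-T H.273 code points
COLOUR_PRIMARIES = {1: "BT.709", 9: "BT.2020", 12: "P3-D65"}
TRANSFER_CHARACTERISTICS = {1: "BT.709", 13: "sRGB", 16: "PQ", 18: "HLG"}

@dataclass
class ElementDescription:
    colour_primaries: int                        # H.273 code point
    transfer_characteristics: int                # H.273 code point
    mastering_max_nits: Optional[float] = None   # ST 2086 style, if known
    mastering_min_nits: Optional[float] = None   # ST 2086 style, if known

def is_hdr(element):
    """Treat PQ and HLG encoded elements as HDR."""
    return element.transfer_characteristics in (16, 18)

def describe(element):
    """Human-readable summary, falling back to 'unknown' for unlisted code points."""
    primaries = COLOUR_PRIMARIES.get(element.colour_primaries, "unknown")
    transfer = TRANSFER_CHARACTERISTICS.get(element.transfer_characteristics, "unknown")
    return f"{primaries} primaries, {transfer} transfer"

video = ElementDescription(9, 16, mastering_max_nits=1000.0, mastering_min_nits=0.005)
logo = ElementDescription(1, 13)
```

Here the video element identifies itself as PQ with BT. 2020 primaries and a 1000 cd/m^2 mastering display, while the logo only declares sRGB and leaves its mastering conditions unknown.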
How to do all this is an active area of research and we will probably hear much more about this in the near future.
From a higher level point of view, the final step in the pipeline typically includes sending the successfully composited content to the target display.
In this step, the rendering engine needs to make sure that this composited content can actually fit into the capabilities of the target display.
Ideally without losing the intent of the composited content.
In this context, this does not refer to spatial or temporal aspects, but to the color volume.
Which can be understood as a combination of luminance and chromatic range.
Therefore, this process is called tone, gamut, or color volume mapping.
This slide illustrates the fundamental concept of color volumes and content mapping.
It is typically a required step both for linear and canvas based HDR pipelines.
Color volume is mainly defined by the color primaries and the minimum and maximum luminance or code level.
Such a color volume could be represented by the content container discussed earlier or by a physical hardware device such as a display.
Here on the left, we have an example of a large color volume that could be representative of a reference or higher end consumer display.
It is also important to differentiate between the container or device color volume, that represents what encoding format or display device overall can facilitate, and the actual colors that a particular piece of content requires.
The latter is illustrated by the grey point cloud inside the color volume, which represents the image with the lighter flame at the lower part of the slide.
This example scene might contain more saturated colors or more extreme luminance levels than a potential target display with its specific color volume can recreate.
An example of such a smaller color volume is given on the right and can be seen as an example for lower tier HDR or SDR displays on TVs, computer displays, and mobile devices.
Now, in order to compress the pixel point cloud in the large color volume to be faithfully represented on a less capable display, we employ a tone mapper, which can facilitate the necessary color volume reduction of the content.
The result is a new point cloud that both fits into the color volume of the target display and ideally retains the original intent of the content.
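The luminance part of such a mapping can be illustrated with a very simple curve. The sketch below uses an extended Reinhard operator, chosen here only because it is compact; it is not the tone mapper of any established HDR format, it ignores gamut mapping and metadata, and the function name and parameters are illustrative.

```python
def tone_map_nits(nits, source_peak, target_peak):
    """Compress luminance from [0, source_peak] into [0, target_peak] (extended Reinhard)."""
    nits = min(max(nits, 0.0), source_peak)
    if source_peak <= target_peak:
        return nits  # the content already fits into the target range
    x = nits / target_peak          # luminance relative to the target peak
    w = source_peak / target_peak   # source peak in the same relative units
    return target_peak * x * (1.0 + x / (w * w)) / (1.0 + x)

# Map content mastered up to 4000 cd/m^2 onto a 600 cd/m^2 display.
mapped_peak = tone_map_nits(4000.0, 4000.0, 600.0)
mapped_mid = tone_map_nits(100.0, 4000.0, 600.0)
```

The source peak is mapped exactly onto the target peak and the curve stays monotonic, so the ordering of luminance levels is kept. This particular curve also darkens mid-tones, which is one reason why practical tone mappers are more elaborate.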
Finally, to improve the mapping process even further, metadata can be employed.
Which is common with already established advanced HDR formats.
Now we have discussed the fundamental elements of an HDR pipeline from content deployment to display.
This final slide provides an overview of the most common steps and options that one will likely encounter with an HDR capable canvas-based rendering pipeline.
Such a pipeline should be capable of ingesting both SDR and HDR content elements and then compositing them onto either SDR or HDR capable canvases, for output to display systems with the respective capabilities.
Any color volume differences are facilitated by tone mapping before being rendered on a target display.
One more key point to acknowledge is that the tone mapping engine is not always part of the rendering pipeline, implemented for example in a browser or operating system, in order to feed a passive HDR display.
It is also common that HDR displays include a tone mapping engine and therefore do not require tone mapping by the source device.
Those devices are here labeled smart HDR display.
And this brings us to the conclusion of this presentation.
In this presentation, we have discussed that canvas compositing and rendering pipelines are more complex than linear full screen rendering ones.
For example, canvas compositing pipelines must consider the layout, color, shading and blending described in multiple source files in order to create the final layout and appearance.
Now adding HDR and Wide Color Gamut support complicates the process even further, as we now have to ingest mixed source elements that can be provided in both SDR and different flavors of HDR for the same canvas.
Further, the final composited output must be compatible with different display capabilities which again can be in SDR or HDR.
As the display capabilities vary, tone mapping approaches are employed.
The effectiveness of the tone mapping process can be improved by providing metadata that for example, describes certain aspects of the source content elements.
With all this, we can't ignore that backwards compatibility must also be maintained to enable wide deployability of the content.
And finally, independent of how we process the content for display, one fundamental desire is to maintain the content intent, which could refer to creative aspects of the content.
But it can also be aimed at visibility or viewing comfort.
Thank you for joining this presentation.
The following appendix provides a list of current HDR related standards as well as reference books about HDR.
I also provide an example list of HDR related