HTML/Media Task Force/MSE Ad Insertion Use Cases

From W3C Wiki

General use cases for Alternate Content during video playback

This section describes use cases for insertion of alternate content into playback of a video stream. In the figures below a horizontal bar represents a contiguous stream of video, segmented into fragments, which may be interspersed with program OUT and IN points. Content is presented from right to left, beginning at the presentation point. Fragments to the left of the presentation point will be presented in order, right to left; think of the presentation point sliding left, or the content sliding right, as presentation progresses.

A manifest represents some subset of the fragments of the video content, called the manifest window here. The first fragment listed in the manifest will be the first to present, and the last fragment is the last to present. The actual presentation point is generally somewhere within the manifest window. Typically, a Player will acquire a new manifest each time it presents a fragment, so the window progresses through the content at the same rate the content is presented. This gives the Player foresight into what's coming up in the stream, such as program OUT and IN points.
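The rolling window can be illustrated with a minimal Python sketch. It is a model only: the `Fragment` type, the fragment names, the window sizes, and the functions `manifest_window` and `upcoming_out` are hypothetical and do not correspond to any particular manifest format.

```python
from typing import List, NamedTuple, Optional

class Fragment(NamedTuple):
    name: str
    tag: Optional[str] = None  # "OUT" or "IN" if a program point precedes this fragment

def manifest_window(stream: List[Fragment], first: int, size: int) -> List[Fragment]:
    """Return the fragments a manifest would list: up to `size` entries starting at `first`."""
    return stream[first:first + size]

def upcoming_out(window: List[Fragment]) -> Optional[int]:
    """Index within the window of the first fragment that follows an OUT point, or None."""
    for i, frag in enumerate(window):
        if frag.tag == "OUT":
            return i
    return None

# Hypothetical stream: entertainment (e*), then a default ad (d*) between OUT and IN.
stream = [
    Fragment("e0"), Fragment("e1"), Fragment("e2"),
    Fragment("d0", "OUT"), Fragment("d1"),
    Fragment("e3", "IN"), Fragment("e4"),
]

# A new manifest is acquired as each fragment is presented, so the window
# advances one fragment at a time and the OUT point moves closer to its start.
distances = [upcoming_out(manifest_window(stream, i, 4)) for i in range(3)]
```

With a window of four fragments, the OUT point is already visible while the first fragment is presented, which is the foresight the Player needs to prepare Alternate Content.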

Program OUT and IN points indicate points at which the presentation may exit and re-enter the primary stream. OUT and IN points are assigned by an external mechanism (not pertinent to this discussion).

Unless otherwise noted, blackouts and Dynamic Ad Insertion (DAI) are treated identically. Where the text discusses ads, blackouts may be substituted. Where the term Alternate Content is used, it refers to either of these, or to other uses such as alternate camera angles.

All of the Linear scenarios below may occur as the result of starting a Player and joining a stream, changing a channel, or during presentation.

General Requirements for all Use Cases:

1) Alternate content must be frame accurate when inserted.
2) Switches between content of potentially different configurations must be supported. A common use case is Dolby main content and AAC Alternate Content.

Scenario 1 - Live Video, no ads

In this scenario, a manifest represents a window that contains only entertainment fragments. The Player will play main entertainment and no Alternate Content will be presented.

Figure 1

Scenario 2 - Live Video, standard switch to Alternate Content

In this scenario, the manifest represents a series of entertainment fragments, and zero or more content fragments representing a default ad in the primary stream. This is typical of a rolling manifest that traverses a segment of content into the leading edge of an ad. The Player will play the content referenced by the Alternate Content response when the presentation point reaches the program OUT point. The Player will continue to process the original manifest representing the primary stream to detect the IN tag corresponding to the OUT tag (see scenario 3).

Figure 2

Note: The number of fragments in the Ad may be greater or less than the number of fragments in the primary stream between OUT and IN points. If greater, only the number of ad fragments matching the primary feed will be presented, truncating the ad. If less, a portion of the underlying content in the primary feed, presumably a default ad, will be presented.
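The rule in this note can be sketched as follows. This is a minimal model, assuming that ad fragments and primary fragments have equal duration and replace each other one for one; the function name `fill_slot` and the fragment names are hypothetical.

```python
from typing import List

def fill_slot(ad: List[str], primary_slot: List[str]) -> List[str]:
    """Choose what to present for each fragment position between OUT and IN.

    A longer ad is truncated to the slot length. A shorter ad is followed by
    the remaining fragments of the primary feed (presumably a default ad).
    """
    shown = ad[:len(primary_slot)]               # never present more than the slot holds
    return shown + primary_slot[len(shown):]     # fall back to the primary feed for the rest

# Ad longer than the slot: the trailing ad fragment is dropped.
truncated = fill_slot(["a0", "a1", "a2"], ["d0", "d1"])

# Ad shorter than the slot: the primary feed fills the remainder.
padded = fill_slot(["a0"], ["d0", "d1", "d2"])
```

In both cases the number of presented fragments equals the number of primary fragments between OUT and IN, so the live timeline is preserved.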

Scenario 3 - Live Video, Blackout going back to main entertainment

In this scenario, the manifest represents a series of content fragments, and zero or more entertainment fragments. This is typical of a rolling manifest that traverses a segment of content and an ad 'slot'. Regions between program OUT and IN points are frequently more than 30 seconds and therefore may not be fully represented in a single manifest. When the presentation point meets the program IN point, the Player will resume presenting content from the primary stream.

Figure 3

Scenario 4 - Live Video, start watching in the middle of Alternate Content point

In this scenario, the manifest represents a series of content fragments, and zero or more entertainment fragments. This is typical of a manifest that is acquired during the middle of an ad 'slot', which may occur during channel change.

Figure 4

Note: a newly acquired manifest that includes an IN tag without a preceding OUT tag is not expected and is considered illegal. The Player must truncate the leading edge of the ad so the remainder fits into the region between the window boundary and the IN point.
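The truncation described in this note can be sketched as below. The sketch assumes equal-duration fragments and that the number of fragments remaining before the IN point is already known; the function name `join_mid_slot` is hypothetical. The case of an ad shorter than the remaining region is not covered by the note; here the whole ad is simply returned.

```python
from typing import List

def join_mid_slot(ad: List[str], fragments_until_in: int) -> List[str]:
    """Drop the leading edge of the ad so that its tail ends at the IN point."""
    if fragments_until_in <= 0:
        return []                       # already at the IN point: nothing of the ad fits
    # A negative slice keeps the last N fragments; if the ad is shorter than N,
    # Python returns the whole list.
    return ad[-fragments_until_in:]

# Joining with two fragments left before IN: only the last two ad fragments play.
remainder = join_mid_slot(["a0", "a1", "a2", "a3"], 2)
```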

Scenario 5 - In-Progress cDVR

In this scenario, the manifest represents a programming event from its start to the current presentation point (the 'live' point), and may contain program OUT and IN points that precede the presentation point. In order to ensure that proper ads are presented if the viewer rewinds, all of the ads must be resolved at the beginning of the presentation.

Figure 5

Scenario 6 - Completed cDVR

In this scenario, the manifest represents an entire programming event, which may contain program OUT and IN points. This scenario is similar to scenario 5 except that the manifest represents the entire programming event, and the presentation point begins at the beginning of the event. Because the presentation is not bound by the inflexible temporal constraints of a live environment, the ads placed into the entertainment content may shrink or grow the overall size of the content. For example, pre-roll and post-roll ads that are appended to the beginning or end of the entertainment grow the overall size of the content. Mid-rolls may be inserted into content, thereby growing the overall size, or may be presented instead of existing content. In the latter case, the ad may represent more or less playtime than the content in the primary feed; in any case the entire ad will be presented and the entertainment will resume after the ad, effectively growing or shrinking the presentation time of the asset according to the size of the ad.
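The effect on presentation time can be worked through with a small calculation. The durations and the function name `presentation_seconds` are hypothetical and chosen only to illustrate the arithmetic.

```python
from typing import Iterable, Tuple

def presentation_seconds(
    content: int,
    pre_roll: int = 0,
    post_roll: int = 0,
    inserted_mid_rolls: Iterable[int] = (),
    replacing_mid_rolls: Iterable[Tuple[int, int]] = (),
) -> int:
    """Total presentation time, in seconds, of an asset after ad placement.

    inserted_mid_rolls: durations of ads inserted into the content (pure growth).
    replacing_mid_rolls: (ad duration, replaced content duration) pairs; each
        pair grows or shrinks the total by the difference.
    """
    total = content + pre_roll + post_roll
    total += sum(inserted_mid_rolls)
    total += sum(ad - replaced for ad, replaced in replacing_mid_rolls)
    return total

# 1800 s of content, a 15 s pre-roll, a 30 s inserted mid-roll, and a 20 s ad
# presented instead of 30 s of existing content:
# 1800 + 15 + 30 + (20 - 30) = 1835 seconds.
example = presentation_seconds(
    1800, pre_roll=15, inserted_mid_rolls=[30], replacing_mid_rolls=[(20, 30)]
)
```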


Scenario 7 - VOD

In this scenario, the manifest represents an entire programming event, which may contain program OUT and IN points. This scenario is identical to scenario 6.

Implementation Use Cases

This section outlines ad insertion use cases that are not adequately addressed by the current MSE APIs.

Each use case describes a situation in which the user wishes to perform a seamless switch from main content to ad content or vice versa. In each use case, one piece of content is identified as content A, and the other is identified as content B. A may be ads and B may be main content, or vice versa. The user wishes to play content A (or a portion of A) then perform a seamless switch to content B (or a portion of B) at a specific time or frame.

For each of the use cases, assume that:

Assumption 1) A seamless switch is defined as continuous playback where no visible/audible pause during the transition from A to B occurs whether due to late buffering, late codec initialization, or other causes.
Assumption 2) User agent is given enough advance notice that a seamless switch is theoretically possible.
Example: User would like to perform a switch from content A to content B in 15 seconds. Assuming decent network conditions, this should be enough time to fetch the content, buffer it, and schedule a seamless switch.
Assumption 3) A and B may have been encoded with different encoders or with different settings that cannot be synchronized. This may occur for reasons such as:
a) A and B originate from different sources.
Example: A is primary content that originates from an MVPD. B is ad content that originates from one of many vendors outside the MVPD. The encoder models and settings used to encode A and B may differ since it is impractical for the MVPD to synchronize encoder settings and models across all ad vendors.
b) A and B were encoded at different times.
Example: An MVPD encodes A using encoder model M1. At a later time, the MVPD decides to begin using encoder model M2 across their organization. The MVPD encodes content B using M2. Details such as track IDs may differ between M1 and M2. Due to the size of the MVPD's content library, it is impractical to re-encode all legacy content.

Use case 1) Seamless switch between ads and main content that contain different numbers of tracks

Priority: HIGH

User wishes to play content A (or a portion of A) then perform a seamless switch to content B (or a portion of B) at a specific time or frame.

A is multiplexed content and contains x tracks of type T. B is multiplexed content and contains y tracks also of type T, but x != y. All other factors (codec, etc) are consistent between A and B.

Example: A is main content with English/Spanish audio, B is an ad with only English. Ad content often contains a smaller number of language tracks than main content.

Use case 2) Seamless switch between ads and primary content that use different codecs

Priority: HIGH

A uses codec C1. B uses codec C2. All other factors (track IDs, etc) are consistent between A and B.

Examples:

  • A is main content with H264 video and Dolby audio. B is an ad with H264 video and AAC audio. Dolby support is common among set-top boxes. Primary content is often encoded with Dolby audio. Dolby audio is not as common among ad content.
  • A is main content with HEVC video and AAC audio. B is an ad with H264 video and AAC audio. As HEVC gains adoption, ad and primary content will often be a mixture of HEVC and H264.

The use case of switching across profiles and levels does not typically cause issues since "higher" profiles/levels are generally supersets of lower profiles/levels.

Use case 3) Seamless switch between ads and primary content that use different byte stream formats

Priority: HIGH

A is of a different byte stream format from B.

Example: A is main content formatted as m2ts. B is ad content formatted as mp4. This is common when dealing with legacy VoD content.

Use case 4) Seamless switch between ads and primary content with different track IDs

Priority: LOW

A is multiplexed content with x tracks of type T. B is also multiplexed content with x tracks of type T. The track IDs of A differ from those of B. x > 1. All other factors (codec, etc) are consistent between A and B.

Example: A is main content with English/Spanish audio with track IDs 1 and 2. B is an ad with English/Spanish audio with track IDs 1 and 3. Since ad content is typically encoded independently from primary content and track IDs are not standardized, there is no guarantee they will use consistent schemes for designating track IDs.

Use case 5) Seamless switch between ads and primary content, where one is multiplexed and the other is demultiplexed

Priority: LOW

A consists of demultiplexed tracks. B consists of multiplexed tracks.

Specification Gaps

This section outlines gaps in the current specification in addressing the above use cases.

The use cases encounter difficulty due to step #3 of the initialization segment received algorithm (Section 3.5.8). Step #3 reads:

3. If the first initialization segment received flag is true, then run the following steps:
   1. Verify the following properties. If any of the checks fail then run the end of stream algorithm with the error parameter set to "decode" and abort these steps.
      * The number of audio, video, and text tracks match what was in the first initialization segment.
      * The codecs for each track, match what was specified in the first initialization segment.
      * If more than one track for a single type are present (ie 2 audio tracks), then the Track IDs match the ones in the first initialization segment.
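To show how these checks collide with the use cases above, the three properties can be modeled in a few lines of Python. This is a sketch of the quoted checks only, not of any user agent's implementation: the `Track` type, the `check_init_segment` function, the returned labels, and the codec strings are all illustrative assumptions.

```python
from typing import Dict, NamedTuple, Optional

class Track(NamedTuple):
    kind: str   # "audio", "video", or "text"
    codec: str  # illustrative codec string

InitSegment = Dict[int, Track]  # track ID -> track description

def check_init_segment(first: InitSegment, new: InitSegment) -> Optional[str]:
    """Return None if `new` passes the three checks, else a label for a failed check.

    Any single failure leads to the same "decode" error, so the order in which
    the checks are evaluated here does not change the outcome.
    """
    for kind in ("audio", "video", "text"):
        old_tracks = {tid: t for tid, t in first.items() if t.kind == kind}
        new_tracks = {tid: t for tid, t in new.items() if t.kind == kind}
        if len(old_tracks) != len(new_tracks):
            return "track count"
        if len(new_tracks) > 1:
            # Track IDs only have to match when a type has more than one track.
            if set(old_tracks) != set(new_tracks):
                return "track id"
            pairs = [(old_tracks[tid], new_tracks[tid]) for tid in new_tracks]
        else:
            pairs = list(zip(old_tracks.values(), new_tracks.values()))
        if any(a.codec != b.codec for a, b in pairs):
            return "codec"
    return None

video = Track("video", "avc1.64001f")
aac = Track("audio", "mp4a.40.2")
dolby = Track("audio", "ec-3")

main = {1: video, 2: aac, 3: aac}            # English/Spanish audio
ad_one_language = {1: video, 2: aac}         # use case 1: fewer tracks
ad_other_codec = {1: video, 2: dolby, 3: dolby}  # use case 2: different codec
ad_other_ids = {1: video, 2: aac, 4: aac}    # use case 4: different track IDs

results = [
    check_init_segment(main, ad_one_language),
    check_init_segment(main, ad_other_codec),
    check_init_segment(main, ad_other_ids),
]
```

Each of the three ad configurations fails a different check, which is why appending such content to the same SourceBuffer ends the stream with an error rather than switching seamlessly.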

A possible workaround (Workaround A) might be to utilize multiple MediaSource objects, but this approach has multiple issues:

  1. Initiating the switch using TextTrackCue is not guaranteed to be frame accurate. Testing suggests that current implementations are not frame accurate.
  2. The user agent is unaware of the switch until the time of the switch and therefore cannot perform buffering or decoder initialization ahead of time.
  3. User Agent implementations may not support multiple MediaSource objects. Section 2.2 specifies this as a "quality of implementation issue."

A second workaround (Workaround B) might be to initialize SourceBuffers of all possible audio and video codecs. This approach has multiple issues:

  1. Issues #1 and #2 of Workaround A still apply.
  2. User Agent implementations may not support enough SourceBuffers. Section 2.2 specifies this as a "quality of implementation issue."
  3. The number of required SourceBuffers may be unmanageably large, especially when considering use cases that involve a combination of multiplexed/demultiplexed content and multiple possible codecs.
  4. The complexity required for a user to implement this solution is high.
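A rough count illustrates how quickly the number of SourceBuffers can grow under Workaround B. The sketch assumes one SourceBuffer per distinct configuration: one per codec for demultiplexed streams, and one per video/audio codec pairing for multiplexed streams, repeated for each byte stream format. That counting model, the function name `source_buffer_configs`, and the format and codec labels are assumptions made for illustration; differing track counts or track IDs would increase the number further.

```python
from itertools import product
from typing import List, Sequence, Tuple

def source_buffer_configs(
    formats: Sequence[str],
    video_codecs: Sequence[str],
    audio_codecs: Sequence[str],
) -> List[Tuple[str, Tuple[str, ...]]]:
    """Enumerate (byte stream format, codecs) configurations to pre-initialize."""
    configs: List[Tuple[str, Tuple[str, ...]]] = []
    for fmt in formats:
        # Demultiplexed content: one SourceBuffer per single-codec stream.
        configs += [(fmt, (v,)) for v in video_codecs]
        configs += [(fmt, (a,)) for a in audio_codecs]
        # Multiplexed content: one SourceBuffer per video/audio codec pairing.
        configs += [(fmt, (v, a)) for v, a in product(video_codecs, audio_codecs)]
    return configs

configs = source_buffer_configs(
    formats=["mp4", "m2ts"],
    video_codecs=["H264", "HEVC"],
    audio_codecs=["AAC", "Dolby"],
)
count = len(configs)  # 2 formats * (2 + 2 + 2*2) = 16 configurations
```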

A third workaround (Workaround C) might be to add a JavaScript container format parser to Workaround B. This approach has multiple issues:

  1. The complexity required for a user to implement this solution is very high. The JavaScript code must parse the container format, demultiplex tracks, and remap track IDs in order to work around model limitations.
  2. For the multi-codec use case, issues #1-4 of Workaround B still apply. This approach reduces the number of necessary SourceBuffers, but potentially requires more SourceBuffers than may be supported. Switches are not guaranteed to be frame accurate.

Comments

Some feedback from members of the Multi-Device Timing Community Group: