TTML/changeProposal025

From W3C Wiki


Distribution - OPEN

  • Owner: Nigel Megitt.
  • Started: 16/01/14

Issues Addressed

Summary

There are two scenarios that require TTML samples, i.e. short, time-bounded snippets that collectively relate to a single entity.

The first is live authoring, in which an authoring station may need to create and emit a series of short documents for display with low latency. The second is distribution, in which an encoder or packager may need to divide a longer document into a series of short documents for managed data rate distribution to an onward chain. One example is delivery using MPEG DASH.

In the live case there is a follow-on use case, which is to permit an archival device to accumulate a set of TTML documents together to form a larger one, i.e. one whose temporal extent is the union of the temporal extents of the input documents.

In the distribution use case there is also a follow-on use case, which is to provide direction to presentation processors to enable them to display the series of documents with no visible join between them, i.e. as though they were received as a single document. There may be a need in this case to provide some guidance, either in the specification or in the contents of the provided documents, to minimise the implementation cost (at design time and at run time) associated with this use case.

At first glance TTML can already handle these scenarios simply by creating small documents. However, there is no support for chains of related documents. For example, xml:id carries a uniqueness requirement that applies per document, not per set of documents.

The problem generalises to other cases where documents need to be combined: for example, rather than temporal sampling there could be spatial sampling, perhaps where content in different documents relates to different regions.

A starter document for this (including errors) was posted to the reflector [1].

Simplified use cases

  1. In a live authoring context, a set of TTML documents needs to be combined efficiently into a single larger TTML document using an ‘accumulating’ processor.
  2. In a distribution context, a TTML document needs to be segmented efficiently into a set of shorter TTML documents.
  3. A distribution processor (i.e. a piece of consumer equipment) needs to be able to process a sequence of temporally ordered TTML documents, which may overlap in time, with minimal processing required between documents.

Proposal

Modify the scope of uniqueness of element identifiers so that groups of documents can have elements identified as being the same, and specify the identity rules that apply. This operates at the syntactic level but takes advantage of knowledge of the structure of TTML, i.e. it is not something that could simply be punted over to XML (which does not deal with this scenario anyway).

This proposal would allow documents from the same group to be combined. Segmenting a document is a straightforward sub-selection from the originating document. Distribution processors can assume that identified elements in the <head> with the same identifier across documents of the same group are identical, and can potentially shortcut pre-processing of regions, styles etc.
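The sub-selection mentioned above might look like the following minimal Python sketch using xml.etree.ElementTree. It is an illustration, not part of the proposal: it assumes timing is expressed only as offset times in seconds (e.g. "2.5s") on <p> elements inside <div>, and the names segment and seconds are hypothetical. Selected <p> elements keep their xml:id, so a paragraph spanning a segment boundary appears identically in both segments.

```python
import copy
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
TTM_GROUP = "{http://www.w3.org/ns/ttml#metadata}documentGroup"

def seconds(expr):
    # Assumption: only offset times in seconds such as "12.5s" are handled.
    return float(expr.rstrip("s"))

def segment(doc, start, end, group):
    """Return a copy of doc keeping only <p> elements that overlap [start, end)."""
    seg = copy.deepcopy(doc)
    seg.set(TTM_GROUP, group)  # every segment carries the same group identifier
    for div in list(seg.iter(TT + "div")):
        for p in list(div.findall(TT + "p")):
            overlaps = seconds(p.get("begin")) < end and seconds(p.get("end")) > start
            if not overlaps:
                div.remove(p)
    return seg
```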

Syntax of group identifier

The group identifier does not affect general processing of TTML documents, and indeed has no effect at all on processing of a single document. It is therefore considered to be metadata.

  • Metadata attribute ttm:documentGroup
  • Considered significant on the tt:tt element only
  • Optional
  • Type: NCName from [XML Schema Part 2]
ttm:documentGroup : xsd:NCName
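A processor might read and validate the attribute as sketched below. This assumes the TTML metadata namespace http://www.w3.org/ns/ttml#metadata for the ttm prefix; the function name document_group is hypothetical, and the regular expression covers only the ASCII subset of the xsd:NCName production.

```python
import re
import xml.etree.ElementTree as ET

TTM_GROUP = "{http://www.w3.org/ns/ttml#metadata}documentGroup"
# Simplified NCName check: ASCII subset only (the full xsd:NCName production
# also admits many non-ASCII letters, combining characters and extenders).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def document_group(root):
    """Return the ttm:documentGroup value on the <tt> root, or None if absent."""
    value = root.get(TTM_GROUP)
    if value is None:
        return None
    if not NCNAME.fullmatch(value):
        raise ValueError(f"ttm:documentGroup is not an NCName: {value!r}")
    return value
```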

Rules for combining documents with the same group identifier

  • A group identifier on the <tt> element extends the uniqueness scope of identifiers to all documents with the same group identifier.
    • An xml:id value may be duplicated across different documents with the same group identifier only on elements that are considered identical, and only where this does not break the rules in this section.
  • Identified elements in <head> must be identical, in attributes, content and descendants.
    • <metadata> children of identified elements may differ in their descendants and contents.
    • For example, if an element from the animation vocabulary were added to an identified region in the head, this may cause the presentation to differ in the resultant documents. This is therefore not permitted.
  • Non-identified elements in <head> whose maximum cardinality is 1 (i.e. styling, layout and animation) may have their contents combined: identified descendants must be identical, in attributes and content; any non-identified descendants with unlimited maximum cardinality are duplicated on combination.
  • Identified elements in <body> have a different identity test.
    • Attributes must be identical, i.e. the set of specified attributes must be the same and the values of each specified attribute must be the same.
    • Any identified descendants must be the same, using this identity test.
    • Content must be identical (if present).
    • Different identified and non-identified descendants may be present.
    • Non-identified elements are always treated as differing from otherwise apparently identical elements in other documents in the group, i.e. they are duplicated on combination.
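The two identity tests above could be sketched in Python as follows. This is one possible reading of the rules, not a normative definition: head_identical and body_identical are hypothetical names, "content" is taken to be the element's leading text (ElementTree .text, so text after child elements is ignored), and <metadata> children are skipped at every depth in the strict test.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
METADATA = "{http://www.w3.org/ns/ttml}metadata"

def same_shallow(a, b):
    # Same element type, same specified attributes and values, same leading text.
    return a.tag == b.tag and a.attrib == b.attrib and (a.text or "") == (b.text or "")

def head_identical(a, b):
    """Strict <head> test: shallow match plus identical non-metadata children."""
    if not same_shallow(a, b):
        return False
    ka = [c for c in a if c.tag != METADATA]
    kb = [c for c in b if c.tag != METADATA]
    return len(ka) == len(kb) and all(head_identical(x, y) for x, y in zip(ka, kb))

def body_identical(a, b):
    """<body> test: shallow match; identified descendants present in both must
    also match; other descendants may differ."""
    if not same_shallow(a, b):
        return False
    b_ids = {d.get(XML_ID): d for d in b.iter() if d is not b and d.get(XML_ID)}
    for d in a.iter():
        if d is not a and d.get(XML_ID) in b_ids:
            if not body_identical(d, b_ids[d.get(XML_ID)]):
                return False
    return True
```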

Algorithm for combining 2 documents with same group identifier

function CombineDocuments(A, B) returns ttmlDocument
{
  if (!exists(A.tt.attributeSet["ttm:documentGroup"]) || !exists(B.tt.attributeSet["ttm:documentGroup"])) exit this process with error; // both docs must have a group identifier
  if (A.tt.attributeSet != B.tt.attributeSet) exit this process with error; // includes the ttm:documentGroup attribute, so both docs must be in the same group
  // define attribute set equality comparison as checking that same attributes are present in both and every one has the same value.

  C = new ttmlDocument;
  C.tt.attributeSet = A.tt.attributeSet;
  C.head = CombineElementsWithStrictUniqueness(A.head, B.head); // assuming both have a head element
  if (exists(A.body) && exists(B.body))
  {
     if (A.body.attributeSet != B.body.attributeSet) exit this process with error; // the attribute set includes xml:id if present
     C.body = CombineElementsWithLaxUniqueness(A.body, B.body);
  }
  else if (exists(A.body))
  {
     C.body = A.body;
  }
  else if (exists(B.body))
  {
     C.body = B.body;
  }
  return C;
}
function CombineElementsWithStrictUniqueness(A, B) returns Element
{
  if (typeof(A) != typeof(B)) exit this process with error;
  if (A.attributeSet != B.attributeSet) exit this process with error;
  if (!deepEqual(A.identifiedChildElements, B.identifiedChildElements)) exit this process with error; // deepEqual checks everything except metadata
  if (A.pcdata != B.pcdata) exit this process with error;
  C = new Element(typeof(A), A.attributeSet);

  // Combine metadata with lax uniqueness first
  if (exists(A.metadata) && exists(B.metadata))
     C.metadata = CombineElementsWithLaxUniqueness(A.metadata, B.metadata);
  else if (exists(A.metadata))
     C.metadata = A.metadata;
  else if (exists(B.metadata))
     C.metadata = B.metadata;

  // [NM] This needs a fix: non-identified elements with maximum cardinality 1 such as layout and
  // styling should allow their contents to be additively combined pruning duplicated identified
  // descendants, and checking that those duplicated identified descendants are strictly identical.
  // In the similar case of identified elements with maximum cardinality 1, should different xml:id
  // values on those elements be ignored or used to flag an error state?

  // Combine remaining children with strict uniqueness
  for each element E in A.children
  {
     if (typeof(E) != "metadata")
        C.appendChild(E);
  }
  for each element E in B.children
  {
     if (E.id == null && typeof(E) != "metadata") C.appendChild(E); // identified children of B were checked above to equal those of A
  }

  // Copy content
  C.pcdata = A.pcdata;

  return C;
}
function CombineElementsWithLaxUniqueness(A, B) returns Element
{
  if (typeof(A) != typeof(B)) exit this process with error;
  if (A.id == null || B.id == null || A.id != B.id) exit this process with error; // non-identified and differently-identified elements can’t be combined but must be appended
  if (A.attributeSet != B.attributeSet) exit this process with error;
  if (A.pcdata != B.pcdata) exit this process with error;
  C = new Element(typeof(A), A.attributeSet);
  for each element E in A.children
  {
     if (E.id == null || !exists(B.children[E])) // children[E] means 'indexed by identifier of E'
     {
        C.appendChild(E);
     }
     else
     {
        C.appendChild(CombineElementsWithLaxUniqueness(E, B.children[E]));
     }
  }
  for each element E in B.children
  {
     if (E.id == null || !exists(A.children[E]))
        C.appendChild(E);
     // We’ve already appended the combined children that are also present in A.
  }

  C.pcdata = A.pcdata;
  return C;
}
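For experimentation, CombineElementsWithLaxUniqueness can be transcribed into Python over xml.etree.ElementTree roughly as below. This is a sketch under the same simplifications as the pseudocode: combine_lax and CombineError are hypothetical names, "pcdata" is taken to be the element's leading text, and children are matched by xml:id at one level per recursion step.

```python
import copy
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

class CombineError(Exception):
    pass

def combine_lax(a, b):
    """Sketch of CombineElementsWithLaxUniqueness for two ElementTree elements."""
    if a.tag != b.tag:
        raise CombineError("element types differ")
    if a.get(XML_ID) is None or a.get(XML_ID) != b.get(XML_ID):
        raise CombineError("only elements with the same xml:id can be combined")
    if a.attrib != b.attrib or (a.text or "") != (b.text or ""):
        raise CombineError(f"element {a.get(XML_ID)!r} differs between documents")
    c = ET.Element(a.tag, dict(a.attrib))
    c.text = a.text
    b_by_id = {e.get(XML_ID): e for e in b if e.get(XML_ID) is not None}
    a_ids = {e.get(XML_ID) for e in a if e.get(XML_ID) is not None}
    for e in a:  # children of A in order, merging those also identified in B
        eid = e.get(XML_ID)
        if eid is not None and eid in b_by_id:
            c.append(combine_lax(e, b_by_id[eid]))
        else:
            c.append(copy.deepcopy(e))
    for e in b:  # then children of B that were not merged above
        if e.get(XML_ID) is None or e.get(XML_ID) not in a_ids:
            c.append(copy.deepcopy(e))
    return c
```

As in the pseudocode, two elements without a shared xml:id are rejected, so an unidentified <body> or <metadata> element cannot be combined by this function.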

Edits to be applied

  • New optional attribute ttm:documentGroup on tt:tt to identify a group. It needs no per-document uniqueness, since it is shared by every document in the group, so it is just an identifier.
  • New normative section expressing rules for combining documents in a group with the same identifier, as per rules and algorithm agreed (see proposal above).
    •  ?? The algorithm could additionally or alternatively be expressed as an XSLT document or even an XPROC pipeline with embedded XSLT at the expense of general readability but with the advantage of being well defined, rather like the difference between an infoset and an XSD schema.
  • New non-normative section describing potential processing shortcuts that implementations could take if they choose to rely on strict adherence to the group identifier rules.
  •  ?? Do we need a section on splitting documents into sub-documents? Not sure why this would be needed. If so, normative or informative?
  •  ?? We should consider removing xml:id from elements whose maximum cardinality in the document is 1, e.g. tt.head.[styling|layout|animation], since the entity is already uniquely defined structurally; otherwise we have a potential problem if two documents in the same group have different xml:id values on those elements (or an opportunity to flag an error, depending on how you see it).
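One processing shortcut such a non-normative section might describe is caching <head> results across documents in a group. The sketch below is hypothetical: RegionCache and its prepare callback (standing in for whatever expensive per-region processing a presentation processor performs) are illustrative names, and the shortcut is only safe if documents strictly follow the rule that identified <head> elements are identical within a group.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TT = "{http://www.w3.org/ns/ttml}"
TTM_GROUP = "{http://www.w3.org/ns/ttml#metadata}documentGroup"

class RegionCache:
    """Reuse processed <region> results for documents sharing a group identifier."""

    def __init__(self, prepare):
        self.prepare = prepare  # hypothetical expensive per-region processing
        self.cache = {}
        self.prepared = 0       # number of times prepare() actually ran

    def regions(self, doc):
        group = doc.get(TTM_GROUP)
        out = {}
        for r in doc.iter(TT + "region"):
            rid = r.get(XML_ID)
            cacheable = group is not None and rid is not None
            if cacheable and (group, rid) in self.cache:
                # Same group and xml:id: identical by the group rules, skip the work.
                out[rid] = self.cache[(group, rid)]
                continue
            value = self.prepare(r)
            self.prepared += 1
            if cacheable:
                self.cache[(group, rid)] = value
            out[rid] = value
        return out
```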

Edits applied

Impact

  • These changes would not affect the structure or implementation of any single TTML2 document.
  • The only change is the addition of a single optional group identifier to the document which does not normatively affect processing of the document.
  • If we remove, deprecate or ignore xml:id on elements with a maximum cardinality of 1 in the document, this could render existing documents invalid.
  • This change helps to meet a currently unmet need for users who create a stream of small TTML documents and wish to combine them using normative rules.

References

[1] Email including attached draft EBU document on segmentation