Multimedia Tracks, Data and Metadata

Metadata standards, an extensible ontology and vocabulary, and an API for multimedia track metadata, drawing upon the expressiveness of XMP, MPEG, Matroska and WebM, can describe the tracks in multimedia files.  Describing multimedia tracks with metadata enhances the uses of tracks and of track-based data, and improves the portability of multimedia files.  Web browsers and multimedia software can then provide enhanced viewing experiences and features based upon that metadata, including in multi-device scenarios as broached in: Argumentation Scenarios and Use Cases: Web and Television, Speeches, Presentations, Discussions and Debates.

One use case is that of multimedia presentations.  Presentation scenarios are numerous and include digital education.  Presentation videos might have a number of video tracks: (1) video of the presenter, (2) video of a presentation surface, (3) video with cinematography between presenter and presentation surface. HTML5 supports synchronizing multiple media elements for simultaneous rendering, such as video tracks (1) and (2). A fourth track, however, could be (4) a side-by-side composite of tracks (1) and (2). Some of these combinations of presenters and presentation surfaces are referred to as enhanced video.

A 2011 article, HTML5 multi-track audio or video, about media resources with multiple media tracks, indicates the use of mediagroups. For example:

<video id="v1" poster="presenter.png" controls mediagroup="presentation">
 <source type="video/mp4" src="video.mp4#track=v1&track=a1">
</video>

<video id="v2" poster="presentation.png" controls mediagroup="presentation">
 <source type="video/mp4" src="video.mp4#track=v2&track=a2">
</video>

The example multimedia object, video.mp4, has four referenced tracks: v1, v2, a1 and a2.  Track metadata can be useful for programmatic uses of multimedia tracks where track identifiers, such as v1 and v2, do not indicate the semantics of the track content, such as presenter and presentation.  URI-based track metadata, from an extensible ontology and vocabulary, could indicate audio, video and data track contents for scenarios including enhanced video, multiple camera angle video, multiview video, free viewpoint video and 3D video, and so increase the portability of the multimedia across web pages.
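
As a minimal sketch of how software might relate the track identifiers in such URLs to URI-based metadata, the following Python reads the track names from a URL fragment and looks each one up in a table. The example.org URIs and the TRACK_METADATA table are assumptions for illustration only; no concrete vocabulary is defined here, and the meanings given to a1 and a2 are guesses.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical metadata: track identifier -> vocabulary URI (not a real ontology).
TRACK_METADATA = {
    "v1": "http://example.org/tracks#presenter",
    "v2": "http://example.org/tracks#presentation-surface",
    "a1": "http://example.org/tracks#presenter-audio",
    "a2": "http://example.org/tracks#presentation-audio",
}

def tracks_in(url):
    """Return the track names listed in the URL's fragment, in order."""
    fragment = urlsplit(url).fragment
    return parse_qs(fragment).get("track", [])

def describe(url, metadata=TRACK_METADATA):
    """Pair each referenced track with its metadata URI (None if unknown)."""
    return [(name, metadata.get(name)) for name in tracks_in(url)]

print(describe("video.mp4#track=v1&track=a1"))
```

With such a lookup, a player could label v1 as the presenter track without relying on the identifier itself.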

With URI-based track metadata, multimedia software could recognize multimedia track structure (see also: AudioTrackList and VideoTrackList), such as tracks of presenters and presentations, from URLs such as that of video.mp4, and so provide features, ergonomics and intuitive navigation.

Beyond the features possible from XHTML transcripts, multimedia tracks can include XML, RDF, temporal XML and RDF data, and other data pertaining to 3D geometry and animations, as well as data pertaining to multiple camera angles, multiview video, free viewpoint video and 3D video.  Utilizing track metadata, selections of multimedia can provide data in multiple clipboard formats.

An extensible semantic metadata ontology and vocabularies, including the expressiveness of XMP, MPEG, Matroska and WebM, together with a JavaScript API, can facilitate enhanced features, uses of tracks and of track-based data, and the portability of multimedia objects.


Semantics and Selectors

Semantics enhances the selection and styling of content; varieties of semantic selection include: (1) selecting upon URI items in white space separated lists of TERMorCURIEorAbsIRI values, (2) selecting upon parallel markup structure and reference combinators, and (3) graph-based selections with SPARQL expressiveness.

Selecting upon URI items in white space separated lists of TERMorCURIEorAbsIRI values, such as @xhtml:role, @rdf:type, @rdfa:typeof or @epub:type, could be expressed with a syntax resembling:

x|element[x|attr ~= uri(x|value1)][x|attr ~= uri(x|value2)] { ... }
x|element[x|attr ~= uri(x|value1)], x|element[x|attr ~= uri(x|value2)] { ... }
x|element:matches([x|attr ~= uri(x|value1)], [x|attr ~= uri(x|value2)]) { ... }
x|element:not([x|attr ~= uri(x|value1)]) { ... }
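
To make the intended matching concrete, here is a small Python sketch of the test such a selector would perform: split the attribute value on white space, expand each CURIE to an absolute IRI, and compare. The prefix map and the example.org vocabulary are assumptions, and bare terms are passed through unexpanded, which a real TERMorCURIEorAbsIRI processor would handle with a default vocabulary.

```python
# Hypothetical prefix map; real documents would declare their own prefixes.
PREFIXES = {
    "ex": "http://example.org/vocab#",
}

def expand(token, prefixes=PREFIXES):
    """Expand a CURIE such as 'ex:Claim' to an absolute IRI; pass other tokens through."""
    prefix, sep, local = token.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return token

def matches(attr_value, wanted_iri, prefixes=PREFIXES):
    """True if any white-space-separated item in attr_value expands to wanted_iri."""
    return any(expand(item, prefixes) == wanted_iri for item in attr_value.split())

print(matches("ex:Claim ex:Premise", "http://example.org/vocab#Premise"))
```

The point of comparing expanded IRIs rather than strings is that ex:Claim and its absolute form select the same elements.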

An example of selecting upon parallel markup structure, e.g. MathML content markup and parallel markup, and reference combinators:

annotation-xml[encoding="..."] ... /xref/ mo { ... }

Ontology, description logic and semantic reasoning can enhance the functionality of selection based upon URI items in TERMorCURIEorAbsIRI attribute values, of selection based upon parallel markup structure and reference combinators, and of graph-based selection with an expressiveness resembling that of SPARQL, as broached in Document and Package Semantics and Metadata.
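
Graph-based selection can be illustrated, in a very reduced form, by matching a single triple pattern against a set of triples. This Python sketch is far short of SPARQL expressiveness; the triples, the "ex:" names and the match function are assumptions made for the example.

```python
def match(graph, pattern):
    """Return the triples in graph that fit pattern; None in the pattern is a wildcard."""
    return [
        triple for triple in graph
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]

# Hypothetical graph: element identifiers typed with vocabulary terms.
GRAPH = [
    ("#p1", "rdf:type", "ex:Claim"),
    ("#p2", "rdf:type", "ex:Premise"),
    ("#p2", "ex:supports", "#p1"),
]

# Select every subject typed as a claim.
print([s for s, _, _ in match(GRAPH, (None, "rdf:type", "ex:Claim"))])
```

A selector engine could then style or select the elements whose identifiers appear among the matched subjects.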

Document and Package Semantics and Metadata

Linguistic and semantic annotations, rhetorical structure and argumentation formats are some of the numerous scenarios where data or metadata are desired in addition to document trees, e.g. SSML and XHTML documents.  In SSML contexts, such data can facilitate prosodic speech synthesis and, in XHTML contexts, many new features are possible.

A solution for document and modular document component semantics is a document object model interface, e.g. document.semantics, a graph-based interface.  The contents of such a graph could be:

  1. From content regions in a document as per: <script type="application/rdf+xml">...</script> or <semantics type="application/rdf+xml">...</semantics>.
  2. Linked to from a document as per: <script type="application/rdf+xml" src="..." />, <semantics type="application/rdf+xml" src="..." /> or <link rel="semantics" type="application/rdf+xml" href="..." />.
    1. A @rel attribute could vary processing or map graphs to resultant graphs; <semantics rel="annotation" type="application/rdf+xml" src="..." /> or <link rel="semantics annotation" type="application/rdf+xml" href="..." /> could map graph data to or from an annotation ontology.
  3. Inferred from or processed from other document content including: document markup semantics, structural semantics, attributes such as @xhtml:role, @rdf:type, @rdfa:typeof or @epub:type, microformats and RDFa.

Documents can interface as both trees and graphs.  A graph dataset could be derived from a document object model tree dataset, so that programmatic changes through a tree-based document object model are reflected in graph-based data.  Conversely, a tree dataset could be derived from a graph dataset, so that changes through a graph-based API are reflected in tree-based document object model data.
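
The tree-to-graph direction can be sketched in a few lines of Python: walk the tree and emit a triple for each item of a list-valued attribute. The choice of the id attribute as the subject, the role attribute as the predicate and the "ex:" values are all assumptions; a real document.semantics interface would also cover embedded and linked graphs.

```python
import xml.etree.ElementTree as ET

def triples_from_tree(root, attr="role"):
    """Yield (element id, attribute name, item) for each item in the attribute's list."""
    for element in root.iter():
        subject = element.get("id")
        if subject is None:
            continue  # elements without an id have no subject to hang triples on
        for item in element.get(attr, "").split():
            yield (subject, attr, item)

doc = ET.fromstring(
    '<div id="d1" role="ex:Argument">'
    '<p id="p1" role="ex:Claim ex:Premise">Text</p>'
    '<p>No id, so skipped</p>'
    '</div>'
)
graph = set(triples_from_tree(doc))

# A change made through the tree shows up when the graph is derived again.
doc.find("p").set("role", "ex:Conclusion")
updated = set(triples_from_tree(doc))
print(sorted(updated))
```

Re-deriving the whole graph after each change is the simplest way to keep the two views consistent; an implementation would more likely update the graph incrementally.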

For modularity, object elements could have a semantics component and so too could custom elements. Web components could include a means of specifying such semantics in addition to styling and scripting. XML preprocessing can output semantic graphs including utilizing parallel markup.

In addition to a document semantics and metadata interface, an interface could reference package semantics metadata, as described in OpenDocument 1.2, Part 3: Packages, Chapter 6: Metadata Manifest.

Enhanced features include semantic reasoning upon graph-based data and the Web-based and desktop-based indexing, search and retrieval of such data and metadata, the data and metadata of document packages, documents, document components and multimedia.  Furthermore, by expanding document object models to include document semantics, implementations of semantic selectors can be facilitated.

XML Preprocessing and XSLT Processing Models

There has been interest in dynamic or parameterizable XSLT imports and includes.  XML preprocessing, with XML macros and XSLT-enhanced XML includes, facilitates such expressiveness.

For example:

  <schema xmlns="" />
  <transform xmlns="">
    <template match="...">
      <element name="include" namespace="...">
        <attribute name="href" namespace="...">
          <value-of select="..." />
        </attribute>
      </element>
    </template>
  </transform>

such that:

<xmlmacro href="file1.xslt">
  <xmlmacro href="file2.xslt">
    <xmlmacro href="file3.xslt" />
  </xmlmacro>
</xmlmacro>

describes, and through iterative processing expands into, the resulting structure.  The iterative processing of XML preprocessing thus facilitates dynamic or parameterizable XML includes, XSLT-enhanced XML includes, and XSLT includes.
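
The iterative character of such expansion can be sketched in Python as a loop of passes that runs until no macro remains. The expansion rule used here, renaming each innermost xmlmacro element to an include element, is only a stand-in for running the referenced transform, and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

def expand_once(root):
    """Expand every innermost xmlmacro; return how many were expanded."""
    innermost = [
        el for el in root.iter("xmlmacro")
        if not any(child is not el for child in el.iter("xmlmacro"))
    ]
    for el in innermost:
        el.tag = "include"  # stand-in for applying the transform named by @href
    return len(innermost)

def expand_all(root):
    """Repeat passes until no macro remains; return the number of passes."""
    passes = 0
    while expand_once(root):
        passes += 1
    return passes

root = ET.fromstring(
    '<xmlmacro href="file1.xslt">'
    '<xmlmacro href="file2.xslt">'
    '<xmlmacro href="file3.xslt"/>'
    '</xmlmacro>'
    '</xmlmacro>'
)
passes = expand_all(root)
print(passes, ET.tostring(root, encoding="unicode"))
```

Expanding innermost macros first means that each outer macro sees already-expanded content, which is one plausible reading of the nesting above.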

XSLT processing models are topical to XML preprocessing.  In addition to heuristics from other preprocessing models, advanced functionalities are possible from parallel processing, where each processing context runs as a concurrent thread, can access a document object model, including traversal between macros, includes, macro expansions and included content, and can exchange messages with other concurrent processing contexts.  Such concurrency facilitates advanced scenarios, e.g. layout or rendering engine logic and grammatical processing scenarios such as Grammatical Framework.

For those interested, the topics pertain to: preprocessing, rewriting systems, string rewriting systems, term rewriting systems, graph rewriting systems, Lindenmayer systems, parallel rewriting systems, process calculi and trace theory.

Also topical to macro expansion is outputting multiple subtrees, such that concurrent processing contexts can output @xref attributes referencing elements across subtrees:

    <!-- processing context output subtree 1 -->
    <!-- processing context output subtree 2 -->
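
As a rough sketch of two processing contexts exchanging a message so that one subtree can reference the other, the following Python uses two threads and a queue. The element names, the identifier "s1-item" and the function names are assumptions for the example; only the @xref idea comes from the text above.

```python
import queue
import threading
import xml.etree.ElementTree as ET

def context_one(ids, results):
    """Build subtree 1 and announce the id of its element."""
    subtree = ET.Element("subtree", {"n": "1"})
    ET.SubElement(subtree, "item", {"id": "s1-item"})
    ids.put("s1-item")  # message to the other processing context
    results[0] = subtree

def context_two(ids, results):
    """Build subtree 2 with an xref to the element announced by context one."""
    target = ids.get(timeout=5)  # block until the message arrives
    subtree = ET.Element("subtree", {"n": "2"})
    ET.SubElement(subtree, "item", {"xref": target})
    results[1] = subtree

def run():
    """Run both contexts concurrently and return their output subtrees."""
    ids = queue.Queue()
    results = [None, None]
    threads = [
        threading.Thread(target=context_two, args=(ids, results)),
        threading.Thread(target=context_one, args=(ids, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

subtrees = run()
for subtree in subtrees:
    print(ET.tostring(subtree, encoding="unicode"))
```

The queue makes the result independent of which thread happens to start first.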

Argumentation Scenarios and Use Cases: Computation

A use case category for models of argumentation, formats and ontology, is that of general-purpose computation. Pertinent topics include serializing and deserializing data structures to and from argumentation formats and the utility of such data structures for general-purpose computing. Accordingly, the expressiveness of claims should include that of lambda calculus and of abstract syntax trees.

function(…, IArgument** argument)

Topical are the Curry-Howard correspondence; program semantics, including axiomatic, denotational and operational semantics; mappings from the structures of programs, and of subroutines in programs, to argumentation for outputs from inputs; mappings of programs and subroutines to programs and subroutines which generate, in addition to outputs from inputs, argumentation for those outputs; metaknowledge; metareasoning; metalogic programming; and metaprogramming.  Some programming languages, such as logic programming and functional programming languages, include such expressiveness, and compilers for other programming languages could, for instance via program transformation, generate both function(…) and function(…, IArgument** argument).  Some scripting environments and runtime environments with suitable runtime reflection could, additionally, provide such functionalities.

In addition to mechanically generating argument structure based upon programming language structures or annotated structures, there is the programmatic, or manual, construction of resultant argument structure. The semantics of function calls could be topical to approaches as pertinent to the construction of resultant argument structure from the argument structures returned by subroutines.
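
A small Python sketch may help to picture a subroutine that produces argumentation alongside its output, and a caller that constructs its own argument from those returned by its subroutines. The Argument class and both functions are assumptions for illustration; where the signature above uses an out-parameter, the sketch returns a pair instead.

```python
from dataclasses import dataclass, field

@dataclass
class Argument:
    """A claim supported by premises, each a string or a nested Argument."""
    claim: str
    premises: list = field(default_factory=list)

def square(x):
    """Return x squared together with an argument for that output."""
    result = x * x
    return result, Argument(f"square({x}) = {result}", [f"{x} * {x} = {result}"])

def sum_of_squares(a, b):
    """Return a^2 + b^2 and an argument built from the subroutines' arguments."""
    sa, arg_a = square(a)
    sb, arg_b = square(b)
    total = sa + sb
    argument = Argument(
        f"sum_of_squares({a}, {b}) = {total}",
        [arg_a, arg_b, f"{sa} + {sb} = {total}"],
    )
    return total, argument

value, argument = sum_of_squares(3, 4)
print(value, argument.claim)
```

Here the resultant argument structure mirrors the call structure, which is the mechanical case; a manually constructed argument could select or reorder the premises.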

Argumentation Scenarios and Use Cases: Web and Television, Speeches, Presentations, Discussions and Debates

Speakers and presenters have made use of technologies, for instance wall displays in meeting rooms and video walls in auditoriums, and we can envision speeches and presentations designed for multiple devices with enhanced content for and enhanced interactivity for audience members with mobile devices.  Additionally topical are multiple devices with live and prerecorded speeches, presentations, discussions and debates.

Some use cases include:

  1. Navigating video presentations, e.g. a table of contents.
  2. Viewing both presenters and presentation slides.
  3. Hypertext-based, multimedia transcripts.
  4. Synchronized hypertext documents.
  5. 3D models, e.g. products and product features.
  6. Interactive infographics, e.g. business data.
  7. Files, documents or reports, videos or video clips, arriving at the start of presentations or presentation topics, indicated as content that the audience should already be familiar with.
  8. Files, documents or reports, videos or video clips, arriving at the end of presentations or presentation topics, indicated as for follow-up reading for interested audience members.
  9. Links to web content, files, documents and video, hypervideo, where there exist various hypervideo hyperlink navigation options.
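
The first use case, a table of contents into a video, reduces to finding which entry covers the current playback time. A minimal Python sketch follows; the entries, their start times in seconds and the function name are made up for the example.

```python
from bisect import bisect_right

# Hypothetical table of contents: (start time in seconds, title), sorted by start.
TOC = [
    (0, "Introduction"),
    (95, "Background"),
    (410, "Demonstration"),
    (900, "Questions"),
]

def entry_at(seconds, toc=TOC):
    """Return the title of the entry that covers the given playback time."""
    starts = [start for start, _ in toc]
    index = bisect_right(starts, seconds) - 1
    return toc[max(index, 0)][1]  # times before the first entry map to it

print(entry_at(500))
```

The same lookup could drive highlighting of the current section on a second device while the video plays on the first.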

In addition to the contexts of speeches and presentations, applications of technology to discussions and to debates are numerous.  Tablet computers can record and process audio, for example speech recognition, including from multimedia, while also having multi-touch and stylus input features, combinations of which can facilitate real-time note-taking, for instance flow diagramming, of arguments and argument structure during debates by participants as well as by audience members.

Argumentation technology can equip and empower both orators and audiences of live and prerecorded video of speeches, presentations, discussions and debates with tools to conveniently take notes, to access and analyze data, to interact in new ways, and to conveniently perform argument reconstruction, ascertaining arguments and argument structures.

The Argumentation Community Group can discuss the scenarios of multiple devices for speeches, presentations, discussions and debates in auditoriums, during teleconferencing, and with live and prerecorded video, and possibly towards a document, a repository of use cases.

Argumentation Scenarios and Use Cases: Web and Television

The scenarios for and use cases of argumentation formats, ontologies, and software, including argument mapping and visualization software, are numerous.  So too are the scenarios and use cases, as well as human-computer interaction topics, for linguistically or semantically annotated content, data-enhanced documents and multimedia.

As per linguistic and semantic annotation, our ontological models will be of use for describing natural occurrences of all forms of argumentation, argumentation from various corpora, and of use for the annotation of text, hypertext and multimedia content.

On the topic of multimedia content, I would like to compliment the report Repository of Existing Business-Level Use Cases for TVs In Tandem With Other Screens that Enrich Programs and Commercials via the Web and to note that the report has been useful for introducing these topics, including to colleagues at C-SPAN and PBS.

In addition to the topics, scenarios, and use cases in that report, I would like to broach for discussion a number of topics including: video formats and interactive 3D graphics, educational television, C-SPAN, PBS, debate television, news television, and audience comments and feedback topics.

On the topic of new combinations of video and interactive 3D graphics, educational television and science television have often made use of computer graphics (e.g. scientific visualization) and, as video is increasingly rendered on devices with graphics cards, new video formats can facilitate combinations of video and interactive 3D graphics.  We can explore and develop new video streaming and storage formats (e.g. MPEG; MPEG-4 includes 3D graphics capabilities) for use cases and scenarios where video players can, in addition to rendering 3D graphics, facilitate interactivity, including with speech recognition.  Such interactivity can enhance user experiences.

Numerous children’s television shows include, towards interactivity, pauses after characters ask questions or present quizzes.  With multiple device scenarios, with interactive 3D graphics and speech recognition features, interactive television shows (e.g. Blue’s Clues and Dora the Explorer) can be even more interactive and educational.

On the topic of C-SPAN, there are exciting possibilities with regard to C-SPAN content and websites.  Some conversation-starting ideas pertaining to C-SPAN content include: transcripts, transcript-based navigation, transcript analysis, summarization, summarization-based navigation (e.g. table of contents into video), topics and subtopics, entity extraction, hyperlinks into the C-SPAN video archive and to other websites, as well as speakers’ presentations, data items, infographics, or other content.

Topics include enhanced features for both live and recorded content, and for both streaming and stored content.  In addition to content from the capital and content from civics events, various panels and summits, C-SPAN content includes the shows: America & The Courts, American History TV, Book TV, First Ladies, Local Content Vehicles, Newsmakers, Prime Minister’s Questions, Q&A, The Communicators, and Washington Journal.

On the topic of PBS, a list of PBS shows is available at:

On the topic of debate television, scholars, scientists and technologists have been interested in video, the Web and technology, including multi-device scenarios.  In addition to some of the aforementioned C-SPAN ideas, research is underway into enhancing debate and debate video with computer technology.

On the topic of news television, numerous innovations are possible including ideas from Repository of Existing Business-Level Use Cases for TVs In Tandem With Other Screens that Enrich Programs and Commercials via the Web and those indicated herein.

On the topic of audience comments and feedback, users can share viewing experiences with friends and can share viewing experiences publicly.  That interactivity and interconnectivity is a tremendous step forward for video-based content, and such features can enhance countless user experiences.

The combinations of computing devices, of the Web, with video, with television, are exciting topics.  With any new transformative technologies, there are opportunities for broad discussion and for broad innovation.  I wanted to, again, compliment the report and to broach for discussion video formats and interactive 3D graphics, educational television, C-SPAN, PBS, debate television, news television, and audience comments and feedback topics.

Argumentation and Reification

An upcoming topic in argumentation ontological modeling is that argumentation claims can include reified RDF statements, or triples.  Our upcoming argumentation modeling endeavors include such expressiveness.  The set of RDF statements or triples includes the various relationships between evidence and argumentation structures, the various relationships between argumentation structures, the statements which comprise the structures of evidence and argumentation and the statements which comprise structured knowledge, in general.
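
For readers less familiar with reification, the following Python sketch shows how one triple can be described by four triples using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object), so that a claim can then refer to the statement itself. The example.org identifiers and the ex:asserts property are assumptions, not part of any argumentation ontology.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX = "http://example.org/argument#"  # hypothetical namespace

def reify(statement_id, triple):
    """Return the four triples that describe `triple` as an rdf:Statement."""
    subject, predicate, obj = triple
    return [
        (statement_id, RDF + "type", RDF + "Statement"),
        (statement_id, RDF + "subject", subject),
        (statement_id, RDF + "predicate", predicate),
        (statement_id, RDF + "object", obj),
    ]

# A statement of evidence, and a claim whose content is that statement.
evidence = (EX + "study1", EX + "supports", EX + "hypothesis1")
graph = reify(EX + "stmt1", evidence)
graph.append((EX + "claim1", EX + "asserts", EX + "stmt1"))

for triple in graph:
    print(triple)
```

Because the claim points at the reified statement rather than asserting the triple directly, the graph can record that something was claimed without committing to its truth.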