Semantic-based Framework for Personalized TV Content Management in a cross-media environment: A Personalized Program Guide on the Web, TV and Mobile

Contact e-mail: P.A.E.Bellekens # tue.nl; l.m.aroyo # cs.vu.nl

Application

General purpose and services to the end user

The main goal of the application is to give the user personalized access to distributed TV content (e.g. on the Web, in TV broadcasts, in local storage collections and on mobile devices). The framework supports two main applications: a Web-based remote control and a home media server (e.g. PVR, set-top box, media center), which are used in a complementary way. An additional mobile application is also being considered. The user searches for and browses TV content, and sets personal preferences and interest levels for various topics. The application can recommend TV content according to the user's accumulated interests and current context (both stored in a user model). The results of searching, browsing and recommending are represented as dynamically generated TV-Anytime packages.

We integrate content from IMDB, Wikipedia, Web-based TV guides and Movie-list.com. The descriptions of these collections follow different metadata schemes and use values from different controlled vocabularies for content indexing.

Functionality examples

Browsing (“zapping”) TV content: The TV-Anytime packaging concept allows the user to “zap” through TV and related content beyond traditional channels, programs and providers. The content is organized in packages according to topics of the user’s interest (e.g. Billy Wilder). Within one package the user can navigate through different types of content (e.g. TV programs, pictures, audio and video files) coming from the Web, broadcast or local storage. For example, the package on Billy Wilder might contain movies directed by him, posters from his movies and bio notes from Wikipedia. A package also contains content about related topics (e.g. Ernst Lubitsch, Greta Garbo, Dean Martin, comedy, 1920-1960 movies, black&white).
Searching by textual queries: The user can query for content directly (e.g. “news London”, “golf 1940”, “documentary Alfred Hitchcock”, “movies tonight”). The query is matched against the different fields describing the content (e.g. title, creator, year, genre, language, location, …). The search results are presented as TV-Anytime packages organized according to concepts from common vocabularies (e.g. TV-Anytime CS, Time Ontology, Geo Ontology, WordNet).
Linking common vocabularies: Linking thesauri such as WordNet, TVA-CS, GeoNet and the Time ontology allows us to refine the search query (increasing the recall and precision of the results) and to find relevant content.
- WordNet - find synonyms and alternative forms of query terms, e.g. “weather” = {“weather report”, “weather forecast”, etc.}.
- Geo Ontology - find related geographical areas, e.g. “London” = {City of London, Camden, Westminster, Greenwich, Greater London, England, UK}.
- Time ontology - determine the temporal context, e.g. “tonight” = {18:00 – 24:00}, or “this week” = {10/02 – 17/02}.
- TVA-CS - find related genres, e.g. “sports” = {sport reports, sport live, sport news, sport documentary, football game, etc.}.
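
The query refinement described above can be sketched roughly as follows. This is only an illustration: the dictionaries are hypothetical in-memory stand-ins for the vocabularies (filled with the example values given above), and the function name `expand_query` is an assumption, not part of the application.

```python
from datetime import time

# Hypothetical stand-ins for the linked vocabularies; the values are the
# examples from the text, not real vocabulary content.
SYNONYMS = {"weather": ["weather report", "weather forecast"]}
GEO_RELATED = {"London": ["City of London", "Camden", "Westminster",
                          "Greenwich", "Greater London", "England", "UK"]}
GENRES = {"sports": ["sport reports", "sport live", "sport news",
                     "sport documentary", "football game"]}
# datetime.time cannot express 24:00, so the interval ends at 23:59:59.
TIME_INTERVALS = {"tonight": (time(18, 0), time(23, 59, 59))}


def expand_query(query):
    """Split a free-text query into expanded keyword lists and time constraints."""
    keywords = {}
    intervals = {}
    for term in query.split():
        if term in TIME_INTERVALS:
            # Temporal terms become constraints instead of keywords.
            intervals[term] = TIME_INTERVALS[term]
            continue
        related = []
        for vocabulary in (SYNONYMS, GEO_RELATED, GENRES):
            related.extend(vocabulary.get(term, []))
        # The original term always stays first in its expansion.
        keywords[term] = [term] + related
    return keywords, intervals


keywords, intervals = expand_query("news London tonight")
```

In this sketch a term that no vocabulary knows (here “news”) is passed through unchanged, so the expansion never narrows the original query.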

Application architecture

(See figures on original description, RucPersonalizedTv)

A web server handles user queries from a web browser and contains an interface generator for web pages. A back-end server gets the content related to the user request and responds with a content package created on the fly. A User Modeling Server keeps track of all user profiles and the contexts they are used in. An Ontology Server maintains the mappings to the vocabularies. A Metadata service maintains content metadata according to the TV-Anytime specification. The CRID Authority maintains unique identifiers of content elements.

Within this overall architecture, a semantic-based framework has been defined for personalized access to TV content. It consists of four layers:
- Content Retrieval & Serving layer - retrieves, transforms and integrates content from distributed sources and in various formats.
- Package Handling layer - clusters content into packages and resolves them via the CRID Authority.
- Personalization layer - refines user queries and filters content according to user profiles, context and stereotypes.
- Application Server layer - identifies users and devices.

Special strategies involved in the processing of user actions

(See figures on original description, RucPersonalizedTv)

The hierarchies of TV-Anytime, GeoNet and Time, as well as the WordNet synsets, are used to refine user queries automatically. Once the result is presented to the user, the related subsets of those hierarchies are shown as well. The user can then browse the terms in those hierarchies (see Fig. 3) and select the concepts of interest, so that only the relevant content remains (e.g. from sports content, only the items on swimming in news reports after 18:00).
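
A minimal sketch of this kind of hierarchy-based filtering is given below. The genre links, the content items and the names `is_within` and `filter_items` are all hypothetical and serve only to illustrate the idea; they are not taken from the application.

```python
from datetime import time

# Hypothetical child -> parent links for a tiny genre hierarchy.
PARENT = {"swimming": "sports", "football": "sports", "sports": None}


def is_within(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def filter_items(items, genre, not_before):
    """Keep items under `genre` that start at or after `not_before`."""
    return [item for item in items
            if is_within(item["genre"], genre) and item["start"] >= not_before]


# Invented sample items, for illustration only.
ITEMS = [
    {"title": "Swimming finals report", "genre": "swimming", "start": time(19, 30)},
    {"title": "Morning swim magazine", "genre": "swimming", "start": time(9, 0)},
    {"title": "Football highlights", "genre": "football", "start": time(20, 0)},
]

evening_swimming = filter_items(ITEMS, "swimming", time(18, 0))
evening_sports = filter_items(ITEMS, "sports", time(18, 0))
```

Choosing a broader concept (“sports”) keeps everything below it, while choosing a narrower one (“swimming”) filters the same result set further.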

Integration between vocabulary-linked functions and other application functions

(See figures on original description, RucPersonalizedTv)

There is a Google-like free-text search interface: for each user query, the system adds matches, proposals and alternatives from the vocabularies, the current context and the user models. If the user searches with an empty text input field, the system interprets this as a request for a recommendation and tries to find content that fits the user’s current context.
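
The switch between searching and recommending could look roughly like the sketch below. The function `handle_request` and the two demo callbacks are hypothetical; the real system's interfaces are not described in this document.

```python
def handle_request(text, context, search, recommend):
    """Route a request: blank input asks for a recommendation, anything else is a search."""
    query = text.strip()
    if not query:
        return recommend(context)
    return search(query, context)


# Hypothetical callbacks that only report which path was taken.
def demo_search(query, context):
    return ("search", query)


def demo_recommend(context):
    return ("recommend", context["time_of_day"])


context = {"time_of_day": "evening"}
blank_result = handle_request("   ", context, demo_search, demo_recommend)
query_result = handle_request("golf 1940", context, demo_search, demo_recommend)
```

Treating whitespace-only input as empty is a design choice of this sketch, not something the description above specifies.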

Vocabulary 1

Title

TV-Anytime ContentCS

General characteristics (size, coverage) of the vocabulary

The TV-Anytime ContentCS contains a hierarchy of television genres. The RDF/OWL version developed for the project is a translation of an XML term hierarchy and contains only ‘subClassOf’ relations. It currently contains 685 classes.

The main categories of the TVA hierarchy are:

3.1 NON-Fiction
        3.1.1 News
                3.1.1.9 Sports
        ...
3.2 Sports
        3.2.1 Athletics
        3.2.2 Cycling
        ...
3.3 (deprecated, no longer exists)
3.4 Fiction
3.5 Amusement
3.6 Music
3.7 Interactive Games
3.8 Leisure/Hobby/Lifestyle
3.9 Adult

Language(s) in which the vocabulary is provided

English.

Vocabulary extract

<owl:Class rdf:ID="3.1.1">
      <rdfs:subClassOf rdf:resource="3.1"/>
      <rdfs:label xml:lang="en">News</rdfs:label>
      <rdfs:comment xml:lang="en">Time-sensitive information</rdfs:comment>
</owl:Class>
<owl:Class rdf:ID="3.1.1.1">
      <rdfs:subClassOf>
         <owl:Class rdf:about="3.1.1"/>
      </rdfs:subClassOf>
      <rdfs:label xml:lang="en">Daily news</rdfs:label>
</owl:Class>
<owl:Class rdf:ID="3.1.1.2">
      <rdfs:subClassOf rdf:resource="3.1.1"/>
      <rdfs:label xml:lang="en">Special news/edition</rdfs:label>
      <rdfs:comment xml:lang="en">one off program to carry specific news events e.g. coverage of breaking news such as train crash</rdfs:comment>
</owl:Class>

This extract represents:

3.1.1 News
        3.1.1.1 Daily news
        3.1.1.2 Special news/edition
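
As an illustration of how such an extract maps to the term hierarchy, the sketch below reads it with Python's standard `xml.etree.ElementTree`. The `rdf:RDF` wrapper and the namespace declarations are added here as assumptions to make the fragment parseable, the comments are left out for brevity, and the function name `read_hierarchy` is hypothetical. Both ways of writing the parent (an `rdf:resource` attribute or a nested `owl:Class`) are handled.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# The extract, wrapped in an rdf:RDF element (the wrapper is an assumption).
DOCUMENT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
<owl:Class rdf:ID="3.1.1">
  <rdfs:subClassOf rdf:resource="3.1"/>
  <rdfs:label xml:lang="en">News</rdfs:label>
</owl:Class>
<owl:Class rdf:ID="3.1.1.1">
  <rdfs:subClassOf>
    <owl:Class rdf:about="3.1.1"/>
  </rdfs:subClassOf>
  <rdfs:label xml:lang="en">Daily news</rdfs:label>
</owl:Class>
<owl:Class rdf:ID="3.1.1.2">
  <rdfs:subClassOf rdf:resource="3.1.1"/>
  <rdfs:label xml:lang="en">Special news/edition</rdfs:label>
</owl:Class>
</rdf:RDF>"""


def read_hierarchy(document):
    """Return (labels, parents) dictionaries keyed by term identifier."""
    root = ET.fromstring(document)
    labels, parents = {}, {}
    # findall() only visits top-level classes, not the nested parent references.
    for cls in root.findall(f"{{{OWL}}}Class"):
        term_id = cls.get(f"{{{RDF}}}ID")
        labels[term_id] = cls.findtext(f"{{{RDFS}}}label")
        sub = cls.find(f"{{{RDFS}}}subClassOf")
        if sub is None:
            continue
        parent = sub.get(f"{{{RDF}}}resource")
        if parent is None:
            nested = sub.find(f"{{{OWL}}}Class")
            parent = nested.get(f"{{{RDF}}}about") if nested is not None else None
        if parent:
            # Tolerate stray whitespace or a leading '#' in the reference.
            parents[term_id] = parent.strip().lstrip("#")
    return labels, parents


labels, parents = read_hierarchy(DOCUMENT)
children_of_news = sorted(t for t, p in parents.items() if p == "3.1.1")
```

This treats the identifiers as plain strings and does not resolve them against a base URI, which a full RDF parser would do.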

Structure explanation

The original TVA hierarchy contains only subClassOf relations. OWL and SKOS constructs could be used to introduce links between related concepts and to express the semantics of the genres better:

- “Sports” (3.1.1.9), meaning ‘sport news’, should ideally be related to the category “Sports” (3.2). This could possibly be realized with SKOS:related or SKOS:subject / SKOS:primarySubject.
- To improve textual matching, names containing ‘/’ or ‘()’, such as ‘leisure/hobby’ or ‘Football (Soccer)’, could be split into a main label and alternative labels: ‘leisure’ with the alternative label ‘hobby’, and ‘Football’ with the alternative label ‘soccer’.
- Concepts such as “Football (Soccer)”, “Football (Indoor)”, “Street soccer” and “Beach soccer”, which are currently at the same level in the hierarchy, could be reorganized with the help of, for instance, SKOS:narrower and SKOS:broader.
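
The label splitting proposed above could be sketched as follows. The function name `split_label` is hypothetical, and mapping the results to preferred and alternative labels is an assumption of this sketch.

```python
import re


def split_label(label):
    """Return (main label, alternative labels) for a genre name."""
    alternatives = []
    # A trailing parenthesised part, as in 'Football (Soccer)', becomes an alternative.
    match = re.fullmatch(r"(.*?)\s*\((.*)\)", label)
    if match:
        label = match.group(1)
        alternatives.append(match.group(2))
    # Slash-separated parts: the first is the main label, the rest are alternatives.
    parts = [part.strip() for part in label.split("/") if part.strip()]
    if not parts:
        return label, alternatives
    return parts[0], parts[1:] + alternatives


football = split_label("Football (Soccer)")
leisure = split_label("Leisure/Hobby/Lifestyle")
news = split_label("News")
```

A purely syntactic split like this needs manual review: the qualifier in a name such as “Football (Indoor)” is not a synonym, so it should not end up as an alternative label.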

Machine-readable representation of the vocabulary

http://wwwis.win.tue.nl/~ppartout/Blu-IS/Ontologies/TV-Anytime/PhaseI/Classifications/ContentCS.owl

Structure of the database used to currently manage the vocabulary

The vocabulary is stored in a Sesame RDF triple database.

Standards and guidelines considered during the design and construction of the vocabulary

Guideline 1 in http://www.cs.vu.nl/~mark/papers/Assem04a.pdf (converting thesauri to RDF/OWL ontologies) was followed.

Management of changes

Changes are made through direct editing. We do not deviate from the original TV-Anytime specification and stay backward compatible with it.

Vocabulary 2

Title

Time ontology, based on http://www.isi.edu/~pan/damltime/time.owl

General characteristics (size, coverage) of the vocabulary

The original Time ontology provided the means to construct time-specifications. We extended it with the instances needed for our application.

Vocabulary extract

An instance that exemplifies the concept “morning” (some datatype URIs are abbreviated to keep the layout readable):

<ProperInterval rdf:ID="morning">
        <rdfs:label>morning</rdfs:label>
        <begins rdf:resource="#instant_MorningStart"/>
        <ends rdf:resource="#instant_MorningEnd"/>
        <before rdf:resource="#noon"/>
</ProperInterval>

<Instant rdf:ID="instant_MorningStart">
        <inCalendarClock>
                <CalendarClockDescription rdf:ID="cCD_MorningStart">
                        <hour rdf:datatype="XMLSchema#nonNegInt">06</hour>
                        <minute rdf:datatype="XMLSchema#nonNegInt">00</minute>
                        <second rdf:datatype="XMLSchema#decimal">00</second>
                </CalendarClockDescription>
        </inCalendarClock>
</Instant>

<Instant rdf:ID="instant_MorningEnd">
        <inCalendarClock>
                <CalendarClockDescription rdf:ID="cCD_MorningEnd">
                        <hour rdf:datatype="XMLSchema#nonNegInt">10</hour>
                        <minute rdf:datatype="XMLSchema#nonNegInt">59</minute>
                        <second rdf:datatype="XMLSchema#decimal">59</second>
                </CalendarClockDescription>
        </inCalendarClock>
</Instant>
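
To show how such an instance can be used, the sketch below checks whether a given time of day falls within the “morning” interval. The boundaries 06:00:00 and 10:59:59 come from the extract above; the dictionary `NAMED_INTERVALS` and the function `in_interval` are hypothetical names, and treating both boundaries as inclusive is an assumption.

```python
from datetime import time

# Boundaries taken from the 'morning' instance: 06:00:00 to 10:59:59.
NAMED_INTERVALS = {"morning": (time(6, 0, 0), time(10, 59, 59))}


def in_interval(name, moment):
    """True if `moment` lies within the named interval (boundaries inclusive)."""
    begins, ends = NAMED_INTERVALS[name]
    return begins <= moment <= ends


at_half_past_seven = in_interval("morning", time(7, 30))
at_eleven = in_interval("morning", time(11, 0))
```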

Structure explanation

For the structure of the Time ontology refer to: http://www.cs.rochester.edu/~ferguson/daml/

Machine-readable representation of the vocabulary

http://wwwis.win.tue.nl/~ppartout/Blu-IS/Ontologies/Time/time.owl

Structure of the database used to currently manage the vocabulary

The vocabulary is stored in a Sesame RDF triple database.

Management of changes

Changes are made through direct editing.

Vocabulary 3

Title

Geo ontology

General characteristics (size, coverage) of the vocabulary

The Geo ontology has a simple structure in which geographical locations are organized by country, province and city. The ontology is far from complete: it contains data for the Netherlands, Denmark, the United Kingdom, Belgium, Spain, Greece and France. For each country all provinces are present, but for each province only the main cities have been added.

Vocabulary extract

<Country rdf:ID="TheNetherlands">
    <hasProvince>
      <Province rdf:ID="ZeelandProvince">
        <hasCity rdf:resource="#Sluis"/>
        <rdfs:label rdf:datatype="XMLSchema#string">Zeeland</rdfs:label>
      </Province>
    </hasProvince>
</Country>
<City rdf:ID="Sluis"/>
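
The country-province-city structure makes it straightforward to widen or narrow a location. The sketch below encodes the links of the extract above in a plain dictionary; the names `CONTAINED_IN`, `containing_regions` and `places_within` are hypothetical and the data is limited to this one extract.

```python
# Links read off the extract: Sluis is a city in ZeelandProvince,
# which is a province of TheNetherlands.
CONTAINED_IN = {"Sluis": "ZeelandProvince", "ZeelandProvince": "TheNetherlands"}
LABELS = {"ZeelandProvince": "Zeeland"}


def containing_regions(place):
    """List the regions that contain `place`, from nearest to widest."""
    regions = []
    while place in CONTAINED_IN:
        place = CONTAINED_IN[place]
        # Use the rdfs:label where the extract gives one, else the identifier.
        regions.append(LABELS.get(place, place))
    return regions


def places_within(region):
    """List every place contained in `region`, directly or indirectly."""
    found = []
    for child, parent in CONTAINED_IN.items():
        if parent == region:
            found.append(child)
            found.extend(places_within(child))
    return found


around_sluis = containing_regions("Sluis")
inside_netherlands = places_within("TheNetherlands")
```

Widening a location in this way corresponds to the “London” example given earlier, where a city is related to the larger areas it belongs to.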

Machine-readable representation of the vocabulary

http://wwwis.win.tue.nl/~ppartout/Blu-IS/Ontologies/Geo/Europe.owl

Structure of the database used to currently manage the vocabulary

The vocabulary is stored in a Sesame RDF triple database.

Management of changes

Changes are made through Protégé.

Vocabulary 4

WordNet:

http://www.w3.org/2006/03/wn/wn20/rdf/wordnet-synset.rdf

http://www.w3.org/2006/03/wn/wn20/rdf/wordnet-glossary.rdf

http://www.w3.org/2006/03/wn/wn20/rdf/basic/wordnet-senselabels.rdf

Vocabulary mappings

Mapped vocabularies

The picture in the attached document (RucPersonalizedTv, section 3) shows all the links from a content item described in TVA to the other vocabularies: WordNet, Time, Geo and the content classification hierarchy. The properties whose domain is TVA:ContentItem have as their ranges the classes whose instances make up the vocabularies of the application. title and keywords have Wn:synset as their range, while productionDate and genre have time:Instant and TVA:ContentCS, respectively.

What the picture does not show is the interrelation between WordNet and all the other vocabularies at the value level. This makes it possible, for example, to look up every location in the Geo ontology and every genre in the ContentCS hierarchy in WordNet to find synonyms.
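
The two kinds of mapping can be sketched as follows. The property ranges are the ones named above; the synsets are toy data invented for this illustration, not copied from WordNet, and the names `PROPERTY_RANGES`, `TOY_SYNSETS` and `synonyms_for` are hypothetical.

```python
# Schema-level mapping: property of TVA:ContentItem -> range class.
PROPERTY_RANGES = {
    "title": "Wn:synset",
    "keywords": "Wn:synset",
    "productionDate": "time:Instant",
    "genre": "TVA:ContentCS",
}

# Toy synsets for illustration only; real ones would come from WordNet.
TOY_SYNSETS = [
    {"UK", "United Kingdom", "Great Britain"},
    {"news", "news program", "newscast"},
]


def synonyms_for(label):
    """Value-level lookup: synonyms of a Geo or genre label, ignoring case."""
    wanted = label.lower()
    result = set()
    for synset in TOY_SYNSETS:
        if wanted in {word.lower() for word in synset}:
            result |= {word for word in synset if word.lower() != wanted}
    return sorted(result)


uk_synonyms = synonyms_for("UK")
unknown_synonyms = synonyms_for("Sluis")
```

A label without a matching synset simply yields no synonyms, so the lookup can be applied to every location and genre without special cases.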