Case Study: Enhancing Content Search Using the Semantic Web

Mike DiLascio, Siderean Software, and Justin Kestelyn, Oracle Corporation, USA

March 2007

General Description

The Oracle Technology Network (OTN) is the main source of technical information for the Oracle developer community. The Web site provides access to blogs, podcasts, discussion fora, product documentation, notifications of product releases, and software downloads. The dynamism, richness, and complexity of this information have made it difficult for traditional search techniques to provide access to data of interest.

Siderean Software and Oracle have worked to apply Semantic Web technologies to make data discovery and navigation more effective. The solution is based on the integration of Oracle Secure Enterprise Search and Siderean's Seamark Navigator. The Web site is available at: http://otnsemanticweb.oracle.com

This approach improves information access by aggregating many sources of content through a single portal. The portal offers a unified interface for searching and browsing the rich multimedia data, which helps users identify all information of interest with a single query. Users can also personalize the environment by selecting feed items of interest and by specifying a preferred layout for the information. To help users find the precise information they need, the system provides visualizations such as tag clouds, contributor clouds, and timelines. Figures 1 and 2 show screen snapshots of the enhanced OTN Web site.

To keep the Web site current, the application pulls content of interest from multiple RSS feeds every hour. Seamark enhances the metadata provided by each RSS feed by using a combination of pattern-matching techniques to identify the subject matter of an item. Once the subject has been established, terms from an Oracle proprietary taxonomy are used to record the concepts and entities to which the item refers. Because the metadata terms for concepts and entities are standardized, feeds constructed using this technology tend to be more targeted, with fewer inappropriately included items. Because metadata is associated in the same way for all media types, these feeds can deliver relevant content from diverse media platforms.
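The annotation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxonomy terms, the regular-expression patterns, and the names `FeedItem`, `annotate`, and `build_feed` are all hypothetical, and the sketch does not reflect how Seamark or the Oracle taxonomy is actually implemented.

```python
import re
from dataclasses import dataclass, field

# Hypothetical taxonomy: standardized term -> patterns that signal it.
# Illustrative only; this is not the Oracle proprietary taxonomy.
TAXONOMY_PATTERNS = {
    "Java": [r"\bjava\b", r"\bjdk\b"],
    "Database": [r"\bdatabase\b", r"\bsql\b"],
}


@dataclass
class FeedItem:
    """A simplified stand-in for one RSS item of any media type."""
    title: str
    summary: str
    media_type: str
    subjects: set = field(default_factory=set)


def annotate(item):
    """Attach every taxonomy term whose patterns match the item's text."""
    text = f"{item.title} {item.summary}"
    for term, patterns in TAXONOMY_PATTERNS.items():
        if any(re.search(p, text, flags=re.IGNORECASE) for p in patterns):
            item.subjects.add(term)
    return item


def build_feed(items, term):
    """Select items annotated with a term, regardless of media type."""
    return [item for item in items if term in item.subjects]


items = [
    FeedItem("Tuning SQL queries", "Tips for database developers", "blog"),
    FeedItem("JDK release notes", "What changed for Java developers", "podcast"),
    FeedItem("Conference recap", "Photos from the keynote", "blog"),
]
annotated = [annotate(item) for item in items]
java_feed = build_feed(annotated, "Java")
```

Because every item is tagged with the same standardized terms, a feed built from a term such as "Java" picks up blogs and podcasts alike and leaves out items that merely share unrelated wording.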

This Semantic Web-based approach provides the annotation required to support a rich discovery experience based on dynamic navigation of entities and their relationships. Since terms are taken from an ontology, it becomes possible to group content that relates to the same parent concepts. This grouping, in turn, makes it easier for a user to make sense of a large set of search results. It also makes it easier to narrow results to a precise topic of interest.
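A minimal sketch of this grouping and narrowing, assuming a small child-to-parent concept map, might look as follows. The hierarchy, the sample items, and the helper names `group_by_parent`, `facet_counts`, and `narrow` are hypothetical and chosen only for illustration; they are not taken from the OTN deployment.

```python
from collections import Counter, defaultdict

# Hypothetical concept hierarchy: child term -> parent concept.
PARENT = {
    "JDeveloper": "Development Tools",
    "SQL Developer": "Development Tools",
    "PL/SQL": "Database",
    "Oracle Database 10g": "Database",
}

# Hypothetical annotated search results.
items = [
    {"title": "Debugging in JDeveloper", "term": "JDeveloper", "type": "blog"},
    {"title": "SQL Developer tips", "term": "SQL Developer", "type": "podcast"},
    {"title": "PL/SQL collections", "term": "PL/SQL", "type": "blog"},
    {"title": "Upgrading to 10g", "term": "Oracle Database 10g", "type": "forum"},
]


def group_by_parent(results):
    """Group results under the parent concept of their taxonomy term."""
    groups = defaultdict(list)
    for item in results:
        groups[PARENT.get(item["term"], "Uncategorized")].append(item)
    return dict(groups)


def facet_counts(results, key):
    """Count results per facet value, as shown next to navigation links."""
    return Counter(item[key] for item in results)


def narrow(results, **filters):
    """Keep only results whose fields match every selected facet value."""
    return [r for r in results if all(r.get(k) == v for k, v in filters.items())]


groups = group_by_parent(items)
type_counts = facet_counts(items, "type")
database_blogs = narrow(groups["Database"], type="blog")
```

Grouping first and counting second gives the user a compact overview (two parent concepts rather than four separate terms), and each facet selection is then a simple filter over the remaining results.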

Integration and deployment of the solution were quick and straightforward because both Secure Enterprise Search and Seamark are available as Web Services. The OTN solution was deployed in approximately eight weeks. Future enhancements are planned to support additional data types, enable greater personalization, and extend visualization approaches.

[Image: screen snapshot of the Oracle OTN Web site]

Figure 1. This figure shows a screen snapshot of the Oracle OTN Web site. Portlets have been chosen by a user to support simultaneous navigation across many repositories. The panel on the left lists the facets that are available for navigation and indicates the number of items available for each link.

[Image: screen snapshot of the results of a search for blogs that include the term 'Java']

Figure 2. The Oracle OTN Web site supports many visualization capabilities along with Semantic Web search and navigation capabilities. This screen snapshot highlights the results of a search for blogs that include the term “Java” and that were written between December 11, 2006, and January 5, 2007. The tag cloud indicates that Brian Duff and Alejandro Vargas have written many of these blogs. The bar chart shows the number of blogs that were written on each day over the chosen time period. A list of the search results is highlighted near the bottom of the page. It is possible to view blogs according to dimensions such as time, author, and topic.

Key Benefits of Using Semantic Web Technology