Chair: Eric Miller, W3C Semantic Web Activity Lead
David Wood and Tom Adams, Tucana Technologies, Inc.
The amount of Internet-accessible metadata is increasing rapidly. Much of this data is being published in the World Wide Web Consortium's Resource Description Framework (RDF) format. RDF metadata is published directly by Web logs ("blogs") and by news sites (in the form of site summaries), and it is the native format for metadata held in PDF documents. Similarly, the amount of enterprise metadata is increasing rapidly, and several large commercial and government organizations in Europe and the United States have formally adopted RDF as their standard format for metadata interchange.
This increase in real-world metadata requires metadata repositories that can scale to enterprise levels, distribute metadata across many machines, and provide different views of information based on security permissions. Modern enterprises also require features such as integration with existing systems and manageability via standard protocols.
The Tucana Knowledge Server (TKS) has been developed to fill this evolving market need. Acknowledging the problems that traditional relational database management systems (RDBMSs) have with storing large quantities of RDF data, TKS implements a native RDF database and consists of high-level APIs, a query engine and an underlying data store. TKS is implemented entirely in Java and is a scalable, distributed, secure, transaction-safe database built on a directed graph data model and optimized for searching metadata.
A single instance of TKS has been used to store 350 million RDF statements. Future work is focused on scaling TKS to billions of statements during 2004, retaining its position as the most scalable RDF data store available. Multiple TKS instances can also be combined and treated as a "virtual database", offering another path to scalability. Any TKS instance may serve as the entry point for such a "federated" query; it then queries any number of remote servers, collects their intermediate results and joins on them to produce a single, coordinated result. Large results are streamed to disk as necessary to avoid out-of-memory conditions, and are also transparently streamed to client applications.
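The federated join described above can be pictured with a small sketch. This is not TKS code (TKS exposes its own query language and Java APIs); it merely illustrates, in Python with rdflib, collecting intermediate bindings from two stores and joining them on a shared resource, using invented data and URIs.

```python
# Illustrative only: join intermediate results from two separate RDF stores on
# a shared resource, as a federated query coordinator might. All data and URIs
# are hypothetical; this is not how TKS itself is implemented.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")

store_a = Graph()                                   # stands in for remote server A
store_a.add((EX.doc1, EX.author, EX.alice))

store_b = Graph()                                   # stands in for remote server B
store_b.add((EX.alice, EX.hasName, Literal("Alice")))

# Collect intermediate results from each store ...
authored_by = [(s, o) for s, _, o in store_a.triples((None, EX.author, None))]
name_of = {s: o for s, _, o in store_b.triples((None, EX.hasName, None))}

# ... and join them on the shared author resource.
joined = [(doc, name_of[person]) for doc, person in authored_by if person in name_of]
print(joined)  # one (document, author name) pair
```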
TKS implements security using the Java Authentication and Authorization Service (JAAS), allowing security to be outsourced to standard enterprise security providers. JAAS security principals (typically users and groups) are then mapped to permissions in a "security model" internal to the database. Statements in the security model restrict read/write/create/delete permissions at the model level.
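The mapping of principals to model-level permissions can be sketched roughly as follows. The vocabulary, URIs and check function below are invented for illustration and are not the actual TKS security model; TKS itself obtains its principals through JAAS.

```python
# Rough sketch only: model-level permissions expressed as statements in a small
# "security model" graph, with principals mapped to read/write rights per model.
# Vocabulary and URIs are hypothetical.
from rdflib import Graph, Namespace

SEC = Namespace("http://example.org/security#")
MODELS = Namespace("http://example.org/models#")

security_model = Graph()
security_model.add((SEC.analysts, SEC.canRead, MODELS.customers))
security_model.add((SEC.admins, SEC.canWrite, MODELS.customers))

def permitted(principal, permission, model):
    """True if the security model grants the permission on the model."""
    return (principal, permission, model) in security_model

print(permitted(SEC.analysts, SEC.canRead, MODELS.customers))   # True
print(permitted(SEC.analysts, SEC.canWrite, MODELS.customers))  # False
```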
Commercial usage of the Tucana Knowledge Server will be discussed.
Ryan Lee, Stefano Mazzocchi, MIT
SIMILE [1] is a joint project of the W3C, HP, MIT Libraries, and the MIT Computer Science and Artificial Intelligence Laboratory. SIMILE seeks to enhance interoperability among digital assets, schemas, metadata, and services by leveraging and extending DSpace [2], enhancing its support for arbitrary schemas and metadata primarily through the application of RDF and Semantic Web techniques.
The SIMILE team has put together a prototype to demonstrate these ideas. It takes collections from Artstor [3] and OpenCourseWare [4], along with the Library of Congress Thesaurus for Graphic Materials [5], Library of Congress Authority records [6] (via a prototype service created by OCLC [7]), and the Wikipedia public-domain encyclopedia [8], and converts this data to RDF using the SKOS [9], vCard [10], Dublin Core [11] and IEEE LOM [12] schemas. It then automatically identifies equivalences in the data using Levenshtein distances [13], producing an OWL file that maps between the datasets. This OWL file is edited by hand to resolve ambiguous equivalences, and the data is finally loaded into a novel browser that combines faceted browsing with RDF relational browsing.
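The equivalence-identification step can be sketched along the following lines. This is not the SIMILE prototype's code: the resources, labels and distance threshold are invented, and only the general technique (Levenshtein distance over labels, emitting owl:sameAs candidates for later hand editing) is illustrated.

```python
# Illustrative sketch: flag candidate equivalences between labelled resources
# using Levenshtein distance and record them as owl:sameAs statements for later
# hand editing. Resources, labels and the threshold are hypothetical.
from rdflib import Graph, URIRef
from rdflib.namespace import OWL

def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

labels = {
    URIRef("http://example.org/collectionA/rembrandt"): "Rembrandt van Rijn",
    URIRef("http://example.org/collectionB/Rembrandt"): "Rembrandt Van Rijn",
}

mapping = Graph()
items = list(labels.items())
for i, (r1, l1) in enumerate(items):
    for r2, l2 in items[i + 1:]:
        if levenshtein(l1.lower(), l2.lower()) <= 2:   # threshold chosen arbitrarily
            mapping.add((r1, OWL.sameAs, r2))          # candidate for manual review

for s, p, o in mapping:
    print(s, p, o)
```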
We would like to demonstrate this prototype at the Dev Day session, outline the technology that has been used in the prototype, and also discuss further work that needs to be done before this approach can be scaled up to production systems that deal with the data volumes we would expect in real life library deployment.
[1] http://web.mit.edu/simile/www/
[2] http://www.dspace.org/
[3] http://www.artstor.org/
[4] http://ocw.mit.edu/
[5] http://www.loc.gov/rr/print/tgm1/
[6] http://authorities.loc.gov/
[7] http://www.oclc.org/research/projects/archive/alcme.htm
[8] http://www.wikipedia.org/
[9] http://www.w3.org/2004/02/skos/core
[10] http://www.w3.org/TR/2001/NOTE-vcard-rdf-20010222/
[11] http://dublincore.org/
[12] http://kmr.nada.kth.se/el/ims/metadata.html
[13] http://www.merriampark.com/ld.htm
Eero Hyvönen, Kim Viljanen, Samppa Saarela, Eetu Mäkelä, Mirva Salminen
Helsinki Institute for Information Technology (HIIT),
University of Helsinki
P.O. Box 26, 00014 UNIVERSITY OF HELSINKI, FINLAND
http://www.cs.helsinki.fi/group/seco/
firstname.lastname@cs.helsinki.fi
We present, from the developer's viewpoint, the portal MuseumFinland -- Finnish Museums on the Semantic Web (http://museosuomi.cs.helsinki.fi). The system is based on seven RDF(S) ontologies consisting of some 10,000 classes and individuals. The underlying knowledge base contains some 4,000 cultural artifacts from the collections of three museums that use heterogeneous museum database systems. In addition, data from a register of archaeological sites in Finland was incorporated into the system.
The goals of developing the system were: 1) to provide the public with a global view of the heterogeneous collections in Finland; 2) to provide the end-user with a content-based search engine for finding objects of interest and a semantic recommendation system for browsing the collections; and 3) to create for the museums a national channel for publishing content on the Semantic Web.
We show how the content from relational databases is converted via XML into RDF(S) for semantic interoperability. Two tools were created for content creation: Terminator for populating the terminological ontology and Annomobile for semi-automatic semantic annotation of database records.
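As a rough illustration of that conversion path (not Terminator or Annomobile themselves), a single database record exported as XML might be mapped to RDF as below; the element names, namespaces and target properties are assumptions made purely for the example.

```python
# Illustrative only: one museum database record, exported as XML, mapped to RDF
# triples. Element names, namespaces and properties are hypothetical.
import xml.etree.ElementTree as ET
from rdflib import Graph, Namespace, Literal, URIRef

record_xml = """<record id="4711">
  <name>bridal shirt</name>
  <material>linen</material>
  <museum>National Museum of Finland</museum>
</record>"""

MS = Namespace("http://example.org/museumfinland#")

record = ET.fromstring(record_xml)
artifact = URIRef("http://example.org/artifact/" + record.get("id"))

g = Graph()
g.add((artifact, MS.name, Literal(record.findtext("name"))))
g.add((artifact, MS.material, Literal(record.findtext("material"))))
g.add((artifact, MS.museum, Literal(record.findtext("museum"))))

print(g.serialize(format="xml"))
```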
The end-user services were implemented using a new Cocoon-based tool, OntoView. The system is based on two servers: 1) the Java- and Jena-based Ontogator for multi-facet search using ontologies, and 2) the Prolog-based Ontodella, which provides semantic dynamic links for browsing. The RDF/XML query results obtained from these servers are transformed into dynamic web pages using the XSLT pipeline architecture of Cocoon. This approach turned out to be powerful and flexible; for example, an additional interface for using MuseumFinland from a mobile telephone could be created easily.
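The final transformation step can be caricatured outside Cocoon as a single XSLT applied to an RDF/XML query result. The stylesheet and data below are invented and only hint at what the OntoView pipeline does.

```python
# Not the MuseumFinland Cocoon pipeline: a standalone sketch of transforming an
# RDF/XML result into an HTML fragment with XSLT. Data and URIs are hypothetical.
from lxml import etree

rdf_xml = b"""<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/artifact/42">
    <dc:title>Bronze brooch</dc:title>
  </rdf:Description>
</rdf:RDF>"""

xslt = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:template match="/rdf:RDF">
    <ul>
      <xsl:for-each select="rdf:Description">
        <li><a href="{@rdf:about}"><xsl:value-of select="dc:title"/></a></li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>"""

transform = etree.XSLT(etree.XML(xslt))
print(str(transform(etree.XML(rdf_xml))))
```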
Dirk-Willem van Gulik, asemantics.com
This paper describes a name resolution mechanism suitable for loosely coupled applications without a central authority, a scenario we expect for many RDF applications where clouds of federated information progressively merge into a bigger cloud. The mechanism is based on Internet standards and extends basic name resolution with functions suitable for managing resources and resource descriptions available in data clouds. The mechanism is used as a common platform in our RDF applications: it drives RDF gathering and supports the editing processes. It is presented using an existing production web management system in which Web resources belonging to different internal and external providers are described in RDF and integrated into a common application.
Paul Ford, Associate Web Editor, Harpers.org
The Harper's Magazine website, Harpers.org, was built using a hand-coded Semantic Web framework. In this presentation the site's programmer and co-editor, Paul Ford, describes how he made the case for the Semantic Web to Harper's editors, and how problems regarding editing, maintenance, advertising, and design were solved (or not solved).
The presentation will then shift to a demonstration and explanation of work in progress, including the migration of the site from static, pre-cached pages created from an in-memory triple cache to a dynamic, queryable site based on the open-sourced Radar Networks Triplestore. Interface issues, positive and negative reader feedback, and the complexities of managing semantically-tagged content will be discussed.
Peter Haase, Steffen Staab, Frank van Harmelen & SWAP-Team
The advantages of Peer-to-Peer architectures over centralized approaches have been well advertised, and to some extent realized in existing applications: no centralized server, robustness against failure of any single component, scalability both in data-volumes and number of connected parties.
However, the large degree of distribution of Peer-to-Peer systems is also the cause of a number of new problems: the lack of a single coherent schema for organizing information sources across the Peer-to-Peer network hampers the formulation of search queries, and answers to a single query often require the integration of information residing at different, independent and uncoordinated peers. Finally, query routing and network topology are significant problems.
The research community has recently turned to the use of semantics in Peer-to-Peer networks to alleviate these problems. In particular, the use of ontologies and of Semantic Web technologies in general has been identified as promising for Peer-to-Peer systems.
We present the Bibster system [1], an application of the use of semantics in Peer-to-Peer systems. Bibster is aimed at researchers who share bibliographic metadata. Many researchers in computer science keep lists of bibliographic metadata in BibTeX format that they must laboriously maintain by hand, that give them no easy overview, and whose quality varies greatly. At the same time, many researchers are willing to share these resources, provided they do not have to invest extra work in doing so.
Bibster exploits ontologies, notably the SWRC bibliography ontology [2] and a topic classification ontology [3], in importing data, formulating queries, routing queries, and presenting answers.
Bibster is fully implemented on top of the JXTA platform, and is about to be rolled out for field testing.
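To give a flavour of the import step, the sketch below turns one hard-coded BibTeX-style record into RDF using the SWRC schema [2]. The record is invented, and the class and property names are our reading of SWRC; they should be checked against the published schema.

```python
# Sketch only: map one BibTeX-style entry to RDF with SWRC-style terms.
# The entry, the namespace URI and the property names are assumptions.
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF

SWRC = Namespace("http://swrc.ontoware.org/ontology#")

entry = {"key": "smith2004example", "type": "InProceedings",
         "title": "An Example Paper", "year": "2004"}

pub = URIRef("http://example.org/publications/" + entry["key"])

g = Graph()
g.add((pub, RDF.type, SWRC[entry["type"]]))
g.add((pub, SWRC["title"], Literal(entry["title"])))
g.add((pub, SWRC["year"], Literal(entry["year"])))

print(g.serialize(format="xml"))
```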
[1] http://bibster.semanticweb.org/
[2] http://ontobroker.semanticweb.org/ontos/swrc.html
[3] http://daml.umbc.edu/ontologies/classification
Daniel Oberle, Steffen Staab, Rudi Studer, Raphael Volz
Ontologies serve various needs in the Semantic Web, such as storage and exchange of data corresponding to an ontology, ontology-based reasoning, and ontology-based navigation. When building a complex Semantic Web application, one cannot rely on a single software module to deliver all of these services. The developer of such a system would rather combine different - preferably existing - software modules (e.g. ontology editors and stores, inference engines, crawlers, etc.). So far, however, such integration of ontology-based modules has had to be done ad hoc, producing one-off solutions with little possibility of re-use or future extension of individual modules or of the overall system.
We present an infrastructure that facilitates plug'n'play engineering of ontology-based modules and, thus, the development and maintenance of comprehensive Semantic Web applications; we call this infrastructure the "Application Server for the Semantic Web" (ASSW) [1]. Existing Application Servers typically provide functionality such as connectivity and security, flexible handling of software modules, monitoring, and transaction processing. The Application Server for the Semantic Web will help put the Semantic Web into practice because it adopts and augments this idea for easier development of Semantic Web applications. In addition, semantic technology is used within the server itself, which allows us to achieve even greater functionality than existing Application Servers [2].
We introduce the requirements and design decisions leading to the conceptual architecture of an Application Server for the Semantic Web on slides, in order to give the audience a better overview. In addition, we will describe our implementation effort, called KAON SERVER, which is part of the KAON tool suite [3] and is currently work in progress. The KAON SERVER makes use of the Java Management Extensions (JMX) - an open technology and currently the state of the art for component management - and is developed in the context of WonderWeb [5], an EU IST-funded project whose aims include tight integration of existing tools such as ontology editors, stores and inference engines. A prototypical client interaction has been realized: the ontology editor OilEd [4] acts as a client and semantically queries the KAON SERVER for required inference engines and RDF stores. We will demonstrate this interaction, as well as the server's ontology, how it is affected by component deployment, the management console, and more.
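The OilEd interaction can be caricatured as a semantic lookup over RDF descriptions of deployed components. The sketch below is neither KAON SERVER nor JMX code; the registry vocabulary, component URIs and query are invented to convey the idea.

```python
# Caricature of a semantic component lookup: components registered with RDF
# descriptions, and a client asking for all deployed inference engines.
# Vocabulary and URIs are hypothetical; KAON SERVER itself builds on JMX in Java.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

REG = Namespace("http://example.org/registry#")

registry = Graph()
registry.add((REG.engine1, RDF.type, REG.InferenceEngine))
registry.add((REG.store1, RDF.type, REG.RDFStore))

results = registry.query(
    "SELECT ?c WHERE { ?c a <http://example.org/registry#InferenceEngine> }")
for row in results:
    print(row[0])   # the registered inference engine(s)
```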
Current versions of the KAON SERVER can be obtained from the KAON website [3], together with comprehensive documentation and a user's guide.
[1] Daniel Oberle, Steffen Staab, Rudi Studer, Raphael Volz. Supporting Application Development in the Semantic Web. ACM Transactions on Internet Technology (TOIT) 4(4), November 2004 (to appear).
[2] Daniel Oberle, Marta Sabou, D. Richards, Raphael Volz. An ontology for semantic middleware: extending DAML-S beyond web-services. In OTM 2003 Workshops, volume 2889 of LNCS, October 2003.
[3] http://kaon.semanticweb.org
[4] Sean Bechhofer, Ian Horrocks, Carole Goble, Robert Stevens. OilEd: a Reason-able Ontology Editor for the Semantic Web. Proceedings of KI2001, volume 2174 of LNAI. 2001.
[5] http://wonderweb.semanticweb.org
Jim Hendler, Aditya Kalyanpur, Daniel Krech, Evren Sirin
The Web Ontology Language, OWL, differs from traditional ontology languages in several important ways. While earlier languages have been used to develop tools and ontologies for specific user communities (particularly in the sciences and in company-specific e-commerce applications), they were not designed to be compatible with the architecture of the World Wide Web in general, or the Semantic Web in particular. As discussed in the OWL FAQ, OWL rectifies this by providing a language that uses the linking provided by RDF to add the following capabilities to ontologies:
Yet most of the tools built to date for OWL have derived from traditional ontology work, and don't yet meet the needs of OWL developers working in the Semantic Web environment. The MINDSWAP group has been building a toolkit of free and open-source tools that focus on the design and capabilities of OWL. In this demo we will show some of these tools including:
In addition, we will present a new release of RDFLib, a Python library for working with RDF, which includes an RDF/XML parser and serializer, a TripleStore, an InformationStore and various store backends.
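As a taste of the library, the sketch below parses a small RDF/XML document, adds a triple and serializes the result. It is written against the Graph API of current RDFLib releases, which differs in class names from the TripleStore/InformationStore API mentioned above; the data and URIs are invented.

```python
# Minimal RDFLib usage: parse RDF/XML, add a triple, serialize it back out.
# Written against the modern Graph API; data and URIs are hypothetical.
from rdflib import Graph, Namespace, Literal, URIRef

data = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>An example document</dc:title>
  </rdf:Description>
</rdf:RDF>"""

g = Graph()
g.parse(data=data, format="xml")          # RDF/XML parser

EX = Namespace("http://example.org/")
g.add((URIRef("http://example.org/doc"), EX.note, Literal("added locally")))

print(g.serialize(format="xml"))          # RDF/XML serializer
```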
Ralph Swick, W3C / MIT
A typical day in the life of a participant in a W3C activity requires the use of the Web, e-mail, IRC, and the telephone. One spin-off project of our Semantic Web Advanced Development work at MIT is to exploit the opportunities presented by combining these systems interactively and in real time. Zakim-bot and logger-bot (aka RRSAgent) are experiments in interactive meeting support tools that are both producers and consumers of data in the Semantic Web. In this presentation we will describe some of the current interfaces and some of the aspirations for new work.
Aleman-Meza, Amit Sheth, I. Budak Arpinar, Chris Halaschek and the SemDIS team
LSDIS Lab, Computer Science, UGA
The emergent Semantic Web community [SW] needs a common infrastructure for evaluating new techniques and software that use machine-processable data. Since ontologies are a centerpiece of most approaches, we believe that for evaluating and comparing tools for quality, scalability and performance, and for developing benchmarks for different classes of semantic technologies and applications, the Semantic Web community needs an open and freely available ontology with a large knowledge base (or description base) populated with real facts or data, reflecting the real-world heterogeneity of knowledge sources. If the tools are to be used for advanced semantic applications, such as those in business intelligence and national security, then instances in the knowledge base should be highly interconnected. Thus, we present and describe the Semantic Web Technology Evaluation Ontology (SWETO) test-bed [SWETO]. In particular, we address the requirements of a test-bed to support research in semantic analytics, as well as the steps in its development, including ontology creation, semi-automatic extraction, and entity disambiguation. SWETO has been developed as part of an NSF-funded project using Freedom [Semagix], a commercial product from Semagix based in part on earlier academic research [Sheth et al 2002], and is being made available openly for any non-commercial use.
Initially, SWETO was developed as a large-scale dataset for testing algorithms for the discovery of semantic associations. The schema component of the ontology reflects the types of entities and relationships available explicitly (and implicitly) in Web sources. Because Semagix Freedom was available to us, the selection of Web sources was narrowed to open, trusted sources whose metadata has a (semi-)structured layout suitable for extraction and crawling. Essentially, with the Freedom toolkit we created knowledge extractors by specifying regular expressions to extract entities from data sources. As the sources are 'scraped' and analyzed by the extractors, the extracted entities are stored in the appropriate classes of the ontology. Given that we extracted semantic metadata from a variety of heterogeneous data sources, including Web pages, XML feed documents, intranet data repositories, etc., entity disambiguation is a crucial step. Freedom's disambiguation techniques automatically resolved entity ambiguities in 99% of the cases, leaving less than 1% (about 200 cases) for human disambiguation.
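The extraction step can be caricatured as follows. This is not Freedom, whose extractors are configured within the toolkit; the regular expression, source text and ontology terms below are all invented.

```python
# Caricature of regular-expression-driven entity extraction feeding an ontology:
# pull entities out of a semi-structured page fragment and assert them as
# instances of a class. Pattern, text and vocabulary are hypothetical.
import re
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, RDFS

SWETO = Namespace("http://example.org/sweto#")

page = "Authors: Jane Smith; John Doe."
author_pattern = re.compile(r"Authors:\s*([^.]+)\.")

kb = Graph()
match = author_pattern.search(page)
if match:
    for name in (n.strip() for n in match.group(1).split(";")):
        entity = URIRef("http://example.org/sweto/person/" + name.replace(" ", "_"))
        kb.add((entity, RDF.type, SWETO.Researcher))
        kb.add((entity, RDFS.label, Literal(name)))

print(len(kb), "triples extracted")
```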
Because SWETO is intended for ontology benchmarking purposes, we continue to populate the ontology from diverse sources, thereby extending it into multiple domains. The current version is populated with well over 800,000 entities and over 1.5 million relationships, with the next, larger release due out soon. SWETO is accessible through browsing and XML serialization, and will soon be available through a Web service. SWETO has been used internally (at the LSDIS Lab) for the discovery and ranking of semantic associations. Externally, our collaborators at UMBC are exploring trust extensions for SWETO, while within industry, Semagix uses it for evaluating fast semantic metadata extraction and enhancement in Marianas SDK.
SWETO is an effort of the SemDIS team, with significant work using Freedom contributed by Gowtham Sannapareddy. It is partially funded by NSF-ITR-IDM Award #0325464 and NSF-ITR-IDM Award #0219649.
[Semagix] http://www.semagix.com
[SemDis] http://lsdis.cs.uga.edu/Projects/SemDis
[Sheth et al 2002] Sheth, A., Bertram, C., Avant, D., Hammond, B., Kochut, K., Warke, Y.: Managing Semantic Content for the Web. IEEE Internet Computing, 6(4), 80-87. (2002).
Dennis Quan, IBM
The early growth of the Web is due in great part to the ease with which one can "play" with Web technologies such as HTML and JavaScript. Our paper on Haystack to be presented during the conference, "How to Make a Semantic Web Browser", shows how our Haystack system can be used to easily browse RDF metadata. In this talk we extend this notion to discuss how to create RDF metadata and write scripts that manipulate RDF models using Haystack's Adenine programming language and the Eclipse-based development environment that is built into Haystack. We show some examples of how one can use this environment to quickly leverage existing RDF sources and prototype interactive, RDF-based applications.
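We cannot reproduce Adenine here, but the flavour of such a script (create a little RDF metadata, then rewrite part of the model) can be suggested in Python with rdflib; everything below is an invented stand-in for what an Adenine script would do inside Haystack.

```python
# Not Adenine: a Python/rdflib stand-in for the kind of script discussed above,
# which creates some RDF metadata and then rewrites part of the model.
# All URIs and properties are invented for illustration.
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import DC

EX = Namespace("http://example.org/haystack-demo#")

model = Graph()
doc = URIRef("http://example.org/docs/readme")
model.add((doc, DC.title, Literal("Read Me")))
model.add((doc, EX.status, Literal("draft")))

# A small "script": promote every draft resource to reviewed.
for subject in list(model.subjects(EX.status, Literal("draft"))):
    model.remove((subject, EX.status, Literal("draft")))
    model.add((subject, EX.status, Literal("reviewed")))

for s, p, o in model:
    print(s, p, o)
```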