Squiggle: an application framework for model-driven development of real-world Semantic Search Engines
Contact e-mail: irene.celino # cefriel.it, emanuele.dellavalle # cefriel.it, dario.cerizza # cefriel.it, andrea.turati # cefriel.it
Application
General purpose and services to the end user
Squiggle is a framework to support the development of domain-specific search engines that exploit the semantics of domain ontologies to improve the search functionalities. It supports both the conceptual indexing phase and the semantic search (the runtime interaction).
A more detailed description can be found at http://squiggle.cefriel.it, where you can also find the links to some running search engines: Squiggle Music to find songs and artists (at http://squiggle.cefriel.it/music) and Squiggle Ski to find images of alpine skiers (at http://squiggle.cefriel.it/ski).
Functionality examples
- Syntactic search: when the user submits a textual query, Squiggle performs a traditional syntactic search over the index
- Semantic interpretation of the user query: in parallel with the syntactic search, Squiggle analyzes the user query to identify possible meanings of the request. It does so by "comparing" the content of the query with the labels of the concepts contained in the domain ontology. When Squiggle finds a matching concept, it displays that concept's preferred label in a lateral box (a "Did you mean...?" disambiguation box), in the language of the user, by exploiting the support for xml:lang in RDF. The user can then disambiguate among the identified meanings of the query by selecting the one(s) that fit the original request.
- Semantic Search: once the user has disambiguated among the meanings suggested in the previous step, Squiggle performs a semantic search over its indexes and returns all the results that were indexed against the selected concept(s) during the conceptual indexing phase, regardless of the syntactic variants used in their textual annotations.
- Semantic Suggestions: after the disambiguation, the meaning of the query is specified in terms of concepts from the domain ontology; Squiggle can therefore suggest other "meanings" to the user, to expand the search to related content. For example, Squiggle natively exploits some SKOS primitives (e.g., skos:related, skos:broader, skos:narrower, skos:relatedPartOf) and can be configured to exploit domain-specific relations.
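The language-aware display in the "Did you mean...?" box can be sketched as follows. This is only an illustration, not Squiggle's code: the concept identifier, the in-memory table and the assumption that a concept carries one preferred label per language are all hypothetical, as is the fallback to English.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for RDF literals carrying xml:lang tags.
# Each entry is (language tag, preferred label); the data is illustrative only.
PREF_LABELS: Dict[str, List[Tuple[str, str]]] = {
    "ski:GiantSlalom": [
        ("en", "Giant Slalom"),
        ("it", "Slalom Gigante"),
        ("de", "Riesenslalom"),
    ],
}


def display_label(concept: str, user_lang: str,
                  fallback_lang: str = "en") -> Optional[str]:
    """Pick the label to show in the disambiguation box.

    Preference order: the user's language, then the fallback language,
    then whatever label is listed first. Returns None for unknown concepts.
    """
    labels = PREF_LABELS.get(concept, [])
    by_lang = {lang: text for lang, text in labels}
    if user_lang in by_lang:
        return by_lang[user_lang]
    if fallback_lang in by_lang:
        return by_lang[fallback_lang]
    return labels[0][1] if labels else None
```

A German-speaking user would see "Riesenslalom" for this concept, while a user whose language has no label would get the fallback one.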
Application architecture
Squiggle is composed of two main parts: the Conceptual Indexer, which takes as input the contents to be indexed and the domain ontology and produces as output a set of indexes, and the Semantic Searcher, which queries the indexes to return matching results and semantic suggestions in response to the user. There are two kinds of indexes: the syntactic ones, which are queried for textual matching (based on Apache Lucene), and the semantic ones, which are queried for ontological matching (based on Sesame). In the running systems, all the components are deployed on a single server; however, there is no technical reason that prevents the distribution of some of the components.
As with any other search engine, the data to be indexed are distributed over the network; the ontologies used to annotate and describe those data can be distributed as well.
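The split between the two parts and the two kinds of indexes can be illustrated with a toy, in-memory sketch. It is not based on Lucene or Sesame: the class names mirror the component names above, but the dictionaries, the substring-based annotation step and the concept identifier are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Set


class ConceptualIndexer:
    """Builds a syntactic index (token -> documents) and a semantic
    index (concept -> documents) from textual annotations."""

    def __init__(self, label_to_concept: Dict[str, str]) -> None:
        # Labels are lower-cased so that matching is case-insensitive.
        self.label_to_concept = {k.lower(): v for k, v in label_to_concept.items()}
        self.syntactic: Dict[str, Set[str]] = defaultdict(set)
        self.semantic: Dict[str, Set[str]] = defaultdict(set)

    def index(self, doc_id: str, annotation: str) -> None:
        text = annotation.lower()
        for token in text.split():
            self.syntactic[token].add(doc_id)
        # Naive substring check; a real indexer would match more carefully.
        for label, concept in self.label_to_concept.items():
            if label in text:
                self.semantic[concept].add(doc_id)


class SemanticSearcher:
    """Queries the indexes produced by the ConceptualIndexer."""

    def __init__(self, indexer: ConceptualIndexer) -> None:
        self.indexer = indexer

    def syntactic_search(self, query: str) -> Set[str]:
        tokens = query.lower().split()
        if not tokens:
            return set()
        hits = [self.indexer.syntactic.get(t, set()) for t in tokens]
        return set.intersection(*hits)  # documents containing every token

    def semantic_search(self, concept: str) -> Set[str]:
        return set(self.indexer.semantic.get(concept, set()))


# Illustrative data: two annotations that name the same artist differently.
indexer = ConceptualIndexer({
    "Red Hot Chili Peppers": "m:artist-rhcp",
    "RHCP": "m:artist-rhcp",
})
indexer.index("doc1", "Red Hot Chili Peppers - Californication")
indexer.index("doc2", "RHCP live in Milan")
searcher = SemanticSearcher(indexer)
```

With this data, the textual query "rhcp" reaches only doc2, while a semantic search on the concept returns both documents, whichever syntactic variant their annotations use.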
Special strategies involved in the processing of user actions
As explained above, the vocabulary is used in the semantic interpretation of the query (by accessing all the labels in the knowledge base) and in the semantic suggestions (by following some of the relationships between the concepts).
Integration between vocabulary-linked functions and other application functions
The end user is provided with a familiar interface, i.e. a textual search engine; the semantically enriched functionalities stay hidden "behind the scenes" and are employed to give the user the added value of semantic search.
During the semantic interpretation of the query, Squiggle compares it against the preferred labels (skos:prefLabel), the alternative labels (skos:altLabel) and the misspelled labels (skos:hiddenLabel).
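The comparison against the three kinds of labels could look like the sketch below. How Squiggle actually "compares" a query with the labels is not detailed here, so the exact, case-insensitive match is an assumption, and the concept identifier and data structure are hypothetical. The point of the sketch is that a hidden label can trigger a match but is never shown: the box always displays the preferred label.

```python
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    concept: str
    display: str     # always the preferred label, never a hidden one
    matched_on: str  # which kind of label triggered the match


# Illustrative data taken from the Beatles example in this document.
CONCEPTS: Dict[str, Dict[str, List[str]]] = {
    "m:Beatles": {
        "prefLabel": ["The Beatles"],
        "altLabel": ["Fab four"],
        "hiddenLabel": ["Beetles"],
    },
}

LABEL_KINDS = ("prefLabel", "altLabel", "hiddenLabel")


def interpret(query: str) -> List[Match]:
    """Return the concepts whose labels equal the query (ignoring case)."""
    wanted = query.strip().lower()
    matches: List[Match] = []
    for concept, labels in CONCEPTS.items():
        for kind in LABEL_KINDS:
            if any(wanted == label.lower() for label in labels.get(kind, [])):
                matches.append(Match(concept, labels["prefLabel"][0], kind))
                break  # one match per concept is enough
    return matches
```

A misspelled query such as "beetles" is thus resolved to the concept and presented to the user as "The Beatles".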
For semantic suggestions, Squiggle exploits the relationships between the identified concepts and other concepts within the ontology, so as to propose query expansion.
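A minimal sketch of how suggestions could be collected by following relations from a selected concept is given below. The triples are illustrative, the function is hypothetical, and treating skos:related as symmetric follows the SKOS definition of that property rather than anything stated about Squiggle's implementation.

```python
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Illustrative triples (subject, property, object).
TRIPLES: List[Triple] = [
    ("m:Rock", "skos:narrower", "m:HeavyMetal"),
    ("m:HeavyMetal", "skos:narrower", "m:DeathMetal"),
    ("m:Queen", "skos:related", "m:PeterGabriel"),
]

DEFAULT_RELATIONS = ("skos:broader", "skos:narrower", "skos:related")


def suggest(concept: str, triples: Sequence[Triple],
            relations: Sequence[str] = DEFAULT_RELATIONS) -> Dict[str, List[str]]:
    """Group the concepts reachable in one step by the relation followed."""
    out: Dict[str, List[str]] = {}
    for subj, prop, obj in triples:
        if prop not in relations:
            continue
        if subj == concept:
            out.setdefault(prop, []).append(obj)
        elif prop == "skos:related" and obj == concept:
            # skos:related is symmetric, so follow it backwards too.
            out.setdefault(prop, []).append(subj)
    return out
```

Passing a longer `relations` tuple is one way a domain-specific relation could be added to the set that drives the suggestions.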
Additional references
Two domain-specific search engines were built on top of the Squiggle framework and are available online:
Squiggle Music http://squiggle.cefriel.it/music
Squiggle Ski http://squiggle.cefriel.it/ski
Some publications about Squiggle are available on the web at http://swa.cefriel.it/Publications#squiggle-pub.
Vocabularies
Title
The Squiggle framework uses the SKOS vocabulary. Squiggle Music uses a Music ontology. Squiggle Ski uses a Ski ontology.
General characteristics (size, coverage) of the vocabulary
The Music ontology contains more than 2 million triples in RDF/OWL. The knowledge base, derived from freely accessible sources such as MusicBrainz (http://www.musicbrainz.org) and MusicMoz (http://www.musicmoz.org), describes artists and bands, songs, albums, music genres, etc. The Ski ontology contains more than 2000 triples in RDF/OWL. Its knowledge base was derived from information published by the International Ski Federation (http://www.fis-ski.com/) about athletes, disciplines, races, podiums, etc.
Language(s) in which the vocabulary is provided
The Music ontology is not multilingual (the names of artists and songs are not "translatable"). The Ski ontology contains the names of the disciplines in 7 languages (English, Italian, German, French, Swedish, Norwegian and Finnish).
Vocabulary extract
Music ontology:
- Artists and Bands: Beatles, John Lennon, Red Hot Chili Peppers, etc.
- Songs: "All you need is love", "Imagine", "Otherside", etc.
- Music genres (the arrow below means "is broader than"):
Rock --> Heavy Metal --> Death Metal
World, Celtic, Pop --> Celtic Pop
Ski ontology:
- Athletes: Giorgio Rocca, Hermann Maier, Benjamin Raich, etc.
- Disciplines: Slalom, Giant Slalom, Downhill, etc.
Structure explanation
The domain vocabularies in Squiggle use hyperonymy/hyponymy, meronymy/holonymy (part-of relation), multiple wordings (homonymy/pseudonymy/synonymy) and generic semantic relationship (when two items are "related").
Music ontology:
- The music genres are related via skos:broader and skos:narrower relations
HeavyMetal skos:broader Rock
- The artists can be reciprocally connected
Queen skos:related PeterGabriel
and the bands can be connected to their members
JohnLennon skos:relatedPartOf Beatles
- The artists can have multiple labels
Beatles skos:prefLabel "The Beatles"
Beatles skos:altLabel "Fab four"
Beatles skos:hiddenLabel "Beetles"
- The songs are connected to the performing artists
Imagine isPerformedBy JohnLennon
JohnLennon performs Imagine
performs rdfs:subPropertyOf skos:related
Ski ontology:
- Athletes are related to the disciplines they practice
GiorgioRocca practice GiantSlalom
practice rdfs:subPropertyOf skos:related
- Disciplines have different names in different languages
GiantSlalom skos:prefLabel "Giant Slalom" (English)
GiantSlalom skos:altLabel "Slalom Gigante" (Italian)
GiantSlalom skos:altLabel "Riesenslalom" (German)
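Declaring a domain-specific property as a sub-property of skos:related lets a generic SKOS-aware component follow it without knowing the domain. The sketch below illustrates the idea under simplifying assumptions: each property has at most one direct super-property, the prefixed names are hypothetical, and the lookup is a plain dictionary walk rather than an RDFS reasoner.

```python
from typing import Dict, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]

# rdfs:subPropertyOf declarations, as in the examples above.
SUBPROPERTY_OF: Dict[str, str] = {
    "m:performs": "skos:related",
    "ski:practice": "skos:related",
}

# Illustrative instance data.
TRIPLES: List[Triple] = [
    ("m:JohnLennon", "m:performs", "m:Imagine"),
    ("m:Imagine", "m:isPerformedBy", "m:JohnLennon"),
    ("ski:GiorgioRocca", "ski:practice", "ski:GiantSlalom"),
]


def super_properties(prop: str, sub_of: Dict[str, str]) -> Set[str]:
    """Return prop together with every property it specializes."""
    seen = {prop}
    current = prop
    while current in sub_of and sub_of[current] not in seen:
        current = sub_of[current]
        seen.add(current)
    return seen


def related_via(concept: str, triples: Sequence[Triple],
                sub_of: Dict[str, str],
                target: str = "skos:related") -> List[str]:
    """Objects linked to concept by target or by any of its sub-properties."""
    return [
        obj for subj, prop, obj in triples
        if subj == concept and target in super_properties(prop, sub_of)
    ]
```

In this sketch isPerformedBy yields no suggestion, because only performs is declared as a sub-property of skos:related in the examples above.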
Machine-readable representation of the vocabulary
All the data is rendered in RDF/OWL using some of the SKOS primitives. The ontologies are not publicly available on the Web; some sample triples follow.
Music ontology:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:m="URN:it:cefriel:music#">
  <rdf:Description rdf:about="URN:it:cefriel:music#artist-8bfac288-ccc5-448d-9573-c33ea2aa5c30">
    <skos:prefLabel>Red Hot Chili Peppers</skos:prefLabel>
    <skos:altLabel>RHCP</skos:altLabel>
    <skos:altLabel>The Red Hot Chili Peppers</skos:altLabel>
    <skos:hiddenLabel>The Red Hot Chilli Peppers</skos:hiddenLabel>
    <skos:hiddenLabel>Red Hot Chilli Peppers</skos:hiddenLabel>
    <skos:hiddenLabel>Red Hot Chilly Peppers</skos:hiddenLabel>
    <m:hasStyle rdf:resource="URN:it:cefriel:music#style-2150036357"/>
    <m:hasStyle rdf:resource="URN:it:cefriel:music#style-2147564081"/>
    <m:hasStyle rdf:resource="URN:it:cefriel:music#style-2149982882"/>
    <m:performs>
      <rdf:Description rdf:about="URN:it:cefriel:music#song-1599985063">
        <skos:prefLabel>Californication</skos:prefLabel>
      </rdf:Description>
    </m:performs>
    <m:performs rdf:resource="URN:it:cefriel:music#song-0432808595"/>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:music#style-2149982882">
    <skos:prefLabel>Punk</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:music#style-2147564081">
    <skos:prefLabel>Pop</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:music#style-2150036357">
    <skos:prefLabel>Rock</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:music#song-0432808595">
    <skos:prefLabel>Otherside</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>
Ski ontology:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ski="URN:it:cefriel:ski#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="URN:it:cefriel:ski#Athlete">
    <skos:prefLabel>Athlete</skos:prefLabel>
    <skos:altLabel>Atleta</skos:altLabel>
  </rdf:Description>
  <ski:Athlete rdf:about="URN:it:cefriel:ski#athlete-1298393383">
    <ski:practice rdf:resource="URN:it:cefriel:ski#Slalom"/>
    <ski:practice rdf:resource="URN:it:cefriel:ski#Combined"/>
    <skos:prefLabel>ROCCA Giorgio</skos:prefLabel>
  </ski:Athlete>
  <rdf:Description rdf:about="URN:it:cefriel:ski#Slalom">
    <skos:altLabel>Slalom Speciale</skos:altLabel>
    <skos:prefLabel>Slalom</skos:prefLabel>
    <skos:altLabel>Slalåm</skos:altLabel>
    <skos:altLabel>SL</skos:altLabel>
    <skos:altLabel>Speciale</skos:altLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:ski#Combined">
    <skos:altLabel>Combinata</skos:altLabel>
    <skos:prefLabel>Combined</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>
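As an illustration of how the labels in samples like these can be read, the sketch below walks the XML with Python's standard library and collects the SKOS labels of every described resource. It is not a real RDF parser and not part of Squiggle: it only handles the simple element-and-attribute layout used in these samples, and the embedded fragment is a shortened copy of the Ski sample.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the Ski sample, used as test input.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="URN:it:cefriel:ski#Athlete">
    <skos:prefLabel>Athlete</skos:prefLabel>
    <skos:altLabel>Atleta</skos:altLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:ski#Combined">
    <skos:altLabel>Combinata</skos:altLabel>
    <skos:prefLabel>Combined</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def collect_labels(rdf_xml: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each rdf:about URI to its prefLabel/altLabel/hiddenLabel values."""
    root = ET.fromstring(rdf_xml)
    result: Dict[str, Dict[str, List[str]]] = {}
    # iter() also visits nested descriptions, e.g. a song inside m:performs.
    for node in root.iter():
        about = node.get(f"{{{RDF_NS}}}about")
        if about is None:
            continue
        for kind in ("prefLabel", "altLabel", "hiddenLabel"):
            texts = [child.text for child in node.findall(f"{{{SKOS_NS}}}{kind}")
                     if child.text]
            if texts:
                result.setdefault(about, {}).setdefault(kind, []).extend(texts)
    return result
```

For anything beyond such samples, a proper RDF toolkit is the safer choice, since RDF/XML allows several equivalent serializations of the same triples.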
Software applications used to create and/or maintain the vocabulary, features lacking for the case
The vocabulary maintenance is performed through an RDF/SKOS editor.
Structure of the database used to currently manage the vocabulary
Squiggle uses Sesame repositories with a MySQL backend to store the knowledge bases in RDF format, using Sesame pre-defined structures.
Standards and guidelines considered during the design and construction of the vocabulary
In the modeling of our SKOS-based ontologies, we made use of the "Quick Guide to Publishing a Thesaurus on the Semantic Web" (http://www.w3.org/TR/swbp-thesaurus-pubguide/) and the "SKOS Core Guide" (http://www.w3.org/TR/swbp-skos-core-guide/).
Additional references
The sources of information to build the knowledge bases are:
MusicBrainz (http://www.musicbrainz.org) and MusicMoz (http://www.musicmoz.org) for the Music ontology;
the website of the International Ski Federation (http://www.fis-ski.com/) for the Ski ontology.
Vocabulary mappings
[Note from the editor: this contribution refers to some mappings, but these are at the meta-language level, with domain-specific relations being mapped to standard SKOS or DC properties, as exemplified in the previous section.]