Tagging Content in the Semantic Web 2.0
Collaborative tagging systems like delicious are growing their user communities day by day. Tagging in a Web 2.0 environment leads to more efficient bookmarking, because interesting content can be categorized under several user-defined topics. This decreases the search effort in daily practice. But several problems occur when content is tagged with undefined terms:
- Different terms with the same or similar meaning cannot be related to each other.
- The same term with several different meanings cannot be separated by its context of meaning.
- Relations like "is a" or "part of" cannot be identified or even expressed.
The importance of these semantic relations does not become apparent in a single-user tagging system. There the user tags his content with his own vocabulary, so in most cases he tags the same meaning with the same term. In collaborative (multi-user) tagging systems, several user vocabularies meet: one meaning is tagged with several terms, and information retrieval measures like precision and recall show poor results.
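The effect on precision and recall can be illustrated with a small sketch. The bookmarks, tags and document ids below are invented for illustration only; they are not taken from any real tagging system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical bookmarks: several users tag pages with their own vocabulary.
tagged = {
    "doc1": {"nyc"},        # about New York City
    "doc2": {"new-york"},   # about New York City
    "doc3": {"big-apple"},  # about New York City
    "doc4": {"java"},       # about the programming language
    "doc5": {"java"},       # about the island
}

def search(tag):
    """Plain label matching, as in a tagging system without semantics."""
    return {doc for doc, tags in tagged.items() if tag in tags}

# Synonymy hurts recall: only one of three relevant documents is found.
p_syn, r_syn = precision_recall(search("nyc"), {"doc1", "doc2", "doc3"})

# Polysemy hurts precision: half of the hits are about the wrong "java".
p_poly, r_poly = precision_recall(search("java"), {"doc4"})
```

In the first query the different terms for one meaning hide two relevant documents; in the second, one term with two meanings returns an irrelevant document.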
Tagging in a Semantic Web 2.0 environment
The Semantic Web 2.0 can improve collaborative tagging systems in many ways:
- Semantic relations can be expressed in semantic web languages like RDF(S) or PIMO.
- Tags are no longer identified by their ambiguous label. Each tag has its own global URI, so ambiguities in term labels (e.g. Java) are handled inherently.
- Classification relations such as "instance of" and "subclass of" can simply be stored in ontologies.
- Community-built global knowledge bases like Wikipedia gather the semantic background for the tags explained there. It may be reasonable to use the URL of the wiki page that explains a tag as the global URI for this tag.
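A minimal sketch of these ideas follows. It uses plain Python data structures instead of a real RDF(S) or PIMO store, and the class name `Tag`, the helper `related` and the chosen Wikipedia URLs are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    uri: str    # global identifier of the tag
    label: str  # human-readable term; may be ambiguous

# Assumption: Wikipedia page URLs serve as global tag URIs.
JAVA_LANGUAGE = Tag("http://en.wikipedia.org/wiki/Java_(programming_language)", "Java")
JAVA_ISLAND = Tag("http://en.wikipedia.org/wiki/Java", "Java")

# Semantic relations as (subject URI, relation, object URI) triples.
relations = {
    (JAVA_LANGUAGE.uri, "instance of", "http://en.wikipedia.org/wiki/Programming_language"),
    (JAVA_ISLAND.uri, "part of", "http://en.wikipedia.org/wiki/Indonesia"),
}

def related(tag, relation):
    """Return the URIs that `tag` points to via `relation`."""
    return {obj for subj, rel, obj in relations if subj == tag.uri and rel == relation}

# Same label, different tags: the URI keeps the two meanings apart.
same_label = JAVA_LANGUAGE.label == JAVA_ISLAND.label
same_tag = JAVA_LANGUAGE == JAVA_ISLAND
```

Both tags carry the label Java, but because they differ in their URI they remain distinct, and each can take part in its own "instance of" or "part of" relations.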
Definition of a tag
There are different ways to define a tag, depending on the scientific perspective.
Definition from the data structure perspective
A tag defines an index in the form of a keyword for a given content. In structural languages it is used as a label to mark up a given content.
This is the primary definition of a tag used in lightweight tagging systems like delicious. There is no definition of the intended semantics of a tag labeling a certain content. This leads to the known problems of indexing content with different terms but similar meanings.
Definition from a philosophical or ontological perspective
A tag defines a given concept. So it doesn't label a certain content, but just the concept, which occurs in the text. (A concept is an abstract, universal idea, notion or entity that serves to designate a category or class of entities, events or relations. [http://en.wikipedia.org/wiki/Concept])
This definition needs an explanation of the labeled concept to let others know what it is about. The prerequisites for such tag management include an existing knowledge base to store the explanations. This may be done via ontology implementations. The advantage of such well-defined tag management is a clear understanding when a tag and its concept are used to find related content. It is also possible to share these tags with other users, because publishing a concept-content relation can be seen as a communication act that sends information through the channel of a tagging system. The sender puts the content and the related concepts into the channel to ensure that the receiver can understand that the content is about the given concepts. This is a completely new requirement in designing a collaborative tagging system.
The role of definitions in a semantic tagging system
Definitions explain certain concepts. They define the meaning of a tag in a given content context. This overcomes term ambiguities and solves the three problems mentioned at the beginning. Luckily, Aristotle described the structure of good definitions, traditionally summarized as: Definitio fit per genus proximum et differentiam specificam. Loosely translated: you create a definition by naming the next higher abstraction of a given concept and then stating the differences specific to that concept. Definitions written in this structure become machine readable with respect to hypernym extraction. In addition, the extraction of semantically related terms describing the concept can be focused on the definition. It then becomes possible to handle concepts with existing information retrieval technologies, because they can be described with a set of keywords.
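As a rough sketch of why this structure helps hypernym extraction, a single pattern can split a well-formed English definition into term, genus (the hypernym) and differentia. The pattern and the function name `parse_definition` are assumptions for illustration; real definitions would need proper language processing rather than one regular expression.

```python
import re

# Matches "<term> is a/an/the <genus> [that/which/... <differentia>]".
DEFINITION = re.compile(
    r"^(?:(?:a|an|the)\s+)?(?P<term>.+?)\s+is\s+(?:a|an|the)\s+(?P<genus>.+?)"
    r"(?:\s+(?:that|which|who|in|of|for|with)\s+(?P<differentia>.+?))?\.?$",
    re.IGNORECASE,
)

def parse_definition(sentence):
    """Split a genus-differentia definition into its parts, or return None."""
    match = DEFINITION.match(sentence.strip())
    if match is None:
        return None
    return match.groupdict()

bicycle = parse_definition("A bicycle is a vehicle that is driven by pedals.")
city = parse_definition("Kaiserslautern is a city in Germany.")
no_definition = parse_definition("This sentence has no definition")
```

Here `bicycle["genus"]` is the hypernym "vehicle", while the differentia supplies keywords that describe the concept further. Sentences that do not follow the structure are rejected instead of being guessed at.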
ConTag is a diploma thesis written by Benjamin Horak at the German Research Center for Artificial Intelligence (DFKI). The main goal is to develop a collaborative tagging system in a semantic desktop environment. The implementation will use the Gnowsis Semantic Desktop. There, an easy-to-use personal ontology called PIMO (Personal Information Model) can be used as a knowledge base to store tagging information.
Collaborative tagging in a semantic environment means that everyone can benefit from each other's knowledge! So please join the ConTag community to help build an effective system that saves much more time in finding the resources you need. I will collect your needs, wishes and proposals to ensure that ConTag meets the common users' needs.
Current discussion topics
Here is a list of open problems and unfinished thoughts. You're welcome to join this discussion.
Tagging textual definitions
Classification can perhaps be simplified by a language processing approach that tags only textual definitions from Wikipedia, dict, books or other sources.
Tagthe.net can identify locations, metatopics, persons, the current language and other interesting classes. It is necessary to ensure that tags are instances of known classes in your personal ontology (e.g. Kaiserslautern is an instance of city). So we need more classification services for several topics:
- locations: I'm going to write a RESTful web service that answers whether a term is a city and in which country it is located. Google is on the way to implementing an interesting location service.
- famous persons, authors, celebrities etc.
- music: I will have a look at http://www.freedb.org
- etc. (Please help complete this list.)
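The kind of answer such classification services would give can be sketched locally. Everything here is a stand-in: the tiny `CITIES` gazetteer, the function names and the result format are assumptions, and the planned RESTful service may look entirely different.

```python
# Hypothetical gazetteer; a real service would draw on a source such as
# geographic name data instead of a hard-coded dictionary.
CITIES = {
    "kaiserslautern": "Germany",
    "lugano": "Switzerland",
    "bari": "Italy",
}

def classify_location(term):
    """Answer whether `term` is a known city and in which country it lies."""
    country = CITIES.get(term.strip().lower())
    if country is None:
        return None
    return {"term": term.strip(), "instance_of": "city", "located_in": country}

# One classifier per topic; persons, music etc. would be appended here.
CLASSIFIERS = [classify_location]

def classify(term):
    """Return the first classification any service offers, or None."""
    for classifier in CLASSIFIERS:
        result = classifier(term)
        if result is not None:
            return result
    return None

known = classify("Kaiserslautern")
unknown = classify("Java")
```

A tag for which no classifier returns a result, like "Java" above, would have to be classified by the user before it is stored in the personal ontology.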
Tags have to be transformed into a standard form to be understood. So stemming algorithms, thesaurus methods and maybe an n-gram approach will be used to ensure that each term is stored only once in your personal ontology.
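One possible shape of such a normalization step is sketched below. The suffix stripping is a crude stand-in for a real stemming algorithm, and the character trigram similarity, the threshold of 0.5 and all function names are assumptions chosen for illustration.

```python
def normalize(term):
    """Lowercase, trim and strip simple plural endings (not a real stemmer)."""
    term = term.strip().lower()
    if term.endswith("ies") and len(term) > 4:
        return term[:-3] + "y"
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        return term[:-1]
    return term

def ngrams(term, n=3):
    """Return the set of character n-grams of `term`, padded at both ends."""
    padded = "_" * (n - 1) + term + "_" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity of the character n-gram sets of two terms."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)

def canonical_tag(term, known_tags, threshold=0.5):
    """Map `term` to a stored tag, or store its normal form as a new tag."""
    candidate = normalize(term)
    best = max(known_tags, key=lambda tag: similarity(candidate, tag), default=None)
    if best is not None and similarity(candidate, best) >= threshold:
        return best
    known_tags.add(candidate)
    return candidate

ontology_tags = {"ontology"}
plural = canonical_tag("Ontologies", ontology_tags)   # plural form of a known tag
misspelled = canonical_tag("ontolgy", ontology_tags)  # typo close to a known tag
new_tag = canonical_tag("city", ontology_tags)        # unrelated term is added
```

With these inputs the plural and the misspelling both map to the stored tag "ontology", while "city" is stored as a new tag. A thesaurus lookup for true synonyms would be a separate step and is not covered by this sketch.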
A major problem in collaborative tagging systems is ambiguity. When a user adds a tag, this keyword is disconnected from the existing ones: there is no way to know how a specific tag is semantically related to others. Users can use the same term to describe different meanings (polysemy), and they can use distinct terms with the same meaning (synonymy). To overcome these issues it is essential to provide support for semantically interrelating tags, just as ontologies do for concepts and instances.
[new thoughts and topics should be inserted here.]
- Benjamin Horak (firstname.lastname@example.org), diploma student at DFKI, Kaiserslautern, Germany
- Eyal Oren
- Haklae Kim
- Leo Sauermann
- Cédric Mesnage, PhD student in Lugano, Switzerland.
- Domenico Gendarmi, PhD student in Bari, Italy.
- [Please write your name, institution and email address here, so that we can contact each other when needed.]
[Please add all resources you know of here. Additionally you can tag web sites in delicious with the term ConTag (see http://del.icio.us/tag/ConTag).]
- DFKI - German Research Center for Artificial Intelligence.
- NEPOMUK - The Social Semantic Desktop.
- Gnowsis - Semantic Desktop.
- http://opennlp.sourceforge.net - Natural language processing projects.
- dict - A Dictionary Server Protocol.
- http://www.idealliance.org/proceedings/xtech05/papers/02-07-04/ - Connecting Social Content Services using FOAF, RDF and REST
- GEOnet Names Server - Free data about geographic names.
- Tag Ontology
- http://www.w3.org/TR/2004/NOTE-grddl-20040413/ - GRDDL: getting RDF directly from HTML pages when it is embedded. Think of Piggy Bank on steroids. If RDF content is already on the page, there is less need for services to find possible tags.
Supporting Web Services
- http://www.tagthe.net - helps you in tagging textual content (tag extraction).
- http://dict.org - Online dictionaries implementing the dict protocol.
- http://developer.yahoo.net/search/myweb/index.html - Web 2.0 web services (tag extraction).
Existing tagging systems
- Leaftag - A tagging framework for Linux desktops.
- tagcloud - A system that builds a tag cloud from RSS feeds (e.g. the term environment of extracted tags).
Test Documents with given Classifications
- Reuters News Corpus (available at DFKI)
- ConTag.pdf - A short poster about concept matching and the general architecture of ConTag.