Information Gathering

A task force of the SweoIG, started in November 2006

This wiki page is a discussion of the SWEO IG and is not final yet. Sentences like "X will do Y" should be read as "we propose X will do Y". The discussion should be finished by the end of March 2007.

The goal of the information-gathering task is to identify existing Semantic Web resources and to ensure that they are easy for others to find and use in the future. Information is gathered about tools, products, code, demos, papers and books. Existing collections of information are imported and syndicated. The data format of gathered resources will be RDF and should conform to a set of RDF/S vocabularies recommended by us. The gathered resources are presented on a website, where they can be annotated and searched. Users can find these resources based on a categorization ontology. This website is named Semantic Web Information Portal, or SWIP for short.

http://www.w3.org/2001/sw/sweo/public/Info/ - information gathering input form
/PortalPlans - planning the Semantic Web Information Portal
/DataSources - data sources to integrate in the portal
/DataVocabulary - gathered data should conform to these RDF vocabularies
/HowtoPublish - how to publish information so that it will be syndicated by SWEO
/ClassificationOntology - gathered data will be categorized according to these SKOS concepts
/Discussion - input for our plans from SWEO members
/WishList - things that need to be done, bug reports, improvements
/RecommendedTutorials

People working on this task

LeoSauermann - task lead
PasqualePopolizio
IvanHerman
LeeFeigenbaum - interested in an end-user friendly representation for non-sw people (aka the Portal)
UldisBojars
Kingsley Idehen
DannyAyers
BenjaminNowack

Timeline

Start: November 2006
Planning and Definition: Jan-Feb 2007
Implementation: March-April 2007
Setting up Contacts to external authors: April 2007
Information Gathering ready and finished: 1st May 2007

Open Action Items

LeoSauermann suggested the name Semantic Web Information Portal, or SWIP for short. Please give feedback.
DONE: Contact people who run existing lists and ask if they plan to continue their lists and why they picked the tools they use. Also ask them if they would be interested in having their list integrated and used by the SWEO for an information gathering website. found by Ivan, is a good starting point. - Got feedback from one person saying he would submit data if we provide a format. That's enough for the moment.
create a common data format in which people can provide data: LeoSauermann, kidehen and bengee.
- drafts see /DataVocabulary and /ClassificationOntology
check if we get data in this format:
work on the portal, make it a gentle and light collection of resources
find responsible people to manage important lists
find responsible people for technical implementation
find responsible people for web design (colors and good looks)

Information Gathering By SWEO

Many information sources about the Semantic Web already exist. We find tools, products, projects, tutorials, standards and other documents on the web. Others have already started generating lists and comprehensive portal websites to access this information. SWEO aims to collect these lists and gather all resources. The existing collections cover certain topics (like Dave Beckett's list on Semantic Web tools) or are organized by type or audience (the Semantic Web portal lists specifications and presentations of the W3C Team and Working/Interest Group staff; these data are also accessible directly in RDF though using, partially, its own vocabulary for historical reasons...).

These existing resources should be reused, and the responsible authors will be involved by asking them to continue their existing work. SWEO will provide a standardized way to represent such resource lists (an RDF vocabulary) and an extensible ontology to classify the resources. Existing lists will be re-used by crawling their data and integrating it into a database managed by SWEO.
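The integration step can be illustrated with a small Python sketch. It is only an assumption about how crawled lists might be merged, not the planned implementation: the names `Resource`, `normalize` and `merge_lists` are hypothetical, the URLs are placeholders, and the only normalization applied is trimming whitespace and a trailing slash. Each merged entry keeps the URLs of the lists it came from, so the original source stays visible.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """One gathered item plus the lists it was found in (hypothetical model)."""
    url: str
    title: str
    sources: set[str] = field(default_factory=set)


def normalize(url: str) -> str:
    # Assumption: trimming whitespace and a trailing slash is enough to
    # recognize the same resource across lists.
    return url.strip().rstrip("/")


def merge_lists(lists: dict[str, list[tuple[str, str]]]) -> dict[str, Resource]:
    """Merge crawled lists, keyed by resource URL.

    `lists` maps the URL of each crawled list to its (url, title) entries.
    The first title seen for a resource is kept; every list that mentions
    the resource is recorded in `sources`.
    """
    merged: dict[str, Resource] = {}
    for source_url, entries in lists.items():
        for url, title in entries:
            key = normalize(url)
            resource = merged.get(key)
            if resource is None:
                resource = merged[key] = Resource(url=key, title=title)
            resource.sources.add(source_url)
    return merged


if __name__ == "__main__":
    crawled = {
        "http://example.org/listA": [("http://example.org/tool/", "Tool")],
        "http://example.org/listB": [("http://example.org/tool", "Tool X")],
    }
    for resource in merge_lists(crawled).values():
        print(resource.url, sorted(resource.sources))
```

Keying on the resource URL means an item that appears in several lists is stored once, while the record of where it came from is preserved.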

License and Legal Issues

Our goal is to reuse and repurpose the aggregated data sources in a variety of contexts.

When syndicating, we only copy existing information and make it available. Because a URL to the source is always available and the sources are visible, users can check under what license the content is published.

We need the agreement of the authors that their data is available under a Creative Commons license. Thus, this will be a restriction at upload and crawl time. Of course, the more hurdles we place at content gathering time, the less content we end up receiving. That said, the content will ultimately be more valuable without inadvertent legal exposure.

In a nutshell, we want to produce openly accessible data that benefits everyone.

SWEO (via this Information Gathering effort) acts as a clearing house between RDF content providers and consumers. Providers agree to our licensing terms, and consumers agree to use the content according to the license terms.

We have picked the CreativeCommons license for the collected data:

http://creativecommons.org/licenses/by/3.0/

There are two aspects: the license agreement which contributors agree on when submitting their content, and the license that we offer to the users of the syndicated data.

It would be possible to tag each item with a license and pass on the license with the item. But this would mean that the users of the syndicated data would have to check the license of each item, which would be complicated.

An alternative to the CC license was the W3C software license, but because of the wide adoption of CC we favored CC.

Feedback on licenses

(taken from the mailinglist)

Summary: we use CC-by-attribution.

Benjamin Nowack: I'm fine with a CC license. I don't have any license attached to content published at rdfer.com, but could try to auto-add license information to locally maintained graphs/descriptions.

Ivan Herman: I vote for the CC approach. Widely used, widely adopted and, last but not least, widely known.

Chris Bizer: We don't have an explicit license on the website yet. But this is clearly something we should have. I guess we will go for creative commons attribution, so consider all our content cc attribution.

Michael Bergman: My site is published under the Creative Commons license, 2.5, which I will likely update to 3.0. Either of those or the W3C license is acceptable.

Suggested Resource Tagging Guidelines and Ontology

Orri: As a base suggestion for a tagging guideline for SW related resources, I'd suggest using the following del.icio.us or other tags in addition to sweo.

Go here to find the tagged items:

[1]

The SWEO information gathering ontologies are here

/OntologyDiscussion
/DataVocabulary - an RDF vocabulary for the information items and lists of them.
/ClassificationOntology - an ontology to classify information items (give them types)

Motivations for InformationGathering

What do you think this task force should do?

LeoSauermann, 2.3.2007

I want to stick to the timeline posted here. I want to have a working way of gathering information items and displaying them on a portal website, giving good lists of "recommended tutorials to read when you start learning inference".

To put it sarcastically: all these discussions about SIOC/SKOS/FOAF/... stop us from teaching people about the Semantic Web. We don't need SIOC, SKOS, FOAF, etc. to do that; we could even use an rdf:Bag of rdfs:Resource with rdfs:label, rdfs:comment and rdfs:seeAlso links to websites. That's perfectly enough to gather information. We are the Semantic Web Education and Outreach interest group, not the "We define a perfect vocabulary for gathering information W3C technical group". We should gather good tutorials, ask people to write introductory material on the Semantic Web and THEN provide this teaching material in some kind of RSS feed.
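As a rough illustration of the minimal approach described above, the following Python sketch writes a list of items as an rdf:Bag in Turtle, using only rdfs:label, rdfs:comment and rdfs:seeAlso. It is not the vocabulary proposed on /DataVocabulary: the `Item` class, the `bag_to_turtle` function and all example.org URIs are hypothetical, and the output is built with plain string formatting rather than an RDF library.

```python
from dataclasses import dataclass

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


@dataclass
class Item:
    """One gathered resource (hypothetical minimal model)."""
    uri: str
    label: str
    comment: str
    see_also: str


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping the special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def bag_to_turtle(bag_uri: str, items: list[Item]) -> str:
    """Serialize the items as an rdf:Bag with rdf:_1, rdf:_2, ... members."""
    lines = [f"@prefix rdf: <{RDF}> .", f"@prefix rdfs: <{RDFS}> .", ""]
    members = [
        f"    rdf:_{index} <{item.uri}>"
        for index, item in enumerate(items, start=1)
    ]
    lines.append(f"<{bag_uri}> a rdf:Bag" + (" ;" if members else " ."))
    if members:
        lines.append(" ;\n".join(members) + " .")
    for item in items:
        lines.append("")
        lines.append(f"<{item.uri}> rdfs:label {literal(item.label)} ;")
        lines.append(f"    rdfs:comment {literal(item.comment)} ;")
        lines.append(f"    rdfs:seeAlso <{item.see_also}> .")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tutorials = [
        Item(
            uri="http://example.org/tutorials/inference-intro",
            label="Getting started with inference",
            comment="Placeholder entry for a recommended tutorial.",
            see_also="http://example.org/tutorials/inference-intro/slides",
        )
    ]
    print(bag_to_turtle("http://example.org/sweo/recommended", tutorials))
```

The sketch assumes that the URIs passed in are already valid IRIs; it escapes literal text but does not validate or escape URIs.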

History

15.2.2007: LeoSauermann recompiled the whole page.
- gathered all important words mentioned and wrote them into the Ontology
~10.3.2007: DannyAyers moved /Discussion and /PortalPlans to separate pages
