SIOC/BrainStorming/2007-12-13

First of the SIOC/BrainStorming sessions.

SIOC Brainstorming Session on 13th Dec. 2007

Everyone: please post comments in the comments section (even if you didn't attend the session) or add to any point as you see fit!

Attendees

  1. Uldis Bojars
  2. Thomas Schandl
  3. Gabriela Vulcu
  4. Sheila Kinsella
  5. Conor Hayes
  6. John Breslin
  7. Richard Cyganiak
  8. Adam Westerski
  9. Stephane Corlosquet
  10. Giovanni Tummarello
  11. Vassilios Peristeras


SIOC core - media of communication

  • At DERI: John, Uldis and Thomas, plus 3-4 more interns to come next year (Alexandre, John, Sergio, Tuukka)
  • #sioc IRC channel on irc.freenode.net (SIOC-enabled logs). It would be great to have more people from DERI there!
  • SIOC dev mailing list (google)

John is to make the SIOC dataset of 7 million boards.ie posts available for a coding competition for the best use of the data (with prize money of €1000 or more).


Discussion topics

Vassilios: Maybe we should organize future discussion / brainstorming sessions around core topics that cover every aspect of SIOC:

  • The ontology per se
  • Tools
  • Applying SIOC
  • Architecture for SIOC systems, performance issues
  • Promoting use of SIOC

Any other topics?

Future steps

  • Improve website and documentation (outdated pages on sioc-project.org)
  • Improve and update tools
  • Develop new tools/uses for SIOC data


Application of SIOC

  • Data analysis, RDF statements
  • SIOC widgets
  • Business apps, privacy + access control
  • Distributed conversation, crosslinks between blogs
  • Somehow integrate Sindice with SIOC - find a use case for that
  • There are also enterprise applications using SIOC, such as seesmic (SIOC used for internal data structure) and TalkDigger
  • Nice to have: SIOC buttons for SIOC-enabled websites, both to advertise the use of SIOC (branding) and to give visual feedback that something happened after you installed the plugin. Best of all would be if clicking such a button actually triggered a useful application for SIOC data.

The promotion of the use of SIOC just depends on the functionality it will enable - if people see that they can do something new & useful with it, it will spread on its own and people will write their own exporters.

Killer question: What cool things can we do with SIOC?


SIOC for Sindice

They want to develop Sindice more in a use case driven manner.

  • Use Sindice to search posts or whatever content from e. g. one user
  • Embed Adam's (not yet released) AJAX widget in SIOC-enabled blogs (currently this widget lets you find RDF online). Based on their inverse functional property index this could be pretty cool.
  • Even while a blog comment is being written, Sindice could be useful for looking up the author's data. Similarly, Sindice could help find topics, and good URIs for these topics, to annotate what a post is about. They want to have an entity-based API instead of a document-based API for Sindice.
  • Giovanni: provide some way for the user to explicitly tag articles with a SIOC annotation. E. g. when posting a message to a bulletin board, there could be a form field labeled "topic URI" where the author can give a URI for whatever he is writing about. That would immensely raise the value of the data of the entire thread.
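
Giovanni's "topic URI" form field could be turned into SIOC data roughly as follows. This is only a minimal sketch: the function name, the example URIs and the choice of N-Triples output are assumptions made for illustration; only the sioc:Post class and the sioc:topic property come from the SIOC vocabulary.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def annotate_post(post_uri, topic_uris):
    """Return N-Triples lines that type the post and link it to each topic URI."""
    lines = [f"<{post_uri}> <{RDF_TYPE}> <{SIOC}Post> ."]
    for topic in topic_uris:
        topic = topic.strip()
        if topic:  # skip form fields the author left empty
            lines.append(f"<{post_uri}> <{SIOC}topic> <{topic}> .")
    return lines


# Hypothetical form submission: one filled-in "topic URI" field, one left blank.
for line in annotate_post(
    "http://example.org/forum/post/42",
    ["http://dbpedia.org/resource/Semantic_Web", ""],
):
    print(line)
```

A real exporter would also validate the submitted URI before publishing it; that step is left out here.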


SIOC for Ecospace

Ecospace built a SIOC exporter for BSCW and they are developing a second exporter for another application called Business Consultance (?). They want to combine the two exporters with a yet-to-be-built SIOC explorer, which should enable the user to query both systems or e. g. to produce a common view of different projects' dates in one calendar. Currently it is not possible to combine the calendar data of two different projects (not even within BSCW). It should enable queries like "give me all the documents that were uploaded last night in all applications".

→ Use SIOC as a meta language to annotate this data in order to combine and extract information. The goal is to give the user access to multiple applications through a single interface.
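
The "uploaded last night in all applications" query could look roughly like this once both exporters deliver items with a URI and a creation date. It is a sketch under assumptions: the dictionary layout, the application names and all URIs are hypothetical, and a real explorer would read the exported RDF instead of in-memory dictionaries.

```python
from datetime import datetime


def uploaded_since(sources, since):
    """Merge items from several exporters; keep those created at or after `since`."""
    merged = []
    for app_name, items in sources.items():
        for item in items:
            if item["created"] >= since:
                merged.append({"app": app_name, **item})
    # Oldest first, so the combined view reads like a single timeline.
    return sorted(merged, key=lambda item: item["created"])


# Hypothetical data from two exporters (URIs and application names are made up).
sources = {
    "bscw": [
        {"uri": "http://example.org/bscw/doc/1", "created": datetime(2007, 12, 12, 23, 15)},
        {"uri": "http://example.org/bscw/doc/2", "created": datetime(2007, 12, 10, 9, 0)},
    ],
    "second_app": [
        {"uri": "http://example.org/app2/doc/7", "created": datetime(2007, 12, 12, 21, 40)},
    ],
}
last_night = datetime(2007, 12, 12, 18, 0)
recent = uploaded_since(sources, last_night)
```

Because every item carries its application name, the single interface can still show where each document came from.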


SIOC for developers' communication

Giovanni: Use SIOC (or an extension of it) to aggregate the data in a development process (bug tracking, project management, ...). The number of communication channels is growing and becoming overwhelming.
Developers should use links when they talk about a topic. If a message contains a link, it should be exported as SIOC; that could help with smushing things together.

Related EU-Projects:

  • OKKAM - could be useful for finding URIs for entities
  • Romulus - about making developers more productive. The use of semantic web technologies is proposed for Romulus (for annotating issues inside bug tracking). Within this there could be funds to try to put some SIOC inside.


SIOC for mailing lists

  • DERI mailing lists need to get SIOC-ified (using SWAML).
  • Richard: Develop a tool that you install on your server and subscribe this tool to the mailing lists you are interested in. It receives the mails and exports SIOC. → cool thing: You don't have to talk to the list operator to set this up!
  • Giovanni: Develop a sponger that you point to the public mailman archive to harvest data.
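
The core of Richard's subscriber tool, turning one received mail into SIOC, could be sketched as below. This is not how SWAML does it; it is a minimal illustration using only Python's standard email module. The base URI and the scheme of deriving post URIs from Message-IDs are assumptions, and the date is written as a plain literal for brevity.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
from urllib.parse import quote

SIOC = "http://rdfs.org/sioc/ns#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def _post_uri(base_uri, message_id):
    # Message-IDs look like <id@host>; percent-encode them for use in a URI.
    return base_uri + quote(message_id.strip().strip("<>"), safe="")


def _literal(text):
    # Escape backslashes, quotes and line breaks for an N-Triples literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    escaped = escaped.replace("\n", "\\n").replace("\r", "\\r")
    return '"' + escaped + '"'


def mail_to_triples(raw_mail, base_uri):
    """Convert one raw mail message into N-Triples lines describing a sioc:Post."""
    msg = message_from_string(raw_mail)
    post = _post_uri(base_uri, msg["Message-ID"])
    lines = [f"<{post}> <{RDF_TYPE}> <{SIOC}Post> ."]
    if msg["Subject"]:
        lines.append(f"<{post}> <{DCT}title> {_literal(msg['Subject'])} .")
    if msg["Date"]:
        created = parsedate_to_datetime(msg["Date"]).isoformat()
        lines.append(f"<{post}> <{DCT}created> {_literal(created)} .")
    if msg["In-Reply-To"]:
        # Threading: link the reply to the post it answers.
        parent = _post_uri(base_uri, msg["In-Reply-To"])
        lines.append(f"<{post}> <{SIOC}reply_of> <{parent}> .")
    return lines
```

Since the tool only needs to receive mail like any other subscriber, nothing in this step requires cooperation from the list operator.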


SIOC for aggregating web contributions

Develop a web application that allows the average web user to aggregate all his articles from various bulletin boards, blog comments, etc. With that he could track all his contributions on a single web page, see who replied to his posts and to different threads, present a collection of his most useful contributions from different sites, etc.


More SIOC exporters

  • Planet-PHP
  • MovableType (used on Planet RDF) - but it is written in Perl; is there anyone proficient in Perl who could easily write an exporter? Maybe with better outreach to the Planet RDF community we could find someone (who has an interest in getting his data into SIOC) to make such a plugin. → maybe we need a bigger "SIOC marketing team" which spreads the word about SIOC and its plugins.



Prolongation of brainstorming session

Continued discussion with Adam, Richard, Stephane, Thomas and Uldis


Pings

One source of data for Sindice is produced by pings. At the moment the various SIOC exporters don't ping web services (Semantic Radar, on the other hand, does ping, but only when a page is viewed by someone).

There was some concern that not everyone who installs an exporter wants pings to be sent.

→ We should modify the exporters to include pings, but make it clear which services are pinged and have the option to switch it off.
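
An exporter's ping settings could be handled roughly like this: pings are on by default, every target is listed for the site owner, and either single services or all pings can be switched off. The configuration keys, service names and URLs below are hypothetical placeholders, not real ping endpoints.

```python
# Hypothetical service names and placeholder URLs; real endpoints would go here.
PING_SERVICES = {
    "service-a": "http://ping-a.example.org/ping",
    "service-b": "http://ping-b.example.org/ping",
}


def ping_targets(config, services=PING_SERVICES):
    """Return the (name, url) pairs that should be pinged under this config."""
    if not config.get("pings_enabled", True):
        return []  # the site owner switched pings off entirely
    disabled = set(config.get("disabled_services", []))
    return [(name, url) for name, url in sorted(services.items()) if name not in disabled]


def describe_pings(config):
    """Human-readable summary for the plugin's settings page."""
    targets = ping_targets(config)
    if not targets:
        return "No services will be pinged."
    return "Will ping: " + ", ".join(name for name, _ in targets)
```

Showing the output of describe_pings on the settings page is one way to make it clear which services are contacted.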


Protocol for SIOC

A protocol hasn't yet been described for SIOC - there is only an implicit protocol in how URLs are crafted.

  • Is there need to discuss formally what URLs can be expected?
  • Richard: Take care not to prescribe how people have to manage their URLs, in some cases they might not be able to follow that prescription.
  • Richard is sceptical of protocols for specific vocabularies.
  • Uldis: Is this a problem that isn't sufficiently addressed by current RDF access protocols? A topic for a paper, but not a priority.


SIOC and Tabulator

For use with Tabulator (and possibly other applications), the various related SIOC documents created by the exporters need to have more (redundant) links to each other. Usually SIOC exporters have a link structure that points from a site to all its users and all its forums; the latter in turn point to the posts they contain. It would be useful to have not only these top-down links, but additionally e. g. links from a user page to all the posts a user has written, to allow navigation in other directions (see Whiteboard 2 top left corner).
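
The redundant user-to-post links could be derived from data an exporter already has. The sketch below assumes the exporter knows each post's creator as (post URI, user URI) pairs and emits the inverse sioc:creator_of statements for the user's page; the function names and example URIs are made up.

```python
from collections import defaultdict

SIOC = "http://rdfs.org/sioc/ns#"


def posts_by_creator(post_creator_pairs):
    """Group post URIs under the URI of the user who wrote them."""
    index = defaultdict(list)
    for post_uri, user_uri in post_creator_pairs:
        index[user_uri].append(post_uri)
    return dict(index)


def creator_of_triples(post_creator_pairs):
    """Emit one user -> post link (sioc:creator_of) per known post, as N-Triples."""
    lines = []
    for user_uri, posts in sorted(posts_by_creator(post_creator_pairs).items()):
        for post_uri in posts:
            lines.append(f"<{user_uri}> <{SIOC}creator_of> <{post_uri}> .")
    return lines


# Hypothetical exporter data: which user created which post.
pairs = [
    ("http://example.org/post/1", "http://example.org/user/alice"),
    ("http://example.org/post/2", "http://example.org/user/bob"),
    ("http://example.org/post/3", "http://example.org/user/alice"),
]
```

With these statements on the user page, a browser like Tabulator can go from a user to his posts without first walking down through the forums.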


Migration using SIOC data

  • SIOC as a way to migrate blogs, forums, mailing lists, etc. Export your data in RDF and import it into another system.
  • Uldis: achieve "social media contribution portability" using SIOC - Allow different views of same information, as someone might e. g. read mailing lists, but not things you post on your blog


SIOC feeds

We don't have an object "Feed". We only have a linked list of posts, so you can do incremental crawling of posts, but you can't do that for e. g. comments. A crawler has to visit every post on a site and check if it has new comments - very inefficient.

  • We need an object "Feed" (for comment feed, author feed, maybe a generic "event" feed of new items, ...)

Some blog engines support comment feeds - so there would be infrastructure that could be used.

Richard: It would also be solvable on the HTTP level: add a header with the last modification time to e. g. the RDF snippet of a post in order to speed up crawling. Then you could get only things that have been modified since a specified date. The efficiency of this depends on the structure of the data: you don't need to crawl all the comments again when the page they are on has not been modified.

  • add "last modified" to SIOC exporters
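
On the exporter side, the "last modified" idea amounts to answering conditional requests. The sketch below assumes the exporter can look up when a post (including its comments) last changed; the function name is hypothetical, while the Last-Modified and If-Modified-Since headers and the 304 status are standard HTTP.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None):
    """Return (status, headers) for a request to an RDF snippet.

    `last_modified` must be a timezone-aware datetime.
    """
    # HTTP dates have one-second resolution and are expressed in GMT.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable header: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, headers  # nothing new, the crawler can skip the body
    return 200, headers
```

A crawler would store the Last-Modified value it received and send it back as If-Modified-Since on its next visit.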

Uldis: A list of what changed last is easier to implement than a "last modified" header.


Crawling SIOC data

How important is incremental crawling? Does crawling have to be (near) real time? That can be tough for really big sites (like boards.ie with 8 million RDF pages).

Alternatively: use strategies like those Google uses for its indexes (Google also uses feeds, crawls frequently changing pages more often, ...)? Who else in DERI has more experience with smarter crawling?


Storing SIOC data

Give users the possibility to have an RDF store mirror of their blog/forum (synced in real time by receiving pings or by being queried every x minutes). That would make arbitrary SPARQL queries possible.

  • Make some kind of side-kick to any SIOC producing engine, so that the user just has to install one plugin that also provides a SPARQL endpoint.


Provide another way to query SIOC data

Richard would like to see a SIOC API/protocol that gives automatic RDF agents the possibility to do some limited querying.

Currently many people publish snippets of RDF, but there are access patterns which you can't easily model for that. E. g. Tabulator is limited to just browsing through SIOC data space using these snippets (basically just following rdfs:seeAlso links).

Alternatively to a full SPARQL endpoint he suggests extending the SIOC protocol, so that RDF access agents like Tabulator are enabled to do some simple querying - like showing posts of a particular author or from a specified date range (only things that people would typically ask for, not for things that can't be easily answered).
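
The kind of limited querying meant here could be as small as a few URL parameters. In the sketch below the parameter names (author, from, to) and the in-memory post list are assumptions for illustration only; no such SIOC protocol has been defined.

```python
from datetime import date
from urllib.parse import parse_qs


def query_posts(posts, query_string):
    """Filter posts by optional `author`, `from` and `to` query parameters."""
    params = parse_qs(query_string)
    author = params.get("author", [None])[0]
    start = params.get("from", [None])[0]
    end = params.get("to", [None])[0]
    result = []
    for post in posts:
        if author and post["creator"] != author:
            continue
        if start and post["created"] < date.fromisoformat(start):
            continue
        if end and post["created"] > date.fromisoformat(end):
            continue
        result.append(post)
    return result


# Hypothetical posts as an exporter might hold them internally.
posts = [
    {"uri": "http://example.org/post/1", "creator": "http://example.org/user/alice", "created": date(2007, 12, 1)},
    {"uri": "http://example.org/post/2", "creator": "http://example.org/user/bob", "created": date(2007, 12, 10)},
    {"uri": "http://example.org/post/3", "creator": "http://example.org/user/alice", "created": date(2007, 12, 13)},
]
```

For example, the query string "author=http://example.org/user/alice&from=2007-12-05" would select only the third post. Unknown parameters are simply ignored, which keeps the interface restricted to questions that are cheap to answer.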

Richard could collect use cases / ideas for that kind of thing.


Whiteboard 2


Next session

Possible emphasis of next brainstorming session in January: Find more useful applications for SIOC data. See page for that session.

Comments section

Please add your comments here.

These may come in handy for presentations or referencing ideas we have for SIOC. Must see if I can convert from CDR to SVG too. --Cloud 17:22, 17 December 2007 (UTC)