The first and possibly most important aspect of SemTech 2009 is that… it
happened! I must admit that back in April-May, when the
conference’s Web Site did not include any news of the program yet,
I was a bit concerned that the general economic malaise would kill
this year’s conference. OK, I might have been paranoid, but I
think some level of concern was indeed legitimate. And… not only
did the conference happen as planned, but the numbers were
essentially the same as last year’s (over 1000). I think that by
itself is an important sign of the interest in Semantic
Technologies. Kudos to the organizers!
A general trend that was reaffirmed this year: by now, Semantic Web
technologies are the obvious reference points for almost all
presentations, products, etc., that were shown at the event.
RDF(S), RDFa, OWL, SPARQL, etc., have become household names; newer
specs like SKOS or POWDER may not have been as widely referred to
yet, but I am sure that will come, too. Linked Data (and,
more specifically, the Linked Open Data cloud) was almost
ubiquitous this year, while I do not believe that it was even
mentioned last year. That is a huge change (although I am still
waiting for real “user facing” applications of LOD to show up; some,
like Talis’ system deployed at UK universities, were presented, but
not as part of the regular conference). All that being said, I somehow
seem to have missed more sessions than last year, which makes my
impressions more patchy. There were several press interviews that
I could not escape, hallway discussions that were great but made me
miss a presentation here and there… I guess this is what happens
when you have such a number of people around!
Tom Tague (from Open Calais) gave a very nice opening keynote.
His talk was actually not on Open Calais (he did that in
2008), but rather on his experience in talking to different people
who tried to start up new ventures in the Semantic Web area (a
quote from his talk: “in 80% of the discussions I did not
understand what the vendors wanted, and I walked away with my
cheque book intact… Simplify!”). The main areas that he looked at
were tools, social, advertising, search, publishing, user
interface. One of the remarks I liked was on search: in his view
(and I think I agree with that) Semantic Technologies may not be
all that interesting for general search (where statistical, i.e.,
brute force, methods work well) but rather for specialized,
domain-specific search tools (GoPubMed, or applications deployed at,
e.g., Eli Lilly or experimented with at Elsevier, come to mind as
good examples). Similarly, these technologies
are not necessarily of interest for general, “robotic” publication
tools like Google’s news, but for high quality publishing, with
possible editorial oversight (reducing costs and difficulties).
(He also had a nice text on one of his slides: “Web2.0: Take Web
1.0, add a liberal dash of social, generous amounts of user
generated content, atomize your content assets and stir until fully
confused”:-)
Tom Gruber talked about his newest project: SIRI, a super-duper
personal assistant running on an iPhone with a conversational (voice
directed) interface. The group behind it integrates a bunch of info
on the Web (the “usual” stuff like restaurants and travel sites),
categorizes it, and hides the complexities behind a sexy user
interface. The problem I have is that I just do not see how this
would scale. I see one of the major promises of the Semantic Web as
getting data in RDF out there, so that such essentially mash-up
applications would become much easier to create and maintain. Until
then, it is really tedious… On a more personal note, I am not sure
I would like the voice conversational interface. I know that I have
never used the voice commands on my phone for example; I do not
feel comfortable with it. But, well, that is probably only me…
Chime Ogbuji made a really nice presentation on the system they
have developed at the Cleveland Clinic. Great combination of RDF,
OWL, and SPARQL. The interesting aspect (for me) was the use of
an expert system called Cyc, which is used to convert the
doctor’s question in natural language (insofar as a question full
of medical jargon can be considered as “natural”:-) into,
essentially, a SPARQL query. The medical ontologies are used to
direct this conversion process, and the triple store can then be
queried with the generated query. Impressive work. (Part of it
was documented in a W3C use case, but this presentation had a
different emphasis.)
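
To make the idea of "question in, SPARQL out" a bit more concrete, here is a deliberately naive sketch of the very last step only: turning recognized terms into a SPARQL query string. It does not reflect how Cyc or the Cleveland Clinic system actually works; the term-to-class mapping, the `ex:` namespace, and the `ex:underwent` property are all invented for illustration.

```python
# Hypothetical mapping from jargon terms to ontology classes (invented for illustration).
TERM_TO_CLASS = {
    "bypass": "ex:CoronaryBypass",
    "stent": "ex:StentPlacement",
}


def question_to_sparql(question):
    """Build a SPARQL query selecting patients who underwent every recognized procedure."""
    words = question.lower().replace("?", " ").split()
    classes = [TERM_TO_CLASS[w] for w in words if w in TERM_TO_CLASS]
    if not classes:
        raise ValueError("no known term in question")
    lines = [
        "PREFIX ex: <http://example.org/clinic#>",
        "SELECT DISTINCT ?patient WHERE {",
    ]
    # One pair of triple patterns per recognized term; the ontology class
    # constrains the type of each procedure.
    for i, cls in enumerate(classes):
        lines.append(f"  ?patient ex:underwent ?proc{i} .")
        lines.append(f"  ?proc{i} a {cls} .")
    lines.append("}")
    return "\n".join(lines)


query = question_to_sparql("Which patients had a bypass and a stent?")
print(query)
```

A real system would of course need proper language analysis and reasoning over the ontology rather than a word lookup; the point is only that the output is an ordinary query that any SPARQL endpoint could run.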
Unfortunately, I had to skip Peter Mika’s presentation on the
SearchMonkey experiences; I will have to look at his slides… But, as
a last minute addition to the program, the organizers succeeded in
getting Othar Hansson and Kavi Goel to talk about Google’s rich
snippets. I blogged about this a few weeks ago, but this presentation
made the goal of the project way more understandable. Essentially, by
recognizing specific microformat or RDFa vocabularies, they can
improve the user experience by adding extra information to the
search result. It is interesting to observe the
difference between Yahoo! and Google in this respect: both of them
use microformats/RDFa for the same general goal but, whereas Yahoo!
relies on the community providing applications and on users
personalizing their own search result page, Google controls the
output in a generic way that does not require further user actions.
It will be interesting to see how these differences influence
people’s usage patterns. There was some discussion of Google’s
choice of vocabularies; the presenters made it quite clear that
they are perfectly happy using other vocabularies (e.g., vCard or
FOAF) if they become pervasive, and this is a discussion that
Google plans to have with the community. There is of course a
chicken-and-egg issue there (if a vocabulary is known by Google,
then it will be more widely used, too), and this is clearly an area
to discuss further. But these are details. The very fact that both
Yahoo! and Google look at microformats and RDFa is what
counts! Who would have thought that just a year ago?
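
As a rough illustration of what "recognizing an RDFa vocabulary" means on the consumer side, here is a minimal sketch that collects `property` values from a page fragment. It is not a conforming RDFa processor (it ignores prefixes, nesting, `about`, and so on) and says nothing about how Google or Yahoo! actually process pages; the sample markup, using FOAF terms, is made up.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements carrying an RDFa `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._current = None  # property whose text content we are waiting for

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "property" in attrs:
            if "content" in attrs:
                # An explicit content attribute takes the place of the element text.
                self.pairs.append((attrs["property"], attrs["content"]))
            else:
                self._current = attrs["property"]

    def handle_data(self, data):
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


# Made-up page fragment using FOAF terms.
SAMPLE = """
<div xmlns:foaf="http://xmlns.com/foaf/0.1/" typeof="foaf:Person">
  <span property="foaf:name">Jane Doe</span>
  <span property="foaf:nick" content="jd"></span>
</div>
"""

collector = PropertyCollector()
collector.feed(SAMPLE)
pairs = collector.pairs
print(pairs)
```

Once such pairs are available, a search engine can decide which of them it knows how to display next to a result, which is exactly where the choice of supported vocabularies matters.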
I was not particularly impressed by the Semantic Search panel. I
had the impression that the participants did not really know what
they should say and talk about:-(
Nice presentation by Jeffrey Smitz from Boeing on a system
called SPARQL Server Pages. Essentially, the user writes pages
structured much like, say, a PHP page, i.e., a mixture of HTML tags
and server “calls”, except that these “calls” refer to SPARQL queries
against a triple store on the server. Their system also includes
some rule based OWL reasoning on the server side, although I am not
sure I got all the details. All in all, the system seemed a bit
complex, but the general approach is interesting! And it is nice to
see that a company like Boeing seems to make good use of
RDF+OWL+SPARQL; it would be good to know more…
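
The general approach, as I understood it, can be mimicked with a toy example: a page template that mixes HTML with embedded queries, which the server expands against its triple store before sending the page out. To be clear, everything below is invented for illustration: the `<query>` tag, the restriction to a single triple pattern, and the data. It is not the syntax of the Boeing system, and a real one would run full SPARQL queries.

```python
import re
from html import escape

# Hypothetical in-memory "triple store": a list of (subject, predicate, object).
TRIPLES = [
    ("ex:part1", "ex:name", "Wing flap"),
    ("ex:part2", "ex:name", "Landing gear"),
    ("ex:part1", "ex:supplier", "ex:acme"),
]


def match(pattern, triples):
    """Return variable bindings for one triple pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable already bound in this pattern must match the same value.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Invented tag: <query select="?var">subject predicate object</query>
TAG = re.compile(r'<query select="(\?\w+)">(.*?)</query>', re.S)


def render(template, triples):
    """Replace every embedded query with an HTML list of the selected variable's values."""

    def expand(m):
        var, body = m.group(1), m.group(2)
        rows = match(body.split(), triples)
        items = "".join(f"<li>{escape(row[var])}</li>" for row in rows)
        return f"<ul>{items}</ul>"

    return TAG.sub(expand, template)


PAGE = '<h1>Parts</h1><query select="?name">?s ex:name ?name</query>'
html_out = render(PAGE, TRIPLES)
print(html_out)
```

The attraction is the same as with PHP: the page author stays in HTML and only drops in queries where data is needed.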
I missed Zepheira’s presentation on freemix, which is a shame, but,
well, it happens. But I did play with freemix before travelling to San
Jose; I called it “Exhibit for the masses”. And this, I
think, is a fair characterization. David Huynh’s Exhibit is a really
nice tool, but it is not easy to use. With freemix, on the other
hand, it took me about 2 minutes to make a visualization
of a JSON data set I used for an Exhibit page elsewhere…
Andraz Tori talked about Common Tag, a small
vocabulary that, for example, can be used when marking up texts
with tags (something that engines like Zemanta or Open Calais do).
Bringing the RDF and the tagging worlds together is really
important; I am very curious how successful this initiative will
be…
The keynote on the last day was from the New York Times (by Evan
Sandhaus and Robert Larson). It was quite interesting to see how a
reputable newspaper like the NYT has developed a tradition of
indexing, abstracting, and cataloging articles, and how these are
archived and searched. Impressive. It is also great that the
NYT Annotated Corpus has been released to the research
community. I did not know about that and, I presume, this must be a
great resource for a lot of people active in the area of, say,
natural language processing. Finally they announced their intention
to release their thesaurus in a Semantic Web format, to add a
“blob” to the Linked Data Cloud. They still have to work out the
details (and expect feedback from the community) and I would hope
they would publish a SKOS thesaurus and might even annotate the
news items on their web site using this thesaurus in RDFa. But
something in this space will happen, that is for sure!
Other reputable newspapers, like Le Monde, the Guardian, NRC
Handelsblad, El País, will you follow?
I also had my share of talking: I gave an intro tutorial
on the Semantic Web, gave an overview of
what is happening at W3C (quite a lot this year, including the
finalization of POWDER, OWL 2, and SKOS!) and participated in an
OWL 2 panel (with Mike Smith, Zhe Wu, Deb McGuinness, and Ian
Horrocks). I was quite happy with the tutorial and the way the
panel went; the audience for the talk could have been a bit larger.
But, well…
It was a long week, long trips, not much sleep… but well worth
it!