November 20, 2009

W3C Q&A Weblog

Default Prefix Declaration

1. Disclaimer

The ideas behind the proposal presented here are neither particularly new nor particularly mine. I've made the effort to write this down so anyone wishing to refer to ideas in this space can say "Something along the lines of [this posting]" rather than "Something, you know, like, uhm, what we talked about, prefix binding, media-type-based defaulting, that stuff".

2. Introduction

Criticism of XML namespaces as an appropriate mechanism for enabling distributed extensibility for the Web typically targets two issues:

  1. Syntactic complexity
  2. API complexity

Of these, the first is arguably the more significant, because the number of authors exceeds the number of developers by a large margin. Accordingly, this proposal attempts to address the first problem, by providing a defaulting mechanism for namespace prefix bindings which covers the 99% case.

3. The proposal

  • Binding: define a trivial XML language which provides a means to associate prefixes with namespace names (URIs);
  • Invoking from HTML: define a link relation dpd for use in the (X)HTML header;
  • Invoking from XML: define a processing instruction xml-dpd and/or an attribute xml:dpd for use at the top of XML documents;
  • Defaulting by Media Type: implement a registry which maps from media types to a published dpd file;
  • Semantics: define a precedence, which operates on a per-prefix basis, namely xmlns: >> explicit invocation >> application built-in default >> media-type-based default, and a semantics in terms of namespace information items or an appropriate data-model equivalent on the document element.
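The per-prefix precedence rule can be read as a layered lookup: for each prefix, the first source that binds it wins. Below is a minimal Python sketch of that reading. It uses collections.ChainMap, which searches its mappings in order. The function name effective_bindings and the plain-dict inputs are hypothetical, because the proposal does not define any API.

```python
from collections import ChainMap

def effective_bindings(xmlns, explicit, builtin, media_type):
    """Merge prefix -> namespace URI maps on a per-prefix basis.

    Earlier arguments take precedence, following the proposal's order:
    xmlns: >> explicit invocation >> application built-in default
    >> media-type-based default. The key "" stands for the default namespace.
    """
    # ChainMap returns the value from the first mapping that contains the key.
    return dict(ChainMap(xmlns, explicit, builtin, media_type))

bindings = effective_bindings(
    {"xf": "urn:example:my-xforms"},   # a hypothetical in-document xmlns:xf
    {},                                # no explicit dpd invocation
    {},                                # no application built-in default
    {"xf": "http://www.w3.org/2002/xforms",
     "svg": "http://www.w3.org/2000/svg"},
)
print(bindings["xf"], bindings["svg"])
```

Because precedence is decided per prefix, a document that redeclares xf still gets svg from the media-type default.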

4. Why prefixes?

XML namespaces provide two essentially distinct mechanisms for 'owning' names, that is, preventing what would otherwise be a name collision by associating names in some way with some additional distinguishing characteristic:

  1. By prefixing the name, and binding the prefix to a particular URI;
  2. By declaring that within a particular subtree, unprefixed names are associated with a particular URI.

In XML namespaces as they stand today, the association with a URI is done via a namespace declaration which takes the form of an attribute, and whose impact is scoped to the subtree rooted at the owner element of that attribute.

Liam Quin has proposed an additional, out-of-band and defaultable, approach to the association for unprefixed names, using patterns to identify the subtrees where particular URIs apply. I've borrowed some of his ideas about how to connect documents to prefix binding definitions.

The approach presented here is similar-but-different, in that its primary goal is to enable out-of-band and defaultable associations of namespaces to names with prefixes, with whole-document scope. The advantages of focussing on prefixed names in this way are:

  • Ad-hoc extensibility mechanisms typically use prefixes. The HTML5 specification already has at least two of these: aria- and data-;
  • Prefixed names are more robust in the face of arbitrary cut-and-paste operations;
  • Authors are used to them: For example XSLT stylesheets and W3C XML Schema documents almost always use explicit prefixes extensively;
  • Prefix binding information can be very simple: just a set of pairs of prefix and URI.

Provision is also made for optionally specifying a binding for the default namespace at the document element, primarily for the media type registry case, where it makes sense to associate a primary namespace with a media type.

5. Example

If this proposal were adopted, and a dpd document for use in HTML 4.01 or XHTML1:

<dpd ns="http://www.w3.org/1999/xhtml">
 <pd p="xf" ns="http://www.w3.org/2002/xforms"/>
 <pd p="svg" ns="http://www.w3.org/2000/svg"/>
 <pd p="ml" ns="http://www.w3.org/1998/Math/MathML"/>
</dpd>

was registered against the text/html media type, the following would result in a DOM with html and body elements in the XHTML namespace and an input element in the XForms namespace:

<html>
 <body>
  <xf:input ref="xyzzy">...</xf:input>
 </body>
</html>
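To make the example concrete, here is a minimal Python sketch of how a processor might load the dpd document above and expand the names in the HTML fragment into namespace-qualified ("Clark notation") names. It assumes the dpd/pd/p/ns vocabulary exactly as written in the example. The helper names load_dpd and expand are hypothetical.

```python
import xml.etree.ElementTree as ET

DPD = """<dpd ns="http://www.w3.org/1999/xhtml">
 <pd p="xf" ns="http://www.w3.org/2002/xforms"/>
 <pd p="svg" ns="http://www.w3.org/2000/svg"/>
 <pd p="ml" ns="http://www.w3.org/1998/Math/MathML"/>
</dpd>"""

def load_dpd(text):
    """Return a prefix -> URI map; the key "" holds the default namespace."""
    root = ET.fromstring(text)
    bindings = {"": root.get("ns")} if root.get("ns") else {}
    for pd in root.iter("pd"):
        bindings[pd.get("p")] = pd.get("ns")
    return bindings

def expand(qname, bindings):
    """Expand an element name to {URI}local; unbound prefixes are left as-is.

    (The default namespace applies to element names only, so this helper
    should not be used on unprefixed attribute names such as ref.)
    """
    prefix, _, local = qname.rpartition(":")
    uri = bindings.get(prefix)
    return f"{{{uri}}}{local}" if uri else qname

bindings = load_dpd(DPD)
for name in ("html", "body", "xf:input"):
    print(expand(name, bindings))
```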

by Henry S. Thompson at November 20, 2009 02:06 PM

November 18, 2009

MWI Team Blog

The Pythia casts mobileOK spells

Web authoring tools ease the publication process, but simplicity comes with some loss of control over the generated content. There is hardly anything an authoring tool user can do to improve her content when the W3C mobileOK Checker reports that pop-up windows should not be used. So what?! I do not have any of these pop-up links in my content!

The underlying theme can be updated, but this approach only goes so far: it cannot, e.g., split a post into multiple pages when that would work best on mobile devices. Authoring tools that do not provide content adaptation mechanisms need to be extended to be able to serve mobile-friendly content to mobile devices.

Lately I have been working on an open-source suite of tools written in PHP, named mobileOK Pythia, designed to help generate mobileOK content and, more generally, to help adapt content to fit the properties of the requesting device. Here is a short overview of the outcome of this work. More information (including crucial information about the choice of Pythia as a name ;)) can be found in the documentation of mobileOK Pythia.

This work is part of the MobiWeb 2.0 project supported by the European Union's 7th Research Framework Programme (FP7).

Plug-ins for WordPress and Joomla!

WordPress and Joomla home pages with the mobileOK Pythia plug-in

From a user's point of view, the visual and hopefully useful outcome of this work is the creation of the mobileOK Pythia plug-ins for WordPress and Joomla! that make it possible to generate mobileOK content with these tools.

The plug-ins feature:

  • Device identification: based on WURFL, an open-source Device Description Repository (DDR) published as an XML file, and accessed through a standard DDR Simple API interface.
  • Content adaptation to fit the properties of the requesting device in terms of e.g. screen size, script support, page size limit.
  • Theme switching: possibility to switch to a more mobile-friendly theme when the requesting device is identified as mobile.
  • POWDER: a machine-readable mobileOK claim for the Web site can be automatically created and served using a POWDER document. The POWDER document is made discoverable through the addition of a Link HTTP header field as described in the POWDER Primer.
  • W3C mobileOK Checker link: a link to the W3C mobileOK Checker is added next to the authoring input form, so that authors can check the mobile-friendliness of their content while they are writing it.
  • mobileOK theme: a mobileOK template may be installed with the plug-in.
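The POWDER discovery mechanism in the list above comes down to one extra response header. The plug-ins themselves are PHP. The minimal Python WSGI sketch below only shows the shape of that header. POWDER_URL is a hypothetical location. The describedby relation and the application/powder+xml media type reflect our understanding of the POWDER specifications, so check them against the POWDER Primer before relying on them.

```python
POWDER_URL = "/powder.xml"  # hypothetical location of the site's POWDER document

def app(environ, start_response):
    """Minimal WSGI app that advertises a POWDER document via a Link header."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        # Points user agents at the machine-readable mobileOK claim.
        ("Link", f'<{POWDER_URL}>; rel="describedby"; type="application/powder+xml"'),
    ]
    start_response("200 OK", headers)
    return [b"<!DOCTYPE html><title>Home</title>"]
```

Running it under wsgiref.simple_server and requesting any page should show the Link header alongside the HTML.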

The development of a third plug-in for Moodle has started but it is still work in progress.

There exist other plug-ins that provide similar functionality (see for instance WordPress Mobile Plugin, WordPress Mobile Pack, Mobilebot 1.0 or WAFL: Mobile Content Adaptation). mobileOK Pythia separates tool-specific functionalities from tool-agnostic libraries to ease porting to other tools. In particular, the plug-ins wrap the same extensible libraries:

  • AskPythia to identify and retrieve the properties of the requesting device.
  • TransPythia to adapt content based on the properties of the requesting device.

AskPythia

AskPythia is an open-source conforming implementation of the Device Description Repository Simple API in PHP. It is not a DDR but a wrapper to existing DDRs.

AskPythia ships with an implementation on top of the WURFL database that maps WURFL capabilities to properties defined in the Device Description Repository Core Vocabulary standard. Support for other DDRs is welcome!
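AskPythia's own PHP mapping is not reproduced here. As a rough illustration of what "mapping WURFL capabilities to Core Vocabulary properties" involves, here is a minimal Python sketch. The dictionary entries are assumptions for the example. The intent is that resolution_width, resolution_height, png, gif and jpg are WURFL capability names, and that displayWidth, displayHeight and imageFormatSupport are DDR Core Vocabulary properties. Check both vocabularies before reusing the mapping. The function name is hypothetical.

```python
# Assumed mapping (verify against WURFL and the DDR Core Vocabulary).
CAPABILITY_TO_PROPERTY = {
    "resolution_width": "displayWidth",
    "resolution_height": "displayHeight",
}
# WURFL boolean image capabilities -> format names for imageFormatSupport.
IMAGE_CAPABILITIES = {"png": "png", "gif": "gif", "jpg": "jpeg"}

def to_core_vocabulary(caps):
    """Translate a dict of WURFL-style capability strings into core properties."""
    props = {}
    for cap, prop in CAPABILITY_TO_PROPERTY.items():
        if cap in caps:
            props[prop] = int(caps[cap])  # WURFL values are strings
    formats = [fmt for cap, fmt in IMAGE_CAPABILITIES.items() if caps.get(cap) == "true"]
    if formats:
        props["imageFormatSupport"] = formats
    return props
```

Keeping the mapping in data rather than code is what makes a wrapper like this extensible to other DDRs.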

Check AskPythia's documentation for more information.

TransPythia

TransPythia is a transcoding library that adapts content (HTML, CSS, images) based on the capabilities of the requesting device. The library ships with a set of transcoding actions that are particularly adapted to mobile devices and that may be extended as needed.

Main transformations are:

  • Image conversion and adaptation: adapts images to match the requesting device's list of supported image formats and to fit the screen size. Removes images that cannot be converted or that are still too big for mobile consumption after conversion.
  • Pagination: a generic pagination algorithm that may be used to paginate HTML pages or HTML fragments when the requesting device is identified as a mobile device.
  • Table linearization: removes nested tables and linearizes tables when the requesting device does not support them.
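TransPythia's pagination algorithm is not described here. The following is only a greedy Python sketch of the general idea under a stated assumption: the HTML has already been split into top-level blocks, and each page must stay under a byte budget such as a device's page size limit. The function name paginate is hypothetical.

```python
def paginate(blocks, max_bytes):
    """Greedily group HTML blocks into pages of at most max_bytes (UTF-8).

    A single block larger than max_bytes still gets a page of its own,
    since this sketch does not split inside a block.
    """
    pages, current, size = [], [], 0
    for block in blocks:
        n = len(block.encode("utf-8"))
        if current and size + n > max_bytes:
            pages.append(current)  # close the current page
            current, size = [], 0
        current.append(block)
        size += n
    if current:
        pages.append(current)
    return pages
```

A real transcoder would also need to split oversized blocks and add next/previous links between pages.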

Check TransPythia's documentation for more information.

Feedback

If you would like to comment, contribute, report bugs or simply tell us what you think, you are very welcome! Feel free to send an email to the public-mobile-dev@w3.org mailing list (with public archives).

by Francois Daoust at November 18, 2009 04:55 PM

November 16, 2009

W3C Q&A Weblog

W3C community bridges unicorns and werewolves #tpac09


The theme photo for W3C presentations at TPAC09 showed the Natural Bridges State Beach of Santa Cruz, California. We met in Santa Clara (not far from Santa Cruz) on 2-6 November in order to bridge various communities and bring them together. For example, bringing together the HTML 5 browser folks and the extensibility folks was a goal. We joked this goal was called "Unnatural Bridges".

Broadening the W3C community was one of the themes of TPAC09, and was reflected in talks as well as participation.

For the first time ever, we invited the public to gather for an afternoon of discussion and networking, the Developer Gathering (the minutes are now available). Ian Jacobs lined up fantastic speakers who regaled us with the latest on various open standards in development. Feedback from the #w3cdev demos included many "very cool", "absolutely amazing", "video element", "impressive", "geolocation", "accelerometer", "APIs", "nice possibilities", "features".

I thought the event went very well and think W3C should organize more of them. Please let us know what sort of event would appeal to you (e.g., with speakers as we had this time, or more like a bar camp, or a mix). If you blogged about #w3cdev, please share a pointer in a comment!

TPAC is our biggest yearly event. Each year about 300 people who participate in various W3C groups meet face-to-face to exchange ideas, resolve technology issues, and socialize. My sense is that for most people involved, TPAC is their favorite W3C meeting of the year.

We tracked micro-blogosphere feedback on #tpac09. We expanded the number of people we follow (I'm not yet quite caught up with the additions I wanted to make, so excuse us if we're not yet following you). Likewise, on the occasion of TPAC and the Developer Gathering, a significant number of people also expanded their contact lists and started to follow us (yay!). In Santa Clara @dckc said, '@w3c has ~5000 followers'. This is still growing: ~6300 now!

A bit of a mystery to me as I reviewed the tweets is the unicorn meme. I have no idea who started it and why, but unicorns were mentioned, portrayed (one even made it into our theme photo!), tweeted, and interjected.

Oh, and I mentioned werewolves in the title. Although not (yet) a fixture on our meeting agendas, werewolves attacked the villagers almost every night at TPAC! With fantastic emcee @dontcallmedom leading, many people enjoyed the battles of minority against majority, the games of suspicion, trust, lies, doubts and beliefs. Nightly werewolf encounters are such fun in person.

J'accuse! Werewolves game photo. By Amy van der Hiel

If you attended TPAC09 and would like to give feedback, we'd appreciate it if you took the WBS survey. I welcome additional feedback, or a pointer to your blog entry in a comment to this entry.

by Coralie Mercier at November 16, 2009 10:26 PM

November 13, 2009

Michael(tm) Smith

WebKit adds support for the HTML5 <ruby> element

If you don’t know what the HTML5 ruby element is, you might want to take a minute to first read the section about the ruby element in the HTML5 specification and/or the Wikipedia article on ruby characters. To quote from the HTML5 description of the ruby element:

The ruby element allows one or more spans of phrasing content to be marked with ruby annotations. Ruby annotations are short runs of text presented alongside base text, primarily used in East Asian typography as a guide for pronunciation or to include other annotations. In Japanese, this form of typography is also known as furigana.

I give a specific example further down, but for now I want to first say that the really great news about the ruby element is that last week Google Chrome developer Roland Steiner checked in a change (r50495, and see also related bug 28420) that adds ruby support to the trunk of the WebKit source repository, thus making the ruby feature available in WebKit nightlies and Chrome dev-channel releases.

A simple example

The following is a simple example of what you can do with the ruby element; make sure to view it in a recent nightly or dev-channel release. Note that the text is an excerpt from the source of a ruby-annotated online copy of the short story Run, Melos, Run by the writer Osamu Dazai, which I came across by way of Piro’s info page for his XHTML Ruby add-on for Firefox (which I say a bit more about further below).

きのうの豪雨で山の水源地は<ruby>氾濫<rp>(</rp>
<rt>はんらん</rt><rp>)</rp></ruby>し、濁流
<ruby>滔々<rp>(</rp><rt>とうとう</rt><rp>)</rp>
</ruby>と下流に集り、猛勢一挙に橋を破壊し、どうどうと
響きをあげる激流が、<ruby>木葉微塵<rp>(</rp>
<rt>こっぱみじん</rt><rp>)</rp></ruby>に<ruby>橋桁
<rp>(</rp><rt>はしげた</rt><rp>)</rp></ruby>
を跳ね飛ばしていた。

If you don’t happen to have Japanese fonts installed, here’s a screenshot of the source for reference:

ruby source markup

Notice that the actual annotative ruby text (which I’ve highlighted in yellow in the source just for the sake of emphasis) is marked up using the rt element as a child of the ruby element, and the text being annotated is the node immediately preceding that rt element, also a child of the ruby element. The final new element in the mix is the rp element, which simply wraps parentheses around the annotative ruby text, for graceful fallback in browsers that don’t support ruby.
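To make that structure concrete, here is a small Python sketch. It generates the same ruby/rp/rt pattern as the excerpt above and approximates the fallback that a browser without ruby support shows, where every element's text flows inline. The helper names ruby and fallback_text are hypothetical. The tag-stripping regex is only a rough stand-in for real rendering.

```python
import re
from html import escape

def ruby(base, reading):
    """Wrap base text in <ruby>: the reading goes in <rt>, and the <rp>
    parentheses are only displayed by browsers without ruby support."""
    return (f"<ruby>{escape(base)}<rp>(</rp>"
            f"<rt>{escape(reading)}</rt><rp>)</rp></ruby>")

def fallback_text(markup):
    """Roughly what a non-ruby browser shows: all element text inline,
    so the reading appears in parentheses after its base."""
    return re.sub(r"<[^>]+>", "", markup)

fragment = "濁流" + ruby("滔々", "とうとう") + "と下流に集り"
print(fragment)
print(fallback_text(fragment))  # 濁流滔々(とうとう)と下流に集り
```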

So here’s the rendered view of that same text:

見よ、前方の川を。きのうの豪雨で山の水源地は氾濫(はんらん)し、濁流滔々(とうとう)と下流に集り、猛勢一挙に橋を破壊し、どうどうと響きをあげる激流が、木葉微塵(こっぱみじん)に橋桁(はしげた)を跳ね飛ばしていた。

And here is a screenshot of how it should look in a recent nightly or dev-channel release:

ruby rendered view

Notice that the annotative ruby text is displayed above the ruby base it annotates. If you instead view this page in a browser that doesn’t support the ruby feature, you’ll see that the ruby text is just shown inline, in parentheses following the ruby base it annotates. So the feature falls back gracefully in older browsers.

If you’re not accustomed to reading printed books and magazines and such in Japanese, you may be feeling underwhelmed by the example above. But for authors and developers and content providers in Japan who want to finally be able to use on the Web this very common feature of Japanese page layout from the print world, getting ruby support into WebKit is a huge win, and something to be very excited about.

Support in other browsers

Current versions of Microsoft Internet Explorer also have native support for ruby, and you can also get ruby support in Firefox by installing Piro’s XHTML Ruby add-on (and for more details, see his XHTML ruby add-on info page) — so we are well on the way to seeing the HTML5 ruby feature supported across a range of browsers.

by Michael(tm)Smith (mike@w3.org) at November 13, 2009 02:36 AM

November 12, 2009

Ivan's Blog

Pay to be free…


I may not be well informed, so this may be a known approach for some of you, but it is the first time I have seen this…

There has been a tension between (scientific) publishers and authors for a while on whether one is allowed to put one’s publication on the Web. When dealing with traditional publishers the author usually gives away his/her copyright and the papers are rarely available on the Web (which is a source of constant frustration to readers). Fortunately, this is not always the case; for example, the proceedings of the World Wide Web conference series are published by ACM, but the papers are nevertheless available on the Web for free (thanks to IW3C2).

Well, here is a counter-proposal from a publisher that is quite amazing. A Hungarian publisher, Akadémiai Kiadó, offers authors a deal called the “Optional Open Article”: if you pay the nice sum of 900€, the paper is also put into an online edition and made freely available on the Web. (That it is then freely available is stated clearly in the agreement posted on the web site.) Pay for your freedom. Isn’t this wonderful?

And, to make it clear: this is a very prestigious publisher in Hungary, related to the Hungarian Academy of Sciences and, therefore, the prime local publisher for Hungarian scientists…

I find it appalling.  But this may only be me.

Posted in Social aspects, Work Related Tagged: publishing

by Ivan Herman at November 12, 2009 04:00 PM

November 11, 2009

W3C Q&A Weblog

W3C Cheatsheet for developers

Yesterday, as part of the W3C Technical Plenary day, I got the opportunity to introduce a new tool that I had been working on over the past few weeks, the W3C Cheatsheet for Web developers.

Screenshot of the W3C Cheatsheet on a phone

This cheatsheet aims to provide, in a very compact and mobile-friendly format, a compilation of useful knowledge extracted from W3C specifications (at this time, CSS, HTML, SVG and XPath), complemented by summaries of guidelines developed at W3C, in particular the WCAG2 accessibility guidelines, the Mobile Web Best Practices, and a number of internationalization tips.

Its main feature is a lookup search box: one can start typing a keyword, get a list of matching properties/elements/attributes/functions in the above-mentioned specifications, and then get further details on the one of interest by selecting it.
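How the cheatsheet implements its search box is not described here. The sketch below is only a minimal Python illustration of incremental, case-insensitive prefix lookup over a sorted index using bisect. The class name PrefixIndex and the sample entries are hypothetical.

```python
import bisect

class PrefixIndex:
    """Sorted index of names supporting case-insensitive prefix lookup."""

    def __init__(self, names):
        pairs = sorted((name.lower(), name) for name in set(names))
        self.keys = [key for key, _ in pairs]     # lowercase, sorted
        self.names = [name for _, name in pairs]  # original spelling, same order

    def lookup(self, typed, limit=10):
        """Return up to `limit` names starting with `typed`, ignoring case."""
        typed = typed.lower()
        i = bisect.bisect_left(self.keys, typed)  # first key >= typed
        matches = []
        while i < len(self.keys) and self.keys[i].startswith(typed) and len(matches) < limit:
            matches.append(self.names[i])
            i += 1
        return matches

index = PrefixIndex(["border", "border-color", "body", "viewBox", "abbr"])
print(index.lookup("bor"))   # ['border', 'border-color']
```

Because matches are contiguous in sorted order, each keystroke costs one binary search plus the matches returned.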

The early feedback received both from TPAC participants after the demo and from the microblogging community has been really positive and makes me optimistic that this tool is filling a useful role.

This is very much a first release, and there are many aspects that will likely need improvements over time, in particular:

  • I would like the cheatsheet to cover more content — from specifications not yet released as standards as well as from topics not yet covered (e.g. JavaScript interfaces),
  • some people have reported that there might be accessibility problems with the current interface, which I’m eager to fix once I get specific bug reports,
  • the cheatsheet doesn’t work in IE6 (and probably not in later versions either), and it would be nice to make it work at least somewhat there.

The code behind the cheatsheet is already publicly available, and I’m hoping others will be interested to join me in developing this tool — I’m fully aware that the first thing that will need to get others involved will be some documentation on the architecture and data formats used in the cheatsheet, and I’m thus hoping to work on that in the upcoming few weeks.

In the meantime, I very much welcome bug reports and suggestions for improvements, either by private email to me (dom@w3.org) or preferably to the publicly archived mailing list public-qa-dev@w3.org.

by Dominique Hazaël-Massieux at November 11, 2009 04:32 PM

November 07, 2009

Don't call me DOM

Using /etc/xml/catalog with org.apache.xml.resolver

I have just reported the bug in the w3c-dtd-xhtml Ubuntu package that had prevented me from using the Apache XML Catalog resolver to load local XHTML DTDs rather than the online ones when using the Saxon XSLT processor.

Hitting the online DTDs on every invocation of Saxon unnecessarily burdens the W3C Web site. I had already found guidance on how to use the Apache XML Catalog resolver to avoid that, but it wouldn’t work with the default XML catalog list provided by Ubuntu in /etc/xml/catalog for the XHTML DTDs.

After some investigation, it appeared that the use of a bogus URL as a SystemID in the intermediary XHTML catalog files prevents the proper parsing of these catalogs, and thus makes the local DTDs undiscoverable.

With the patch provided in my bug report, I can now happily use /etc/xml/catalog with Saxon and never hit the network when transforming XHTML files:


java -cp /usr/share/java/xml-commons-resolver-1.1.jar:path-to-saxon8/saxon8.jar \
     -Dxml.catalog.files=/etc/xml/catalog \
     -Dxml.catalog.verbosity=1 \
     net.sf.saxon.Transform -novw \
     -r org.apache.xml.resolver.tools.CatalogResolver \
     -x org.apache.xml.resolver.tools.ResolvingXMLReader \
     -y org.apache.xml.resolver.tools.ResolvingXMLReader \
     XMLFile XSLTFile
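Conceptually, a catalog resolver just maps public and system identifiers to local copies. If no entry matches, for example because a catalog file could not be parsed, the parser falls back to fetching the DTD from the network. The minimal Python sketch below shows that lookup for an OASIS-style catalog. It is not the Apache resolver's code. It ignores nextCatalog, delegate*, rewriteSystem and the prefer attribute, and simply checks system entries before public ones. The local file path is a hypothetical example.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"

# Hypothetical catalog; the file:// path is illustrative only.
CATALOG = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="file:///usr/share/xml/xhtml/xhtml1-strict.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="file:///usr/share/xml/xhtml/xhtml1-strict.dtd"/>
</catalog>"""

def load_catalog(text):
    """Parse system and public entries into two lookup tables."""
    root = ET.fromstring(text)
    systems = {e.get("systemId"): e.get("uri") for e in root.iter(NS + "system")}
    publics = {e.get("publicId"): e.get("uri") for e in root.iter(NS + "public")}
    return systems, publics

def resolve(catalog, public_id, system_id):
    """Return a local URI, or None (meaning the DTD would be fetched remotely)."""
    systems, publics = catalog
    if system_id in systems:
        return systems[system_id]
    if public_id in publics:
        return publics[public_id]
    return None
```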

by Dom at November 07, 2009 11:02 AM

November 06, 2009

Ivan's Blog

ISWC2009 2-3


Second day

In fact, there is much less to say… In the morning I split my time between two workshops: I was at the Uncertainty Reasoning on the SW one for a while, but then I was asked to take part in a panel at the Semantics for the Rest of Us one, so I had to switch. This was a bit unfortunate, because I could not really ‘dive in’ to either of them. And my afternoon was taken up by ‘networking’, catching up with people on many, many issues that are not worth blogging about (yet?).

I listened to Kathryn Laskey’s presentation on how to combine probability theory in the mathematical sense (the good old Kolmogorov axiomatic theory on probability that I learned at university in a distant past…) with first order logic. I cannot claim to have really understood all the details but it made me curious enough to put reading her paper on my to do list…

As for the panel “Little vs Large Semantics: What’s next for the Semantic Web languages?”, with Leigh, Kendall and Ora on the panel besides me… it was not that exciting, I must admit. Maybe the main message I take away from it was the passionate request of Chris Welty to re-open RDF (see also Pat’s keynote below!).

Third day (well, first real conference day)

Preamble: I would have liked to add links to the papers, but I couldn’t: I have not found the papers on the Web, neither on Springer’s site nor elsewhere. I may have missed a reference somewhere; if somebody knows, please tell me. But if the papers are not available, I think it is a shame…

The conference began with a keynote by Pat Hayes. Entertaining and also thought-provoking; Pat is a great speaker. What really interested me was his talk on ‘RDF Redux’; I was actually anxious to hear it at SemTech last June, but he had to call it off back then, so he gave it here instead. This is typically the kind of talk that needs more thinking afterwards to understand (and Pat has promised to write it down!), but he essentially proposed to rethink and redo some of the fundamentals of RDF semantics. Instead of the set-based model theory we have today, which makes the treatment of b-nodes, shall we say, a bit complicated (some would use harsher words:-), we should consider RDF graphs as ‘things’ on a ‘surface’ (think of it as a real surface on a sheet of paper) and b-nodes as just ‘scratches on that surface’. (A bit like the ‘context’ of a graph?) Because these surfaces differ from one graph to another, a merge in fact creates a new surface on which the unified graph is put, and the handling of b-nodes becomes natural (instead of the ‘renaming’ procedure that the current semantics document describes). Pat claims that the whole semantics could be rewritten that way without changing any of the current RDF implementations. But one can go one step further: there may be different kinds of surfaces (e.g., negations), surfaces can have a name (a bit like named graphs), and all of this can be put together to provide a powerful semantics for these entities. His further claim was that such an extended semantics of RDF could be powerful enough to describe, conceptually, RDFS or even OWL, i.e., the semantics would no longer need to be layered.

No way I would accept all this argumentation at face value:-), so I have to think about it and, mainly, read whatever Pat writes down in order to understand it. In the meantime, I may have to look into conceptual graphs and the Peircean notation of logic that Pat referred to as inspiration…

A more general takeaway (see also Chris’ remark above): maybe it is time to look into RDF again? A scary thought. Touching something so fundamental to the SW has to be done with extreme care… We will see.

There were two papers in the same session that were very close in subject: one by Jesse Weaver and Jim Hendler on the parallel materialization of RDFS graphs, and one by Jacopo Urbani et al. on using MapReduce for RDFS reasoning. (Sigh…, this is where I would like to put a reference!) Both aimed at similar challenges, namely the materialization of the RDFS inference results of a graph using parallel computing methods. And there was one more similarity: both had some sort of classification of the rules in the rule set described in the RDF Semantics document to help improve the processing (e.g., to analyze which rules should be duplicated among processing nodes and which can be handled without duplication, or which ones need special treatment for a map-reduce pair). It seems that it would be worthwhile to see if some of these classifications (‘ontology rules’ and the like) could be extended to OWL 2 RL (Jesse Weaver told me afterwards that they want to look into this). But, to put things into perspective: we are at the point where billions of triples can be expanded with relative ease. Who would have thought that a few years ago? There was also a remark on one of Jesse’s slides (I do not remember the exact wording) saying that RDFS is insanely parallelizable:-) It was a really interesting session.
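To give a feel for why RDFS materialization parallelizes well, here is a single-machine Python sketch in a map/reduce style. It is not the algorithm of either paper. It covers only two rules from the RDF Semantics document: rdfs9 (type propagation along subClassOf) and rdfs11 (subClassOf transitivity). Both join on a shared class term, so triples can be grouped by that term and each group joined independently, which is the part a cluster could run in parallel. The rounds repeat until no new triple appears. All function names and the ex: terms are hypothetical.

```python
from collections import defaultdict

TYPE = "rdf:type"
SUB = "rdfs:subClassOf"

def map_phase(triples):
    """Group triples by the class term they share in rdfs9/rdfs11 joins."""
    groups = defaultdict(lambda: ([], []))
    for s, p, o in triples:
        if p in (TYPE, SUB):
            # left side: "s rdf:type C" or "s rdfs:subClassOf C", keyed by C
            groups[o][0].append((s, p))
        if p == SUB:
            # right side: "C rdfs:subClassOf D", keyed by C
            groups[s][1].append(o)
    return groups

def reduce_phase(groups):
    """Join each group independently of the others."""
    derived = set()
    for left, supers in groups.values():
        for s, p in left:
            for d in supers:
                derived.add((s, p, d))  # rdfs9 if p is rdf:type, rdfs11 if subClassOf
    return derived

def materialize(triples):
    """Repeat map/reduce rounds until a fixpoint is reached."""
    closure = set(triples)
    while True:
        new = reduce_phase(map_phase(closure)) - closure
        if not new:
            return closure
        closure |= new
```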

The SW in Use session included a paper by Landong Zuo et al., “Supporting multi-view network analysis to understand company value chains”. The system integrates a range of UK data on companies into an RDF store and lets users get information on the ‘value chains’, i.e., how companies relate to one another as producers/consumers. Technically, the interesting point was that users could interactively add new relationships and new classifications to the system, essentially new rules that could then be evaluated. The whole system seemed to be really cool: well-engineered, well-functioning machinery. As the speaker put it, although all the conclusions drawn from the system could also be reached by users analyzing the databases themselves, it would take weeks to do what this system gives them in a few minutes. This is exactly the kind of message we need for the outside world about the usefulness of Semantic Web technologies.

In another session, Martin Szomszor presented an experiment they conducted at the ESWC conference, combining RFID-based personal badges with an underlying SW system. The resulting system could be used to show personal contacts among delegates, could help people find others with similar interests, could later retrace whom one met at what point (“I remember talking to that chap, but I do not remember his name!”), etc. Lots of privacy issues, for example, but I would have liked to see that in practice, that is for sure!

Stéphane Corlosquet’s presentation on SW and Drupal was really exciting. I already knew about the plans for Drupal 7 to incorporate RDF management from the start, and that all Drupal 7 pages will be annotated with RDFa. The RDFa community has been fairly excited about that for a while now. But the work done by Stéphane and others provides additional modules that make it easy to add a SPARQL endpoint to a Drupal-based site, to import other RDF content, to manage the vocabularies used on the pages, and the like. They already have such a system running with the current Drupal, and these modules will become part of the standard Drupal 7 module set that one can download from the Drupal site. And that is cool. It significantly lowers the barrier to building Web sites that are prepared to be part of the Linked Data cloud, even if the system administrators are not SW experts. I expect this to open up quite a lot of possibilities…

Off to the next day! More papers and the presentation of the Semantic Web Challenge finalists…

Posted in Semantic Web, Work Related Tagged: drupal, parallel computing, probabilistic reasoning, RDF, RDFa, RDFS, SPARQL

by Ivan Herman at November 06, 2009 10:31 AM

Decentralyze - Programming the Data Cloud


This morning, I gave the keynote address (my slides) at RuleML 2009. I assumed the audience would be fairly familiar with rule systems and rule technologies, but not necessarily with RIF, the Semantic Web in general, or my sense of the future of the Semantic Web (for which RIF is important).

Those slides may be boring for some of you. The interesting new bits:

  1. I made a new diagram for the RIF dialects. During the talk, I presented it as a successive reveal: the chaos of rule system features, the BLD grouping, the PRD grouping, and the Core intersection. Here’s the final slide:

    Venn diagram of rule system elements, grouped into RIF profiles

  2. I wanted to convey that the Semantic Web means real change, and I wanted emotional impact, so I took a shotgun approach and enumerated a list of likely data sources that was big enough to have some surprises for most folks; then I moved into a long list of things you could do with that data. Some things on the list were boring (having impact only by showing how long the list is), but some got the desired wide eyes and shocked sounds as people realized this just might happen. I think it went over well.

    My list of types of data we’ll be seeing:

    • From producers: product information
    • From sellers: product and service offerings
    • Customer support (instructions, upgrades, …)
    • Social network (who you trust, who interests you)
    • Personal information shared with friends
    • Public records (financial, legal, political, …)
    • Science (medical, environmental, economic, …)
    • News, Blogs, Public photos, videos
    • Event listings (performances, meetings)
    • Reviews, opinions, product experiences, preferences
    • Personal location, location history
    • Financial transactions

    Which led into my general scenario: you’re in a store about to buy something, but first you scan it with your phone and look up a little more information about it. What might you look up?

    • Its price at other stores, nearby
    • Its price for delivery, and how long you’d have to wait
    • Maybe: where it was made, and under what conditions
    • Is its producer a good corporate citizen?
    • Does its producer agree with your political views (uh oh)
    • How many houses does its CEO own?
    • Did your spouse/housemate just buy some? Or something like it?
    • How your friends feel about this product
    • How Consumer Reports (or some such service) reviewed it
    • Product liability suits
    • Endorsements
    • Maybe: Payment for Endorsements
    • For electronics: compatibility information
    • For mechanical items: How repairable is it? MTTF, MTTR
    • For food: nutritional information, health benefits and risks
    • Demographics of this brand, these products

That’s all for now. I saw the Bellagio Fountain from my 29th-floor hotel room late last night, but I’d like to see it up close.

by sandhawke at November 06, 2009 01:35 AM

November 05, 2009

MWI Team Blog

W3C Cheatsheet for developers

Screenshot of the W3C Cheatsheet on a phone

I’ve been working over the past few weeks on a nifty little tool called the W3C Cheatsheet, which summarizes a number of W3C technologies, including the Mobile Web Best Practices, in a mobile-friendly format.

See my post in the W3C blog to learn more about it, and send your feedback!

by Dominique Hazael-Massieux at November 05, 2009 10:00 PM

November 02, 2009

Ivan's Blog

Promise held (NYT and the LOD)


I was at the SemTech conference in June when Evan Sandhaus from the New York Times gave a keynote and announced that the NYT would gradually publish much of its data as Linked Data using Semantic Web technologies. Unfortunately, I had to leave on the last day of ISWC2009 last week, just when they announced that they had kept their promise and released the first 5,000 subject heading tags to the LOD cloud. This is really great news.

I remember Evan saying in Santa Clara (maybe privately; I do not remember that detail) that they are newcomers in this area and that it will be difficult to get it right (and, well, there are bugs, as Eric Hellman and Richard Cyganiak, for example, pointed out in their respective blogs). But I think we should really applaud when such a promise is kept…

Posted in Semantic Web, Work Related Tagged: Linked Data, new york times

by Ivan Herman at November 02, 2009 10:41 AM

October 31, 2009

Decentralyze - Programming the Data Cloud

sandhawke


Here’s what I think should be standardized at some point, soon, in the Semantic Web infrastructure. These items are at various levels of maturity; some are probably ready for a W3C Working Group right now, while others are in need of research. They are mostly orthogonal and most can be handled in independent efforts. (I would lean against forming a single RDF Working Group to handle all of this; that would be slower, I think.)

To be clear, when I say “RDF 2” I mean it like OWL 2: an important step forward, but still compatible with version 1. I’m not interested in breaking any existing RDF systems, or even in causing their users significant annoyance. In some traditions, where the major version number is only incremented for incompatible changes, this would be called a 1.1 release. In contrast, at W3C we normally signal a major, incompatible change by changing the name, not the version number. (And we rarely do that: the closest examples I can think of are CSS->XSL, PICS->POWDER, and HTML->XHTML.) The nice thing about using a different name is that it makes clear that users each decide whether to switch, and the older design might live on and even win in the end. So if you want to make deep, incompatible changes to RDF, please pick a new name for what you’re proposing, and don’t assume everyone will switch.

This is partially a trip report for ISWC, because the presentations and especially the hallway and lounge conversations helped me think about all this.

Note that although I work for W3C, this is certainly not a statement of what W3C will do next. It’s not my decision, and even if it were, there would be a lot of community discussion first. This is just my own opinion, subject to change after a little more sleep. Formally, the decisions about how to allocate W3C resources among the different possible standards efforts are made by W3C management, guided by the folks who provide those resources, via their representatives on the Advisory Committee (AC). If the direction of the W3C is important to you or your business, it may be worthwhile to join and participate in that process.

1. RDF and XML interoperation

There’s a pretty big divide between RDF and XML in the real world. It’s a bit like any divide between different programming languages or different operating systems: users have to pick which technology family to adopt and invest in. It’s hard to switch later, because of all the investment in tools, built systems, education, and even social networks. (People who use some technology build social and professional relationships with other people who use the same technology. Thus we have an XML community, an RDF community, etc. Few people are motivated to be in both communities.)

I think we should have better tools for bridging the gap, technologically, so that when data is published in XML, it’s easy for RDF consumers to use it, and when the data is published in RDF, it’s easy for XML consumers to use it.

The leading W3C answer is GRDDL, which I think is pretty good, but could use some love. I’d like to see support for the transforms being in Javascript, which I think is probably the dominant language these days for writing code that’s going to run on someone else’s computer. It certainly has a bigger community than XSLT. I’d probably support Java bytecode, too.

I would also like to see some way to support third-party GRDDL, where the transform is provided by someone not associated with either the data provider or data consumer. Nova Spivack gave a keynote where he talked about this feature of T2. They’re focused on HTML not XML, but the solution is probably the same.

Beyond GRDDL, I think there’s room for a special data format that bridges the gap. I’ve called it “rigid rdf” or “type-tagged xml” in the past: it’s a sub-language of RDF/XML, or a style of writing XML, which can be read by RDF/XML parsers and is also amenable to validation and processing using XML schemas. Basically, you take away all the choices one has in serializing RDF/XML.
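To make the “rigid” idea a little more concrete, here is a minimal Python sketch of one possible fixed form; it is an illustration, not a proposed standard. It assumes triples arrive as hypothetical (subject, predicate, object, is_literal) tuples, splits namespaces at the last '#' or '/', and always emits one rdf:Description per subject with properties in sorted order, IRIs as rdf:resource and literals as element text. The same graph therefore always yields the same bytes, so an XML schema only has one shape to describe.

```python
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def split_iri(iri):
    """Split a predicate IRI into (namespace, local name) at the last '#' or '/'."""
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def rigid_rdfxml(triples):
    """Serialize (subject, predicate, object, is_literal) tuples in one fixed layout.

    The same set of triples always produces identical output, whatever the input order.
    """
    # Deterministic prefixes: ns0, ns1, ... assigned in sorted namespace order.
    namespaces = sorted({split_iri(p)[0] for _, p, _, _ in triples})
    prefix = {ns: "ns%d" % i for i, ns in enumerate(namespaces)}
    out = ["<rdf:RDF xmlns:rdf=%s" % quoteattr(RDF_NS)]
    out += ["    xmlns:%s=%s" % (prefix[ns], quoteattr(ns)) for ns in namespaces]
    out[-1] += ">"

    # Exactly one rdf:Description per subject; subjects and properties sorted.
    by_subject = {}
    for s, p, o, is_literal in triples:
        by_subject.setdefault(s, set()).add((p, o, is_literal))
    for s in sorted(by_subject):
        out.append("  <rdf:Description rdf:about=%s>" % quoteattr(s))
        for p, o, is_literal in sorted(by_subject[s]):
            ns, local = split_iri(p)
            tag = "%s:%s" % (prefix[ns], local)
            if is_literal:
                out.append("    <%s>%s</%s>" % (tag, escape(o), tag))
            else:
                out.append("    <%s rdf:resource=%s/>" % (tag, quoteattr(o)))
        out.append("  </rdf:Description>")
    out.append("</rdf:RDF>")
    return "\n".join(out)
```

The sketch ignores datatypes, language tags and blank nodes, and does not check that local names are legal XML names; a real profile would have to pin all of those down as well.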

I note that the Cambridge Communiqué is ten years old this month. It proposed schema annotation as an approach, and that’s not a bad one, either. I haven’t heard of anyone working on it recently, but maybe that will change if the XML community starts to see more need to export RDF.

Amusingly, while I was talking to Gary Katz from MarkLogic about this, he mentioned XSPARQL as a possible solution, and I pointed out that Axel Polleres (XSPARQL project leader) was sitting right next to us. So they got to talk about it. XSPARQL doesn’t excite me, personally, because I don’t use either SPARQL or XQuery, but objectively, yes, it might solve the problem for some significant user base.

2. Linked Data Inference

For me, an essential element of a working Linked Data ecosystem is automatic translation of data between vocabularies. If you provide data about the migration of frogs in one vocabulary, and my tools are looking for it in another one, the infrastructure should (in many cases) be able to translate for us. We need this because we can’t possibly agree on one vocabulary (for any given domain) that we’ll all use for all time. Even if we can agree for now, we’ll want this so that we can migrate to another vocabulary some time in the future.

Inference using OWL (and its subsets like RDFS) provides some of this, but I don’t think it’s enough. RIF fills in some more, but the WG did not think much about this use case, and there might be some glue missing. Maybe we can get a WG Note out of RIF to help this along.

I’d like us to be clear about first principles: when you’re given an RDF graph, and you’re looking for more information that might be useful, you should dereference the predicate IRIs to learn about what kinds of inference you’re entitled to do. And then, given resources and suitable reasoners, you should do it. That is, the use of particular IRIs as predicates implies certain things, as defined by the IRI’s owner. The graph is invoking certain logics by using those IRIs. (Of course you can always infer things that were not implied, but as among humans, those “inferences” are really just guesses you are making. They have quite a different status from true implications.)
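A toy version of that principle, sketched in Python: the graph’s predicate IRIs are “dereferenced”, and whatever rdfs:subPropertyOf statements come back are applied until nothing new follows. Everything here is an assumption for illustration: FAKE_WEB stands in for HTTP, the example.org frog vocabulary is made up, and only one RDFS rule is understood. A real system might invoke OWL or RIF reasoners instead.

```python
RDFS_SUBPROP = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"

# Hypothetical stand-in for HTTP: what dereferencing each predicate IRI returns.
FAKE_WEB = {
    "http://example.org/frogs#migratesTo": [
        ("http://example.org/frogs#migratesTo", RDFS_SUBPROP,
         "http://example.org/animals#movesTo"),
    ],
}


def dereference(iri):
    """Pretend to GET the IRI; unknown IRIs yield no triples."""
    return FAKE_WEB.get(iri, [])


def infer(graph):
    """Dereference predicates, then apply rdfs:subPropertyOf until nothing changes."""
    graph = set(graph)
    dereferenced = set()
    changed = True
    while changed:
        changed = False
        # Step 1: learn what each not-yet-seen predicate IRI says about itself.
        for p in {p for _, p, _ in graph} - dereferenced:
            dereferenced.add(p)
            before = len(graph)
            graph.update(dereference(p))
            changed = changed or len(graph) != before
        # Step 2: (s p o) and (p subPropertyOf q) together imply (s q o).
        supers = {}
        for s, p, o in graph:
            if p == RDFS_SUBPROP:
                supers.setdefault(s, set()).add(o)
        new = {(s, q, o) for s, p, o in graph for q in supers.get(p, ())} - graph
        if new:
            graph |= new
            changed = True
    return graph
```

With this, data published with the hypothetical frogs#migratesTo property also becomes visible to tools looking for animals#movesTo, which is the kind of on-demand vocabulary translation discussed above.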

If this is put together properly, and the logics are constructed in the right form, I think we’ll get the dynamic, on-demand translation I’m looking for. I imagine RIF could be very useful for this, but reasoner plugins written in Javascript or Java bytecode could be a better solution in some cases.

Some of my thinking here is in my workshop keynote slides, but later conversations with various folks, especially Pat Hayes and TimBL, helped it along. There’s more work to do here. I think it’s pretty small, but crucial.

3. Presentation Syntaxes

RDF, OWL, and RIF all have hideous primary exchange syntaxes and some decent not-W3C-recommended alternative serializations. I’m not really sure what can practically be done here that hasn’t been done.

At very least, I’d like to see a nice RDF-friendly presentation syntax for RIF. A bit like N3, I suppose. I did some work on this; maybe I can finish it up, and/or someone else can run with it.

OWL 2 has 3+n syntaxes, where n is the number of RDF syntaxes we have. Exactly one of those syntaxes is required of all consumers, for interchange. I’ll be interested to see how this plays out in the market.

4. Multi-Graph Syntax

Most systems that work with RDF handle multiple graphs at the same time. Sometimes they do this by storing the triples in a quad store, with the fourth entry being a graph identifier. This works pretty well, and SPARQL supports querying such things.
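A minimal in-memory sketch of the quad-store idea, in Python; the class and method names are hypothetical, and a real store would add indexes and a SPARQL front end. Each triple carries a fourth graph-identifier field, and leaving a position as None acts as a wildcard, roughly the role a variable plays in a SPARQL pattern such as GRAPH ?g { ?s ?p ?o }.

```python
from collections import namedtuple

# A triple plus the identifier of the graph it belongs to.
Quad = namedtuple("Quad", "s p o g")


class QuadStore:
    """Toy in-memory quad store."""

    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, g):
        self.quads.add(Quad(s, p, o, g))

    def match(self, s=None, p=None, o=None, g=None):
        """Return quads matching the given terms; None matches anything."""
        wanted = (s, p, o, g)
        return [q for q in self.quads
                if all(w is None or w == have for w, have in zip(wanted, q))]

    def graphs(self):
        """All graph identifiers currently in use."""
        return {q.g for q in self.quads}
```

Because the graph identifier is part of each entry, the same triple asserted in two graphs is stored twice and can be traced back to each source separately.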

We don’t have a way to exchange multiple graphs in the same document, however. N3 has graph literals (originally called contexts), and there was some work under the term named graphs, which is kind of the opposite approach.

Personally, I don’t yet understand the use case for interchanging multiple graphs in one document, so I’m not sure where to go with this.

Hmmm. I guess RIF could be used for this. You can write RDF triples as RIF frame facts, and the rif:Document format allows multiple rulesets, each with an optional IRI identifier, in the same document. ETA: RIF also gives you an exchange syntax where you can syntactically put literals in the subject and use bnodes as predicates, if you want. But now you’re technically exchanging RIF Frames instead of RDF Triples.
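Here is a rough Python sketch of that RIF route, assuming the frame and group syntax of the RIF Presentation Syntax as I read it (the normative exchange format is the XML syntax; the presentation syntax is meant for people). Each named graph becomes a nested Group annotated with its IRI, and each triple becomes a frame fact s[p -> o]. The Literal marker class and the function names are hypothetical, and string escaping is deliberately left out.

```python
class Literal(str):
    """Marks a plain string literal; bare str values are treated as IRIs."""


def rif_term(t):
    """Render an IRI as <...> and a literal with the "..." string shorthand."""
    if isinstance(t, Literal):
        if '"' in t or "\\" in t:
            raise ValueError("this sketch does not escape quotes or backslashes")
        return '"%s"' % t
    return "<%s>" % t


def to_rif_ps(named_graphs):
    """Render {graph_iri: [(s, p, o), ...]} as frame facts, one nested Group per graph."""
    lines = ["Document(", "  Group("]
    for g in sorted(named_graphs):
        lines.append("    (* <%s> *)" % g)  # IRI annotation identifies the group
        lines.append("    Group(")
        for s, p, o in named_graphs[g]:
            lines.append("      %s[%s -> %s]" % (rif_term(s), rif_term(p), rif_term(o)))
        lines.append("    )")
    lines += ["  )", ")"]
    return "\n".join(lines)
```

Note that nothing stops a Literal from appearing in subject position here, which is the extra freedom (and the change of data model from triples to frames) mentioned above.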

5. RDF Graph Validation

When writing software that operates on RDF data, it’s really nice to know the shape of the data you’ll find. It’s even nicer if software can check whether that’s actually what you got, and if reasoners can work to fill in any missing pieces.

I don’t exactly understand how important or unimportant this is. It’s closely related to the Duck Typing debate. Whatever mechanisms make duck typing work (eg exception handling, reflection, side-effect-free programming) probably help folks be okay without graph validation. But I think folks trained on C++/Java or XML Schema would be much happier with RDF if it had this.

The easiest solution might be using rigid RDF. One could probably also do it with SPARQL, essentially publishing the graph patterns that will match the data in the expected graphs.

The most interesting and weird approach is to use OWL. Of course, OWL is generally used to express knowledge and reason about some application domain, like books, genes, or battleships. But it’s possible to use OWL to express knowledge about RDF graphs about the application domain. In the first case, you say every book has one or more authors, who are humans. In the second case, you say every book-node-in-a-valid-graph has one or more author links to a human-node in the same graph. At least that’s the general idea. I don’t know if this can actually be made to work, and even if it can, it risks confusing new OWL users about one of the subjects they’re already seriously prone to get wrong.
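For comparison, here is the book constraint written as a plain closed-world check in Python, not as OWL. Under OWL’s open-world semantics a missing author would not be reported as an error (a reasoner would simply assume one exists somewhere), which is why the graph-level modelling trick described above is needed. The ex: vocabulary and the function name are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical vocabulary for the example.
BOOK, HUMAN, AUTHOR = "ex:Book", "ex:Human", "ex:author"


def invalid_books(graph):
    """Return book nodes with no author link to a human node in the same graph."""
    def has_type(node, cls):
        return (node, RDF_TYPE, cls) in graph

    books = {s for s, p, o in graph if p == RDF_TYPE and o == BOOK}
    return {b for b in books
            if not any(s == b and p == AUTHOR and has_type(o, HUMAN)
                       for s, p, o in graph)}
```

An empty result means the graph has the expected shape, as far as this one constraint goes.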

6. Editorial Issues

Finally, I’d like some portions of the 2004 RDF spec rewritten, to better explain what’s really going on and guide people who aren’t heavily involved in the community. This could just be a Second Edition (no need for RDF 2), because no implementation changes would be involved.

I’d like us to include some practical advice about when/how to use List/Seq/Bag/Alt, and reification, maybe going so far as to deprecate some of them (IMHO, all but List). Maybe bring in some of the best-practice stuff on publishing and n-ary relations.
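As background for that advice, here is how an rdf:List (an RDF collection) is spelled out as triples, sketched in Python with hypothetical helper names. Unlike Seq/Bag/Alt, whose rdf:_1, rdf:_2, … membership properties leave the container open, a collection is a closed rdf:first/rdf:rest chain ending in rdf:nil, so a consumer can tell where the list ends.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def make_bnode_factory():
    """Return a function producing fresh blank-node labels _:b0, _:b1, ..."""
    counter = itertools.count()
    return lambda: "_:b%d" % next(counter)


def list_to_triples(items, new_bnode):
    """Encode a Python list as an rdf:first/rdf:rest chain; return (head, triples)."""
    head, triples = NIL, []
    for item in reversed(items):  # build from the tail so each cell can point onward
        node = new_bnode()
        triples.append((node, FIRST, item))
        triples.append((node, REST, head))
        head = node
    return head, triples


def triples_to_list(head, triples):
    """Walk the chain back into a Python list (KeyError if it is malformed)."""
    index = {(s, p): o for s, p, o in triples}
    items = []
    while head != NIL:
        items.append(index[(head, FIRST)])
        head = index[(head, REST)]
    return items
```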

I understand Pat Hayes would like to explain blank nodes differently, explicitly introducing the notion of “surfaces” (what I would call knowledge bases, probably). Personally, I’d love to go one step farther and get rid of all “graph” terminology, instead just using N-Triples as the underlying formalism, but I might be a minority of one on that.

ETA: Of course we should also change “URI-Reference” to “IRI”, and stuff like that.


Okay, that’s my list. What’s yours? (For long replies, I suggest doing it on your own blog, and using trackback or posting a link here to that posting.) Discussion on semantic-web@w3.org is fine, too.

by sandhawke at October 31, 2009 12:28 AM

October 29, 2009

Ivan's Blog

ISWC2009 4-5


Fourth day

Shame on me, but I missed the morning keynote… I was a bit late arriving to the conference site and I got stuck in a conversation at breakfast. Things happen…

The most notable event in the morning, at least for me, was the SPARQL WG panel. All members of the Working Group (me included) were on the panel and the room was full. I mean full: people were standing in the back. I regard that as a success in itself; it shows not only the overall importance of SPARQL, but also the real interest in the new version, ie, SPARQL 1.1 (in case you missed it, the first working draft was published just a few days ago). Lee Feigenbaum (co-chair of the group) gave a quick overview of the new features and then the questions came.

The difficulty of the SPARQL 1.1 work is that it has to find a balance between what is realistic to standardize in a relatively short time frame and what would be good to see in a new query language. As a consequence, there are features that the community has discussed but that have not made it into the document, or only in a simplified form. That came up during the discussion, but I had the impression that the audience, by and large, understood this balance. Actually, for some, the set of new features was even too much for an efficient implementation. I have the feeling that the WG will have to publish a separate conformance document (a bit like OWL 2 has), because there is a certain confusion about whether a conforming SPARQL implementation will have to implement, say, update or inference regimes. That clearly came up through the questions. Anyway, remember one email address (yes, it is a bit of a mouthful): public-rdf-dawg-comments@w3.org; this is where comments on SPARQL 1.1 have to be sent!

I chaired a session in the use track in the afternoon. The paper by Daniel Elenius et al on reasoning about resources (for military exercises) was interesting to me because it was based on reasoning with relatively large OWL ontologies plus rules. The OWL ‘side’ was not very complex (Daniel referred to DLP; today I would probably say OWL 2 RL) but was extended with extra rules. What this shows is that, once RIF is finished and published, the combination of OWL with RIF may become very important for tons of practical applications. (As an aside, a nice little joke from Daniel: what is the system used by the military today when planning exercises? The system is called BOGSAT. It stands for ‘Bunch Of Guys Sitting Around a Table’…)

Roland Stuhmer gave a very different style of presentation on how user events (clicks, combinations of clicks, etc) can be collected, categorized, integrated into an application, and analyzed with some rules for, eg, targeted ads. The system is based on harvesting not only the structure of the Web page, but also annotations appearing in the Web page via RDFa. The result is an RDF structure describing the events that can be sent to a server, analyzed locally, distributed, etc. A nice usage of RDFa, but it also shows how important it is to have a Javascript API that can retrieve the RDF triples from the RDFa structure attached to a specific node. (B.t.w., the old graphics standards of the 80’s and 90’s, called GKS or PHIGS, had notions of combined event structures with different event types. I do not remember all the details any more, but might it be worth looking at those again in a modern setting?)

Personally, the highlight of the day was the presentation of the semantic web challenge finalists. I was a member of the jury, which meant that I had to review the submissions in advance, and we had two very enjoyable discussions with the rest of the jury on the submissions. We made the first selection the day before, and this time all finalists gave their presentations and demos. And it was a tough choice (that is why we had such long discussions:-) because, well, the submissions were great overall. I do not really want to analyze each of the entries; I do not think it would be appropriate for me in this position. But the winning entry of the challenge, namely TrialX, really made a great impression on me. In short, the application is a consumer-centric tool through which patients can find matching clinical trials in which they might want to participate; it also helps those who organize those trials, etc. It is some sort of matchmaking tool using all kinds of medical ontologies and vocabularies, public health record data and the like. We should realize the importance of this: here is a great Semantic Web application, winner of the challenge, which is really an application, not only a demonstration, already deployed on the Web (soon as an iPhone app, too), and, to be a bit dramatic, may save (and possibly has already saved) lives. What else do we want as proof that this technology is not only an academic exercise any more?

Fifth day

Only a partial day for me, as far as the conference goes, because I had to fly out before the end… But I could listen to the last keynote of the conference, ie, that of Nova Spivack.

Not surprisingly, Nova talked about Twine-2, a.k.a. T2. I did not really know what T2 was going to be; I had only heard that Twine, ie, T1, is moribund. As Nova acknowledged, it is too complicated; it is too hard for users to really figure it out. In fact, most users used it for search, which is not the strongest feature of T1 in the first place.

So T2 is (well, will be) all about semantically backed search. It semantically indexes the Web, with an attempt to extract semantic information from the pages. The user interface would then be some sort of, essentially, faceted interface that would automatically classify the search hit results into different tabs; the user can use these tabs, drill down along other categories, etc. So far nothing radically new, though the user interface Nova showed was indeed very clean and nice. All this is done, internally, via vocabularies/ontologies, using RDF, RDFS, or OWL.

The interesting aspect of T2 (at least as far as I am concerned) is the incorporation of collective knowledge. First of all, T2 will include a system whereby users can add vocabularies that T2 will use in categorization. Users can get those ontologies back in OWL/RDF, they can improve them, etc. The other tool they will provide is a means to help semantically index pages that are, by themselves, not semantically annotated. This can be done via a Firefox extension; users can identify parts of the web pages (I presume, essentially, the DOM nodes) and associate these with classes of specific ontologies. The extension produces an XSLT transformation that can be sent back to the T2 system. Some social mechanism should of course be set up (eg, webmasters annotating their own pages should get a higher priority than third-party annotators) but, essentially, it is some sort of GRDDL transformation by proxy: T2 will have information on how to find transformations to semantically index specific pages without requiring the modification of the pages themselves (in contrast to GRDDL, where such a transformation has to be referred to from the page itself).
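Out of curiosity, here is a minimal Python sketch of the lookup side of such a “GRDDL by proxy” registry. This is not how T2 works (its design was not described at that level); the URL prefixes, the two priority levels and the data class are all hypothetical. They encode only the rule mentioned above that a webmaster’s own annotations should beat third-party ones, with the most specific URL prefix winning among equals.

```python
from dataclasses import dataclass

# Hypothetical priority levels: lower number wins.
PRIORITY = {"webmaster": 0, "third_party": 1}


@dataclass
class TransformEntry:
    url_prefix: str      # pages this transform applies to
    transform_url: str   # e.g. an XSLT stylesheet
    source: str          # "webmaster" or "third_party"


def pick_transform(page_url, registry):
    """Choose the transform for a page, or None if no entry applies."""
    candidates = [e for e in registry if page_url.startswith(e.url_prefix)]
    if not candidates:
        return None
    # Owner-supplied transforms first, then the longest (most specific) prefix.
    best = min(candidates, key=lambda e: (PRIORITY[e.source], -len(e.url_prefix)))
    return best.transform_url
```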

Of course, the system was a bit controversial in this community; indeed, it was not clear whether T2 would make use of the semantic information that does exist in pages already (microformats, RDFa, …), let alone the Linked Open Data information that is already out there. When asked, Nova did not seem to give a clear answer though, to be fair, he did not specifically say no, and he also said that the semantic index might be given back to the public in the form of linked data. To be decided. It is also not fully clear whether those proxy-GRDDL transformations would be available to the community at large (hopefully the answer is yes…). It will be interesting to see how it plays out (T2 comes out in beta sometime in early 2010). Certainly a project to keep an eye on.

From a slightly more general point of view, it is also interesting to note that two out of the three Semantic Web Challenge winners are also semantic search engines with different user interfaces (though sig.ma and VisiNav definitely do use the LOD cloud, no question there…). Definitely an area on the move!

I had the time and, frankly, the energy to really listen to only one more paper in the regular track, namely the paper on functions of RDF language elements, by Bernhard Schandl. A nice idea: imagine a traditional spreadsheet, where each cell is a collection of resources from an RDF Graph, or a function that can manipulate those resources (extract information, produce a new set of resources, etc). Just like in a spreadsheet, if you modify the underlying graph, ie, the resources in a cell, everything is automatically recalculated. Because, just like in a spreadsheet, a function can refer to the result of another function in another cell, one can do fairly complicated transformations and information extraction quite easily. Neat idea, to be tried out from their site.
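The recalculation idea is easy to sketch in Python. This is a naive illustration, not the authors’ system: each hypothetical cell is a function of the graph and of other cells, and values are recomputed on every read, so changing the graph changes every dependent cell. A real implementation would cache values and track dependencies, and would need to reject cyclic cell references, on which this sketch would recurse forever.

```python
class Sheet:
    """Toy 'RDF spreadsheet': every cell is a formula over the graph and other cells."""

    def __init__(self, graph):
        self.graph = set(graph)
        self.formulas = {}

    def define(self, name, formula):
        """formula(graph, get) may call get(other_cell_name) to use another cell."""
        self.formulas[name] = formula

    def value(self, name):
        # Recomputed on every read, so edits to the graph propagate automatically.
        return self.formulas[name](self.graph, self.value)
```

For example, an “authors” cell can collect the objects of ex:author triples, and a “count” cell can take the size of “authors”; adding a triple to the graph changes both.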

That is it for ISWC2009. I obviously missed a lot of papers, partly because social life and hallway conversations sometimes had the upper hand, and sometimes simply because there were too many parallel sessions. But it was definitely an enriching week… See you all, hopefully, at ISWC2010, in Shanghai!

Posted in Semantic Web, Work Related Tagged: health care, OWL RL, RDFa, Rules, semantic search, SPARQL, spreadsheet

by Ivan Herman at October 29, 2009 11:59 PM

W3C Q&A Weblog

W3C Developer Gathering Next Week; Registration Closes Today

Next week's W3C Developer Gathering will bring together some great speakers:

  • Leslie Daigle (ISOC) on Internet Ecosystem Health
  • Mark Davis (Unicode Consortium) on controversies around international domain names
  • Brendan Eich (Mozilla) on "ECMA Harmony and the Future of JavaScript"
  • Fantasai on CSS, with help and demos from the "CSS Strike Force": Tab Atkins, David Baron, Simon Fraser, and Sylvain Galineau
  • Philippe Le Hégaret (W3C) on community-built browser test suites.
  • Kevin Marks (OWF) on OpenID, OAuth, OpenSocial
  • Arun Ranganathan (Mozilla) on what's new in APIs

I will be hosting the gathering (5 November in the afternoon). We've planned for some fun give-aways to be revealed at the meeting. Registration closes today, although we will admit walk-ins at a higher rate next week.

If you can't join us in person, you can follow the meeting on IRC; more details are available on the meeting page.

I hope you will join us next week.

by Ian Jacobs at October 29, 2009 03:30 PM

October 27, 2009

Advogato blog for connolly

27 Oct 2009

Roach motel indeed; sidekick XMLRPC service is no more

I went back to the most recent (2008-03) of my calendar sync items in the DIG breadcrumbs research blog and got hipwsgi.py from palmagent fired up, only to get "Connection refused" from pimapi.prod1.dngr.net.

Uh-oh.

I thought I could write off the sidekick altogether at that point, but:

  1. Organizing weekend todo lists works with the sidekick in a way that I haven't managed to duplicate: lists on paper don't sort themselves by priority and due date; Google calendar tasks don't sync with the sidekick (nor with any usable android app that I could find).
  2. How do I call my brother from the car? Using a mobile phone without my contacts would be like using the Web without DNS.
  3. Android's crummy appointment notification reminded me how much I rely on a gizmo to beep when I'm supposed to stop coding and go to my appointment with the Doctor. Delegating this to a gizmo goes back to ~1996 when I first got a Psion PDA. (see some python code for psion files)

I'm not sure how I'm going to muddle thru this mix of google calendar/contacts stuff and sidekick phone... maybe I can use SMS reminders for calendar stuff, but you never know how long those things are going to take to be delivered; T-Mobile seems to deliver them 13 hours later in some cases.

For now, I'm going to pickle some state...

#swig notes

old calendar notes/links, circa 1999/2000

palmagent code: r423:4a5a8b2d237c 2009-05-01

repository of sidekick data from palmagent/hipwsgi.py: 32:31a84807d214 2009-02-26

another repository of my PIM data: 596:6faa7311f865 2009-04-2, 595:b20e1f7fa468 2008-09-10

October 27, 2009 05:27 PM

Ivan's Blog

ISWC2009 I.


This year’s ISWC is being held in Chantilly, Virginia, in a nice conference building in a beautiful park with autumn colours that, for reasons I do not really know, are always much more striking and amazing in America than in Europe. It is a bit of a pity that it is so far from Washington but, well, you can’t have it all…

First day: tutorials.

(For me, because there were also a bunch of workshops.) In the morning I was at the tutorial on how to consume Linked Open Data, by Juan Sequeda, Jammie Taylor, Patrick Sinclair, and Olaf Hartig; in the afternoon I went to the one on legal and social frameworks for sharing data on the Web, by Leigh Dodds, Jordan Hatcher, Tom Heath, and Kaitlin Thaney.

Juan and his friends actually had a difficult task, and that became clear right at the start, during Juan’s intro: part of the audience did not really know what LOD was all about, whereas others were, shall we say, old timers on the subject. I think the speakers did a really good job of navigating these constraints, giving short introductions to what LOD is all about but talking about issues and showing examples that were interesting for all of us. Kudos for that. The issues raised by the audience were really to the point (who should create sameAs links, how trustworthy they are, how to choose vocabularies and how they map to one another, etc) and, in his closing slides, Juan actually gave a list of the open R&D issues in LOD. Worth looking at those (and no reason to repeat the list here…). B.t.w., the slides of the tutorial are online.

One very interesting technology I heard about (shame on me, but I did not know it before) was a tool called SQUIN, based on a traversal-based execution scheme for SPARQL. Olaf did a presentation on it. What essentially happens is as follows. At the beginning, the default graph of the SPARQL query is empty. The system then systematically fetches RDF triples by dereferencing the URI-s in the query pattern and adds those to the default graph. The query is matched against that graph; some variables will match, thereby ‘adding’ new URIs to the pattern. And the process starts again, possibly yielding one or more complete solutions to the original query. At the end of the process, solutions will have been found on the Web, even if the system itself does not have any ‘real’ data behind it at the start. Of course, no one can guarantee that all solutions will be found, and you need some ‘seed’ URI-s in the original query pattern, but it nevertheless looks like a very powerful tool to explore, say, the LOD. Very interesting!
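The traversal loop described above can be sketched in Python. This is not SQUIN’s code: the Web is faked with a dictionary, URIs are recognized by a hypothetical ex: prefix, only basic graph patterns are supported, and the max_rounds cap is an addition of this sketch to keep the crawl bounded.

```python
# Hypothetical stand-in for the Web: triples returned by dereferencing each URI.
WEB = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob": [("ex:bob", "foaf:name", "Bob")],
}


def is_var(term):
    return term.startswith("?")


def is_uri(term):
    # Toy convention: only "ex:" terms are dereferenceable URIs here.
    return term.startswith("ex:")


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    b = dict(binding)
    for pt, t in zip(pattern, triple):
        if is_var(pt):
            if b.setdefault(pt, t) != t:
                return None
        elif pt != t:
            return None
    return b


def match(patterns, graph, binding=None):
    """Yield every binding that satisfies all triple patterns against graph."""
    if not patterns:
        yield dict(binding or {})
        return
    for triple in graph:
        b = unify(patterns[0], triple, binding or {})
        if b is not None:
            yield from match(patterns[1:], graph, b)


def traverse_and_query(patterns, max_rounds=10):
    """Link-traversal evaluation: start from an empty graph and follow URIs."""
    graph, fetched = set(), set()
    to_fetch = {t for pat in patterns for t in pat if is_uri(t)}  # seed URIs
    for _ in range(max_rounds):
        to_fetch -= fetched
        if not to_fetch:
            break
        for uri in to_fetch:
            graph.update(WEB.get(uri, []))  # a failed lookup simply adds nothing
        fetched |= to_fetch
        # URIs bound by any single pattern are candidates for the next round.
        to_fetch = {t for pat in patterns for b in match([pat], graph)
                    for t in b.values() if is_uri(t)}
    return list(match(patterns, graph))
```

Asking for the names of the people ex:alice knows starts with nothing, fetches ex:alice, discovers ex:bob through the first pattern, fetches ex:bob, and only then can the second pattern be satisfied. Without a seed URI in the query, nothing is ever fetched and the result is empty, as noted above.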

Then there were some examples of how LOD is used. Jammie talked about Freebase, and how Freebase is, in fact, a way for everybody to easily add information to the LOD (after all, Freebase works like a wiki, and all the data is reflected on the LOD). He also had a very important message that is worth repeating (go to his slides for the rest): it takes very little effort to add a republishing capability to your triple-store-based application, thereby extending the general LOD. So… do it! This is how the system evolves…

Patrick described a quite geeky system that the BBC folks have developed (which will hopefully become public soon): take the BBC’s music data in RDF (which is available), plus the LOD cloud, plus… an IRC bot. What you get is an IRC channel that will pick up data on music, including the sound tracks, photos, etc, and display it on the machine. I presume you can give orders and preferences through IRC. Obviously geeky stuff, not for the masses:-) but it shows what you can do…

The afternoon tutorial on the Legal and Social frameworks was of course very different. I think one of the many, and maybe the most important, aspects of this tutorial is that… it took place! This may sound a bit strange, but it is important for our whole community to realize that we will have issues around copyright, licensing, waivers, etc, when it comes to the Web of Data, whether we like these issues or not. Tutorials like this, written notes and information, etc, are essential. Let us face it: most of us do not understand the details of the legal issues. So I was simply listening and trying to absorb what I heard…

I do not want to repeat the details of what I heard here; one thing I have learned over the years is that I should leave legal argumentation and descriptions to those who really understand them. Ie, look at the slides. It is worth it. But just to show the complexities: I did not know or fully realize that there are major differences among countries in what can or cannot be copyrighted: for example, a phone book cannot be copyrighted in the US or Europe, but can be in Australia. Or that the seemingly simple notion of ‘attribution’ can, in fact, become an endless pit when it comes to data and the queries thereof (eg, if I have a filter in a query that results in data, should I give an attribution for the facts that were, in fact, filtered out?). Etc.

There is also a takeaway message for me (though it may be quite trivial) among the things I learned. Tom showed some practical examples of how one can add, say, licensing information to data by adding some RDF triples. However, for a larger data set the licensing may differ within the dataset. Eg, if you retrieve data from somewhere and enrich it with additional metadata, the metadata itself may have a different license (it is yours) than the data that you use (which may have its own licence). What this means is that when you organize your data internally, you should think about the licensing information you will add well in advance: organize your URI-s accordingly, for example. If you don’t, and you want to add a license at the end, you might find yourself in trouble! Sounds like a simple message, but it is important. (Reminds me of what accessibility people always say: if you take accessibility issues into account right at the beginning, when you build up a Web site, it is not complicated; but if you have to add accessibility features after the fact, it may become hell…)
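A minimal Python sketch of that advice, assuming a hypothetical URI layout in which republished data and your own annotations live under different path prefixes; a license can then be attached per prefix rather than per triple. The prefixes and license labels below are placeholders, not recommendations.

```python
# Hypothetical URI layout: each path prefix carries one license label.
LICENSES = {
    "http://data.example.org/": "unspecified",
    "http://data.example.org/imported/": "upstream license (as published by the source)",
    "http://data.example.org/annotations/": "CC-BY",
}


def license_for(uri):
    """Return the license of the longest matching URI prefix, or None if none matches."""
    matches = [prefix for prefix in LICENSES if uri.startswith(prefix)]
    return LICENSES[max(matches, key=len)] if matches else None
```

If the imported data and the added metadata had been minted under one undifferentiated prefix, no such simple rule could separate them afterwards, which is the trouble the paragraph above warns about.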

By the way, Leigh has made a kind of overview of the current ‘blobs’ in the LOD cloud to see whether any licensing information is available or not. He has an overview of the results in his slides. The main fact is: the majority of data sets have no information whatsoever (or, at least, nothing that can be found in about 10 minutes)…

It was a good day. Looking forward to the rest.

Posted in Semantic Web, Work Related Tagged: creative commons, data commons, Linked Data, Linked Data Cloud, science commons, SPARQL

by Ivan Herman at October 27, 2009 03:16 PM

October 22, 2009

Advogato blog for connolly

22 Oct 2009

G1 is so disappointing, I'm going back to the sidekick. Yes, the sidekick

I've been a happy sidekick user since December 2002. In fact, what really got me interested in the android/G1 was that Andy Rubin, the danger/sidekick lead designer, was working on it.

My first few minutes with the G1 were lots of fun: google street view with the accelerometer blew me away and I had downloaded a dozen apps in no time.

But while technically all these apps can do everything at the same time, in practice the experience sucks. When I have a thought to capture and I don't get 0.1 second response time from the home button, I become conscious of the mechanics, as Nielsen's research shows; after 1 second, I lose my train of thought.

Other critical day-to-day features such as "get my attention when I have an appointment or a text message comes in" don't work either. The G1 gives one little beep and puts an icon in the notification bar and then goes idle. If I happen to be in noisy traffic at the time, I lose. The sidekick continues to beep every 2 minutes, so that when I eventually get somewhere quiet, I'll notice.

And speaking of quiet, sound profile management on the G1 sucks. To put the phone in silent mode, you can hold down the red/end button until a menu appears. In big letters, it says "Silent Mode"; then in tiny letters under that, it says "sound is: on". Details, details, people!

Then, when you flip it to "sound is: off", it goes silent, but it doesn't vibrate. To put it in vibrate mode, you use the button on the side that controls the ringer volume, but you have to look at the screen to see when you've held it long enough. There's no one reliable gesture sequence for managing sound profiles.

Also, I'm forever forgetting to take the G1 back *out* of silent mode. I'm spoiled by the sidekick's scheduled sound profiles; every night at 11pm, it goes into "alarm clock" profile (where appointments that I set up ring loudly but incoming messages from others don't) and every morning at 8am, it goes back to normal mode. So even if I forget to take it out of silent mode, it's all set to go the next morning.

There's an award-winning 3rd party app (locale) for managing not just sound profiles but all sorts of other stuff like wifi and gps power-saving settings... and it's configurable not just by time, but also by GPS location, nearby wifi stations, and such. But... it doesn't work. That is: I couldn't recreate the simple "be quiet at night" configuration from the sidekick. Plus, it seemed to gunk up the performance of the gizmo.

As to the roach motel, no, I don't trust t-mobile/danger/Microsoft to manage my data; I keep my own copy using some homebrew software that uses their XMLRPC interface (in fact, I keep multiple copies sync'd with hg/mercurial).

The t-mobile web interface isn't nearly as nice as google's; that's probably the main thing I'll miss as I switch back to the sidekick. That and google maps (though the GPS stopped working about the 3rd time I dropped the G1).

The backlight on the screen was intermittent for a while, but power-cycling it would bring it back. Then, with the recent software update, the screen is dark all the time. So it's a choice between replacing the G1 and going back to the sidekick. (I'm keeping an eye on the palm pre and the iPhone is ubiquitous, but I extended my t-mobile contract by two years when I bought the G1 in Feb.)

Well, I just called T-Mobile customer service and asked them to switch me to the sidekick data plan. I guess we'll see how long it lasts.

see also: The Forgotten Sidekick


tags: mobile, android

p.s. older WearableGizmo notes suffer from in-progress get-out-of-Zope migration.

October 22, 2009 02:36 PM

October 16, 2009

MWI Team Blog

Device APIs on the way

Back in June, I noted that a new group that would work on JavaScript APIs to access device features (such as a camera, an address book, a calendar, etc.) had been proposed to W3C Members for review.

Since then, not only was the group approved and started, but we even got our first publication out: a Working Group note describing the expected requirements for these device APIs.

Of course, that document may seem a bit abstract at first glance: you'll see no API defined in there, nothing to play with.

But if you think Device APIs are a great opportunity for the Web platform (on mobile and elsewhere), I strongly encourage you to take a look at that document and check if the requirements highlighted there match what you know you'll need from these APIs - and if they don't, please let the Working Group know!

by Dominique Hazael-Massieux at October 16, 2009 02:40 PM

Ivan's Blog

Seduce with free services?


I ran into this twice in one week. I hope it is just a coincidence…

The story is simple. You find some service on the Web which looks nice and helpful. There are various options: you may take a minimal service, which is free of charge, or you can also choose extra services for a fee. It sounds like a decent choice: if the minimal service fits your needs, you are happy, if you need more, you pay something. I presume we all use services like that.

But then… if you take the free option, you may get a mail after 2-3 years of usage saying that, sorry, the free service will be discontinued next month; you are welcome to upgrade to the paying service, otherwise, well, good bye. As I said, I got this type of mail twice in a week: one from a service providing minimal synchronization of my phone’s calendar with Google’s, the other providing a simple email certificate for signing my mails. As a matter of principle I will not upgrade; I do not find this approach really acceptable.

So… will Gmail, WordPress, or other similar services decide that they have attracted enough customers and can now start charging? As I said, I hope this was just a coincidence and not some sort of general trend…

Posted in General, Private, Social aspects, Work Related

by Ivan Herman at October 16, 2009 07:01 AM

October 15, 2009

W3C Q&A Weblog

W3C Site Bugs!

We've received a number of helpful bug reports about the new site. I thought I should list a few here so that we can refer to them. We are working to have these particularly tricky ones fixed as quickly as possible.

  • In IE, if you select mobile or print modes, you can't get back to desktop mode.
  • In Safari, even if you select "desktop" mode you get mobile mode at narrow browser widths. Also, if you select "mobile" mode you get a mix of mobile and desktop at wider browser window widths.
  • In some browsers, you can't expand the expandable content sections; they snap back shut.

We are working on these fixes, and I also welcome fix suggestions from the community. Thanks again to those who have sent comments to site-comments@w3.org.

by Ian Jacobs at October 15, 2009 02:25 PM