November 07, 2009

Don't call me DOM

Using /etc/xml/catalog with org.apache.xml.resolver

I have just reported a bug in the w3c-dtd-xhtml Ubuntu package that had prevented me from using the Apache XML Catalog resolver to load local XHTML DTDs rather than the online ones when running the Saxon XSLT processor.

Hitting the online DTDs on every invocation of Saxon needlessly burdens the W3C Web site. I had already found guidance on how to use the Apache XML Catalog resolver to avoid that, but it wouldn’t work for the XHTML DTDs with the default XML catalog list that Ubuntu provides in /etc/xml/catalog.

After some investigation, it appeared that the use of a bogus URL as a SystemID in the intermediary XHTML catalog files prevents these catalogs from being parsed properly, and thus makes the local DTDs undiscoverable.

With the patch provided in my bug report, I can now happily use /etc/xml/catalog with Saxon and never hit the network when transforming XHTML files:


java -cp /usr/share/java/xml-commons-resolver-1.1.jar:path-to-saxon8/saxon8.jar \
    -Dxml.catalog.files=/etc/xml/catalog \
    -Dxml.catalog.verbosity=1 \
    net.sf.saxon.Transform -novw \
    -r org.apache.xml.resolver.tools.CatalogResolver \
    -x org.apache.xml.resolver.tools.ResolvingXMLReader \
    -y org.apache.xml.resolver.tools.ResolvingXMLReader \
    XMLFile XSLTFile
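Conceptually, once the catalogs parse correctly, each DTD reference goes through a lookup like the one below. This is a much-simplified Python sketch, not how org.apache.xml.resolver is implemented: it reads a single flat OASIS XML catalog with only `public` and `system` entries (no chaining to other catalog files via `nextCatalog` or `delegate*` entries), and the local DTD path is hypothetical.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Hypothetical catalog: the local path is illustrative, not the actual
# layout of the w3c-dtd-xhtml package.
CATALOG_XML = f"""<catalog xmlns="{CATALOG_NS}">
  <public publicId="-//W3C//DTD XHTML 1.0 Strict//EN"
          uri="file:///usr/share/xml/xhtml/xhtml1-strict.dtd"/>
  <system systemId="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
          uri="file:///usr/share/xml/xhtml/xhtml1-strict.dtd"/>
</catalog>"""


def load_catalog(text):
    """Return (public_map, system_map) built from a flat OASIS XML catalog."""
    root = ET.fromstring(text)
    public_map, system_map = {}, {}
    for entry in root:
        if entry.tag == f"{{{CATALOG_NS}}}public":
            public_map[entry.get("publicId")] = entry.get("uri")
        elif entry.tag == f"{{{CATALOG_NS}}}system":
            system_map[entry.get("systemId")] = entry.get("uri")
    return public_map, system_map


def resolve(public_id, system_id, catalog):
    """Try a system match first, then a public match; None means 'fetch from the network'."""
    public_map, system_map = catalog
    if system_id in system_map:
        return system_map[system_id]
    if public_id in public_map:
        return public_map[public_id]
    return None
```

If a catalog file cannot be parsed at all, no entries get loaded and every lookup falls through to `None`, which is the "hit the W3C site every time" behaviour described above.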

by Dom at November 07, 2009 11:02 AM

November 06, 2009

Ivan's Blog

ISWC2009 2-3


Second day

In fact, there is much less to say… In the morning I attended two workshops: I was at the Uncertainty Reasoning on the SW one for a while, but then I was asked to participate in a panel at the Semantics for the Rest of Us one, so I had to switch. This was a bit unfortunate, because I could not really ‘dive in’ to either of the two. And my afternoon was taken up by ‘networking’, catching up with some people on many, many issues that are not worth blogging about (yet?).

I listened to Kathryn Laskey’s presentation on how to combine probability theory in the mathematical sense (the good old Kolmogorov axiomatic theory of probability that I learned at university in a distant past…) with first-order logic. I cannot claim to have really understood all the details, but it made me curious enough to put her paper on my to-do list…

As for the panel “Little vs Large Semantics: What’s next for the Semantic Web languages?”, with Leigh, Kendall and Ora on the panel besides me… it was not that exciting, I must admit. Maybe the main message I took away from it was Chris Welty’s passionate request to re-open RDF (see also Pat’s keynote below!).

Third day (well, first real conference day)

Preamble: I would have liked to add links to the papers, but I couldn’t: I have not found the papers on the Web, neither on Springer’s site nor elsewhere. I may have missed a reference somewhere; if somebody knows, please tell me. But if the papers are not available, I think it is a shame…

The conference began with a keynote by Pat Hayes. Entertaining and also thought-provoking; Pat is a great speaker. What really interested me was his talk on ‘RDF Redux’; I was actually eager to hear it at SemTech last June, but he had to call it off back then, so he gave it here instead. This is typically the kind of talk that needs more thinking afterwards to understand it (and Pat has promised to write it down!), but he essentially proposed to rethink and redo some of the fundamentals of RDF semantics. Instead of the set-based model theory we have today, which makes the treatment of b-nodes, shall we say, a bit complicated (some would use harsher words:-), we should consider RDF graphs as ‘things’ on a ‘surface’ (think of it as a real surface, like a sheet of paper), with b-nodes being just ‘scratches on that surface’. (A bit like the ‘context’ of a graph?) Because these surfaces differ from one graph to the other, a merge in fact creates a new surface on which the unified graph is put, and the handling of b-nodes becomes natural (instead of the ‘renaming’ procedure that the current semantics document describes). Pat claims that the whole semantics could be rewritten that way and none of the current RDF implementations would have to change. But one can go one step further: there may be different kinds of surfaces (eg, negations), surfaces can have names (a bit like named graphs), and all of this can be put together to provide a powerful semantics for these entities. His further claim was that such an extended semantics of RDF could be powerful enough to describe, conceptually, RDFS or even OWL, ie, the semantics should not be layered any more.
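For readers less familiar with the ‘renaming’ procedure Pat wants to replace, here is a minimal Python sketch of merging graphs by standardizing blank nodes apart. It assumes triples are plain tuples and blank nodes are strings starting with `_:`; it illustrates the idea only and is not taken from the RDF Semantics document or from Pat’s talk.

```python
from itertools import count


def is_bnode(term):
    # Assumption of this sketch: blank nodes are strings like "_:x".
    return isinstance(term, str) and term.startswith("_:")


def merge(*graphs):
    """Union of RDF graphs after giving each graph's blank nodes fresh labels."""
    fresh = count()
    merged = set()
    for graph in graphs:
        # Scoped to one graph: its "_:x" stays one node, distinct from other graphs' "_:x".
        mapping = {}

        def rename(term):
            if not is_bnode(term):
                return term
            if term not in mapping:
                mapping[term] = f"_:b{next(fresh)}"
            return mapping[term]

        # Predicates cannot be blank nodes in RDF, so only subject and object are renamed.
        merged |= {(rename(s), p, rename(o)) for s, p, o in graph}
    return merged
```

Without the renaming, a plain set union would treat the two graphs' `_:x` labels as one node and conclude that a single resource is named both Alice and Bob; in Pat’s picture, the merge simply puts both scratches on a new surface.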

No way I would accept all this argumentation at face value:-), so I have to think about this and, mainly, read whatever Pat may write down to understand it. In the meantime, I may have to look into conceptual graphs and the Peircean notation of logic that Pat referred to as inspiration…

A more general takeaway (see also Chris’ remark above): maybe it is time to look into RDF again? A scary thought. Touching something that is fundamental to the SW has to be done with extreme care… We will see.

There were two papers in the same session that were very close in topic: one by Jesse Weaver and Jim Hendler on the parallel materialization of RDFS graphs, and one by Jacopo Urbani et al. on using MapReduce for RDFS reasoning. (Sigh…, this is where I would like to put a reference!) Both addressed the same challenge, namely materializing the RDFS inferences of a graph using parallel computing methods. And there was one more similarity: both classified the rules of the rule set described in the RDF Semantics document to help improve the processing (eg, to analyze which rules should be duplicated among processing nodes and which can be handled without duplication, or which need special treatment in a map-reduce pair). It would be worthwhile to see whether some of these classifications (‘ontology rules’ and the like) could be extended to OWL 2 RL (Jesse Weaver told me afterwards that they want to look into this). But, to put things into perspective: we are at the point where billions of triples can be expanded with relative ease. Who would have thought that a few years ago? There was also a remark on one of Jesse’s slides (I do not remember the exact wording) saying that RDFS is insanely parallelizable:-) It was a really interesting session.
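To make the ‘ontology rules’ idea concrete, here is a toy Python sketch of the general principle, not the algorithm of either paper: the small schema is closed once (rdfs5, rdfs11) and copied to every worker, after which each instance triple can be expanded independently with the instance rules (rdfs2, rdfs3, rdfs7, rdfs9). Only these six RDFS rules are covered, CURIE strings stand in for IRIs, threads stand in for processing nodes, and corner cases where terms like rdf:type themselves appear in the schema are ignored.

```python
from concurrent.futures import ThreadPoolExecutor

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SCHEMA_PREDICATES = {SUBCLASS, SUBPROP, DOMAIN, RANGE}


def close(pairs):
    """Transitive closure of a binary relation given as a set of (a, b) pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


def build_schema(graph):
    """'Ontology rules' (rdfs5, rdfs11): close the schema once, before fanning out."""
    def pairs(pred):
        return {(s, o) for s, p, o in graph if p == pred}
    return close(pairs(SUBCLASS)), close(pairs(SUBPROP)), pairs(DOMAIN), pairs(RANGE)


def supers(x, relation):
    return {b for a, b in relation if a == x}


def infer_from(triple, schema):
    """Instance rules for one triple; needs only the (closed) schema, no other data."""
    sub_class, sub_prop, domains, ranges = schema
    s, p, o = triple
    props = {p} | supers(p, sub_prop)                      # rdfs7
    out = {(s, q, o) for q in props}
    typed = set()                                          # (resource, class) pairs
    for q in props:
        typed |= {(s, c) for c in supers(q, domains)}      # rdfs2
        typed |= {(o, c) for c in supers(q, ranges)}       # rdfs3 (literal check omitted)
    if p == RDF_TYPE:
        typed.add((s, o))
    typed |= {(x, d) for x, c in typed for d in supers(c, sub_class)}  # rdfs9
    out |= {(x, RDF_TYPE, c) for x, c in typed}
    return out


def process_chunk(chunk, schema):
    result = set()
    for triple in chunk:
        result |= infer_from(triple, schema)
    return result


def materialize(graph, workers=4):
    schema = build_schema(graph)            # small, so every worker gets a full copy
    instance = [t for t in graph if t[1] not in SCHEMA_PREDICATES]
    chunks = [instance[i::workers] for i in range(workers)]  # no coordination between chunks
    result = set(graph)
    result |= {(a, SUBCLASS, b) for a, b in schema[0]}
    result |= {(a, SUBPROP, b) for a, b in schema[1]}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(process_chunk, chunks, [schema] * workers):
            result |= partial
    return result
```

Because the closed schema already contains every superclass and superproperty, one pass per instance triple suffices for this rule subset, which is one way to read the ‘insanely parallelizable’ remark.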

The SW in use session included a paper by Landong Zuo et al., “Supporting multi-view network analysis to understand company value chains”. The system integrates a bunch of UK data on companies into an RDF store and lets users get information on ‘value chains’, ie, how companies relate to one another as producers/consumers. Technically, the interesting point was that users could interactively add new relationships and new classifications to the system, essentially new rules that could be evaluated. The whole system seemed to be a really cool, well-engineered and well-functioning machine. As the speaker put it, although all the conclusions drawn from the system could be found by users analyzing databases, it would take weeks to do what this system gives them in a few minutes. This is exactly the kind of message we need for the outside world about the usefulness of Semantic Web technologies.

In another session, Martin Szomszor presented an experiment they conducted at the ESWC conference, combining RFID-based personal badges with an underlying SW system. The resulting system could show personal contacts among delegates, help people find others with similar interests, retrace later whom one met at what point (“I remember talking to that chap, but I do not remember his name!”), etc. There are lots of privacy issues, for example, but I would have liked to see that in practice, that is for sure!

Stéphane Corlosquet’s presentation on SW and Drupal was really exciting. I already knew about the plans for Drupal 7 to incorporate RDF management from the start, and that all Drupal 7 pages will be annotated via RDFa. The RDFa community has been fairly excited about that for a while now. But the work done by Stéphane and others provides additional modules that make it easy to add a SPARQL endpoint to a Drupal-based site, to import other RDF content, to manage the vocabularies used on the pages, and the like. They already have such a system running with the current Drupal, and these modules will become part of the standard Drupal 7 module set that one can download from the Drupal site. And that is cool. It significantly lowers the barrier to building Web sites that are prepared to be part of the Linked Data cloud, even if the system administrators are not SW experts. I expect this to open up quite a lot of possibilities…

Off to the next day! More papers and the presentation of the Semantic Web challenge finalists…

Posted in Semantic Web, Work Related Tagged: drupal, parallel computing, probabilistic reasoning, RDF, RDFa, RDFS, SPARQL

by Ivan Herman at November 06, 2009 10:31 AM

Decentralyze - Programming the Data Cloud

This morning, I gave the keynote address (my slides) at RuleML 2009. I assumed the audience would be fairly familiar with rule systems and rule technologies, but not necessarily with RIF, the Semantic Web in general, or my sense of the future of the Semantic Web (for which RIF is important).

Those slides may be boring for some of you. The interesting new bits:

  1. I made a new diagram for the RIF dialects. During the talk, I presented it as a successive reveal: the chaos of rule system features, the BLD grouping, the PRD grouping, and the Core intersection. Here’s the final slide:

    Venn diagram of rule system elements, grouped into RIF profiles

  2. I wanted to convey that the Semantic Web means real change, and I wanted emotional impact, so I took a shotgun approach and enumerated a list of likely data sources that was big enough to have some surprises for most folks; then I moved into a long list of things you could do with that data. Some things on the list were boring (having impact only by showing how long the list is), but some got the desired wide eyes and shocked sounds as people realized this just might happen. I think it went over well.

    My list of types of data we’ll be seeing:

    • From producers: product information
    • From sellers: product and service offerings
    • Customer support (instructions, upgrades, …)
    • Social network (who you trust, who interests you)
    • Personal information shared with friends
    • Public records (financial, legal, political, …)
    • Science (medical, environmental, economic, …)
    • News, Blogs, Public photos, videos
    • Event listings (performances, meetings)
    • Reviews, opinions, product experiences, preferences
    • Personal location, location history
    • Financial transactions

    Which led into my general scenario: you’re in a store about to buy something, but first you scan it with your phone and look up a little more information about it. What might you look up?

    • Its price at other stores, nearby
    • Its price for delivery, and how long you’d have to wait
    • Maybe: where it was made, and under what conditions
    • Is its producer a good corporate citizen?
    • Does its producer agree with your political views? (uh oh)
    • How many houses does its CEO own?
    • Did your spouse/housemate just buy some? Or something like it?
    • How your friends feel about this product
    • How Consumer Reports (or some such service) reviewed it
    • Product liability suits
    • Endorsements
    • Maybe: Payment for Endorsements
    • For electronics: compatibility information
    • For mechanical items: How repairable is it? MTTF, MTTR
    • For food: nutritional information, health benefits and risks
    • Demographics of this brand, these products

That’s all for now. I saw the Bellagio Fountain from my 29th-floor hotel room late last night, but I’d like to see it up close.

by sandhawke at November 06, 2009 01:35 AM

November 05, 2009

MWI Team Blog

W3C Cheatsheet for developers

Screenshot of the W3C Cheatsheet on a phone

Over the past few weeks, I’ve been working on a nifty little tool called the W3C Cheatsheet, which summarizes a number of W3C technologies, including the Mobile Web Best Practices, in a mobile-friendly format.

See my post in the W3C blog to learn more about it, and send your feedback!

by Dominique Hazael-Massieux at November 05, 2009 10:00 PM

W3C Q&A Weblog

W3C Cheatsheet for developers

Yesterday, as part of the W3C Technical Plenary day, I got the opportunity to introduce a new tool that I had been working on over the past few weeks, the W3C Cheatsheet for Web developers.

Screenshot of the W3C Cheatsheet on a phone

This cheatsheet aims to provide, in a very compact and mobile-friendly format, a compilation of useful knowledge extracted from W3C specifications (at this time, CSS, HTML, SVG and XPath), complemented by summaries of guidelines developed at W3C, in particular the WCAG2 accessibility guidelines, the Mobile Web Best Practices, and a number of internationalization tips.

Its main feature is a lookup search box, where one can start typing a keyword and get a list of matching properties/elements/attributes/functions in the above-mentioned specifications, and further details on those when selecting the one of interest.
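The lookup box boils down to prefix matching over a keyword index. A minimal Python sketch, assuming a sorted in-memory list; the sample entries are made up, and the actual cheatsheet is a Web page whose implementation may differ entirely. Binary search finds the first candidate, and all matches follow contiguously after it.

```python
from bisect import bisect_left

# Hypothetical sample entries: (keyword, specification it comes from).
ENTRIES = [
    ("border", "CSS"),
    ("background-color", "CSS"),
    ("circle", "SVG"),
    ("count()", "XPath"),
    ("href", "HTML"),
    ("border", "HTML"),
]


class PrefixIndex:
    """Case-insensitive prefix lookup over a sorted list of keywords."""

    def __init__(self, entries):
        # Sort once: (lowercased keyword, spec, original keyword).
        self._rows = sorted((kw.lower(), spec, kw) for kw, spec in entries)

    def lookup(self, prefix, limit=10):
        prefix = prefix.lower()
        # First row whose keyword is >= prefix; every match is contiguous from there.
        i = bisect_left(self._rows, (prefix,))
        hits = []
        while i < len(self._rows) and len(hits) < limit and self._rows[i][0].startswith(prefix):
            _, spec, kw = self._rows[i]
            hits.append((kw, spec))
            i += 1
        return hits
```

Typing "bor" would list `border` once per specification that defines it, which is how the same keyword can lead to several detail views.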

The early feedback received both from TPAC participants after the demo and from the microblogging community has been really positive and makes me optimistic that this tool is filling a useful role.

This is very much a first release, and there are many aspects that will likely need improvements over time, in particular:

  • I would like the cheatsheet to cover more content, both from specifications not yet released as standards and from topics not yet covered (e.g. JavaScript interfaces),
  • some people have reported that there might be accessibility problems with the current interface, which I’m eager to fix once I get specific bug reports,
  • the cheatsheet doesn’t work in IE6 (and probably not in later versions either), and it would be nice to make it work at least somewhat there.

The code behind the cheatsheet is already publicly available, and I’m hoping others will be interested in joining me in developing this tool. I’m fully aware that the first thing needed to get others involved will be some documentation on the architecture and data formats used in the cheatsheet, so I’m hoping to work on that in the coming weeks.

In the meantime, I very much welcome bug reports and suggestions for improvements, either by private email to me (dom@w3.org) or preferably to the publicly archived mailing list public-qa-dev@w3.org.

by Dominique Hazaël-Massieux at November 05, 2009 09:49 PM

November 02, 2009

Ivan's Blog

Promise held (NYT and the LOD)


I was at the SemTech conference in June when Evan Sandhaus from the New York Times gave a keynote in which he announced that the NYT would gradually publish much of their data as Linked Data using Semantic Web technologies. Unfortunately, I had to leave on the last day of ISWC2009 last week, so I missed the moment when they announced that they were keeping their promise and releasing the first 5,000 subject heading tags to the LOD. Which is really great news.

I remember Evan saying in Santa Clara (maybe privately, I do not remember that detail) that they are newcomers in this area and that it will be difficult to get it right (and, well, there are bugs, as Eric Hellman and Richard Cyganiak, for example, pointed out on their respective blogs). But I think we should really applaud when such a promise is kept…

Posted in Semantic Web, Work Related Tagged: Linked Data, new york times

by Ivan Herman at November 02, 2009 10:41 AM

October 31, 2009

Decentralyze - Programming the Data Cloud

sandhawke


Here’s what I think should be standardized at some point, soon, in the Semantic Web infrastructure. These items are at various levels of maturity; some are probably ready for a W3C Working Group right now, while others are in need of research. They are mostly orthogonal and most can be handled in independent efforts. (I would lean against forming a single RDF Working Group to handle all of this; that would be slower, I think.)

To be clear, when I say “RDF 2” I mean it like OWL 2: an important step forward, but still compatible with version 1. I’m not interested in breaking any existing RDF systems, or even in causing their users significant annoyance. In some traditions, where the major version number is only incremented for incompatible changes, this would be called a 1.1 release. In contrast, at W3C we normally signal a major, incompatible change by changing the name, not the version number. (And we rarely do that: the closest I can think of is CSS->XSL, PICS->POWDER, and HTML->XHTML). The nice thing about using a different name is it makes clear that users each decide whether to switch, and the older design might live on and even win in the end. So if you want to make deep, incompatible changes to RDF, please pick a new name for what you’re proposing, and don’t assume everyone will switch.

This is partially a trip report for ISWC, because the presentations and especially the hallway and lounge conversations helped me think about all this.

Note that although I work for W3C, this is certainly not a statement of what W3C will do next. It’s not my decision, and even if it were, there would be a lot of community discussion first. This is just my own opinion, subject to change after a little more sleep. Formally the decisions about how to allocate W3C resources among the different possible standards efforts are made by W3C management guided by the folks who provide those resources, via their representatives on the Advisory Committee (AC). If the direction of the W3C is important to you or your business, it may be worthwhile to join and participate in that process.

1. RDF and XML interoperation

There’s a pretty big divide between RDF and XML in the real world. It’s a bit like any divide between different programming languages or different operating systems: users have to pick which technology family to adopt and invest in. It’s hard to switch later, because of all the investment in tools, built systems, education, and even social networks. (People who use some technology build social and professional relationships with other people who use the same technology. Thus we have an XML community, an RDF community, etc. Few people are motivated to be in both communities.)

I think we should have better tools for bridging the gap, technologically, so that when data is published in XML, it’s easy for RDF consumers to use it, and when the data is published in RDF, it’s easy for XML consumers to use it.

The leading W3C answer is GRDDL, which I think is pretty good, but could use some love. I’d like to see support for the transforms being in Javascript, which I think is probably the dominant language these days for writing code that’s going to run on someone else’s computer. It certainly has a bigger community than XSLT. I’d probably support Java bytecode, too.

I would also like to see some way to support third-party GRDDL, where the transform is provided by someone not associated with either the data provider or the data consumer. Nova Spivack gave a keynote where he talked about this feature of T2. They’re focused on HTML, not XML, but the solution is probably the same.
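GRDDL itself has an XML document point to its transformations with a `grddl:transformation` attribute on the root element, and in practice those transformations are XSLT. The Python sketch below keeps that lookup but, to echo the wish for non-XSLT transforms and for third-party GRDDL, swaps in Python functions registered under a URI and adds a fallback table supplied by a third party. The registries, the example.org URIs and the `people_to_triples` transform are all hypothetical.

```python
import xml.etree.ElementTree as ET

GRDDL_NS = "http://www.w3.org/2003/g/data-view#"


def people_to_triples(root):
    """Hypothetical transform: one foaf:name triple per <person name="..."/> element."""
    return {(f"_:p{i}", "foaf:name", p.get("name")) for i, p in enumerate(root.iter("person"))}


# Hypothetical registry keyed by transformation URI; a real GRDDL agent would
# fetch the XSLT found at that URI and run it instead.
TRANSFORMS = {"http://example.org/people2rdf": people_to_triples}

# Hypothetical third-party table: root element name -> transformation URI,
# maintained by someone other than the data provider or the data consumer.
THIRD_PARTY = {"directory": "http://example.org/people2rdf"}


def glean(xml_text):
    root = ET.fromstring(xml_text)
    uris = root.get(f"{{{GRDDL_NS}}}transformation", "").split()
    if not uris and root.tag in THIRD_PARTY:
        uris = [THIRD_PARTY[root.tag]]  # the publisher said nothing; ask the third party
    triples = set()
    for uri in uris:
        if uri in TRANSFORMS:
            triples |= TRANSFORMS[uri](root)
    return triples
```

The first lookup is the publisher-declared path; the fallback is the third-party path, which leaves the document itself untouched.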

Beyond GRDDL, I think there’s room for a special data format that bridges the gap. I’ve called it “rigid rdf” or “type-tagged xml” in the past: it’s a sub-language of RDF/XML, or a style of writing XML, which can be read by RDF/XML parsers and is also amenable to validation and processing using XML schemas. Basically, you take away all the choices one has when serializing RDF/XML.
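To make “take away all the choices” concrete, here is a hypothetical Python sketch of one possible rigid style, not a published format: every subject becomes one `rdf:Description` with `rdf:about`, properties are sorted, and each object is either an `rdf:resource` attribute or element text. The output depends only on the set of triples, which is the kind of predictability an XML schema needs. It assumes IRI subjects, no blank nodes, datatypes or language tags, and a prefix for every predicate namespace.

```python
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def split_iri(iri):
    """Split a predicate IRI after its last '#' or '/' into (namespace, local name)."""
    cut = max(iri.rfind("#"), iri.rfind("/")) + 1
    return iri[:cut], iri[cut:]


def rigid_rdfxml(triples, prefixes):
    """Serialize (s, p, o) triples in one fixed RDF/XML shape.

    An object is ("iri", value) or ("literal", value).
    """
    ns_to_prefix = {ns: pfx for pfx, ns in prefixes.items()}
    out = [f'<rdf:RDF xmlns:rdf="{RDF_NS}"']
    out += [f'         xmlns:{pfx}="{ns}"' for pfx, ns in sorted(prefixes.items())]
    out[-1] += ">"
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s in sorted(by_subject):               # fixed subject order
        out.append(f"  <rdf:Description rdf:about={quoteattr(s)}>")
        for p, (kind, value) in sorted(by_subject[s]):  # fixed property order
            ns, local = split_iri(p)
            qname = f"{ns_to_prefix[ns]}:{local}"
            if kind == "iri":
                out.append(f"    <{qname} rdf:resource={quoteattr(value)}/>")
            else:
                out.append(f"    <{qname}>{escape(value)}</{qname}>")
        out.append("  </rdf:Description>")
    out.append("</rdf:RDF>")
    return "\n".join(out)
```

Any RDF/XML parser reads this output as ordinary RDF/XML, while an XML tool sees one predictable element layout per subject.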

I note that the Cambridge Communiqué is ten years old this month. It proposed schema annotation as an approach, and that’s not a bad one, either. I haven’t heard of anyone working on it recently, but maybe that will change if the XML community starts to see more need to export RDF.

Amusingly, while I was talking to Gary Katz from MarkLogic about this, he mentioned XSPARQL as a possible solution, and I pointed out that Axel Polleres (XSPARQL project leader) was sitting right next to us. So they got to talk about it. XSPARQL doesn’t excite me personally, because I use neither SPARQL nor XQuery, but objectively, yes, it might solve the problem for a significant user base.

2. Linked Data Inference

For me, an essential element of a working Linked Data ecosystem is automatic translation of data between vocabularies. If you provide data about the migration of frogs in one vocabulary, and my tools are looking for it in another one, the infrastructure should (in many cases) be able to translate for us. We need this because we can’t possibly agree on one vocabulary (for any given domain) that we’ll all use for all time. Even if we can agree for now, we’ll want this so that we can migrate to another vocabulary some time in the future.

Inference using OWL (and its subsets like RDFS) provides some of this, but I don’t think it’s enough. RIF fills in some more, but the WG did not think much about this use case, and there might be some glue missing. Maybe we can get a WG Note out of RIF to help this along.

I’d like us to be clear about first principles: when you’re given an RDF graph, and you’re looking for more information that might be useful, you should dereference the predicate IRIs to learn about what kinds of inference you’re entitled to do. And then, given resources and suitable reasoners, you should do it. That is, the use of particular IRIs as predicates implies certain things, as defined by the IRI’s owner. The graph is invoking certain logics by using those IRIs. (Of course you can always infer things that were not implied, but as among humans, those “inferences” are really just guesses you are making. They have quite a different status from true implications.)

If this is put together properly, and the logics are constructed in the right form, I think we’ll get the dynamic, on-demand translation I’m looking for. I imagine RIF could be very useful for this, but reasoner plugins written in Javascript or Java bytecode could be a better solution in some cases.
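A minimal Python sketch of that first principle, under loudly hypothetical assumptions: “dereferencing” a predicate IRI is simulated with a local dictionary, and what each predicate’s owner publishes there is a list of (target predicate, conversion function) pairs standing in for RIF rules or a reasoner plugin. The frog vocabularies and their IRIs are made up, and the sketch does a single pass without chaining rules.

```python
# Hypothetical stand-in for dereferencing predicate IRIs: what each predicate's
# owner publishes about translating it into another vocabulary.
PREDICATE_DOCS = {
    "http://example.org/frogs#migratedTo": [
        ("http://example.net/animals#movedTo", lambda o: o),
    ],
    "http://example.org/frogs#distanceKm": [
        ("http://example.net/animals#distanceMiles", lambda km: round(km / 1.609344, 2)),
    ],
}


def dereference(predicate):
    """Look up the translation rules published for a predicate (empty if none)."""
    return PREDICATE_DOCS.get(predicate, [])


def translate(graph):
    """Add, for every triple, the triples its predicate's published rules say are implied."""
    implied = set(graph)
    for s, p, o in graph:
        for target, convert in dereference(p):
            implied.add((s, target, convert(o)))
    return implied
```

The consumer never needs to know the source vocabulary in advance: the predicates used in the data point it to the rules it is entitled to apply.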

Some of my thinking here is in my workshop keynote slides, but later conversations with various folks, especially Pat Hayes and TimBL, helped it along. There’s more work to do here. I think it’s pretty small, but crucial.

3. Presentation Syntaxes

RDF, OWL, and RIF all have hideous primary exchange syntaxes and some decent not-W3C-recommended alternative serializations. I’m not really sure what can practically be done here that hasn’t been done.

At very least, I’d like to see a nice RDF-friendly presentation syntax for RIF. A bit like N3, I suppose. I did some work on this; maybe I can finish it up, and/or someone else can run with it.

OWL 2 has 3+n syntaxes, where n is the number of RDF syntaxes we have. Exactly one of those syntaxes is required of all consumers, for interchange. I’ll be interested to see how this plays out in the market.

4. Multi-Graph Syntax

Most systems that work with RDF handle multiple graphs at the same time. Sometimes they do this by storing the triples in a quad store, with the fourth entry being a graph identifier. This works pretty well, and SPARQL supports querying such things.
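A toy Python sketch of that fourth column (hypothetical, not any particular store’s API): each statement carries a graph identifier, a pattern with `None` as a wildcard plays the role of a SPARQL variable, and fixing the graph position gives roughly what SPARQL’s GRAPH keyword selects.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "s p o g")


class QuadStore:
    """Triples plus a graph identifier in the fourth position."""

    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, g):
        self.quads.add(Quad(s, p, o, g))

    def match(self, s=None, p=None, o=None, g=None):
        """Return quads matching the pattern; None acts as a wildcard."""
        pattern = (s, p, o, g)
        return {q for q in self.quads
                if all(want is None or want == have for want, have in zip(pattern, q))}

    def graph(self, g):
        """The plain RDF graph (set of triples) stored under identifier g."""
        return {(q.s, q.p, q.o) for q in self.match(g=g)}
```

Querying across graphs is then just a match with the graph position left open.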

We don’t have a way to exchange multiple graphs in the same document, however. N3 has graph literals (originally called contexts), and there was some work under the term named graphs, which is kind of the opposite approach.

Personally, I don’t yet understand the use case for interchanging multiple graphs in one document, so I’m not sure where to go with this.

Hmmm. I guess RIF could be used for this. You can write RDF triples as RIF frame facts, and the rif:Document format allows multiple rulesets, each with an optional IRI identifier, in the same document. ETA: RIF also gives you an exchange syntax where you can syntactically put literals in the subject and use bnodes as predicates, if you want. But now you’re technically exchanging RIF Frames instead of RDF Triples.

5. RDF Graph Validation

When writing software that operates on RDF data, it’s really nice to know the shape of the data you’ll find. It’s even nicer if software can check whether that’s actually what you got, and if reasoners can work to fill in any missing pieces.

I don’t exactly understand how important or unimportant this is. It’s closely related to the Duck Typing debate. Whatever mechanisms make duck typing work (eg exception handling, reflection, side-effect-free programming) probably help folks be okay without graph validation. But I think folks trained on C++/Java or XML Schema would be much happier with RDF if it had this.

The easiest solution might be using rigid RDF. One could probably also do it with SPARQL, essentially publishing the graph patterns that will match the data in the expected graphs.

The most interesting and weird approach is to use OWL. Of course, OWL is generally used to express knowledge and reason about some application domain, like books, genes, or battleships. But it’s possible to use OWL to express knowledge about RDF graphs about the application domain. In the first case, you say every book has one or more authors, who are humans. In the second case, you say every book-node-in-a-valid-graph has one or more author links to a human-node in the same graph. At least that’s the general idea. I don’t know if this can actually be made to work, and even if it can, it risks confusing new OWL users about one of the subjects they’re already seriously prone to get wrong.
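The graph-pattern approach, applied to the book example, might look like the Python sketch below: a tiny basic-graph-pattern matcher (the core of what a SPARQL engine does) and a check that every node typed `ex:Book` has at least one `ex:author` link to a node typed `foaf:Person` in the same graph. The vocabulary terms are illustrative CURIE strings, and this is a hypothetical sketch, not a proposed standard or any existing validator’s API.

```python
def is_var(term):
    """Variables are strings starting with '?', as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern_term, value, binding):
    if is_var(pattern_term):
        # Bind a fresh variable, or check consistency with an earlier binding.
        return binding.setdefault(pattern_term, value) == value
    return pattern_term == value


def match_bgp(patterns, graph, binding=None):
    """Yield every extension of `binding` under which all triple patterns occur in graph."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        if all(unify(pat, val, candidate) for pat, val in zip(first, triple)):
            yield from match_bgp(rest, graph, candidate)


# "Every book-node has one or more author links to a human-node in the same graph."
BOOK_SHAPE = [("?b", "ex:author", "?a"), ("?a", "rdf:type", "foaf:Person")]


def invalid_books(graph):
    books = {m["?b"] for m in match_bgp([("?b", "rdf:type", "ex:Book")], graph)}
    return {b for b in books
            if next(match_bgp(BOOK_SHAPE, graph, {"?b": b}), None) is None}
```

The check is about the graph, not the world: a book whose author simply isn’t typed in this graph is reported, which is exactly the distinction the OWL paragraph draws.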

6. Editorial Issues

Finally, I’d like some portions of the 2004 RDF spec rewritten, to better explain what’s really going on and guide people who aren’t heavily involved in the community. This could just be a Second Edition, with no need for RDF 2, because no implementation changes would be involved.

I’d like us to include some practical advice about when/how to use List/Seq/Bag/Alt, and reification, maybe going so far as to deprecate some of them (IMHO, all but List). Maybe bring in some of the best-practice stuff on publishing and n-ary relations.

I understand Pat Hayes would like to explain blank nodes differently, explicitly introducing the notion of “surfaces” (what I would call knowledge bases, probably). Personally, I’d love to go one step farther and get rid of all “graph” terminology, instead just using N-Triples as the underlying formalism, but I might be a minority of one on that.

ETA: Of course we should also change “URI-Reference” to “IRI”, and stuff like that.


Okay, that’s my list. What’s yours? (For long replies, I suggest doing it on your own blog, and using trackback or posting a link here to that posting.) Discussion on semantic-web@w3.org is fine, too.

by sandhawke at October 31, 2009 12:28 AM

October 29, 2009

Ivan's Blog

ISWC2009 4-5


Fourth day

Shame on me, but I missed the morning keynote… I was a bit late arriving to the conference site and I got stuck in a conversation at breakfast. Things happen…

The most notable event in the morning, at least for me, was the SPARQL WG panel. All members of the Working Group (me included) were on the panel and the room was full. I mean, full: people were standing in the back. And I regard that as a success in itself; it shows not only the overall importance of SPARQL, but the real interest in the new version, ie, SPARQL 1.1 (in case you missed it, the first working draft was published just a few days ago). Lee Feigenbaum (co-chair of the group) gave a quick overview of the new features and then the questions came.

The difficulty of the SPARQL 1.1 work is that it has to find a balance between what is realistic to standardize in a relatively short time frame and what would be good to see in a new query language. As a consequence, there are features that the community has discussed but that have not made it into the document, or only in a simple form. That came up during the discussion, but I had the impression that the audience, by and large, understood this balance. Actually, for some, the set of new features was even too much for an efficient implementation. I have the feeling that the WG will have to publish a separate conformance document (a bit like OWL 2 has), because there is a certain confusion about whether a conforming SPARQL implementation will have to implement, say, update or inference regimes. That clearly came up through the questions. Anyway, remember one email address (yes, it is a bit of a mouthful): public-rdf-dawg-comments@w3.org; this is where comments on SPARQL 1.1 have to be sent!

I chaired a session in the use track in the afternoon. The paper by Daniel Elenius et al. on reasoning about resources (for military exercises) was interesting to me because it was based on reasoning with relatively large OWL ontologies plus rules. The OWL ‘side’ was not very complex (Daniel referred to DLP; today I would probably say OWL 2 RL) but it was extended with extra rules. What this shows is that once RIF is finished and published, the combination of OWL with RIF may become very important for tons of practical applications. (As an aside, a nice little joke from Daniel: what is the system used by the military today when planning for exercises? The system is called BOGSAT. It stands for ‘Bunch Of Guys Sitting Around a Table’…)

Roland Stuhmer gave a presentation in a very different style on how user events (clicks, combinations of clicks, etc) can be collected, categorized, and integrated into an application, then analyzed with some rules for, eg, targeted ads. The system harvests not only the structure of the Web page but also the annotations appearing in the page via RDFa. The result is an RDF structure describing the events, which can be sent to a server, analyzed locally, distributed, etc. A nice usage of RDFa, and it also shows how important it is to have a Javascript API that can retrieve the RDF triples from the RDFa structure attached to a specific node. (By the way, the old graphics standards of the 80’s and 90’s, called GKS or PHIGS, had notions of combined event structures with different event types. I do not remember all the details any more, but might it be worth looking at those again in a modern setting?)

Personally, the highlight of the day was the presentation of the semantic web challenge finalists. I was a member of the jury, which meant that I had to review the submissions in advance, and we had two very enjoyable discussions with the rest of the jury on the submissions. We made the first selection the day before, and this time all finalists gave their presentations and demos. And it was a tough choice (that is why we had such long discussions:-) because, well, the submissions were great overall. I do not really want to analyze each of the entries; I do not think it would be appropriate for me in this position. But the winning entry of the challenge, namely TrialX, really made a great impression on me. In short, the application is a consumer-centric tool through which patients can find matching clinical trials in which they want to participate; it also helps those who organize those trials, etc. It is a sort of matchmaking tool using all kinds of medical ontologies and vocabularies, public health record data and the like. We should realize the importance of this: here is a great Semantic Web application, winner of the challenge, which is really an application, not only a demonstration, already deployed on the Web (soon as an iPhone app, too), and, to be a bit dramatic, it may save (and possibly has already saved) lives. What else do we want as proof that this technology is not only an academic exercise any more?

Fifth day

Only a partial day for me, as far as the conference goes, because I had to fly out before the end… But I could listen to the last keynote of the conference, ie, that of Nova Spivack.

Not surprisingly, Nova talked about Twine-2, a.k.a. T2. I did not really know what T2 was to be, I only heard that Twine, ie, T1, is moribund. As Nova acknowledged, it is too complicated, it is too hard for users to really figure it out; in fact, most of the users used it for search. Which is not the strongest feature of T1 in the first place.

So T2 is (well, will be) all about semantically backed search. It semantically indexes the Web, attempting to extract semantic information from the pages. The user interface would then be, essentially, a faceted interface that automatically classifies the search hits into different tabs; the user can use these tabs, drill down along other categories, etc. So far nothing radically new, though the user interface Nova showed was indeed very clean and nice. All this is done, internally, via vocabularies/ontologies, using RDF, RDFS, or OWL.

The interesting aspect of T2 (at least as far as I am concerned) is the incorporation of collective knowledge. First of all, T2 will include a system whereby users can add vocabularies that T2 will use in categorization. Users can get those ontologies back in OWL/RDF, they can improve them, etc. The other tool they will provide is a means to help semantically index pages that are, by themselves, not semantically annotated. This can be done via a Firefox extension; users can identify parts of the web pages (I presume, essentially, the DOM nodes) and associate these with classes of specific ontologies. The extension produces an XSLT transformation that can be sent back to the T2 system. Some social mechanism should of course be set up (eg, webmasters annotating their own pages should get a higher priority than third party annotators) but, essentially, it is a sort of GRDDL transformation by proxy: T2 will have information on how to find transformations to semantically index specific pages without requiring the modification of the pages themselves (in contrast to GRDDL, where such a transformation has to be referred to from the page itself).
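To make the "GRDDL by proxy" idea a bit more concrete, here is a minimal Python sketch of what such a registry could look like. Everything in it is hypothetical: T2's actual design was not described at this level, plain Python functions stand in for the XSLT transformations produced by the Firefox extension, and the webmaster-over-third-party ranking is just the kind of social mechanism suggested above.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical ranking: a site's own webmaster outranks third-party annotators.
PRIORITY = {"webmaster": 2, "third-party": 1}

@dataclass
class Transformation:
    url_pattern: str                          # regular expression over page URLs
    source: str                               # "webmaster" or "third-party"
    transform: Callable[[str], List[Triple]]  # stands in for an XSLT stylesheet

class ProxyRegistry:
    """Keeps GRDDL-style transformations outside the pages they apply to."""

    def __init__(self) -> None:
        self.entries: List[Transformation] = []

    def register(self, t: Transformation) -> None:
        self.entries.append(t)

    def lookup(self, url: str) -> Optional[Transformation]:
        """Return the highest-priority transformation whose pattern matches url."""
        matches = [t for t in self.entries if re.match(t.url_pattern, url)]
        if not matches:
            return None
        return max(matches, key=lambda t: PRIORITY[t.source])

    def extract(self, url: str, page: str) -> List[Triple]:
        """Apply the chosen transformation to the page; no match means no triples."""
        t = self.lookup(url)
        return t.transform(page) if t else []

# Usage: two competing transformations for the same (hypothetical) site.
registry = ProxyRegistry()
registry.register(Transformation(r"https?://shop\.example\.org/item/", "third-party",
                                 lambda page: [("_:item", "rdfs:label", page.strip())]))
registry.register(Transformation(r"https?://shop\.example\.org/item/", "webmaster",
                                 lambda page: [("_:item", "ex:productName", page.strip())]))
print(registry.extract("http://shop.example.org/item/42", " Blue widget "))
# [('_:item', 'ex:productName', 'Blue widget')]
```

The point of the design is that the page itself is never touched: the mapping from URL to transformation lives entirely on the proxy side, which is exactly what distinguishes this from plain GRDDL.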

Of course, the system was a bit controversial in this community; indeed, it was not clear whether T2 would make use of the semantic information that does exist in pages already (microformats, RDFa, …), let alone the Linked Open Data information that is already out there. When asked, Nova did not seem to give a clear answer, though, to be fair, he did not specifically say no, and he also said that the semantic index might be made available to the public in the form of linked data. To be decided. It is also not fully clear whether those proxy-GRDDL transformations would be available to the community at large (hopefully the answer is yes…). It will be interesting to see how it plays out (T2 comes out in beta sometime in early 2010). Certainly a project to keep an eye on.

From a slightly more general point of view, it is also interesting to note that two out of the three Semantic Web Challenge winners are also semantic search engines with different user interfaces (though sig.ma and VisiNav definitely do use the LOD cloud, no question there…). Definitely an area on the move!

I had the time and, frankly, the energy to really listen to only one more paper in the regular track, namely the paper on functions of RDF language elements, by Bernhard Schandl. A nice idea: imagine a traditional spreadsheet, where each cell is a collection of resources from an RDF graph, or a function that can manipulate those resources (extract information, produce a new set of resources, etc). Just like in a spreadsheet, if you modify the underlying graph, ie, the resources in a cell, everything is automatically recalculated. And because, just like in a spreadsheet, a function can refer to the result of another function in another cell, one can do fairly complicated transformations and information extraction quite easily. Neat idea, which can be tried out from their site.
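A toy version of the spreadsheet idea, in Python, might look like the following. The class and cell-formula names are hypothetical and not taken from the paper. Recalculation is done the lazy way here, by recomputing a cell from the current graph every time it is read, rather than by tracking dependencies the way a real spreadsheet engine would.

```python
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]
Formula = Callable[["RDFSheet"], Set[str]]

class RDFSheet:
    """Hypothetical spreadsheet: each cell holds a formula over an RDF graph or other cells."""

    def __init__(self, graph: Set[Triple]):
        self.graph = graph
        self.cells: Dict[str, Formula] = {}

    def define(self, name: str, formula: Formula) -> None:
        self.cells[name] = formula

    def value(self, name: str) -> Set[str]:
        # Pull-based: every read recomputes from the current graph, so a change to the
        # graph shows up in every cell that (directly or indirectly) depends on it.
        return self.cells[name](self)

def instances_of(rdf_class: str) -> Formula:
    """Cell formula: all resources typed with rdf_class."""
    return lambda sheet: {s for (s, p, o) in sheet.graph if p == "rdf:type" and o == rdf_class}

def property_values(cell: str, prop: str) -> Formula:
    """Cell formula: values of prop for the resources held in another cell."""
    def formula(sheet: RDFSheet) -> Set[str]:
        subjects = sheet.value(cell)
        return {o for (s, p, o) in sheet.graph if s in subjects and p == prop}
    return formula

graph = {("ex:a", "rdf:type", "foaf:Person"), ("ex:a", "foaf:name", "Alice")}
sheet = RDFSheet(graph)
sheet.define("A1", instances_of("foaf:Person"))
sheet.define("B1", property_values("A1", "foaf:name"))   # B1 refers to A1
print(sorted(sheet.value("B1")))   # ['Alice']

graph.update({("ex:b", "rdf:type", "foaf:Person"), ("ex:b", "foaf:name", "Bob")})
print(sorted(sheet.value("B1")))   # ['Alice', 'Bob'], recalculated from the changed graph
```

Recomputing on every read is wasteful for large graphs; caching cell values and invalidating them when the graph changes would be the obvious next step.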

That is it for ISWC2009. I obviously missed a lot of papers, partly because social life and hallway conversations sometimes had the upper hand, and sometimes simply because there were too many parallel sessions. But it was definitely an enriching week… See you all, hopefully, at ISWC2010, in Shanghai!

Posted in Semantic Web, Work Related Tagged: health care, OWL RL, RDFa, Rules, semantic search, SPARQL, spreadsheet

by Ivan Herman at October 29, 2009 11:59 PM

W3C Q&A Weblog

W3C Developer Gathering Next Week; Registration Closes Today

Next week's W3C Developer Gathering will bring together some great speakers:

  • Leslie Daigle (ISOC) on Internet Ecosystem Health
  • Mark Davis (Unicode Consortium) on controversies around international domain names
  • Brendan Eich (Mozilla) on "ECMA Harmony and the Future of JavaScript"
  • Fantasai on CSS, with help and demos from the "CSS Strike Force": Tab Atkins, David Baron, Simon Fraser, and Sylvain Galineau
  • Philippe Le Hégaret (W3C) on community-built browser test suites.
  • Kevin Marks (OWF) on OpenID, OAuth, OpenSocial
  • Arun Ranganathan (Mozilla) on what's new in APIs

I will be hosting the gathering (5 November in the afternoon). We've planned for some fun give-aways to be revealed at the meeting. Registration closes today, although we will admit walk-ins at a higher rate next week.

If you can't join us in person, you can follow the meeting on IRC; more details are available on the meeting page.

I hope you will join us next week.

by Ian Jacobs at October 29, 2009 03:30 PM

October 27, 2009

Advogato blog for connolly

27 Oct 2009

Roach motel indeed; sidekick XMLRPC service is no more

I went back to the most recent (2008-03) of my calendar sync items in the DIG breadcrumbs research blog and got hipwsgi.py from palmagent fired up, only to get "Connection refused" from pimapi.prod1.dngr.net.

Uh-oh.

I thought I could write off the sidekick altogether at that point, but:

  1. Organizing weekend todo lists works with the sidekick in a way that I haven't managed to duplicate: lists on paper don't sort themselves by priority and due date; Google calendar tasks don't sync with the sidekick (nor with any usable android app that I could find).
  2. How do I call my brother from the car? Using a mobile phone without my contacts would be like using the Web without DNS.
  3. Android's crummy appointment notification reminded me how much I rely on a gizmo to beep when I'm supposed to stop coding and go to my appointment with the Doctor. Delegating this to a gizmo goes back to ~1996 when I first got a Psion PDA. (see some python code for psion files)

I'm not sure how I'm going to muddle thru this mix of google calendar/contacts stuff and sidekick phone... maybe I can use SMS reminders for calendar stuff, but you never know how long those things are going to take to be delivered; T-Mobile seems to deliver them 13 hours later in some cases.

For now, I'm going to pickle some state...

#swig notes

old calendar notes/links, circa 1999/2000

palmagent code: r423:4a5a8b2d237c 2009-05-01

repository of sidekick data from palmagent/hipwsgi.py: 32:31a84807d214 2009-02-26

another repository of my PIM data: 596:6faa7311f865 2009-04-2, 595:b20e1f7fa468 2008-09-10

October 27, 2009 05:27 PM

Ivan's Blog

ISWC2009 I.


This year’s ISWC is held in Chantilly, Virginia, in a nice conference building in a beautiful park with autumn colours that, for reasons I do not really know, are always much more striking and amazing in America than in Europe. It is a bit of a pity that it is so far from Washington but, well, you can’t have it all…

First day: tutorials.

(For me, because there were also a bunch of workshops.) In the morning I was at the tutorial on how to consume Linked Open Data, by Juan Sequeda, Jammie Taylor, Patrick Sinclair, and Olaf Hartig; in the afternoon I went to the one on legal and social frameworks for sharing data on the Web, by Leigh Dodds, Jordan Hatcher, Tom Heath, and Kaitlin Thaney.

Juan and his friends actually had a difficult task, and that became clear right at the start, during Juan’s intro: part of the audience did not really know what LOD was all about, whereas others were, shall we say, old timers on the subject. I think the speakers did a really good job of navigating through these constraints, giving short introductions to what LOD is all about but talking about issues and showing examples that were interesting for all of us. Kudos for that. The audience raised issues that were really to the point (who should create sameAs links, how trustworthy they are, how to choose vocabularies and how they map to one another, etc) and, in his closing slides, Juan actually gave a list of the open R&D issues in LOD. Worth looking at those (and no reason to repeat the list here…). B.t.w., the slides of the tutorial are on line.

One very interesting technology I heard about, which, shame on me, I did not know, was sqin, a tool based on a traversal-based execution scheme for SPARQL. Olaf did a presentation on that. What essentially happens is as follows. At the beginning the default graph of the SPARQL query is empty. However, the system systematically fetches RDF triples by dereferencing URI-s in the query pattern, adding those to the default graph. The query is matched against that graph, and some variables will match, thereby ‘adding’ new URIs to the pattern. And the process starts again, possibly yielding a complete solution (or more) to the original query. At the end of the process, solutions will have been found on the Web, even if the system itself does not have any ‘real’ data behind it at the start. Of course, no one can guarantee that all solutions will be found, and you need some ‘seed’ URI-s in the original query pattern, but it nevertheless looks like a very powerful tool to explore, say, the LOD. Very interesting!
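The traversal-based execution idea can be sketched in a few lines of Python. This is only a toy, not how sqin itself is implemented: a hypothetical in-memory WEB dictionary stands in for HTTP dereferencing, prefixed names such as ex:alice stand in for real URIs, and a crude is_uri test decides what is worth "dereferencing". The query starts against an empty graph, and every URI that gets bound to a variable is followed in the next round, until nothing new can be fetched.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web: "dereferencing" a URI returns the triples published there.
WEB: Dict[str, List[Triple]] = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob": [("ex:bob", "foaf:name", "Bob"), ("ex:bob", "foaf:knows", "ex:carol")],
    "ex:carol": [("ex:carol", "foaf:name", "Carol")],
}

def is_var(term: str) -> bool:
    return term.startswith("?")

def is_uri(term: str) -> bool:
    # Crude assumption: prefixed names stand in for URIs; literals contain no colon.
    return ":" in term and not is_var(term)

def match(pattern: Triple, graph: Set[Triple], binding: Binding) -> List[Binding]:
    """Extend `binding` in every way that makes `pattern` match a triple of `graph`."""
    results = []
    for triple in graph:
        extended = dict(binding)
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                if extended.setdefault(p_term, t_term) != t_term:
                    break  # variable already bound to a different value
            elif p_term != t_term:
                break
        else:
            results.append(extended)
    return results

def evaluate(query: List[Triple], graph: Set[Triple]):
    """Return the complete solutions and every partial binding produced along the way."""
    bindings: List[Binding] = [{}]
    partial: List[Binding] = []
    for pattern in query:
        bindings = [b2 for b in bindings for b2 in match(pattern, graph, b)]
        partial.extend(bindings)
    return bindings, partial

def traversal_query(query: List[Triple], web: Dict[str, List[Triple]], max_rounds: int = 10):
    graph: Set[Triple] = set()                     # the default graph starts empty
    fetched: Set[str] = set()
    to_follow = {t for pattern in query for t in pattern if is_uri(t)}  # seed URIs
    for _ in range(max_rounds):
        new_uris = to_follow - fetched
        if not new_uris:
            break                                  # nothing left to dereference
        for uri in new_uris:
            fetched.add(uri)
            graph.update(web.get(uri, []))         # "dereference" and add to the graph
        _, partial = evaluate(query, graph)
        # URIs bound to variables by (partial) matches become links to follow next round.
        to_follow |= {v for b in partial for v in b.values() if is_uri(v)}
    solutions, _ = evaluate(query, graph)
    return solutions

QUERY = [("ex:alice", "foaf:knows", "?x"), ("?x", "foaf:knows", "?y"), ("?y", "foaf:name", "?name")]
print(traversal_query(QUERY, WEB))
# [{'?x': 'ex:bob', '?y': 'ex:carol', '?name': 'Carol'}]
```

Note that the answer about Carol is found even though the local store held no data at all at the start; equally, a solution whose data is only reachable through a URI that never gets bound would simply be missed, which is the incompleteness mentioned above.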

Then there were some examples of how LOD is used. Jammie talked about Freebase, and how Freebase is, in fact, a way for everybody to easily add information to the LOD (after all, Freebase works like a wiki, and all the data is reflected in the LOD). He also had a very important message that is worth repeating (go to his slides for the rest): it takes very little effort to add a republishing capability to your triple-store-based application, thereby extending the general LOD. So… do it! This is how the system evolves…

Patrick described quite a geeky system that the BBC folks have developed (which will hopefully become public soon): take the BBC’s musical data in RDF (which is available), plus the LOD cloud, plus… an IRC bot. What you get is an IRC channel which will pick up data on music, including the sound tracks, photos, etc, and display it on the machine. I presume you can give orders and preferences through the IRC. Obviously geeky stuff, not for the masses:-) but it shows what you can do…

The afternoon tutorial on the Legal and Social frameworks was of course very different. I think one of the many important aspects of this tutorial, maybe the most important one, is that… it took place! This may sound a bit strange, but it is important for all our community to realize that we will have issues around copyright, licensing, waivers, etc, when it comes to the Web of Data, whether we like these issues or not. Tutorials like this, written notes and information, etc, are essential. Let us face it: most of us do not understand the details of the legal issues. So I was simply listening and trying to absorb what I heard…

I do not want to repeat the details of what I heard here; one thing I learned over the years is that I should leave legal argumentation and descriptions to those who really understand them. Ie, look at the slides. It is worth it. But just to show the complexities: I did not know, or did not fully realize, that there are major differences among countries in what can or cannot be copyrighted: for example, a phone book cannot be copyrighted in the US or Europe, but can be in Australia. Or that the seemingly simple notion of ‘attribution’ can, in fact, become an endless pit when it comes to data and the queries thereof (eg, if I have a filter in a query that results in data, should I give an attribution to the data that were, in fact, filtered out?). Etc.

There is also a takeaway message for me (though it may be quite trivial) among the things I learned. Tom showed some practical examples of how one can add, say, licensing information to data by adding some RDF triples. However, for a larger data set the licensing may differ within the dataset. Eg, if you retrieve data from somewhere and enrich it with additional metadata, the metadata itself may have a different license (it is yours) than the data that you use (which may have its own license). What this means is that when you organize your data internally, you should think about the licensing information you will add well in advance: organize your URI-s accordingly, for example. If you don’t, and you want to add a license at the end, you might find yourself in trouble! Sounds like a simple message, but it is important. (Reminds me of what accessibility people always say: if you take accessibility issues into account right at the beginning when you build up a Web site, it is not complicated; but if you have to add accessibility features after the fact, it may become hell…)
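Here is a minimal Python sketch of what "organizing your URI-s accordingly" could mean in practice, assuming one possible convention: data of different provenance is minted under different URI namespaces, each namespace is mapped to a license, and the license is then attached to each resource with the dcterms:license property. The example.org namespaces and the particular Creative Commons licenses are placeholders for illustration only.

```python
from typing import Dict, List, Optional, Tuple

DCT_LICENSE = "http://purl.org/dc/terms/license"

# Hypothetical layout: each URI namespace groups data that shares one license.
LICENSE_BY_PREFIX: Dict[str, str] = {
    "http://example.org/imported/": "http://creativecommons.org/licenses/by/3.0/",       # third-party data
    "http://example.org/annotations/": "http://creativecommons.org/publicdomain/zero/1.0/",  # our own metadata
}

def license_for(uri: str) -> Optional[str]:
    """Return the license of the longest namespace prefix matching uri, if any."""
    matches = [p for p in LICENSE_BY_PREFIX if uri.startswith(p)]
    if not matches:
        return None
    return LICENSE_BY_PREFIX[max(matches, key=len)]

def license_triples(uris: List[str]) -> List[Tuple[str, str, str]]:
    """Emit one dcterms:license triple per resource whose namespace has a known license."""
    out = []
    for uri in uris:
        lic = license_for(uri)
        if lic is not None:
            out.append((uri, DCT_LICENSE, lic))
    return out

for triple in license_triples(["http://example.org/imported/gene/42",
                               "http://example.org/annotations/note/7"]):
    print(triple)
```

The design choice is the point: because provenance is encoded in the URI structure from the start, adding licensing triples later is a mechanical step rather than a forensic exercise. Whether to attach the license per resource or once per dataset description is a separate decision.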

By the way, Leigh has made a kind of overview of the current ‘blobs’ on the LOD cloud to see whether any kind of licensing information is available or not. He has an overview of the results in his slides. The main fact is: the majority of data sets have no information whatsoever (or, at least, nothing that can be found in about 10 minutes)…

It was a good day. Looking forward to the rest.

Posted in Semantic Web, Work Related Tagged: creative commons, data commons, Linked Data, Linked Data Cloud, science commons, SPARQL

by Ivan Herman at October 27, 2009 03:16 PM

October 22, 2009

Advogato blog for connolly

22 Oct 2009

G1 is so disappointing, I'm going back to the sidekick. Yes, the sidekick

I've been a happy sidekick user since December 2002. In fact, what really got me interested in the android/G1 was that Andy Rubin, the danger/sidekick lead designer, was working on it.

My first few minutes with the G1 were lots of fun: google street view with the accelerometer blew me away and I had downloaded a dozen apps in no time.

But while technically all these apps can do everything all at the same time, in practice, the experience sucks. When I have a thought to capture, as Nielsen's research shows, if I don't get .1 second response time from the home button, I become conscious of the mechanics, and after 1 second, I lose my train of thought.

Other critical day-to-day features such as "get my attention when I have an appointment or a text message comes in" don't work either. The G1 gives one little beep and puts an icon in the notification bar and then goes idle. If I happen to be in noisy traffic at the time, I lose. The sidekick continues to beep every 2 minutes, so that when I eventually get somewhere quiet, I'll notice.

And speaking of quiet, sound profile management on the G1 sucks. To put the phone in silent mode, you can hold down the red/end button until a menu appears. In big letters, it says "Silent Mode"; then in tiny letters under that, it says "sound is: on". Details, details, people!

Then, when you flip it to "sound is: off", it goes silent, but it doesn't vibrate. To put it in vibrate mode, you use the button on the side that controls the ringer volume, but you have to look at the screen to see when you've held it long enough. There's no one reliable gesture sequence for managing sound profiles.

Also, I'm forever forgetting to take the G1 back *out* of silent mode. I'm spoiled by the sidekick's scheduled sound profiles; every night at 11pm, it goes into "alarm clock" profile (where appointments that I set up ring loudly but incoming messages from others don't) and every morning at 8am, it goes back to normal mode. So even if I forget to take it out of silent mode, it's all set to go the next morning.
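The scheduled-profile logic is simple enough to sketch. The Python below is a hypothetical reconstruction, not the sidekick's actual implementation: the 8am/11pm schedule comes from the paragraph above, and a manual override (such as silent mode) is assumed to last only until the next scheduled switch, which is what makes forgetting to undo it harmless.

```python
from datetime import datetime, time, timedelta
from typing import List, Optional, Tuple

# Hypothetical schedule mirroring the sidekick setup described above.
SCHEDULE: List[Tuple[time, str]] = [(time(8, 0), "normal"), (time(23, 0), "alarm clock")]

def scheduled_profile(at: datetime) -> str:
    """Profile of the latest switch at or before `at`, wrapping around midnight."""
    ordered = sorted(SCHEDULE)
    current = ordered[-1][1]  # before the first switch of the day, last night's profile applies
    for start, profile in ordered:
        if at.time() >= start:
            current = profile
    return current

def last_switch(at: datetime) -> datetime:
    """Moment of the most recent scheduled switch at or before `at`."""
    yesterday = at.date() - timedelta(days=1)
    candidates = [datetime.combine(day, start)
                  for day in (yesterday, at.date())
                  for start, _ in SCHEDULE]
    return max(c for c in candidates if c <= at)

def active_profile(at: datetime, manual: Optional[Tuple[datetime, str]] = None) -> str:
    """A manual choice (e.g. silent mode) only lasts until the next scheduled switch."""
    if manual is not None:
        set_at, profile = manual
        if set_at >= last_switch(at):
            return profile
    return scheduled_profile(at)

silenced = (datetime(2009, 10, 22, 14, 0), "silent")             # ...and then forgotten
print(active_profile(datetime(2009, 10, 22, 15, 0), silenced))  # silent
print(active_profile(datetime(2009, 10, 23, 9, 0), silenced))   # normal
```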

There's an award-winning 3rd party app (locale) for managing not just sound profiles but all sorts of other stuff like wifi and gps power-saving settings... and it's configurable not just by time, but also by GPS location, nearby wifi stations, and such. But... it doesn't work. That is: I couldn't recreate the simple "be quiet at night" configuration from the sidekick. Plus, it seemed to gunk up the performance of the gizmo.

As to the roach motel, no, I don't trust t-mobile/danger/Microsoft to manage my data; I keep my own copy using some homebrew software that uses their XMLRPC interface (in fact, I keep multiple copies sync'd with hg/mercurial).

The t-mobile web interface isn't nearly as nice as google's; that's probably the main thing I'll miss as I switch back to the sidekick. That and google maps (though the GPS stopped working about the 3rd time I dropped the G1).

The backlight on the screen was intermittent for a while, but power-cycling it would bring it back. Then, with the recent software update, the screen is dark all the time. So it's a choice between replacing the G1 and going back to the sidekick. (I'm keeping an eye on the palm pre and the iPhone is ubiquitous, but I extended my t-mobile contract by two years when I bought the G1 in Feb.)

Well, I just called T-Mobile customer service and asked them to switch me to the sidekick data plan. I guess we'll see how long it lasts.

see also: The Forgotten Sidekick


tags: mobile, android

p.s. older WearableGizmo notes suffer from in-progress get-out-of-Zope migration.

October 22, 2009 02:36 PM

October 16, 2009

MWI Team Blog

Device APIs on the way

Back in June, I noted that a new group that would work on Javascript APIs to access device features (such as a camera, an addressbook, a calendar, etc.) had been proposed for review to W3C Members.

Since then, not only was the group approved and started, but we even got our first publication out: a Working Group note describing the expected requirements for these device APIs.

Of course, that document may seem a bit abstract at first glance: you'll see no API defined in there, nothing with which to play.

But if you think Device APIs are a great opportunity for the Web platform (on mobile and elsewhere), I strongly encourage you to take a look at that document and check if the requirements highlighted there match what you know you'll need from these APIs - and if they don't, please let the Working Group know!

by Dominique Hazael-Massieux at October 16, 2009 02:40 PM

Ivan's Blog

Seduce with free services?


I ran into this two times in a week. I hope it is just a coincidence…

The story is simple. You find some service on the Web which looks nice and helpful. There are various options: you may take a minimal service, which is free of charge, or you can also choose extra services for a fee. It sounds like a decent choice: if the minimal service fits your needs, you are happy, if you need more, you pay something. I presume we all use services like that.

But then… if you take the free option, you may get a mail after 2-3 years of usage saying that, sorry, the free service will be discontinued next month; you are welcome to upgrade to the paying service, otherwise, well, good bye. As I said, I got this type of mail twice in a week: one from a service providing a minimal synchronization of my phone’s calendar with Google’s, the other providing a simple email certificate for signing my mails. As a matter of principle I will not upgrade; I do not find this approach really acceptable.

So… will Gmail, WordPress, or other similar services decide that they have attracted enough customers and can now start charging? As I said, I hope this was just a coincidence and not some sort of a general direction…

Posted in General, Private, Social aspects, Work Related

by Ivan Herman at October 16, 2009 07:01 AM

October 15, 2009

W3C Q&A Weblog

W3C Site Bugs!

We've received a number of helpful bug reports about the new site. I thought I should list a few here so that we can refer to them. We are working to have these particularly tricky ones fixed as quickly as possible.

  • In IE, if you select mobile or print modes, you can't get back to desktop mode.
  • In Safari, even if you select "desktop" mode you get mobile mode at narrow browser widths. Also, if you select "mobile" mode you get a mix of mobile and desktop at wider browser window widths.
  • In some browsers, you can't expand the expandable content sections; they snap back shut.

We are working on these fixes. I also welcome fix suggestions from the community. Thanks again to those who have sent comments to site-comments@w3.org.

by Ian Jacobs at October 15, 2009 02:25 PM

October 13, 2009

W3C Q&A Weblog

W3C Site Launch

Today we launched the new W3C. We've been working on it for a while, so I'm happy that it is seeing the light of day.

Comments are flowing in, some touching on issues we identified when we announced the beta version. Here are a few:

  1. Is the CSS invalid? The CSS does not validate with the W3C CSS validator. We mentioned this as one limitation of the site back in March. As we wrote then, "Because of known interoperability issues, we have accepted to use CSS that does not validate with the CSS validator. Over time we hope to evolve towards valid CSS."
  2. Why do some pages (such as the graphics introduction, though there are others as well) look unfinished? They are; the generic template text is still there from the beta. We decided to launch the site even without all the content we hope to have. We think the site is a significant improvement over the old one, and so prefer to begin using it rather than wait for more content. The site will continue to evolve, and I hope much more easily. We are asking staff, Working Groups, and the community to help out and provide content. We'd love your help, and are happy to acknowledge your contributions on the pages. Let us know at site-comment@w3.org.
  3. Some of the rewritten Recommendations have formatting bugs. Unfortunately, one of our processing passes modified the markup and we didn't realize it; we'll be fixing those problems in place. For the moment we are only using the new templates for Recommendations (old and new). As we gain more experience and resolve formatting issues, we expect to apply the new templates to more publications. One advantage of the new approach will be that it will be easier to tell right up front when a specification has been superseded by another.

There are also a few rendering issues we are aware of and plan to fix over the next few days. Please tell us about any issues you encounter on site-comments@w3.org. Please be sure to tell us the URI of the page in question and what browser and OS you are using.

by Ian Jacobs at October 13, 2009 11:57 PM

October 09, 2009

Don't call me DOM

Web 2.0 illustrated

I am by no means good at making graphics, but I very much like the idea of turning complex ideas into easier-to-grasp graphics.

As I was invited to talk about “Web 2.0” a month ago at the WITFOR conference, I wanted to use a graphic that would illustrate Tim O’Reilly’s definition of Web 2.0 in 7 points.

I started to look for existing illustrations that I could re-use, but while there are many illustrations of what Web 2.0 is in general, I didn’t find any that focused on Tim’s “official” definition; since a big part of the message I wanted to convey was that Web 2.0 was not (only) a buzzword but was actually a fairly well-defined concept, I couldn’t just re-use any of these vague illustrations.

So I came up with the following illustrations (thanks to InkScape and OpenClipArt) that I’m releasing under a Creative Commons license in the hope that they can be re-used and improved.

Web 1.0 Illustrated
“Web 1.0” Illustrated by Dominique Hazael-Massieux
licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Creative Commons License (original SVG)

Web 2.0 Illustrated
Web 2.0 Illustrated by Dominique Hazael-Massieux
licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 United States License.
Creative Commons License (original SVG)

by Dom at October 09, 2009 11:18 AM

Prezi vs JessyInk

I gave a couple of talks over the past month where I chose not to use the W3C-traditional HTML-based slides (using Slidy), but instead a more graphical approach, using two different tools: Prezi and JessyInk.

I’m summarizing below my experiences with both these tools.

Prezi

Prezi is a Flash-based tool that allows you both to build and to show 2D presentations.

I stumbled upon Prezi a few months ago, and was rather impressed by the type of presentations it lets you build: instead of working on slides that replace each other as you progress in the presentation, it offers an infinite canvas on which you zoom in and out, translate and rotate during the presentation.

Beyond the rather impressive effects it creates when navigating through the presentation, I found it a rather inspiring writing model too: instead of organizing a presentation as a linear succession of topics that are all on the same level, it encourages thinking of topics as a map where you would focus on some points while keeping the big picture available.

I was rather pleased by what I managed to build for my presentation on Web 2.0, and based on the positive feedback I got after the presentations, I think the “slides” themselves helped carry the message I wanted to carry – although I could only really present half of what I wanted on that occasion, since my allotted time was cut in half the minute before I went on stage. Of course, on a topic such as “Web 2.0”, it certainly helped to add some bells and whistles to the supporting material…

The editor that Prezi provides is quite good, and I found it fairly simple to learn how to use it after watching a couple of the Prezi-based tutorials provided on the site; there are still some rough edges, and some limitations on what you can do with that editor – in particular, you cannot animate a specific part of the canvas, e.g. to make some information appear as you progress in your presentation. Some of it might be a design decision to keep the tool simple, and I expect the rest might get fixed over time, since the company behind Prezi seems to be getting a good level of funding.

There are unfortunately some pretty big problems with Prezi that make it difficult for me to consider using it on a long term basis:

  • it’s Flash-based, which means it relies on a non-standard technology for which it is hard to say how long it will remain readable; I don’t expect that many of my presentations will need to remain readable for a very long time, but it certainly doesn’t feel good to invest time in things whose lifetime is rather unclear;
  • the only local export you can get from the tool is a Windows executable; presumably that executable mostly runs a Flash object, but that makes it even worse for interoperability and preservability;
  • since all of it is a black box, reusing elements from one presentation in another, or re-using elements from somebody else’s presentation, won’t work;
  • I have no idea whether any effort has been invested in making the resulting Flash accessible, but given the way the editor works, I strongly suspect that it is not; this also means that the content of my presentations has very little chance of being indexed by search engines;
  • I didn’t find a way to make links from parts of the presentation; while you wouldn’t necessarily follow links during the presentation itself, I have always found them a great asset for those who choose to look at your slides again after the presentation.

All in all, these problems appeared big enough for me to explore other solutions that would allow me to build the same kind of presentations, but without these limitations.

JessyInk/InkScape

The first time I heard about Prezi, I started looking for a possible equivalent in SVG, and I discovered that JessyInk was a pretty good candidate: it combines a Javascript library that enables simple navigation through an SVG document according to some conventions with an extension to the fantastic InkScape SVG editor that makes it possible to integrate effects, transitions and views from the editor itself.

But it wasn’t until a few weeks ago that I got confirmation that JessyInk now provides the tools needed to build Prezi-like effects, and so, when I was invited to talk on “W3C and the Social Web” at the 10th anniversary of the W3C Italian office, I decided to give it a try for that presentation.

The resulting “slides” were OK, but they clearly remain much more “slide-based” than what I would have done with Prezi.

A big reason for that is that JessyInk still uses slides as the basic unit for its operations – slides are based on Inkscape layers across which you can have transitions. This doesn’t encourage working on a fully 2D presentation, even though it makes it fairly easy to zoom in and out within a particular slide.

Another problem with JessyInk is that the editing interface it offers is really sub-optimal; InkScape is a great SVG editor, but way too rich for the things you’re likely to need in a presentation context – that said, having all this power at your hands can also help creativity in building diagrams and graphical illustrations that you would likely dismiss as too complex in other environments.

The few user interfaces that the JessyInk extension adds to the editor are modal dialogs, with somewhat awkward wording and organization that don’t really make it intuitive to add effects. I was able to use it without too much trouble, but I don’t think I would feel comfortable recommending it to someone who doesn’t like fiddling with computers as much as I do.

I’m reasonably confident that the resulting slides are more accessible than the Prezi ones, but I’m also pretty sure I would need to hand edit the (rather big) resulting SVG file to make it really accessible since, again, the editor doesn’t provide easy ways to annotate the content you put in there. In particular, I’m not sure how accessible it is to use overlaid layers as the basis for separating content into slides.

Finally, the resulting slides are not visible for most Internet Explorer users (which cuts off a pretty large population), and I don’t think most search engines properly index SVG content either at this time.

That said, JessyInk offers the possibility to make local animations – one of the things I was missing in Prezi; the interface doesn’t necessarily help to make the kind of effects I was looking for, but the potential is there and I think I could have gotten there with more time.

In the future

Despite the current defects I’ve found in JessyInk, I think I’m likely to try using it a few more times for presentations I would have to prepare, trying to force myself into reusing the 2D-concept rather than the easy path of using slides/layers.

I doubt I’ll find time to make proper bug reports to the JessyInk project on the usability of their interfaces, esp. as I am not sure how much leeway the Inkscape extension framework leaves there – but there is some hope that I will :)

I think I’ll look into getting reports on the accessibility of the resulting slides, though, since that’s clearly an important concern for me.

One thing I’m considering is turning my existing Prezi presentation into a JessyInk/SVG one, to evaluate how much of Prezi JessyInk can emulate, as well as to see if it can help improve it; at the very least, it would provide me with a more reliable alternative version of the presentation that is still nicer than the HTML version I had already built.

I think my ideal future for JessyInk would be for it to be based on a more limited editing interface than Inkscape, possibly in a Web-based editor such as the one made possible by svg-edit, which would focus on the actual tasks you’re likely to consider when composing a presentation.

by Dom at October 09, 2009 11:00 AM

October 06, 2009

W3C Q&A Weblog

RIF and OWL

The W3C RIF Working Group has just published the RIF specification as a Candidate Recommendation. By coincidence, the OWL 2 Working Group published the OWL 2 specification as a Proposed Recommendation just a few days before. Ie, two major sets of technologies that can be used for various kinds of inference on the Semantic Web have reached a high level of maturity almost at the same time. If everything goes as planned (I know, it never does, but one can still speculate), they will become Recommendations around the end of the year.

A group of questions I often get is: how do these two sets of recommendations relate to one another? Did W3C create competing, incompatible technologies in the same design space? Why have two? How can they be combined?

To answer these questions one has to realize that the two sets of technologies represent different approaches. OWL 2 (and, actually, RDFS) rely, very broadly speaking, on knowledge representation techniques. Think of thesauri, of ontologies, of various classification mechanisms: one classifies and characterizes predicates and resources, and can then deduce logical consequences based on that classification. (On the Semantic Web this usually means discovering new relationships or locating inconsistencies.) RIF, on the other hand, is more reminiscent of logic programming (think of Prolog). Ie, if these and these relationships hold then new relationships can be deduced. (It must be said that RIF also includes separate work on production rules, but that is fairly distinct from OWL 2, so let us forget about it for the moment.)

Would I want to use OWL 2 or rather RIF to develop an application? Well, it depends. Some applications are better formulated this way, others that way. There are a number of papers published on when one approach is better than the other, how certain tasks can or cannot be expressed using classification or rules, respectively, and how reasoning is possible in one circumstance or the other. Very often it also boils down to personal experience and, frankly, taste: some feel more comfortable using rules while others prefer knowledge representation. I do not think it makes sense to claim that one is better than the other. Simply put: they are different and both approaches have their roles to play.

So far so good, the reader could say, but what about using OWL and RIF together?

One of the six recommendation track documents of RIF is called “RIF RDF and OWL Compatibility”. Because we are talking about formal semantics, this document is of course not an easy read. However, in layman's terms, what it describes is how the two “sides”, ie, the rule and the classification sides, should work together on the same data set. It defines a kind of interplay between two different mechanisms: the, shall we say, logic programming part and the knowledge representation part. Implementations doing both are a bit like hybrid cars: they have two parallel engines and a well-defined connection between the two. That said, the document only defines what the combination means; whether, for example, engines will always succeed in handling the two worlds together in finite time is not guaranteed in all cases. But we can be positive: in many cases (ie, by accepting restrictions here and there) this combination does work well, and there are, actually, good implementations out there that do just that.

A simple case where no problem occurs is the so-called OWL 2 RL Profile. This profile has been defined by the OWL Working Group with the goal of being fully implementable via rule engines. This does not necessarily mean RIF (I myself have implemented OWL 2 RL by direct programming in Python), but the fact that RIF could also be used is important. The RIF Working Group has therefore published a separate document (“OWL 2 RL in RIF”) which shows just that: it reformulates the rules for the implementation of OWL 2 RL as RIF rules (more exactly, RIF Core rules). Ie, a RIF implementation can just take those rules, import any kind of RDF data that also includes OWL 2 statements, and the RIF engine will produce just the right inference results. How cool is that?

So, the answer to the original question is: yes, for many applications, RIF and OWL 2 can happily live together ever after…


by Ivan Herman at October 06, 2009 03:52 PM

September 29, 2009

Ivan's Blog

OWL 2 RL closure


OWL 2 has just been published as a Proposed Recommendation (yay!), which means, in layman’s terms, that the technical work is done, and it is up to the W3C membership to accept it as a full-blown Recommendation.

As I have blogged before, I did some implementation work on a specific piece of OWL 2, namely the OWL 2 RL Profile. (I have also blogged about OWL 2 RL and its importance before; nothing to repeat here.) The implementation itself is not really optimized, and it would probably not stand a chance in any large-scale deployment (the reader may want to look at the OWL 2 implementation report for other alternatives). But I hope that the resulting service can be useful for getting a feel for what OWL 2 RL can give you: by just adding a few triples into the text box you can see what OWL 2 RL means. This is, by the way, an implementation of the OWL 2 RL rule set, which means that it also accepts triples that are not mandated by the Direct Semantics of OWL 2 (a.k.a. OWL 2 DL). Put another way, it is an implementation of a small portion of OWL 2 Full.

The core of my implementation turned out to be really straightforward: a forward chaining structure directly encoded in Python. I use RDFLib to handle the RDF triples and the triple store. Each triple in the RDF Graph is considered and compared to the premises of the rules; if there is a match, then new triples are added to the Graph. (Well, most of the rules contain several triples to match, and the usual approach is to pick one and then explore the Graph deeper to check for additional matches. Which one to pick matters, though, since it may affect the overall speed.) If, through such a cycle, no additional triples are added to the Graph then we are done: the “deductive closure” of the Graph has been calculated. The rules of OWL 2 RL have been carefully chosen so that no new resources are added to the Graph (only new triples), ie, this process eventually stops.
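The cycle described above can be sketched in a few lines. This is a minimal illustration, not the RDFLib-based code itself: it assumes triples are plain (subject, predicate, object) string tuples and includes only two OWL 2 RL rules, cax-sco and scm-sco. Each rule picks one premise first and then searches the graph for the remaining match; the outer loop stops once a full cycle adds nothing new.

```python
# Minimal sketch of naive forward chaining to a fixpoint ("deductive closure").
# Assumption: triples are plain string tuples (not RDFLib terms), and only two
# OWL 2 RL rules (cax-sco, scm-sco) are included.
from typing import Callable, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]

RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def cax_sco(graph: Set[Triple]) -> Iterable[Triple]:
    """cax-sco: (?c1 subClassOf ?c2) and (?x type ?c1) => (?x type ?c2)."""
    for c1, p, c2 in graph:
        if p != SUBCLASS_OF:
            continue
        # Pick the subClassOf premise first, then search deeper for type triples.
        for x, p2, c in graph:
            if p2 == RDF_TYPE and c == c1:
                yield (x, RDF_TYPE, c2)


def scm_sco(graph: Set[Triple]) -> Iterable[Triple]:
    """scm-sco: rdfs:subClassOf is transitive."""
    for c1, p, c2 in graph:
        if p != SUBCLASS_OF:
            continue
        for c, p2, c3 in graph:
            if p2 == SUBCLASS_OF and c == c2:
                yield (c1, SUBCLASS_OF, c3)


def closure(graph: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Apply every rule repeatedly until a full cycle adds no new triple."""
    graph = set(graph)
    while True:
        new = {t for rule in rules for t in rule(graph) if t not in graph}
        if not new:
            return graph  # fixpoint reached: the deductive closure
        graph |= new


data = {
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:rex", RDF_TYPE, "ex:Dog"),
}
result = closure(data, [cax_sco, scm_sco])
print(("ex:rex", RDF_TYPE, "ex:Animal") in result)  # True
```

Because the rules never invent new resources, the set of possible triples is finite and the loop is guaranteed to terminate.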

The rules themselves are usually simple. Although it is possible, and probably more efficient, to encode the whole process using some sort of rule engine (I know of implementations based on, eg, Jena’s rules or Jess), one can simply encode the rules using the usual conditional constructs of the programming language. The number of rules is relatively high, but nothing that a good screen editor cannot manage with copy-paste. Only a few rules required somewhat more careful coding (usually to take care of lists) or many searches through the graph, like, for example, the rule for property chains (see rule prp-spo2 in the rule set). It is also important to note that the higher number of rules does not really affect the efficiency of the final system; if no triple matches a rule then, well, it just does not fire. The mere existence of an unused rule has no side effect.
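To give a feel for one of the more involved rules, here is a sketch of prp-spo2 (property chains) written with ordinary loops and conditionals. It is an illustration under stated assumptions, not the author’s code: triples are string tuples, RDF lists are assumed to be well formed, and the `ex:` names in the usage data are hypothetical. The list walk and the repeated graph searches are exactly the parts that need extra care.

```python
# Sketch of OWL 2 RL rule prp-spo2 (property chains) with plain loops and
# conditionals. Assumptions: string-tuple triples, and well-formed RDF lists
# (each cell has exactly one rdf:first and one rdf:rest).
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
FIRST, REST, NIL = "rdf:first", "rdf:rest", "rdf:nil"
CHAIN = "owl:propertyChainAxiom"


def value(graph: Set[Triple], subject: str, predicate: str) -> str:
    """Return the (assumed unique) object for subject/predicate."""
    return next(o for s, p, o in graph if s == subject and p == predicate)


def read_list(graph: Set[Triple], head: str) -> List[str]:
    """Walk an RDF collection from its head cell to rdf:nil."""
    items = []
    while head != NIL:
        items.append(value(graph, head, FIRST))
        head = value(graph, head, REST)
    return items


def prp_spo2(graph: Set[Triple]) -> Iterable[Triple]:
    """(?p propertyChainAxiom (?p1 ... ?pn)) plus a ?p1..?pn path => (?u1 ?p ?un+1)."""
    for p, pred, head in graph:
        if pred != CHAIN:
            continue
        chain = read_list(graph, head)
        # Every pair of nodes linked by the first property of the chain...
        paths = {(s, o) for s, q, o in graph if q == chain[0]}
        # ...extended one property at a time (the costly repeated searches).
        for step in chain[1:]:
            paths = {(start, o) for start, end in paths
                     for s, q, o in graph if q == step and s == end}
        for start, end in paths:
            yield (start, p, end)


graph = {
    ("ex:hasUncle", CHAIN, "_:l1"),
    ("_:l1", FIRST, "ex:hasParent"), ("_:l1", REST, "_:l2"),
    ("_:l2", FIRST, "ex:hasBrother"), ("_:l2", REST, NIL),
    ("ex:ann", "ex:hasParent", "ex:bob"),
    ("ex:bob", "ex:hasBrother", "ex:carl"),
}
print(set(prp_spo2(graph)))  # {('ex:ann', 'ex:hasUncle', 'ex:carl')}
```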

So is it all easy and rosy? Not quite. First of all, this implementation is of course simplistic insofar as it generates all possible deduced triples, including a number of trivial ones (like ?x owl:sameAs ?x for all possible resources). That means that the resulting graph becomes fairly big even if the (optional) axiomatic triples are not added. If the OWL 2 RL process is bound to a query engine (eg, the new version of SPARQL will, hopefully, give a precise specification of what it means to have OWL 2 RL reasoning on the data set prior to a SPARQL query) then many of these trivial triples could be generated at query time only, thereby avoiding an extra load on the database. Well, that is one place where a simple, proof-of-concept implementation like mine loses against a more professional one:-)
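One way to picture “generating trivial triples at query time only” is a pattern matcher that answers from the stored triples and synthesizes the reflexive owl:sameAs triples (OWL 2 RL rule eq-ref) on demand. This is a hypothetical sketch, not part of the package: the `match` function and its wildcard convention (None matches anything) are illustrative assumptions.

```python
# Sketch: answer triple-pattern queries while producing the trivial reflexive
# owl:sameAs triples (OWL 2 RL rule eq-ref) on demand instead of storing them.
# Assumptions: string-tuple triples; None in a pattern position is a wildcard.
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"


def match(graph: Set[Triple], s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    def fits(t: Triple) -> bool:
        return all(want is None or want == got for want, got in zip((s, p, o), t))

    # 1. Triples that are actually stored.
    yield from (t for t in graph if fits(t))
    # 2. Reflexive owl:sameAs triples for every term, computed only when asked for.
    if p is None or p == SAME_AS:
        terms = {term for t in graph for term in t}
        for term in sorted(terms):
            t = (term, SAME_AS, term)
            if t not in graph and fits(t):
                yield t


graph = {("ex:a", "ex:knows", "ex:b")}
print(sorted(match(graph, s="ex:a")))
# [('ex:a', 'ex:knows', 'ex:b'), ('ex:a', 'owl:sameAs', 'ex:a')]
```

The store itself keeps only the original triple; the reflexive ones exist only in query answers.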

The second issue was the contrast between RDF triples and “generalized” RDF triples, ie, triples where literals can appear in subject positions and bnodes can appear as properties. OWL 2 explicitly says that it works with generalized triples and the OWL 2 RL rule set also shows why that is necessary. Indeed, consider the following set of triples:

ex:X rdfs:subClassOf [
  a owl:Restriction;
  owl:onProperty [ owl:inverseOf ex:p ];
  owl:allValuesFrom ex:A
].

This is a fairly standard “idiom” even for simple ontologies; one wants to restrict, so to say, the subjects instead of the objects using an OWL property restriction. In other words, that restriction combined with

ex:x rdf:type ex:X .
ex:y ex:p ex:x .

should yield

ex:y rdf:type ex:A .

Well, this deduction would not occur through the rule set if non-generalized RDF triples were used. Indeed, the inverse of ex:p is a blank node, ie, using it as the predicate of a triple is not legal; but using that blank node to denote a property is necessary for the full chain of deductions. In other words, to get that deduction to work properly using RDF and rules, the author of the vocabulary would have to give an explicit URI to the inverse of ex:p. Possible, but slightly unnatural. If generalized triples are used, then the OWL 2 RL rules yield the proper result.
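The chain of deductions for the example above can be replayed with three OWL 2 RL rules: prp-inv1/prp-inv2 (inverse properties), cax-sco and cls-avf. The sketch below is an illustration with string-tuple triples, where “_:” marks blank nodes and the `generalized` flag is a hypothetical switch that drops bnode-predicate triples, mimicking plain RDF. With generalized triples, prp-inv2 derives `ex:x _:inv ex:y`, which cls-avf then needs; without them, ex:y rdf:type ex:A is never reached.

```python
# Sketch of the inverse-property example with OWL 2 RL rules prp-inv1/prp-inv2,
# cax-sco and cls-avf. Assumptions: string-tuple triples, "_:" marks blank
# nodes; this is an illustration, not the RDFLib-based implementation.
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SCO = "rdf:type", "rdfs:subClassOf"
INV, ON_PROP, AVF = "owl:inverseOf", "owl:onProperty", "owl:allValuesFrom"


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def rules(g: Set[Triple]) -> Iterable[Triple]:
    for a, q, b in g:
        if q == INV:                      # a owl:inverseOf b
            for x, p, y in g:
                if p == a:
                    yield (y, b, x)       # prp-inv1
                if p == b:
                    yield (y, a, x)       # prp-inv2: may create a bnode predicate
        elif q == SCO:                    # a rdfs:subClassOf b
            for x, t, c in g:
                if t == TYPE and c == a:
                    yield (x, TYPE, b)    # cax-sco
        elif q == AVF:                    # restriction a owl:allValuesFrom b
            for r, q2, prop in g:
                if r == a and q2 == ON_PROP:
                    for u, t, c in g:
                        if t == TYPE and c == a:
                            for s, p, v in g:
                                if s == u and p == prop:
                                    yield (v, TYPE, b)   # cls-avf


def closure(g: Set[Triple], generalized: bool) -> Set[Triple]:
    g = set(g)
    while True:
        new = {t for t in rules(g) if t not in g}
        if not generalized:
            # Plain RDF: discard derived triples whose predicate is a blank node.
            new = {t for t in new if not is_bnode(t[1])}
        if not new:
            return g
        g |= new


data = {
    ("ex:X", SCO, "_:r"),
    ("_:r", TYPE, "owl:Restriction"),
    ("_:r", ON_PROP, "_:inv"),
    ("_:inv", INV, "ex:p"),
    ("_:r", AVF, "ex:A"),
    ("ex:x", TYPE, "ex:X"),
    ("ex:y", "ex:p", "ex:x"),
}
goal = ("ex:y", TYPE, "ex:A")
print(goal in closure(data, generalized=True))   # True
print(goal in closure(data, generalized=False))  # False
```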

It turns out that, in my case, having bnodes as properties was not really an issue, because RDFLib could handle that directly (is that a bug in RDFLib?). But similar, though slightly more complex or even pathological, examples can be constructed involving literals in subject positions, and that was a problem because RDFLib refused to handle those triples. What I had to do was to replace every literal in the graph with a new bnode, perform all the deductions using those, and swap the bnodes “back” for their original literals at the end. (This mechanism is not my invention; it is actually described by the RDF Semantics document, in the section on Datatype entailment rules.) B.t.w., the triples returned by the system are all “legal” triples; generalized triples play a role during the deduction only (and illegal triples are filtered out at output).
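The literal/bnode swap can be sketched as two small functions around the reasoning step. Assumptions, all illustrative: triples are string tuples, literals start with a double quote, blank nodes with “_:”, the “_:lit…” labels are hypothetical and assumed not to occur in the input, and the “deduction” in the usage is a stand-in using a hypothetical inverse property ex:ageOf. Each distinct literal gets its own bnode, and at output every triple with a literal subject or a non-IRI predicate is filtered out.

```python
# Sketch of the literal <-> blank node swap used around the rule engine.
# Assumptions: string-tuple triples, literals start with '"', blank nodes with
# "_:", and the input does not already use "_:lit..." labels.
from itertools import count
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def is_literal(term: str) -> bool:
    return term.startswith('"')


def is_bnode(term: str) -> bool:
    return term.startswith("_:")


def literals_to_bnodes(graph: Set[Triple]) -> Tuple[Set[Triple], Dict[str, str]]:
    """Replace each distinct literal by its own fresh bnode; return the reverse map."""
    fresh = count()
    to_bnode: Dict[str, str] = {}

    def swap(term: str) -> str:
        if not is_literal(term):
            return term
        if term not in to_bnode:
            to_bnode[term] = f"_:lit{next(fresh)}"
        return to_bnode[term]

    swapped = {(swap(s), swap(p), swap(o)) for s, p, o in graph}
    return swapped, {b: lit for lit, b in to_bnode.items()}


def bnodes_to_literals(graph: Set[Triple], back: Dict[str, str]) -> Set[Triple]:
    """Restore the literals, then keep only legal (non-generalized) RDF triples."""
    result = set()
    for triple in graph:
        s, p, o = (back.get(term, term) for term in triple)
        if is_literal(s) or is_literal(p) or is_bnode(p):
            continue  # literal subject or non-IRI predicate: filter out
        result.add((s, p, o))
    return result


graph = {("ex:a", "ex:age", '"42"^^xsd:integer')}
swapped, back = literals_to_bnodes(graph)
# Stand-in for the deduction step: a (hypothetical) inverse property ex:ageOf
# puts the literal's bnode in subject position.
swapped |= {(o, "ex:ageOf", s) for s, p, o in swapped if p == "ex:age"}
final = bnodes_to_literals(swapped, back)
print(final)  # {('ex:a', 'ex:age', '"42"^^xsd:integer')}
```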

Literals with datatypes were also a source of problems. This is probably where I spent most of my implementation time (I must thank Michael Schneider who, while developing the test cases for the OWL 2 RDF-Based Semantics, was constantly pushing me to handle those damn datatypes properly…). Indeed, the underlying RDFLib system is fairly lax in checking typed literals against their definitions in the XSD specification (eg, constraints like minimum or maximum values were not checked…). As a consequence, I had to re-implement the lexical-to-value conversion for all datatypes. Once I found out how to do that (I had to dive a bit into the internals of RDFLib but, luckily, Python is an interpreted language…) it became relatively straightforward, repetitive, and slightly time-consuming work. Actually, using bnodes instead of “real” literals made it easier to implement datatype subsumption, too (eg, the fact that, say, an xsd:byte is also an xsd:integer). This became important so that the rules would work properly on property restrictions involving datatypes.
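A stricter lexical-to-value check for a handful of XSD integer types, together with datatype subsumption, might look like the sketch below. It is not RDFLib’s API and not the package’s code: only a few types are covered, whitespace normalization is ignored, and the ranges are the value-space bounds from XML Schema Part 2. Note that Python’s `int()` alone is too lax (it accepts, eg, “1_000”), so the lexical form is checked with a regular expression first.

```python
# Sketch of a stricter lexical-to-value mapping for a few XSD integer types,
# plus datatype subsumption. Assumptions: only these types are covered and
# whitespace normalization is ignored; this is not RDFLib's own API.
import re
from typing import Dict, Optional, Set, Tuple

# Value-space bounds from XML Schema Part 2 (None = unbounded).
INT_RANGES: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "xsd:integer": (None, None),
    "xsd:long": (-2**63, 2**63 - 1),
    "xsd:int": (-2**31, 2**31 - 1),
    "xsd:short": (-2**15, 2**15 - 1),
    "xsd:byte": (-2**7, 2**7 - 1),
    "xsd:nonNegativeInteger": (0, None),
}
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


def in_range(value: int, datatype: str) -> bool:
    low, high = INT_RANGES[datatype]
    return (low is None or value >= low) and (high is None or value <= high)


def to_value(lexical: str, datatype: str) -> Optional[int]:
    """Map a typed literal to its value, or None if it is ill-typed."""
    if datatype not in INT_RANGES or not INTEGER_LEXICAL.fullmatch(lexical):
        return None  # e.g. "1_000" would be accepted by int() but not here
    value = int(lexical)
    return value if in_range(value, datatype) else None


def datatypes_of(value: int) -> Set[str]:
    """Subsumption: every known datatype whose value space contains value."""
    return {dt for dt in INT_RANGES if in_range(value, dt)}


print(to_value("42", "xsd:byte"))   # 42
print(to_value("300", "xsd:byte"))  # None (out of range)
print(sorted(datatypes_of(-5)))     # ['xsd:byte', 'xsd:int', 'xsd:integer', 'xsd:long', 'xsd:short']
```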

Bottom line: even for a simple implementation, literals, mainly literals with datatypes, are the biggest headache. The rest is really easy. (This is hardly the discovery of the year, but it is nevertheless good to remember…)

I was, actually, carried away a bit once I had figured out how to handle datatypes, so I also implemented a small “extension” to OWL 2 RL by adding datatype restrictions (one of the really nice new features of OWL 2, but one that is not mandated for OWL 2 RL). Imagine you have the following vocabulary item:

ex:RE a owl:Restriction ;
    owl:onProperty ex:p ;
    owl:someValuesFrom [
      a rdfs:Datatype ;
      owl:onDatatype xsd:integer ;
      owl:withRestrictions (
          [ xsd:minInclusive "1"^^xsd:integer ]
          [ xsd:maxInclusive "6"^^xsd:integer ]
      )
   ] .

which defines a restriction on the property ex:p so that at least one of its values should be an integer in the [1,6] interval. This means that

ex:q ex:p "2"^^xsd:integer.

yields

ex:q rdf:type ex:RE .

And this could be done by a slight extension of OWL 2 RL; no new rules, just adding the datatype restrictions to the datatypes. Nifty…
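The “no new rules” idea can be sketched in two steps: first type each value that satisfies the facets with the restricted datatype (a generalized triple with a literal subject), then let the existing OWL 2 RL rule cls-svf1 do the rest. Assumptions, all illustrative: literal values are already converted to Python ints, the anonymous datatype is labeled with the hypothetical bnode “_:dt”, and cls-svf1 takes the restriction, property and datatype as arguments instead of matching them in the graph.

```python
# Sketch of the datatype-restriction "extension": step 1 types values with the
# restricted datatype, step 2 is the ordinary someValuesFrom rule cls-svf1.
# Assumptions: values are already Python ints; "_:dt" is a hypothetical label
# for the anonymous datatype in the ex:RE restriction.
from typing import Callable, Dict, Iterable, Set, Tuple

Triple = Tuple[object, str, object]

FACETS: Dict[str, Callable[[int, int], bool]] = {
    "xsd:minInclusive": lambda v, bound: v >= bound,
    "xsd:maxInclusive": lambda v, bound: v <= bound,
    "xsd:minExclusive": lambda v, bound: v > bound,
    "xsd:maxExclusive": lambda v, bound: v < bound,
}


def in_restricted_datatype(value: object, facets: Dict[str, int]) -> bool:
    """xsd:integer base type (bool excluded), then every facet must hold."""
    if not isinstance(value, int) or isinstance(value, bool):
        return False
    return all(FACETS[name](value, bound) for name, bound in facets.items())


def type_values(graph: Set[Triple], datatype: str, facets: Dict[str, int]) -> Iterable[Triple]:
    """Step 1: (v rdf:type datatype) for every object value satisfying the facets."""
    for _, _, o in graph:
        if in_restricted_datatype(o, facets):
            yield (o, "rdf:type", datatype)


def cls_svf1(graph: Set[Triple], restriction: str, prop: str, datatype: str) -> Iterable[Triple]:
    """Step 2, OWL 2 RL cls-svf1: (?u ?p ?v) and (?v type ?y) => (?u type ?x)."""
    for u, p, v in graph:
        if p == prop and (v, "rdf:type", datatype) in graph:
            yield (u, "rdf:type", restriction)


graph: Set[Triple] = {("ex:q", "ex:p", 2), ("ex:r", "ex:p", 9)}
graph |= set(type_values(graph, "_:dt", {"xsd:minInclusive": 1, "xsd:maxInclusive": 6}))
result = set(cls_svf1(graph, "ex:RE", "ex:p", "_:dt"))
print(result)  # {('ex:q', 'rdf:type', 'ex:RE')}
```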

That is it. I had fun, and maybe it will be useful to others. The package can also be downloaded and used with RDFLib, by the way…

Posted in Python, Semantic Web, Work Related Tagged: Description logic, Knowledge Representation, OWL, OWL RL, Python, RDF, RDFLib, Resource Description Framework, SPARQL, w3c

by Ivan Herman at September 29, 2009 03:34 PM