July 09, 2009

Decentralyze - Programming the Data Cloud

sandhawke


RIF is done, more or less.

When I say “done”, I don’t mean “done” like toast in the toaster is done, when it’s just perfectly crunchy, without quite being dry or hard. And I don’t mean “done” like a hacking project which is done at that precise moment when it stops being more fascinating than sleep or food or sunshine. No, RIF is “done” like a term paper, the night before it’s due. It meets the requirements, more or less, and the time has come to ship it.

Of course, the W3C process favors quality over speed, so instead of turning it in and walking away, we’ll have to do several rewrites to address the teacher’s comments. In this case, we have at least three rounds of that. In the first round (called “last call”, which started last Friday), the “teacher” is anyone who feels like reading and commenting. Then comes “candidate recommendation”, when we try to get everyone to implement it and give us comments as they do. (This is where OWL 2 is now.) Finally, we’ll ask for a high-level review from all W3C member organizations, as they decide whether to promote it from Proposed Recommendation to a full W3C Recommendation.

But still, it’s done like that term paper. It’s turned in, and now we wait for the review comments.

So what is RIF good for, anyway?

The consensus Working Group answer is 26 pages long and rather in need of some polish, so here’s my short answer: here’s why I’ve spent the last five years working on it. (No need to cry for my lost youth; I did some other fun things during that time, too.)

We need RIF so that we don’t need standards any more.

If you’ve ever tried to use FOAF (arguably the most popular Semantic Web vocabulary), you may have noticed a little problem with representing names. Am I:

  • [ foaf:firstName "Sandro"; foaf:surname "Hawke"], or
  • [ foaf:givenname "Sandro"; foaf:family_name "Hawke" ], or just plain
  • [ foaf:name "Sandro Hawke" ] ?

Who knows? How can anyone decide? It’s a mess.

And, of course, this problem is repeated everywhere. Every ontology has its share of coin-flip design decisions: decisions where you have no overwhelming engineering reason to make one choice over another. And every problem space has, or will soon have, a vast array of ontologies addressing it from many slightly different angles.

I want data providers to publish using whatever ontology they know and love.

I want data consumers to consume (use) data in whatever ontology they know and love.

I expect RIF to be the glue in the middle, behind the scenes, in a fuzzy ball of linked-data rule-engine goodness.

Imagine Jos publishes using foaf:firstName and foaf:surname. Imagine Chris publishes using foaf:givenname and foaf:family_name. Imagine Gary writes an app which looks for foaf:name data. As long as the right RIF rules are present on the Web, in the right places, this should work. People using Gary’s app should see the data from Jos and the data from Chris, even though Gary never knew or cared about the vocabularies they used.
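To make the scenario concrete, here is a minimal sketch of what such bridging rules might compute. Python stands in for RIF here: this is not RIF syntax or the API of any RIF engine. The triple encoding, the helper names (`concat_name_rule`, `forward_chain`, `gary_app_names`) and the surnames in the data are all hypothetical.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical triple representation: (subject, predicate, object) as strings.
Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Iterable[Triple]]


def concat_name_rule(first_pred: str, last_pred: str) -> Rule:
    """If ?p has first_pred ?f and last_pred ?l, infer ?p foaf:name "?f ?l"."""
    def rule(triples: Set[Triple]) -> Iterable[Triple]:
        firsts = [(s, o) for s, p, o in triples if p == first_pred]
        lasts = [(s, o) for s, p, o in triples if p == last_pred]
        for subj, first in firsts:
            for other, last in lasts:
                if subj == other:
                    yield (subj, "foaf:name", f"{first} {last}")
    return rule


def forward_chain(triples: Set[Triple], rules: List[Rule]) -> Set[Triple]:
    """Apply every rule repeatedly until no new triples appear (a fixpoint)."""
    closure = set(triples)
    while True:
        new = {t for rule in rules for t in rule(closure)} - closure
        if not new:
            return closure
        closure |= new


# Bridging rules that would live "on the Web" in the scenario above.
RULES = [
    concat_name_rule("foaf:firstName", "foaf:surname"),
    concat_name_rule("foaf:givenname", "foaf:family_name"),
]

# Made-up data: Jos uses one vocabulary, Chris another.
DATA = {
    ("_:jos", "foaf:firstName", "Jos"), ("_:jos", "foaf:surname", "Example"),
    ("_:chris", "foaf:givenname", "Chris"), ("_:chris", "foaf:family_name", "Sample"),
}


def gary_app_names(triples: Set[Triple]) -> List[str]:
    """Gary's app only understands foaf:name."""
    return sorted(o for _, p, o in triples if p == "foaf:name")


print(gary_app_names(DATA))                        # [] (no foaf:name triples yet)
print(gary_app_names(forward_chain(DATA, RULES)))  # ['Chris Sample', 'Jos Example']
```

Gary's code never mentions foaf:firstName or foaf:givenname; all the knowledge about those vocabularies sits in the rules.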

Of course, there’s some question as to what those rules should say. In the US, the givenname and the firstName can be treated as the same. Meanwhile, in Japan, the family_name is the firstName! And if you try to split a name back into firstName and surname, do you use the last space (as in “Sarah Jessica Parker”) or the first space (as in “Hillary Rodham Clinton”)?
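The splitting problem is easy to see in code. The two hypothetical helpers below implement the "last space" and "first space" strategies; each handles one of the examples above and gets the other wrong, and neither knows anything about family-name-first conventions.

```python
from typing import Tuple


def split_at_last_space(full_name: str) -> Tuple[str, str]:
    """Treat everything after the last space as the surname."""
    first, _, surname = full_name.rpartition(" ")
    return first, surname


def split_at_first_space(full_name: str) -> Tuple[str, str]:
    """Treat everything after the first space as the surname."""
    first, _, surname = full_name.partition(" ")
    return first, surname


for name in ["Sarah Jessica Parker", "Hillary Rodham Clinton"]:
    print(name, split_at_last_space(name), split_at_first_space(name))
# Sarah Jessica Parker ('Sarah Jessica', 'Parker') ('Sarah', 'Jessica Parker')
# Hillary Rodham Clinton ('Hillary Rodham', 'Clinton') ('Hillary', 'Rodham Clinton')
```

Each function is, in effect, a small ruleset that is right for some data and wrong for other data.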

I think the solution is to accept that there may be multiple rulesets, suitable for different purposes, and they may not be perfect. I explored this space to some degree in a different context: XTAN associates some “impact” (which might be called “semantic damage”) with each transformation (or ruleset). I think that can work.

So, there are still some details to work out. I’m not presenting the solution here; I’m just explaining why RIF interests me. Now you know.

(And no, I don’t really think this will entirely obviate the need for standards, but I think it will significantly reduce that need, taking some pressure off, and shifting some more work to the machines.)

by sandhawke at July 09, 2009 04:47 AM

July 05, 2009

Ivan's Blog

Dagstuhl Workshop on Semantic Web


I have just come back from the Workshop “Semantic Web: Reflections and Future Directions”, held in Dagstuhl, Germany. Organized by John Domingue, Rudi Studer, Jim Hendler, and Dieter Fensel, the workshop positioned itself as the “second release” of a similar workshop that was held at the same place 10 years ago.

The first two days of the workshop were more traditional, in the sense that they consisted of a series of presentations and panels. This was the “reflection” part of the workshop: looking back at 10 years of history as well as taking a peek at the current state of the art. It was interesting but, for my taste, a bit too long; the programme of the two days could have been compressed into one or, say, one and a half days. That would have given more time to the “future directions” part, i.e., discussions in break-out groups on various topics. I enjoyed those a lot: free-flowing discussions that helped us exchange ideas, experiences, and pointers to other work and results, and crystallize possible future R&D issues. These discussions took place in a very pleasant, relaxed atmosphere among people who mostly knew one another already, i.e., we could really concentrate on the issues. Each group formulated a number of research goals for the years to come; some groups also came up with more practical steps and goals.

As far as I know, the workshop organizers plan to collect all those research issues in some more coherent form, so we should watch this space. In what follows I just collect some issues that I took away from the workshop without the goal of being exhaustive; indeed, there were 6-7 parallel break out groups.

Issues around Web scale. This is clearly one of the major topics of the day. What happens when one has to deal with data containing billions of triples, when the data (i.e., the triples) are “dirty”, i.e., inconsistent, faulty, etc.? Think of the Linked Open Data cloud, of data coming from sensor networks, mobiles, etc. Do we have to re-think all the notions that the Semantic Web inherited from the logic world, i.e., completeness, the meaning and consequences of consistency, what it means to get results for a query, etc.? This is one area where opinions tend to diverge a lot. Some would prefer to completely put aside the traditional logic approaches (rules, description logic, ontologies, OWL, etc.), while others may argue that the advances in computing, in reasoning engines and methods are (and are expected to be) such that these methods should still be just as usable as before. As always, I hate any black-and-white statements… I do not think dismissing an area of technology is the right way but, also, other avenues or new viewpoints should be explored, too (e.g., how to react to inconsistencies, or accepting possibly incomplete results, namely whatever can be obtained within, say, 2 minutes, that sort of thing). Which approach to use very much depends on the application. Anyway… Web scale is a major issue, everybody agrees on that!
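The "whatever can be obtained within 2 minutes" idea can be sketched as a time-budgeted query that reports whether its answer is complete. This is a toy linear scan with hypothetical names (`bounded_query`, `matches`), not how any particular triple store or reasoner works; real systems would apply the budget inside query planning or reasoning.

```python
import time
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None acts as a wildcard


def matches(triple: Triple, pattern: Pattern) -> bool:
    return all(want is None or want == have for have, want in zip(triple, pattern))


def bounded_query(triples: Iterable[Triple], pattern: Pattern,
                  budget_seconds: float) -> Tuple[List[Triple], bool]:
    """Scan triples until the time budget runs out.

    Returns (results, complete); complete is False when the scan was cut short,
    so the caller knows the answer may be missing matches.
    """
    deadline = time.monotonic() + budget_seconds
    results: List[Triple] = []
    for triple in triples:
        if time.monotonic() > deadline:
            return results, False
        if matches(triple, pattern):
            results.append(triple)
    return results, True


data = [("_:a", "rdf:type", "foaf:Person"),
        ("_:b", "rdf:type", "foaf:Person"),
        ("_:a", "foaf:name", "Alice")]
# A 2-minute budget, as in the text; here the scan finishes well within it.
print(bounded_query(data, (None, "rdf:type", "foaf:Person"), budget_seconds=120))
```

Returning the completeness flag alongside the results is the key design choice: the application, not the query engine, decides whether a partial answer is good enough.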

Interaction. This is one of the break-out groups that I did not attend, unfortunately, and it is obviously a hugely important direction of future R&D. Many Semantic Web applications today have a standard user interface because all Semantic Web related work happens behind the scenes, usually on the server side. However, in the long term, there is a clear need for programs that could somehow directly show the data in some friendly way, programs that adapt themselves to the nature of the data, not only for experts but also for laypeople. Such environments may include not only extensions of current browsers but also, e.g., full desktop environments: sort of intelligent, data-oriented user interfaces. A major research problem (user interface methodology is always a major problem, whether related to the Semantic Web or not…), but also a hugely exciting research and development opportunity!

Vocabularies. There was a separate group on the management of vocabularies, which identified a number of R&D issues: how does one describe a vocabulary and its interdependence with other vocabularies, how does one rank vocabularies… These are all fundamental questions to solve in order to find vocabularies for a specific purpose and to make specialized search possible. There are also issues around archiving and providing stable URIs; last but not least (and this goes way beyond vocabularies only), there are major legal issues around what type of attribution, copyright or other legal machinery should be used with vocabularies (it was good to have Tom Heath, who could tell us a bit about the datacommons’ approach). As an example of the many technological problems arising, the break-out group coined the term “cherry picking of terms”. Although OWL has a mechanism for import, the practice of the RDF world is to use (i.e., “cherry pick”) vocabulary terms (predicates, classes, etc.) from various different vocabularies without necessarily taking the whole vocabulary, and certainly without using the owl:imports predicate (think of the routine usage of dc:title without importing the full Dublin Core vocabulary). How would a reasoner treat those? It may be a little bit easier to use a more rule-based approach (like OWL RL), although it is not obvious how to cherry pick just the right amount of information on, say, a predicate. But Ian Horrocks also drew my attention to formal ontology modularization work that might be very relevant here; item added to my “to-be-read” list…

Provenance (and trust). This issue popped up in all the other break-out groups; as a consequence, a separate group was formed on the second day of discussions. It is indeed one of the questions that anyone who talks about the Semantic Web gets; in my personal view, having a clear “story” to tell about provenance is essential for the further deployment of this technology. The discussion in the group was really interesting because this issue raises a number of other questions, like the overall relationship of cryptographic techniques and the Semantic Web, what it means to have trust in context, what the relationships are to temporal or uncertainty reasoning, etc., etc., etc. It was also interesting for me to hear about other work, like the Open Provenance Model, although some of it was not necessarily done by Semantic Web people (e.g., by the database community). We agreed that a Wiki page will be created (probably at RPI, set up by Deb McGuinness) to collect information on this subject, and forming a W3C Incubator Group to provide a more thorough state of the art might also be on the cards. A long list of additional items for my “to-be-read” pile is coming…

And, of course, it was also good to meet a bunch of people and discuss things at lunch or dinner. This type of interaction is really fruitful. There was also intensive twittering going on (using the #swdag2009 tag, pointing to a bunch of other resources), although this time I did not twitter much because I had problems with my wireless card :-(

It was a good meeting; thanks to the organizers. It would be good not to wait another 10 years for the next incarnation of this event…

Posted in Semantic Web, Work Related Tagged: Description logic, Knowledge Representation, OWL, Provenance, Resource Description Framework, Semantic Web, User Interface, Vocabulary management

by Ivan Herman at July 05, 2009 06:47 AM

Reinventing Fire

Sonnet to Liberty

For the Fourth of July, America’s Independence Day, here’s a poem by Oscar Wilde, from 1881:

Not that I love thy children, whose dull eyes
See nothing save their own unlovely woe,
Whose minds know nothing, nothing care to know,—
But that the roar of thy Democracies,
Thy reigns of Terror, thy great Anarchies,
Mirror my wildest passions like the sea,—
And give my rage a brother——! Liberty!
For this sake only do thy dissonant cries
Delight my discreet soul, else might all kings
By bloody knout or treacherous cannonades
Rob nations of their rights inviolate
And I remain unmoved—and yet, and yet,
These Christs that die upon the barricades,
God knows it I am with them, in some things.

by Schepers at July 05, 2009 04:39 AM

July 03, 2009

MWI Team Blog

A tool to "powder" mobileOK content

Version 1.2 of the W3C mobileOK Checker, released on Tuesday, helps Web authors focus on the failures that most affect the mobile-friendliness of their content, and returns the POWDER document Web authors may use as the basis of a mobileOK® conformance claim.

Expandable sections

The reports returned by the mobileOK Checker can be long. That's not a bad thing: failure points need to be clarified. That said, scrolling over a long list of details is tedious and does not reveal the big picture. The new version adds unobtrusive (as in "works fine when JavaScript is not enabled") JavaScript to hide and show details. Details are hidden by default; simply click on a failure message to reveal its details!

Expandable sections to focus on what's important for you.

Severity levels

A missing width attribute on an img element? That's a failure. Using frames? That's a failure too. Obviously, the former case only slightly affects the mobile-friendliness of the page, while in the latter case some mobile browsers won't even be able to render the page. And yet both failures looked alike in the report, leaving the reader with the difficult task of evaluating the impact of each failure on the overall mobile-friendliness of the page.

Failure messages are prefixed with their severity level

Each failure now comes with a severity level:

  • critical: such failures typically prevent the rendering of at least part of the page on most mobile devices! Critical errors are highlighted using a yellow background.
  • severe: while such failures usually do not prevent the rendering of the page, they strongly impact the user experience.
  • medium: some mobile constraints are not appropriately taken into account, e.g. the browser needs to retrieve more data than actually needed to render the Web page.
  • low: useful improvements are possible.

Web authors who only have limited time available to fix failures may want to focus on the most severe failures first. The "Where to start..." section near the top of the report lists the top 3 failures to address right away.
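Triaging by severity amounts to a stable sort on the four levels listed above. The sketch below is a hypothetical illustration of that idea, not the checker's code: the `Failure` type, the `where_to_start` helper and the severities assigned to the sample failures are all assumptions, and the checker may choose its top 3 differently.

```python
from dataclasses import dataclass
from typing import List

# Ordering taken from the four levels listed above, most serious first.
SEVERITY_RANK = {"critical": 0, "severe": 1, "medium": 2, "low": 3}


@dataclass
class Failure:
    message: str
    severity: str  # one of the keys of SEVERITY_RANK


def where_to_start(failures: List[Failure], n: int = 3) -> List[Failure]:
    """Return the n most severe failures.

    sorted() is stable, so failures of equal severity keep their report order.
    """
    return sorted(failures, key=lambda f: SEVERITY_RANK[f.severity])[:n]


# Illustrative report; these severities are assumptions, not the checker's.
report = [
    Failure("img element has no width attribute", "low"),
    Failure("page uses frames", "critical"),
    Failure("resource larger than needed", "medium"),
    Failure("page exceeds size limit", "severe"),
]
for failure in where_to_start(report):
    print(failure.severity, "-", failure.message)
```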

Sprinkle POWDER on your mobileOK content

So your content is mobileOK? Congratulations! You may now wish to identify your content as mobileOK conformant. The Mobile Web Best Practices Working Group recently published the W3C mobileOK Scheme 1.0 note. It provides an overview of the mobileOK scheme and explains in particular how to claim mobileOK conformance.

One way to make such a claim is to use POWDER. When the page is mobileOK, the mobileOK Checker now returns a POWDER document you may use to advertise that the page is mobileOK®. For instance, the mobileOK checker returns the following POWDER document when http://www.w3.org/Mobile/ is checked:

<?xml version="1.0"?>
<powder xmlns="http://www.w3.org/2007/05/powder#">
 <attribution>
  <issuedby src="http://www.w3.org/data#W3C" />
  <issued>2009-07-03T08:37:21Z</issued>
  <supportedby src="http://validator.w3.org/mobile/" />
 </attribution>
 
 <dr>
  <iriset>
   <includeresources>http://w3.org/Mobile/</includeresources>
  </iriset>
 
  <descriptorset>
   <typeof src="http://www.w3.org/2008/06/mobileOK#Conformant" />
   <displaytext>The page is mobileOK</displaytext>
   <displayicon src="http://www.w3.org/2005/11/MWI-Icons/mobileOK.png" />
  </descriptorset>
 </dr>
</powder>

For more information on POWDER, please refer to the POWDER Primer.
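To see how a consumer might read such a claim, here is a small sketch using Python's standard `xml.etree.ElementTree`. It only extracts what the document literally says (the issue date and the IRIs listed in `includeresources` for a `dr` typed as mobileOK Conformant); it is not a POWDER processor and ignores attribution trust, IRI-set grouping semantics and the other iriset elements. Treating `includeresources` as a whitespace-separated list is our assumption; check the POWDER specifications. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = {"p": "http://www.w3.org/2007/05/powder#"}
MOBILEOK = "http://www.w3.org/2008/06/mobileOK#Conformant"

# Trimmed copy of the document above (supportedby, displaytext, displayicon omitted).
SAMPLE = """<?xml version="1.0"?>
<powder xmlns="http://www.w3.org/2007/05/powder#">
 <attribution>
  <issuedby src="http://www.w3.org/data#W3C" />
  <issued>2009-07-03T08:37:21Z</issued>
 </attribution>
 <dr>
  <iriset>
   <includeresources>http://w3.org/Mobile/</includeresources>
  </iriset>
  <descriptorset>
   <typeof src="http://www.w3.org/2008/06/mobileOK#Conformant" />
  </descriptorset>
 </dr>
</powder>"""


def mobileok_resources(powder_xml: str) -> List[str]:
    """IRIs listed in includeresources of every <dr> typed as mobileOK Conformant."""
    root = ET.fromstring(powder_xml)
    found: List[str] = []
    for dr in root.findall("p:dr", NS):
        types = {t.get("src") for t in dr.findall("p:descriptorset/p:typeof", NS)}
        if MOBILEOK in types:
            iris = dr.findtext("p:iriset/p:includeresources", default="", namespaces=NS)
            found.extend(iris.split())  # assumed whitespace-separated list of IRIs
    return found


root = ET.fromstring(SAMPLE)
print(root.findtext("p:attribution/p:issued", namespaces=NS))  # 2009-07-03T08:37:21Z
print(mobileok_resources(SAMPLE))                               # ['http://w3.org/Mobile/']
```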

... and more!

A few other features round out this summer release, such as the size of each resource that makes up the page, or the distribution of points lost per severity level. The complete change log is detailed in the What's new? page.

Feedback welcome!

by Francois Daoust at July 03, 2009 12:50 PM

July 01, 2009

W3C Q&A Weblog

Data in the City

On Monday of this week I attended a hearing in New York City organized by the Technology and Government Committee of the New York City Council. On the agenda was a proposal (Int. No. 991) regarding the use of open standards for publishing New York City government data. I picked up a printed copy of the proposal and a summary when I walked into the hearing. To my surprise, the handout referred to W3C by name (the online proposal does not) and included a reference to the recent publication by the eGovernment Interest Group, Improving Access to Government through Better Use of the Web.

So I filled out a form requesting to speak. To my surprise, the Chair invited me to testify early in the hearing.

Before I spoke, however, a representative from the Mayor's Office voiced opposition to some specifics of the proposal. Earlier that day, at the Personal Democracy Forum elsewhere in the city, the Mayor himself announced several initiatives regarding publishing government data. This had generated some excitement, and a number of people who had been attending the conference (I had not) were present at the hearing.

The Mayor's Office cited 5 or 6 reasons why it opposed the particular proposal (which I trust will appear in the public record that I've not yet located) but the main ones I recall were cost and burden. I would paraphrase some of the exchange between the city council committee and the Mayor's office as follows:

  • City Council: Please put raw data on the Web.
  • Mayor's Office: We prefer publishing information that is less raw and more citizen-friendly.
  • City Council: Citizens won't know what they are missing unless you put it up there.
  • Mayor's Office: That will cost too much (e.g., scanning old documents). We have lots and lots of documents.
  • City Council: By choosing what to provide and massaging the data, you are not letting people make better use of it.
  • Mayor's Office: See the initiatives we just announced. We think that we are meeting customer needs (which we hear through surveys, complaints, etc.)
  • City Council: You shouldn't decide what people want. Let them decide.

W3C's eGovernment Interest Group has been working with a growing number of agencies to gather information that will help address these sorts of concerns. Now they will develop best practices and guidelines for publishing government data. This is not an area I know well, so I look forward to being able to refer to the eGov IG's findings. However, I'm sure New York City is not the first government to wrestle with the technology, the cultural issues ("why should I publish my data?"), and how to use taxpayer money to do this.

When my turn came to speak, I said something like this:

  • Thanks for using open standards.
  • Use W3C Semantic Web Standards to publish data. As a starting point, I referred to Tim Berners-Lee's recent draft of Putting Government Data online.
  • Don't try to do everything at once. Start with what is already available electronically, for example.
  • Don't require agencies to coordinate through a single portal. Let them publish data at their own speed. Then aggregate (through a single portal if you wish and if people find that easy to use).
  • Participate in the eGovernment Interest Group.

I hope my summary here is backed up by the public record.

by Ian Jacobs at July 01, 2009 10:51 PM

ishida >> blog

Converter tool updated and moved

A new version of this very popular tool is now available, in a new location. Although it is currently labeled ‘beta’, I recommend that you use that instead, and change any links and bookmarks to the new location. There are a number of new features.

There is also a vastly improved code base. If you are one of the many people who have contacted me to ask how I coded the conversions, please take a look at the new JavaScript code. It is much cleaner and more compact.

New features include:

* New mixed input field and position of some fields changed.
* New field for conversion of 0x… notation hex escapes.
* Enabled invisible and ambiguous characters to be made visible in the XML output.
* Added support for all HTML entities in HTML/XML input.
* All code rewritten to use characters as the internal representation, rather than code points. Also, code is much smaller and cleaner, partly through use of regular expression matching.
* Various filters available for conversion, such as allowing ASCII or Latin1 characters to remain unconverted in NCR output.
* New icon to quickly select all contents of a field.
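For readers curious about the kind of conversions involved, here is a small Python sketch of two of them: converting text to hexadecimal NCRs with an optional ASCII or Latin-1 filter, and decoding a list of 0x… hex values. The tool itself is written in JavaScript; this is not its code, and the function names and exact behaviors are hypothetical.

```python
from typing import List


def to_hex_ncrs(text: str, keep_ascii: bool = False, keep_latin1: bool = False) -> str:
    """Convert each character to a hex NCR (&#x...;), optionally leaving
    ASCII (U+0000-U+007F) or Latin-1 (U+0000-U+00FF) characters unconverted."""
    out: List[str] = []
    for ch in text:  # iterating a Python str yields whole code points
        cp = ord(ch)
        if (keep_ascii and cp <= 0x7F) or (keep_latin1 and cp <= 0xFF):
            out.append(ch)
        else:
            out.append(f"&#x{cp:X};")
    return "".join(out)


def from_0x_list(text: str) -> str:
    """Decode a whitespace-separated list such as '0x48 0x69' into characters.

    int(token, 16) accepts an optional 0x prefix; values above 0x10FFFF make
    chr() raise ValueError.
    """
    return "".join(chr(int(token, 16)) for token in text.split())


print(to_hex_ncrs("aé€"))                    # &#x61;&#xE9;&#x20AC;
print(to_hex_ncrs("aé€", keep_ascii=True))   # a&#xE9;&#x20AC;
print(to_hex_ncrs("aé€", keep_latin1=True))  # aé&#x20AC;
print(from_0x_list("0x48 0x69 0x20AC"))      # Hi€
```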

There is also a new demonstration feature.

If there are no issues raised/remaining in a couple of months, I’ll remove the beta tag.

by r12a at July 01, 2009 09:06 AM

June 30, 2009

W3C Q&A Weblog

Reflections on SemTech 2009

SemTech 2009, along with W3C's significant participation in it, is now behind us. Besides catching up on email, I have spent the past week reflecting on the enthusiasm, presentations, and flurry of activities that constituted this year's event in San Jose, 14 to 18 June.

One strong feeling I had while in San Jose was a sense of déjà vu in the Web world. Stepping back, I realize that 2009 feels a lot like 1999, when I was consulting with Allaire (remember CFML and ColdFusion?) and attended their user group meetings teeming with enthusiastic Web developers with war stories about their successes and failures bringing Web development servers into organizations of all types and sizes.

Ten years ago, many enterprises were just getting onto the "e-commerce bus," having been either eclipsed or inspired by the likes of innovative Web-centric companies such as Amazon.com and eBay who launched in 1995, or early-adopter retailers like JCPenney whose understanding of the catalogue business put them online faster than many other retailers, or businesses for that matter. Many mainline companies were in various phases of their Web evolution in 1999 -- from brochureware to intranets to pilot customer-facing interactive sites. And keep in mind that ten years ago, Google was barely two.

In 1999 there was also a wide cross-section of skill sets and diversity of understanding about what the Web was, how it worked, and what people and tools to trust to bring one's vision onto the Web. I remember sitting in focus groups with a number of HTML Web designers who were impatient with their more senior corporate IT colleagues who insisted on clear roadmaps, risk assessments and cost-benefit analyses for the Web-based tools and technology solutions their companies were considering.

The Java developers, engineers and system architects in other discussion groups also weren't too keen on the irreverent attitudes and huge amounts of money being thrown at these young people, who just a few years earlier were teenagers playing video games at the arcades. But understanding and trust continued to build, innovation accelerated, communities with technical skills increased, and revenues skyrocketed as a direct result of vendors developing and companies embracing new Web technologies.

We fast forward to 2009 and see similar dynamics with Semantic Web technologies. There are the early adopters and evangelists who have already climbed aboard the "RDF-bus," understand what's possible with W3C's Semantic Web technology standards, and can point to impressive results in new tools, pilot projects and even robust deployments within organizations, governments, and enterprises.

Yet skeptics remain both in terms of understanding the paradigm shift that the Semantic Web brings, just as the early Web challenged the status quo, and in the legitimate need for better tools and long-term architectural considerations for how to successfully deploy Semantic Web technologies in large enterprises.

Like the early Web and the W3C standards and subsequent commercial tools, products and services that enabled its rapid growth, the W3C Semantic Web stack is highly stable today. The accelerating uptake of W3C Semantic Web standards, new tools and applications were part of the buzz at this year's Semantic Technologies Conference.

In addition to hearing and seeing many new use cases and case studies, the call for commercialization was clear, as was the amount of enthusiasm among the technologists doing good and exciting work. The community's call to publish and link data in RDF or RDFa is clearly being heard, with The New York Times joining the ranks of large data holders eager and willing to publish to the Linked Open Data Cloud.

Finally, the number of Semantic Web communities flourishing in cities from coast to coast across North America and in Europe is another healthy sign that the growth and adoption of Semantic Web technologies has not only "crossed the chasm" (in keeping with Geoffrey Moore's model), but has spawned strong beachheads of support among highly skilled technology professionals across business, industry, and government sectors.

It is my hope that at next year's Semantic Technologies Conference -- which is changing venues to San Francisco -- we will point to an even higher coordinate on the adoption curve and see amazing new results and impact from the use of W3C Semantic Web technologies. If I were Jean-Luc Picard, I would say, "Make it so." But for now, I'll continue in my role of education and outreach for W3C.... I look forward to seeing many of you throughout the year and at next year's conference!

by Karen Myers at June 30, 2009 01:38 PM

June 26, 2009

W3C Q&A Weblog

WCAG 2.0 in your mother tongue

I come from Egypt, live in Austria, work in France, and when I start speaking, some people think I'm American. I speak fluent German and English, but no matter what I do, some expressions and thoughts will always be easier for me in Arabic than in any other language. The expression "mother tongue" captures it rather well: it is the language in which I feel most at home and safe, despite it getting a little rusty over the years.

Come to think of it, the majority of the human population is probably more comfortable in a language other than English. English happens to be the working language of W3C (and most international organizations), but that does not mean that other languages are not equally welcome at W3C. In fact, W3C encourages volunteers to contribute their valuable time and effort to translation of W3C standards and other resources.

I'm particularly proud of the Policy for Authorized W3C Translations which allows the production of translations that are recognized by W3C. This is especially useful for W3C standards such as Web Content Accessibility Guidelines (WCAG) 2.0, which are read and used by a large number of people. Besides Web developers, WCAG 2.0 is also used by decision makers, researchers, accessibility advocates, and people with disabilities from around the world.

Today the W3C Web Accessibility Initiative (WAI) announced the publication of the French Authorized Translation of WCAG 2.0. It is the first Authorized Translation of WCAG 2.0 and we expect others in Brazilian Portuguese, Catalan, Chinese, Czech, Danish, Dutch, German, Hindi, Hungarian, Italian, Japanese, Korean, Portuguese, Russian, Spanish, Swedish, and more to follow. There are also several unofficial translations available and in progress. The WCAG 2.0 Translations page lists completed and planned translations.

While this is an impressive list of translations, it is still only a small fraction of all existing languages. For instance, I am looking forward to being able to read WCAG 2.0 in Arabic. If we want to support the diversity of languages and cultures on the Web then we must continue to develop and promote such translations. Please engage and help us promote translations for W3C standards such as WCAG 2.0 in all languages of a truly World Wide Web.

by Shadi Abou-Zahra at June 26, 2009 03:08 PM

W3C team at SemTech

Some of us on the team had a pretty busy last week: indeed, Karen Myers, Sandro Hawke, Dave Raggett, Eric Prud'hommeaux, Ralph Swick, and I were at the Semantic Technologies 2009 conference in San Jose. Dave (together with Dianne Mueller from JustSystems) gave a presentation on XBRL and the Semantic Web, Eric gave a tutorial (together with Lee Feigenbaum, from Cambridge Semantics) on SPARQL, and I also gave an introductory SW tutorial and a presentation. And, of course, we all had hallway discussions, meetings, interviews… more than I even remember right now. A number of W3C members were also represented either as presenters or at their booth at the exhibition (or both). More than 1200 people in San Jose in spite of the economic malaise... This is pretty good!

I published a blog entry right before my journey back to Europe (and an addendum because I forgot something in the original blog entry…) with much more detail. If you are interested in more detailed impressions of the conference, you can read it there. Suffice it to say: it was a great week!

by Ivan Herman at June 26, 2009 09:16 AM

June 25, 2009

Michael(tm) Smith

QtWebKit has been ported to S60

Alessandro Portale has announced the “Tower” release of the Qt for S60. He writes that “there are three fresh modules: Phonon, QtSql and QtWebkit”, and adds that “QtWebkit on S60 is still considered experimental. However, You should already be able to start developing QtWebKit refined applications for the pocket.”

QtWebKit is a port of the WebKit browser engine to the Qt cross-platform application-development framework, and Qt for S60 is in turn a port of Qt to the Symbian OS, widely used on mobile devices.

by Michael(tm)Smith (mike@w3.org) at June 25, 2009 07:00 PM

June 24, 2009

W3C Q&A Weblog

Orthogonality of Specifications


The general principle of platform design is that platforms consist of a set of standard interfaces. Standard interfaces allow substitution of components across the interface boundary, while independence of interfaces allows evolution of the interfaces themselves. In a PC, for example, the disk bus interface allows many different disk vendors to offer disk products independent of the model of display or keyboard, and the orthogonality of the interfaces allows each of them to evolve on its own. If the display interface were linked too tightly to the disk interface, it wouldn't be possible to evolve ISA to SATA without updating VGA.

In the web platform, the three important interfaces are transport, format and reference, and the current definitions of those interfaces are HTTP, HTML and URI. The interfaces are standard, allowing many different implementations: the HTTP standard lets you use HTTP servers from many vendors, the HTML standard lets you use many different HTML authoring tools or template systems, and the URI specification allows identification of many different components.

While HTTP is the current "common denominator" protocol that all web agents are expected to speak, the web should continue to work if web content is delivered by other protocols: FTP, shared file systems, email, instant messaging, and so forth. HTTP as it has evolved has severe difficulties, and designing a Web that only works with HTTP as it is currently implemented and deployed would be unfortunate. We should work harder to reduce the dependencies and isolate them.

HTML is the 'lingua franca', the common language that all agents are currently expected to be able to produce, process, read and interpret (or at least a well-defined subset of it). Having a common language is important for interoperability, but the web should also work for other formats: extensions to HTML, including scripting and DOM APIs, but also other formats and application environments such as XHTML, Java, PDF, Flash, Silverlight, XForms, 3D objects, SVG, other XML languages and so forth. Certainly HTML as it has evolved is overly complex for the purposes for which it was designed.

The URI is the fundamental element of reference, but the URI itself is evolving to deal with internationalization, reference to session state, IRIs, LEIRIs, HREFs and so forth. Many applications use URIs and IRIs, not just the formats described above but other protocols and locations, including databases, directories, messaging, archiving, peer-to-peer sharing and so forth.

The web is just one of many communication applications on the global Internet; for web browsing to integrate well with the rest of distributed networking, web components should be independent of the application, and work well with messaging, instant messaging, news feeds, and so forth.

A sign of a breakdown of this architectural principle would be for a specification of a format (say HTML) to attempt to redefine, for its purposes, the protocol (say HTTP) or the method of reference (URI). The specifications should be independent, or at least, dependencies should be isolated and minimized. If those other elements of the web architecture are incorrect, need to evolve to meet current practice or have flaws in their definitions, they need to evolve independently, so that the orthogonality of the specifications and the reusability of the components are promoted.
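As a rough illustration of keeping those dependencies isolated, here is a minimal Python sketch. All names in it (`register`, `fetch`, `count_links`, and the `mem:` scheme) are hypothetical. The format-level code sees only a media type and bytes, and the transport is picked by URI scheme, so a protocol other than HTTP can be plugged in without touching the HTML-handling code.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, Tuple
from urllib.parse import urlsplit

# Hypothetical transport interface: URI -> (media type, body bytes).
Transport = Callable[[str], Tuple[str, bytes]]

TRANSPORTS: Dict[str, Transport] = {}


def register(scheme: str, transport: Transport) -> None:
    """Register a transport for a URI scheme (http, ftp, file, ...)."""
    TRANSPORTS[scheme] = transport


def fetch(uri: str) -> Tuple[str, bytes]:
    """Dispatch on the URI scheme only; callers never see protocol details."""
    scheme = urlsplit(uri).scheme
    if scheme not in TRANSPORTS:
        raise ValueError(f"no transport registered for scheme {scheme!r}")
    return TRANSPORTS[scheme](uri)


class LinkCounter(HTMLParser):
    """Format-level code: counts <a href> elements, knows nothing about transport."""

    def __init__(self) -> None:
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1


def count_links(uri: str) -> int:
    media_type, body = fetch(uri)
    if media_type != "text/html":
        raise ValueError(f"unsupported format {media_type!r}")
    parser = LinkCounter()
    parser.feed(body.decode("utf-8"))
    parser.close()
    return parser.links


# Hypothetical in-memory "protocol" standing in for FTP, email, IM, etc.
_STORE = {"mem:/index": b'<p><a href="a">A</a> <a href="b">B</a></p>'}
register("mem", lambda uri: ("text/html", _STORE[uri]))

print(count_links("mem:/index"))  # prints 2
```

Adding FTP or email delivery would mean registering another transport; `count_links` itself would not change.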

There may well be reasons to link some features of HTML to the fact that it is delivered over an interactive protocol, but linking HTML directly to HTTP in a way that features would work only for HTTP and not for any other protocol with similar features – that would be unfortunate. It might not matter in the short-term (that’s all we have right now) but it is harmful to the long-term evolution of the web.

(Should go without saying, but just in case: this is a personal post, not reviewed by the TAG)

by Larry Masinter at June 24, 2009 01:01 PM

June 23, 2009

Michael(tm) Smith

Project Electrolysis: Multi-process Gecko

Chris Jones just posted a “Multi-process Firefox, coming to an Internets near you” write-up to his blog, about the Electrolysis project (coordinated by Benjamin Smedberg, with Joe Drew, Jason Duell, Chris himself, Ben Turner, and Boris Zbarsky), which will add multi-process (one tab per process) support to Gecko. Chris’s blog posting has a video demo that’s worth watching, and last week Ben Smedberg posted an “Electrolysis: Making Mozilla Faster and More Stable Using Multiple Processes” write-up to his blog that’s also worth reading.

For details on the status of the ongoing project, see the Content Processes page on the Mozilla Wiki, or go talk with the developers in real time on the #content channel on irc.mozilla.org.
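The stability argument for one process per tab is essentially process isolation: if a content process dies, the parent survives. The Python sketch below shows only that general idea, not Gecko's actual design. The `TABS` table and the `run_tab`/`run_all` helpers are hypothetical, and each "tab" is just a snippet run in a separate interpreter process.

```python
import subprocess
import sys

# Hypothetical "tabs": each snippet runs in its own OS process.
TABS = {
    "news": "print('rendered news')",
    "buggy": "raise SystemExit(139)",  # simulates a crashing tab
    "mail": "print('rendered mail')",
}


def run_tab(source: str) -> tuple[int, str]:
    """Run one tab's code in a separate interpreter process."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode, result.stdout.strip()


def run_all(tabs: dict[str, str]) -> dict[str, str]:
    """The parent keeps going even when a child process dies."""
    status = {}
    for name, source in tabs.items():
        code, out = run_tab(source)
        status[name] = out if code == 0 else f"crashed (exit {code})"
    return status


if __name__ == "__main__":
    for name, state in run_all(TABS).items():
        print(f"{name}: {state}")
```

The crash in the "buggy" tab shows up as a status for that tab only; the other tabs still render.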

by Michael(tm)Smith (mike@w3.org) at June 23, 2009 08:57 AM

June 22, 2009

W3C Q&A Weblog

For Erik Naggum, in appreciation

Reading Michael Sperberg-McQueen's blog over the weekend, I came across news that Erik Naggum, an active member of the SGML community, going back many years, has died.

Michael writes:

Erik Naggum, dead? Is it possible? One person fewer who remembers the old days.

I never myself had any direct interactions with Erik, but I can say that it seems to me he did quite a lot to ensure that the old days would be remembered by those who came after. At least I can say that he helped me learn quite a lot about the history of the community and some of its important technologies -- because, at the time when I was first learning about XML and SGML, I discovered in the SGML/XML Archive Sites section of Robin Cover's Cover Pages site a link to an SGML Repository at the University of Oslo Department of Informatics, with a note saying that the archive was "created and is supported by Erik Naggum."

Among the exhaustive range of resources there that can no longer be found anywhere else (including everything from PostScript sources for particular documents to complete archives of long-gone mailing lists), anybody with half an interest in SGML or even XML is likely to find something invaluable (in my case, what I'm particularly thankful to have found there were items related to the history and evolution of DocBook).

I never got around to contacting Erik to say thanks while he was still alive. So I hope this posting here can make up a little for my having neglected to do that.

Erik, thanks.

by Michael(tm) Smith at June 22, 2009 03:27 AM

June 21, 2009

Michael(tm) Smith

Background rationale for layered architecture of WebKit JavaScript engine

A recent WebKit bug comment from Gavin Barraclough gives some insights into the rationale for the layered architecture and MacroAssembler used in the current JavaScript engine in WebKit. Some excerpts from that comment:

The abstract code generation layer (MacroAssembler interface down) is layered like a traditional compiler. In a traditional compiler, it is common to have an assembler layer completely independent of the compiler (often a separate application). The compiler takes a source code file, compiles it, and produces an output file of assembly code.…

Layering the compiler on top of an assembler in this fashion provides a number of benefits. For the compiler developer, layering the compiler on the assembler separates the instruction selection from the minutiae of machine instruction encoding. For clients of the compiler providing a well defined language for machine instruction generation is useful if the compiler provides facilities to bypass the higher level language, and directly emit a specific sequence of machine instructions…

The assembler interface within the JIT is designed to closely mimic that of the assembler layer in a traditional compiler…

If that sounds interesting, you can find a lot more details by reading the full text of the comment.
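To make the layering concrete, here is a toy Python sketch built on an invented instruction set. The opcodes, register names, and method names are hypothetical and are not WebKit's MacroAssembler API. The lower `Assembler` layer only encodes instructions as bytes. The `MacroAssembler` layer above it only decides which instruction to use, in this case picking a shorter encoding when an immediate fits in 8 bits.

```python
import struct

# Hypothetical toy ISA, invented for illustration only:
#   ADD_IMM8  reg, imm8   -> opcode 0x10, register byte, 1 signed immediate byte
#   ADD_IMM32 reg, imm32  -> opcode 0x11, register byte, 4-byte little-endian immediate
REGISTERS = {"r0": 0, "r1": 1, "r2": 2, "r3": 3}


class Assembler:
    """Lowest layer: knows only how to encode concrete instructions as bytes."""

    def __init__(self) -> None:
        self.code = bytearray()

    def add_imm8(self, reg: str, imm: int) -> None:
        self.code += struct.pack("<BBb", 0x10, REGISTERS[reg], imm)

    def add_imm32(self, reg: str, imm: int) -> None:
        self.code += struct.pack("<BBi", 0x11, REGISTERS[reg], imm)


class MacroAssembler(Assembler):
    """Higher layer: chooses which instruction to emit, never how to encode it."""

    def add32(self, imm: int, reg: str) -> None:
        if -128 <= imm <= 127:
            self.add_imm8(reg, imm)  # shorter form when the value fits in 8 bits
        else:
            self.add_imm32(reg, imm)


masm = MacroAssembler()
masm.add32(5, "r1")
masm.add32(1000, "r2")
print(bytes(masm.code).hex())  # prints 1001051102e8030000
```

Here `add32(5, "r1")` emits the 3-byte form and `add32(1000, "r2")` the 6-byte form; the instruction-selection code never touches byte layout.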

by Michael(tm)Smith (mike@w3.org) at June 21, 2009 09:19 PM

WebKit: Facilitating alternative/experimental implementations of existing features vs. discouraging unnecessary duplication of efforts

On the webkit-dev mailing list back in April, there was an interesting thread that Michael Nordman from the Google Chrome team started with a message titled, “AppCache functionality provided by the embedder of webkit” (related to the offline-Web-applications feature in HTML5). Michael begins that message with this paragraph:

I’m working on the app cache for Chrome. We’ve decided to hoist most the functionality provided by the app cache into Chrome’s main browser process, so we won’t be using most of the implementation provided by WebKit. I’d like to work through what changes to make within WebKit/WebCore to allow an embedder pull that off. Any suggestions would be much appreciated.

Darin Adler responded with some thoughts and a question (to which Michael replied) but the discussion about specifics didn’t go anywhere after that (not on the mailing list at least), because in the next message in the thread, Maciej Stachowiak replied to question the general approach — in fact, to question whether the WebKit trunk should be providing mechanisms to facilitate replacements of parts of its own core code:

It’s been a recurring theme for the Chrome team to request hooks to bypass WebKit functionality and replace it with Chrome-specific code that lives outside the WebKit tree. So far this has been mostly for code developed when Chrome was originally a secret project. While we felt it was best to grandfather in the existing carve-outs, in general I believe this is not the best way to move the WebKit project forward. I would not like to see this pattern replicated for newly developed functionality.

Earlier in the same message, Maciej cites a reason why facilitating the proposed general approach can have a potentially negative side effect:

One downside of this approach is that, if the application cache ever needs to change, it may be necessary to make changes to two separate implementations hosted in different repositories. In addition, quality-of-implementation improvements to one version won’t benefit the other.


by Michael(tm)Smith (mike@w3.org) at June 21, 2009 08:57 PM

WebKit destined to get its own content sniffer

Web/browser-security maven and coder Adam Barth has been working on implementing a content sniffer in WebKit, based on a content-sniffing algorithm that was originally specified in the HTML5 draft, but that’s now specified as a separate IETF draft that Adam is editing and that’s titled “Content-Type Processing Model”.

WebKit applications/ports for particular platforms all currently need to rely on platform-specific content-sniffer code outside of WebKit. There are some reasons why it’s a good idea to do things that way — but there are also some good reasons not to; as Adam notes, doing things that way runs the risk of creating compatibility and security differences among various WebKit ports. So implementing a content-sniffer in WebKit itself will eliminate those differences.
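For a sense of what a single shared sniffer looks like, here is a simplified Python sketch. It is not the algorithm in Adam's draft: the signature subset, the set of "unknown" declared types, the 512-byte window, and the function names are assumptions for illustration. If every port called the same `sniff` function, the same bytes would get the same treatment everywhere.

```python
from typing import Optional

# Declared types treated as "unknown", which triggers sniffing (assumed, simplified).
UNKNOWN_TYPES = {"", "unknown/unknown", "application/unknown", "*/*"}

# Small illustrative subset of byte signatures; real tables are longer.
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]

# Tag prefixes that mark a resource as HTML (compared case-insensitively).
HTML_PREFIXES = (b"<!doctype html", b"<html", b"<head", b"<body", b"<script")

# Control bytes that suggest binary (non-text) content.
BINARY_BYTES = (
    set(range(0x00, 0x09)) | {0x0B} | set(range(0x0E, 0x1B)) | set(range(0x1C, 0x20))
)


def _looks_like_html(body: bytes) -> bool:
    text = body.lstrip(b" \t\n\r\x0c").lower()
    for prefix in HTML_PREFIXES:
        end = len(prefix)
        # The prefix must be followed by a space or '>' so "<htmlx>" does not match.
        if text.startswith(prefix) and text[end:end + 1] in (b" ", b">"):
            return True
    return False


def sniff(body: bytes, declared: Optional[str]) -> str:
    """Return the media type to use for `body`, given the declared Content-Type."""
    if declared and declared.lower() not in UNKNOWN_TYPES:
        return declared  # honour an explicit, known type
    for signature, media_type in SIGNATURES:
        if body.startswith(signature):
            return media_type
    if _looks_like_html(body):
        return "text/html"
    if any(b in BINARY_BYTES for b in body[:512]):
        return "application/octet-stream"
    return "text/plain"


print(sniff(b"  <HTML><body>hi", None))  # prints text/html
```

Keeping the decision in one function is the point: security-relevant choices (for example, when a declared type is overridden) cannot drift apart between ports.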


by Michael(tm)Smith (mike@w3.org) at June 21, 2009 08:54 PM

On privacy protection in Web applications and browser APIs

Culled from a recent exchange I had on Twitter, the following are some randomly ordered thoughts on privacy protection in Web applications/APIs intended for location-based services (LBS).

  • we really don’t want each of N different location-aware applications on a device showing their own Nth-different “location sharing active” dialogs to users
  • nobody questions the intentions of any of the proposed LBS privacy-protection solutions; they instead question whether the proposed solutions would actually have the intended effect if implemented
  • there are legitimate concerns that some LBS privacy-protection proposals, despite intentions, would risk creating a situation ultimately harmful to users
  • any proposal being advocated should be judged on its technical merit, not on its intentions
  • advocacy is bad when it means continuing to dogmatically promote a particular well-intentioned-but-unproven solution even after that proposed solution has been legitimately and seriously questioned
  • any effective solution must start with not trying to pressure (bully) browser vendors into implementing a particular proposal, but instead working with browser vendors (rather than in isolation from them) to develop a general solution that’s actually workable
  • building a specific privacy-protection mechanism into one particular API is not a solution to the general problem of protecting user privacy across different classes of applications
  • when legal requirements for privacy protection in applications are not in line with market realities and implementation/user practicalities and/or are not enforceable, the market is going to rightly ignore them

by Michael(tm)Smith (mike@w3.org) at June 21, 2009 07:40 PM

WebKit team is implementing HTML datagrid element/API

David Hyatt just opened a new WebKit master feature-implementation bug on June 19th: Implement the HTML5 datagrid. His first comment there:

This implementation may end up being very different from what’s in the spec.

The goal is to create a simpler implementation that can help improve the spec.

The <datagrid> element is a new HTML element in the HTML5 draft standard, with a corresponding DOM interface.

Elliotte Rusty Harold did a writeup on datagrid for the IBM developerWorks site a couple years ago, describing it in these terms:

What distinguishes [datagrid] from a regular table is that the user can select rows, columns, and cells; collapse rows, columns, and cells; edit cells; delete rows, columns, and cells; sort the grid; and otherwise interact with the data directly in the browser on the client.

The <datagrid> spec has been updated since the time when that article was published, but at a high level, the feature remains pretty much the same as in the description quoted above.

It’s great to see a browser project finally starting to implement <datagrid>, because it’s a great feature that I think a lot of Web authors and Web developers are going to be very glad to have.

by Michael(tm)Smith (mike@w3.org) at June 21, 2009 01:53 PM

June 20, 2009

Ivan's Blog

SemTech2009 impressions (addendum)


I wrote a blog post yesterday on my SemTech impressions; this morning I realized that I had forgotten to add an item, although I had intended to. Peter Deitz did indeed give a presentation on a site called “social actions”: essentially a specialized index and search engine for various social, non-governmental actions around the world that one might want to join, contribute to, etc. (E.g., a search on climate change will point you to a number of corresponding actions around the globe.) The interesting aspect, from the Semantic Web point of view, is that Peter would like to integrate the data, the access, etc., with the rest of the SW, essentially with LOD (although he did not use this term), but he needs (and asks for) help from the community. Beyond the clear value of this particular dataset, this is becoming a pattern (the NYT example in my blog yesterday is similar): people realize the value of publishing their data in a Linked Data format, but it is difficult to take the first steps. Even more tutorials, descriptions, and, above all, community help are needed. That is essential for the success of Linked Data!
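As an example of how small a first step can be, here is a Python sketch that turns one record into N-Triples using only string formatting. The `example.org` URIs, the record fields, and the `Action` class are hypothetical. `dcterms:title` and `dcterms:subject` are real Dublin Core terms, but whether they suit a given dataset is a modelling choice.

```python
# Hypothetical base URI and class, for illustration only.
BASE = "http://example.org/actions/"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(record: dict) -> str:
    """Emit one triple per line: subject, predicate, object, then ' .'."""
    subject = f"<{BASE}{record['id']}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{BASE}Action> .",
        f"{subject} <{DCTERMS}title> {literal(record['title'])} .",
        f"{subject} <{DCTERMS}subject> {literal(record['topic'])} .",
    ]
    return "\n".join(lines) + "\n"


print(to_ntriples({"id": "42", "title": "Plant trees", "topic": "climate change"}))
```

Because the output uses full URIs, it can be loaded into any RDF store and linked to other datasets later; a thesaurus URI could replace the plain `topic` string once one exists.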

Posted in Semantic Web, Work Related Tagged: Linked Data, non-governmental organizations, Semantic Web, semtech2009

by Ivan Herman at June 20, 2009 02:09 PM

June 19, 2009

Ivan's Blog

SemTech2009 impressions


The first and possibly most important aspect of SemTech 2009 is that… it happened! I must admit that back in April-May, when the conference’s Web Site did not include any news of the program yet, I was a bit concerned that the general economic malaise would kill this year’s conference. O.k., I might have been paranoiac, but I think some level of concern was indeed legitimate. And… not only did the conference happen as planned, but the numbers were essentially the same as last year’s (over 1000). I think that by itself is an important sign of the interest in Semantic Technologies. Kudos to the organizers!

A general trend that was reaffirmed this year: by now, Semantic Web technologies are the obvious reference points for almost all presentations, products, etc., at the event. RDF(S), RDFa, OWL, SPARQL, etc., have become household names; newer specs like SKOS or POWDER may not have been as widely referred to yet, but I am sure that will come, too. Linked Data (and, more specifically, the Linked Open Data cloud) was almost ubiquitous this year, whereas I do not believe it was even mentioned last year. That is a huge change (although I still miss real “user-facing” applications of LOD; some, like Talis’ system deployed at UK universities, were presented, but not as part of the regular conference). All that being said, I seem to have missed more sessions than last year, which makes my impressions more patchy. There were several press interviews that I could not escape, and hallway discussions that were great but made me miss a presentation here and there… I guess this is what happens when you have so many people around!

Tom Tague (from Open Calais) gave a very nice opening keynote. His talk was actually not on Open Calais (he did that in 2008), but rather on his experience in talking to different people who tried to start up new ventures in the Semantic Web area (a quote from his talk: “in 80% of the discussions I did not understand what the vendors wanted, and I walked away with my cheque book intact… Simplify!”). The main areas that he looked at were tools, social, advertising, search, publishing, and user interface. One of the remarks I liked was on search: in his view (and I think I agree with that), Semantic Technologies may not be really interesting for general search (where statistical, i.e., brute-force, methods work well) but rather for specialized, area-specific search tools (things like GoPubMed, or applications deployed at, e.g., Eli Lilly or experimented with at Elsevier, come to mind as good examples). Similarly, these technologies are not necessarily of interest for general, “robotic” publication tools like Google News, but rather for high-quality publishing with possible editorial oversight (reducing costs and difficulties).

(He also had a nice text on one of his slides: “Web2.0: Take Web 1.0, add a liberal dash of social, generous amounts of user generated content, atomize your content assets and stir until fully confused”:-)

Tom Gruber talked about his newest project: SIRI, a super-duper personal assistant running on an iPhone with a conversational (voice-directed) interface. The group behind it integrates a bunch of information from the Web (the “usual” stuff like restaurants and travel sites), categorizes it, and hides the complexities behind a sexy user interface. The problem I have is that I just do not see how this would scale. I see one of the major promises of the Semantic Web as getting data out there in RDF, so that such essentially mash-up applications become much easier to create and maintain. Until then, it is really tedious… On a more personal note, I am not sure I would like the conversational voice interface. I have never used the voice commands on my phone, for example; I do not feel comfortable with them. But, well, that is probably only me…
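Here is a toy illustration of why RDF would make such mash-ups easier: once two independently published sources use the same URI for the same thing, combining them is a set union followed by a join. The Python sketch below uses hypothetical `ex:` names as stand-ins for full URIs and a hand-rolled pattern matcher instead of a real triple store or SPARQL engine.

```python
# Two independently published (hypothetical) datasets. "ex:..." names stand in
# for full URIs; the shared URI ex:roma is what lets them be combined.
RESTAURANTS = {
    ("ex:roma", "ex:name", "Trattoria Roma"),
    ("ex:roma", "ex:city", "San Jose"),
}
REVIEWS = {
    ("ex:review1", "ex:about", "ex:roma"),
    ("ex:review1", "ex:rating", "4"),
}


def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        (ts, tp, to)
        for (ts, tp, to) in graph
        if (s is None or ts == s) and (p is None or tp == p) and (o is None or to == o)
    ]


def ratings_by_name(graph):
    """Join reviews to restaurant names through the shared restaurant URI."""
    result = {}
    for review, _, place in match(graph, p="ex:about"):
        for _, _, name in match(graph, s=place, p="ex:name"):
            for _, _, rating in match(graph, s=review, p="ex:rating"):
                result.setdefault(name, []).append(rating)
    return result


# Without blank nodes, merging the two sources is just a set union.
merged = RESTAURANTS | REVIEWS
print(ratings_by_name(merged))  # {'Trattoria Roma': ['4']}
```

Without the shared identifier, the join would need hand-written matching of names between sites, which is the tedious part today.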

Chime Ogbuji made a really nice presentation on the system they have developed at the Cleveland Clinic: a great combination of RDF, OWL, and SPARQL. The interesting aspect (for me) was the use of Cyc, which converts the doctor’s question in natural language (insofar as a question full of medical jargon can be considered “natural”:-) into, essentially, a SPARQL query. The medical ontologies are used to direct this conversion process, and the triple store can then be queried with the generated query. Impressive work. (Part of it was documented in a W3C use case, but this presentation had a different emphasis.)

Unfortunately, I had to skip Peter Mika’s presentation on the SearchMonkey experiences; I will have to look at his slides… But, as a last-minute addition to the program, the organizers succeeded in getting Othar Hansson and Kavi Goel to talk about Google’s rich snippets. I blogged on this a few weeks ago, but this presentation made the goal of the project much more understandable. Essentially, by recognizing specific microformat or RDFa vocabularies, they can improve the user experience by adding extra information to the search results. It is interesting to observe the difference between Yahoo! and Google in this respect: both of them use microformats/RDFa for the same general goal, but whereas Yahoo! relies on the community providing applications and on users personalizing their own search result page, Google controls the output in a generic way that does not require further user action. It will be interesting to see how these differences influence people’s usage patterns. There was some discussion of Google’s choice of vocabularies; the presenters made it quite clear that they are perfectly happy to use other vocabularies (e.g., vCard or FOAF) if they become pervasive, and that this is a discussion Google plans to engage in with the community. There is of course a chicken-and-egg issue there (if a vocabulary is known by Google, it will be more widely used, too), and this is clearly an area to discuss further. But these are details. The very fact that both Yahoo! and Google look at microformats and RDFa is what counts! Who would have thought that just a year ago?
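The recognition step can be sketched roughly like this: pull property/value pairs out of RDFa-style `property` attributes, using `content` when present and the element's text otherwise. This is a simplified Python illustration, not Google's implementation or a conforming RDFa parser. It ignores prefix resolution, `about`, `typeof`, and chaining, and the `v:` prefix in the sample markup is hypothetical.

```python
from html.parser import HTMLParser

VOID = {"area", "br", "hr", "img", "input", "link", "meta"}


class PropertyExtractor(HTMLParser):
    """Collects (property, value) pairs from RDFa-style `property` attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.stack = []  # one entry per open element: (tag, property, content, text parts)
        self.found = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        prop, content = a.get("property"), a.get("content")
        if tag in VOID:  # no end tag follows, so record immediately
            if prop and content is not None:
                self.found.append((prop, content))
            return
        self.stack.append((tag, prop, content, []))

    def handle_data(self, data):
        for _, _, _, parts in self.stack:  # text counts toward every open element
            parts.append(data)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        while self.stack:  # pop until the matching open tag (tolerates sloppy markup)
            open_tag, prop, content, parts = self.stack.pop()
            if prop:
                value = content if content is not None else "".join(parts).strip()
                self.found.append((prop, value))
            if open_tag == tag:
                break


SAMPLE = (
    '<div><span property="v:name">Le Bistro</span> rated '
    '<span property="v:rating">4.5</span>'
    '<meta property="v:count" content="12"></div>'
)
parser = PropertyExtractor()
parser.feed(SAMPLE)
parser.close()
print(parser.found)
```

A search engine would then check whether the extracted properties belong to a vocabulary it knows before using them in a result.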

I was not particularly impressed by the Semantic Search panel. I had the impression that the participants did not really know what they should say and talk about:-(

Nice presentation by Jeffrey Smitz from Boeing on a system called SPARQL Server Pages. Essentially, the user can write pages with a structure similar to, say, a PHP page, i.e., a mixture of HTML tags and server “calls”, except that these “calls” refer to SPARQL queries against a triple store on the server. Their system also includes some rule-based OWL reasoning on the server side, although I am not sure I got all the details. All in all, the system seemed a bit complex, but the general approach is interesting! And it is nice to see that a company like Boeing seems to make good use of RDF+OWL+SPARQL; it would be good to know more…

I missed Zepheira’s presentation on freemix, which is a shame, but, well, it happens. But I did play with freemix before travelling to San Jose; I called it “Exhibit for the masses”, and I think this is a fair characterization. David Huynh’s Exhibit is a really nice tool, but it is not easy to use. By contrast, it took me about 2 minutes to make a visualization of a JSON data set I had used for an Exhibit page elsewhere…

Andraz Tori talked about Common tag, a small vocabulary that, for example, can be used when marking up texts with tags (something that engines like Zemanta or Open Calais do). Bringing the RDF and the tagging worlds together is really important; I am very curious how successful this initiative will be…

The keynote on the last day was from the New York Times (by Evan Sandhaus and Robert Larson). It was quite interesting to see how a reputable newspaper like the NYT has developed a tradition of indexing, abstracting, and cataloging articles, and how these are archived and searched. Impressive. It is also great that the NYT Annotated Corpus has been released to the research community. I did not know about that and, I presume, it must be a great resource for a lot of people active in the area of, say, natural language processing. Finally, they announced their intention to release their thesaurus in a Semantic Web format, adding a “blob” to the Linked Data cloud. They still have to work out the details (and expect feedback from the community); I would hope they publish a SKOS thesaurus and might even annotate the news items on their web site with this thesaurus using RDFa. But something in this space will happen, that is for sure! Other reputable newspapers, like Le Monde, the Guardian, NRC Handelsblad, and El País, will you follow?

I also had my share of talking: I gave an intro tutorial on the SW, gave an overview of what is happening at W3C (quite a lot this year, including the finalization of POWDER, OWL 2, and SKOS!) and participated in an OWL 2 panel (with Mike Smith, Zhe Wu, Deb McGuinness, and Ian Horrocks). I was quite happy with the tutorial and the way the panel went; the audience for the talk could have been a bit larger. But, well…

It was a long week, long trips, not much sleep… but well worth it!

Posted in Semantic Web, Work Related Tagged: Google, Linked Data, OWL, POWDER, RDFa, Resource Description Framework, Semantic Web, semtech2009, SKOS, Yahoo

by Ivan Herman at June 19, 2009 09:53 PM