Well, I have 10 minutes. I can explain maybe one, maybe two things. I think I will use this time to answer the questions about linked data online, perhaps from a government point of view. I think the benefit of open data, and of open government data specifically, has been well elaborated elsewhere, by activist groups, geeks, and heads of state. So I'm going to answer a few questions specifically about *linked* data.
Hands up poll
The Star Rating
I've been telling people that when someone puts data online, you should give them up to 5 stars as follows:
The first star is for making the data public at all. This is the biggest star. It is the biggest because often that is just the biggest challenge: to actually get over the social constraints and do it. But I have 10 minutes, so I won't go over the million reasons why people like to hold on to their data, or the million reasons why actually they should and can release it anyway. But you get one big star if you release the data at all, even if it is a PDF of a scan of a fax of a document which has the data in it.
The second star you get if you release the data in a machine-processable way. A Microsoft Excel file, perhaps. There are a lot of those on the web. When the file is machine-processable it saves me having to retype it into a spreadsheet to analyse it.
The third star you get if the data is in a non-proprietary format. Open standards are important -- you know that. You know you shouldn't force your readers to buy some proprietary program to read your data. Especially readers in 1000 years' time, when that particular copy of that package may have vanished without trace, but the open standards of this era are understood by all good librarians. So you get 3 stars for putting out data in a non-proprietary format, and I'd include Comma Separated Values (CSV) in that even though it's not a W3C standard, but XML is good too.
The fourth star you get if you publish it in the Linked Data format. You put it out in one of the RDF formats. Now it can be imported directly into linked data tools, browsed with linked data browsers, and so on, and amalgamated with other linked data. When you put it out there as Linked Data, everything which your data is about gets a URI. A URI in your space. Yes, something starting with http:
So getting that fourth star can be When @@
Why linked data?
Well, there is something different about the way linked data works. Of course it's linked. That works because we use URLs as identifiers not just for web pages but also for abstract objects like appointments and real things like cars. But the most important thing, the one thing I want to explain today, is that any RDF data is generally in a mixture of different vocabularies.
Linked data uses the RDF standard. RDF is mainly a data model, and a very simple one, simpler than XML. And it's a set of a few syntaxes for transmitting that data across the web. So any system can read the RDF data and see it as a set of relations between different things and properties of things.
So doesn't that just leave you with the same problem that every other system has, XML-based or object-oriented, that you still can't communicate, as you still don't necessarily understand what the other person's data is about?
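To make "a set of relations between things and properties of things" concrete, here is a minimal sketch in plain Python. It is not an RDF library, just the shape of the data model: a graph is a set of (subject, predicate, object) triples. Every http://example.org/ URI and the helper `properties_of` are made up for illustration.

```python
# A toy model of RDF: a graph is a set of (subject, predicate, object) triples.
# All http://example.org/ URIs are invented for this sketch.
EX = "http://example.org/"

graph = {
    (EX + "car/42", EX + "terms/colour", "red"),                # property of a thing
    (EX + "car/42", EX + "terms/owner", EX + "person/alice"),   # relation between things
    (EX + "person/alice", EX + "terms/name", "Alice"),
}


def properties_of(graph, subject):
    """Collect {predicate: [values]} for everything said about one subject."""
    result = {}
    for s, p, o in graph:
        if s == subject:
            result.setdefault(p, []).append(o)
    return result


car = properties_of(graph, EX + "car/42")
```

The real thing (the car, the person) gets a URI, and so does each kind of relationship. That second part is what lets terms come from different vocabularies.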
Well, yes and no.
Linked data is different. When you send linked data, you send a series of individual relationships between things, and each relationship in the same message can be taken from a different vocabulary. Some linked datasets do just use terms from one vocabulary, but that's not typical, and it's not what gives it its power, and that's not what makes it work socially.
Take the label on this bag of potato chips. What language is it in? English? Well, there is some plain English, American English, on the front: it says "Potato Chips". That should be understood by most people in America, even if Brits would call them crisps. But look at the other parts.
There is a whole panel on the Nutrition Facts.
Now that is much more of a controlled vocabulary.
It is understood by people and the food industry all over the USA.
The FDA
Compare that with, say, the bar code. The UPC code is basically a single number, which is recognized by the retail industry worldwide. You can scan that bar code in any retail shop, not just a food store, not just in the USA. So this is a different language, spoken by a different, wider community.
Now here is allergy information. Many of you won't generally bother with that at all, and you may not even understand the terms in it. Some of you with allergies may have to read it every time you buy something. A different language, a different community. The food industry and people with allergies.
Which language are these? The "sell by" date? The mysterious "U" in a circle?
Now down here at the bottom of the label is an interesting string "110#7140(A)". Maybe that is a batch number for the chips, or maybe it is metadata about the bag itself, the printer's own record of the printing of a set of bags.
Whatever it is, I don't understand what it means. And that doesn't stop me buying the chips. I don't say, "Hang on, I can't eat these, I don't understand what 110#7140(A) means".
So it is with RDF. An RDF message is made up of bits in different vocabularies, and when you ignore the bits you don't understand, you still have useful data.
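The chip bag can be sketched the same way. In this hedged example, one message mixes three vocabularies, and each reader keeps only the statements whose vocabulary it knows. The namespaces, the property names, the calorie figure and the all-zero UPC are all invented placeholders. Only the string "110#7140(A)" comes from the label.

```python
# Hypothetical vocabularies; none of these namespaces is a real published ontology.
NUTRITION = "http://example.org/nutrition#"
RETAIL = "http://example.org/retail#"
PRINTER = "http://example.org/printer#"
BAG = "http://example.org/bag/1"

# One message, three vocabularies. The values are placeholders.
message = [
    (BAG, NUTRITION + "caloriesPerServing", 150),   # made-up figure
    (BAG, RETAIL + "upc", "000000000000"),          # placeholder code
    (BAG, PRINTER + "runCode", "110#7140(A)"),      # the mystery string
]


def understood(message, known_vocabularies):
    """Keep only the statements whose predicate is in a vocabulary we know."""
    prefixes = tuple(known_vocabularies)
    return [t for t in message if t[1].startswith(prefixes)]


shopper = understood(message, [NUTRITION])       # reads the nutrition panel only
checkout = understood(message, [RETAIL])         # reads the bar code only
```

Neither reader fails because of the printer's code. Each just skips it and still gets useful data.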
When you make a new XML application, you make a schema, in this case you would make a schema for all the stuff on the packet.
When you make a Linked Data application, you cherry-pick existing terms from the different communities you are a part of.
The US Government. Government agencies globally. The defense sector. The first responder community. The community of people communicating about border crossing violations. Threatened species. And so on. Some big, some small.
Some will be narrow communities of interest, but global. Some will be local to town or state, or national.
Increasingly, you will find those sets of terms (ontologies we call them) already exist. Sometimes, you will need to get together with people from that community and nail down a new ontology.
Often,
Sometimes people ask, "Isn't it going to be just too much work to make all these ontologies?"
Well, the point is that you do your bit, and everyone else does theirs. Just like lots of life. Most of the time, you are using ontologies which already exist. Some of the time you are involved in making a new one. Overall, a lot of interoperability comes from a very finite amount of work.
People ask, how ever are you going to get everyone to agree on
what all the terms mean?
The crucial thing is, they don't.
Most terms most people don't understand.
It is not a top-down taxonomy of everything.
It is weblike.
It's bottom-up.
It's small pieces loosely joined.
It is, as I sometimes say, a big fractal tangle.
It has big communities and little communities.
Connected in whatever way life in all its wonder connects us as we work to do our jobs.
Pursue our own passions.
It scales better than anything ever before.
Let's look at the life cycle of some data. You make up some Linked Data. It is easy. You cherry-pick some ontologies and invent others. If the FDA hasn't made an ontology for Nutrition Facts, and you can't wait for them, you make one up. If later you find they have, that's OK. You link them. Two people can reinvent the same thing, and afterward, retrospectively, someone, anyone, can notice they are the same and link them, and it all works.
That's another crucial way in which linked data removes huge tensions from the application development world, and lets us get on and do more. More interop for less work.
So you generate your linked data. You map it from existing databases using mapping tools. You map it from CSV files using scripts. You fix the PHP scripts which generate HTML to generate parallel RDF pages. You maybe fix them to use the RDFa standard to embed the RDF in HTML.
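Linking after the fact can be sketched like this. Two parties have each invented a term for calories, and a third party later publishes a single statement saying the two terms are equivalent. The property `owl:equivalentProperty` is a real OWL term. Both nutrition namespaces, the figures, and the `canonicalise` helper are assumptions for illustration. Real OWL equivalence is symmetric and is handled by a reasoner; this sketch only rewrites in one direction.

```python
EQUIVALENT = "http://www.w3.org/2002/07/owl#equivalentProperty"

# Two independently invented vocabularies (both namespaces are hypothetical).
MINE = "http://example.org/my-nutrition#"
THEIRS = "http://example.org/agency-nutrition#"

data = [
    ("http://example.org/bag/1", MINE + "calories", 150),              # made-up figure
    ("http://example.org/bag/2", THEIRS + "caloriesPerServing", 140),  # made-up figure
]

# Published later, by anyone who notices the two terms mean the same thing.
links = [(MINE + "calories", EQUIVALENT, THEIRS + "caloriesPerServing")]


def canonicalise(data, links):
    """Rewrite predicates through the equivalence links (one direction only)."""
    mapping = {s: o for s, p, o in links if p == EQUIVALENT}
    return [(s, mapping.get(p, p), o) for s, p, o in data]


unified = canonicalise(data, links)
```

Neither original dataset had to change. The link is just one more piece of data.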
Either way, your Linked data is out there.
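As an illustration of mapping from CSV files using scripts, here is a minimal sketch that turns each row into N-Triples lines, one of the RDF syntaxes. The base namespace, the `item/` and `terms/` URI patterns, and the sample rows are assumptions. It also assumes column names and identifiers are already safe to put in a URI, which a real script would need to check.

```python
import csv
import io

BASE = "http://example.org/"   # assumed namespace; you would use your own space


def escape_literal(text):
    """Escape the characters N-Triples requires escaping inside a string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column):
    """One subject per row, one triple per non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}item/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue   # the id names the subject; empty cells say nothing
            predicate = f"<{BASE}terms/{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return "\n".join(lines)


sample = "id,name,colour\n1,Widget,red\n2,Gadget,\n"   # invented sample data
ntriples = csv_to_ntriples(sample, "id")
```

Every value here comes out as a plain string literal. Choosing existing vocabulary terms for the columns, rather than minting `terms/name`, is where the cherry-picking comes in.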
People read it in. They link their own data to it. Consumers browse your data. They build apps which use it directly.
Some trawl all your linked data, and throw it together with a lot of other linked data from sources they trust into a big single pot. You can just do that with Linked Data, unlike with @@. The information from different sources about the same thing is combined, to allow new insights, new mashups, new apps.
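Throwing it all into one pot is, at the data-model level, just a set union. A rough sketch, with made-up source names, URIs and values:

```python
BAG = "http://example.org/bag/1"   # the same URI is used by both sources

# Hypothetical sources, each publishing triples about the same thing.
sources = {
    "agency": {(BAG, "http://example.org/nutrition#caloriesPerServing", 150)},
    "retailer": {(BAG, "http://example.org/retail#upc", "000000000000")},
    "stranger": {(BAG, "http://example.org/misc#rating", "dubious")},
}


def merge_trusted(sources, trusted):
    """Union the triples from the sources we trust into one graph."""
    pot = set()
    for name, triples in sources.items():
        if name in trusted:
            pot |= triples
    return pot


pot = merge_trusted(sources, trusted={"agency", "retailer"})
about_bag = {p: o for s, p, o in pot if s == BAG}
```

Because both trusted sources used the same URI for the bag, what they say about it lines up with no extra mapping step.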
So
W3C eGov, and Semantic Web Interest Group
PANEL AT 1PM
(*If I was transmitting it in RDF, I would not make up those terms. I imagine that the Food and Drug Administration defines those terms. I would ask them to publish the terms (servings per packet, calories per serving, etc.) as RDF. If they didn't, I'd dig around on the FDA website and try to find a list of the terms, or an XML schema, or some other format, and hack it into RDF and send it to them.)