Points of Interest Working Group Teleconference -- 16 Mar 2011

<trackbot> Date: 16 March 2011

<mhausenblas> https://dev.deri.ie/confluence/download/attachments/9699349/using-uris-for-pois.pdf

<scribe> Scribe: Matt

F2F Registration

-> http://www.w3.org/2002/09/wbs/45386/POI-F2F-2011-1/ Registration form

Andy: Please register.

URIs as Identifiers

<ahill2> having trouble getting in, please give me public link

mhausenblas: I am the Linked Data guy here at DERI.

<ahill2> got it, thanks

mhausenblas: Working with Vinod on linked data cloud for AR and such.
... I tried to pull together why URIs are good for POIs: how one can use them, and what the challenges and shortcomings are.
... Slide 2
... You get a globally unique identifier, which could be a URN or anything, but with URIs you get the social dimension. Through DNS there's an authority of who owns it.
... If you talk about http URIs you also have an expectation that you can do a GET on it, and there will be content and metadata about it.
... Using URIs to identify things is probably more important, but in terms of using it on the Web, using HTTP URIs makes the most sense.
... In the wild there are RESTful web services, where URIs are nouns, HTTP methods (GET/PUT etc) are verbs.
... In AJAX/in-browser applications there are also the ?, # and #! URI patterns to beware of.
... In the linked data world there is an expectation that when a GET is done RDF is returned.
... Slide 4
... ...
... You can see that this is mixing multiple data sets. If you use URIs you can make relationships between different data sets.
... Slide 5
... What's out there that's deployed?
... These are the four main things that I think one can do.
... http://linkedgeodata.org/OnlineAccess uses OSM underneath

mhausenblas: http://geosparql.appspot.com/ You can see the recent queries done on it. You can query on location, or some code or a feature, and query a geo-annotated database.

mhausenblas: http://data.ordnancesurvey.co.uk/ they've provided a very comprehensive set of data.
... http://opendatamap.ecs.soton.ac.uk/ is backed by linked data
... I encourage you to play around with these to get a feeling for what can be done, given the data there.
... Slide 6
... There's already the geo vocabulary at w3c, but then there are things like FOAF's based_near property.
... This is something I'd expect to be a best practice, rather than trying to define a vocabulary there.
... Encourage use of other vocabularies rather than trying to come up with a new one.
... For well-known entities you might want to reuse URIs from an authority, like W3C or Wikipedia.
... Then there are user-defined entities, which would be good to express as a URI, but typically you're more constrained by what your application can do.
... I am chairing the rdb2rdf WG, and there we have a similar problem: well known entities like a person or something, and you need to find a way to reuse identifiers rather than establishing your own.
... Slide 7
... You typically have a problem of mapping the entry point to URIs.
... If I have a term and want to find a URI for it, then I need to have a translation into URI services. In geocoded stuff, typically there's a lat/long lookup. There are some dedicated lookup services, and some general purpose services.
... Don't just put URIs in your own namespace, but provide hooks into other namespaces that you want to link.
... That's what I put together based on Vinod's input.
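
The advice on slides 6 and 7 (reuse existing vocabularies, and provide hooks from your own namespace into others) can be sketched roughly as follows. This is a minimal illustration rather than anything shown on the call: the `http://example.org/poi/` namespace, the `poi_triples` helper, and the choice of `owl:sameAs` as the linking property are all assumptions. The geo vocabulary namespace is the W3C Basic Geo one mentioned above.

```python
# Hypothetical local namespace; example.org is a placeholder, not a real POI service.
LOCAL_BASE = "http://example.org/poi/"

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"      # W3C Basic Geo vocabulary
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # assumed choice of linking property


def poi_triples(local_id, lat, lon, same_as):
    """Return N-Triples lines describing one POI and linking it to external URIs."""
    subject = f"<{LOCAL_BASE}{local_id}>"
    lines = [
        f'{subject} <{GEO}lat> "{lat}" .',
        f'{subject} <{GEO}long> "{lon}" .',
    ]
    # One link per external URI that identifies the same real-world thing.
    for uri in same_as:
        lines.append(f"{subject} <{OWL_SAME_AS}> <{uri}> .")
    return lines


lines = poi_triples(
    "1234", "48.8584", "2.2945",
    ["http://dbpedia.org/resource/Eiffel_Tower"],
)
print("\n".join(lines))
```

The local URI stays in the publisher's own namespace, while the last triple is the hook that lets a consumer follow the link into another data set.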

ahill2: Thank you very much!
... We were mostly talking about using URIs as identifiers, but I think you were pitching linked data at the same time, is that fair?

mhausenblas: Yes.
... I am a linked data guy, so I tell the story from the linked data perspective, but the most important thing is URIs, and as long as we're talking about the Web ecosystem, then HTTP URIs.
... Use URIs, HTTP URIs is the main message. Leverage the existing ecosystem around it.
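
The expectation that a GET on an HTTP URI returns content, and RDF in the linked data case, might look like the sketch below. It only builds the request and does not send it; the POI URI is a hypothetical placeholder, and `application/rdf+xml` is one possible RDF media type a client could ask for.

```python
import urllib.request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) a GET request asking for an RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


# Hypothetical POI URI; passing req to urllib.request.urlopen would dereference it.
req = rdf_request("http://example.org/poi/1234")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The URI is the noun and GET is the verb; the Accept header is how the same URI can serve RDF to one client and HTML or JSON to another.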

ahill2: Thanks again.
... I'm sold, but why not? What are the negatives of this approach?
... I think we know HTTP URIs are heavier, but are there other negatives?

mhausenblas: The effort there is more of a social thing I think. If you look at the Web and the documents there, in order to make sure you can find out things identified by a URI, you have two main approaches: black/white lists (e.g. I trust this domain or gTLD), or you follow your nose (e.g. dereference the URI and see if it fits what you need). This is something that depends on the deployment base.

<ahill2> wow, what scribing!

mhausenblas: It really depends on what kind of application one has in mind.

ahill2: This is starting to make sense. I think what a lot of us have in mind is a POI system that could be a backbone to a tremendous amount of information.
... URIs have this problem of identifying the authority of a URI, but the solution is the same as the problem. We'll get the benefit of social fixes to http URIs as well.

mhausenblas: You mentioned payload size, I'm not sure I understand.

<Zakim> matt, you wanted to talk about URI base and shortnames

matt: I think we have a bit of a problem understanding how base URIs and short names work.

<mhausenblas> Michael: QNames and CURIEs

mhausenblas: There are certain conventions to break up a URI into a prefix part and a 'local' part. And you essentially find some convention to find what the prefix part really means.

<mhausenblas> http://prefix.cc/

mhausenblas: I won't really address the pros and cons, but there are technical solutions for that if the payload is an issue.
... There are services like prefix.cc where at the schema level you can essentially have a mapping between a prefix and a URI for schemas, but you could imagine that at the data level too.
... If you are worried about memory size, you can look it up if you have a quick service, but if you need quick, you can encode the whole thing in place. It's a classical comp sci tradeoff.
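
The prefix/local-part convention and the lookup-versus-inline tradeoff can be illustrated with a small sketch. The `PREFIXES` table and the `expand` and `compact` helpers are hypothetical names for illustration; the two namespace URIs are those of the geo and FOAF vocabularies mentioned earlier in the call.

```python
# A small in-memory prefix table, in the spirit of a prefix lookup service.
PREFIXES = {
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie, prefixes=PREFIXES):
    """Turn 'prefix:local' into a full URI, failing loudly on unknown prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(uri, prefixes=PREFIXES):
    """Shorten a full URI using the longest matching namespace, if any."""
    best = None
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            if best is None or len(namespace) > len(prefixes[best]):
                best = prefix
    if best is None:
        return uri  # no known namespace: keep the full URI inline
    return f"{best}:{uri[len(prefixes[best]):]}"


print(expand("geo:lat"))
print(compact("http://xmlns.com/foaf/0.1/based_near"))
```

Compact names reduce payload size at the cost of a table lookup; encoding the full URI in place avoids the lookup at the cost of space.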

matt: It's easy to get lost if you try to follow your nose on this, you have a hard time figuring out what the best practices are.

Raj: I support the URI idea, but I'd also like to throw in a pitch for URIs without semantic meaning, and support the old traditional database notion of not having meaning in the ID because you may want to change the meaning, or come up with a better scheme, but you do not want to change the ID.

mhausenblas: I am in agreement here. URIs per RFC are opaque -- of course there are social hacks around this, so you see there's a certain structure, and you can do that kind of bottom-up reverse engineering and reconstruct it, but I agree that they should be opaque.
... There are some good reasons, like human debugging, but the URIs should be opaque so meaning can change. I'd say it's probably half and half: half use a unique ID that is a string of numbers or something, and the other half are hacks where you can get a sense of the structure from them.
... The keys are opaque, globally unique identifiers, plus a deployed and well-understood ecosystem in which it is known who can create URIs.
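
One way to picture the opaque-identifier approach Raj and Michael agree on is sketched below. The base namespace, the `mint_poi_uri` helper, the in-memory `registry`, and the sample POI are all hypothetical; using a random UUID as the local part is just one possible way to keep meaning out of the identifier.

```python
import uuid

BASE = "http://example.org/poi/"  # hypothetical namespace owned by the publisher


def mint_poi_uri(base=BASE):
    """Mint a URI whose local part carries no meaning about the POI itself."""
    return base + uuid.uuid4().hex


# The description lives alongside the identifier, not inside it.
registry = {}
uri = mint_poi_uri()
registry[uri] = {"name": "Cafe Central", "category": "cafe"}

# The description can change later without touching the identifier.
registry[uri]["category"] = "restaurant"
print(uri, registry[uri])
```

Because nothing about the name or category is baked into the URI, reclassifying or renaming the POI never forces a change of ID.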

Raj: I agree there.

<ahill2> +q

ahill2: I'm leaning towards jumping on the linked data bandwagon. Can you help those of us who think in terms of XML and XML Schema? What are our responsibilities for a linked data standard for POI?

<mhausenblas> http://oreilly.com/catalog/9780596529260

mhausenblas: Creating a good solid URI space, I found Richardson and Ruby's book on web services to be very useful.
... I think 80% of the design work goes into what kind of entities do I want to represent which could be exposed or talked about. Essentially the more you think about the URIs, the better.
... If you have the vocabulary then depending on the datasource there are different ways to get that into RDF.
... Then there's the interlinking, but the majority is about thinking hard about the URI design.
... And even that is independent of linked data. If you don't use linked data, fine, but you'll still have data that you can find with URIs.
... We're currently working on RDB2RDF, but the more unstructured the data, the more difficult it is to come up with the data.
... The more standardized and structured the data, the easier it is to come up with RDF.
... The interlinking part is a bit of a research challenge, why should people do that? It's not understood yet.
... You might have heard about Facebook OpenGraph and Yahoo and Google supporting rich snippets but the interlinkings of why and how you link between data sets is currently still being worked on, with good tools, good story and good incentives.

ahill2: That sort of answers my question, but my question was more about the responsibilities of what we create?
... Can we just make the linked data model, or do we have other responsibilities?

mhausenblas: I think it's way beyond a single working group, but there should be mechanisms for saying "here is your input" and "here is the output". You could define a standardized set of interfaces for how you go from an input to a URI, but leave it up to others to figure out how.

ahill2: We have a lot of concern about practical use of POIs. What should we be concerned about?
... Does linked data hamper the ability to implement? It sounds great to researchers, but do you hear these kind of complaints?

mhausenblas: Looking at data.gov.uk, there's a lot of data out there. It's a top-down approach on the one hand, but on the other hand it's bottom-up: it's the agencies themselves that set up the data.
... The data is out there as XML, JSON, etc. Web developers have issues with RDF and SPARQL, they want something very simple, data base APIs. So there's the linked ?? API, that turns a SPARQL query into a RESTful interface.

matt: I love the idea of storing in RDF with rich links, querying using SPARQL/GeoSPARQL, and making RESTful APIs to deliver JSON out of that.
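
The last step of that pipeline, turning SPARQL results into the simple JSON that web developers ask for, might look like the sketch below. The input follows the standard SPARQL Query Results JSON layout (`head.vars` and `results.bindings`); the `flatten_sparql_results` helper and the sample data are hypothetical and are not taken from any service mentioned on the call.

```python
import json


def flatten_sparql_results(results):
    """Reduce SPARQL JSON results to a plain list of {variable: value} objects."""
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        # Unbound variables are simply absent from a binding, so skip them.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Hand-written sample in the SPARQL JSON results layout.
sample = {
    "head": {"vars": ["poi", "label"]},
    "results": {
        "bindings": [
            {
                "poi": {"type": "uri", "value": "http://example.org/poi/1234"},
                "label": {"type": "literal", "value": "Eiffel Tower"},
            },
            {"poi": {"type": "uri", "value": "http://example.org/poi/5678"}},
        ]
    },
}

simple = flatten_sparql_results(sample)
print(json.dumps(simple, indent=2))
```

A RESTful endpoint could return this flattened list directly, so that consumers never see SPARQL or RDF while the rich links remain available in the store behind it.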

mhausenblas: There were similar issues with HTML: why, in the '90s, would you produce a document in HTML? It's not as rich as whatever else, but it's the power of the link. You can follow a link and keep getting more data.
... There's a chicken/egg problem there, why put data out there if no one else does? But you get the network effect eventually.

ahill2: Excellent! I really appreciate it. Answers a lot of questions for me.
... It's the underlying data that allows us to make better connections, but that doesn't mean that it can't be delivered in other, less heavy ways.
... We're considering POIs as a data exchange, and so in that sense you have to have linked data in the database behind the scenes, but then it's a bit of a different subject for those who consume it.
... Are we interested in the consumer end of it, or the interchange of it, or something more powerful like linked data?

<mhausenblas> michael.hausenblas@deri.org

mhausenblas: I'm happy to provide answers to questions, or more references, please contact me or Vinod.

Andy: Thanks for joining, I think we learned a lot.

matt: And thanks Alex for asking so many questions, I think that helped nail down a lot of the issues we all have.
