Points of Interest Working Group Teleconference -- 08 Sep 2011

<trackbot> Date: 08 September 2011

<robman> hey matt

<robman> great thanks...you?

<robman> sure...lets talk later

<cperey> 848.aaaa is Christine

<cperey> called BuildAR

<scribe> Scribe: Matt

Identifying and Categorizing POIs, presented by Henning Schulzrinne

-> http://lists.w3.org/Archives/Member/member-poiwg/2011Sep/att-0005/poi-urn.pdf Slides

Henning: I pulled some use cases out of the draft.
... The ones I pulled out seemed to be about finding categories of things.
... How can we divide the millions of POIs into manageable categories for searching?
... Some problems we won't be solving:
... Properties of POIs vs category, e.g. "restaurant that takes credit cards" isn't a category.

<cperey> restaurant a favorite category :-)

Henning: The distinction between what is or isn't a category is somewhat arbitrary. You can claim that "Italian Restaurant" could be a category, or it could be a cuisine attribute of a restaurant.
... It's fuzzy, and one has to make pragmatic choices about what users typically expect.
... The two characteristics I see defining categories are: 1. ?? and 2. that categories are not interchangeable.
... A "gas station" and a "restaurant" are not interchangeable, but even this gets tricky, e.g. "synagogue" and "Christian Church".
... North American Industry Classification System (NAICS) is one standard, the only one I've noticed. Based on Census.
... Very much comes out of industry, designed for classical manufacturing type industries, i.e. "this establishment produces cutlery"
... It struggles today with identifying services.
... While it's a fine example of categories, it isn't what we'd want to use though. Restaurants for instance have just two classifications: full-service and limited service.
... That may be okay, but is somewhat limiting, and not really what I'd expect users to care about.
... Great from a statistical perspective though.
... Many of the things you'd want to look up aren't in the system at all.
... I tried some things that are common from GPS POIs and they don't appear at all in NAICS, e.g. ATMs, wifi hotspots, monuments, etc.
... One alternative is to say we've got Google, just use free-text. That works, and is probably better than many alternatives, but free text is also hard to translate into other languages, and the same service has many names.
... e.g. ATM, cash machine, automated teller machine, etc
... Then there are also things like distinguishing between a diner, a café, coffee shop, etc. McDonalds calls itself a restaurant, but most of us would think of it as another term.

<cperey> McDonalds =? Restaurant?

Henning: So you might get lots of things that you wouldn't expect showing up on a list.
... There are also properties of a library, such as "public" or "university".
... and hierarchy is missing too: French vs French Restaurant.
... Another option is map overlay labels. These are usually used to label topographical features and not services, like restaurants, ATMs.
... GPS POIs are much more consumer applicable.
... As far as I can tell, there isn't any standardization for POI labels; at best they're informally standardized. I haven't confirmed with every vendor though.
... Some of the categories are a bit odd too, sometimes very broad: all government services labeled as one, even if it's everything from libraries to prisons, but sometimes libraries are separate etc. It's inconsistent and not clear if it's official or just made up.
... And lastly, the Yellow Pages labels.
... I've heard every region does its own thing, and the labels are often picked so that businesses appear in as many places as possible, rather than where users might expect them.
... Yellow Pages wouldn't contain bathrooms and ATMs or other things without a phone number.
... Coming from another direction, we had a similar problem when we started redesigning the emergency calling system in the US.
... One of the big problems is that every country pretty much has, for historical reasons, used a different digit pattern.
... Confusing scenario where in some cases a number is used generically for all services, i.e. 911 (excepting poison control), but other countries have different numbers for police, fire, ambulance, etc. So, there was no hope of a standard for a number identifier.
... We started on something different, RFC 5031. It's a URN for services. urn:service:sos
... They're extensible via IANA.

-> http://tools.ietf.org/html/rfc5031 RFC 5031
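As an illustration of the service URN form mentioned above (urn:service:sos), here is a minimal Python sketch that splits such an identifier into its dot-separated labels. The function name `parse_service_urn` is hypothetical, the label pattern is a simplified approximation rather than the full RFC 5031 grammar, and lower-casing the labels for comparison is an assumption of this sketch.

```python
import re

# Simplified label pattern: letters and digits, hyphens allowed inside.
# This approximates the RFC 5031 syntax for illustration only.
_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")


def parse_service_urn(urn: str) -> list[str]:
    """Split a 'urn:service:...' identifier into its dot-separated labels."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn" or parts[1].lower() != "service":
        raise ValueError(f"not a service URN: {urn!r}")
    labels = parts[2].split(".")
    for label in labels:
        if not _LABEL.match(label):
            raise ValueError(f"invalid label {label!r} in {urn!r}")
    # Lower-case the labels so comparisons are case-insensitive (sketch assumption).
    return [label.lower() for label in labels]


print(parse_service_urn("urn:service:sos.fire"))  # ['sos', 'fire']
```

A device or routing component could use the first label to pick the broad service and the remaining labels to narrow it down.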

Henning: These are internal to the system, used for call routing.
... Allows devices to have an entry that gets to the right people.
... NG911 and NG112 have been moving in this direction. For 911 it seems to be accepted and is getting deployed.
... We determined that this was extensible to other services too.
... N11 services -- reserved 3-digit numbers that start with a digit other than 0 or 1 and end in 11, e.g. 211, 311

<ahill2> 611 is used by the cell company for their information

Henning: In that same spirit, we explored extending it to non-communication services.
... Designed for consumer use, things that we can label, e.g. "food", "fuel", "business", "communication" (wifi hotspots, internet café). Design not to have thousands of categories, but still similar to a GPS POI finder.
... 13 top level categories that then have further detail within.
... e.g. transportation.airport
... So, remaining issues and to-dos:
... Are there systems out there already? Can we extend it without breaking? Maintained in some way that there is coherence in the labeling? If not, and we go forward with the URN model, would we register these things? With IANA?
... IANA would be just a database, would we sub-delegate that?
... How would maintenance be done on that?
... How is it maintained? I'm partial to something like the Olson time zone database model.
... Not an official group, or a government thing. It used to just be one person, but now there's a mailing list with consensus process.
... It will be maintained on a long term basis through IANA.
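The dotted labels described in this turn (a small set of top-level categories with further detail beneath, e.g. transportation.airport) lend themselves to simple hierarchical matching. The Python sketch below shows one way a client might test whether a POI's label falls under a broader category. The helper `is_within` and the POI names are hypothetical; the labels "food", "communication" and "transportation.airport" are taken from the discussion.

```python
def is_within(label: str, category: str) -> bool:
    """True if dotted `label` equals `category` or is nested beneath it."""
    label_parts = label.lower().split(".")
    category_parts = category.lower().split(".")
    # Compare whole labels, so "transport" does not match "transportation".
    return label_parts[: len(category_parts)] == category_parts


# Hypothetical POIs, each tagged with one category label.
pois = {
    "Example Airport": "transportation.airport",
    "Example Diner": "food",
    "Example Hotspot": "communication",
}

matches = sorted(name for name, label in pois.items()
                 if is_within(label, "transportation"))
print(matches)  # ['Example Airport']
```

Comparing label by label, rather than by string prefix, keeps a query for one top-level category from accidentally matching another whose name merely starts with the same characters.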

-> http://tools.ietf.org/html/draft-lear-iana-timezone-database-02.html IANA Procedures for Maintaining the Timezone Database

Henning: So that's my brief summary of what I'm trying to do.
... So two things: is there obviously related work that someone else has done? Or if not is there a group of people that might be interested in forming a nucleus of an organization to do this?
... If people take it up, yes, great, if not the harm is fairly limited.

robman: Two questions: how would URNs support non-English?

Henning: Are you familiar with IETF i18n document?
... This falls in the category of protocol labels, not meant for human consumption. Each label would be called something different in each language, but for a protocol label used for routing and querying you should stick to one language.
... At the IETF, it's been English. There's nothing inherently in the labels to prevent other languages, modulo the i18n URN problems.
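The split Henning describes, one fixed protocol label plus a different human-readable name per language, can be sketched as a lookup table keyed by label and language. Everything here is illustrative: the table `DISPLAY_NAMES`, the function `display_name`, and the translations are assumptions, and only the labels "food" and "fuel" come from the discussion.

```python
# Protocol labels stay fixed; only the display names are localized.
# The translations below are illustrative examples, not an agreed registry.
DISPLAY_NAMES = {
    "food": {"en": "Food", "de": "Essen", "fr": "Restauration"},
    "fuel": {"en": "Fuel", "de": "Kraftstoff", "fr": "Carburant"},
}


def display_name(label: str, language: str, fallback: str = "en") -> str:
    """Return a localized name for `label`, falling back to English, then the label."""
    names = DISPLAY_NAMES.get(label, {})
    return names.get(language) or names.get(fallback) or label


print(display_name("fuel", "fr"))  # Carburant
print(display_name("fuel", "ja"))  # Fuel
```

The label itself never appears in the user interface unless no display name is known at all.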

robman: Because it's much more into the meaning than normal labels, it'd have labels that mean things just in certain cultures.

Henning: Right, if there was a category that doesn't have a good label in English, it's still plausible. It's not intended to preclude anything, but with it being a protocol label there's less concern over it at the moment.

robman: This is very top down. What if people could create random labels? And if they're not used, they'll die off or be redundant anyway?

Henning: There's likely not going to be a way to solve the language labeling problem. I'm not opposed to the free text model. Because of the need for i18n, and because it's not used as a search term, I have some doubt that free text will be successful.

[[I was just thinking of schema.org given this conversation, and now I see: http://www.schema.org/Place]]

Henning: I'm not thinking the urn would be typed into a search for instance.
... Right now we don't have network databases, but these static ones that vendors maintain.

robman: I think the international bridging point you made is quite a good argument for it.

ahill2: How do you see this translation between labels and URNs happening?
... I can imagine a world where Google says "these are the categorizations" and translate them to URNs, but for an ordinary individual, what services are available to them to translate their terms and searches into these categories?

Henning: I imagine the software built by, say, a tourist app or a GPS vendor would, depending on the appropriate interface, build some subset of these labels into their system and translate and expose them appropriately.
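A vendor-side translation of this kind could be as simple as a synonym table from user-facing terms to labels. The Python sketch below reuses the ATM synonyms mentioned earlier in the call. The table `SYNONYMS`, the functions, and the label "business.atm" are hypothetical; "communication" and "transportation.airport" are labels named in the presentation.

```python
# Hypothetical vendor-maintained table from free-text terms to category labels.
SYNONYMS = {
    "atm": "business.atm",  # hypothetical label
    "cash machine": "business.atm",
    "automated teller machine": "business.atm",
    "wifi hotspot": "communication",
    "internet café": "communication",
    "airport": "transportation.airport",
}


def normalize(term: str) -> str:
    """Lower-case the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def to_label(term: str):
    """Return the category label for a free-text term, or None if unknown."""
    return SYNONYMS.get(normalize(term))


print(to_label("  Cash   Machine "))  # business.atm
print(to_label("unicorn stable"))     # None
```

Each application would carry only the subset of terms and labels it needs, which matches the ad-hoc translation discussed in this exchange.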

ahill2: You think this translation between common labels and URNs happens ad-hoc? No central database of any sort?

Henning: I hadn't thought that far ahead, but if we were more ambitious, there would be a description for each term, geared towards localization.
... Nothing prevents that if we get enough people together.

ahill2: Isn't that what is happening today with Google? Isn't it the translator today?
... If this knowledge is being crowd sourced somewhere, Google or OSM, we should use that.

Henning: I was unimpressed by OSM process.

ahill2: I'm envisioning a search where these urns are available, e.g. every result has a category. That would be neat, because then the search would be the translator between free text and labels and urns.

Henning: If someone could do that I'd be delighted.

<robman> http://en.wikipedia.org/wiki/Web_Ontology_Language

robman: What about these taxonomy communities that are working in their own domains, like OWL.
... It's not just locations we're talking about, but we're cross domain here.

Henning: Yes, that's part of why I was asking for pointers to these communities. Once we get into the ontology side more, that would be helpful.
... I'm unsure that the type of work, if it's property attributes, etc, if it's directly applicable, or if sub-pieces of that can be pulled out. We don't, from a POI perspective, want a complete ontology that crosses categories and properties.
... If there are communities we should know about, please let me know. We looked about a year ago into this, building a system that could combine ontologies, e.g. find a specific movie and dinner with a cuisine type. We didn't find anything then, but we might have looked in the wrong place.

ahill2: Can we remind ourselves of some of the other categorization efforts that we discussed. I believe Library of Congress was discussed and a number of them had URLs involved.

Henning: Any pointers you have, please pass along. One difference between identifying specific objects and categorization is one-to-one vs one-to-many.

Raj: Geonames

ahill2: Does that do categorization?

rsingh2: Yes, but they're just categorizing places, not business classifications.
... They started with the USGS classification system, but theirs is a much smaller problem than ours.

Henning: Looking at geonames, they've got postal codes as the lowest level I see.
... School, post office, cemetery, etc. Not sure how many features they have.

rsingh2: That's right out of USGS.

<karls> hi

ahill2: What did we propose to use?

Henning: All I know is from the doc: NAICS.

rsingh2: I think coming up with a single country classification scheme is easy, but what's harder is a POI system like for AT&T, where they want you to search for say where to get phone cards.
... That's at one type of business in the USA, but another type in other countries.
... Reconciling that between countries is very difficult.

karls: There's a ton of work on brand binding and chain binding to help that work.
... That side-steps classification though.
... Using NAICS is mostly for information exchange, most of the time these are hand tuned by the app devs. Many schemes that are app specific.
... The low-level standards are just used for hand-off so people can do mappings.

Henning: That's what I've seen as well.

karls: It's useful to carry around NAICS codes in terms of the spec, as our spec is about exchanging information, but in terms of customer facing stuff, it's pretty open ended. Our model should be we'll support the structure, but you make it up.

rsingh2: My instinct is similar, we're not ready to tackle that in version 1.
... We might be overstepping the bounds of what innovative developers would build.

karls: Typically these systems were done for handhelds, or constrained environments. I think search trumps all though.
... The conversation at Microsoft/Nokia/NavTEQ is do we care about categorization anymore?

Henning: The search experience is good from large providers, but it requires a fair amount of user skill to get what you want. Looking at Restaurant, you have two things like Google maps, but also specific ones like Urban Spoon.
... There are more relevant hits in the latter.

karls: Here's what I see: one end there are proprietary category systems, on the other there's web page crawling for open ended search.
... In between, you've got a lot of POI gazetteers who are doing meta tagging, as it facilitates parametric search.
... The middle ground is the tagging. I thought the spec addressed that capability to open endedly do the metatagging.

ahill2: Can you elucidate on that a bit more karls?

karls: Take a service like Open Table, where they have restaurant categories and sub-categories.

<robman> +1 to link based structured data 8)

karls: You're not going to get that information out of scraping a web page. That information is best consumed by an application by OT if a POI has a pre-set, open-ended list of terms that describe it well. It's tantamount to the meta tag on HTML pages.
... Gazetteers are doing field ops, web scraping, crowd sourcing, etc, to distill down to ten or twenty keywords that are the most descriptive to put in the POI.

<rsingh2> parametric search = faceted search

<rsingh2> http://en.wikipedia.org/wiki/Faceted_search

karls: Typically the app tier puts a parametric search on top of that: hours, beer, etc.

ahill2: We're talking about somewhere between category only search and free text.

karls: You could argue that it's all categories or parameters, e.g. 24 hour restaurant could be a category or a property.

rsingh2: The popular term would be faceted search.

Henning: Close, but not quite, you might have things like types of credit cards accepted, and it might be labels drawn from a set, or specific information that isn't categorized: e.g. open hours.
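To make the distinction in this exchange concrete, the Python sketch below models a POI with a category, facets whose values are drawn from a set (such as accepted credit cards), and free-form information that is not categorized (open hours), then filters on category and facets. The `POI` class, the `faceted_search` function, and all sample data are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class POI:
    name: str
    category: str
    # Facets whose values are labels drawn from a set, e.g. accepted cards.
    facets: dict[str, set[str]] = field(default_factory=dict)
    # Free-form information that is not categorized.
    open_hours: str = ""


def faceted_search(pois, category=None, **required):
    """Return POIs in `category` that carry every required facet value."""
    results = []
    for poi in pois:
        if category is not None and poi.category != category:
            continue
        if all(value in poi.facets.get(facet, set())
               for facet, value in required.items()):
            results.append(poi)
    return results


POIS = [
    POI("Example Bistro", "restaurant",
        {"cards": {"visa", "mastercard"}, "cuisine": {"french"}}, "11:00-22:00"),
    POI("Example Diner", "restaurant",
        {"cards": {"visa"}, "cuisine": {"american"}}, "24 hours"),
    POI("Example Fuel", "gas station", {"cards": {"visa"}}),
]

hits = faceted_search(POIS, category="restaurant", cards="mastercard")
print([poi.name for poi in hits])  # ['Example Bistro']
```

Open hours are carried along but not searched here, which reflects Henning's point that some information does not reduce to labels from a fixed set.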

<rsingh2> I'm late for another call. Bye all.

robman: That's why we were thinking open ended links, because it is so closely tied to the users' mind space when they search.
... If we approach it as a categorization problem we have to approach it differently.

Henning: I think I differ on that. If you look at OT, they do do categorization, they do much better than just crowd source tagging.

karls: I think what we want to do is to be able to have OT exchange their POIs outside their business sphere.
... So, we want to make sure the spec can support rich and proprietary tagging, without defining the facets ourselves.

Henning: Why not some of the facets? I think I've demonstrated that some are viable.

ahill2: One of the things we've been careful about is making sure that there are multiple categorizations that could apply to a POI.

Henning: It could have multiple category schemes too.
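One way to picture a POI that carries several categorizations under several schemes is a list of (scheme, term) pairs. The Python sketch below groups such pairs by scheme. The scheme names, the placeholder code, the URN value, and the function `categories_by_scheme` are all hypothetical and are not taken from the group's draft.

```python
from collections import defaultdict

# A hypothetical POI tagged under three different category schemes.
poi = {
    "name": "Example Bistro",
    "categories": [
        ("naics", "EXAMPLE-CODE"),            # placeholder, not a real NAICS code
        ("service-urn", "urn:service:food"),  # hypothetical URN
        ("vendor-x", "french-restaurant"),    # proprietary tags
        ("vendor-x", "fine-dining"),
    ],
}


def categories_by_scheme(poi: dict) -> dict:
    """Group a POI's (scheme, term) pairs into a dict of scheme -> list of terms."""
    grouped = defaultdict(list)
    for scheme, term in poi["categories"]:
        grouped[scheme].append(term)
    return dict(grouped)


print(categories_by_scheme(poi)["vendor-x"])  # ['french-restaurant', 'fine-dining']
```

Keeping the scheme name next to each term lets a consumer ignore schemes it does not understand while still exchanging the full record.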

ahill2: In your proposal, are you open to the idea that NAICS ends up adding some of these categorizations that are facets as opposed to routing to a specific business?
... That is: if there were a number of different categorizations that a business has, would NAICS be the appropriate place to build up a category?

Henning: I'm not part of NAICS, but given that they're part of the census, I imagine they wouldn't be looking at these properties. I can't say what they should do, but my perception is that their mission is industry classification statistics.
... e.g. how many people work in fast food restaurants, rather than say what credit cards they take.

karls: They're also missing juicy POIs, like golf courses, transit stops, etc.

Henning: Yes, so far it seems outside their mission of what they're doing.

ahill2: Sorry, I think I asked the question wrong. In your URN proposal, would you see those categories, which are facets outside of a category, being appropriate, e.g. hours, or all the way down to the kind of information from crowd sourcing?
... Where do you draw the line?

Henning: A URN to my mind is not as suitable for these non-categorization models. You've identified some binary things, but many are not easily represented in the same fashion. That said, separately (I didn't talk about it here as it's preliminary), the system we built has the ability to retrieve an XML-type document with suitable tags that carry that information.
... We could envision that being useful for us to agree on labeling to enable exchange.

<ahill2> thanks, that answers my question

Henning: There's an opportunity there, didn't discuss it here, and it's to some extent orthogonal, but there's a need for that as well, maybe industry specific bodies, which might be in a position to do that more appropriately.
... I look forward to the mailing list conversation.

cperey: As for next steps, Matt will publish the minutes of the meeting. It's almost a transcript.
... He publishes that as a URL, it becomes archives for the group. That gets it out to a larger audience, but after that it's kind of up to this group. We're having our F2F in two weeks.
... We should work on this at the F2F and followup with actions from that.

Henning: There's no dependency here, so that's fine.
... Right now, I don't even see it as appropriate to include it in the doc, as it's not specific to this effort. But, I would like to look for a community of interest to take it to the next level of specificity.
... I'm not asking the WG to take on this particular task, it's probably outside the immediate scope.

matt: Could be a CG perhaps? POI WG decided not to do this.
... Thank you!

Henning: Thanks, and thanks to Christine for arranging this.
