See also: IRC log
<trackbot> Date: 08 September 2011
<robman> hey matt
<robman> great thanks...you?
<robman> sure...lets talk later
<cperey> 848.aaaa is Christine
<cperey> called BuildAR
<scribe> Scribe: Matt
-> http://lists.w3.org/Archives/Member/member-poiwg/2011Sep/att-0005/poi-urn.pdf Slides
Henning: pulled some use cases
out of the draft
... The ones I pulled out seemed to be about finding categories
of things.
... How can we divide the millions of POIs into manageable
categories for searching?
... Some problems we won't be solving:
... Properties of POIs vs category, e.g. "restaurant that takes
credit cards" isn't a category.
<cperey> restaurant a favorite category :-)
Henning: The distinction between
what is or isn't a category is somewhat arbitrary. You can
claim that "Italian Restaurant" could be a category, or it
could be a cuisine attribute of a restaurant.
... It's fuzzy, and one has to make pragmatic choices about
what users typically expect.
... The two characteristics I see defining categories are: 1. ??
and 2. that categories are not interchangeable
... A "gas station" and a "restaurant" are not interchangeable,
but even this gets tricky, "synagogue" and "Christian
Church".
... North American Industry Classification System (NAICS) is
one standard, the only one I've noticed. Based on Census.
... Very much comes out of industry, designed for classical
manufacturing type industries, i.e. "this establishment
produces cutlery"
... It struggles today with identifying services.
... While it's a fine example of categories, it isn't what we'd
want to use though. Restaurants for instance have just two
classifications: full-service and limited service.
... That may be okay, but is somewhat limiting, and not really
what I'd expect users to care about.
... Great from a statistical perspective though.
... Many of the things you'd want to look up aren't in the system
at all.
... I tried some things that are common from GPS POIs and they
don't appear at all in NAICS, e.g. ATMs, wifi hotspots,
monuments, etc.
... One alternative is to say we've got Google, just use
free-text. That works, and is probably better than many
alternatives, but free text is also hard to translate into
other languages, and the same service has many names.
... e.g. ATM, cash machine, automated teller machine, etc
... Then there are also things like distinguishing between a
diner, a café, coffee shop, etc. McDonalds calls itself a
restaurant, but most of us would think of it as another
term.
<cperey> McDonalds =? Restaurant?
Henning: So you might get lots of
things that you wouldn't expect showing up on a list.
... There are also properties, such as "public" or "university",
of a library.
... and hierarchy is missing too: French vs French
Restaurant.
... Another option is map overlay labels. These are usually
used to label topographical features and not services, like
restaurants, ATMs.
... GPS POIs are much more consumer applicable.
... As far as I can tell, there isn't any standardization for POI
labels, at best informally standardized. I haven't confirmed
with every vendor though.
... Some of the categories are a bit odd too, sometimes very
broad: all government services labeled as one, even if it's
everything from libraries to prisons, but sometimes libraries
are separate etc. It's inconsistent and not clear if it's
official or just made up.
... And lastly, the Yellow Pages labels.
... I've heard every region does its own thing, and labels are often
picked so that businesses appear in as many places as possible,
rather than where users might expect them.
... Yellow Pages wouldn't contain bathrooms and ATMs or other
things without a phone number.
... Coming from another direction, we had a similar problem
when we started redesigning the emergency calling system in the
US.
... One of the big problems is that every country pretty much
has, for historical reasons, used a different digit
pattern.
... Confusing scenario where in some cases a number is used
generically for all services, i.e. 911 (excepting poison
control), but other countries have different numbers for
police, fire, ambulance, etc. So, there was no hope of a
standard for a number identifier.
... We started on something different, RFC 5031. It's a URN for
services. urn:service:sos
... They're extensible via IANA.
-> http://tools.ietf.org/html/rfc5031 RFC 5031
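To illustrate the service URN form mentioned here, the following is a minimal sketch that splits an identifier such as urn:service:sos.fire into its dot-separated labels. The function and class names are hypothetical, and lower-casing the input is a simplification of this sketch rather than a statement of what RFC 5031 requires.

```python
from typing import List, NamedTuple


class ServiceUrn(NamedTuple):
    """A parsed urn:service: identifier (hypothetical helper type)."""
    top_level: str     # first label, e.g. "sos"
    labels: List[str]  # all dot-separated labels, most general first


def parse_service_urn(urn: str) -> ServiceUrn:
    """Split a service URN such as 'urn:service:sos.fire' into labels."""
    prefix = "urn:service:"
    # Simplification of this sketch: lower-case everything before parsing.
    normalized = urn.strip().lower()
    if not normalized.startswith(prefix):
        raise ValueError(f"not a service URN: {urn!r}")
    labels = normalized[len(prefix):].split(".")
    if any(label == "" for label in labels):
        raise ValueError(f"empty label in {urn!r}")
    return ServiceUrn(top_level=labels[0], labels=labels)


print(parse_service_urn("urn:service:sos.fire"))
```

A device could keep a single entry such as urn:service:sos and leave the mapping to a local dial string or route to the network.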
Henning: These are internal to
the system, used for call routing.
... Allows devices to have an entry that gets to the right
people.
... NG911 and NG112 have been moving in this direction. For 911 it
seems to be accepted and getting deployment.
... We determined that this was extensible to other services
too.
... N11 services -- reserved 3-digit numbers that start with a digit
other than 0 or 1 and end in 11, e.g. 211, 311
<ahill2> 611 is used by the cell company for their information
Henning: In that same spirit, we
explored extending it to non-communication services.
... Designed for consumer use, things that we can label, e.g.
"food", "fuel", "business", "communication" (wifi hotspots,
internet café). Designed not to have thousands of categories, but
still similar to a GPS POI finder.
... 13 top level categories that then have further detail
within.
... e.g. transportation.airport
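A dotted label such as transportation.airport lends itself to prefix matching, so a query for a top-level category also finds everything below it. The sketch below shows one way this could work. The function names and POI names are hypothetical, and the category labels are limited to the ones mentioned in the talk.

```python
from typing import List, Tuple


def is_within(category: str, query: str) -> bool:
    """True if `category` equals `query` or sits below it in the hierarchy."""
    cat_labels = category.split(".")
    query_labels = query.split(".")
    # Compare whole labels, so "transport" does not match "transportation".
    return cat_labels[:len(query_labels)] == query_labels


def find(pois: List[Tuple[str, str]], query: str) -> List[str]:
    """Return the names of all POIs whose category falls under `query`."""
    return [name for name, category in pois if is_within(category, query)]


# Hypothetical POIs as (name, category label) pairs.
POIS = [
    ("Terminal A", "transportation.airport"),
    ("Corner Cafe", "food"),
    ("Main Street Pumps", "fuel"),
]

print(find(POIS, "transportation"))
```

Comparing label by label rather than by string prefix keeps the number of top-level categories small without risking accidental matches between similarly spelled labels.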
... So, remaining issues and to-dos:
... Are there systems out there already? Can we extend it
without breaking? Maintained in some way that there is
coherence in the labeling? If not, and we go forward with the
URN model, would we register these things? With IANA?
... IANA would be just a database, would we sub-delegate
that?
... How would maintenance be done on that?
... How is it maintained? I'm partial to something like the
Olson time zone database model.
... Not an official group, or a government thing. It used to
just be one person, but now there's a mailing list with
consensus process.
... It will be maintained on a long term basis through
IANA.
-> http://tools.ietf.org/html/draft-lear-iana-timezone-database-02.html IANA Procedures for Maintaining the Timezone Database
Henning: So that's my brief
summary of what I'm trying to do.
... So two things: is there obviously related work that someone
else has done? Or if not is there a group of people that might
be interested in forming a nucleus of an organization to do
this?
... If people take it up, yes, great, if not the harm is fairly
limited.
robman: Two questions: how would URNs support non-English?
Henning: Are you familiar with
the IETF i18n document?
... This falls in the category of protocol label, not meant for
human consumption. Each label would be called something
different in each language, but a protocol label used for routing
and querying should stick to one language.
... At the IETF, it's been English. There's nothing inherently
in the labels to prevent other languages, modulo the i18n URN
problems.
robman: Because it's much more into the meaning than normal labels, it'd have labels that mean things just in certain cultures.
Henning: Right, if there was a category that doesn't have a good label in English, it's still plausible. It's not intended to preclude anything, but with it being a protocol label there's less concern over it at the moment.
robman: This is very top down. What if people could create random labels? And if they're not used, they'll die off or be redundant anyway?
Henning: There's likely not going to be a way to solve the language labeling problem. I'm not opposed to the free text model. Because of the need for i18n, and because it's not used as a search term, I have some doubt that free text will be successful.
[[I was just thinking of schema.org given this conversation, and now I see: http://www.schema.org/Place]]
Henning: I'm not thinking the urn
would be typed into a search for instance.
... Right now we don't have network databases, but these static
ones that vendors maintain.
robman: I think the international bridging point you made is quite a good argument for it.
ahill2: How do you see this
translation between labels and URNs happening?
... I can imagine a world where Google says "these are the
categorizations" and translate them to URNs, but for an
ordinary individual, what services are available to them to
translate their terms and searches into these categories?
Henning: I imagine the software built by, say, a tourist app or a GPS vendor would, depending on the appropriate interface, build some subset of these labels into its system and translate and expose them appropriately.
ahill2: You think this translation between common labels and URNs happens ad-hoc? No central database of any sort?
Henning: I hadn't thought that
far ahead, but if we were more ambitious, there would be a
description for each term, geared towards localization.
... Nothing prevents that if we get enough people together.
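The translation discussed here, from a protocol label to localized names and from a user's free-text term back to a label, could be sketched as a small lookup table. Everything in this example is an assumption for illustration: the label finance.atm, the table layout, and the function names are hypothetical, and the names are hand-written rather than taken from any registry.

```python
from typing import Dict, List, Optional

# Hypothetical label with hand-written names per language, illustration only.
NAMES: Dict[str, Dict[str, List[str]]] = {
    "finance.atm": {
        "en": ["ATM", "cash machine", "automated teller machine"],
        "de": ["Geldautomat"],
    },
}


def display_name(label: str, lang: str) -> str:
    """First listed name in `lang`, falling back to the label itself."""
    names = NAMES.get(label, {}).get(lang)
    return names[0] if names else label


def label_for(text: str, lang: str) -> Optional[str]:
    """Map a user's term to a label by exact, case-insensitive match."""
    wanted = text.strip().casefold()
    for label, by_lang in NAMES.items():
        if any(wanted == name.casefold() for name in by_lang.get(lang, [])):
            return label
    return None


print(display_name("finance.atm", "de"))
print(label_for("Cash Machine", "en"))
```

An application would ship only the subset of labels and languages it needs; the label itself stays the same across languages.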
ahill2: Isn't that what is
happening today with Google? Isn't it the translator
today?
... If this knowledge is being crowd sourced somewhere, Google
or OSM, we should use that.
Henning: I was unimpressed by the OSM process.
ahill2: I'm envisioning a search where these urns are available, e.g. every result has a category. That would be neat, because then the search would be the translator between free text and labels and urns.
Henning: If someone could do that I'd be delighted.
<robman> http://en.wikipedia.org/wiki/Web_Ontology_Language
robman: What about these taxonomy
communities that are working in their own domains, like
OWL?
... It's not just locations we're talking about, but we're
cross domain here.
Henning: Yes, that's part of why
I was asking for pointers to these communities. Once we get
into the ontology side more, that would be helpful.
... I'm unsure whether that type of work, if it's about property
attributes etc., is directly applicable, or whether sub-pieces
of it can be pulled out. We don't, from a POI perspective,
want a complete ontology that crosses categories and
properties.
... If there are communities we should know about, please let
me know. We looked about a year ago into this, building a
system that could combine ontologies, e.g. find a specific
movie and dinner with a cuisine type. We didn't find anything
then, but we might have looked in the wrong place.
ahill2: Can we remind ourselves of some of the other categorization efforts that we discussed? I believe Library of Congress was discussed and a number of them had URLs involved.
Henning: Any pointers you have, please pass along. One difference between identifying specific objects and categorization is one-to-one vs one-to-many.
Raj: Geonames
ahill2: Does that do categorization?
rsingh2: Yes, but they're just
categorizing places, not business classifications.
... They started with the USGS classification system, but theirs is
a much smaller problem than ours.
Henning: Looking at geonames,
they've got postal codes as the lowest level I see.
... School, post office, cemetery, etc. Not sure how many
features they have.
rsingh2: That's right out of USGS.
<karls> hi
ahill2: What did we propose to use?
Henning: All I know is from the doc: NAICS.
rsingh2: I think coming up with a
single country classification scheme is easy, but what's harder
is a POI system like for AT&T, where they want you to
search for say where to get phone cards.
... That's one type of business in the USA, but another type
in other countries.
... Reconciling that between countries is very difficult.
karls: There's a ton of work on
brand binding and chain binding to help that work.
... That side-steps classification though.
... Using NAICS is mostly for information exchange, most of the
time these are hand tuned by the app devs. Many schemes that
are app specific.
... The low-level standards are just used for hand-off so
people can do mappings.
Henning: That's what I've seen as well.
karls: It's useful to carry around NAICS codes in terms of the spec, as our spec is about exchanging information, but in terms of customer facing stuff, it's pretty open ended. Our model should be we'll support the structure, but you make it up.
rsingh2: My instinct is similar,
we're not ready to tackle that in version 1.
... We might be overstepping the bounds of what innovative
developers would build.
karls: Typically these systems
were done for handhelds, or constrained environments. I think
search trumps all though.
... The conversation at Microsoft/Nokia/NavTEQ is do we care
about categorization anymore?
Henning: The search experience is
good from large providers, but it requires a fair amount of
user skill to get what you want. Looking at restaurants, you
have two kinds of things: general ones like Google Maps, but also
specific ones like Urban Spoon.
... There are more relevant hits in the latter.
karls: Here's what I see: on one end
there are proprietary category systems, on the other there's
web page crawling for open ended search.
... In between, you've got a lot of POI gazetteers who are
doing meta tagging, as it facilitates parametric search.
... The middle ground is the tagging. I thought the spec
addressed that capability to do open-ended
metatagging.
ahill2: Can you elaborate on that a bit more, karls?
karls: Take a service like Open Table, where they have restaurant categories and sub-categories.
<robman> +1 to link based structured data 8)
karls: You're not going to get
that information out of scraping a web page. That information
is best consumed by an application by OT if a POI has a
pre-set, open-ended list of terms that describe it well. It's
tantamount to the meta tag on HTML pages.
... Gazetteers are doing field ops, web scraping, crowd
sourcing, etc, to distill down to ten or twenty keywords that
are the most descriptive to put in the POI.
<rsingh2> parametric search = faceted search
<rsingh2> http://en.wikipedia.org/wiki/Faceted_search
karls: Typically the app tier puts a parametric search on top of that: hours, beer, etc.
ahill2: We're talking about somewhere between category only search and free text.
karls: You could argue that it's all categories or parameters, e.g. 24 hour restaurant could be a category or a property.
rsingh2: The popular term would be faceted search.
Henning: Close, but not quite, you might have things like types of credit cards accepted, and it might be labels drawn from a set, or specific information that isn't categorized: e.g. open hours.
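The distinction drawn here, between a category, facets whose values are labels drawn from a set, and specific information that is not categorized, can be sketched as a small data model with a parametric search on top. All names, fields, and sample POIs below are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Poi:
    name: str
    category: str
    # Facets whose values are labels drawn from a fixed set.
    facets: Dict[str, Set[str]] = field(default_factory=dict)
    # Specific information that is not categorized, kept as plain text.
    details: Dict[str, str] = field(default_factory=dict)


def search(pois: List[Poi], category: str, **required: str) -> List[Poi]:
    """Keep POIs in `category` that carry every required facet value."""
    hits = []
    for poi in pois:
        if poi.category != category:
            continue
        if all(value in poi.facets.get(facet, set())
               for facet, value in required.items()):
            hits.append(poi)
    return hits


# Hypothetical sample data.
POIS = [
    Poi("Diner One", "restaurant", {"cards": {"visa", "mastercard"}},
        {"hours": "24 hours"}),
    Poi("Cafe Two", "restaurant", {"cards": {"visa"}},
        {"hours": "08:00-18:00"}),
    Poi("Fuel Three", "fuel", {"cards": {"mastercard"}}),
]

print([p.name for p in search(POIS, "restaurant", cards="mastercard")])
```

Open hours stay in `details` because they are free-form values, not labels from a set, so the sketch does not try to filter on them.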
<rsingh2> I'm late for another call. Bye all.
robman: That's why we were
thinking open ended links, because it is so closely tied to the
users' mind space when they search.
... If we approach it as a categorization problem we have to
approach it differently.
Henning: I think I differ on that. If you look at OT, they do categorization, and they do much better than just crowd-sourced tagging.
karls: I think what we want to do
is to be able to have OT exchange their POIs outside their
business sphere.
... So, we want to make sure the spec can support rich and
proprietary tagging, without defining the facets ourselves.
Henning: Why not some of the facets? I think I've demonstrated that some are viable.
ahill2: One of the things we've been careful about is making sure that there are multiple categorizations that could apply to a POI.
Henning: It could have multiple category schemes too.
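One way to picture a POI that carries several categorizations, possibly from several schemes at once, is a list of (scheme, term) pairs. This is only a sketch: the class, the scheme identifiers, and the terms are hypothetical placeholders and do not reflect the draft spec or real NAICS codes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Poi:
    name: str
    # Each entry is (scheme, term); a POI may have many per scheme.
    categories: List[Tuple[str, str]] = field(default_factory=list)

    def terms(self, scheme: str) -> List[str]:
        """All terms this POI carries under one category scheme."""
        return [term for s, term in self.categories if s == scheme]


# Hypothetical POI tagged under three different schemes.
poi = Poi("Diner One", [
    ("naics", "full-service restaurant"),
    ("urn", "food"),
    ("vendor-x", "diner"),
    ("vendor-x", "open late"),
])

print(poi.terms("vendor-x"))
```

A consumer that only understands one scheme can ignore the rest, which lets proprietary tagging travel alongside a shared label set.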
ahill2: In your proposal, are you
open to the idea that NAICS ends up adding some of these
categorizations that are facets as opposed to routing to a
specific business?
... That is: if there were a number of different categorizations
that a business has, would NAICS be the appropriate place to
build up a category?
Henning: I'm not part of NAICS,
but given that they're part of the census, I imagine they
wouldn't be looking at these properties. I can't say what they
should do, but my perception is that their mission is industry
classification statistics.
... e.g. how many people work in fast food restaurants, rather
than say what credit cards they take.
karls: They're also missing juicy POIs, like golf courses, transit stops, etc.
Henning: Yes, so far it seems outside their mission.
ahill2: Sorry, I think I asked
the question wrong. In your URN proposal, would you see
categories that are really facets, e.g. hours, as
appropriate, or even going all the way down to the kind of
information that comes from crowd sourcing?
... Where do you draw the line?
Henning: A URN to my mind is not
as suitable for these non-categorization models. You've
identified some binary things, but many are not easily
represented in the same fashion. That said, separately (and I
didn't talk about it here as it's preliminary), the system we
built has the ability to retrieve an XML-type document with
suitable tags that carry that information.
... We could envision that being useful for us to agree on
labeling to enable exchange.
<ahill2> thanks, that answers my question
Henning: There's an opportunity
there, didn't discuss it here, and it's to some extent
orthogonal, but there's a need for that as well, maybe industry
specific bodies, which might be in a position to do that more
appropriately.
... I look forward to the mailing list conversation.
cperey: As for next steps, Matt
will publish the minutes of the meeting. It's almost a
transcript.
... He publishes that as a URL, it becomes archives for the
group. That gets it out to a larger audience, but after that
it's kind of up to this group. We're having our F2F in two
weeks.
... We should work on this at the F2F and followup with actions
from that.
Henning: There's no dependency
here, so that's fine.
... Right now, I don't even see it as appropriate to include it
in the doc, as it's not specific to this effort. But, I would
like to look for a community of interest to take it to the next
level of specificity.
... I'm not asking the WG to take on this particular task, it's
probably outside the immediate scope.
matt: Could be a CG perhaps? POI
WG decided not to do this.
... Thank you!
Henning: Thanks, and thanks to Christine for arranging this.
This is scribe.perl Revision: 1.136 of Date: 2011/05/12 12:01:43
Check for newer version at http://dev.w3.org/cvsweb/~checkout~/2002/scribe/
Guessing input format: RRSAgent_Text_Format (score 1.00)
Succeeded: s/sos/service:sos/
Found Scribe: Matt
Inferring ScribeNick: matt
WARNING: No "Present: ... " found!
Possibly Present: Henning P11 P14 Raj aaaa aabb aacc aadd ahill2 cperey danbri joined karls matt poiwg robman rsingh2 trackbot
You can indicate people for the Present list like this:
<dbooth> Present: dbooth jonathan mary
<dbooth> Present+ amy
WARNING: No meeting chair found!
You should specify the meeting chair like this:
<dbooth> Chair: dbooth
Found Date: 08 Sep 2011
Guessing minutes URL: http://www.w3.org/2011/09/08-poiwg-minutes.html
People with action items:
WARNING: Input appears to use implicit continuation lines.
You may need the "-implicitContinuations" option.
[End of scribe.perl diagnostic output]