RestaurantRecommendation

From W3C Wiki
Revision as of 09:19, 11 August 2012 by Zruset (Talk | contribs)



Nearby: RestaurantsVersusTheirReviews, SemanticWebDeveloperMap, GeoInfo, GeoOnion, GrubstreetMeetsChefMoz


Many people see Restaurant Recommendation as a classic SemanticWeb use case - building community around food!

Some links and ideas:

lots of ideas here, but not so much by way of running code/examples/etc. Please somebody build/show something. It doesn't have to be perfect. DontWorryBeCrappy. Then share the lessons learned as you improve it...

A look at Chef Moz

Let's take a look at Chef Moz, both as a dataset, and as an RDF vocabulary. Here is a restaurant description from Chef Moz, with some minor changes (detailed below):

Note that we don't have lat/long or timezone info here. Perhaps CityLookup or other GeoInfo tools can help? (and how would we distinguish between knowing the lat/long of a specific venue, versus general lat/long for a city it is in?).


<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:d="http://purl.org/dc/elements/1.1/"
     xmlns:cm="http://chefmoz.org/rdf/elements/1.0/"
     xmlns="http://chefmoz.org/rdf/elements/1.0/">
<Restaurant cm:id="United_Kingdom/England/Bristol/Casa_Sudaca983983157">
   <Location>United_Kingdom/England/Bristol</Location>
   <d:title>Casa Sudaca</d:title>
   <Address>25 Zetland Road</Address>
   <City>Bristol</City>
   <Country>United Kingdom</Country>
   <Phone>(0117) 944 6304</Phone>
   <State>England</State>
   <Zip>BS6 7AH</Zip>
   <Neighborhood>Cotham</Neighborhood>
   <Hours>daily 6.30pm-11pm</Hours>
   <ParsedHours>18.5-23|18.5-23|18.5-23|18.5-23|18.5-23|18.5-23|18.5-23</ParsedHours>
   <Smoking>permitted</Smoking>
   <Accessibility>partially</Accessibility>
   <AccessibilityNotes>there are small steps</AccessibilityNotes>
   <LargestParty>50</LargestParty>
   <Price>#5.01 - #15.00</Price>
   <Cuisine>Latin American</Cuisine>
   <Cuisine>Mexican</Cuisine>
   <Accepts>checks</Accepts>
   <Accepts>EnRoute</Accepts>
   <Accepts>Diners' Club</Accepts>
   <Accepts>Visa</Accepts>
   <Accepts>Japan Credit Bureau</Accepts>
   <Accepts>MasterCard/Eurocard</Accepts>
   <Accepts>Carte Blanche</Accepts>
   <Accepts>Discover</Accepts>
</Restaurant>
</r:RDF>
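As an aside on the ParsedHours value above: it looks like seven '|'-separated open-close ranges in decimal hours (18.5 being 6.30pm), which agrees with the Hours text 'daily 6.30pm-11pm'. Here is a minimal Python sketch of unpacking it. It assumes exactly one range per day, the function names are made up for illustration, and since this page doesn't say which day of the week comes first, the result is just a positional list.

```python
def decimal_to_hhmm(hours):
    """Turn a decimal-hours string such as '18.5' into 'HH:MM'."""
    total_minutes = round(float(hours) * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


def parse_parsed_hours(value):
    """Return one (opens, closes) pair per '|'-separated day slot.

    Assumption: every slot holds a single 'open-close' range. Records with
    closed days or split sessions would need extra handling.
    """
    slots = []
    for day in value.split("|"):
        opens, closes = day.split("-")
        slots.append((decimal_to_hhmm(opens), decimal_to_hhmm(closes)))
    return slots


casa_sudaca = "18.5-23|18.5-23|18.5-23|18.5-23|18.5-23|18.5-23|18.5-23"
week = parse_parsed_hours(casa_sudaca)
```

Each pair could then be fed into whatever calendar vocabulary we settle on (eg. RdfCalendar), once the day ordering is known.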


In addition to restaurant reviews, Chef Moz has data about other Web sites and about reviewers. Here are some (un-amended) brief excerpts.

The chefmoz.guides.rdf file describes Web sites that offer restaurant guide data:

<Guide about="http://boston.citysearch.com/Boston/restaurants_bars/">
   <d:Title>CitySearch Boston - Restaurtants</d:Title>
   <d:Description>Restaurant and bar news and listings.</d:Description>
   <d:Coverage>United_States/MA/Boston</d:Coverage>
</Guide>


Chef Moz also describes the reviewers. Here's one (picked at random from the Boston section):

<Profile r:id="jabrams">
<displayname>Joel Abrams</displayname>
<aim>TeaMan2000</aim>
<edits r:resource="United_States/MA"/>
<edits r:resource="United_States/MA/Boston"/>
<edits r:resource="United_States/MA/Waltham"/>
<Text>I'm a news junkie and Boston-area new media producer - currently working on a service that brings audio
content to cell phones. I hope it includes restaurant reviews.</Text>
<email>http://chefmoz.org/cgi-bin/send.cgi?toeditor=jabrams</email>
</Profile>


Chef Moz details

The first example above (the restaurant description) already has a few basic fixes to the RDF/XML to make it parse as sensible, modern RDF.

  • £ written as # (because didn't know the numeric character reference nor where to look to find it. Clues please! --DanBri) (it is &#163;, or &pound; according to the html DTD)
  • some syntax fixes (rdf: prefix, namespace URI etc)
  • the rdf:id is now written cm:id, since this isn't usefully part of a URI
  • Dublin Core title is lowercase, ie. 'd:title' here

Other changes can wait (eg. 'Phone' should use URI), and are more stylistic and pragmatic in nature: perhaps we might consider using GeoInfo or RdfCalendar techniques to better represent the details of place and time, or FOAF to represent contact and other details, 'who knows who' etc amongst reviewers.

  • anyone tried parsing it yet? - chefmoz as distributed doesn't quite parse as RDF or even (char encoding issues) XML
  • The toplevel RDF element is not namespace prefixed, and so is in the default namespace (theirs)
  • they use the older http://purl.org/dc/elements/1.0/ URI for dublin core, and write 'Title' instead of 'title'; recent DC applications almost all use http://purl.org/dc/elements/1.1/ and lowercase property names.
  • If the intention is to write what is often represented as 'rdf:ID', then their r:id attribute should be r:ID, with r bound to http://www.w3.org/1999/02/22-rdf-syntax-ns# instead of http://www.w3.org/TR/1999/REC-rdf-syntax-19990222 as it is currently
  • the namespace uri ends in a slash, like DC, RSS, FOAF vocabs. This bothers some people (LinkMe)
  • the parsed opening hours property -- can we try converting this to iCal/RDF markup?
  • If there was a cleaned up version of this, I'd love to encourage folk to use such markup alongside their FOAF self-descriptions. --DanBri
  • Factual errors in the data: Sudaca is in the Redland area of Bristol, not Cotham! what's the process for fixing this?


chefmoz.guides.rdf:4178: error: Input is not proper UTF-8, indicate encoding !
   <d:Title>KVIA Café</d:Title>
                    ^
chefmoz.guides.rdf:4178: error: Bytes: 0xE9 0x3C 0x2F 0x64
   <d:Title>KVIA Café</d:Title>
                    ^
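The stray byte 0xE9 is 'é' in ISO-8859-1, so one plausible workaround is to fall back to ISO-8859-1 whenever UTF-8 decoding fails. This is a guess: nobody here has confirmed which encoding the Chef Moz dump really uses. A rough Python sketch, using a cut-down sample record and a made-up helper name:

```python
import xml.etree.ElementTree as ET

# The older Dublin Core namespace that the Chef Moz files are said to use.
DC_OLD = "http://purl.org/dc/elements/1.0/"


def decode_lenient(raw):
    """Decode bytes as UTF-8, falling back to ISO-8859-1 (an assumption)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


# Cut-down stand-in for the offending record; 0xE9 is not valid UTF-8 here.
sample = (b'<Guide xmlns:d="http://purl.org/dc/elements/1.0/">'
          b'<d:Title>KVIA Caf\xe9</d:Title></Guide>')

root = ET.fromstring(decode_lenient(sample))
title = root.find("{%s}Title" % DC_OLD).text
```

The fallback never fails, since every byte is a valid ISO-8859-1 character, so it can silently produce wrong text if the real encoding is something else.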


A look at Grubstreet

reworked from scattered wiki + irc conversation with Kake. Hmm if only there were some way to hook Wiki and IRC together... ;) --DanBri

How does the Chef Moz approach compare to what folk write in Grubstreet? does Grubstreet have any structured data / expected fields? eg. using extended Wiki markup?

Grubstreet as it stands is very unstructured – it's a plain usemod wiki although efforts are underway to move to a CGI::Wiki based script. Take a look at the development site and try editing a couple of pages (pick pub or restaurant etc ones) to see the edit boxes for categories, landranger co-ords, phone number, postcode etc.

Kake made a start at exporting something like the Chef Moz format; see for example the Calthorpe Arms entry, which now has a 'chefmoz rss' link. This was followed by a version that (per RestaurantsVersusTheirReviews) makes a clearer distinction between the venue and its description. (what next? tarball download?--DanBri)

Here is what the main bit of the RSS + chefmoz markup looked like as of April 6th 2003:

  <item rdf:about="http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Calthorpe_Arms%2C_WC1X_8JR">
  <title>Calthorpe Arms, WC1X 8JR</title>
  <link>http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Calthorpe_Arms%2C_WC1X_8JR</link>
  <description>TODO-description</description>
  <chefmoz:Country>United Kingdom</chefmoz:Country>
  <chefmoz:Restaurant>http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Calthorpe_Arms%2C_WC1X_8JR</chefmoz:Restaurant>
  <chefmoz:Zip></chefmoz:Zip>
  <chefmoz:City>London</chefmoz:City>
  <chefmoz:Neighborhood>Clerkenwell</chefmoz:Neighborhood>
</item>


The RSS aspects might need more work (why format this as RSS rather than just plain RDF/XML? — because I don't really know what I'm doing :) Kake) but this is a good start...

This brings new meaning to the phrase "RSS feed" :)

Grubstreet can pick out things like postcode, city, country (the latter two are set at London/UK defaults in the wiki script), and it can also figure out which of its categories are locales (by seeing which of their pages are in Category Locale). Can do the same for cuisines.

Basically Grubstreet will be able to do pretty much anything we want it to, and we're mostly quite into the whole metadata thing. The RSS recent changes feed plugin for CGI::Wiki is now released: http://search.cpan.org/author/KAKE/CGI-Wiki-Plugin-RSS-ModWiki/

Grubstreet meets ChefMoz

Two datasets that cover eating in London. Here's the Grubstreet listing (from dev't server); and here's the ChefMoz listing for London.

Using RubyRdf, we can load up the Grubstreet data using the 'scutter' harvester:

./ayftest.rb 'http://un.earth.li/~kake/cgi-bin/wiki.cgi?action=index;format=rdf'


The chefmoz RDF can similarly be downloaded and stored.

This gets us two sets of RDF triples with similar coverage, but doesn't necessarily do the data merge thing to the extent that we can fully benefit from the two collections.

Next steps...? Find some actual common coverage and work from there...


<hex> danbri: try http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Pizza_Express%2C_SE1_9QQ 
              and http://chefmoz.org/United_Kingdom/England/London/Pizza_Express948546331.html


...continued in GrubstreetMeetsChefMoz

(shall we move more of this example into that page? probably...)

Suggestions and questions

This drifts off the Grubstreet topic a little, as we start to talk about ways of describing general 3rd party reviews (such as those cited at the bottom of some Grubstreet pages).

  • chefmoz needs multiple locales/neighbourhoods plus more than three cuisine options (how to make xml::rss support this? is the rss spec happy with that anyway re repeating elements? there was a bug in the spec we need to fix relating to this?)
  • if the reviews were organised sequentially, weblog style, it might feel more RSS-ish...? Are there style guidelines for reviews in Grubstreet? The reviews seem to speak mostly in a single voice, how does that relate to the idea of having a 'feed' of reviews, and of doing the 'web of trust' thing around reviewer reputation?
  • is there more structured data that could be exported? eg. some pages have photos (Calthorpe Arms page has a rights statement about the photo too, so CreativeCommons metadata seems relevant); also the 'other reviews' list at foot of page is (at a different granularity) similar to the Chef Moz listing of other review sites.

We can write, using DC, Chef Moz and FOAF vocabulary, that a page is about a restaurant. Maybe we could extract such info from Grubstreet 'other reviews' links?


<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:cm="http://chefmoz.org/rdf/elements/1.0/">
<foaf:Document rdf:about="http://www.whatsonbristol.co.uk/reviews/restaurants/casa_sudaca.html">
  <dc:title>What's On Bristol Guide - Reviews - Restaurants - Casa Sudaca</dc:title>
      <dc:description>Review: Imagine a place where the atmosphere is always warm, where the people smile, the
  music is upbeat, the colours are the rich oranges and pinks of a summer sunset and the food is good
  and nourishing. Does that appeal to you? Well then get yourself down to Casa Sudaca at the bottom
  of Zetland Road where you'll find a perfect little South American gem of a restaurant.</dc:description>
  <foaf:topic>
    <cm:Restaurant>
      <dc:title>Casa Sudaca</dc:title> <!-- is it best to use dc:title for naming real world things? -->
       <foaf:depiction rdf:resource="http://www.whatsonbristol.co.uk/images/review/casa-2.jpg" />
       <foaf:depiction rdf:resource="http://www.whatsonbristol.co.uk/images/review/casa-1.jpg" />
    </cm:Restaurant>
  </foaf:topic>
</foaf:Document>
</rdf:RDF>


What Semantic Web design issues does this illustrate?

  • We can (though nobody forces us) be clear about the distinction between things and their descriptions
  • The Web document has as its topic the Restaurant
  • both have titles or names: 'What's On Bristol Guide - Reviews - Restaurants - Casa Sudaca', and 'Casa Sudaca'
  • we can use foaf:depiction to relate the restaurant to a digital image that depicts it

c.f. PropertiesForNaming, ThingsVersusTheirNames
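To make the thing-versus-description distinction concrete, here is a small Python sketch that writes the same statements as plain (subject, predicate, object) tuples. It is an illustration only, not how any particular RDF toolkit stores data; the blank node label '_:restaurant' and the helper name objects() are made up.

```python
DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"
CM = "http://chefmoz.org/rdf/elements/1.0/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

page = "http://www.whatsonbristol.co.uk/reviews/restaurants/casa_sudaca.html"
restaurant = "_:restaurant"  # hypothetical blank node label: the venue has no URI here

triples = [
    (page, RDF_TYPE, FOAF + "Document"),
    (page, DC + "title",
     "What's On Bristol Guide - Reviews - Restaurants - Casa Sudaca"),
    (page, FOAF + "topic", restaurant),
    (restaurant, RDF_TYPE, CM + "Restaurant"),
    (restaurant, DC + "title", "Casa Sudaca"),
    (restaurant, FOAF + "depiction",
     "http://www.whatsonbristol.co.uk/images/review/casa-2.jpg"),
    (restaurant, FOAF + "depiction",
     "http://www.whatsonbristol.co.uk/images/review/casa-1.jpg"),
]


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]


page_titles = objects(page, DC + "title")
venue_titles = objects(restaurant, DC + "title")
topics = objects(page, FOAF + "topic")
```

The page and the restaurant are separate nodes, each with its own title, and the images hang off the restaurant rather than the page.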

All this doesn't quite get us to restaurant recommendations in the social sense, since much of the RDF (ChefMoz, Grubstreet) is written in the neutral 3rd person voice (is that true re chefmoz? need to check data...). To augment that, let's find a way to say 'DanBri wrote this review', 'DanBri likes the food at Sudaca' etc. When we merge RDF data from sites that have already done the value adding and view-combination (eg. Grubstreet wiki) we don't have in the RDF itself the 'WhoSaidWhat' aspect. Some FOAF tools, eg foafbot, do try to preserve this... Maybe we could revive foafbot and teach it how to read restaurant reviews from the Web?

A look at some running code

I'm using my RubyRdf code here for now; folks with better documented and bundled APIs are welcome to improve this by showing use in other packages. --DanBri

Say we had a number of restaurant descriptions / reviews in RDF files on the Web, how could we use them? What sorts of things do RDF tools do? Which things are easy? Which things are hard?

What would this look like in Perl, Java? Prolog? Or using Jena, Rdflib, Redland, Mozilla, Cwm, or any of the other RDF toolkits...?

Here are some tests using Grubstreet/chefmoz data in RubyRdf. You can't run these out of the box (yet...) but browsing around them should give some idea of the sorts of facility we might expect from RDF tools.

RubyRdf tests:

  • fgrub.rb loads 5 documents into an SQL-backed RDF store, for each one, replacing any triples previously loaded from that source with the new data.
  • qgrub.rb shows an RDF query (subgraph match request) being used to consult the RDF database.
  • the output from running these tests is also available.
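For folks who can't run the Ruby tests, the 'replace whatever we previously loaded from that source' idea in fgrub.rb can be sketched in a few lines of Python with sqlite3. This is not the RubyRdf schema or API; the table layout, function names and example.org URLs are all invented for illustration.

```python
import sqlite3


def open_store():
    """Create an in-memory triple table that remembers each triple's source."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE triples "
        "(source TEXT, subject TEXT, predicate TEXT, object TEXT)"
    )
    return db


def load(db, source, triples):
    """Replace everything previously loaded from `source` with `triples`."""
    with db:  # one transaction: delete old rows, insert new ones
        db.execute("DELETE FROM triples WHERE source = ?", (source,))
        db.executemany(
            "INSERT INTO triples VALUES (?, ?, ?, ?)",
            [(source, s, p, o) for (s, p, o) in triples],
        )


db = open_store()
load(db, "http://example.org/a.rdf", [("s1", "p", "old value")])
load(db, "http://example.org/b.rdf", [("s2", "p", "other")])
load(db, "http://example.org/a.rdf", [("s1", "p", "new value")])  # reload a.rdf

rows = db.execute(
    "SELECT source, object FROM triples ORDER BY source"
).fetchall()
```

Keeping the source alongside each triple is also one cheap way to hold on to the 'WhoSaidWhat' information when merging data from several sites.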

Here's the Squish query we sent. Don't worry so much about the syntax (although see RDFQueryTestCases if interested), except enough to note that it corresponds to a graph with bits labelled missing, and that the results can be represented as a table of variable-to-value bindings.

The query says: find me values for ?uri, ?title, ?desc, ?c, ?n where they stand in these (...) specified relations to each other.


   SELECT ?uri, ?title, ?desc, ?c, ?n
    WHERE
      (rss::title ?uri ?title)
      (rss::description ?uri ?desc)
      (cm::Country ?uri ?c)
      (cm::Neighborhood ?uri ?n)
    USING
      rss for http://purl.org/rss/1.0/
      cm for http://chefmoz.org/rdf/elements/1.0/


Here are the query results, formatted as a table:

title | country | neighbo(u)rhood | uri
Anchor Bankside, SE1 9EF | United Kingdom | Southwark | http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Anchor_Bankside%2C_SE1_9EF
Calthorpe Arms, WC1X 8JR | United Kingdom | Clerkenwell | http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Calthorpe_Arms%2C_WC1X_8JR
Cittie Of Yorke, WC1V 6BN | United Kingdom | Holborn | http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Cittie_Of_Yorke%2C_WC1V_6BN
Counting House, EC3V 3PD | United Kingdom | City of London | http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Counting_House%2C_EC3V_3PD
Crosse Keys, EC3V 0DR | United Kingdom | City of London | http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?Crosse_Keys%2C_EC3V_0DR
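To show what 'subgraph match' and 'table of variable-to-value bindings' mean in practice, here is a toy matcher in Python. It is a sketch, not how RubyRdf or Squish is implemented: patterns are written (predicate, subject, object) to mirror the query syntax, only subjects and objects may be ?variables, and the ?desc pattern is left out because the table doesn't show descriptions. The data is two rows taken from the table.

```python
RSS = "http://purl.org/rss/1.0/"
CM = "http://chefmoz.org/rdf/elements/1.0/"
BASE = "http://the.earth.li/~kake/cgi-bin/cgi-wiki/wiki.cgi?"

calthorpe = BASE + "Calthorpe_Arms%2C_WC1X_8JR"
anchor = BASE + "Anchor_Bankside%2C_SE1_9EF"

# (subject, predicate, object) triples for two of the venues listed above.
triples = [
    (calthorpe, RSS + "title", "Calthorpe Arms, WC1X 8JR"),
    (calthorpe, CM + "Country", "United Kingdom"),
    (calthorpe, CM + "Neighborhood", "Clerkenwell"),
    (anchor, RSS + "title", "Anchor Bankside, SE1 9EF"),
    (anchor, CM + "Country", "United Kingdom"),
    (anchor, CM + "Neighborhood", "Southwark"),
]

# Query patterns in Squish order: (predicate subject object).
query = [
    (RSS + "title", "?uri", "?title"),
    (CM + "Country", "?uri", "?c"),
    (CM + "Neighborhood", "?uri", "?n"),
]


def unify(term, value, binding):
    """Bind a ?variable to value, or check a constant or earlier binding."""
    if term.startswith("?"):
        if term in binding:
            return binding[term] == value
        binding[term] = value
        return True
    return term == value


def match(patterns, data, binding=None):
    """Yield one binding dict per way of satisfying every pattern."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    (pred, subj, obj), rest = patterns[0], patterns[1:]
    for s, p, o in data:
        if p != pred:
            continue
        candidate = dict(binding)  # copy so failed branches leave no trace
        if unify(subj, s, candidate) and unify(obj, o, candidate):
            yield from match(rest, data, candidate)


rows = sorted(match(query, triples), key=lambda b: b["?title"])
```

Each dict in rows is one line of the results table, keyed by variable name.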

OK, so we didn't do much. Loaded some data into a database and queried it. However some things are worth noting:

  • Everything we did except the actual query and the code to ask it was neutral about the contents of the data
  • Compare this with for eg. some RSS libraries (...LinkMe) which try to anticipate usage scenarios, by hard-coding handy vocabularies such as DC into the implementation.
  • Nothing would break (except we'd get different answers to the query) if the data source were to provide differently structured RDF, eg. using GeoInfo or RdfCalendar vocabulary.
  • Compare this to the fragility of the traditional RDBMS world, where schemas (and hence descriptive capabilities) must be agreed in advance or negotiated with sysadmins at some cost.

Design Choices

Advantages of this use case:

  • Everyone eats, and just about everyone eats in restaurants sometimes.
  • Competition between restaurants is rarely bitter. Nearly everyone accepts that restaurants will be reviewed and compared. People also understand it's not just about picking the best restaurant. Restaurants have limited capacity and can't become monopolies (unlike software vendors). There are also matters of taste involved. :-) All this means that restaurant reviews tend to be less controversial than many other kinds of reviews.

General motivation behind this application: Aside from saving the world through better eating, restaurant recommendations are interesting because they are so personal. Online restaurant reviews and lists that exist today are useful, but serve a very different function than getting actual suggestions from actual people whom you actually know (or have specific reason to trust).

1. Why is this different from the other zillion restaurant search sites and from the Firefly type collaborative rating?

emphasis is on the trust and reputation of the actual people doing the recommendation, not aggregation of preferences a la firefly or the many other collaborative rating sites

handle scale by linking individual communities together (ie. one group of users may refer to other restaurant evaluation communities and express individual opinion about resources in other repositories), but not force the system to scale up so much that it loses the individual reputations.

2. What is the value of this as an RDF illustration: it can show, perhaps even graphically, what it means to traverse a complex graph.

3. What's the research value: exploring ways of expressing and operating on complex notions of individual trust and reputation.

Perhaps we can keep the number of users and resources (restaurants) down to a manageable size so that the focus can stay on individual interaction, as opposed to aggregation that ends up obscuring individuals.

One additional wrinkle to think about might be integrating RSS, Blogs & Wikis into such a system. One of the stumbling blocks to a widely deployed recommendation system of any sort is that most people don't have an easy means to publish stuff. That means you have to get people to put things in centralized locations which are inherently less flexible and less Web-like. Now that lots of people blog, could this be a useful publishing platform for distributed, trust-based recommendation systems? The Grubstreet project takes a Wiki-led approach (with GeoInfo additional markup soon to be integrated to the system - see the development site for an example), billed as the 'Open Community Guide to London'. See also discussion of Grubstreet and Wiki collaboration on the food-obsessive Chowhound site.

There is a design tension here: ease of editing, contribution pulls us towards the (often anonymous) world of Wiki; filtering and reputation concerns draw us back towards logins and user identification. Merging food-related Weblogs is in the latter direction. The SWAD-E work on requirements for Semantic Blogging and Bibliographies is nearby...

Data Sources

Some resources we could use to build prototypes. Or if we have DemonstratorFatigue, to build the real thing:

Earlier Discussions

Let's make like we've done our reading...

(re-organisation needed, merge with list at top of the page, separate out more scholarly references from mailing list pointers?)

Misc scribbles:


Opentable.com models users for restaurants -- OpenTable.com, an online
restaurant reservation service, maintains extensive models of
individual profiles and preferences for restaurants. (4/24)
http://agents.umbc.edu/cgi-bin/raw?url=http://www.latimes.com/technology/la-fo-matters23apr23,1,1170486.column?coll=la%2Dheadlines%2Dtechnology

(from UMBC AgentNews, link seems to require subscription. maybe someone could skim/summarise here?)