Demo for dinner ordering dialogue

This page uses Cognitive AI for a natural language demo involving a waiter and a customer at a restaurant. They are simulated by a pair of cognitive agents that follow a shared dialogue plan for ordering dinner. Each agent uses its knowledge of the plan to generate an utterance, and to understand the response from the other agent. This involves both natural language generation and natural language understanding. The latter processes syntax and semantics incrementally and concurrently, word by word, to avoid backtracking. Further details are given in the explanation below.

Log:



Explanation

Natural language will be key to human-machine collaboration with cognitive agents. It will also make it possible to teach cognitive agents the everyday skills that we humans take for granted. Programming cognitive agents directly by hand is difficult and won't scale. This demo features both natural language generation and natural language understanding, along with reasoning about plans, and follows on from a much simpler demo for the Towers of Hanoi. Further information is given on the associated GitHub page.

Whilst there has been plenty of work on statistical natural language processing, there has been comparatively little on cognitive approaches; see, e.g., work on ACT-R and parsing. My work is thus a blend of research and engineering, with a view to building commercially useful cognitive agents on a roadmap to strong AI.

Conventional approaches to natural language processing focus on statistical processing of text with little attention to meaning; see, e.g., Christopher Manning's slides on statistical natural language parsing. Natural language is highly ambiguous, although we are rarely aware of that, as we effortlessly select the most appropriate meaning. If you train a statistical parser on a very large corpus, the statistics make it more likely that the parser picks the most appropriate parse tree, but then what do you do with it? Without the meaning, you cannot reason about what the text conveys.
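
To make the ambiguity point concrete, the classic sentence "I saw the man with the telescope" has two readings, and choosing between them requires reasoning about meaning rather than counting parse trees. The toy Python sketch below is purely illustrative; the readings and the plausibility test are invented for this example and have nothing to do with the demo's own machinery.

    # Toy illustration of syntactic ambiguity: the same word sequence has
    # two structural readings, and picking one requires meaning, not just
    # statistics over trees. Everything here is hypothetical.
    READINGS = [
        "see(I, man) + instrument(see, telescope)",  # I used the telescope to see him
        "see(I, man) + has(man, telescope)",         # the man was carrying the telescope
    ]

    # A crude stand-in for world knowledge: telescopes are instruments for seeing.
    def plausibility(reading):
        return 1.0 if "instrument(see, telescope)" in reading else 0.5

    print(max(READINGS, key=plausibility))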

Meaning has been approached in terms of first-order logic and the predicate calculus, but natural language doesn't lend itself to formal semantics and logical deduction. We instead need to mimic how people reason about the meaning of natural language; in other words, to follow a cognitive approach. This is best explored in a context that is well understood, such as the dialogue used to order dinner at a restaurant. This follows a regular sequence of stages (see the sketch after the list):

  1. Exchange of greetings
  2. Selecting a table to sit at
  3. Reviewing the menu
  4. Placing an order for food and drink
  5. Thanking the waiter when it arrives
  6. Asking for the bill
  7. Paying the bill
  8. Farewells
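
As a rough sketch of how such a plan might be represented, here is a minimal Python version of the stages above. The stage names, steps and the next_step helper are hypothetical stand-ins; the demo itself encodes the plan as declarative and procedural knowledge for the agents, not as Python data.

    # Minimal, hypothetical sketch of the shared dinner-ordering plan as an
    # ordered list of stages, each with a few illustrative steps.
    DINNER_PLAN = [
        ("greetings",    ["waiter greets customer", "customer returns the greeting"]),
        ("select-table", ["waiter offers a table", "customer accepts or asks for another"]),
        ("review-menu",  ["waiter hands over the menu", "customer reads the menu"]),
        ("place-order",  ["waiter asks for the order", "customer orders food and drink"]),
        ("serve-order",  ["waiter brings the order", "customer thanks the waiter"]),
        ("request-bill", ["customer asks for the bill", "waiter brings the bill"]),
        ("pay-bill",     ["customer pays", "waiter acknowledges payment"]),
        ("farewells",    ["both exchange farewells"]),
    ]

    def next_step(stage_index, step_index):
        """Advance through the plan one step at a time, moving on to the
        next stage when the current stage's steps are exhausted."""
        _, steps = DINNER_PLAN[stage_index]
        if step_index + 1 < len(steps):
            return stage_index, step_index + 1
        if stage_index + 1 < len(DINNER_PLAN):
            return stage_index + 1, 0
        return None  # the plan is complete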

Each stage can be broken down into smaller steps involving declarative and procedural knowledge that determine the next utterance the cognitive agent will make. The semantic representation of the utterance can be used in a statistical process to generate a parse tree, and in turn the sequence of words to use. Natural language understanding works in reverse: it first identifies the most likely part of speech and word sense for the next word, then builds the syntactic structure as a tree of chunks while concurrently determining the meaning, using the dialogue context and semantics to settle on the most consistent interpretation without the need for backtracking.
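
The following Python sketch hints at how a word-by-word loop can commit to one part of speech and word sense per word by filtering candidates against the dialogue context as each word arrives, so that no backtracking is needed. The lexicon, the context test and the flat chunk list are all invented for illustration and are not the demo's actual algorithm or data.

    # Illustrative word-by-word understanding loop. Candidate senses for
    # each word are filtered against the dialogue context as soon as the
    # word is seen, so earlier choices never need to be revised.
    # The lexicon and context test are hypothetical.
    LEXICON = {
        "table": [("noun", "furniture"), ("noun", "grid-of-data")],
        "order": [("verb", "request-food"), ("noun", "food-request"), ("noun", "sequence")],
        "bill":  [("noun", "invoice"), ("noun", "bird-beak")],
    }

    def understand(words, context):
        chunks = []                              # a flat stand-in for a tree of chunks
        for word in words:
            candidates = LEXICON.get(word, [("unknown", word)])
            # keep only the senses that the current dialogue stage supports
            viable = [c for c in candidates if c[1] in context["expected_senses"]]
            pos, sense = (viable or candidates)[0]   # commit at once - no backtracking
            chunks.append({"word": word, "pos": pos, "sense": sense})
        return chunks

    context = {"expected_senses": {"request-food", "invoice", "furniture"}}
    print(understand(["order", "bill"], context))
    # [{'word': 'order', 'pos': 'verb', 'sense': 'request-food'},
    #  {'word': 'bill', 'pos': 'noun', 'sense': 'invoice'}]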

The demo mimics listening to speech, as humans have used spoken language for over a million years and written language for just a few thousand. Common abbreviations are expanded (e.g. "I'll" is transformed to "I will"), punctuation is stripped out, and the text is coerced to lower case and then split into a sequence of words. Each word is associated with one or more part-of-speech categories (e.g. noun, adjective, adverb, ...), and for each of these, one or more word senses. The use of semantic disambiguation makes it practical to use the part-of-speech categories commonly found in dictionaries rather than the much larger set used by most statistical natural language parsers, e.g. the Stanford Parser, which uses the tag set from the Penn Treebank.
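
A minimal sketch of that preprocessing pipeline might look like the following Python; the contraction table and the tiny lexicon are hypothetical stand-ins for the demo's own resources.

    import re

    # Hypothetical contraction table and lexicon. The pipeline shape follows
    # the description above: lower-case the text, expand abbreviations,
    # strip punctuation, split into words, then look up parts of speech
    # and word senses.
    CONTRACTIONS = {"i'll": "i will", "i'd": "i would", "what's": "what is"}
    LEXICON = {
        "menu": {"noun": ["list-of-dishes"]},
        "will": {"verb": ["future-auxiliary"], "noun": ["testament", "volition"]},
    }

    def preprocess(utterance):
        text = utterance.lower()
        for short, full in CONTRACTIONS.items():
            text = text.replace(short, full)     # expand common abbreviations
        text = re.sub(r"[^\w\s]", "", text)      # strip punctuation
        return text.split()                      # sequence of words

    def lookup(word):
        # each word maps to one or more parts of speech, each with one or
        # more word senses; unknown words get an empty entry
        return LEXICON.get(word, {})

    words = preprocess("I'll have the menu, please.")
    print(words)                  # ['i', 'will', 'have', 'the', 'menu', 'please']
    print({w: lookup(w) for w in words})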

This demo is now under development, but I still have a lot of work to do before it is ready. I am currently focusing on reasoning about plans, and will then turn to natural language generation, followed by natural language understanding. I have already worked on how to use chunks to represent the syntactic structure of typical utterances.

I am looking for ideas for a future demo that focuses on reasoning about time as a means to explore the use of different tenses, e.g. the past continuous tense for something that was happening before and after a specific time in the past; see the British Council pages on English Grammar. This will need a scenario with well understood semantics and typical language usage. Any suggestions would be warmly received!

Dave Raggett <dsr@w3.org>


This work is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 780732 for project Boost 4.0, which focuses on smart factories. Clipart for the customer and waiter is courtesy of publicdomainq.net.