Introduction to Chunks and Rules

Companies are looking to digital transformation to become more efficient, more flexible and more nimble with respect to changing business needs. Paper forms, spreadsheets and traditional tabular databases are the legacy that is weighing businesses down. Knowledge about the meaning of the data is locked inside people’s heads, partially described in specification documents that are often out of date, or buried inside application code that no one really knows how to update.

The future will emphasise digital integration vertically, horizontally, and temporally throughout the product life cycle, featuring decentralised information systems and machine interpretable metadata. Graphs are key to achieving this together with rules and highly scalable graph algorithms capable of handling massive datasets. Cognitive agents that blend symbolic and sub-symbolic (statistical) techniques represent the next stage in the evolution of AI, drawing upon decades of progress in Cognitive Psychology and Cognitive Neuroscience. This is important for addressing everyday situations involving uncertainty, incompleteness, inconsistency and the likely presence of errors, where traditional approaches based purely upon logical deduction struggle to be effective.

This page explores ideas for a simple way to express graphs and rules that operate on them, in conjunction with highly scalable graph algorithms suitable for handling big data. The approach is designed with the aim of facilitating machine learning for vocabularies and rules, given that manual development will become impractical and excessively expensive as the number and size of vocabularies scale up, and information systems require agility to track ever changing business needs. With that in mind, both declarative and procedural knowledge are represented in the same way to facilitate manipulation of rules as data. A series of demos is under development as a proof of concept.

cognitive agent architecture

Chunk is a term from Cognitive Psychology, and is defined by Wikipedia as follows:

A chunk is a collection of basic familiar units that have been grouped together and stored in a person's memory. These chunks are able to be retrieved more easily due to their coherent familiarity.

See also Chunking mechanisms in human learning, by Gobet et al., who say:

Researchers in cognitive science have established chunking as one of the key mechanisms of human cognition, and have shown how chunks link the external environment and internal cognitive processes.

In the brain, chunks are signalled as concurrent stochastic spiking patterns across bundles of nerve fibres. You can think of this in terms of vectors in spaces with a large number of dimensions. The set of name/value pairs in a chunk is represented by the projection of the vector onto orthogonal axes. See Eliasmith's work on concepts as semantic pointers.

For this work, a chunk is modelled as a concept with a set of properties. Each chunk has a type and an identifier. Chunk property values are either booleans, numbers, names, string literals enclosed in double quote marks, or a list thereof. Here are some examples:

friend f34 {
  name Joan
}
friend {
  name Jenny
  likes f34
}

Here, friend is a chunk type, and f34 is an optional chunk identifier. If the chunk identifier is not provided, it will be assigned automatically. The second chunk above links to the first via its chunk identifier. Links are a subclass of chunk, for which the chunk type is the link relationship (aka predicate), and the link is described by the chunk properties subject and object. Links can be expressed in a compact format, e.g.

dog kindof mammal
cat kindof mammal

This is equivalent to:

kindof {
  subject dog
  object mammal
}
kindof {
  subject cat
  object mammal
}

The chunk syntax avoids the need for punctuation with the exception of comma separated lists for property values. However, line breaks are required to minimise the likelihood of input errors.
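
The syntax described above is simple enough to parse line by line. The following is a minimal sketch in JavaScript rather than the parser used by the chunks library. It assumes one property per line, a closing brace on its own line, and string literals that contain no commas. The function names and the "_:n" scheme for automatically assigned identifiers are hypothetical.

```javascript
// Minimal sketch of a parser for the chunk syntax shown above.
// Assumptions: one property per line, "}" on its own line, and links
// written as "subject predicate object" on a single line.
let nextId = 1; // counter for automatically assigned identifiers (hypothetical scheme)

function parseValue(token) {
  if (token === "true") return true;
  if (token === "false") return false;
  if (/^".*"$/.test(token)) return token.slice(1, -1); // string literal
  if (token !== "" && !isNaN(Number(token))) return Number(token);
  return token; // a name
}

function parseChunks(source) {
  const chunks = [];
  let current = null;
  for (const raw of source.split("\n")) {
    const line = raw.trim();
    if (line === "") continue;
    if (current === null) {
      const open = line.match(/^(\S+)(?:\s+(\S+))?\s*\{$/);
      if (open) {
        // "type id {" or "type {" starts a chunk
        current = { type: open[1], id: open[2] || "_:" + nextId++, properties: {} };
        continue;
      }
      const link = line.split(/\s+/);
      if (link.length === 3) {
        // compact link: subject predicate object
        chunks.push({
          type: link[1],
          id: "_:" + nextId++,
          properties: { subject: link[0], object: link[2] },
        });
        continue;
      }
      throw new Error("Unexpected line: " + line);
    }
    if (line === "}") {
      chunks.push(current);
      current = null;
      continue;
    }
    // property line: a name followed by one value or a comma separated list
    const space = line.search(/\s/);
    if (space < 0) throw new Error("Property without a value: " + line);
    const name = line.slice(0, space);
    const values = line.slice(space).split(",").map((v) => parseValue(v.trim()));
    current.properties[name] = values.length === 1 ? values[0] : values;
  }
  if (current !== null) throw new Error("Missing closing brace");
  return chunks;
}
```

Calling parseChunks on the friend and kindof examples above yields plain objects with type, id and properties fields, with each compact link expanded into a chunk that has subject and object properties.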

Relation to Property Graphs

There is an obvious similarity to labeled property graphs. These consist of nodes connected by labeled directed edges, which are referred to as relationships. Both nodes and relationships may have properties represented by key/value pairs. In the current work, both nodes and relationships are modelled as chunks, with chunk properties used for key/value pairs.

One difference from the Property Graph databases is that they generally provide query/update languages, but not rule languages. Applications are written using conventional programming languages that interact with Property Graphs via the corresponding query/update APIs. By contrast, this work on chunks seeks to support machine learning of vocabularies and rulesets, changing the role of developers into that of teachers who instruct and assess the capabilities of systems, and monitor their performance. This will be increasingly important as the number and size of vocabularies scale up and up, along with the challenge of mapping data between different vocabularies, so that manual development becomes increasingly impractical. The effectiveness of vocabularies and rulesets can be assessed through application to a curated set of test cases, with the ability for developers to add new cases as needed.

Goal directed production rules

The architecture is inspired by ACT-R, a popular cognitive architecture that has been successfully used to describe a broad range of human behaviour in Cognitive Science experiments, e.g. mental arithmetic and driving a car. There are several modules, each of which is associated with a graph. The facts module contains declarative facts, whilst the goal module contains goals.

Following ACT-R, we distinguish declarative knowledge expressed as chunks from procedural knowledge expressed as rules. Rules consist of conditions and actions. Conditions match the current contents of module buffers. Actions can directly update the buffers, or can do so indirectly, by exchanging messages with the modules to invoke graph algorithms, such as graph queries and updates.

Rules are expressed as a set of chunks, e.g.

rule r1 {
  @condition g1
  @action a1, a2, a3
}
count g1 {
  @module goal
  start ?num
  state start
}
count a1 {
  @module goal
  state counting
}
increment a2 {
  @module facts
  @action recall
  first ?num
}
increment a3 {
  @module output
  @action update
  value ?num
}

The rule language attaches special meaning to terms beginning with "@", for instance, @condition is used to name the chunk identifiers for the rule's conditions, and likewise, @action is used to name the chunk identifiers for the rule's actions. Rule variables begin with "?" and provide a means to bind data between conditions and actions.

Each condition and action identifies which module it relates to. Conditions act on buffers rather than on the modules. Each module has a single buffer that may contain a single chunk. In the above example, g1 is a condition chunk that matches the chunk in the goal buffer, which must have the same chunk type (in this case count). The start property is used to bind the ?num variable, while the state property is expected to have the value start.

The a1 chunk is an action and updates the chunk in the goal buffer to have the value counting for the state property. The other properties for the buffered chunk remain unchanged. The a2 chunk sends a request to the facts module to recall a chunk with type increment and a matching value for the property first. The @action directive instructs the rule engine what action to perform. The default is to update the buffered chunk for that module.
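
To make the matching and update steps concrete, here is a minimal sketch in JavaScript. It is not the rule engine from the chunks library: chunks are plain objects, the function names matchCondition and applyUpdate are hypothetical, and the contents of the goal buffer are made up for illustration.

```javascript
// Sketch only: conditions and buffered chunks are plain objects, and the
// function names below are hypothetical rather than part of the chunks library.

// Try to match a condition chunk against a buffered chunk, returning the
// variable bindings on success or null on failure.
function matchCondition(condition, buffered, bindings = {}) {
  if (buffered === null || condition.type !== buffered.type) return null;
  const result = { ...bindings };
  for (const [name, expected] of Object.entries(condition.properties)) {
    if (name.startsWith("@")) continue; // directives such as @module are not matched
    if (!(name in buffered.properties)) return null;
    const actual = buffered.properties[name];
    if (typeof expected === "string" && expected.startsWith("?")) {
      const variable = expected.slice(1);
      if (variable in result && result[variable] !== actual) return null;
      result[variable] = actual; // bind the variable
    } else if (expected !== actual) {
      return null; // literal value differs
    }
  }
  return result;
}

// Apply the default "update" action: overwrite the listed properties in the
// buffered chunk, substituting variables, and leave other properties unchanged.
function applyUpdate(action, buffered, bindings) {
  for (const [name, value] of Object.entries(action.properties)) {
    if (name.startsWith("@")) continue;
    const isVariable = typeof value === "string" && value.startsWith("?");
    buffered.properties[name] = isVariable ? bindings[value.slice(1)] : value;
  }
  return buffered;
}

// The g1 condition and a1 action from the rule above, as plain objects
const g1 = { type: "count", properties: { "@module": "goal", start: "?num", state: "start" } };
const a1 = { type: "count", properties: { "@module": "goal", state: "counting" } };

// Hypothetical contents of the goal buffer
const goalBuffer = { type: "count", properties: { start: 2, end: 5, state: "start" } };

const bindings = matchCondition(g1, goalBuffer);
if (bindings !== null) applyUpdate(a1, goalBuffer, bindings);
```

After this runs, ?num is bound to the buffered chunk's start value, the state property has become counting, and the end property is untouched. The rule no longer matches, since the state is no longer start.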

Badly designed rulesets have the potential for infinite loops. This is addressed by abandoning tasks that take considerably longer than expected, see task management and attention.

The rule language is a little cumbersome for manual editing. That could be addressed with a higher level rule language that compiles into chunks, but it wouldn't be appropriate for machine generated rules. Vocabularies and rule sets that have been generated through machine learning are likely to be harder to understand for humans, since the terms they use will have been machine generated, e.g. an identifier like "_:386314". This isn't expected to be a problem in practice, as people will be able to use natural language (or a controlled subset) to interact with cognitive agents, where a lexicon maps the human terms to the terms used internally.

Additional features

The rule language is expected to evolve further in the course of work on new demonstrators. This section describes some additional features and the expected direction for future extensions.

The @distinct property can be used in conditions to test that its values are not all the same. A related idea, also borrowed from ACT-R, is to be able to declare that the value in a variable must be different from the given property value, e.g.

goal g3 {
  @module goal
  state counting
  start ?num
  end !num
}

This requires that the start and end properties in the goal chunk have different values. It is a redundant feature given @distinct. If it is kept, it might make sense to change the syntax to ~?num, or perhaps !?num.

Additional features will be added as needed for new use cases. One example is the potential requirement to test that a variable is unbound after matching a rule condition to a buffer, e.g. using @unbound with one or more variables to test that they are all unbound.

Sometimes it may be necessary to test whether a variable holds a boolean, number, name, string literal or a list thereof. This suggests the need for properties like @boolean, @number, @integer and so forth. Further consideration is needed for string literals. In principle, human language descriptions could be expressed using a chunk with properties for the string literal, its language tag and its base direction. Complex string operations would seem to be beyond the scope of a simple rule language, and something that could be better handled via invoking operations implemented by a module.

By default, actions specify chunks with the same type as the action chunk. Sometimes, however, you will want to query for an instance given its super type. For example, given these facts:

penguin kindof bird
eagle kindof bird
...

and a rule action like:

* a3 {
  @module facts
  @isa bird
}

Then the action could, in principle, load the following chunk to the facts buffer:

penguin p6 {
  name pingou
}

This follows since pingou is a penguin, and a penguin is a kind of bird. The * acts as a wild card that matches any type. The @kindof property can be similarly used in actions to query subclasses of a given class in a taxonomy. Actions can request a given chunk using its chunk identifier with @id. This can also be used in conditions to bind a variable to the chunk identifier, and likewise you can use @type to bind a variable to the chunk's type. The following condition matches any chunk in the facts buffer and binds ?id to the chunk's identifier and ?type to the chunk's type.

* a3 {
  @module facts
  @id ?id
  @type ?type
}
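
The @isa query described earlier in this section amounts to following kindof links from a chunk's type until the requested super type is reached. A minimal sketch in JavaScript follows. The isKindOf and recallIsa helpers are hypothetical, facts are held in plain arrays, and the bird kindof animal link and the dog chunk are added purely for illustration. Unlike the stochastic recall described under the scripting API, this sketch simply returns the first match.

```javascript
// Sketch only: facts are held in plain arrays, and isKindOf/recallIsa are
// hypothetical helpers, not functions from the chunks library.
const kindof = [
  { subject: "penguin", object: "bird" },
  { subject: "eagle", object: "bird" },
  { subject: "bird", object: "animal" }, // extra link added for illustration
];

const chunks = [
  { type: "dog", id: "d1", properties: { name: "rex" } }, // hypothetical chunk
  { type: "penguin", id: "p6", properties: { name: "pingou" } },
];

// True if type equals kind, or reaches it by following kindof links.
function isKindOf(type, kind, seen = new Set()) {
  if (type === kind) return true;
  if (seen.has(type)) return false; // guard against cycles in the taxonomy
  seen.add(type);
  return kindof
    .filter((link) => link.subject === type)
    .some((link) => isKindOf(link.object, kind, seen));
}

// Return the first chunk whose type is a kind of the requested super type.
function recallIsa(kind) {
  return chunks.find((chunk) => isKindOf(chunk.type, kind)) || null;
}
```

With these facts, recallIsa("bird") returns the p6 chunk, and so does recallIsa("animal"), since the lookup applies recursively along chains of kindof links.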

The current approach allows you to either state the expected value for a property, or to use a variable to match any value. A further possibility is when you want to constrain the match to a value from a given set. This could be addressed by referencing a range definition in a subsidiary chunk, e.g. a rule could include the following:

t-shirt c1 {
  @module facts
  @range c2
}
range c2 {
  property colour
  values red, green, blue, white, black
}

which would match a chunk in the facts buffer with the type t-shirt and a colour property that is one of red, green, blue, white, or black. This could be extended to support numeric ranges, and for defining range values as separate chunks with relative ordering. This further points to the potential for supporting fuzzy reasoning. For instance, a temperature could be classified as cold, warm or hot. As the temperature is raised, it starts by being cold, but rather than suddenly being classified as warm, there is a smooth transition, with decreasing probability that the temperature is cold and increasing probability of being warm. The probability then flattens out until the smooth transition from warm to hot. For more details see the Wikipedia article on fuzzy logic.

diagram depicting fuzzy logic temperature
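
The smooth transitions between cold, warm and hot can be sketched with piecewise linear membership functions. The following JavaScript is an illustration only. The breakpoints of 10, 20, 25 and 35 degrees Celsius are assumptions chosen for the example, and the function names are hypothetical. The values returned are degrees of membership, which the text above describes informally as probabilities.

```javascript
// Linear ramp from 0 at `low` to 1 at `high`, clamped to the range [0, 1].
function ramp(x, low, high) {
  return Math.min(1, Math.max(0, (x - low) / (high - low)));
}

// Membership degrees for a temperature in degrees Celsius.
// The breakpoints (10, 20, 25, 35) are illustrative assumptions.
function classify(temperature) {
  const warming = ramp(temperature, 10, 20); // cold to warm transition
  const heating = ramp(temperature, 25, 35); // warm to hot transition
  return {
    cold: 1 - warming,
    warm: warming - heating,
    hot: heating,
  };
}
```

The three degrees always sum to one. Between 20 and 25 degrees the temperature is fully warm, which corresponds to the flat section between the two transitions.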

There is a need for flexible handling of properties that have a list of values. This requires further consideration. Some possible requirements include: testing if the list contains a given item, a means to iterate through the list, a means to add and remove list items, set operations on lists, e.g. union and intersection, counting the number of items in a list, a means to sort lists, and to remove any duplicates. Once again, many of these operations could be handled via graph algorithms associated with a module, rather than being built into the rule language.

Support for numbers

People and animals have an innate ability to handle numbers, e.g. to know if something is in reach of your hand, or whether a gap is small enough to jump over. This suggests the need for simple numerical operations, e.g. comparisons, such as @lteq which would be used with two variables to test that the value of the first is less than or equal to the value of the second. Limited support for comparison and adjustment of numerical values is needed for modelling emotional states. Operations on numerical values are also needed for spatial and temporal reasoning, and this points to the potential for specialised processing with a graph algorithm invoked by rule actions. In other words, the rule language should remain simple, with more complex operations handled by the modules.

Scripting API

This section describes the scripting API exposed by the JavaScript library for chunks and rules, as used in the online demos. The starting point is to create a graph from a text string containing the source of the chunks that make up the graph. The following code creates two graphs: one for facts from "facts.chk", and another for rules from "rules.chk".

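let facts, rules;  // graphs, set once the corresponding files have loaded

// Load the facts first, then the rules, creating a graph from each source
fetch("facts.chk")
.then((response) => response.text())
.then(function (source) {
	facts = new ChunkGraph(source);
	return fetch("rules.chk");
})
.then((response) => response.text())
.then(function (source) {
	rules = new ChunkGraph(source);
});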

Here are some operations you can perform on a graph:

new ChunkGraph(source)
Create a new graph from a text string containing the chunks and links.
graph.chunks[id]
Find a chunk given its id.
graph.types[type]
Find the list of chunks with a given type.
graph.forall(kind, handler, context)
Apply a function to all chunks whose type has the kindof relationship to the given kind. This applies recursively to chains of kindof relationships. The handler is a function that is passed the chunk and the context.
graph.recall(type, values)
Recall a chunk with a given type, and matching values as denoted by a JavaScript object with a set of named properties. Note that this is stochastic and returns the 'best' chunk when there are multiple matches.
graph.remember(type, values, id)
Remember (i.e. create and store) a chunk with a given type, and matching values as denoted by a JavaScript object with a set of named properties. The chunk id will be assigned automatically if not supplied.
graph.parse(source)
Parse source as chunks and add to this graph.
graph.add(chunk)
Adds a chunk or link to the graph, see below for ways to create chunks and links. If the chunk is currently part of another graph, it will be removed from that graph before being added to this one.
graph.remove(chunk)
Remove a chunk or link from the graph.

Here are some operations you can perform on a chunk:

new Chunk(type, id)
Create a new chunk for a given type and id. The id is optional and will be assigned automatically when the chunk is added to a graph if not supplied.
new Link(subject, predicate, object)
Create a new Link as a subclass of chunk where the chunk type is given by the predicate. The chunk id will be assigned automatically when the Link is added to a graph.
chunk.id
Access the chunk's id.
chunk.type
Access the chunk's type.
chunk.properties[name]
Access a chunk property value given the property's name.
chunk.setValue(name, value)
Overwrite the value of a named property.
chunk.addValue(name, value)
Add a value for a named property. An array is used only if the property has multiple values.
chunk.removeValue(name)
Remove a value from the named property - this is the inverse of addValue.
chunk.hasValue(name, value)
Returns true if the named property contains the given value.
chunk.toString()
Returns a pretty printed version of the chunk.

The following describes the API for rules:

new RuleEngine()
Create a new rule engine.
engine.start(initial_goal, rules, facts)
Applies the engine to an initial goal, provided as a chunk. Note: rules is a graph containing the rules that define procedural knowledge, and facts is a graph containing a set of chunks that define declarative knowledge.
engine.next()
Find and execute the next matching rule.
engine.getGoals()
Return the goal graph created by calling engine.start.

Short term vs working memory

Working memory is used here for the module buffers which are restricted to a single chunk. Short term memory is more flexible and provides a means to hold multiple chunks of short term interest. A possible approach would be to provide a short term memory module analogous to the brain's hippocampus, and to provide a means for queries on long term memory modules (analogous to the cortex) to place results into the short term memory module.

When trying to remember all instances of some class, it is easy to remember the most common instances, but the others will be much harder. If the instances form a sequence, then given one instance, it is relatively easy to remember the following one, for example, successive letters in the alphabet. When recalling all kinds of birds, the results could be mapped into a sequence of chunks, that rules could iterate over by following the reference from one chunk in the sequence to the next. In a sufficiently large database, search will be limited to what is most useful based on prior knowledge and past experience. This can be implemented in terms of ACT-R's stochastic recall, based upon a combination of dynamic activation levels and persistent strengths.
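
One way to picture stochastic recall is through ACT-R's base-level activation, where a chunk's activation is the logarithm of the sum, over its past uses, of the age of each use raised to the power of minus the decay rate. The JavaScript sketch below follows that idea, but is not the implementation used by the chunks library. The function names, the example data and the uniform noise are assumptions. ACT-R itself uses a different noise distribution, and the sketch covers only activation, not persistent strengths.

```javascript
// Base-level activation in the style of ACT-R: ln(sum over past uses of age^-d).
// `uses` holds the times at which the chunk was accessed, all earlier than `now`,
// and d is the decay rate.
function activation(uses, now, d = 0.5) {
  const sum = uses.reduce((total, t) => total + Math.pow(now - t, -d), 0);
  return Math.log(sum);
}

// Pick the candidate with the highest noisy activation. The noise source is
// passed in so that callers can supply a fixed function for repeatable tests.
function recall(candidates, now, noise = () => (Math.random() - 0.5) * 0.2) {
  let best = null;
  let bestScore = -Infinity;
  for (const candidate of candidates) {
    const score = activation(candidate.uses, now) + noise();
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}

// Hypothetical access histories for two kinds of bird
const birds = [
  { id: "robin", uses: [1, 5, 9] }, // accessed often and recently
  { id: "hoopoe", uses: [2] },      // accessed once, long ago
];
```

With the noise switched off, recall(birds, 10, () => 0) returns the robin chunk, reflecting that common and recently used instances are the easiest to remember. With noise enabled, less active chunks are occasionally recalled instead.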

It will also be interesting to consider other kinds of queries, e.g. automata based upon graph traversal or simple patterns inspired by SPARQL. Those demos use RDF, but could easily be adapted to use chunks, for instance, @shape could be used to reference a chunk that is the starting node in an automaton defining a shape constraint, analogous to SHACL and ShEx. The results of such queries would be a set of chunks that could be placed in the short term module. This fits well with an architecture that provides a local module for short term memory together with access to remote long term memory modules. The ability to retrieve multiple chunks in a single remote query provides for better performance compared to having to retrieve chunks one by one.

Reasoning from multiple contexts

Search may often need to be conducted relative to a given context rather than across the database as a whole. The ability to define and search from within such contexts is important when it comes to counterfactual reasoning, causal reasoning, and reasoning from within multiple perspectives.

Further consideration is needed on how to express and reference such contexts. One idea is to provide a means to group chunks into a set, for instance, by adding a context property to the chunk object model. This would allow for the context itself to be defined as a chunk. The context chunk can link to a parent context to define a chain of contexts. Rule conditions and actions would refer to the context chunk via @context. This would allow rules to match a specific context, and likewise to update chunks in a specific context.

The root context is everyday declarative knowledge, e.g. elephant is a kind of mammal. A context might be created for what-if reasoning, for describing the beliefs attributed to some person or agent, for lessons in which some things are deemed to hold true in the context of a lesson, and for a story about some fictional world, e.g. magic exists in the world of Harry Potter novels, but pretty much everything else is the same as in our world.

Nested contexts may be used, e.g. for describing the personal beliefs of the people in a given story. Knowledge described in a given context will often override or supplement knowledge in a parent context. Another use case is where you are considering different possible events leading up to a particular outcome, e.g. when trying to explain a fault in some machinery. This may necessitate a tree of chained contexts.

One question is what syntax to use in the chunk serialisation format to indicate that one or more chunks belong to a given context. One idea would be to declare the context as a regular property of other chunks, analogous to kindof, see above. Another idea is to treat the context as a meta property, analogous to the chunk type and identifier. You would then be able to use @context followed by the context's chunk identifier then curly brackets enclosing chunks belonging to that context. This is the same syntax as for a single chunk, except that the brackets would enclose a set of chunks rather than a set of properties.
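
Whichever syntax is chosen, lookup within a chain of contexts can be sketched as a walk from the current context up through its parents, so that knowledge in a nested context overrides or supplements its ancestors. The JavaScript below is an illustration under assumed names. The context identifiers, the flat fact records and the lookup function are hypothetical and do not reflect a settled design.

```javascript
// Sketch only: each context names its parent, and each fact records the
// context it belongs to. All names below are hypothetical.
const contexts = {
  root: { parent: null },
  story: { parent: "root" },      // e.g. a fictional world
  character: { parent: "story" }, // beliefs of a person in the story
};

const facts = [
  { context: "root", subject: "elephant", property: "kindof", value: "mammal" },
  { context: "root", subject: "magic", property: "exists", value: false },
  { context: "story", subject: "magic", property: "exists", value: true },
];

// Look up a property starting in the given context and walking up the chain
// of parents, so that the nearest context overrides its ancestors.
function lookup(context, subject, property) {
  for (let c = context; c !== null; c = contexts[c].parent) {
    const fact = facts.find(
      (f) => f.context === c && f.subject === subject && f.property === property
    );
    if (fact) return fact.value;
  }
  return undefined;
}
```

Here lookup("character", "magic", "exists") finds the story's fact before reaching the root, while lookup("character", "elephant", "kindof") falls through to everyday knowledge in the root context.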

The Semantic Web has focused on formal logic. This work, by contrast, focuses on graph traversal and manipulation, adopting the philosophy of relativism in which views are relative to differences in perception and consideration. There is no universal, objective truth according to relativism; rather each point of view has its own truth. Protagoras is reported to have said to Socrates:

What is true for you is true for you, and what is true for me is true for me.

This doesn't mean that all perspectives should be considered equal, but rather that each should be seen in the context of other knowledge. Moreover, what people say isn't necessarily what they consider to be true, but what they want others to believe.

Causal Reasoning and asking why?

According to Barbara Spellman and David Mandel:

Causal reasoning is an important universal human capacity that is useful in explanation, learning, prediction, and control. Causal judgments may rely on the integration of covariation information, pre-existing knowledge about plausible causal mechanisms, and counterfactual reasoning.

Causal reasoning allows you to make predictions and decisions based upon an understanding of cause and effect. A starting point is to look at correlations across a sequence of observations. For instance, looking at the correlation between smoking and lung cancer, using counts for smokers with and without lung cancer, and counts for non-smokers with and without lung cancer.

table of counts
With thanks to Barbara Spellman and David Mandel

This computes the proportion of times the effect occurs when the suspected cause is present, minus the proportion of times it occurs when the suspected cause is absent. Statistical significance is defined as the likelihood that a relationship between two or more variables is caused by something other than chance. The larger the number of samples in the experiment, the smaller the observed difference in proportions needs to be in order to be considered statistically significant. However, statistical significance doesn't by itself prove a causal relationship. For instance, ice cream sales may have a statistically significant correlation with crime rates, but further study reveals a common cause - a heat wave.
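
The difference in proportions described above can be computed directly from the four counts. The following JavaScript is a minimal sketch. The function and field names are hypothetical, and the counts are made up for illustration rather than taken from any study.

```javascript
// Difference between the proportion of cases showing the effect when the
// suspected cause is present and the proportion when it is absent.
function deltaP(counts) {
  const withCause =
    counts.causeEffect / (counts.causeEffect + counts.causeNoEffect);
  const withoutCause =
    counts.noCauseEffect / (counts.noCauseEffect + counts.noCauseNoEffect);
  return withCause - withoutCause;
}

// Made-up counts for illustration only; these are not real data.
const example = {
  causeEffect: 75,     // cause present, effect present
  causeNoEffect: 25,   // cause present, effect absent
  noCauseEffect: 25,   // cause absent, effect present
  noCauseNoEffect: 75, // cause absent, effect absent
};
```

For these counts the effect occurs in 75% of cases with the suspected cause and 25% of cases without it, so deltaP(example) is 0.5. A value near zero would suggest no relationship, and a negative value that the suspected cause makes the effect less likely.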

To understand and gain control, we seek plausible explanations as to why things happen. Sometimes there could be more than one possible explanation, necessitating reasoning about which is the most likely for any given event. We could perhaps re-examine the statistics having widened our search to include data for the weather as well as for ice cream sales and crime rates. A plausible explanation involves a mechanism, e.g. high temperatures increase the likelihood of people getting angry and committing a crime. Simpler explanations are generally preferred to more complex ones.

A cognitive agent could look for statistically significant correlations when an event is deemed similar to previous ones, and then look for plausible explanations. However, what happens if you don't have a large number of events to analyse? Humans from an early age pay more attention to events which don't follow the pattern seen in previous events. One possible approach is to seek explanations by considering a range of potential causes. This can be modelled as counterfactual reasoning where something is assumed to have taken place for the purpose of analysis, but is not considered to be true in general. The previous section describes one way to represent this in terms of chunk contexts.

Knowledge of causal relationships can also be exploited when it comes to planning how to achieve a particular outcome. Actions can be modelled as having pre-conditions before they can be applied, and post-conditions that hold after they have been applied. As is the case for counterfactual reasoning, plans are a kind of what-if reasoning rather than reflecting the state of the world. As such they could be constructed in a chunk context created for the purpose. Plans often make use of previous experience as a guide to how to break problems down into manageable pieces. This has implications for episodic memory.

In conclusion, cognitive agents need an innate curiosity that directs attention to finding explanations for events, starting with a means to relate a current event to previous ones. The reasoning processes will depend upon the means to construct contexts for chunks which are assumed to be true within the context of the reasoning process, rather than being general facts about the world. Episodic memory needs to support recall of past events based upon similarities with the current event. It would be interesting to look at graph algorithms that can be used to offload the processing needed for computing statistically significant correlations. Such an approach would be essential for handling big data.

Compiling rules from declarative representations

The @compile property can be used with a chunk identifier to compile a set of chunks into a rule. This is needed as the use of @ terms in goals and rules interferes with retrieving or storing chunks involving these terms. The compilation process maps to these terms when copying chunks to the rule module. The default mapping simply inserts an @ character before the name, e.g. mapping action to @action. If the application needs to use the reserved terms for other purposes, you can reference your own map to the standard terms by using @map to reference a chunk with the map, e.g. if you wanted to use m instead of module, and act instead of action:

map {
  m module
  act action
}

Note that for compile actions, @module refers to the source module, as the target module is always the rule module. In principle, there could be an @uncompile property which takes a rule chunk identifier, puts the mapped rule chunks into the given module, and places the corresponding rule chunk into the module's buffer. This would provide an opportunity for inspection of procedural knowledge. Further work is needed to check whether this capability is really needed. See below for a brief discussion of the potential for declarative reasoning over rules as part of the process of learning how to address new tasks.
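
The renaming step of compilation can be sketched as a copy of a chunk's properties in which mapped names gain an @ prefix. The JavaScript below is an illustration only and makes several assumptions: the list of reserved terms is drawn from the terms used on this page rather than from a definitive set, the function name is hypothetical, and when a custom map is supplied, names that are not in the map are left untouched.

```javascript
// Terms given special meaning by the rule language. This list is an
// assumption drawn from the terms used on this page, not a definitive set.
const RESERVED = ["condition", "action", "module", "id", "type", "isa", "kindof"];

// Copy a chunk's properties, renaming those that map to reserved terms.
// `map` is an optional object such as { m: "module", act: "action" }.
function compileProperties(properties, map = null) {
  const compiled = {};
  for (const [name, value] of Object.entries(properties)) {
    let target = name;
    if (map !== null) {
      // custom map: only names listed in the map are renamed
      if (Object.prototype.hasOwnProperty.call(map, name)) target = "@" + map[name];
    } else if (RESERVED.includes(name)) {
      // default mapping: insert an @ before the reserved name
      target = "@" + name;
    }
    compiled[target] = value;
  }
  return compiled;
}
```

With the default mapping, a property named module becomes @module. With the map shown above, m becomes @module and act becomes @action, which leaves the application free to use action as an ordinary property name.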

The following figure illustrates the theory of skill retention, with three stages of learning and forgetting, from Kim et al., 2013:

3 stage theory of skill retention

Task management and Attention

The sequential nature of rule execution necessitates a means for switching attention between different tasks according to the current priorities, including high priority interrupts. Moreover, complex tasks will often need to be broken down into simpler sub-tasks that need to be managed in the face of competing demands. The focus of attention is also important in respect to directing processing within input modules, e.g. for an autonomous vehicle, the need to focus visual attention on road signs or pedestrians in the field of view ahead of the vehicle.

To make machine learning practical, rules need to be grouped into sets that are designed for specific tasks. A given task might involve a start rule, one or more progress rules, and at least one stop rule. For each task, there are likely to be multiple goal states. This brings a number of challenges: the relationship between a task and a sub-task, how to manage competing tasks, including time critical ones, and the relationship between tasks and machine learning.

In respect to switching between tasks, one idea is to wait for the current task to stop, and then to search for another task. Another idea is to abandon the current task when it is necessary to switch to a high priority task that requires urgent attention. Lengthy tasks could be broken up into smaller sub-tasks, allowing for pausing to consider what to do next at the completion point for individual sub-tasks. That presumes a means for suspending and resuming high level tasks.

One idea is to set the goal buffer to an idle chunk that triggers rules that look for currently pending tasks. In principle, pending tasks could be held as chunks in the goal module, having been put there when a task is proposed or suspended. This is related to the requirements for episodic memory for recording and reasoning over past experience, and to the role of the hippocampus for a relatively detailed short term memory.

A related challenge is indexing of rules for efficient selection as the number of rules scales up and up. The baseline is indexing based upon chunk type and chunk id. Further indexes could be constructed dynamically based upon the observed patterns of access, e.g. by determining that a given property is key to rule selection. Is there a way to implement a discrimination network to speed up rule selection?

One idea is to focus on conditions with literal values as a first stage, and to treat variable bindings as a second stage in the filtering process. This is then followed by a process for selecting the highest ranked rule, and then executing its actions. Rule conditions match the current state of the module buffers, and there are only a handful of such buffers. Efficient selection is thus much easier than if rules were to directly match the state of all of the graph databases.
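
The two stage filtering idea can be sketched as follows in JavaScript. This is an illustration rather than the selection process used by the rule engine. Each rule is assumed to have a single condition, and the utility field used for ranking, along with all function names, is hypothetical.

```javascript
// Sketch only: each rule has a single condition and a utility used for ranking.
// The rule format and the `utility` field are assumptions for illustration.
const isVariable = (v) => typeof v === "string" && v.startsWith("?");

// Stage 1: keep rules whose type and literal property values match the buffer.
function matchesLiterals(condition, buffered) {
  if (condition.type !== buffered.type) return false;
  return Object.entries(condition.properties).every(
    ([name, value]) => isVariable(value) || buffered.properties[name] === value
  );
}

// Stage 2: bind variables, rejecting rules that bind one variable to two values.
function bindVariables(condition, buffered) {
  const bindings = {};
  for (const [name, value] of Object.entries(condition.properties)) {
    if (!isVariable(value)) continue;
    if (!(name in buffered.properties)) return null;
    const actual = buffered.properties[name];
    if (value in bindings && bindings[value] !== actual) return null;
    bindings[value] = actual;
  }
  return bindings;
}

// Select the highest ranked rule among those that survive both stages.
function selectRule(rules, buffered) {
  let best = null;
  for (const rule of rules) {
    if (!matchesLiterals(rule.condition, buffered)) continue;
    const bindings = bindVariables(rule.condition, buffered);
    if (bindings === null) continue;
    if (best === null || rule.utility > best.rule.utility) best = { rule, bindings };
  }
  return best;
}
```

The literal test is cheap and discards most rules early, so the more expensive binding step only runs on the survivors. In a larger system, the first stage could be replaced by an index on chunk type and on properties found to be key to rule selection.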

More sophisticated cognitive agents will be able to pause to reflect on their priorities and how well they are doing in respect to higher level goals. This includes models of self and others in the context of social interaction. How well is the agent doing relative to its expectations? What are some other ways it could proceed? This involves a means to switch attention between higher level reflective tasks and lower level tasks. Attention could also be diverted when encountering novel situations in order to understand and learn from the new experience. Both cases are related to models of attention based upon emotional responses, see the later section on Emotions and Social Interaction.

Focusing input on features of interest

When you are driving, your visual attention prioritises features of interest relevant to driving. This involves a reinforcement learning process, as is apparent when driving in another country for the first time, when it is common at first to feel bewildered by the apparent visual clutter distracting you from the task in hand. The cognitive effort soon drops off as you learn what to focus on and what to ignore.

This can be modelled in terms of a means to signal the current task to the input modules along with a means to direct attention to specific features when needed, e.g. to read the information on a road traffic sign, having noticed a sign in the field of view. In addition, there needs to be a means to signal success and failure as a basis for reinforcement learning within the input modules. The input modules can be implemented using multi-layer artificial neural networks, together with deep learning.

Delegated control for actions

Reaching out to grasp an object involves a complex coordinated activity with regard to perception and actuation. The intent for such an action is delegated to a separate system that runs in parallel to the cognitive rule engine. In the brain, this is implemented by the cerebellum, which can be likened to an air traffic controller interpreting data from the sensory cortex and sending control signals to the motor cortex to orchestrate a large array of muscles.

Conscious thought is needed for new tasks, but through repetition, the effort is considerably lessened as the task becomes a subconscious activity through procedural learning in the cerebellum, motor and sensory cortex. This can be emulated using reinforcement learning across a hierarchical arrangement of real-time control systems that execute concurrently. This will be explored in future demonstrators, using automata that generate smooth control signals as piecewise approximations to continuous functions.

The following diagram provides more details for the cortical circuitry for consciousness (on the left) and motor control (on the right). The cerebellum dynamically regulates movement using connections to the sensory systems, the spinal cord, and other parts of the brain.

cortical circuits
With thanks to Fumika Mori et al.

Reinforcement learning

Machine learning of rule sets is possible using heuristics to propose new rules, together with a means to adapt the perceived utility of rules based upon their performance at executing tasks. In reinforcement learning a reward or penalty is computed when a task either succeeds or fails. The reward/penalty is then propagated backwards in time along the chain of rules that were used to get to that point. The reward/penalty is discounted so that it has less effect the further back in time you get from the point when the task was found to have succeeded or failed. The reward/penalty could itself be related to the length of the rule chain, i.e. how long the task is expected to take to complete, as well as to the perceived importance of the task.
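As a minimal sketch of discounted backward propagation, assume that each rule has a numeric utility, that the chain of executed rules is available as a list of rule names, and that utilities are nudged towards the discounted reward. The discount and learning_rate parameters and the function name are assumptions made for illustration.

```python
def propagate_reward(utilities, rule_chain, reward, discount=0.9, learning_rate=0.1):
    """Walk the chain backwards, giving earlier rules a smaller share of the reward."""
    credit = reward
    for name in reversed(rule_chain):
        # Move the rule's utility part of the way towards the discounted credit
        utilities[name] += learning_rate * (credit - utilities[name])
        credit *= discount
    return utilities

# Hypothetical three-rule chain ending in success (reward of 1.0)
utilities = {"a": 0.0, "b": 0.0, "c": 0.0}
propagate_reward(utilities, ["a", "b", "c"], reward=1.0, discount=0.5, learning_rate=1.0)
```

With these deliberately simple settings the last rule "c" receives the full reward, "b" receives half of it and "a" a quarter. A penalty can be expressed as a negative reward.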

Rules could be used to determine when a task has successfully completed or when it has failed. In addition, the rule engine could decide to abandon tasks that were taking much longer than expected based upon past experience. One idea is for the rule engine to record the sequence of rule execution, and to perform the back propagation process along this sequence. It is unclear how the brain could support that in practice. Another idea just requires the rule engine to keep track of the last rule executed prior to the current rule. In this approach, the perceived utility of each rule is propagated to the immediately preceding rule. Task repetition will then ensure that reward/penalty eventually propagates back all the way to the first rule in the chain, yielding accurate estimates for rule utilities.
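The second idea, where only the previously executed rule is remembered, can be sketched along the following lines. The update formula is an assumption borrowed from temporal difference learning rather than something specified above, and the parameter names are hypothetical.

```python
def run_episode(utilities, rule_chain, reward, discount=0.9, learning_rate=0.5):
    """Update utilities using only the immediately preceding rule at each step."""
    previous = None
    for name in rule_chain:
        if previous is not None:
            # The preceding rule inherits a discounted share of this rule's utility
            target = discount * utilities[name]
            utilities[previous] += learning_rate * (target - utilities[previous])
        previous = name
    if previous is not None:
        # The final rule in the chain is updated directly from the reward or penalty
        utilities[previous] += learning_rate * (reward - utilities[previous])
    return utilities

utilities = {"a": 0.0, "b": 0.0, "c": 0.0}
history = []
for _ in range(3):
    run_episode(utilities, ["a", "b", "c"], reward=1.0, discount=0.5, learning_rate=1.0)
    history.append(dict(utilities))
```

After the first repetition only the final rule has a non-zero utility. The reward reaches "b" on the second repetition and "a" on the third, which illustrates why task repetition is needed for the estimates to settle.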

Work with ACT-R has identified some heuristics for proposing new rules, e.g. merging successive rules when practical. Further study is needed to better understand the process by which the heuristics are selected and applied. There is a potential analogy with evolutionary algorithms with mutation and swap operations on genetic code. Left to itself, this could require a vast number of task repetitions to achieve effective task performance.

This could be speeded up by learning from experience which approaches are more likely to work in a given context. That involves case based reasoning that looks for similarities and differences with other tasks. This process could be carried out by first creating a representation of the rules in declarative memory, and interpreting these and then modifying the representation as needed. The rules would be compiled to procedural memory as they stabilised, offering significant speed up in the process. Alternatively, it may be simpler to allow for inspection and annotation of rules held in the rule module, using the mapping mechanism described earlier in this document.

The rule engine identifies which rules match the current buffer states, and then picks the rule with the highest perceived utility. This process is stochastic, so that lower ranked rules will occasionally be picked over high ranking rules. This can be associated with a temperature parameter, where the higher the temperature, the more likely the rule engine will propose new rules and pick a lower ranked rule. When starting to learn a new task the temperature can be set high. The temperature is subsequently lowered when the task completes successfully, and raised when it doesn't. Over time this allows the system to explore the problem space and to find effective solutions. This requires a means to represent tasks as chunks, along with the temperature parameter.
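One common way to realise temperature-dependent selection is Boltzmann (softmax) selection, sketched below. This is a swapped-in technique chosen for illustration, not necessarily what the rule engine uses, and it covers only the picking of rules, not the proposal of new ones. The function names, the adjustment factor and the temperature bounds are all assumptions.

```python
import math
import random

def selection_probabilities(utilities, temperature):
    """Softmax over rule utilities; a higher temperature flattens the distribution."""
    peak = max(utilities.values())
    weights = {name: math.exp((u - peak) / temperature) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def pick_rule(utilities, temperature, rng=random):
    """Stochastically pick one of the matching rules."""
    probs = selection_probabilities(utilities, temperature)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]

def adjust_temperature(temperature, succeeded, factor=0.9, minimum=0.05, maximum=10.0):
    """Lower the temperature after success and raise it after failure."""
    updated = temperature * factor if succeeded else temperature / factor
    return min(maximum, max(minimum, updated))

# Hypothetical utilities for two rules that both match the buffers
matching = {"a": 1.0, "b": 0.0}
cold = selection_probabilities(matching, temperature=0.1)
hot = selection_probabilities(matching, temperature=10.0)
```

At a low temperature rule "a" is picked almost every time, whereas at a high temperature the two rules are picked with nearly equal probability, which corresponds to greater exploration.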

One thing to remark on is that in the brain, the basal ganglia have direct and indirect paths for outgoing connections to the cortex. One set is excitatory and the other inhibitory. What insights does this provide for machine learning, and for queries for database modules?

Emotions and Social Interaction

Looking further out, a computational model of feelings and emotions may be appropriate to guide attention and decision making. This is also relevant to human computer collaboration, making the difference between being warm, caring and fun to work with versus cold and uncaring. This is where ideas developed in sociology are likely to be very relevant. At a higher level, involving reflective thinking, cognitive agents could be designed to apply ethical principles in compliance with future legislation, and held to higher standards than human citizens.

Emotions play an important evolutionary role in respect to the survival of a species. At the most basic level, pain directs organisms to take immediate action to remove themselves from a cause of physical harm, e.g. burns from fire, damage from thorns or biting predators. Emotions are also at play in respect to fear of predators, interest in prey, courtship, mating and care of eggs and offspring. This can be seen as a computational process relating to the anticipated future reward or penalty for the outcome of particular behaviours, as well as to the observed difference between the expected and actual reward or penalty for a given behaviour.

Many species live in social groups, e.g. social insects such as ants, bees and termites, schooling fish, meerkats, wolves, elephants, apes and humans, to name just a few. Human social interaction is complex, and it can be argued that the evolutionary benefits of different behaviours are reflected in the wide range of emotions we can experience. Our ability to function effectively as members of a social group depends on our ability to construct workable models of other people and ourselves. The anterior cingulate cortex (ACC) has been shown to play a key role in how we appraise future reward or penalty, and how we resolve conflicting emotions, e.g. when we are torn between immediate self-interest and our desire to help those close to us.

Paul Ekman has worked extensively on how basic emotional attitudes are communicated through facial expressions: anger, sadness, fear, surprise, disgust, contempt, and happiness. An expanded list includes amusement, contentment, embarrassment, excitement, guilt, pride in achievement, relief, satisfaction, sensory pleasure, and shame. Emotional reactions that need to be executed rapidly are appraised in an automatic, unreflective, unconscious or preconscious way. Emotions can also be subject to slow, deliberate and conscious thought processes.

Psychologists use the term valence for whether an emotion is positive or negative, and arousal to describe its degree of intensity, ranging from passive to active. Emotions can thus be considered to have an intensity and a direction. The details vary across theories, e.g. Russell's circumplex model versus Bradley et al.'s vector model.

emotions distributed in a circle with valence and arousal as the two axes
Russell's circumplex model of emotions

For a cognitive agent, we can choose which theory to apply, for instance, using a pair of numeric properties to represent the valence and arousal. Another choice could be to use an enumerated property for the emotional attitude and a numeric value for the intensity. These correspond to the difference between cartesian and polar coordinates. These values can be adjusted by the execution of rules. Other rules are conditional on emotions, e.g. to direct behaviour and to resolve conflicting emotions. When reasoning about a choice between alternative courses of action, we need a way to compute their likely effects on the emotional state. This will be affected by past experience, and memories of previous events.
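The relationship between the two representations can be sketched as a conversion between cartesian and polar coordinates. The function names are hypothetical, and the table of attitudes with its angles is a placeholder invented purely for illustration; it is not taken from Russell's model or any other theory.

```python
import math

def to_polar(valence, arousal):
    """Convert (valence, arousal) to (intensity, angle in degrees from 0 to 360)."""
    intensity = math.hypot(valence, arousal)
    angle = math.degrees(math.atan2(arousal, valence)) % 360
    return intensity, angle

def to_cartesian(intensity, angle):
    """Convert (intensity, angle in degrees) back to (valence, arousal)."""
    radians = math.radians(angle)
    return intensity * math.cos(radians), intensity * math.sin(radians)

def nearest_attitude(angle, attitudes):
    """Pick the enumerated attitude whose angle is closest on the circle."""
    def circular_distance(a, b):
        d = abs(a - b) % 360
        return min(d, 360 - d)
    return min(attitudes, key=lambda name: circular_distance(angle, attitudes[name]))

# Placeholder angles, for illustration only
ATTITUDES = {"happy": 20, "angry": 110, "sad": 200, "calm": 320}

intensity, angle = to_polar(0.6, -0.8)   # positive valence, low arousal
attitude = nearest_attitude(angle, ATTITUDES)
```

A rule that adjusts the emotional state could work with whichever form is more convenient, converting between them when needed.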

The rules that compute and act upon the emotional state can be regarded as heuristics for guiding appraisal and decision making. Such heuristics are fast compared to more extensive deliberative reasoning, but can lead to making what in hindsight were the wrong choices. This relates to Daniel Kahneman's ideas on System 1 vs System 2 in his book "Thinking, Fast and Slow". He writes: "System 1" is fast, instinctive and emotional; "System 2" is slower, more deliberative, and more logical. Most people tend to place too much confidence in human judgement (System 1).

When two people are talking with one another, gaze direction, facial expressions, head movements and hand gestures provide a complementary non-verbal communication channel, signalling overt or covert emotional state and attention. What would be needed for cognitive agents to support non-verbal communication? This calls for rapid evaluation and generation. The intents include emphasis on particular parts of an utterance, and an emotional overlay that reinforces what is being said. Non-verbal communication is also used when you are listening to someone, e.g. to signal your interest, your emotional response, and to signal your acknowledgement of specific points. For both speaker and listener, this involves reasoning about the emotional implications and goals of the utterance.

This level of sophistication will need to build upon progress with earlier work on demonstrating how emotions can serve as a heuristic means to direct behaviour. Similarly, work on exploring humour as part of human-machine collaboration will need to build upon progress in dealing with non-verbal communication.

A lot has been written about consciousness and whether it will ever be possible to build conscious machines. For this work, a simpler position is taken which enables cognitive agents to be aware of themselves and others, and have access to a record of their experience, goals and performance, through workable models of episodic memory. This further involves the means for cognitive agents to pause to reflect on their performance and goals. This relates to task management and attention as described in an earlier section, and can be likened to an operating system that manages the use of the central processing unit by a large set of running programs.

An open question for the design of cognitive agents is whether emotional appraisal can be integrated into the main loop for rule execution or whether a complementary system is needed that rapidly maps stimuli to responses, e.g. using some form of discrimination network. Either way, the cues and their interpretation in the current context require sophisticated models of social interaction. What kinds of use cases and datasets are needed to explore this?

Natural language processing

Natural language processing is needed to take a sequence of words and translate them into a network of chunks, and vice versa for natural language generation. This is a statistical process that needs to take the current context into account. The lexicon describes knowledge about words and their meanings. The starting point is to identify the likely part of speech for each word, as a basis for forming a dependency tree. Words often have multiple meanings, and a spreading activation model can be used to account for priming effects in picking the most likely meaning in the current context. In English, verbs have patterns of slots which can be filled by the subject, object and prepositional phrases. This process of attachment also needs to take the semantic context into account. A further challenge is the binding of references from nouns and pronouns. These processes make use of graph algorithms for short and long term memory.

Individual natural language utterances take place in the context of a dialogue which itself is part of a social interaction model. The literal meaning needs to be supplemented by an emotional understanding, and this is guided by non-verbal communication that takes place concurrently. Natural language usage patterns evolve with practice and by listening to others. This has implications for how the lexicon and related knowledge are updated during natural language dialogues.

Natural language involves a great deal of everyday knowledge and so-called common sense skills. This will depend on being able to teach and assess these skills through a series of lessons, starting with a core framework of built-in declarative and procedural knowledge that needs to be developed manually. There are plenty of opportunities for cognitive agents where natural language interaction is limited to a controlled subset of language. Richer use of language can come later and build upon experience gained with simpler systems.

Cognitive Databases

The architecture for chunk rules involves memory modules that act as cognitive databases, and which are accessed using a request/response pattern analogous to the way that Web pages are retrieved with HTTP. Cognitive databases have the potential to store vast amounts of information, similar to the human cortex. The analogy with Web architecture is further strengthened by the way that information is recalled based upon which chunks are likely to be the most valuable given prior knowledge and past experience. This can be likened to Web search engines which seek to provide the results most relevant to a particular user. There is no intent, and no point, in providing the complete set of matches, given the huge scale of the Web.

Cognitive databases could support a variety of graph algorithms to support a range of cognitive tasks, e.g.

There can be a many-to-many relationship between cognitive agents and cognitive databases. This allows a single cognitive database to be shared with many cognitive agents via encrypted protocols such as HTTPS and WebSocket Secure (WSS). Some information would be accessible by all agents, whilst other information would be restricted to a single agent or group of agents. This is made possible through the use of chunk contexts, see the earlier section on reasoning from multiple contexts.

The initial implementations are designed to work within Web pages for ease of demonstration. These demos fetch the database from the Web server hosting the Web page. Larger demos will require implementations that can scale to much larger databases, e.g. Gigabytes rather than Megabytes. This could be addressed through memory mapped files managed through a Web server. For even larger databases, it will become necessary to use a federated approach across server farms, with the means to run graph algorithms in a distributed way, and then gather the results back to respond to a given query. Redundancy will be needed to cope with the inevitable component failures to be expected in any sufficiently large system.

Further work is needed to consider how to scale rule databases, which need tighter integration with cognitive agents to provide the indexing speed needed for fast rule execution. In principle, machine learning across many cognitive agents could be pooled to accelerate learning. Another perspective would be provided by enabling open markets of declarative and procedural knowledge for specific application areas. As with today's software packages, licenses would be needed to describe the terms and conditions of use. Clients would benefit from regular updates.

Integration with RDF

To relate chunks to RDF, you could use @rdfmap. For instance:

@rdfmap {
  dog http://example.com/ns/dog
  cat http://example.com/ns/cat
}

It would be easy to also support @prefix for defining URI prefixes, e.g.

@prefix p1 {
  ex: http://example.com/ns/
}
@rdfmap {
  @prefix p1
  dog ex:dog
  cat ex:cat
}
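To illustrate how such a mapping might be applied, the following sketch resolves chunk names to URIs, expanding prefixes along the way. It assumes the @prefix and @rdfmap declarations have already been parsed into Python dictionaries; the parsing step is omitted, and the function names are hypothetical.

```python
def expand(value, prefixes):
    """Expand a prefixed name such as ex:dog; full URIs are returned unchanged."""
    if "://" in value:
        return value
    for prefix, base in prefixes.items():
        if value.startswith(prefix):
            return base + value[len(prefix):]
    return value

def resolve(name, rdfmap, prefixes):
    """Map a chunk name to a URI, or return None if the name has no mapping."""
    target = rdfmap.get(name)
    return None if target is None else expand(target, prefixes)

# Assumed result of parsing the declarations shown above
prefixes = {"ex:": "http://example.com/ns/"}
rdfmap = {"dog": "ex:dog", "cat": "ex:cat"}

dog_uri = resolve("dog", rdfmap, prefixes)
```

With these declarations, the name dog resolves to http://example.com/ns/dog, which matches the inline form of the mapping.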

It may be more convenient to refer to such mappings rather than inlining them, e.g.

@rdfmap from http://example.org/mappings

Note: people familiar with JSON-LD would probably suggest using @context instead of @rdfmap, however, that would be confusing given that we want to use the term context in respect to reasoning in multiple contexts.

Further work is needed to consider use cases where integration with existing RDF based systems is important. It is likely that this will also involve work on context sensitive mapping of data between different vocabularies. It is anticipated that such mappings could be learned from a curated set of examples, by analogy with modern approaches to machine translation of human languages.

Demonstrators for Chunk Rules

Here are some demos as proof of concept:

Counting from 3 to 8 with support for single step execution.
Demo that models how people count by remembering which number comes next.
Simple decision trees
Demo for how rules can be used to decide whether to play golf.

Further demos are planned, e.g. task management, cyber-physical control, inductive learning from examples, reinforcement learning of rulesets, different kinds of reasoning, including reasoning from multiple perspectives, and natural language processing. Suggestions of use cases are welcomed. If you are interested in collaborating on these, please get in touch.

Dave Raggett <dsr@w3.org>

This work is supported by the European Union's Horizon 2020 research and innovation programme under grant agreement No 780732, project Boost 4.0