# Overview

The idea of this document is to sketch what aspects of the provenance model can be formalized and how they can be formalized, as a first step towards establishing a consensus on the (intended) meaning of the components of the model and the consistency constraints or inferences that can be applied to the model to distinguish good from bad provenance records.

This document is intended to synchronize with PROV-DM WD3.

## Status

This is the WD3 semantics, which is being updated to become consistent with WD5.

## Idea of the semantics

As a starting point, I will assume that the assertions made in a PROV-DM instance are intended to describe one consistent state of the world, much as a logical formula is said to be satisfied in a mathematical model. That is, I propose an approach similar to that taken in model theory, where the PROV-DM instance corresponds to a formula or theory of a logic, and the semantics corresponds to what logicians call a model.

For example, the formula $\forall x. x \geq 0$ is satisfied in a mathematical model where the individual elements denotable by $x$ are natural numbers, and the ordering relation symbol $\geq$ is interpreted as the usual ordering relation on natural numbers. Here, the goal is to come up with a plausible "intended model" for interpreting PROV-DM instances, where the formulas are assertions in PROV-DM and the individuals are things and agents. This is complicated by the fact that many statements about provenance involve talking about objects that change over time.

The modeling methodology I'm using below is similar to that used in some (old, but still relevant) work on preservation: see [1]

The word "world" is used in PROV-DM to talk about the actual state of affairs that the PROV-DM instance describes, which is what I would usually call a "model". The word "model" is used in PROV-DM mainly in the sense of "data model", that is, to talk about what I would otherwise call the syntax of PROV-DM. To avoid confusion with the uses of terms in PROV-DM, I will use "world model" to describe the mathematical structure that corresponds to the actual state of affairs, and will try to avoid ambiguous, unqualified uses of the word "model".

# Basics

I will use syntax for PROV-DM records (which I will usually call formulas) as described in the third working draft of PROV-DM (PROV-DM 3PWD).

A PROV-DM instance consisting of assertions $\phi_1$...$\phi_n$ is interpreted as a conjunction, that is, the overall instance is considered to hold if each atomic formula in it holds.

The rest of the document will discuss the structure of the worlds and when an atomic assertion holds in a given world.

## Identifiers

A lowercase symbol $x,y,...$ on its own denotes an identifier. Identifiers may or may not be URIs. I view identifiers as being like variables in logic (or blank nodes in RDF): just because we have two different identifiers $x$ and $y$ doesn't tell us that they denote different things, since we could discover that they are actually the same later.

## Times and Intervals

We assume an ordered set $(T,\leq)$ of time instants. This could be a total order (i.e. a linear timeline) or a partial order describing events that are not necessarily reconciled to a single global clock.

We also consider a set $Intervals$ of sets of time instants closed under "convexity", that is, such that if $t_1 \leq t \leq t_2$ and $t_1,t_2 \in I$ then $t \in I$. Sets of the form $\{t \mid t_1 \leq t \leq t_2\}$ are called basic intervals; there can also be other intervals.

The common case is that $T$ is a finite, linear order, for example, the set of time instants in a single place/time zone. In this case, we can represent all intervals as basic intervals, giving only the starting and ending time $[t_1,t_2]$. (If the order is infinite or dense, like the reals or rational numbers, then we need open intervals and infinite intervals too.)

### Example

First consider an ordinary linear order $Monday < Tuesday < Wednesday < Thursday < Friday$. Then every interval is a basic interval given by its two endpoints, such as $[Monday,Wednesday]$ or $[Tuesday,Friday]$.

Now consider the time point set $\{Monday, Tuesday, WednesdayNY, WednesdayCA\}$, where $Monday < Tuesday < WednesdayNY$ and $Monday < Tuesday < WednesdayCA$ are two basic intervals. The set of all time points in this example is also an interval, but it cannot be defined by taking just two time points.
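The convexity condition can be checked mechanically on a finite order. The following is a minimal Python sketch of the second example, assuming the order is given as an explicit set of pairs; the names `POINTS`, `ORDER`, `leq`, `is_convex` and `basic_interval` are illustrative and not part of PROV-DM.

```python
from itertools import product

# Time points and strict order pairs for the second example.
# ORDER is written out in full, so it is already transitive.
POINTS = {"Monday", "Tuesday", "WednesdayNY", "WednesdayCA"}
ORDER = {
    ("Monday", "Tuesday"),
    ("Monday", "WednesdayNY"),
    ("Monday", "WednesdayCA"),
    ("Tuesday", "WednesdayNY"),
    ("Tuesday", "WednesdayCA"),
}


def leq(t1, t2):
    """Reflexive ordering on POINTS."""
    return t1 == t2 or (t1, t2) in ORDER


def is_convex(candidate):
    """True if every point lying between two members is also a member."""
    return all(
        t in candidate
        for t1, t2 in product(candidate, repeat=2)
        for t in POINTS
        if leq(t1, t) and leq(t, t2)
    )


def basic_interval(t1, t2):
    """The set {t | t1 <= t <= t2}."""
    return {t for t in POINTS if leq(t1, t) and leq(t, t2)}


# Every basic interval obtainable from a pair of time points.
all_basic = [basic_interval(a, b) for a, b in product(POINTS, repeat=2)]

print(is_convex(POINTS))    # is the whole set an interval?
print(POINTS in all_basic)  # is it a basic interval?
```

Under these assumptions the whole set passes the convexity check, while no pair of endpoints yields it as a basic interval, which is the situation described above.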

## Attributes and Values

We assume a set $Attributes$ of attribute labels and a set $Values$ of possible values of attributes.

# Formulas

Summary of syntax of formulas from PROV-DM.

$\begin{array}{rcl}  formula &::=& element\_formula\\ & | & relation\_formula\\ element\_formula &::= &entity(id,attrs) \\ & |& activity(id,st,et,attrs)\\ & |& agent(id,attrs)\\ relation\_formula &::=& wasGeneratedBy(id,e,a,t,attrs)\\ & |& used(id,a,e,t,attrs)\\ & |& wasAssociatedWith(id,a,ag,pl,attrs)\\ & |& wasStartedBy(id,a,ag,attrs)\\ & |& wasEndedBy(id,a,ag,attrs)\\ & |& actedOnBehalfOf(id,ag2,ag1,act,attrs)\\ & |& wasDerivedFrom(id,e2,e1,act,g,u,attrs)\\ & |& wasDerivedFrom(id,e2,e1,t,[prov:steps=single,attrs])\\ & |& wasDerivedFrom(id,e2,e1,t,[prov:steps=any,attrs])\\ & |& alternateOf(e1,e2)\\ & |& specializationOf(e1,e2)  \end{array}$

Some additional features defined in PROV-DM 3PWD but not yet handled in the semantics are omitted. We exclude from attention derivation, annotations, accounts, and containers, since these features are in flux and it is not obvious how to formalize them. Also, we omit identifiers and attributes on alternateOf and specializationOf, anticipating that these features will be removed in future drafts.

# World Models

## Things

Things are things in the world. Each thing has a lifetime during which it exists and attributes whose values can change over time.

To model this, a world model $W$ includes

• a set $Things$ of things
• a function $lifetime : Things \to Intervals$ from things to time intervals
• a function $value : Things \times Attributes \times Times \to Values$

Note that this description does not say what the structure of a thing is, only how it may be described in terms of its time interval and attribute values. A thing could just be a record of fixed attribute values; it could be a bear; it could be the Royal Society; it could be a transcendental number like $\pi$. All that matters from our point of view is that we know how to map the thing to its time interval and attribute mapping.

It is possible for two Things to be indistinguishable by their attribute values and lifetime, but have different identity.

## Objects

An Object is described by a time interval and attributes with unchanging values. Objects encompass entities, interactions, and activities.

To model this, a world includes

• a set $Objects$
• a function $lifetime : Objects \to Intervals$ from objects to time intervals
• a function $value : Objects \times Attributes \to Values$

Intuitively, $lifetime(e)$ is the time interval during which object $e$ exists, and $value(e,a)$ is the value of attribute $a$, which stays fixed at every $t \in lifetime(e)$.

As with Things, it is also possible to have two different objects that are indistinguishable by their attributes and time intervals.

### Entities

An entity is a kind of object that describes a time-slice of a thing, during which some of the thing's attributes are fixed. We assume:

• a set $Entities \subseteq Objects$ of entities, disjoint from $Activities$ and $Events$ below.
• a function $thingOf : Entities \to Things$ that associates each Entity with a Thing, such that for each entity $obj$, each $t \in lifetime(obj)$, and each attribute $a$ defined by $obj$, we have $value(obj,a) = value(thingOf(obj),a,t)$
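The time-slice condition on $thingOf$ can be illustrated on a small finite example. The sketch below assumes integer time instants and lifetimes given as sets of instants; the data (`thing_value`, `entity_value`, and so on) and the helper `is_time_slice` are hypothetical names chosen for illustration.

```python
# Hypothetical miniature world: times are integers, lifetimes are sets of times.
thing_value = {
    # (thing, attribute, time) -> value
    ("doc", "version", 1): "draft",
    ("doc", "version", 2): "draft",
    ("doc", "version", 3): "final",
    ("doc", "author", 1): "alice",
    ("doc", "author", 2): "alice",
    ("doc", "author", 3): "alice",
}

entity_lifetime = {"doc_draft": {1, 2}, "doc_any": {1, 2, 3}}
entity_value = {
    # (entity, attribute) -> fixed value
    ("doc_draft", "version"): "draft",
    ("doc_draft", "author"): "alice",
    ("doc_any", "author"): "alice",
}
thing_of = {"doc_draft": "doc", "doc_any": "doc"}


def is_time_slice(ent):
    """Check value(ent, a) == value(thingOf(ent), a, t) for every attribute
    a defined by ent and every t in lifetime(ent)."""
    thing = thing_of[ent]
    return all(
        thing_value.get((thing, attr, t)) == val
        for (e, attr), val in entity_value.items()
        if e == ent
        for t in entity_lifetime[ent]
    )


print(is_time_slice("doc_draft"))
print(is_time_slice("doc_any"))
```

Here `doc_any` spans the whole lifetime of the thing and therefore can only fix attributes, such as `author`, that never change; fixing `version` on it would violate the condition.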

#### Plans

We identify a specific subset of the entities called plans, $Plans \subseteq Entities$.

#### Agents

An agent is an entity that can act, by controlling, starting, ending, or participating in activities. Agents can act on behalf of other agents. We introduce:

• a set $Agents \subseteq Entities$ of agents, which is disjoint from $Plans$ and also necessarily disjoint from $Activities$ and $Events$

### Interactions

We consider Interactions which are split into Events between entities and activities, Associations between agents and activities, and Derivations that describe chains of generation and usage steps. (The first two sets may overlap.)

$Interactions = (Events \cup Associations) \uplus Derivations$

#### Events

An Event is an interaction whose lifetime is a single time instant, and relates an activity to an entity (which could be an agent). Events have types including usage, generation, starting and ending (possibly more may be added such as destruction/invalidation of an entity). Events are instantaneous. We introduce:

• A set $Events \subseteq Interactions$ of events.
• A function $time : Events \to Times$ giving the time of each event; i.e. $lifetime(evt) = \{time(evt)\}$.
• The derived ordering on events given by $evt_1 \leq evt_2 \iff time(evt_1) \leq time(evt_2)$
• A function $type : Events \to \{start,end,use,generate\}$ giving the type of each event.

#### Associations

An Association is an interaction relating an agent to an activity. Associations can overlap with events; for example, a start event is also an association. To model associations, we introduce:

• A set $Associations \subseteq Interactions$, such that every event $evt \in Events$ that is a start or end event is also an association. That is, $type(evt) \in \{start,end\}$ implies $evt \in Associations$

Associations are used below in the $ActsFor$ and $AssociatedWith$ relations.

#### Derivations

A Derivation is an interaction chaining one or more generation and use steps. Since derivations can carry attributes, we introduce an explicit kind of interaction for them.

• A set $Derivations \subseteq Interactions$, disjoint from $Events \cup Associations$.

See below for the associated derivation path and DerivedFrom relation.

### Activities

An activity is an object that encompasses a set of events. We introduce

• a set $Activities \subseteq Objects$ of activities, disjoint from $Entities$ and $Events$

## Relations

(ISSUE-214: Luc suggests replacing $Generated,EventActivities$ with $Generated : Events \times Entities \times Activities$ and similar for $Used$).

### Simple relations

The entities, interactions, and activities in a world model are related in the following ways:

• A relation $Used \subseteq Events \times Entities$ saying when an event used an entity. An event can use at most one entity, and if $(evt,e)\in Used$ then $time(evt) \in lifetime(e)$ must hold.
• A relation $Generated \subseteq Events \times Entities$ saying when an event generated an entity. An event can generate at most one entity, and an entity is generated by at most one event, and if $(evt,e)\in Generated$ then $min(lifetime(e)) = time(evt)$ must hold.
• A relation $EventActivities \subseteq Events \times Activities$ associating events with activities, such that $(evt,act) \in EventActivities$ implies $time(evt) \in lifetime(act)$.
• A relation $AssociatedWith \subseteq Associations \times Agents \times Activities \times Plans^?$ indicating when an agent is associated with an activity, and giving the identity of the association relationship, and an optional plan. This must satisfy the constraint that if $(assoc,ag,act,pl) \in AssociatedWith$ then $lifetime(assoc) \subseteq lifetime(ag) \cap lifetime(act)$, i.e. the duration of the association must be within the lifetime of both the agent and the activity.
• A relation $ActsFor \subseteq Agents \times Agents \times Activities$ indicating when one agent acts on behalf of another with respect to a given activity. This must satisfy the constraint that if $(ag_2,ag_1,act) \in ActsFor$ then there is an association such that $(assoc,ag_1,act),(assoc,ag_2,act) \in AssociatedWith$, hence, both agents' lifetimes must overlap with some common time point in the lifetime of the activity.
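The constraints on $Used$ and $Generated$ can be checked directly on a finite world. The following Python sketch assumes integer time instants and lifetimes given as non-empty sets; the dictionaries and the helpers `used_ok` and `generated_ok` are hypothetical and only illustrate the two constraints.

```python
# Hypothetical finite world; times are integers and lifetimes are sets of times.
time = {"g1": 1, "u1": 3}
lifetime = {"e1": {1, 2, 3, 4}}
used = {("u1", "e1")}        # (event, entity) pairs
generated = {("g1", "e1")}   # (event, entity) pairs


def used_ok():
    """Each event uses at most one entity, at a time within its lifetime."""
    events = [evt for evt, _ in used]
    return len(events) == len(set(events)) and all(
        time[evt] in lifetime[ent] for evt, ent in used
    )


def generated_ok():
    """Generation is one-to-one and happens at the start of the lifetime."""
    events = [evt for evt, _ in generated]
    entities = [ent for _, ent in generated]
    return (
        len(events) == len(set(events))
        and len(entities) == len(set(entities))
        and all(min(lifetime[ent]) == time[evt] for evt, ent in generated)
    )


print(used_ok(), generated_ok())
```

Moving the generation event `g1` to any time other than the first instant of `e1`'s lifetime makes `generated_ok()` fail.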

### Derivation paths and DerivedFrom

Recall that above we introduced a subset of interactions called Derivations. These identify paths of the form

$ent_n\cdot g_n\cdot act_n\cdot u_n\cdot ent_{n-1}\cdot ...\cdot ent_1\cdot g_1\cdot act_1\cdot u_1\cdot ent_0$ where the $ent_i$ are entities, $act_i$ are activities, $g_i$ are generations, and $u_i$ are usages.

Formally, we consider the (regular) language:

• $DerivationPaths = Entities \cdot (Events \cdot Activities \cdot Events \cdot Entities)^+$

and we use this language to give meaning to derivations:

• A function $derivedFrom : Derivations \to DerivationPaths$ with the constraints that for each derivation path:
1. for each substring $ent\cdot g \cdot act$ we have $(g,ent) \in Generated$ (hence, $type(g) = generate$) and $(g,act) \in EventActivities$, and
2. for each substring $act \cdot u \cdot ent$ we have $(u,ent) \in Used$ (hence, $type(u) = use$) and $(u,act) \in EventActivities$.

Note: The reason why we need paths and not just individual derivation steps is that imprecise-n wasDerivedFrom edges can represent multiple derivation steps. Without such edges, the definition here could be simplified.
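As an illustration of the path language and its constraints, here is a small Python sketch that represents a path as a list of symbols and checks both membership in $DerivationPaths$ and the generation and usage constraints. The world data and the function names are hypothetical.

```python
# Hypothetical world data; all names are illustrative.
entities = {"e0", "e1", "e2"}
activities = {"a1", "a2"}
events = {"g1", "u1", "g2", "u2"}
generated = {("g1", "e1"), ("g2", "e2")}
used = {("u1", "e0"), ("u2", "e1")}
event_activities = {("g1", "a1"), ("u1", "a1"), ("g2", "a2"), ("u2", "a2")}


def in_derivation_paths(path):
    """Membership in Entities . (Events . Activities . Events . Entities)+"""
    if len(path) < 5 or (len(path) - 1) % 4 != 0:
        return False
    kinds = [entities, events, activities, events]
    return all(sym in kinds[i % 4] for i, sym in enumerate(path))


def is_valid_derivation_path(path):
    """Check the generation and usage constraints on every step."""
    if not in_derivation_paths(path):
        return False
    for i in range(0, len(path) - 1, 4):
        ent, g, act, u, prev = path[i : i + 5]
        if (g, ent) not in generated or (g, act) not in event_activities:
            return False
        if (u, prev) not in used or (u, act) not in event_activities:
            return False
    return True


two_steps = ["e2", "g2", "a2", "u2", "e1", "g1", "a1", "u1", "e0"]
print(is_valid_derivation_path(two_steps))
```

The list `two_steps` corresponds to the path $ent_2\cdot g_2\cdot act_2\cdot u_2\cdot ent_1\cdot g_1\cdot act_1\cdot u_1\cdot ent_0$ from the text.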

## Putting it all together

A world model W is a structure containing all of the above described data. If we need to talk about the objects or relations of more than one world model then we may write $W_1.Objects$; otherwise, to decrease notational clutter, when we consider a fixed world model then the names of the sets, relations and functions above refer to the components of that model.

# Semantics

In what follows, let $W$ be a fixed world model with the associated sets and relations discussed in the previous section, and let $I$ be an interpretation of identifiers as objects in $W$.

The annotations [WF] refer to well-formedness constraints that could in principle be enforced by requiring a consistent typing of identifiers (i.e. by assuming that each identifier is associated with and can only denote an element of Entity, Agent, Activity, Event, Association or Note). These could be factored out or eliminated for readability.

## Interpretations

Note that the above discussion does not mention identifiers. We need to add structure to the world to link identifiers to the objects they denote. We do this using a function which we shall call an interpretation.

The mapping from identifiers to objects does not change over time. Thus, we consider interpretations as follows:

• An interpretation function $I : Identifiers \to Objects$ describing which object is the target of each identifier. This is time-independent.

## Satisfaction

Consider an assertion $\phi$, a world $W$ and an interpretation $I$. We define notation $W,I \models \phi$ which means that $\phi$ is satisfied in $W,I$. For basic assertions, the definition of the satisfaction relation is given in the next few subsections. For a conjunction of assertions $\phi_1,\ldots,\phi_n$ we write $W,I \models \phi_1,\ldots,\phi_n$ to indicate that $W,I \models \phi_1$ and ... and $W,I \models \phi_n$ hold.

## Attribute validity

Objects have attributes, and many formulas carry a list of attribute-value pairs describing them.

We say that an object $obj$ has valid attributes $[attr_1=val_1,...]$ in world $W$ provided:

• for each attribute $attr_i$, we have $W.value(obj,attr_i) = val_i$.

This is sometimes abbreviated as: $valid(W,obj,attrs)$
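A minimal Python sketch of this check, assuming the $value$ function of a world is represented as a dictionary keyed by (object, attribute) pairs and an attribute list as a dictionary; the function name `valid` mirrors the abbreviation above, and the sample data is hypothetical.

```python
def valid(value, obj, attrs):
    """valid(W, obj, attrs): every asserted attribute-value pair
    agrees with the world's value function."""
    return all(
        (obj, attr) in value and value[(obj, attr)] == val
        for attr, val in attrs.items()
    )


# Hypothetical value function of a world with a single object "e1".
value = {("e1", "type"): "document", ("e1", "version"): "2"}

print(valid(value, "e1", {"type": "document"}))  # asserted value matches
print(valid(value, "e1", {"type": "report"}))    # asserted value differs
print(valid(value, "e1", {}))                    # empty list is trivially valid
```

Note that an attribute list may mention fewer attributes than the object has; only the asserted pairs are checked.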

## Semantics of Element Records

### Entity Records

PROV-DM refers to entity assertions $entity(id,attrs)$.

Entity assertions $entity(id,attrs)$ can be interpreted as follows:

• $W,I \models entity(id,attrs)$ if and only if:
1. [WF] $id$ denotes an entity $ent = I(id) \in Entities$
2. the attributes are valid: $valid(W,ent, attrs)$.

### Activity Records

An activity record is of the form $activity(id,st,et,attrs)$ where $id$ is an identifier referring to the activity, $st$ is a start time and $et$ is an end time. There is also a set of attribute-value pairs describing additional features of the activity. We do not interpret any plan associated with the activity here; it is purely an annotation (e.g. pointing to a web page describing the program or service that was executed, to source code, or to some other information that might be useful).

• We say that $W,I \models activity(id,st,et,attrs)$ if and only if:
1. [WF] The identifier $id$ maps to an activity $act = I(id) \in Activities$
2. If $st$ is specified then it is equal to the start time of the activity, that is: $min(lifetime(act)) = st$
3. If $et$ is specified then it is equal to the end time of the activity, that is: $max(lifetime(act)) = et$
4. The attributes are valid: $valid(W,act,attrs)$.

### Agent Records

For the purpose of the semantics so far, an agent record holds at a time instant or interval if the corresponding entity record holds and the entity denoted by the record identifier is an agent.

Agent assertions $agent(id,attrs)$ can be interpreted as follows:

• $W,I \models agent(id,attrs)$ if and only if:
1. [WF] $id$ denotes an agent $ag = I(id) \in Agents$
2. The attributes are valid: $valid(W,ag,attrs)$.

### Note records

Out of scope of the formalization.

## Semantics of Relations

### Entity-Activity

#### Generation

The generation assertion is of the form $wasGeneratedBy(id,e,a,t,attrs)$ where $id$ is an event (identifier), $e$ is an object identifier, $a$ is an activity identifier, $attrs$ is a set of attribute-value pairs, and $t$ is an optional time.

• $W,I \models wasGeneratedBy(id,e,a,t,attrs)$ if and only if:
1. [WF] The identifier $id$ denotes an event $evt = I(id) \in Events$
2. [WF] The identifier $e$ denotes an object $obj = I(e) \in Objects$
3. [WF] The identifier $a$ denotes an activity $act = I(a) \in Activities$.
4. The event $evt$ is involved in $act$, that is, $(evt,act) \in EventActivities$.
5. The type of $evt$ is $generate$, i.e. $type(evt) = generate$
6. The event $evt$ occurred at time $t$, i.e. $time(evt) = t$
7. [Redundant?] The object $obj$ came into existence at time $t$, that is, $min(lifetime(I(e))) = t$.
8. The event $evt$ generated $obj$, i.e. $(evt,obj) \in Generated$.
9. The attribute values are valid: $valid(W,evt,attrs)$
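The clauses above translate almost line by line into a check over a finite world. The sketch below is only an illustration under several assumptions: the `World` class and its field names are hypothetical, times are integers, lifetimes are non-empty sets, the time $t$ is always supplied, and the event type label `"generate"` follows the type set given in the Events section.

```python
from dataclasses import dataclass, field


@dataclass
class World:
    """A hypothetical finite world model; field names mirror the text."""
    events: set = field(default_factory=set)
    objects: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    event_activities: set = field(default_factory=set)  # (evt, act) pairs
    generated: set = field(default_factory=set)         # (evt, obj) pairs
    event_type: dict = field(default_factory=dict)      # evt -> type label
    time: dict = field(default_factory=dict)            # evt -> instant
    lifetime: dict = field(default_factory=dict)        # obj -> set of instants
    value: dict = field(default_factory=dict)           # (obj, attr) -> value


def was_generated_by(w, interp, id_, e, a, t, attrs):
    """Check W,I |= wasGeneratedBy(id,e,a,t,attrs), clause by clause."""
    evt, obj, act = interp.get(id_), interp.get(e), interp.get(a)
    return (
        evt in w.events                           # clause 1
        and obj in w.objects                      # clause 2
        and act in w.activities                   # clause 3
        and (evt, act) in w.event_activities      # clause 4
        and w.event_type.get(evt) == "generate"   # clause 5
        and w.time.get(evt) == t                  # clause 6
        and min(w.lifetime[obj]) == t             # clause 7
        and (evt, obj) in w.generated             # clause 8
        and all(w.value.get((evt, k)) == v for k, v in attrs.items())  # clause 9
    )


w = World(
    events={"g"}, objects={"e"}, activities={"a"},
    event_activities={("g", "a")}, generated={("g", "e")},
    event_type={"g": "generate"}, time={"g": 5},
    lifetime={"e": {5, 6, 7}}, value={("g", "role"): "output"},
)
interp = {"gen1": "g", "ent1": "e", "act1": "a"}
print(was_generated_by(w, interp, "gen1", "ent1", "act1", 5, {"role": "output"}))
```

The same shape of check would apply to the other relation records, with the relevant relation and event type substituted.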

#### Use

The use assertion is of the form $used(id,a,e,t,attrs)$ where $id$ denotes an event, $a$ is an activity identifier, $e$ is an object identifier, $attrs$ is a set of attribute-value pairs, and $t$ is an optional time.

• $W,I \models used(id,a,e,t,attrs)$ if and only if:
1. [WF] The identifier $id$ denotes an event $evt = I(id) \in Events$
2. [WF] The identifier $a$ denotes an activity $act = I(a) \in Activities$
3. [WF] The identifier $e$ denotes an object $obj = I(e) \in Objects$
4. The event $evt$ is part of $act$, i.e. $(evt,act) \in EventActivities$.
5. The type of $evt$ is $use$, i.e., $type(evt) = use$.
6. The event $evt$ occurred at time $t$, i.e. $time(evt) = t$
7. [Redundant?] The entity $obj$ existed at time $t$, i.e., $t \in lifetime(obj)$.
8. The event $evt$ used $obj$, i.e. $(evt,obj) \in Used$.
9. The attribute values are valid: $valid(W,evt,attrs)$

### Agent-Activity

#### Association Records

An association record has the form $wasAssociatedWith(id,a,ag,pl,attrs)$.

• $W,I \models wasAssociatedWith(id,a,ag,pl,attrs)$ holds if and only if:
1. [WF] $id$ denotes an association $assoc = I(id) \in Associations$.
2. [WF] $a$ denotes an activity $act = I(a) \in Activities$.
3. [WF] $ag$ denotes an agent $agent = I(ag) \in Agents$.
4. [WF] $pl$ denotes a plan $plan=I(pl) \in Plans$.
5. The association associates the agent with the activity and plan, i.e. $(assoc,agent,act,plan) \in AssociatedWith$.
6. The attributes are valid: $valid(W,assoc,attrs)$.

#### Start Records

A start record $wasStartedBy(id,a,ag,attrs)$ is interpreted as follows:

• $W,I \models wasStartedBy(id,a,ag,attrs)$ holds if and only if:
1. [WF] $id$ denotes an event $evt = I(id) \in Events$ that is also an association $evt \in Associations$
2. [WF] $a$ denotes an activity $act = I(a)$
3. [WF] $ag$ denotes an agent $agent = I(ag)$
4. The event $evt$ has type $start$, i.e. $type(evt) = start$.
5. The agent was associated with the activity, i.e. $(evt,agent,act) \in AssociatedWith$
6. The event was part of the activity, that is, $(evt,act) \in EventActivities$, and $min(lifetime(act)) = time(evt)$.
7. The attributes are valid: $valid(W,evt,attrs)$.

#### End Records

An activity end record $wasEndedBy(id,a,ag,attrs)$ is interpreted as follows:

• $W,I \models wasEndedBy(id,a,ag,attrs)$ holds if and only if:
1. [WF] $id$ denotes an event $evt = I(id) \in Events$ that is also an association $evt \in Associations$
2. [WF] $a$ denotes an activity $act = I(a)$
3. [WF] $ag$ denotes an agent $agent = I(ag)$
4. The event $evt$ has type $end$, i.e. $type(evt) = end$.
5. The agent was associated with the activity, i.e. $(evt,agent,act) \in AssociatedWith$
6. The event was part of the activity, that is, $(evt,act) \in EventActivities$, and $max(lifetime(act)) = time(evt)$.
7. The attributes are valid: $valid(W,evt,attrs)$.

### Agent-Agent or Entity-Entity

#### Responsibility

The $actedOnBehalfOf(id,ag2,ag1,a,attrs)$ relation is interpreted using the $ActsFor$ relation as follows:

• $W,I \models actedOnBehalfOf(id,ag2,ag1,a,attrs)$ holds if and only if:
1. [WF] $id$ denotes an association $assoc=I(id) \in Associations$ with $type(assoc) = responsibility$.
2. [WF] $a$ denotes an activity $act=I(a) \in Activities$.
3. [WF] $ag1,ag2$ denote agents $agent1=I(ag1), agent2=I(ag2) \in Agents$.
4. The agent $agent2$ acts for the agent $agent1$ with respect to the activity $act$, i.e. $(agent2,agent1,act) \in ActsFor$.
5. [Redundant?] The association $assoc$ associates both agents with the activity, i.e. $(assoc,agent1,act),(assoc,agent2,act) \in AssociatedWith$.
6. The attributes are valid: $valid(W,assoc,attrs)$.

#### Derivation

##### Precise-1

A precise-1 derivation record has the form $wasDerivedFrom(id,e2,e1,a,g,u,attrs)$.

• $W,I \models wasDerivedFrom(id,e2,e1,a,g,u,attrs)$ if and only if:
1. [WF] $id$ denotes a derivation $deriv = I(id) \in Derivations$
2. [WF] $e1,e2$ denote entities $ent1 = I(e1), ent2=I(e2) \in Entities$
3. [WF] $a$ denotes an activity $act = I(a) \in Activities$
4. [WF] $g$ denotes a generation event $gen = I(g) \in Events$ and $type(gen) = generate$
5. [WF] $u$ denotes a use event $usage = I(u) \in Events$ and $type(usage) = use$
6. The derivation denotes a valid one-step derivation $derivedFrom(deriv) = ent2 \cdot gen \cdot act \cdot usage \cdot ent1$
7. The attribute values are valid: $valid(W,deriv,attrs)$.

##### Imprecise-1

An imprecise-1 derivation record has the form $wasDerivedFrom(id,e2,e1,t,[prov:steps=single,attrs])$.

• $W,I \models wasDerivedFrom(id,e2,e1,t,[prov:steps=single,attrs])$ if and only if there exist $act \in Activities$, $g \in Events$, and $u \in Events$ such that:
1. [WF] $id$ denotes a derivation $deriv = I(id) \in Derivations$
2. [WF] $e1,e2$ denote entities $ent1 = I(e1), ent2=I(e2) \in Entities$
3. $type(u) = use$ and $type(g) = generate$ and $time(g) = t$
4. $derivedFrom(deriv) = ent2 \cdot g \cdot act \cdot u \cdot ent1$
5. The attribute values are valid: $valid(W,deriv,attrs)$.

##### Imprecise-n

An imprecise-n derivation record has the form $wasDerivedFrom(id,e2,e1,t,[prov:steps=any,attrs])$.

• $W,I \models wasDerivedFrom(id,e2,e1,t,[prov:steps=any,attrs])$ if and only if there exists $path \in DerivationPaths$ such that:
1. [WF] $id$ denotes a derivation $deriv = I(id) \in Derivations$
2. [WF] $e1,e2$ denote entities $ent1 = I(e1), ent2=I(e2) \in Entities$
3. $derivedFrom(deriv) = path$, where $path = ent2 \cdot g \cdot w \cdot ent1$ for some event $g$ and some sequence $w$, and $time(g) = t$
4. The attribute values are valid: $valid(W,deriv,attrs)$.

#### Specialization

The $specializationOf(e1,e2)$ relation indicates when one entity record "provides a more concrete characterization of" another. This concept (like alternateOf) is in flux, but we attempt to provide a formalization to aid discussion.

• $W,I \models specializationOf(a,b)$ if and only if:
1. [WF] Both $a$ and $b$ are entity identifiers, denoting $obj_1 = I(a)$ and $obj_2 = I(b)$.
2. The two Entities refer to the same Thing, that is, $thingOf(obj_1) = thingOf(obj_2)$.
3. The lifetime of $obj_1$ is contained in that of $obj_2$, i.e. $lifetime(obj_1) \subseteq lifetime(obj_2)$.
4. Every attribute $attr$ of $obj_2$ is also an attribute of $obj_1$, and for each such attribute we have $value(obj_1,attr) = value(obj_2,attr)$.

The second criterion says that the two Entities are descriptions of the same Thing. Note that the third criterion allows $obj_1$ and $obj_2$ to have the same lifetime (or that of $obj_2$ can be larger). The last criterion allows $obj_1$ to define more attributes than $obj_2$, but they must agree on their common attributes.

Question: There has been controversy over whether $specializationOf$ is transitive and/or antisymmetric:

• Transitivity: If $specializationOf(a,b)$ and $specializationOf(b,c)$ hold then $specializationOf(a,c)$ holds. This holds for the above definition.
• Antisymmetry: If $specializationOf(a,b)$ and $specializationOf(b,a)$ hold then $a=b$. This doesn't follow from the current definition (but it would if we stipulated that two entities that have the same lifetime, attributes and thing are equal).
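Both observations can be tried out on a small example. The Python sketch below represents each entity as a hypothetical triple of its thing, its lifetime (a set of integer instants) and its attributes; `ENTITIES` and `specialization_of` are illustrative names, and the check implements criteria 2 to 4 of the definition above.

```python
# Hypothetical entities described by (thing, lifetime, attributes).
ENTITIES = {
    "a": ("doc", {2}, {"author": "alice", "version": "draft", "lang": "en"}),
    "b": ("doc", {1, 2}, {"author": "alice", "version": "draft"}),
    "b2": ("doc", {1, 2}, {"author": "alice", "version": "draft"}),
    "c": ("doc", {1, 2, 3}, {"author": "alice"}),
}


def specialization_of(x, y):
    """Same thing, contained lifetime, and x agrees with every attribute of y."""
    thing1, life1, attrs1 = ENTITIES[x]
    thing2, life2, attrs2 = ENTITIES[y]
    return (
        thing1 == thing2
        and life1 <= life2  # subset test on sets of instants
        and all(k in attrs1 and attrs1[k] == v for k, v in attrs2.items())
    )


# Transitivity on this example: a -> b -> c and a -> c.
print(specialization_of("a", "b"), specialization_of("b", "c"),
      specialization_of("a", "c"))

# Antisymmetry fails: b and b2 specialize each other but are distinct entities.
print(specialization_of("b", "b2") and specialization_of("b2", "b"))
```

The entities `b` and `b2` are indistinguishable by thing, lifetime and attributes, which is exactly the case in which antisymmetry would require an additional stipulation.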

#### Alternate

The $alternateOf$ relation indicates when two entity records "provide different characterizations of the same thing".

• $W,I \models alternateOf(a,b)$ if and only if:
1. [WF] Both $a$ and $b$ are entity identifiers, denoting $obj_1 = I(a)$ and $obj_2 = I(b)$.
2. The lifetime of $obj_1$ overlaps that of $obj_2$. That is, $lifetime(obj_1) \cap lifetime(obj_2) \neq \emptyset$.
3. The two objects refer to the same underlying Thing: $thingOf(obj_1) = thingOf(obj_2)$
4. For every attribute $attr$ common to both $obj_1$ and $obj_2$, we have $value(obj_1,attr) = value(obj_2,attr)$.

Question: There has been controversy whether $alternateOf$ is symmetric and transitive:

• Symmetry: If $alternateOf(a,b)$ holds then $alternateOf(b,a)$ holds.
• Transitivity: If $alternateOf(a,b)$ and $alternateOf(b,c)$ hold then $alternateOf(a,c)$ holds. This does not hold for the above definition because we require the intervals to overlap ($a$ and $c$ do not necessarily have overlapping lifetimes).

We also consider the following properties which have been suggested:

• $specializationOf(e_1,e_2)$ implies $alternateOf(e_1,e_2)$? (This holds at the moment.)
• $alternateOf(a, b)$ if and only if there exists c such that $specializationOf(a,c)$ and $specializationOf(b,c)$? (This does not necessarily hold without further assumptions about the Entities).

Problem: The above definition is symmetric. However, it is not transitive. Suppose

• $lifetime(obj_1) = [1,2]$
• $lifetime(obj_2) = [2,3]$
• $lifetime(obj_3) = [3,4]$

and suppose $alternateOf(obj_1,obj_2)$ and $alternateOf(obj_2,obj_3)$ hold. Then $alternateOf(obj_1,obj_3)$ cannot hold since the intervals of $obj_1,obj_3$ do not overlap.
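The counterexample can be replayed in a few lines of Python. The sketch assumes integer time instants, so that $[1,2]$ is the set $\{1,2\}$, and gives all three entities the same thing and attributes; `ENTITIES` and `alternate_of` are hypothetical names implementing criteria 2 to 4 of the definition above.

```python
# Hypothetical entities described by (thing, lifetime, attributes).
ENTITIES = {
    "obj1": ("doc", {1, 2}, {"author": "alice"}),
    "obj2": ("doc", {2, 3}, {"author": "alice"}),
    "obj3": ("doc", {3, 4}, {"author": "alice"}),
}


def alternate_of(x, y):
    """Overlapping lifetimes, same thing, agreement on common attributes."""
    thing1, life1, attrs1 = ENTITIES[x]
    thing2, life2, attrs2 = ENTITIES[y]
    common = attrs1.keys() & attrs2.keys()
    return (
        bool(life1 & life2)
        and thing1 == thing2
        and all(attrs1[k] == attrs2[k] for k in common)
    )


print(alternate_of("obj1", "obj2"))  # lifetimes share instant 2
print(alternate_of("obj2", "obj3"))  # lifetimes share instant 3
print(alternate_of("obj1", "obj3"))  # lifetimes are disjoint
```

Since each criterion is symmetric in its two arguments, swapping the arguments of `alternate_of` never changes the result.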

### Annotation records

As with note records, annotation records are considered out of scope of the formal semantics.

# Semantics of Bundles and Contextualization

DRAFT - VERY QUICK AND DIRTY!

## Bundles

A bundle is a named set of PROV assertions.

$\begin{array}{rcl} bundle &::=& bundle~id ~formula\_set ~ endBundle\\ formula\_set &::=& \{formula_1,...,formula_n\} \end{array}$


To model this, fixing a set of entities $Entities$, we define a new structure called a *contextual world*, consisting of a default world $D$, a set of *contexts* $Contexts$, and a function $worldOf : Contexts \to Worlds$ assigning each $c \in Contexts$ a world $worldOf(c)$.

This is analogous to a SPARQL Dataset, as I understand it (but haven't checked carefully). The main world corresponds to the toplevel assertions and the others correspond to named graphs.

### Semantics in the presence of Bundles

To determine whether a top-level PROV statement holds in a contextual world $(D,Contexts,worldOf)$, we add the contextual world as a parameter and test satisfaction in the default world:

$CW,W,I \models formula \iff ...$

These are defined as before, ignoring the contextual world $CW$. We start with $W = D$ to evaluate top-level assertions.

To determine whether a bundle is satisfied in a contextual world, we simply check that the set of statements it names is satisfied in the named world.

$CW,W,I \models bundle(b,\{formula_1,...,formula_n\}) \iff CW,CW.worldOf(I(b)),I \models formula_1 \wedge ... \wedge CW,CW.worldOf(I(b)),I \models formula_n$

NB. Contexts are entities. The entities of the contextual world are the union of those in the default worlds and context worlds, and the contexts themselves. The interpretation function maps identifiers to these entities; however, different things may be known about the entities in different worlds (corresponding to the toplevel and bundles). The identifiers of bundles are mapped to Context entities.

## Contextualization

The contextualization relation $contextualizationOf(e,e_b,b)$ links an entity $e$ declared in the current bundle with another entity $e_b$ in a remote bundle $b$.

### Semantics of contextualization

Proposal 1:

It seems that to define what we mean by contextualization we need to keep track of the current context as well as the contextual world.

• $(D,Contexts,worldOf),W,I \models contextualizationOf(e,e_b,b)$ holds if and only if:
1. $e$ identifies an entity in $W$, i.e. $obj = I(e) \in W.Entities$.
2. $b$ identifies a context $ctx$, i.e. $ctx = I(b) \in Contexts$, which in turn identifies a world $W_b = worldOf(ctx)$
3. $e_b$ identifies an entity in $worldOf(I(b))$, i.e. $obj_b = I(e_b) \in worldOf(ctx).Entities$
4. The two Entities refer to the same Thing, that is, $thingOf(obj) = thingOf(obj_b)$.
5. The lifetime of $obj$ in the current world is contained in that of $obj_b$, i.e. $W.lifetime(obj) \subseteq W_b.lifetime(obj_b)$.
6. Every attribute $attr$ of $obj_b$ in $W_b$ is also an attribute of $obj$ in $W$, and for each such attribute we have $W.value(obj,attr) = W_b.value(obj_b,attr)$.

This is not quite the same as specialization, since the attributes of the different (related) entities are in different worlds: those of $e$ are in the current world, and those of $e_b$ are in the remote world.

Need to be careful about the Things - these are also global, not local to worlds.