An alternative to model Roles

From Provenance WG Wiki
Revision as of 17:50, 12 December 2011 by Tlebo

Title: Using RDFS Subproperties to Model Roles of Involvement; a PROV-O Compliant Abbreviation.

authors: Daniel Garijo, Tim Lebo.

contributors: Yolanda Gil.

Objective

Roles are modeled in PROV-O as links from qualified involvements. The example below, based on the PROV-O modeling of roles, describes an Activity (:a) that uses two Entities (:i1 and :p1), one as io:input and the other as io:parameters. These roles are defined in a common, third-party ontology and are associated with our entities :i1 and :p1 through the prov:qualifiedUsage construct.

:a
  a prov:Activity;

  prov:used :i1;
  prov:qualifiedUsage [
     a prov:Usage;
     prov:qualifiedEntity   :i1;
     prov:hadRole            io:input;
  ];

  prov:used :p1;
  prov:qualifiedUsage [
     a prov:Usage;
     prov:qualifiedEntity   :p1;
     prov:hadRole            io:parameters;
  ];
.

The previous modeling requires an additional OWL class (prov:Usage) and two additional ObjectProperties (prov:qualifiedUsage and prov:hadRole) to qualify how the Activity used the Entities, making the model larger and more complex. The size and complexity increase further because a new class (e.g. Control) and a new ObjectProperty (e.g. qualifiedControl) must be created to qualify each of PROV-O's binary relations (wasControlledBy, generated, hadParticipant, etc.).

The modeling above does offer an advantage: arbitrary descriptions of used, wasControlledBy, and the other binary relations of PROV-O can be added by describing the instances of QualifiedInvolvement (Usage, Control, etc.). For example, someone concerned about the port number of the input can add an extra triple (here ex:portNumber), as in:

:a
  a prov:Activity;

  prov:used :i1;
  prov:qualifiedUsage [
     a prov:Usage;
     prov:qualifiedEntity   :i1;
     prov:hadRole            io:input;
     ex:portNumber          3456;
  ];
.

The motivation for the alternative proposed here is the scenario in which the user wants to qualify the "used" or "wasGeneratedBy" relationship with a role but does not want to attach any other metadata to the relationship: no time, location, description, etc., just the role played by the artifact or process. This is a very common scenario in the scientific domain, where scientists are interested in knowing the data dependencies and roles but do not care about other details.

In what follows, we offer an abbreviated model for Roles of prov:used and prov:wasGeneratedBy that conforms to the PROV-O modeling above while covering a very common use case that does not need the full complexity of QualifiedInvolvements. It is offered as a proposed Best Practice.

An alternative to qualifiers

This best practice models roles by extending the usage and generation properties with domain-specific RDFS subproperties. An example is illustrated in the figure below:

RoleModeling.jpg

SUGGEST (TIM): make subproperties wasGeneratedAsDataBy and wasGeneratedAsMetadataBy to illustrate multiple out modeling. (I don't understand this very well (DANIEL))

In this example both "prov:used" and "prov:wasGeneratedBy" have been extended for the current domain with subproperties that add the names of the roles to the edges. We suggest the ex:usedAsX pattern (or the ex:wasGeneratedByAsX pattern), where X is the domain-specific role.
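Since the figure is not reproduced in text, the following Turtle sketch spells out the ex:usedAsX pattern for the running example (:a, :i1, :p1). The property names ex:usedAsInput and ex:usedAsParameters, the two example.org namespace IRIs, the dcterms:title values, and the choice of http://www.w3.org/ns/prov# as the prov: namespace are assumptions made for illustration; they are not taken from the figure.

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix prov:    <http://www.w3.org/ns/prov#> .   # assumed namespace IRI
@prefix ex:      <http://example.org/roles#> .    # hypothetical domain ontology
@prefix :        <http://example.org/data#> .     # hypothetical instance data

# Domain ontology: one subproperty of prov:used per role.
ex:usedAsInput
  rdfs:subPropertyOf prov:used ;
  dcterms:title      "input" .

ex:usedAsParameters
  rdfs:subPropertyOf prov:used ;
  dcterms:title      "parameters" .

# Instance data: the role is carried by the property itself,
# so no prov:Usage node is needed.
:a
  a prov:Activity ;
  ex:usedAsInput      :i1 ;
  ex:usedAsParameters :p1 .
```

Under RDFS entailment, :a ex:usedAsInput :i1 entails :a prov:used :i1, so a consumer that only understands prov:used still sees the dependency, although without the role.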

Discovering role information

As the previous figure shows, we do not need a qualified involvement just to add roles to the edges. However, how do we then discover information about the role? Is it more complex?

To answer this question we have to query both the domain ontology and the instances. Given a known entity, the following SPARQL query retrieves the process (or processes) that used it, along with additional information about the role.

select ?process ?usedRole ?info where{
 ?process ?usedRole example:artifactID .
 ?usedRole rdfs:subPropertyOf prov:used .
 ?usedRole dcterms:title ?info .
}

(Assuming that the additional information is described with a dcterms:title).

TODO(Suggested by Tim): get the query to work for BOTH specializations and the standard.

Proposed query (Daniel). We don't know which approach has been taken, and we are interested in the name of the role:

select ?process ?roleTitle where{
 ?process ?role example:artifactID .
 OPTIONAL{
   ?process prov:qualifiedUsage ?usage .
   ?usage prov:qualifiedEntity example:artifactID .
   ?usage prov:hadRole ?r .
   ?r dcterms:title ?roleTitle .
 }
 OPTIONAL{
   ?role rdfs:subPropertyOf prov:used .
   ?role dcterms:title ?roleTitle .
 }
}
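One possible way to approach the TODO above, offered only as a sketch and not as the authors' solution, is to materialize the plain prov:used triple implied by each role-specific subproperty, so that queries written against the standard property also see data published with the abbreviation. The prov: namespace IRI below is an assumption, and the variable names are illustrative.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>   # assumed namespace IRI

# For every edge whose property is declared a subproperty of prov:used,
# emit the corresponding standard prov:used edge.
CONSTRUCT { ?process prov:used ?entity . }
WHERE {
  ?process ?usedAsRole ?entity .
  ?usedAsRole rdfs:subPropertyOf prov:used .
}
```

This only follows direct subproperty declarations. If roles are arranged in a deeper hierarchy, an RDFS reasoner or a SPARQL 1.1 property path (rdfs:subPropertyOf+) would be needed instead.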