Difference between revisions of "Provenance Best Practice"

From MultilingualWeb-LT EC Project Wiki
Revision as of 01:30, 1 February 2013

1 Scope

This best practice document explains how the provRef attribute of the ITS 2.0 Provenance data category can be used in conjunction with external provenance records that conform to the W3C PROV recommendation.

The ITS 2.0 Provenance data category allows inline identification of the people, organisations and tools/services that were involved in the translation or translation revision of the annotated content. The inline provenance annotation does not support recording the timing of translation or translation revision, or additional attributes of those activities, nor does it record provenance information for other types of internationalization and localization activities. For such use cases the provRef attribute can be used to point to such information in external provenance records. The ITS specification recommends the use of the W3C PROV specification for such records. This note therefore describes best practice for structuring PROV-conformant external records.
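As a rough illustration of how a consumer might locate the external records, the sketch below collects provRef values from inline markup. It assumes the HTML serialization of ITS 2.0, where the attribute is written its-prov-ref and holds a space-separated list of IRIs; the sample paragraph, the tool and reviewer names and the prov.example.org URI are invented for illustration only.

```python
from html.parser import HTMLParser


class ProvRefCollector(HTMLParser):
    """Collect (tag, [IRIs]) pairs for elements that carry its-prov-ref."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names before calling this hook.
        for name, value in attrs:
            if name == "its-prov-ref" and value:
                # The attribute value is a space-separated list of IRIs.
                self.refs.append((tag, value.split()))


def collect_prov_refs(html_text):
    """Return every (tag, IRI list) pair found in the given HTML text."""
    parser = ProvRefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.refs


# Hypothetical annotated content: inline tool and reviewer identification,
# plus a pointer to an external provenance record.
SAMPLE = (
    '<p its-tool="ExampleMT" its-rev-person="Jane Reviewer" '
    'its-prov-ref="http://prov.example.org/store/entity/seg1">'
    "Texte traduit</p>"
)

if __name__ == "__main__":
    for tag, iris in collect_prov_refs(SAMPLE):
        print(tag, iris)
```

The inline attributes identify who or what was involved; anything about timing or other activity types would have to be fetched from the records behind the collected IRIs.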

1.1 The W3C Provenance Working Group

The Provenance WG has produced a set of specifications commonly referred to as 'PROV'. It consists of:

  • A PROV Primer
  • PROV-DM, the PROV data model for provenance
  • PROV-CONSTRAINTS, a set of constraints applying to the PROV data model
  • PROV-N, a notation for provenance aimed at human consumption
  • PROV-O, the PROV ontology, an OWL2 ontology allowing the mapping of PROV to RDF
  • PROV-AQ, the mechanisms for accessing and querying provenance

As there is a growing interest in the use of RDF by the L10N and I18N community, the rest of this document will focus on the use of the RDF mapping of PROV.

The Provenance data category identifies the selected content as corresponding to an entity in a provenance record by specifying the provenance URI of that entity, as defined in PROV-AQ. Such an entity provenance record can carry additional attributes characterizing the content it represents. Entities in a provenance record can be associated with provenance activities, which represent processes that either used or generated the entity. Example activity types include named entity recognition, source QA, machine translation, postediting and target QA. Provenance records can also specify agents that play a role in an activity and therefore bear some responsibility for it having taken place; that responsibility can in turn be expressed by attributing the entity to the agent. Examples of agent types are: people acting as translators or posteditors; pieces of software such as machine translation engines, text analytics services or CAT tools; and organizations such as Language Service Providers. Provenance records can also associate timings with entity generation and usage events, and can record derivation or collection relationships between entities.
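To make the entity, activity and agent relationships concrete, the following sketch serializes one small record as N-Triples using PROV-O terms (prov:used, prov:wasGeneratedBy, prov:wasDerivedFrom, prov:wasAssociatedWith, prov:wasAttributedTo, prov:startedAtTime, prov:endedAtTime). It is only one possible shape for such a record: the prov.example.org store, the resource names and the timestamps are hypothetical, and the standard library is used instead of an RDF toolkit to keep the example self-contained.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"
EX = "http://prov.example.org/store/"  # hypothetical provenance store


def iri(value):
    """Wrap an IRI in N-Triples angle brackets."""
    return "<" + value + ">"


def date_time(value):
    """Build an xsd:dateTime typed literal in N-Triples syntax."""
    return '"' + value + '"^^' + iri(XSD_DATETIME)


def build_record():
    """Return triples for a postediting activity on one translated segment."""
    entity = iri(EX + "entity/seg1")          # the postedited target text
    source = iri(EX + "entity/seg1-mt")       # the raw MT output it revises
    activity = iri(EX + "activity/postedit1")
    agent = iri(EX + "agent/posteditor1")
    return [
        (entity, iri(RDF_TYPE), iri(PROV + "Entity")),
        (source, iri(RDF_TYPE), iri(PROV + "Entity")),
        (activity, iri(RDF_TYPE), iri(PROV + "Activity")),
        (agent, iri(RDF_TYPE), iri(PROV + "Person")),
        # The activity used one entity and generated another.
        (activity, iri(PROV + "used"), source),
        (entity, iri(PROV + "wasGeneratedBy"), activity),
        (entity, iri(PROV + "wasDerivedFrom"), source),
        # The agent took part in the activity, so the entity is attributed to it.
        (activity, iri(PROV + "wasAssociatedWith"), agent),
        (entity, iri(PROV + "wasAttributedTo"), agent),
        # Timing, which the inline annotation cannot express.
        (activity, iri(PROV + "startedAtTime"), date_time("2013-02-01T09:00:00Z")),
        (activity, iri(PROV + "endedAtTime"), date_time("2013-02-01T09:12:30Z")),
    ]


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(" ".join(triple) + " ." for triple in triples)


if __name__ == "__main__":
    print(to_ntriples(build_record()))
```

In this sketch the provRef value on the annotated content would be the IRI of the seg1 entity, so a consumer can move from the inline markup to the activity, agent and timing information.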

2 External Provenance Usage Scenarios

This best practice document introduces the following ITS usage scenarios that can be complemented by use of external provenance records.

  • translation and translation agent review using the ITS provenance category
  • localisation quality assurance review recorded in external provenance records

This document also describes how external provenance records can be used with ITS mapped onto XLIFF.

It also indicates how external provenance records can be used with content that doesn't correspond to inline ITS markup. This is accomplished by using elements of the NLP Interchange Format.

Finally, it explains how to interlink external provenance records that relate to the same content in an L10N workflow but are stored in different triple stores.

2.1 Extended translation and translation review provenance

2.2 Localization Quality Assurance Provenance

2.3 PROV using NIF

2.4 PROV using XLIFF

2.5 Interlinking PROV records across triple stores

It is possible for multiple entity provenance records pertaining to the same content to co-exist. This may be because two organizations record differing views of the provenance of the same content. For example, a localization client may view the whole localization workflow resulting in translated content as a single step, whereas a language service provider may record details of the QA process conducted before that same content was delivered. Therefore, document content may be associated with more than one entity provenance URI, each potentially from a different provenance store.
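As a small sketch of what this means for a consumer, the code below takes the provenance URIs attached to one piece of content and groups them by the store they appear to come from. Treating the URI host as the identity of the provenance store is an assumption made here for illustration, as are the client.example.org and lsp.example.com URIs.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_store(prov_ref):
    """Group a space-separated list of provenance URIs by host.

    Assumption: the host part of each URI stands in for the provenance
    store that holds the record.
    """
    stores = defaultdict(list)
    for uri in prov_ref.split():
        stores[urlsplit(uri).netloc].append(uri)
    return dict(stores)


# Hypothetical value: one record kept by the client, one by the LSP.
PROV_REF = (
    "http://client.example.org/prov/entity/doc7-seg1 "
    "http://lsp.example.com/prov/entity/job42-seg1"
)

if __name__ == "__main__":
    for store, uris in group_by_store(PROV_REF).items():
        print(store, uris)
```

Each group can then be resolved against its own store, which keeps the client's single-step view and the language service provider's detailed QA view separate while still tying both to the same content.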

3 Extension to PROV Schema

The basic PROV schema is structured according to the figure below (taken from the PROV Primer)