Bibliography Tags

From XG Provenance Wiki
Revision as of 11:30, 4 August 2010 by Bvillazo (Talk | contribs)


To prepare for our state-of-the-art report, we discussed how best to cross-reference existing work on provenance with our scenarios, provenance dimensions, and requirements. There are many pieces of relevant work; see the compilation of related work assembled by the group so far. A major concern was the large amount of effort required to cite all the relevant work appropriately.

We decided to annotate bibliography entries in the Mendeley collection with tags. Major advantages of this approach are that the work can be distributed among group members, that people outside the group will be able to contribute to the process, and that the collection of related work can continue to grow beyond the end date of the W3C Provenance Incubator Group.

There will be a period where only members of the group will be tagging the collection. After that, tagging will be open to anyone in the community and there will be no constraints on the process.

Current state of tagging process

Here is a summary of what still needs to be done.

Here is a simple frequency count of the tags used so far -- please note that this is currently updated manually (daily) until I find a way to automate the upload.

If anyone wants to update the spreadsheet:

1.- Download Mendeley Desktop.

2.- Access your sqlite file (you can see where it is located here).

3.- Query the sqlite database. Example query:

select tag, count(*) as count from DocumentTags where tag not in ('#tagged', '#provenance', 'prov-xg') group by tag

4.- Copy and paste the results into the spreadsheet.
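
The query in step 3 can also be run from a short script, which may help with automating the daily update. The following is a minimal sketch, not a tested integration with Mendeley: it assumes only what the example query shows (a DocumentTags table with a tag column). The documentId column, the demo data, and the function names tag_counts and demo_connection are hypothetical and exist only to make the example self-contained.

```python
import sqlite3

# Bookkeeping tags excluded from the count, copied from the example query.
EXCLUDED_TAGS = ("#tagged", "#provenance", "prov-xg")


def tag_counts(conn):
    """Return (tag, count) rows for every tag except the excluded ones."""
    placeholders = ", ".join("?" for _ in EXCLUDED_TAGS)
    query = (
        "SELECT tag, COUNT(*) AS n FROM DocumentTags "
        f"WHERE tag NOT IN ({placeholders}) "
        "GROUP BY tag ORDER BY n DESC, tag"
    )
    return conn.execute(query, EXCLUDED_TAGS).fetchall()


def demo_connection():
    """Build a tiny in-memory stand-in for the database (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # Assumption: only the 'tag' column is known from the example query;
    # 'documentId' is a made-up column for this demo.
    conn.execute("CREATE TABLE DocumentTags (documentId INTEGER, tag TEXT)")
    conn.executemany(
        "INSERT INTO DocumentTags VALUES (?, ?)",
        [(1, "#tagged"), (1, "Content"), (2, "Content"), (2, "Use"), (3, "prov-xg")],
    )
    return conn


if __name__ == "__main__":
    # For a real library, replace demo_connection() with
    # sqlite3.connect("<path to your Mendeley sqlite file>").
    for tag, count in tag_counts(demo_connection()):
        print(f"{tag}\t{count}")  # tab-separated, ready to paste into a spreadsheet
```

Printing tab-separated rows keeps the copy-and-paste step simple, since most spreadsheets split pasted text on tabs.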

Tagging Process

For the bibliography tagging done by the group we agreed to the following process:

  • Everyone in the group would be allowed to create tags for papers in the collection.
  • There would be a single requirement for tagging:
    • Every paper should be given at least one of the major provenance dimension tags:
      • Content
      • Management
      • Use
  • General to dos
    • Tag papers with one of the 3 scenarios when possible.
    • Tag provenance-related papers with the #provenance tag.
  • There would be no other mandatory requirements for the tagging process.
  • There would be no other standard tags; the tags provided below are simply suggestions.
  • We would revise the process if needed once we see how it works.
  • The goal is for each group member to tag at least three to five papers per week.
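
The single mandatory requirement (at least one of Content, Management, or Use on every paper) is easy to check mechanically. Below is a minimal sketch under stated assumptions: the input is a plain dictionary mapping paper titles to their tags, which is a hypothetical format and not something Mendeley exports directly, and the helper names normalize and missing_dimension are invented for illustration. Tags are compared case-insensitively and with any leading # removed.

```python
# The three major provenance dimension tags required by the process.
DIMENSION_TAGS = {"content", "management", "use"}


def normalize(tag):
    """Lower-case a tag and drop a leading '#', so '#Use' and 'use' compare equal."""
    return tag.lstrip("#").lower()


def missing_dimension(papers):
    """Return the sorted titles of papers that carry no dimension tag.

    papers: dict mapping a paper title to an iterable of its tags
            (hypothetical input format, for illustration only).
    """
    return sorted(
        title
        for title, tags in papers.items()
        if not DIMENSION_TAGS & {normalize(t) for t in tags}
    )


if __name__ == "__main__":
    example = {
        "Paper A": ["#tagged", "#Content", "OPM"],
        "Paper B": ["#tagged", "News_Aggregator"],
    }
    for title in missing_dimension(example):
        print(f"needs a dimension tag: {title}")
```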

How to tag entries in the collection

    • download and install the desktop Mendeley client at [1]
    • log in as provxg@googlemail.com. Please email pmissier@acm.org to get the password
    • hit "sync" on the toolbar to make sure your local working copy is current

Tagging existing entries

  • choose entries from the prov-xg public collection. Add tags in the right panel on your desktop, following the guidelines on this page.
  • please remember to add the special tag #tagged which makes it easy for us to count which papers have been tagged
  • please send email to the list with the list of papers you have tagged
  • at the end of the session, hit Sync again to make sure you publish your work
  • note that you may consult the attached pdf if available just by clicking on the pdf icon on the entry

Adding a new entry

There are several ways to add a new paper:

  1. File -> add entry manually. This is tedious.
  2. If you can start off with a BibTeX entry: paste the entry into a temporary file, save the file, then open it with File -> Add files from Mendeley Desktop.
  3. You can also install an auto-import feature that will grab citation info from a web page. Please see Tools -> install web importer and follow the instructions to add an entry.

The import actually happens on the web collection rather than on your desktop, so you will need to sync again to see the new entry locally.

 Important: auto-imported entries are almost always of poor quality. The good news is that they are automatically placed in the "needs review" collection, where you can find and edit them manually.
 When you do this you will find a "search by document title" button, which will do an exact title search on Google Scholar. That really helps you get a cleaner entry!
Please check your new entry for accuracy and curate as needed!

If you upload a paper but do not add tags to it, please add the #untagged tag so that everybody can easily see which papers are still untagged.

If you want to cite a technology, follow the same steps. The only difference is that, when you select the type of document you are uploading, you choose either computer program or web page.

Tag format

  • the recommended tag format is #<tag> (as in Twitter, for example), which makes it easier to distinguish tags from word occurrences anywhere else in the entry -- Mendeley has a flat index over the entire entry (and pdf text as well), so the # is used to improve precision in the search.
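
To illustrate why the # prefix improves precision, here is a toy model of a flat index. It is only a sketch: the entry fields (title, abstract, tags) and the substring search are assumptions made for the example and do not reproduce how Mendeley actually indexes or searches.

```python
def flat_text(entry):
    """Concatenate every field of an entry into one searchable string."""
    return " ".join([entry["title"], entry["abstract"], *entry["tags"]])


def search(entries, term):
    """Return the titles of entries whose flat text contains the term."""
    return [e["title"] for e in entries if term.lower() in flat_text(e).lower()]


if __name__ == "__main__":
    # Hypothetical entries: the first only mentions trust in its abstract,
    # the second is explicitly tagged #Trust.
    entries = [
        {"title": "Paper A", "abstract": "We discuss trust in web data.", "tags": ["#Process"]},
        {"title": "Paper B", "abstract": "A model of data quality.", "tags": ["#Trust"]},
    ]
    print(search(entries, "trust"))   # matches the abstract of A and the tag of B
    print(search(entries, "#trust"))  # matches only the tagged entry
```

Searching for the bare word hits any entry that mentions it, while the prefixed form only hits entries that were deliberately tagged.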

One concern is that in Mendeley there is no way to track the provenance of the tags, so we cannot see who contributed what.

Tag categories

There are five major suggested categories for tags:

  • Provenance dimensions
  • Scenarios
  • Technical solutions
  • Research areas
  • Application areas

These are just suggestions for tags; anyone should be able to use any tags they prefer.

We agreed to focus on the first two levels of the provenance dimensions used in the group's requirements report:

  • Content
    • Object
    • Attribution
    • Process
    • Evolution
    • Justification
    • Entailment
  • Management
    • Publication
    • Access
    • Dissemination_control
    • Scale
  • Use
    • Understanding
    • Interoperability
    • Comparison
    • Accountability
    • Trust
    • Imperfections
    • Debugging
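
The two levels listed above can be written down as a small lookup table, which makes it possible to derive the top-level dimension from a second-level tag. The sketch below is illustrative only: the helper with_parent_dimensions is hypothetical and not part of the agreed process. It adds the parent dimension (for example Use for Trust) when it is not already present, so that a paper tagged only at the second level still carries one of the major dimension tags.

```python
# First two levels of the provenance dimensions, as listed on this page.
DIMENSIONS = {
    "Content": ["Object", "Attribution", "Process", "Evolution",
                "Justification", "Entailment"],
    "Management": ["Publication", "Access", "Dissemination_control", "Scale"],
    "Use": ["Understanding", "Interoperability", "Comparison", "Accountability",
            "Trust", "Imperfections", "Debugging"],
}

# Reverse lookup: second-level tag -> top-level dimension.
PARENT = {child: parent
          for parent, children in DIMENSIONS.items()
          for child in children}


def with_parent_dimensions(tags):
    """Return the tags plus the parent dimension of each second-level tag.

    A parent is appended only if it is not already present; it takes a
    leading '#' when the second-level tag that triggered it has one.
    """
    result = list(tags)
    present = {t.lstrip("#") for t in result}
    for tag in list(result):
        parent = PARENT.get(tag.lstrip("#"))
        if parent and parent not in present:
            prefix = "#" if tag.startswith("#") else ""
            result.append(prefix + parent)
            present.add(parent)
    return result


if __name__ == "__main__":
    print(with_parent_dimensions(["#Trust", "OPM"]))
```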

The three scenarios used in the group's requirements report:

  • News_Aggregator
  • Disease_Outbreak
  • Business_Contract

User and technical requirements are labeled; the labels can be used as tags.

Technical solutions include:

  • OPM
  • PML
  • DC
  • PV
  • RDF_Named_Graphs

Research areas include:

  • Workflow_Provenance
  • Data_Provenance
    • Annotation_Models
    • Explicit_Provenance
      • Where_Provenance
    • Implicit_Provenance
      • How_Provenance
      • Why_Provenance
    • Lineage
  • Provenance_for_SPARQL
  • RDF_Provenance
  • Trust_Assessment