Warning:
This wiki has been archived and is now read-only.

Usecases/ShortestPath

From Property Graphs Model and API Community Group
Jump to: navigation, search

Shortest Path Usecases

Protein Interaction Paths

Use Case: Protein-protein interaction networks are scale-free and String (http://string-db.org/) for example, offers cross-species data to suggest new directions for exploration in a particular species.

One very common task in bioinformatics is to follow the interaction paths (edges) between two proteins and to select the path that has the highest confidence score. That is to say the edges in the graph are scored and one task is to find the best scoring path between two or more proteins.

A number of relationships are represented and weighted between proteins by the String 9.1 database but the details are beyond the scope of this use case.

Paths from protein A to Z. Fictional paths and weights: A - .05 - cc - .10 - dd - .05 - hetc. until it reaches Z Another fictional path.A - .10 - ee - .15 - jj - .05 - k etc. until it reaches Z. A weighted path collects the values of edges between proteins.The question is which path between A and Z has the greatest weight?

"...finding the best scoring path connecting two proteins in the STRING network is a crucial part of the NetworKIN algorithm ( Linding et al., 2007)." (1)

The authors imported a database with 20,140 proteins and 2.2 million interactions. There are confidence scores assigned to each interaction between proteins.

The advantage of a property graph database is that it stores its edges as directed pointers so traversal is much faster than the equivalent relational database. The same holds true for a weighted graph, except there a property graph can quickly find the weights for the various paths and return the best path between two proteins.


Neo4j, a popular property graph database outperformed PostgreSQL by a factor of 981x on the best scoring path test.

Property graphs, where appropriate, can be the targets of highly efficient algorithms designed for graph data.

References:

(1) Are graph databases ready for bioinformatics? Christian Theil Have and Lars Juhl Jensen. Bioinformatics (2013) doi: 10.1093/bioinformatics/btt549 http://dx.doi.org/10.1093/bioinformatics/btt549