From W3C Wiki

Pathology Radiology Correlation


John Madden (Pathology/Duke University), Daniel Rubin (Radiology/Stanford), M. Scott Marshall (co-chair HCLS IG), Matthias Samwald (DERI)


The American Cancer Society estimated nearly 1.4 million new cases of invasive breast cancer worldwide in 2008.[1] For 2009, the American Cancer Society estimated 192,370 new cases of invasive breast cancer among women in the United States, along with about 62,280 new cases of ductal carcinoma in situ, a noninvasive, early form of breast cancer. Each of these cases will have at least one biopsy, probably more than one (one diagnostic biopsy, at least one excision). Since about 80% of breast biopsies are benign, the total number of breast biopsies is well over 1 million annually in the US (see eMedicine overview).

In handling such a large number of biopsies, it is important to avoid misdiagnoses, which could lead to increased costs, both personal costs and costs resulting from inappropriate surgeries and therapies. Discrepancies between clinical reports can provide clues about potentially life-saving and cost-saving corrections, and offer a check on the validity of the information on which decisions about therapy and intervention are based. The discrepancies in our example use case below are between a radiological report (based on the mammogram) and the associated pathology report (based on a tissue sample from a biopsy). Catching such discrepancies at an early stage could therefore lower costs and increase the effectiveness of mammograms. For example, the radiology report for a given mammogram might note a 'calcification pattern'. If the corresponding pathology report does not note finding any microcalcifications, then the biopsy might have missed the location of the radiological observation.


A New York Times article, "Prone to Error: Earliest Steps to Find Cancer", describes some of the motivations behind the recent interest of the College of American Pathologists (CAP) in improving the handling of mammograms.


Our approach is to create an RDF representation of the information contained in each of two corresponding reports: the Mammogram Report (Mammo Report) and the Pathology Report (Path Report). We would like to automatically detect a mismatch of clinical observations in a computational process that is triggered by report submission or even one that assists during report entry. In initial tests, we would like to design an RDF representation of each clinical report type that enables us to automatically detect a discrepancy as a 'graph pattern' in the RDF graph that represents the reports. We would like to look at forms of automation that can be employed to unobtrusively augment and correct any manual entry of information, both in real time and in batch mode.
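
The notion of a discrepancy as a 'graph pattern' can be illustrated without committing to a vocabulary. The following is a minimal sketch in plain Python, not the project's RDF design: triples are simple tuples, and the predicate names (`reportType`, `inCase`, `notes`), the finding labels, and the report and case identifiers are all hypothetical placeholders. It flags cases in which a radiology report notes a calcification pattern but no pathology report in the same case notes microcalcifications.

```python
# Hypothetical triples (subject, predicate, object); all names are illustrative.
TRIPLES = {
    ("rad1", "reportType", "Radiology"),
    ("rad1", "inCase", "case1"),
    ("rad1", "notes", "CalcificationPattern"),
    ("path1", "reportType", "Pathology"),
    ("path1", "inCase", "case1"),
    ("path1", "notes", "TumorSize"),
    ("rad2", "reportType", "Radiology"),
    ("rad2", "inCase", "case2"),
    ("rad2", "notes", "CalcificationPattern"),
    ("path2", "reportType", "Pathology"),
    ("path2", "inCase", "case2"),
    ("path2", "notes", "Microcalcifications"),
}


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def reports_in_case(triples, case, report_type):
    """Reports of the given type that belong to the given case."""
    in_case = {s for s, _, _ in match(triples, p="inCase", o=case)}
    of_type = {s for s, _, _ in match(triples, p="reportType", o=report_type)}
    return in_case & of_type


def cases_missing_microcalcifications(triples):
    """Cases where radiology notes calcification but pathology does not."""
    flagged = set()
    for rad, _, _ in match(triples, p="notes", o="CalcificationPattern"):
        # Only consider subjects that are radiology reports.
        if not match(triples, s=rad, p="reportType", o="Radiology"):
            continue
        for _, _, case in match(triples, s=rad, p="inCase"):
            confirmed = any(
                match(triples, s=path, p="notes", o="Microcalcifications")
                for path in reports_in_case(triples, case, "Pathology")
            )
            if not confirmed:
                flagged.add(case)
    return sorted(flagged)


print(cases_missing_microcalcifications(TRIPLES))  # prints ['case1']
```

In a triple store, the same pattern would presumably be written as a SPARQL query that joins the two reports on their case and negates the pathology finding, for example with `OPTIONAL` and `!bound` or, in SPARQL 1.1, `FILTER NOT EXISTS`.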


  • SNOMED-CT terminology
  • Radlex Ontology
  • Translational Medicine Ontology
  • aTags - a simple convention for representing scientific assertions in RDF, based on SIOC. Some tools and web services based on aTags have been developed, such as a web-based tool for the semantic annotation of statements in documents on the web. See aTags paper for detailed description of the convention and associated resources.
  • ODIE - Ontology Development and Information Extraction
  • NCBO Annotator - The NCBO Annotator is a Web service that can be used to annotate text metadata with biomedical ontology concepts. The concepts are from ontologies in BioPortal and include the Unified Medical Language System (UMLS) Metathesaurus, OBO Foundry candidate ontologies and many others. The Annotator can be tested via a Web interface available through BioPortal. For more information on using the Annotator Web service, see: http://bioontology.org/wiki/index.php/Annotator_Web_service

Rough Outline

Example Data

  • Assuming that we start out with a handful of de-identified reports, we add them as attachments to the wiki so that we can refer to them in conversation (and eventually from the RDF). John, Daniel: when you have assembled something to send to Anni, please let us know if we can put it directly on the wiki
  • Acquire or create a pathology report to pair with the original radiology report from Daniel (might be handled in above item) and create aTags for it.

RDF Design

  • Enumerate the initial set of discrepancy types that we intend to detect
    • List the scenarios that we would like to demonstrate, such as: biopsy was from the wrong location (left instead of right breast, upper instead of lower, ductal instead of XXX, etc.), no correlative signs of microcalcification in the pathology report, or a temporal discrepancy such as a radiology report with a date *after* the pathology report.
  • Design an RDF representation that enables us to detect the above discrepancy types with various combinations of SPARQL queries and reasoning.
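
Two of the scenarios listed above, a laterality mismatch and a radiology report dated after the pathology report, can be sketched as plain checks over a pair of reports. This is only an illustration under stated assumptions: the `Report` class, its field names, and the sample dates are hypothetical, and a real implementation would express the same checks as SPARQL queries over the RDF representation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    """Simplified stand-in for a clinical report; all fields are hypothetical."""
    case_id: str
    report_type: str   # "Radiology" or "Pathology"
    report_date: date
    laterality: str    # "left" or "right"


def find_discrepancies(rad: Report, path: Report) -> list[str]:
    """Return labels for the discrepancy types found in one report pair."""
    if rad.case_id != path.case_id:
        raise ValueError("reports belong to different cases")
    found = []
    # Biopsy taken from a different side than the radiological finding.
    if rad.laterality != path.laterality:
        found.append("location: laterality mismatch")
    # The radiology report is expected to precede the pathology report.
    if rad.report_date > path.report_date:
        found.append("temporal: radiology report dated after pathology report")
    return found


# Illustrative pair; the dates and sides are made up.
rad = Report("case1", "Radiology", date(2010, 3, 9), "left")
path = Report("case1", "Pathology", date(2010, 3, 2), "right")
for label in find_discrepancies(rad, path):
    print(label)
```

Returning labels rather than raising on the first problem keeps every discrepancy for a pair visible at once, which seems closer to the goal of unobtrusively augmenting report entry.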

Manual annotation and RDF production

  • Use aTags to annotate a set of paired reports with SNOMED, RadLex, and TMO (Mammo and Path Reports)
  • Create a MammoTags version of aTags
    • Feature requests for MammoTags:
      • identifiers for reports (URL locators of report text?)
        • a clinical report should have a "case number" (better term for this? "study number") that refers to, for example, both the radiology and pathology report associated with a left breast lump in a particular patient. For now, sequential numbers will suffice as case numbers. Additionally, we should have a way to refer to the radiology and pathology report separately (report types: Radiology and Pathology)?
        • Create a separate aTags application for clinical reports that makes it possible to add information such as the case number and report type, eventually custom RDF (i.e. something more specific than rdfs:seeAlso), and to refer to URIs for the original clinical report


      • identifiers for manual annotators or clinicians
      • (pseudo)identifiers for patients
    • store RDF output directly into triple store
    • Integrate NER and relation extraction results from text mining tools into aTags as a suggestion mechanism
    • Tune RDF output of MammoTags based on RDF design decisions
  • Experiment with types of inferencing that will augment the recall of discrepancy detection
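
One kind of inferencing that might help here is transitive part-of reasoning over anatomical locations, so that a mismatch can be detected even when the two reports describe location at different levels of detail. The sketch below is an assumption-laden illustration: the region names and the `PART_OF` links are hypothetical and are not taken from SNOMED-CT, RadLex, or TMO.

```python
# Hypothetical part-of links between breast regions (child -> parent).
PART_OF = {
    "LeftUpperOuterQuadrant": "LeftBreast",
    "LeftLowerInnerQuadrant": "LeftBreast",
    "RightUpperOuterQuadrant": "RightBreast",
    "LeftBreast": "Breast",
    "RightBreast": "Breast",
}


def enclosing_regions(location, part_of=PART_OF):
    """Return the location followed by every region that contains it."""
    chain = []
    while location is not None and location not in chain:
        chain.append(location)
        location = part_of.get(location)
    return chain


def locations_conflict(rad_location, path_location, part_of=PART_OF):
    """True if neither location contains (or equals) the other."""
    return (
        rad_location not in enclosing_regions(path_location, part_of)
        and path_location not in enclosing_regions(rad_location, part_of)
    )


# A quadrant is compatible with the breast that contains it...
print(locations_conflict("LeftUpperOuterQuadrant", "LeftBreast"))
# ...but conflicts with the other breast or with another quadrant.
print(locations_conflict("LeftUpperOuterQuadrant", "RightBreast"))
print(locations_conflict("LeftUpperOuterQuadrant", "LeftLowerInnerQuadrant"))
```

Comparing location labels only for equality would treat "LeftUpperOuterQuadrant" and "LeftBreast" as different; the inferred containment separates compatible descriptions from genuine conflicts.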

Automatic annotation and RDF production

  • install ODIE
  • create annotator for Mammo and Path Reports
  • run annotator in batch mode
  • Create hybrid MammoTags that makes use of ODIE to suggest tags and point out potential discrepancies on the fly

Web Services

  • Create a web service that will validate submitted documents and produce RDF from them
  • Create a web service that will check for discrepancies in a submitted document pair

Report semantics

Mammo Report | Path Report
Calcification pattern | Microcalcifications
Tumor size | Tumor size
Localizing wire position | Localizing wire position
Contiguous structure involvement | Contiguous structure involvement
Number of foci | Number of foci
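
The correspondences in the table above could be held as a simple field mapping and used to spot observations recorded in only one report of a pair. The following is a rough sketch under assumptions: reports are modelled as plain dictionaries keyed by field name, and the sample values are invented for illustration.

```python
# Correlated fields from the table: (Mammo Report field, Path Report field).
CORRELATED_FIELDS = [
    ("Calcification pattern", "Microcalcifications"),
    ("Tumor size", "Tumor size"),
    ("Localizing wire position", "Localizing wire position"),
    ("Contiguous structure involvement", "Contiguous structure involvement"),
    ("Number of foci", "Number of foci"),
]


def uncorrelated_fields(mammo, path):
    """List field pairs recorded in one report but absent from the other."""
    gaps = []
    for mammo_field, path_field in CORRELATED_FIELDS:
        # A gap exists when exactly one side of the pair is present.
        if (mammo_field in mammo) != (path_field in path):
            gaps.append((mammo_field, path_field))
    return gaps


# Hypothetical report contents; the values are made up.
mammo = {"Tumor size": "12 mm", "Localizing wire position": "upper outer"}
path = {"Tumor size": "11 mm", "Number of foci": 2}
for mammo_field, path_field in uncorrelated_fields(mammo, path):
    print(f"{mammo_field} (Mammo) / {path_field} (Path): present in only one report")
```

This only checks presence. Whether two recorded values actually agree (for example, radiological versus pathological tumor size) would need field-specific comparison rules that are not defined here.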

Pathology Report Data Model

Suggested N3 Style (Initial Draft)

Example paired reports

Example Paired Rad Report1

some Bioportal annotation service annotations

Suggested N3 for Paired Rad Report1

Example Paired Path Report1

Example queries to find a location discrepancy using a paired report

View Example Paired Reports in Turtle RDF

Download Example Paired Reports in Turtle RDF

'Matching': Example Query to select findings at same location, 'matching location'

'Missing': Example Query to select findings that exist at a specific breast location in the Radiology Report but not in the Pathology Report

How to run the above example queries yourself:

Example reports

Example Rad Report #1

Example aTags of Rad Report #1

Example Path Report #1

Example Path Report #2

Example Path Report #3

Example Path Report #4

Example Path Report #5

Example Path Report #6

Using aTags to generate annotations

A dedicated aTags page for the Pathology-Radiology project has been set up at http://hcls.deri.org/atag/data/pathrad_atags.html

You can add the bookmarklets provided on that page to your web browser to add curated statements to the page. Just drag them to the bookmarks bar of your browser (preferably Firefox) to use them. Once you have installed the bookmarklet, you can highlight a portion of text on any web page, click the bookmarklet, annotate and save the statement. The resulting aTag is added to the page as a snippet of HTML that contains the annotations in RDFa format. Examples of some annotated statements from pathology and radiology reports can already be found on the page. Use the special field "Add id of clinical case" to associate each statement with a certain clinical case. This makes it possible to create several statements about a single clinical case.

You can view/extract the RDF contained in that page with the Any23 service: http://any23.org/turtle/http://hcls.deri.org/atag/data/pathrad_atags.html

Related Projects



  1. and the reference is forthcoming