HCLSIG BioRDF Subgroup/Tasks/Natural Language Processing and RDF

From W3C Wiki


  • To generate RDF from free text documents, using natural language processing (NLP) technologies

Task Objectives:

  • 1. To test different approaches and technologies for transforming free text into RDF
  • 2. To enrich the RDF knowledge base, created by the BIORDF group, with content originating from free text documents


Davide Zaccagnini, ....., ......

Problem statement for this use case: The vast amount of biomedical information stored in free-text documents is largely inaccessible to software applications. Natural language processing can transform free text into structured, machine-processable data; NLP systems that can interact with RDF and OWL are therefore crucial to integrating semantic applications with existing IT systems in health care and life sciences.


  • 1. Schemas and data output by various NLP engines, including, but not limited to: XML, HL7 CDA (XML), and tab-delimited files
  • 2. RDF generated from the NLP engines’ outputs
  • 3. Written report analyzing the advantages and limits of the tested technologies and of the strategies for converting NLP outputs into RDF
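
As an illustration of the kind of conversion listed above, the following is a minimal sketch of turning a tab-delimited NLP output into RDF serialized as N-Triples. It is not a method chosen by the task: the three-column layout (document id, entity text, entity type), the `http://example.org/nlp/` namespace, and the property names `inDocument`, `text`, and `type` are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, used for illustration only.
BASE = "http://example.org/nlp/"


def escape_literal(text):
    """Escape a string so it can be written as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def tsv_to_ntriples(tsv_text):
    """Convert rows of (doc_id, entity_text, entity_type) to N-Triples lines.

    The column layout is an assumption; real NLP engines differ.
    """
    triples = []
    # QUOTE_NONE keeps quotation marks inside the text from being
    # interpreted as CSV quoting.
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for index, row in enumerate(reader):
        if len(row) != 3:
            continue  # skip rows that do not match the assumed layout
        doc_id, entity_text, entity_type = row
        # Percent-encode values before placing them inside IRIs.
        doc = quote(doc_id, safe="")
        subject = f"<{BASE}mention/{doc}/{index}>"
        triples.append(f"{subject} <{BASE}inDocument> <{BASE}document/{doc}> .")
        triples.append(f'{subject} <{BASE}text> "{escape_literal(entity_text)}" .')
        triples.append(
            f"{subject} <{BASE}type> <{BASE}type/{quote(entity_type, safe='')}> ."
        )
    return triples


if __name__ == "__main__":
    sample = "doc1\taspirin\tDrug\ndoc1\theadache\tSymptom\n"
    for line in tsv_to_ntriples(sample):
        print(line)
```

Each row becomes one "mention" resource with three statements. In practice the entity types would more likely be mapped to terms from a shared ontology than to ad hoc IRIs like the ones used here.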

Related resources

Task supports and dependencies

  • Dependency on other BIORDF task forces experimenting with transforming XML and other schemas into RDF
  • Dependency on the BIO-ONTOLOGY subgroup for use case definition and integration with their ontology

Tools and Services

Timeline for Task Completion

  • 1- Identification and acquisition of a corpus of documents dealing with the group’s use case
  • 2- Preliminary analysis of available NLP engines
  • 3- Processing documents and analysis of the outputs (3-month goal)
  • 4- Identification of further steps for generating RDF, if needed
  • 5- Analysis and application of available tools (inside and outside of the BIORDF group) for transforming engines’ outputs into RDF
  • 6- Writing a report on lessons learned
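
Step 5 above concerns transforming engine outputs into RDF. For XML output, one possible approach can be sketched as follows, using only the Python standard library. The XML layout (an `annotations` root with a `doc` attribute and `entity` children carrying `type`, `start`, and `end` attributes), the `http://example.org/nlp/` namespace, and the property names are hypothetical; they do not describe any particular NLP engine or HL7 CDA.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical namespace, used for illustration only.
BASE = "http://example.org/nlp/"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape_literal(text):
    """Escape a string so it can be written as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def xml_to_ntriples(xml_text):
    """Convert an assumed <annotations>/<entity> XML layout to N-Triples lines."""
    root = ET.fromstring(xml_text)
    doc = quote(root.get("doc", "unknown"), safe="")
    lines = []
    for index, entity in enumerate(root.iter("entity")):
        subject = f"<{BASE}mention/{doc}/{index}>"
        lines.append(f"{subject} <{BASE}inDocument> <{BASE}document/{doc}> .")
        text = (entity.text or "").strip()
        lines.append(f'{subject} <{BASE}text> "{escape_literal(text)}" .')
        entity_type = entity.get("type")
        if entity_type is not None:
            type_iri = f"{BASE}type/{quote(entity_type, safe='')}"
            lines.append(f"{subject} <{BASE}type> <{type_iri}> .")
        # Character offsets become typed integer literals when present.
        for attr in ("start", "end"):
            value = entity.get(attr)
            if value is not None and value.isdigit():
                lines.append(
                    f'{subject} <{BASE}{attr}> "{value}"^^<{XSD_INTEGER}> .'
                )
    return lines


if __name__ == "__main__":
    sample = (
        '<annotations doc="doc1">'
        '<entity type="Drug" start="0" end="7">aspirin</entity>'
        "</annotations>"
    )
    for line in xml_to_ntriples(sample):
        print(line)
```

A hand-written mapping like this is only one option; XSLT-based or tool-assisted transformations of the kind the other task forces are exploring may be a better fit for larger schemas.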