HCLSIG BioRDF Subgroup/Tasks/Natural Language Processing and RDF

Task:

To generate RDF from free-text documents using natural language processing (NLP) technologies.

Task Objectives:

1. To test different approaches and technologies to transform free text into RDF
2. To enrich the RDF knowledge base, created by the BIORDF group, with content originating from free text documents

Participants:

Davide Zaccagnini, ....., ......

Problem statement for this use case: The vast amount of biomedical information stored in free-text documents is inaccessible to software applications. Natural language processing can transform free text into structured, machine-processable data. NLP systems that can interact with RDF and OWL are therefore crucial to integrating semantic applications with existing IT systems in health care and life sciences.

Deliverables:

1. Schemas and data produced by various NLP engines, including but not limited to: XML, HL7 CDA (XML), and tab-delimited files

2. RDF, generated from NLP engines’ outputs
3. Written report analyzing the advantages and limitations of the tested technologies and of the strategies for converting NLP outputs into RDF
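As an illustration of the second deliverable, the sketch below converts a tab-delimited NLP output into N-Triples using only the Python standard library. The column layout (document id, mention text, concept id), the `http://example.org/nlp/` namespace, and the property names `inDocument`, `mentionText`, and `refersTo` are assumptions made for this example; none of them is prescribed by this task or by any particular NLP engine.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace, for illustration only.
BASE = "http://example.org/nlp/"


def escape_literal(text):
    """Escape a string so it can be written as an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def tsv_to_ntriples(tsv_text):
    """Turn rows of (doc_id, mention, concept_id) into N-Triples lines.

    The three-column layout is an assumed format, not an engine's real output.
    """
    triples = []
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row_number, row in enumerate(reader):
        if len(row) != 3:
            continue  # skip rows that do not match the assumed layout
        doc_id, mention, concept_id = row
        # Percent-encode identifiers so they are safe inside an IRI.
        doc = quote(doc_id, safe="")
        concept = quote(concept_id, safe="")
        annotation = f"<{BASE}annotation/{doc}/{row_number}>"
        triples.append(
            f"{annotation} <{BASE}inDocument> <{BASE}document/{doc}> .")
        triples.append(
            f'{annotation} <{BASE}mentionText> "{escape_literal(mention)}" .')
        triples.append(
            f"{annotation} <{BASE}refersTo> <{BASE}concept/{concept}> .")
    return triples
```

Each input row becomes one annotation resource with three statements. In practice the concept IRIs would more likely point into an ontology agreed with the BIO-ONTOLOGY subgroup than into an ad hoc namespace.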

Related resources

NLP:
Scientific Papers (from Language and Computing's archive)
Text mining in neurosciences, C. Castro (from Senselab)
NLP to RDF:
Integrating XML data sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)
Bridging the Gap between RDF and XML
The Future of Natural Language Processing

Task supports and dependencies

Dependency on other BIORDF taskforces experimenting with transforming XML and other schemas into RDF
Dependency on the BIO-ONTOLOGY subgroup for use case definition and integration with their ontology

Tools and Services

NLP engines:
- TeSSI, ontology-based NLP (Language and Computing)
- GATE, General Architecture for Text Engineering (GATE, Hamish Cunningham et al.)

Timeline for Task Completion

1- Identification and acquisition of a corpus of documents dealing with the group’s use case
2- Preliminary analysis of available NLP engines
3- Processing documents and analysis of the outputs (3-month goal)
4- Identification of further steps for generating RDF, if needed
5- Analysis and application of available tools (inside and outside of the BIORDF group) for transforming engines’ outputs into RDF
6- Writing a report on lessons learned
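
Step 5 of the timeline could, in its simplest form, look like the following sketch, which maps an XML annotation file to N-Triples with the Python standard library. The XML layout (a `document` element with an `id` attribute containing `entity` elements with a `type` attribute) and the `http://example.org/nlp/` namespace are hypothetical; real engine output such as HL7 CDA would need its own mapping.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Hypothetical namespace, for illustration only.
BASE = "http://example.org/nlp/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def xml_to_ntriples(xml_text):
    """Map each <entity> element of an assumed XML layout to two triples."""
    root = ET.fromstring(xml_text)
    doc_id = quote(root.get("id", "unknown"), safe="")
    lines = []
    for index, entity in enumerate(root.iter("entity")):
        subject = f"<{BASE}entity/{doc_id}/{index}>"
        entity_type = quote(entity.get("type", "Entity"), safe="")
        # Escape characters that are not allowed raw in an N-Triples literal.
        label = ((entity.text or "")
                 .replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}type/{entity_type}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
    return lines


sample = '<document id="doc1"><entity type="Gene">BRCA1</entity></document>'
for line in xml_to_ntriples(sample):
    print(line)
```

A hand-written mapping like this is only a baseline; the tools listed under related resources, or those developed by other BIORDF taskforces, may offer schema-driven alternatives worth comparing in the report.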
