HCLSIG BioRDF Subgroup/Tasks/NLP-RDF
Task:
- To generate RDF from free-text documents using natural language processing (NLP) technologies
Task Objectives:
- 1. To test different approaches and technologies to transform free text into RDF
- 2. To enrich the RDF knowledge base created by the BioRDF group with content originating from free-text documents
Participants:
Davide Zaccagnini, ....., ......
Problem statement for this use case: The vast amount of biomedical information stored in free-text documents is largely inaccessible to software applications. Natural language processing makes it possible to transform free text into structured, machine-processable data; NLP systems that can interact with RDF and OWL are therefore crucial to integrating semantic applications with existing IT systems in health care and the life sciences.
Deliverables:
- 1. Schemas and data output by various NLP engines, in formats including, but not limited to, XML, HL7 CDA (XML), and tab-delimited files
- 2. RDF generated from the NLP engines’ outputs
- 3. A written report analyzing the advantages and limitations of the tested technologies and of the strategies used to convert NLP outputs into RDF
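To illustrate what deliverable 2 might involve for one of the listed formats, here is a minimal Python sketch that converts tab-delimited NLP output into N-Triples using only the standard library. It is one possible approach, not the group's method: the column layout (doc_id, start, end, text, concept_id), the example.org namespace and every predicate name are hypothetical and are not defined by TeSSI or any other engine named on this page.

```python
import csv
import io
from typing import List

# Hypothetical namespace and predicates, chosen only for illustration.
EX = "http://example.org/nlp/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def escape_literal(value: str) -> str:
    """Escape a string so it can sit inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def tsv_to_ntriples(tsv_text: str) -> List[str]:
    """Turn each row of a tab-delimited annotation file (assumed columns:
    doc_id, start, end, text, concept_id) into six N-Triples lines."""
    triples = []
    # QUOTE_NONE: treat quote characters in the NLP output as plain text.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for i, row in enumerate(reader):
        ann = f"<{EX}annotation/{row['doc_id']}-{i}>"
        triples.append(f"{ann} <{RDF_TYPE}> <{EX}Annotation> .")
        triples.append(f"{ann} <{EX}inDocument> <{EX}doc/{row['doc_id']}> .")
        triples.append(f'{ann} <{EX}text> "{escape_literal(row["text"])}" .')
        # Character offsets become typed integer literals.
        for key in ("start", "end"):
            triples.append(
                f'{ann} <{EX}{key}> "{int(row[key])}"^^<{XSD_INTEGER}> .')
        triples.append(f"{ann} <{EX}denotes> <{EX}concept/{row['concept_id']}> .")
    return triples


if __name__ == "__main__":
    sample = "doc_id\tstart\tend\ttext\tconcept_id\nd1\t0\t7\taspirin\tX001\n"
    print("\n".join(tsv_to_ntriples(sample)))
```

Emitting N-Triples as plain strings keeps the sketch dependency-free; a real pipeline would more likely build the graph with an RDF library and map concept identifiers onto the BioRDF group's ontology rather than onto a placeholder namespace.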
Related resources
- NLP: Scientific Papers (from Language and Computing's archive)
- NLP to RDF:
- Integrating XML data sources using RDF/S Schemas: The ICS-FORTH Semantic Web Integration Middleware (SWIM)
- Bridging the Gap between RDF and XML
Task supports and dependencies
- Dependency on other BioRDF task forces experimenting with transforming XML and other schemas into RDF
- Dependency on the BIO-ONTOLOGY subgroup for use case definition and for integration with their ontology
See also: NLP Interfaces
Tools and Services
- NLP engines: TeSSI, ontology-based NLP (Language and Computing)
- NLP-to-RDF conversion: Cypher
Timeline for Task Completion
- 1. Identification and acquisition of a corpus of documents dealing with the group’s use case
- 2. Preliminary analysis of available NLP engines
- 3. Processing of the documents and analysis of the outputs (goal: 3 months)
- 4. Identification of further steps for generating RDF, if needed
- 5. Analysis and application of available tools (inside and outside the BioRDF group) for transforming the engines’ outputs into RDF
- 6. Writing a report on lessons learned
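For engines that emit XML, step 5 could start from something as small as the following Python sketch, which walks an XML annotation document with the standard-library ElementTree parser and emits N-Triples. The XML layout (an annotations root with a doc attribute and concept children carrying an id attribute), the example.org namespace and the mentions predicate are all hypothetical; actual outputs such as HL7 CDA documents are far richer and would need a dedicated mapping.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical namespace and predicate; rdfs:label is the standard RDFS term.
EX = "http://example.org/nlp/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(value: str) -> str:
    """Escape a string so it can sit inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def xml_to_ntriples(xml_text: str) -> List[str]:
    """Map each <concept> element of a hypothetical NLP output document to
    two triples: document-mentions-concept and the concept's surface label."""
    root = ET.fromstring(xml_text)
    doc = f"<{EX}doc/{root.get('doc')}>"
    triples = []
    for elem in root.iter("concept"):
        concept = f"<{EX}concept/{elem.get('id')}>"
        triples.append(f"{doc} <{EX}mentions> {concept} .")
        label = escape_literal((elem.text or "").strip())
        triples.append(f'{concept} <{RDFS_LABEL}> "{label}" .')
    return triples


if __name__ == "__main__":
    sample = ('<annotations doc="d1">'
              '<concept id="X001">aspirin</concept>'
              '<concept id="X002">headache</concept>'
              '</annotations>')
    print("\n".join(xml_to_ntriples(sample)))
```

The same skeleton could be swapped for an XSLT stylesheet or one of the XML-to-RDF tools listed under related resources; the sketch only shows the shape of the mapping.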