Warning:
This wiki has been archived and is now read-only.

Text Analysis serializations



Version: 16 January 2014

Overview

The ITS 2.0 specification defines a normative way to represent Text Analysis information locally in XML and HTML. Text Analysis information can also be represented in other formats, e.g. JSON. This page describes such alternative serializations. Please edit this page or provide comments on the ITS IG mailing list.

Comparison to NERD API output

The NERD API returns its output in a JSON format. Here is an example of the output of an API call.

[
  {
    "idEntity": 120,
    "label": "BBC",
    "startChar": 138, "endChar": 141,
    "extractorType": "Company",
    "nerdType": "http://nerd.eurecom.fr/ontology#Organization",
    "uri": "http://dbpedia.org/resource/BBC",
    "confidence": 0.0582796,
    "relevance": 0.5,
    "extractor": "dbspotlight"
  },
  ...
]
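
As a rough illustration of how such output might be consumed, the following Python sketch parses a single-entity version of the example above and checks the character offsets against a source text. It rests on two assumptions not stated on this page: that startChar and endChar are character offsets into the analysed text, and that endChar is exclusive (which is consistent with 141 - 138 = 3, the length of "BBC"). The helper name offsets_match and the source text are hypothetical.

```python
import json

# Single-entity version of the example output, written as strict JSON.
nerd_output = """
[
  {
    "idEntity": 120,
    "label": "BBC",
    "startChar": 138,
    "endChar": 141,
    "extractorType": "Company",
    "nerdType": "http://nerd.eurecom.fr/ontology#Organization",
    "uri": "http://dbpedia.org/resource/BBC",
    "confidence": 0.0582796,
    "relevance": 0.5,
    "extractor": "dbspotlight"
  }
]
"""


def offsets_match(text, entity):
    """Return True if the offsets select the entity label in the text.

    Assumption: startChar/endChar are character offsets, end exclusive.
    """
    return text[entity["startChar"]:entity["endChar"]] == entity["label"]


entities = json.loads(nerd_output)

# Hypothetical source text: 138 filler characters followed by the label.
source_text = " " * 138 + "BBC reported the story."
matches = [offsets_match(source_text, entity) for entity in entities]
```

If the offsets followed a different convention (for example, an inclusive end or byte offsets), the slice in offsets_match would need to be adjusted accordingly.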

The NERD API fields correspond to Text Analysis information pieces as follows:

idEntity: no correspondence
label: content of the annotated element in XML or HTML
startChar, endChar: not represented as part of a Text Analysis information piece, but generated in a NIF workflow; see conversion to NIF
extractorType: no correspondence
nerdType: entity type / concept class, e.g. in HTML its-ta-class-ref="http://nerd.eurecom.fr/ontology#Organization"
uri: entity / concept identifier, e.g. in HTML its-ta-ident-ref="http://dbpedia.org/resource/BBC"
confidence: Text Analysis confidence, e.g. in HTML its-ta-confidence="0.0582796"
relevance: no correspondence
extractor: its-annotators-ref (in HTML) or annotatorsRef (in XML) attribute, e.g. its-annotators-ref="text-analysis|dbspotlight"
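
The correspondences listed above can be sketched as a small Python function that turns one NERD entity into an HTML span carrying the ITS 2.0 local attributes. This is only an illustration under stated assumptions: the function name nerd_entity_to_its_span is hypothetical, the choice of a span element is an assumption, and the its-annotators-ref value simply follows the "text-analysis|dbspotlight" pattern shown on this page. Fields without a correspondence (idEntity, extractorType, relevance) and the offsets are dropped.

```python
from html import escape


def nerd_entity_to_its_span(entity):
    """Render one NERD entity as an HTML span with ITS Text Analysis attributes.

    Hypothetical helper: the mapping follows the list of correspondences
    on this page; the span element itself is an assumption.
    """
    attrs = {
        # nerdType -> entity type / concept class
        "its-ta-class-ref": entity["nerdType"],
        # uri -> entity / concept identifier
        "its-ta-ident-ref": entity["uri"],
        # confidence -> Text Analysis confidence
        "its-ta-confidence": str(entity["confidence"]),
        # extractor -> annotators reference, following the pattern on this page
        "its-annotators-ref": "text-analysis|" + entity["extractor"],
    }
    attr_text = " ".join(
        f'{name}="{escape(value, quote=True)}"' for name, value in attrs.items()
    )
    # label -> content of the annotated element
    return f"<span {attr_text}>{escape(entity['label'])}</span>"


example_entity = {
    "idEntity": 120,
    "label": "BBC",
    "startChar": 138,
    "endChar": 141,
    "extractorType": "Company",
    "nerdType": "http://nerd.eurecom.fr/ontology#Organization",
    "uri": "http://dbpedia.org/resource/BBC",
    "confidence": 0.0582796,
    "relevance": 0.5,
    "extractor": "dbspotlight",
}

span_html = nerd_entity_to_its_span(example_entity)
```

Escaping the attribute values and the label keeps the generated markup well-formed even when an identifier or label contains characters such as & or ".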

