Re: Further input to issue-42 (Fwd: Meta Data related to Configurations for/Information on Linguistic Processors)

Tadej,

My one worry about having only document-level annotation is that we increasingly see workflows where chunks are reused in multiple documents. In most cases the chunks will have undergone the same processes, but not always. You could easily have a topic-based CMS (like a DITA-based system) where some topics were processed by ACME-Extractor 2.0 while some older ones were processed by ACME-Extractor 1.5 prior to an upgrade of the software. It may not be worthwhile to reprocess the old content, but you also don't want to state incorrectly that those old topics were processed with the newer tool, since that may set up improper expectations in the workflow (like expecting a particular sort of annotation to be present when the older version never produced it).

So even if you don't really need span-level annotation, document-level annotation will be too coarse in many real-life situations. As a result, I think you need to allow for element-level annotation. At least in any topic-based environment that will be the case, and perhaps in other cases as well.
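
To make the concern concrete, here is a rough sketch in Python of how a consumer might resolve the tool for each topic when a document-level default can be overridden per element. The markup and the "tool" attribute are purely hypothetical and are not taken from any draft; the point is only that the element-level value, where present, should win over the document-level one.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "tool" attribute is an illustrative name only.
# The root carries the document-level default; one older topic overrides it.
DOC = """
<map tool="ACME-Extractor 2.0">
  <topic id="new-intro"/>
  <topic id="old-install" tool="ACME-Extractor 1.5"/>
  <topic id="new-usage"/>
</map>
"""

def tools_by_topic(xml_text):
    """Map each topic id to the tool that processed it.

    A topic's own "tool" attribute takes precedence; otherwise the
    document-level value on the root element is used.
    """
    root = ET.fromstring(xml_text)
    default = root.get("tool")
    return {t.get("id"): t.get("tool", default) for t in root.iter("topic")}

if __name__ == "__main__":
    for topic_id, tool in tools_by_topic(DOC).items():
        print(topic_id, "->", tool)
```

With document-level annotation alone, the override on the older topic could not be expressed, and every topic would be reported as processed by the 2.0 tool.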

Best,

Arle

On 2012 Sep 18, at 13:52 , Tadej Štajner <tadej.stajner@ijs.si> wrote:

> Hi, 
> this looks very well developed and like something that has seen practical use. One thing that surprised me a bit is that the termExtrInfo data category (or the generalized lingProcInfo) doesn't seem to focus on targeting specific annotations, but annotates the whole document via the header. 
> 
> My question (for everyone using ISSUE-42 data categories): is it necessary to point to individual elements, or is a document-level annotation enough? As it stands, the TA Annotation draft has a per-instance mechanism via selectors. Could we simplify to a per-document rule (e.g. 'this document was processed by ACME-Extractor 2.0')?
> 
> -- Tadej
>         

Received on Tuesday, 18 September 2012 12:50:14 UTC