MLW-LT WG -- 09 Aug 2012

<dF> Declan and Jan sent regrets in addition to the regrets recorded on the agenda in advance.

<dF> Doodle-based regrets: Pedro, Shaun, Milan, Raphael, Pablo, Giuseppe, Dave. Additional regrets: Tadej

agenda review

<dF> http://www.w3.org/2012/08/02-mlw-lt-minutes.html

dF: Please look at the minutes. Any issues or objections?
... The minutes are accepted; moving on to topic 1.

quality discussion

dF: Topic 1: quality discussion. We also need to discuss issue-42, which is a more general issue. What is the progress on the category, the time frame, etc.?
... Arle, please report.

<Arle> Current draft of Quality is here: http://dl.dropbox.com/u/223919/dfki/mlw-lt/locQuality.html

Arle: I just posted a link to the spec draft. It is still not in the correct form, but please use it for reference. At this point we need agreement on the attributes listed in section 6.x.3, with the exception of ??
... All those in the top half of the table are agreed upon by Phil, Yves, me, etc.
... The second half needs people to commit to implementations.

Felix: I think you can add me to the list of people who agree on the information here, but not on whether these pieces will become attributes.

Arle: Useful distinction. Each "Attribute Name" represents a piece of information. We need to nail these down and agree upon them. Felix, do we need to issue a call for consensus on this?

Felix: There is no W3C process for this...

dF: I think that the quality thing should be addressed in a structured way
... Arle is the owner of this; if consensus needs to be achieved, we should do this.

Felix: But what if a decision is later overruled? All we can do is structure the discussion and come back to consensus later.

dF: Clearly every consensus can be overruled, but structuring a discussion...?

Felix: We should start discussing the topic itself, not so much the process.

dF: There is one action item, ACTION-168, which does not seem to have progressed much. Arle, can you comment?

Arle: This has been ongoing, and Yves has been active on it. Really, the last piece was writing to ?cilgrave? about the XLIFF part.

dF: There are not many recorded emails on this.

Arle: There are lots of discussions going on elsewhere.
... Very quickly, some information that we have agreement on: we started out with the idea of having two separate pieces. (1) Which metric, process, or tool generated the markup. This defines a QName with a prefix and a URI pointing to more information.
... Think of it as a tool/metric/process signature.
... (2) Loc quality score: allows a process to provide a score relevant to a document (95, 32, etc.), applied at the document level. Others at the moment are more inline, for example locQualityType.
... These are designed for interoperability between tools.
... They allow common tagging across different tools.
... Loc quality codes: allow mapping of tool-specific codes to a common set while also passing along the original code.

Arle: These are the ones we have agreement upon. There are five that we don't have agreement upon. I won't go through those, but please look at the online document.

dF: Can this be wrapped up in August? Can a cut be made for information pieces that have not made progress?

Arle: I think so, these seem stable. I think we have consensus on them.

dF: Are you prepared to cut those which are not mature enough?

Arle: Yes, except where there are arguments and implementation commitments from Phil, Yves, etc.

dF: I would like to formalise this and set an action to freeze the number of information pieces. Would you be able to freeze the number by the next call, in a week?

Felix: If you look at issue-42, some of these information pieces are the same across data categories. I'm not saying that we would disagree on them, but we may disagree on where they belong.

Arle: That impacts the first two of these. Whether they stay here or move, we need them. For all but the first two (profile and score), we'll have a decision by next week.

ACTION

<fsasaki> ACTION: arle to freeze the number of information items in quality, with the reservation that some items might move to other areas [recorded in http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action01]

<trackbot> Created ACTION-192 - Freeze the number of information items in quality, with the reservation that some items might move to other areas [on Arle Lommel - due 2012-08-16].

<Arle> scribe: Arle

issue-42

Felix: I was looking at the proposals we currently have, and in a number of categories we have data about what generated the annotation and the confidence in it. Text analysis, MT confidence, and quality all have similar issues. People need to be able to separate issues generated by multiple tools. Another common aspect of these categories is that these pieces of information are general settings that inherit through the tree to where you need them, much like language.
... In our case, you might specify one tool, or, if needed, multiple tools used for creating annotations.
... There is one issue: in Quality, you identify the model, but in the others it is a tool.

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

David: The common aspect is the state of inheritance, and that you may need to record multiple tools or models on the local level. How does the inheritance relate to global and local approaches?

<DES> +q

Felix: Global and local are just different ways to specify the metadata. But these are separate pieces of metadata. Once you have specified them (locally or globally) they inherit throughout the document.

David: Like with translate and the way it can be switched on and off. So the issue really is to specify that these inherit, correct?

Felix: I see this not as specific to these data categories, but rather as a separate data category. I'm not sure how you would describe the relationship from mtConfidence, quality, and text analysis to these. I don't yet know how it would work in detail.

David: So you propose to introduce a generalized originator category. Isn't that like provenance?

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

<fsasaki> lingProcInfo

Felix: That's a good point. There is a clear relationship. I just pasted a link that Christian Lieske supplied on this. It might be provenance or a subcategory of provenance. It is important for at least three categories, and maybe for others. This is so specific that I think we may need a specific mechanism. Provenance is really about more complex information; this is more about identifying the process used to create something. I'd rather see
... E.g., pointing to the tool or process.

David: In Dublin I wanted provenance to be independent. I see only two options: (1) subsume it in provenance; (2) specialize it in the categories in question. For example, if I use the LISA QA Model, is it relevant to anything but quality? I don't think it would be problematic to have these in specialized categories.
... I think this would work better to modularize ITS. But if we make them orthogonal, we should put them in provenance.

Felix: But if we specialize them, we run into the issue we see with quality regarding the ITS inheritance model.

David: So are you saying that ITS inheritance is for the content only, not the metadata?

Felix: If you want to apply multiple instances of the same data category to the same node, you cannot do it. You can't say that Tool A gives one value and Tool B gives another value for the same piece of content.

David: So you mean that if there are comparable originators, you can't apply multiple ones, correct?

Felix: Yes.

David: This won't be an issue for mtConfidence, because you are generally working with a single candidate at a time. If you need more, you should look at XLIFF or something.
... If you are composing a document from multiple sources, the normal inheritance model would work.
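The clash Felix describes follows from CSS-like inheritance: a value set on an ancestor flows down to descendants, and a closer value for the same data category replaces it rather than adding to it. The following is a minimal Python sketch of that behaviour, assuming a simplified tree where each node holds at most one value per key; the node names and the "annotator" key are hypothetical and this is not the normative ITS processing algorithm.

```python
# Hypothetical, simplified model of CSS-like inheritance (not the ITS spec).
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    # Locally specified metadata: at most one value per data-category key.
    local: Dict[str, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def effective_metadata(node: Node,
                       inherited: Optional[Dict[str, str]] = None
                       ) -> Dict[str, Dict[str, str]]:
    """Return the effective metadata of every node, keyed by node name.

    A value on an ancestor applies to all descendants unless a descendant
    sets its own value for the same key, which overrides it.
    """
    current = dict(inherited or {})
    current.update(node.local)  # the nearest local value wins
    result = {node.name: current}
    for child in node.children:
        result.update(effective_metadata(child, current))
    return result


# Tool A annotates the whole body; tool B annotates one paragraph.
p2 = Node("p2", local={"annotator": "toolB"})
doc = Node("body", local={"annotator": "toolA"}, children=[Node("p1"), p2])

eff = effective_metadata(doc)
# p2 cannot record both tools: toolB's value replaces toolA's.
print(eff["p1"]["annotator"])
print(eff["p2"]["annotator"])
```

Because `local` is a plain dictionary, a second tool's value for the same key can only overwrite the first, which is exactly the limitation raised for quality and text analysis.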

<fsasaki> scribe: fsasaki

Arle: For quality, the normal inheritance model fails.

<scribe> scribe: Arle

David: Would it be OK to state that inheritance is cancelled when two comparable originators are used on the same node?

Felix: We need to consider backward compatibility, and also the test suite, which has examples where inheritance deletes one piece of information. The test suite is just one example where this change would go against running implementations.

Phil: We are talking about child elements inheriting the metadata from a parent?

Felix: Yes. It is CSS-like inheritance.

Phil: Would it be permitted to replicate certain parts of the document when you need to apply multiple pieces to the same content? It would be building a pseudo-parent around multiple builds.

David: That would be out of scope for us.

Yves: What we could do is have a span with an attribute that points to an external element. That is stand-off annotation that could contain several entries, not just one.
... The inheritance model works fine in the document itself.

Felix: Yves is saying you have a pointer in the document to the list of alternatives. By using the stand-off list you can have all the annotations you want.

David: You wouldn't duplicate the content, but you would have a list of applicable metadata. This is a mechanism to be used for when there is clashing inheritance?

Felix: Arle and I discussed having a separate section in the HTML5 document that is not displayed where you put this information and then you ship around a single document.

David: I think we should specify this mechanism in a separate discussion.

Felix: I think this is related to Issue-37. I'll create an example.

<scribe> ACTION: Felix to create an HTML5 example of the externalized markup within a single file. [recorded in http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action02]

<trackbot> Created ACTION-193 - Create an HTML5 example of the externalized markup within a single file. [on Felix Sasaki - due 2012-08-16].
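Yves's stand-off idea can be illustrated with a minimal Python sketch: the content carries only a pointer, and the referenced list holds as many entries as needed, so two reviewers or two tools can annotate the same span without clashing inheritance. The "ref" key, the id "q1", and the entry values are hypothetical placeholders, not proposed ITS markup.

```python
# Hypothetical sketch of stand-off annotation (names are illustrative only).
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    annotator: str  # tool, model, or reviewer that produced the value
    value: str


# Stand-off store: id -> list of entries. In the single-file HTML5 variant,
# this could live in a non-displayed section of the same document.
standoff: Dict[str, List[Entry]] = {
    "q1": [Entry("reviewerA", "terminology"), Entry("reviewerB", "grammar")],
}

# Inline content: the span points to the stand-off list instead of
# carrying a single value itself.
span = {"text": "the offending phrase", "ref": "q1"}


def annotations_for(span: Dict[str, str]) -> List[Entry]:
    """Resolve a span's reference to all of its stand-off entries."""
    return standoff.get(span["ref"], [])


for e in annotations_for(span):
    print(e.annotator, e.value)
```

The inheritance model still applies to the pointer itself; only the list behind it is allowed to grow.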

David: I think the high-level information is whether we keep the producer information in a specialized category, or whether we put it in provenance. I think we all agreed that in the case of clashing producers we have this other mechanism.

Yves: It's not just about different producers, but also about cases where the same information is applied in multiple places.

Felix: This is not producer-specific, but conflict-specific.

David: The use case I am thinking of is about two different reviewers using the same quality model.

<fsasaki> felix: or two different text analytics systems

Phil: The general condition is that you want multiple pieces of metadata. Whether they conflict or not, you can accommodate both within a single node(?)

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html

David: From the point of view of MT confidence, I don't think we need this special mechanism.

Felix: One other point (see the pasted link). One reason for opening issue-42 is the conflict discussion; the other part is that we want to describe tool-specific data. Arle and I need to create a way to describe what generated the data.

David: I think we should use the same template text about inheritance.

Des: I have a related issue. The quality score is normalized, but the agent isn't mandatory there, whereas the agent is mandatory for MT confidence and text analytics. We need to be consistent across these.

<fsasaki> +1 to des

David: If you had to include multiple MT results, you have to replicate content, but text analytics can use multiple tools for one piece of content.

<Yves_> +1 to des

Felix: There are limits to harmonization, but let me make some examples.

<philr> +1 to des

David: Is anyone here to tell us anything?

<fsasaki> ACTION: felix to work on issue-42, provide examples and template for various data categories [recorded in http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action03]

<trackbot> Created ACTION-194 - Work on issue-42, provide examples and template for various data categories [on Felix Sasaki - due 2012-08-16].

NIF/RDF round trip

<fsasaki> http://wiki.nlp2rdf.org/wiki/ITS2NIF2ITS

<fsasaki> Sebastian chose DBpedia Spotlight (web site) here as an example

Felix: Short update. Sebastian Hellmann did all the work, but see the wiki link I posted. It shows how to go from HTML or arbitrary XML to RDF in the NIF format. Various tools understand this format. One application scenario is to produce named entity annotation with the DBpedia Spotlight tool.
... The results can be integrated into the original XML. It provides a bridge to language-technology tools that use NIF. It does not impact the description of the data categories. I've started building a conversion. It will give us a nice bridge to other tooling.

test suite

Dom: I'd like people to look at what we've done. I'm going to start looking at the output that tools might produce. By the beginning of September we should have agreed-upon input files and output formats, so we can tie implementations to data categories for testing in Prague.
... We're happy with progress, but want others to take a look.

mtConfidence

David: Yves pointed out some deficiencies. I will produce the next draft version. I won't touch the inheritance bit and would wait for Felix. But I think we only need normal inheritance here.

<scribe> ACTION: dF to produce next draft of mtConfidence. [recorded in http://www.w3.org/2012/08/09-mlw-lt-minutes.html#action04]

<trackbot> Created ACTION-195 - Produce next draft of mtConfidence. [on David Filip - due 2012-08-16].

Seattle event

<dF> Topic 6: Seattle event: http://www.localizationworld.com/lwseattle2012/feisgiltt/ ; Felix's ACTION-191: https://www.w3.org/International/multilingualweb/lt/track/actions/191 ; please tweet and retweet the I18n blog entry: http://www.w3.org/blog/International/2012/08/06/speaking-proposals-for-feisgillt-event-open-until-august-14-dont-delay/ ; please indicate your attendance on LinkedIn: http://linkd.in/Q5Tq7B ; submit speaking and demo proposals by August

<fsasaki> please spread the word :) :) :)

David: Please Tweet, build buzz, etc.

<fsasaki> thanks to dF for making all this happen!

David: Thanks to Felix for publishing the blog entry, etc.
... I'll leave the housekeeping topics for the coming weeks.
... I think they are self-explanatory. No need to extend the meeting for now.

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-commits/

Felix: One final item. I've created a list at this URL that shows the commits to the W3C CVS. It shows you what changes the editors make.

Meeting closed.

- DRAFT -
