W3C

MultilingualWeb-LT Workshop, Dublin

12 Jun 2012

Agenda

See also: IRC log

This is the raw scribe log for the sessions on day two of the MultilingualWeb workshop in Dublin. The log has not undergone careful post-editing and may contain errors or omissions. It should be read with that in mind. It constitutes the best efforts of the scribes to capture the gist of the talks and discussions that followed, in real time. IRC is used not only to capture notes on the talks, but can be followed in real time by remote participants, or participants with accessibility problems. People following IRC can also add contributions to the flow of text themselves.

Scribes
Yves Savourel, Arle Lommel, Felix Sasaki, Phil Ritchie, Milan Karasek, Tadej Štajner, Dave Lewis

Welcome

Scribes: Yves Savourel and Arle Lommel

<fsasaki> meeting not started yet

Felix: today we'll work on the ITS2.0 requirements
... first let's introduce ourselves

<philr> Phil Ritchie, CTO at VistaTEC. Industrial Partners in CNGL and MLW-LT. Within MLW-LT interested in the encapsulation of linguistic quality information and provenance within metadata.

<Sebastian> Sebastian Sklarss, Interoperability and Open Data Consultant at medium-sized privately owned company ]init[ (www.init.eu). My colleague Horst Kraemer will join later. Interested in implementing ITS in our customers' CMS

<XavierMaza_GALA> Xavier Maza, Language Services Manager at iDISC and GALA (Globalization and Localization Association) board member, interested in hearing from you to take back to our membership.

Felix: ITS 2.0 was started at the beginning of this year
... some people may not know ITS 1.0 very well, so I'll try to summarize it
... ITS defines "data categories"
... they are separate items (not necessarily related), allowing flexibility
... we provide non-application-specific definitions
... it's ok to implement only some data categories, not all
... for example the Translate data category
... you can express it locally (its:translate on an element)
... HTML5 also implements that data category: the 'translate' attribute
... it's easy to map the implementations
... In addition ITS offers the "global" approach
... ITS 1.0 offers global rules using XPath selectors that select to which nodes the data applies
... you can compare this to CSS: defaults, rules in files, rules in the document itself, and locally as well.
... In ITS 2.0 we want to apply ITS in HTML5, CMS content, etc.
... we want also to have some bridges to the semantic web
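
As a rough illustration of the local and global approaches Felix describes, the following Python sketch resolves the Translate data category for a small XML document. It uses the ITS 1.0 namespace and an `its:translateRule`; the sample document, the function name `translate_flags`, and the simplified selector `.//code` (the standard library supports only a subset of XPath) are assumptions made for this example. Precedence is assumed to be: local attribute, then global rule, then inherited value, then the default of "translatable".

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document: one global rule plus two local overrides.
DOC = f"""<doc xmlns:its="{ITS_NS}">
  <its:rules version="1.0">
    <its:translateRule selector=".//code" translate="no"/>
  </its:rules>
  <p>Press <code>Ctrl+C</code> to copy.</p>
  <p its:translate="no">ACME-1234</p>
  <p>Run <code its:translate="yes">hello world</code></p>
</doc>"""


def translate_flags(root):
    """Map each element to True (translate) or False (do not translate)."""
    # Global rules: the selector picks the nodes the rule applies to.
    selected = {}
    for rule in root.iter(f"{{{ITS_NS}}}translateRule"):
        for el in root.findall(rule.get("selector")):
            selected[el] = rule.get("translate") == "yes"

    flags = {}

    def walk(el, inherited):
        local = el.get(f"{{{ITS_NS}}}translate")
        if local is not None:      # local markup wins
            value = local == "yes"
        elif el in selected:       # then global rules
            value = selected[el]
        else:                      # then the value inherited from the parent
            value = inherited
        flags[el] = value
        for child in el:
            walk(child, value)

    walk(root, True)               # default: content is translatable
    return flags


root = ET.fromstring(DOC)
for el, flag in translate_flags(root).items():
    if el.tag in ("p", "code"):
        print(el.tag, repr((el.text or "").strip()), "translate" if flag else "skip")
```

The CSS comparison shows up in the `walk` function: the most specific source of information is consulted first, and the default applies only when nothing else does.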

Felix shows Richard's test for the HTML5 translate attribute using different systems.

<r12a> http://www.w3.org/International/tests/html-css/translate/results-online

Felix: ITS 1.0 has 7 data categories, focused on XML

<r12a> note that these test results need updating - last week i found out that MS now produces positive results for all tests

<r12a> ie. <span translate=no>.....<span translate=yes>..... </span> ....</span> now works
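
The nesting case Richard mentions can be sketched in Python with the standard `html.parser` module. The class name `TranslateTracker` and the sample markup are assumptions for this example; the sketch only handles elements with explicit end tags (no void elements) and treats any value other than "yes", "no" or the empty string as "inherit from the parent".

```python
from html.parser import HTMLParser


class TranslateTracker(HTMLParser):
    """Record each text run together with its effective translate state."""

    def __init__(self):
        super().__init__()
        self.stack = [True]   # the document default is "translate"
        self.runs = []        # list of (text, translate?) pairs

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("translate")
        if value == "no":
            state = False
        elif value in ("yes", ""):
            state = True
        else:                 # attribute missing or invalid: inherit
            state = self.stack[-1]
        self.stack.append(state)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((data.strip(), self.stack[-1]))


tracker = TranslateTracker()
tracker.feed("<p>Buy <span translate=no>ACME <span translate=yes>deluxe</span> Pro</span> today</p>")
tracker.close()
for text, translatable in tracker.runs:
    print(f"{text!r}: {'translate' if translatable else 'keep as is'}")
```

A consumer such as an MT system would then send only the runs flagged as translatable, which is the behaviour the test page checks for.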

Felix: ITS 2.0 Disambiguation would allow linking to thesauri, etc. for MT.

Felix: other data categories: text analysis annotation

David Filip: How is ITS 1.0 term different from disambiguation?

Felix: term is not application specific, is a general item. disambiguation data is specific to this and ties to resources specifically for the purpose of disambiguation. But we need to discuss these details to finalize our work.

.. We need disambiguation in other areas, so this is designed for that purpose.

Richard: There were some other categories in ITS 1.0 you didn't show in the ITS 2.0 slide. Will they be dropped?

Felix: It was just that nobody showed interest in working on them here. But because the data categories are independent, we don't have to deal with them. They may be handled elsewhere. But in any event, we keep ITS 1.0 categories. We may point to them somewhere else or develop them further.

Richard: I'm worried that we might lose important things like directionality. It is useful for people using XML to have guidance. We don't want to drop them.

Felix: The list I showed is taken from the ToC of ITS 1.0. ITS 2.0 will contain all of them and then add more. So all of them will remain in ITS.

.. We will give guidance for what to do, but the actual specification may point to work in another working group, but we don't drop it.

.. E.g., the HTML5 working group might define parts of these.

Dave Lewis: The lists you showed are snapshots from today. All it means is that there was discussion about some points and others. It's where we are today, but it can be changed.

Felix: Just because something from ITS 1.0 is not on the ITS 2.0 list does not mean it will be dropped. Just because our group doesn't implement it does not mean we can't refer to other specifications for those points.

Felix: We need to get to concrete details to find consensus and implementation commitments.

.. We need to decide how to implement these categories in various formats.

.. We have consensus on a small set of items and only ideas for the others, so we need to come to consensus on those.

.. Our time-frame is that we need the general framework by the end of July. That does not mean all details need to be sorted out, but we need to have the list nailed down, with a list of what is to "be in the basket". We need a draft by October and a stable draft by November.

.. The group is funded by the EU. We need to be as implementation-driven as possible. E.g., the translate data category really helps convey the message about what can be done and also shows issues. If you follow Richard's test, it shows issues with nesting of different translate states. That is not handled yet. By prototyping simple categories we can tell what is feasible.

.. For participants, please think about what you really want to work on before the summer break. Do it for at least HTML 5 and XML, using both local and global (XPath) markup.

.. Also, engage customers to see what they want to do. Use real-world testing.

.. It is a chaotic process. Start with playing with stuff to see what works. When I say play and prototyping, those outside the group might ask what sorts of implementations might be produced.

.. Go to the MLW-LT homepage and see what deliverables are needed. It shows the areas where we expect to see implementations, e.g., Drupal for CMS by Cocomore, annotation by Tadej, in MT (Linguaserve, DCU), annotation of MT data, quality (Phil), etc.

.. At the end of the day we need stable implementations created using the EU funding.

.. We need more implementation experience.

Richard: Microsoft Translator does now support nested translate attributes properly. I just tested it and it worked.

<r12a> http://www.w3.org/International/multilingualweb/lt/wiki/Deliverables

Felix: The purpose is to get your ideas and commitments. Look at the requirements document for the ones where we have consensus. We need to understand what is important for which community. We also want to see how you can make money by seeing the value for the cost of changing to use these.

.. Find where it makes the most sense/value. We need business case-level arguments.

.. The group is moving forward a lot. The chairs and participants provide some pointers for the discussion aligned with the sessions. Use the mail you got to help guide the discussion.

Richard: What's the process for stating that you like a category and deciding whether it is in or out?

Felix: Join the IRC and when we discuss the categories and implementation, mention your support and concerns there. After the meeting we will analyze the comments to see what we need to take into account, what people supported.

Work session on representation formats

Scribe: Felix Sasaki

Jirka: Maxime will talk about issues and then I will discuss other issues.

Maxime's presentation

maxime: RDFa representation format - drop as a requirement?
... no, since RDFa mapping of data categories is in the working group charter
... different conceptualization: RDFa is for statements embedded in HTML, ITS is about a specific piece of content
... Sebastian Hellmann proposed the NIF format to have context based URIs
... two approaches to generate URIs: hash based or XPointer based

<r12a> i guess a link to the requirements doc would be useful: http://www.w3.org/TR/its2req

maxime: comparison of what can be selected with NIF, CSS selectors, XPath 1.0 / 2.0, XPointer
... XPointer 1.0 has a small extension to XPath that is hard to implement
... big issue in RDFa - how to deal with inheritance and overriding
... probably be out of scope for us
... CURIEs - use URIs with less verbosity
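
To make the CURIE point concrete, here is a minimal Python sketch of expanding and compacting URIs against a prefix map. The `prov` namespace is the W3C PROV one; the `its` namespace URI and both function names are placeholders invented for this example, not anything agreed in the session.

```python
# Prefix map; the "its" namespace URI below is a placeholder, not an official one.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "its": "http://example.org/its-rdf#",
}


def expand_curie(curie, prefixes=PREFIXES):
    """Turn 'prefix:reference' into a full URI."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + reference


def compact_uri(uri, prefixes=PREFIXES):
    """Shorten a full URI to a CURIE when a known namespace matches."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return f"{prefix}:{uri[len(namespace):]}"
    return uri  # no matching prefix: leave the URI untouched


print(expand_curie("prov:Person"))
print(compact_uri("http://www.w3.org/ns/prov#SoftwareAgent"))
```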

<Arle> I have a concern about the statement about MUST support. It runs into compliance issues for us since you only need to implement one data category in one format to claim compliance. If you are interested only in implementing translate in HTML5, this proposal would seem to require you to support stuff you don't care about or actually need.

<fsasaki> Arle, where is MUST written? In Maxime's presentation?

<Arle> Yes.

maxime: consumers of ITS could use CURIEs to shorten URIs
... inspiration from Provenance wg: they deal with XML and RDF at the same time
... PROV data model, PROV ontology,
... suggestion would be to have multiple facets

<Arle> Slide 11: "ITS 2.0 implementations MUST implement XPointer"

maxime: ITS data model, ITS-XML, ITS-O (Ontology with mapping ITS to RDF)
... ITS-HTML, its-* attributes
... ITS-HTML-RDFa
... ITS-HTML-Microdata
... provenance model relates agents and activities
... e.g. "translator leads LT-activities on fragments of text"
... suggestion to define prov:Organization, prov:Person, prov:SoftwareAgent, ...
... as agents; activities are human translation, machine translation, quality assessment
... issues of local ITS annotation: can't express complex set of ITS attribute, e.g. its-* elements
... possible solutions: write ITS-XML directly in a script element or a head element
... finally - about ITS namespace, should it be kept?

<daveL> TCD's CMS LION implementation and its use of provenance documented in recent LREC paper: http://www.w3.org/International/multilingualweb/lt/wiki/images/b/b6/LREC-lewis.pdf

maxime: can be kept if you use ITS with content negotiation, example with SKOS that is always redirected to latest version of a schema
... for us, we would have a URI of ITS 2.0 specification
... and content neg for various schemas
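
The content negotiation idea can be pictured with a small Python sketch: one specification URI, several representations, and the `Accept` header deciding which one is served. The file names and the `REPRESENTATIONS` table are hypothetical, and the parsing is deliberately simplified (no wildcards, no special handling of `q=0`).

```python
def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type, q = fields[0], 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        if media_type:
            entries.append((q, media_type))
    # sorted() is stable, so types with equal q keep their header order.
    return [m for q, m in sorted(entries, key=lambda e: -e[0])]


# Hypothetical representations behind a single specification URI.
REPRESENTATIONS = {
    "text/html": "its20.html",
    "application/rdf+xml": "its20.rdf",
    "text/turtle": "its20.ttl",
}


def negotiate(accept_header, available=REPRESENTATIONS, default="text/html"):
    """Pick the first acceptable representation, else fall back to the default."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return available[media_type]
    return available[default]


print(negotiate("application/rdf+xml;q=0.9, text/html;q=0.5"))
```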

pedro, yes, it will be in the wiki

Return to Work Session

<mlefranc> to [11:05] <Arle> , I agree, that would only be needed for data categories that have CURIEs as Datatype

jirka is describing how representation is done so far for XML in ITS 1.0 and HTML5

jirka: after discussion with HTML working group, we decided to use its- attributes

<Arle> maxime, thanks for clarifying. That wasn't necessarily clear from the slide, where it looked like a general requirement for ITS 2.0 conformance. So the proposal would need to be hedged a bit.

<Arle> Preliminary slides page URL will be: http://www.w3.org/International/multilingualweb/lt/wiki/WS5_Preliminary_Presentations

jirka showing how global rules can be linked from an HTML document

jirka showing microdata mapping from its- attributes

jirka: meta element cannot be used everywhere in HTML, that might be an issue for microdata mapping of ITS
... RDFa is most problematic - jirka proposes to kill that proposal
... maxime had some ideas to do that; now I am worried about this mapping
... RDFa is another syntax for expressing RDF triples
... so the subject of the triple is the whole page, not the original piece of content
... we promised a mapping to RDFa, so I'd be happy if we have a proposal to work with NIF
... maxime: agree that people from the SW area need to have ideas

<scribe> ACTION: maxime to lead discussion on RDF serialization in ITS, with "task force" people - Sebastian, Maxime, Dave, ... [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action01]

dave: is there a use case for RDFa expressing ITS?

tadej: i can provide the data, but if nobody is consuming it that's a problem
... text analytics software provides info often via URIs
... that can be expressed without RDFa
... so we can provide RDFa easily, but does it make sense to use it
... NIF addresses the issues with RDFa

davidF: maxime prepared great stuff
... but I would ask for a use case
... maybe a clarification of the charter would help

<scribe> ACTION: Felix to work on charter clarification [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action02]

maxime: we could use ITS in RDF to localize ontologies

dave: are we taking localization of ontologies as a requirement on board?
... also, provenance that we are working on (RDF based) does not require RDFa
... question is really if we have a use case for generating an RDF graph from the content

jirka: other issues - how to represent in former versions of HTML
... e.g. HTML 4 or HTML 3.2
... I think we don't need to provide that - even in HTML 3.2 or HTML 4 you can use the its- attribute

<scribe> ACTION: jirka to make a clarification in the req draft about previous versions of HTML [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action03]

above is ISSUE-19

jirka: XPointer - is still a working draft, will probably not be finished in time

maxime: we could use XPointer without the extension of range
... we would have the benefits of URIs

jirka: so in a selector, a URI that can use an XPointer fragment?

maxime: yes

jirka: that will be used in global rules?

maxime: yes

jirka: that will be inconvenient for current ITS usage
... currently you say "all titles in all chapters" via XPath
... with XPointer you will just be about a particular document identified by the URI
... so there might be no real use case to switch to XPointer
... so I'd propose not to use the idea now
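
Jirka's contrast can be illustrated with a short Python sketch: one XPath-style selector ("all titles in all chapters") applies to any document it is run against, while a URI with an XPointer fragment names one location in one document. The two sample books, the `example.org` URI and the function name `select_titles` are assumptions for this example; the pointer is only split into its parts, not evaluated.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Two hypothetical documents sharing the same vocabulary.
BOOK_A = ("<book><chapter><title>Intro</title><para>Text</para></chapter>"
          "<chapter><title>Setup</title></chapter></book>")
BOOK_B = "<book><chapter><title>Overview</title></chapter></book>"

# One selector, reusable across documents (ElementTree's XPath subset).
SELECTOR = ".//chapter/title"


def select_titles(xml_text, selector=SELECTOR):
    """Return the text of every node the selector matches in this document."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall(selector)]


print(select_titles(BOOK_A))
print(select_titles(BOOK_B))

# A URI-based pointer, by contrast, is tied to a single document.
POINTER = "http://example.org/book-a.xml#xpointer(/book/chapter[1]/title)"
document_uri, fragment = urldefrag(POINTER)
print(document_uri, "|", fragment)
```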

davidF: if XPointer spec will not be finished, we cannot do this

jirka: maxime was proposing to publish schemas with content negotiation
... these techniques are controversial also in SW - need to provide things that run more automatic over HTTP

davidF: like the idea of content negotiation

maxime: not for DTD or XSD, but other areas it might be relevant

felix: at dF, the content negotiation is about the schemas
... so probably a different case than df has in mind

sebastian: you have an ontology in a sense

jirka: I mean ontology in terms of OWL

sebastian: can be easily created

felix: is there a use case for the ontology?

sebastian: not sure

provenance issue will be postponed

Working Session: Quality Metadata

for more info on this session, see http://www.w3.org/TR/2012/WD-its2req-20120524/#Quality and http://www.w3.org/TR/2012/WD-its2req-20120524/#Quality_Assurance_.28QA.29

Phil: Their interest is language quality and QA. Language review process can be very complex.

.. This is an opportunity to see new approaches and solutions.

.. Additional support of the audience will be very interesting for the implementations of these data categories.

Arle: The target audiences for Quality Metadata are: LSPs doing QA, content creators doing quality verification,

.. Authors marking errors and posteditors providing info on efficiency / MT quality.

.. Motivations: 85% of QA is spent on about 10% of content.

.. so there are potential cost savings.

.. to capture problems systematically when you do not know where they come from. Provenance will help.

... Some other data categories (purpose, domain...) can help to build business rules

Arle: Two complex Data Categories: errors and error profiles

.. the point is to have DC independently of the metric

<daveL> Quality data categories in requirements doc at: http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#Quality

.. the data model can be very simple, but some examples seem to need more complex attributes

.. a simple syntax can refer to the basic error parameters.

Richard: too many attributes and info in that model

Arle: yes, this is one of the issues

.. a standard markup may be a better solution, but this needs to be resolved.

dag: where would that be? In the content?

Arle: in the localization content

dag: so in XLIFF?

Arle: XLIFF can be an ideal scenario, but QA can be done in other environments, processes, etc.

phil: we were looking into using RDF and putting that into a triple store
... we want to use RDF to do cross silo linking

Phil: Publishers need also to capture feedback of QA process.

http://www.w3.org/QA/2012/05/interview_ibm_on_a_linked_data.html

Felix: in software localisation, mechanisms for tracking the status of the process are necessary

Dave: Need to carefully establish the scope

Felix: we need to have implementation for this markup

.. the bottom line is implementation of the data categories, to see them in real apps.

<mlefranc> quality error description with RDF using Provenance datamodel: [ a its:qualityErrorDescription ; its:qaType "..." ; ... ; prov:wasGeneratedBy [ a its:QAActivity ; ... ; prov:used [ a str:String ; str:anchorOf "verbs agrees"]]]

Dave: contradiction between the scope and the way of representing only one QA run

Richard: maybe better to use a dedicated element rather than span, since it makes it easier to keep separate from the content

Arle: a dedicated element can be used, but that gives other problems

Tatiana: target users can be also MT training and development

Arle: additional specifier can be necessary in terms of recognition

.. more slides

.. profile must be flexible and capable to be used in a global manner

(below action is unrelated to this discussion, just so that I don't forget)

<scribe> ACTION: felix to check whether we can use META-SHARE for identifying resources to be used in disambiguation [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action04]

.. ex. qualityProfile

Dave: do you mean by "pass" the result of the QA?

Arle: Yes, but it is more intended to show what was done as QA

Phil: you can define errors with a high granularity, but also some scores for more important errors.

Arle: This is a very verbose markup

.. implementations need a big effort, so mechanisms to know how this is being done now are necessary

.. and it will affect the commitments and timeframe

.. it is out of scope to standardise the different QA metrics

Yves: can you specify it making the distinction between consumer and provider?

Felix and Phil: probably it is not necessary to separate consumer and producer

<scribe> ACTION: Dave to conclude quality discussion with Arle, including examples from existing implementation in CMS-LION - due to mid July [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action05]

FePhil: there are some metrics that already people capture and use

Felix: there are many tools that can use this

End of session

Yves will join the discussion, providing input about what current tools do and how that relates to the current proposals

Terminology metadata

Scribe: Phil Ritchie

Tadej Štajner presenting

Goal annotate fragments of text

Audiences, content authors, localizers, CMS, MT providers

Data categories, Term; Named entity; Disambiguation; Text analysis annotation

Annotations provide disambiguation through reference to semantic networks and ontologies

Challenges

Tadej: use ITS to support HTML5
... aid term matching in TM and CAT tools
... are there more challenges?

felix: will TA approach lead to real-time tagging?

tadej: there is a potential for semi-automatic tagging

What's being tagged are candidates, not terms themselves until they are human validated

Ioannis: not realistic to have full automatic solution
... need to find ways for semi-automatic

tadej: identification or construction?
... construction hard problem

r12a: ITS 1.0 was concerned with term definition, is this different?

fsasaki: one difference is disambiguation

daveL: ITS 1.0 is a reference

tadej: what are dereferencing scenarios?
... TBX/RDF
... would like to see some kind of retrieval mechanism

pedro: candidate proposal interesting
... content authors will need tool assistance to mark terms
... glossaries are very guarded by companies
... need to link/map to proprietary glossaries

fsasaki: can this be used by MT providers?

MT providers have own methods to disambiguate

scribe: problem of extending lexicons online
... client terminology supplied with translation task

dgroves: open question of how SMT use

<r12a> i'm wondering whether term definition and cross-language term equivalence should be in the same data category

dgroves: difficult on-the-fly consumption

tadej: does this information help?

dgroves: unanswered question

dF: on-the-fly more useful for rules engines

tatiana: promising initiative.
... could be a foundation

tatiana: 30% increase in MT quality from terminology related work

Are proprietary glossaries not linked to public repositories?

Ioannis: term candidates go through a rigorous process of approval - can take months
... in enterprises

<dgroves> To note: the use of terminology in MT research is generally considered a type of domain adaptation

Ioannis: customers need help with terminology

Term lifecycle phase attribute?

tatiana: important to distinguish between acquisition and recognition

tadej: annotationAgent special case of provenance
... example markup being presented

Ioannis: we have linking lexicon links

Necessary to have more than one term bases; product specific; cascading

scribe: client specific

tadej: stand-off markup cleaner
... in favour of inline by default but need portability

session closed

Updating ITS 1.0

Scribe: Milan Karasek

Tadej: Managing a lifecycle of terms
... Confidence of the annotation (named entity)
... difficult for some approaches
... Disambiguation for distinct words

<scribe> ACTION: Tadej to create a summary of implementation status of Terminology Metadata Generation [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action06]

<Jirka> My notes from representation session are at http://lists.w3.org/Archives/Public/www-archive/2012Jun/att-0018/representaion.html

Felix: MLW-LT must support all ITS 1.0 and their functionality

Yves: How to distinguish 1.0 in 2.0?

Felix: We will have references (e.g. to Ruby)

Jirka: Prefer to keep Ruby in 2.0

<scribe> ACTION: Jirka to summarize usage of Ruby in DocBook [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action07]

<fsasaki> scribe: fsasaki

Work Session: Content Authoring Requirements

presentation from alex lik

alex: localized publications - instructions for use, release notes, systems messages in a GUI, ...
... full localization of GUI
... tagset for XML - textcontainer for software is XML. If the ITS tagset works for any XML, software localization can benefit from that too
... medical device manufacturers have a highly regulated environment
... local regulations, CFR (QSR), ISO 13485, IEC 60601, directive 93/42/EEC
... challenges:
... variety of authoring platforms, even within one company
... end user materials is in DITA XML
... having a tagset for "any XML" is a good condition
... but we want to go further to have real single sourcing
... modification of underlying format (and in the tagset) leads to changes in the fragmentation
... that leads to changes when we analyse the material for localization price quotes
... there is material that has been translated in XML; when you send it to HTML the price quote is comparable to the original one
... there are materials that are in word documents that are not legal templates
... there are many companies that have mandatory end user material
... and other types of material
... on one hand the separation is logical, but it can also impose problems
... the localization costs can grow tremendously
... having all content in the same repository & container will be helpful
... question of information architecture - do we need to train developers to work with ITS?

moritz: depends a bit - one issue we are seeing.
... the end user might not have a technical background
... we are looking into finding ways for having interfaces for users

alex: I am talking about content managers & information architects

felix: these people need to understand XPath at least to be able to write some useful rules

olaf: documentum would produce a lot of pages, but multilingualism is not taken into account
... there is a huge re-training of the information architecture people necessary to understand internationalization

alex: thanks a lot for that comment
... back to the challenges - terminology mgmt was mentioned before
... we can have our content in DITA, output in HTML
... re-publishing is important.

pedro: some implementations we are doing give you some background:
... you cannot put in the CMS all complexity of the localization progress
... you have to connect through e.g. a gateway
... to a platform that can do your requirements

expectations: compatibility with DITA XML
... integrating in SW resource files
... interop with XLIFF
... terminology mgmt
... removal and re-integration of ITS markup
... ease of implementation for tool vendors
... visible ease on the bill

alex: my main point is about the XML deep theb
... not so much about CMS
... take-away for me: educational aspects are missing for authors and others

presentation from des oates

des: focusing on the processes that we adopt within Adobe
... I am interested in this group because it covers three domains that are interesting for me:
... content creation, publication, localization
... publishing happens in different ways: raw HTML using CQ5 CMS
... we publish software
... and also documentation
... localizable content in source control systems,
... a lot of content in multiple repository formats
... framemaker, CQ5 again, web CMS
... translation services used internally: TMS, another TMS, ...
... adobe translator https://community.translate.adobe.com
... many inputs, many outputs
... a lot of complexity!
... 18 months ago we created an internal mediation layer, connecting authoring / publication / translation together
... you still have the three domains
... they are connected to mediation layer
... that supplies filtering / normalization, leverage, terminology / QA check, MT service
... MT is an abstraction layer that allows to plug in various components:
... moses, external MT providers, ...
... we access them all through a set of APIs
... each of the services is a potential consumer or provider of the metadata that we are discussing
... they are decoupled, but they work together in workflows
... we have many translation processes
... we have to match different business requirements
... we don't want to create customizations all the time, customizing connectors etc.
... what is the purpose of the mediation
... example: MT workflow: from CMS > XML, normalization process, XLIFF transformation, leverage of XLIFF
... (check if everything is re-usable)
... if it is not re-usable it will not be propagated through the workflow
... after machine translation, content goes to post editors
... after that content goes to XML and CMS (HTML)
... that's a typical workflow that we deploy with our platform
... where would the metadata be important?
... in MT service: translate, in TMS: loc note and disambiguation
... above are just examples
... another example workflow: user generated content, also with XLIFF and MT
... above is a real time process: user says "I want to have page in a different language", clicks, and gets the content
... important here: translate, disambiguation, provenance metadata
... we need the metadata for our SOA based localization
... without a standard form of metadata we will lose data
... provenance is important
... ITS 2.0 should solve parts of these problems
... beyond the data categories
... I have an additional set of requirements, in addition to markup / attributes etc.
... it should be straightforward to establish which subset of ITS 2.0 an implementation supports
... in SOA, we need to know what metadata a system supports
... that's orthogonal to the data modeling confirming
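
Des's requirement that a service state which subset of ITS 2.0 it supports could look roughly like the following Python sketch. The `Service` class, the data category labels and the sample metadata are all hypothetical; the sketch only separates what a service declares it understands from what it does not, and leaves the decision about the remainder to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """A workflow service that declares which data categories it understands."""
    name: str
    supported: frozenset = field(default_factory=frozenset)


def split_metadata(metadata, service):
    """Split metadata into (understood, not_understood) for a given service."""
    understood = {k: v for k, v in metadata.items() if k in service.supported}
    unknown = {k: v for k, v in metadata.items() if k not in service.supported}
    return understood, unknown


# Hypothetical MT service and segment-level metadata.
mt = Service("mt", frozenset({"translate", "disambiguation"}))
segment_metadata = {
    "translate": "no",
    "locNote": "Product name",
    "provenance": "post-editor-7",
}

understood, unknown = split_metadata(segment_metadata, mt)
print("consumed by", mt.name, ":", understood)
print("must be carried along or dropped:", sorted(unknown))
```

A mediation layer could run such a check at each service boundary, so that metadata a service cannot interpret is at least detected rather than silently lost.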

<scribe> .. unknown domains and organizations private use of data categories should be considered

des: beyond / in concert with ITS2.0
... ITS 2.0 will solve parts of the problem, other components will need to be addressed too
... standardization of content packaging, see e.g. linport project
... standardization of service boundaries

<Arle> Here service boundaries = standard APIs for the various services.

des: clear opportunity of standardization of APIs
... could help integrating terminology systems with workflows

dave: really enjoyed your diagrams, want to re-use them for our use-case document
... also agree about service boundaries
... not sure if there is a deliverable of the working group
... but having examples / slides like that to communicate the problem is very helpful

des: yes, understand that at the moment it is out of scope, but we need to assure that it is taken up at some point

dave: I also agree that for a service you need a clean way to state what data categories you support

des: it is part of the business agreement, needs to be very clear

<Arle> Felix: The conformance statements we have for 1.0, for 2.0 we need to make it clearer that someone implementing must have these statements. We should be clearer about what must be provided: e.g., machine-readable, human readable, various implementations

dave: some things are important for the in-formative part of the spec
... e.g. in very clear best practices documents, see richard's example

kerstin: in the "example workflow", is there a way to transform various formats into HTML?

des: XLIFF is the interchange format that we use across the platforms
... normally we would use XLIFF to translate data from one service to another service

davidF: XML is a transition format
... if you start with HTML you will have HTML, same for other formats

davidF: XSLT is the transformation language

kerstin: would it make sense to convert the lexicons into HTML already?

des: don't see the use case for HTML, since that's a publication format
... if you have word or pagemaker etc., you just need a filter to convert things to XLIFF or another interchange format
... that's the rationale for the conversion

moritz: one issue - what if we get metadata that we don't support?

des: it is important to know what you expect

<Arle> This is an interesting issue: when can you strip metadata? What happens when you get back metadata invalid for your domain?

moritz: should metadata be stripped out?

davidF: it is important to have defaults
... there are ways around it even if you don't support everything

<Arle> Felix: One last point. I will take an action point to come up with examples of implementations and what they can do. See if they fulfill your requirements.

<scribe> ACTION: felix to come up with example of SOA related presentation of metadata capabilities for des' requirement [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action08]

moritz: an ontology of process states - can we agree on that?

dave: we cannot standardize the process
... we can just try to "normalize the language"
... comes down to people like des and dag who have a whole view on the process

<Arle> Agreed with Dave. The moment you claim to standardize the process, it creates problems. What you can do, however, is standardize the boundaries (per Des). Treat the process as a black box in your definitions, but one with well defined inputs and outputs. E.g. don't tell localization HOW to do things, but you can say what it must make at the end of the day.

<scribe> ACTION: moritz, dave and others to look into process areas [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action09]

davidF: interest is cooperation

felix: need to make sure that we resolve this in a timely manner

dave: this is also a part of our public relations work, not so much the data categories
... people in the working group are familiar with the terminology

<Arle> Felix: We need to have things resolved ASAP, by July.

davidF: will work on that

Localization Requirements

Scribe: Tadej Štajner

Bryan Schnabel: XLIFF Extensibility and Metadata

Bryan: There are three main ways to extend XLIFF1.2: elements, attributes, attribute values
... with elements, you can use the usual XML namespace declaration mechanism
... similarly with attributes, where it is allowed.
... with attribute values, you can prepend x- to your value where none of the existing options work for your use case.
... There is a DITA Open Toolkit XLIFF/DITA roundtripping tool you can use
... It's implemented as an Ant tool, and the tools also keep the original PDF to preserve context of the content
... The roundtripping XSLT required two distinct operational modes: the skeleton mode and the body mode.
... A custom namespace preserves the formatting information, so it is fully reconstructible.
... Here, I'm demonstrating a Drupal module for this.
... All Drupal information gets stored in a custom namespace. Some of the metadata also shows up in the XLIFF namespace. After the XLIFF file is translated, the plugin can import it and everything works.
... In XLIFF 2.0, the extensibility was more restricted. Previously, people re-implemented XLIFF functionality in their extension, which we do not want.
... We then allowed custom namespaces for elements that are not already in XLIFF 2.0.
... After discussion, the WG vote tied on allowing elements-only and elements-and-custom-namespaces.
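
Illustration (not part of the talk): a minimal Python sketch of the three XLIFF 1.2 extension points Bryan lists, namely custom-namespace elements, custom-namespace attributes, and x- prefixed attribute values. The extension namespace URI, the my: prefix, and the node-id and layout names are hypothetical; only the XLIFF 1.2 namespace URI and the restype attribute are taken from XLIFF 1.2 itself.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
EXT_NS = "http://example.com/my-extension"  # hypothetical extension namespace

# Prefixes used only when serializing; "my" is an arbitrary choice.
ET.register_namespace("xlf", XLIFF_NS)
ET.register_namespace("my", EXT_NS)


def build_trans_unit(unit_id, source_text):
    """Build one <trans-unit> that uses all three extension mechanisms."""
    unit = ET.Element(f"{{{XLIFF_NS}}}trans-unit", {"id": unit_id})

    # 1. Attribute value: a user-defined value carries the "x-" prefix.
    unit.set("restype", "x-sidebar-title")

    # 2. Attribute from a custom namespace (hypothetical name).
    unit.set(f"{{{EXT_NS}}}node-id", "42")

    source = ET.SubElement(unit, f"{{{XLIFF_NS}}}source")
    source.text = source_text

    # 3. Element from a custom namespace, placed after the XLIFF children.
    layout = ET.SubElement(unit, f"{{{EXT_NS}}}layout")
    layout.text = "two-column"
    return unit


unit = build_trans_unit("1", "Welcome")
xml_text = ET.tostring(unit, encoding="unicode")
```

A tool that does not know the my: namespace can skip the extension attribute and element and still read the source text, which is the behaviour the namespace mechanism is meant to allow.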

<Arle> Just as a reminder to folks, I am posting all slides here: http://www.w3.org/International/multilingualweb/lt/wiki/WS5_Preliminary_Presentations.

Bryan: The tie still remains, the TC will decide shortly - you all can also get involved and influence the discussion.

r12a: What about backward compatibility? Will XLIFF1.2 users be stranded?

dF: XLIFF is a perishable transport format, it is usually not persisted.

fsasaki: If during the generation of the metadata some extensibility is used in a particular use case, the whole extensibility layer needs to be rewritten for 2.0

Yves_: For more complex metadata, we want to structure elements in the document and carry that information forward.

DagS: How do you map ITS to XLIFF? i.e. how to map a <term> tag to XLIFF?

Yves_: For <term>, there is a specific element that is compatible. As long as we don't do complex terminology stuff, it's fine. If we do complex things with other data categories, we can use namespaces to extend.
... In XLIFF1.2, you could use ITS as a namespace extension.
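
Illustration (not part of the discussion): a minimal Python sketch of one possible way to carry an ITS term flag into XLIFF 1.2, assuming the "specific element" Yves mentions is <mrk mtype="term">. The helper name term_span_to_mrk and the sample content are hypothetical, and this is not a normative ITS-to-XLIFF mapping.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def term_span_to_mrk(span):
    """Map an element flagged with its:term="yes" to <mrk mtype="term">.

    Hypothetical helper: only the simple case (text content, no nested
    markup, no term information reference) is handled.
    """
    if span.get(f"{{{ITS_NS}}}term") != "yes":
        raise ValueError("element is not marked as an ITS term")
    mrk = ET.Element(f"{{{XLIFF_NS}}}mrk", {"mtype": "term"})
    mrk.text = span.text
    return mrk


# Source-side markup using the local ITS attribute.
source_span = ET.fromstring(
    f'<span xmlns:its="{ITS_NS}" its:term="yes">motherboard</span>'
)
mrk = term_span_to_mrk(source_span)
```

Richer terminology data would not fit into mtype alone, which is where the namespace extension Yves refers to would come in.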

daveL: The extensibility seems to have implications on validation, making it more complex. For instance, implementers should be able to say which parts should be supported in their implementations.
... Do you have criteria for what kinds of extensions are "acceptable" to you?

Yves_: Even internally in XLIFF, many components are modular and live in module namespaces which extend the core.
... That strategy could be good for supporting ITS tags within XLIFF.

Bryan: If the current extensibility strategy doesn't work for you, you can raise your voice.

DagS: If it will go in this direction of restricting extensibility, that is worrisome for the ITS ecosystem.

Des: We spent a long time shaping the requirements in the content domain, and we need to spend some time with supporting ITS in the localization domain. XLIFF is the workhorse there, and ITS needs to be integrated with XLIFF 2.0

DagS: Is restricting extensibility really going to help with the new implementations?

Yves_: One unsolved issue in 1.2 was segmentation representation. However, there is a mapping from 1.2 to 2.0.

Bryan: The people against extensibility are just listening to the community, which voiced legitimate concerns about extensibility. The opposition's opinion is that the problem was not extensibility, but misuse of it. We can solve this by enforcing conformance clauses and setting expectations.

DagS: Can there be a special case for ITS?

Bryan: Possibly - see mailing list for XLIFF.

r12a: They should be closely tied. You don't need to use its: for using our data categories, but you can also integrate the data categories into the XLIFF markup and provide conformance support.

Bryan: That's not being planned, but people would listen to these suggestions.

dF: There will be custom namespaces. Even if XLIFF by default won't have the extensibility, ITS can still be suggested as an OASIS namespace and module.
... There is a proposed feature in XLIFF2.0 for ITS support.

r12a: I would really like to move this even closer: I realize it's still an optional extension. Ideally, support would be integrated in the core. This doesn't affect inline markup?

fsasaki: There are other communities that are affected by this policy.

Yves_: Yes, like TBX.

daveL: Do we have any consensus here? How many people would like to say "Please keep namespaces in 2.0"?

DagS: We want to have XLIFF support ITS. The exact implementation is not that important (namespaces, modules, etc.)

Bryan: If you feel passionate about that, comment yourself.

<fsasaki> ACTION: felix to write a mail to XLIFF tc, check that on Wednesday morning again [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action10]

Return to Working Session

Yves_: We introduce the data category idValue as a selector of content.
... Some discussion was about what do we need to identify? Segments? How do we select the ids?

mlefranc: This could be relevant to the provenance discussion tomorrow.

daveL: We need to be able to point to a portion of text when we talk about provenance.
... There are several ways: introducing new <span> tags, re-using existing markup, or doing completely stand-off annotation.

Yves_: I am referring to existing IDs in the document, which should be persistent throughout the process.
... Data categories are discrete, the more orthogonal the better.
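
Illustration (not part of the discussion): a minimal Python sketch of the stand-off option daveL mentions, combined with Yves's point about reusing IDs that already exist in the document. The sample document, the provenance record layout, and the agent labels are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Content with IDs that already exist in the source document.
doc = ET.fromstring(
    '<body><p id="p1">Hello world.</p><p id="p2">Second paragraph.</p></body>'
)

# Stand-off records: kept outside the content and keyed by existing id,
# so no new <span> markup has to be introduced (record layout is hypothetical).
provenance = {
    "p1": {"agent": "machine translation"},
    "p2": {"agent": "human translator"},
}

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id") is not None}


def annotated_text(element_id):
    """Return the text of the identified element with its stand-off record."""
    element = by_id[element_id]
    return element.text, provenance.get(element_id)


text, record = annotated_text("p1")
```

This only works if the IDs survive every processing step, which is the persistence requirement Yves raises.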

daveL: How important is this requirement for mapping translation units?

Yves_: It's important for localization, but XLIFF has its own id space.

fsasaki: Let's continue this discussion tomorrow, there's also the targetPointer debate.

BCP 47 Developments

Scribe: Dave Lewis

Mark Davis presents remotely

scribe: introduces Unicode locale/language ID, which is based on BCP 47
... there are extensions to these codes
... the u extension relates to locales with various calendars, phone number formats, digit sets etc.
... e.g. Arabic with Arabic numbers or Western numbers
... the t extension indicates a transform of content, see RFC 6497
... e.g. transliteration, translation, transcription
... intended for interchange circumstances where there is no structured way of expressing the transform
... options can indicate transform mechanism, input method, keyboard method and a specific one for machine translation - plus ones for private use
... resources available for choosing language tags and extension fields and subfields
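
Illustration (not part of the presentation): a minimal Python sketch of how the u and t extensions sit inside a BCP 47 tag. It splits a tag at its singleton subtags and then separates the source language of a t extension from its fields. The function names are hypothetical, and the sketch does no validation against the subtag registry. The sample tags follow the patterns in RFC 6497 (t) and the Unicode locale extension (u).

```python
def split_extensions(tag):
    """Split a BCP 47 tag into its base part and its singleton extensions.

    Tags are case-insensitive, so everything is lowercased for simplicity.
    """
    base, extensions, current = [], {}, None
    for index, subtag in enumerate(tag.lower().split("-")):
        if current == "x":
            # Everything after the private-use singleton belongs to it.
            extensions["x"].append(subtag)
        elif len(subtag) == 1 and index > 0:
            # A one-character subtag starts a new extension.
            current = subtag
            extensions[current] = []
        elif current is None:
            base.append(subtag)
        else:
            extensions[current].append(subtag)
    return "-".join(base), {k: "-".join(v) for k, v in extensions.items()}


def parse_transform(t_value):
    """Split a t extension into its source language tag and its fields.

    Field keys are a letter followed by a digit (for example m0).
    """
    source, fields, key = [], {}, None
    for subtag in t_value.split("-"):
        if len(subtag) == 2 and subtag[0].isalpha() and subtag[1].isdigit():
            key = subtag
            fields[key] = []
        elif key is None:
            source.append(subtag)
        else:
            fields[key].append(subtag)
    return "-".join(source), {k: "-".join(v) for k, v in fields.items()}


# Japanese content that was transformed from Italian.
print(split_extensions("ja-t-it"))        # ('ja', {'t': 'it'})
# Arabic locale using Western (Latin) digits.
print(split_extensions("ar-u-nu-latn"))   # ('ar', {'u': 'nu-latn'})
# Cyrillic content transliterated from Latin with a named mechanism.
base, ext = split_extensions("und-Cyrl-t-und-latn-m0-ungegn-2007")
print(parse_transform(ext["t"]))          # ('und-latn', {'m0': 'ungegn-2007'})
```

An application that does not understand an extension can keep only the base part returned by split_extensions and treat it as an ordinary language tag.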

Yves asks for confirmation that the ID is both an indication of future action and can be used to report past actions

Mark suggests using differing tag codes for these different uses

scribe: but confirms that the coding scheme could be used for either

Dag asks for confirmation that this is not intended to replace the locale tag

Mark states this could be used to tag content as being in one language but additionally that it was machine translated from English

Dag is concerned that if this replaced the locale tag it could confuse implementers, overriding existing functions that understand straight BCP language tags

Mark responds that the tag could be ignored if it is not understood by the application

David Filip reiterated concern that overloading the tag for both instructions and reporting would be dangerous

Mark responds that it's up to the author of the content to decide how to use the tag

David expanded to ask if the usage should be context driven

Mark responds that it can be used for request and response, and response doesn't have to conform to request

scribe: but does expect to use this in a richer environment, e.g. an XLIFF document

r12a states that the attribute should dictate whether this is information or an instruction

Jirka asks if there had been consideration of using such tags in HTTP headers

Mark states that the intention is this should be used in a very wide range of circumstances and languages

scribe: purpose for putting it in BCP47 is to support this wide variety of contexts

Felix asks if process tracing data would use this tag, or would it be in more structured markup

Mark responds that this could be used in other standards, and perhaps the constituent codes could be used

Felix asks for some guidance on further usage of this extension; he will follow up.

<fsasaki> ACTION: felix to follow up on bcp 47 "t" guidance in i18n core working group [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action11]

Arle encourages people not already in the group to get more involved in the WG

<r12a> i also wanted to make the point that lang= or xml:lang= in a web page ONLY means that the content within the associated element is in a particular language - if you want to indicate some other thing, such as the language of an external resource or a request for information in a particular language (see http://www.w3.org/International/questions/qa-when-xmllang)

<r12a> this is something that often trips up working groups

wrap-up for today

Arle recognises sponsorship of CNGL in this event

scribe: and highlights next MLW workshop in March 2013 in Rome

<fsasaki> link to locworld event at http://www.localizationworld.com/lwseattle2012/feisgiltt/

scribe: and there is a workshop planned on cross-institute interoperability called FEISGILTT, co-located with LocWorld in Seattle in Oct 2012

<dF> FEISGILTT 2012 call for papers: http://www.localizationworld.com/lwseattle2012/feisgiltt/

scribe: thank you, and see you tonight in the Turk's Head and tomorrow on the second day

Summary of Action Items

[NEW] ACTION: Dave to conclude quality discussion with Arle, including examples from existing implementation in CMS-LION - due to mid July [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action05]
[NEW] ACTION: felix to check whether we can use META-SHARE for identifying resources to be used in disambiguation [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action04]
[NEW] ACTION: felix to come up with example of SOA related presentation of metadata capabilities for des' requirement [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action08]
[NEW] ACTION: felix to follow up on bcp 47 "t" guidance in i18n core working group [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action11]
[NEW] ACTION: Felix to work on charter clarification [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action02]
[NEW] ACTION: felix to write a mail to XLIFF tc, check that on Wednesday morning again [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action10]
[NEW] ACTION: jirka to make a clarification in the req draft about previous versions of HTML [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action03]
[NEW] ACTION: Jirka to summarize usage of Ruby in DocBook [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action07]
[NEW] ACTION: maxime to lead discussion on RDF serialization in ITS, with "task force" people - Sebastian, Maxime, Dave, ... [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action01]
[NEW] ACTION: moritz, dave and others to look into process areas [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action09]
[NEW] ACTION: Tadej to create a summary of implementation status of Terminology Metadata Generation [recorded in http://www.w3.org/2012/06/12-mlwDub-minutes.html#action06]
 
[End of minutes]