Re: [Issue-75] - Domain

Hi Christian, all,

this is still a personal response, but from what you write below:

[I understand your point. I guess that slightly different 
assumptions/views on MT-related processes exist. The Use Cases above 
from my point of view all pertain to “single engine” scenarios. ]

I think you are saying that the current formulation of "Domain" is useful 
for some MT-related processes, but not for all. So I'm inclined to 
reject the comment as a "new feature request to address new usage 
scenarios", for reasons of timing among others; as a consequence it would 
be moved to the "later" tracker product. See also
http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0124.html

Best,

Felix

On 18.01.13 16:50, Lieske, Christian wrote:
>
> Hi Felix, Jörg, all,
>
> Please find some of my thoughts (CL>CL>) on the reply below.
>
> Cheers,
>
> Christian
>
> *From:*Felix Sasaki [mailto:fsasaki@w3.org]
> *Sent:* Thursday, 17 January 2013 18:26
> *To:* Lieske, Christian
> *Cc:* joerg@bioloom.de; public-multilingualweb-lt-comments@w3.org
> *Subject:* Re: [Issue-75] - Domain
>
> Hi Christian, Jörg, all,
>
> co-chair hat on: I think the idea of "adding domain information" is 
> clear, and Pablo said it could be useful for his customer, and Yves 
> said it could be useful for XLIFF mapping.
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0053.html
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0059.html
> So we can move this topic to the next stage: who from the implementers 
> for domain
> http://htmlpreview.github.com/?https://raw.github.com/finnle/ITS-2.0-Testsuite/master/its2.0/testSuiteDashboard.html
> would implement local domain, and who thinks (this question is 
> important too) that this is worth a delay?
>
> Co-chair hat off, and replying to your proposal at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0087.html
> (replying here so that we have only one thread)
>
> [
>
> CL>>>>> I understand the point. My suggestion would be to refine the requirement for the revised domainMapping that I sketched: the information about the target environment/engine is optional.
> CL>>>>> Thus, you could have the following:
> CL>>>>> <its:domainRule ...
> CL>>>>>        domainMapping=
> CL>>>>>                'MT-engine-X,"automotive auto, medical medicine, 'criminal law' law, 'property law' law"',
> CL>>>>>                 'TM-system-Y,"automotive X, 'criminal law' L, 'property law' law"'
> CL>>>>>                "automotive Z, 'criminal law' C, 'property law' law"'  <---- here is the change (no info about the target environment/engine)
> CL>>>>> />
> CL>>>>>
> CL>>>>> Aside: I am a bit unsure how realistic the scenario "specify domainMapping without knowing the engine/environment" is.
>
> ]
>
> Making the engine information optional doesn't solve the problem I 
> described:
> - domainMapping expresses "choose MT-engine-X"
>
> CL>CL> This is not what I had in mind as semantics for the first 
> parameter of a list item in the revised “domainMapping”. To me, the 
> semantics was “If you pass through MT-engine-X, then work with the 
> following domain information”.
>
> - it also expresses "map the domain 'automotive' to 'auto'
> - later in the workflow there are several engines available: 
> MT-engine-X, MT-engine-Y
> - only MT-engine-Y knows about 'auto', so the "choose MT-engine-X" 
> information from domainMapping disturbs the workflow
>
> With respect to 'I am a bit unsure how realistic the scenario "specify 
> domainMapping without knowing the engine/environment" is. ': so far it 
> was helpful for starting work on three implementations (if I count 
> correctly) using domain information in MT workflows. See
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Machine_Translation
> http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Online_MT_System_Internationalization
> http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary#Simple_Segmente_Machine_Translation
>
> Not specifying the engine even has a benefit: content can be 
> prepared for processing by all these services. Since there is no need 
> to accommodate "engine" information, the consumer of the content can 
> choose freely which engine works best - based purely on domain information.
>
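A minimal sketch of the "choose the engine purely from domain information" idea described above. It assumes the domainMapping syntax of the current draft (comma-separated pairs, with quotes around values that contain spaces) and that values never contain commas; the engine catalogue ENGINES and both function names are hypothetical, made up for illustration.

```python
import shlex

# Hypothetical catalogue: which domain labels each engine knows about.
ENGINES = {
    "MT-engine-X": {"medicine", "law"},
    "MT-engine-Y": {"auto", "law"},
}


def parse_domain_mapping(value):
    """Parse a domainMapping value into {content domain: consumer domain}.

    Example input: "automotive auto, 'criminal law' law"
    Assumption: values do not contain commas.
    """
    mapping = {}
    for pair in value.split(","):
        parts = shlex.split(pair)  # honours 'quoted values with spaces'
        if len(parts) != 2:
            raise ValueError(f"expected exactly two values in {pair!r}")
        left, right = parts
        mapping[left.lower()] = right
    return mapping


def engines_for(content_domain, mapping, engines=ENGINES):
    """Return the names of all engines that support the mapped domain."""
    target = mapping.get(content_domain.lower(), content_domain)
    return sorted(name for name, known in engines.items() if target in known)
```

With the mapping from this thread, `engines_for("automotive", ...)` would only return MT-engine-Y in this made-up catalogue, which is the situation where a hard-wired "choose MT-engine-X" would get in the way.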
> CL>CL> I understand your point. I guess that slightly different 
> assumptions/views on MT-related processes exist. From my point of view, 
> the Use Cases above all pertain to “single engine” scenarios.
>
> CL>CL> In this kind of scenario it is not really necessary to provide 
> the information “this is for engine X”. In “multi-engine” scenarios, the 
> situation is different. To see why, one first needs to acknowledge that 
> at least two flavors of “multi-engine” scenarios exist: multi-engine in 
> pipeline (e.g. first X, then Y for anything below a confidence of 0.5) 
> vs. multi-engine exclusive (e.g. X for domain “financials”, Y for domain 
> “health”). In both scenarios, you need a mechanism to specify which 
> domain information is for engine X, and which is for engine Y.
>
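The two "multi-engine" flavors can be sketched roughly as follows. This is only an illustration under stated assumptions: an engine is modelled as a function returning a translation and a confidence, the per-engine mapping is a plain dictionary in which the key None holds engine-neutral labels, and all names (domain_for, translate_pipeline, translate_exclusive) are hypothetical rather than anything defined by ITS 2.0.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: an engine takes (text, domain label) and returns
# (translation, confidence between 0 and 1).
Engine = Callable[[str, str], Tuple[str, float]]
Mapping = Dict[Optional[str], Dict[str, str]]


def domain_for(engine_name: str, content_domain: str, mapping: Mapping) -> str:
    """Pick the engine-specific label, then the engine-neutral one (key None),
    and finally fall back to the content's own domain value."""
    for key in (engine_name, None):
        label = mapping.get(key, {}).get(content_domain)
        if label is not None:
            return label
    return content_domain


def translate_pipeline(text: str, content_domain: str, order: List[str],
                       engines: Dict[str, Engine], mapping: Mapping,
                       threshold: float = 0.5) -> Tuple[str, float]:
    """Multi-engine in pipeline: try engines in order and stop as soon as
    one reaches the confidence threshold; otherwise keep the best result."""
    best: Optional[Tuple[str, float]] = None
    for name in order:
        result = engines[name](text, domain_for(name, content_domain, mapping))
        if best is None or result[1] > best[1]:
            best = result
        if result[1] >= threshold:
            break
    if best is None:
        raise ValueError("no engines given")
    return best


def translate_exclusive(text: str, content_domain: str,
                        routing: Dict[str, str],
                        engines: Dict[str, Engine],
                        mapping: Mapping) -> Tuple[str, float]:
    """Multi-engine exclusive: the content domain alone decides the engine."""
    name = routing[content_domain]
    return engines[name](text, domain_for(name, content_domain, mapping))
```

In both functions the engine name is only used as a lookup key for the label to pass on ("if you pass through engine X, work with this domain information"); it does not by itself select the engine.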
>
> So my questions to you, Christian, and to at least above three 
> implementers would be: do you see implementers processing domain, who 
> would be willing to contribute to testing the engine information? If 
> not (again co-chair hat on) we don't have a use case in the group, it 
> seems, and can't bring such a feature through the standardization process.
>
> Best,
>
> Felix
>
> On 17.01.13 16:07, Lieske, Christian wrote:
>
>     Hi Jörg, Felix, all,
>
>       
>
>     Unfortunately, I still don't understand why the current draft doesn't have provisions for
>
>       
>
>     CL>>    Global: <its:domainRule selector="/h:html/h:body" its-domain="financials">
>
>     CL>>    Local: <em its-domain="financials">IMF</em>
>
>       
>
>     If we don't have these provisions, we may end up with the messy situation/solution that Jörg sketches.
>
>       
>
>     Cheers,
>
>     Christian
>
>       
>
>     -----Original Message-----
>
>     From: Jörg Schütz [mailto:joerg@bioloom.de]
>
>     Sent: Wednesday, 16 January 2013 15:28
>
>     To: public-multilingualweb-lt-comments@w3.org
>
>     Cc: public-multilingualweb-lt-comments@w3.org
>
>     Subject: Re: [Issue-75] - Domain
>
>       
>
>     Hi Felix, Christian, and all,
>
>       
>
>     ITS should not be hijacked to take over the role of a workflow engine or
>
>     similar application because there might be several consumers of ITS information...
>
>       
>
>     @Christian > [Could you provide one or two examples/proofs for this?]
>
>       
>
>     Here is an outline of my idea (which potentially also hijacks ITS to
>
>     some extent):
>
>       
>
>     Possible ITS Application Scenario to Extend the "Domain" Data Category
>
>       
>
>     (1) Use (general) domain pointing for the broad classification of your
>
>     content (global reach), i.e. employ the domain data category.
>
>     (2) In cases where (1) is either too general (broad), or you want to
>
>     further classify only parts of your content (local reach), use the
>
>     disambiguation data category. This includes the further classifying of a
>
>     sequence of strings which do not represent what usually is called a term
>
>     (domain-specific vocabulary) or a multi-word unit (mwu).
>
>     (3) For the term and mwu case use the terminology data category.
>
>       
>
>     Case (3) is applied as described in the ITS 2.0 specification; always
>
>     consider to link to an appropriate authoritative internal or external
>
>     terminology resource or ontology (e.g. Cyc, Snomed, MeSH, etc.) on which
>
>     both producer and consumer have agreed (in this sense ITS is also
>
>     part of a contract).
>
>       
>
>     In this scenario, case (2) is a bit trickier because "officially"
>
>     disambiguation is also applied to meaningful string sequences, i.e. a
>
>     word or a mwu, as in the terminology case, but now we extend this data
>
>     category to arbitrary elements, for example an entire paragraph, with the
>
>     restriction that the attributes disambigConfidence and particularly
>
>     disambigGranularity have a broader meaning such as the conceptual
>
>     association to a domain's root element or to certain upper model elements.
>
>       
>
>     HTML Example (local)
>
>     ...
>
>     <p><span its-disambig-confidence="0.9"
>
>       
>
>     its-disambig-class-ref="http://snowowl.sample.com/SNOMED_CT_Concept/Pharmaceutical_Product">
>
>          Ambroxol has mucolytic and local-anaesthetic pharmacological effects
>
>          </span>.
>
>     </p>
>
>     ...
>
>       
>
>     Note: In this example, only the disambigClassRef attribute is used to
>
>     account for the "broader" employment of the data category.
>
>       
>
>     This use case scenario might sound like a bootstrap paradox... but this
>
>     is one possibility of using ITS 2.0 ... ;-)
>
>       
>
>     All the best -- Jörg
>
>       
>
>     On Jan 16, 2013, at 14:23 (CET), Felix Sasaki wrote:
>
>         On 16.01.13 12:15, Lieske, Christian wrote:
>
>             Hi Felix, Pablo, all,
>
>               
>
>             Please find some of my thoughts on the reply below.
>
>               
>
>             Cheers,
>
>             Christian
>
>               
>
>             -----Original Message-----
>
>             From: Felix Sasaki [mailto:fsasaki@w3.org]
>
>             Sent: Wednesday, 16 January 2013 08:07
>
>             To: Pablo Nieto Caride
>
>             Cc: Lieske, Christian; public-multilingualweb-lt-comments@w3.org
>
>             Subject: Re: [Issue-75] - Domain
>
>               
>
>             (trying to minimize the number of mails, hence replying to several
>
>             aspects in this mail)
>
>               
>
>             Hi Christian, Pablo, all,
>
>               
>
>             at Christian: you write at
>
>             http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0034.html
>
>               
>
>             that 2b of your comment is resolved. How about 2a? If you are not
>
>             satisfied with the replies in this thread, could you propose a change to
>
>             the spec?
>
>               
>
>             CL>> Currently, I consider 2a as being unresolved.
>
>             CL>> Addressing 2a (capture the information "This is for component X")
>
>             to me does not appear to be straightforward, since
>
>             CL>> you would need to accommodate an additional piece of information.
>
>             One could imagine representations such as
>
>             CL>>     <its:domainRule ...
>
>             CL>>        domainMapping=
>
>             CL>>            'MT-engine-X,"automotive auto, medical medicine,
>
>             'criminal law' law, 'property law' law"',
>
>             CL>>             'TM-system-Y,"automotive X, 'criminal law' L,
>
>             'property law' law"'
>
>             CL>>      />
>
>           
>
>         Such a specification of the engine could lead to conflicting information:
>
>         MT-engine-X has a module for automotive. If however the engine is not
>
>         mentioned in a domain mapping, but a different one (which does not have
>
>         the automotive module): which one to choose?
>
>         It looks like what you add as information (= choosing the engine) is
>
>         something one would do after the domain mapping, not at the same time.
>
>         Otherwise you may run into the conflict described above.
>
>           
>
>             CL>> This, however, is not in line with the current normative text on
>
>             "domain".
>
>               
>
>             With respect to your proposal below (add a note about 2b to the spec): sure, do
>
>             you want to draft something? The same for 2a (if you don't have a
>
>             specific solution in mind, stating the issue might already be helpful).
>
>               
>
>             CL>> How about the following additional paragraph for the first note
>
>             in (http://www.w3.org/TR/2012/WD-its20-20121206/#domain) for 2b?
>
>             CL>>
>
>             CL>> "domainMapping" even allows "domain" systems/hierarchies to be
>
>             encoded. domainMapping="FIN, 'A A-1 A-1-X'" could for example be used
>
>             to capture the following information:
>
>           
>
>         Would it be OK to re-formulate that sentence above like this:
>
>         [
>
>         the domainMapping attribute does not itself specify how to encode
>
>         "domain" systems/hierarchies. Hence, an application using domainMapping is
>
>         free to work with application-specific hierarchies to capture
>
>         information like:
>
>         ]
>
>           
>
>         It seems this is more in line with the language tag example: it is
>
>         saying that applications can do things that are on purpose underspecified.
>
>             CL>> a. There exists a domain system that includes domains (e.g. A),
>
>             sub-domains (e.g. A-1), and sub-subdomains (e.g. A-1-X)
>
>             CL>> b. Prefer the lowest level in the system (e.g. work with an MT
>
>             engine for A-1-X if available, otherwise work with one for A-1 or even
>
>             A if available)
>
>             CL>>
>
>             CL>> This "power to encode and to interpret" is similar to matching of
>
>             language tags, see http://tools.ietf.org/html/rfc4647#section-3.2.
>
>             CL>> "Language tag matching is a tool, and does not by itself specify
>
>             a  complete procedure for the use of language tags ...
>
>             CL>> The matching specification itself makes clear that there are many
>
>             CL>> aspects that are left out for actually using language tags. But
>
>             having no matching at all would mean even less interoperability, hence
>
>             the "imperfect" matching scheme.
>
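The "prefer the lowest level, otherwise fall back to a broader one" behaviour in points a and b can be sketched as progressive truncation, in the style of the lookup scheme that RFC 4647 describes for language tags. This is a hedged illustration only: ITS 2.0 leaves such hierarchies to the application, and the function name lookup_domain as well as the use of "-" as the level separator are assumptions taken from the A / A-1 / A-1-X example.

```python
def lookup_domain(requested, available, separator="-"):
    """Return the most specific domain in `available` that matches `requested`.

    The requested value is shortened one level at a time
    (A-1-X, then A-1, then A) until a known domain is found.
    Returns None if no level of the hierarchy is available.
    """
    candidate = requested
    while candidate:
        if candidate in available:
            return candidate
        if separator not in candidate:
            break
        # Drop the last (most specific) level and try again.
        candidate = candidate.rsplit(separator, 1)[0]
    return None
```

For example, with engines available only for "A" and "A-1", content tagged "A-1-X" would be routed to the "A-1" engine.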
>           
>
>         Best,
>
>           
>
>         Felix
>
>           
>
>               
>
>             With respect to 1 (local domain): would this also be relevant for other
>
>             implementers of domain (asking again)?
>
>           
>
>         About this one: we have Pablo and Yves saying in separate mails this
>
>         might be of interest - enough to get through the w3c process. But is it
>
>         worth another last call period?
>
>           
>
>         Best,
>
>           
>
>         Felix
>
>           
>
>               
>
>             Best,
>
>               
>
>             Felix
>
>               
>
>             On 15.01.13 19:32, Pablo Nieto Caride wrote:
>
>                 Hi all,
>
>                   
>
>                 Felix, I think that a local domain could be interesting, at least WP4
>
>                 client would be happy with that, I don't know what the others think.
>
>                   
>
>                 Christian, regarding the domain mapping I think that Yves and Felix
>
>                 are right, you can implement your own mapping, you can adapt it to
>
>                 specific MT if you want, as for the example <its:domainRule
>
>                 selector="/h:html/h:body" ... domainMapping="FIN, 'A A-1 A1-A1X'"/>,
>
>                 I am certain MT systems can manage the precedence by themselves.
>
>                   
>
>                 Cheers,
>
>                 Pablo.
>
>                 Hi,
>
>                   
>
>                 I wonder if it would be good idea to add the scenario I have provided
>
>                 (domain "system") and Felix' information on how to approach it
>
>                 (namely similar to language tag matching) to one of the "notes" that
>
>                 currently are in place for in the "domain" section.
>
>                 Best regards,
>
>                 Christian
>
>                   
>
>                 -----Original Message-----
>
>                 From: christian.lieske@sap.com
>
>                 Sent: Tuesday, 15 January 2013 08:10
>
>                 To: 'Felix Sasaki'; public-multilingualweb-lt-comments@w3.org
>
>                 Subject: RE: [Issue-75] - Domain
>
>                   
>
>                 Hi Felix,
>
>                   
>
>                 I follow your line of thought related to the similarities between
>
>                 "domainMapping" and matching of language tags. Thus, it would be OK
>
>                 for me to consider 2.b of
>
>                 http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Jan/0022.html
>
>                 closed.
>
>                   
>
>                 Cheers,
>
>                 Christian
>
>                   
>
>                 -----Original Message-----
>
>                 From: Felix Sasaki [mailto:fsasaki@w3.org]
>
>                 Sent: Monday, 14 January 2013 19:27
>
>                 To: public-multilingualweb-lt-comments@w3.org
>
>                 Subject: Re: [Issue-75] - Domain
>
>                   
>
>                 Hi Christian, Yves, all,
>
>                   
>
>                 On 14.01.13 16:52, Yves Savourel wrote:
>
>                     Hi Christian, all,
>
>                       
>
>                       
>
>                     CL>> It seems as if I didn't manage to make my point about this aspect of
>
>                     "domain" clear.
>
>                     CL>> Let me try to provide a remedy by adding to my original
>
>                     comment:
>
>                     CL>> Something like its-domain="financials" could not just be imagined
>
>                     CL>> to work in a global rule (e.g. instead of a pointer); in
>
>                     addition, a local use of "domain"
>
>                     CL>> could be imagined
>
>                     CL>>    Global: <its:domainRule selector="/h:html/h:body"
>
>                     its-domain="financials">
>
>                     CL>>    Local: <em its-domain="financials">IMF</em>
>
>                       
>
>                     So (If I'm getting this right) you'd like a way to override the
>
>                     domain for spans of content? (Since the Dublin Core in HTML doesn't
>
>                     let you do that (the subject is defined at the document level)).
>
>                       
>
>                     I think one of the reasons I heard early on was that today it would
>
>                     be difficult to make that distinction at the MT level. But I suppose
>
>                     MT engine selection is not the only application for domain. Maybe
>
>                     others have additional reasons why we don't have a local domain?
>
>                 Given the implementation driven approach we have made so far I would
>
>                 ask: is there an implementation on the horizon that would process
>
>                 local domain?
>
>                   
>
>                     CL>> Why do you think that the scenario that I sketch (multiple domain
>
>                     CL>> "systems" used in a processing chain) implies that a standard
>
>                     exists?
>
>                     CL>> I would rather think that the implication is the other way round:
>
>                     CL>> Since there is no standard, there is a need to accommodate
>
>                     heterogeneity.
>
>                       
>
>                     I agree, but so far that has not been part of the scope of ITS.
>
>                       
>
>                       
>
>                     CL>> I guess your point is valid in the sense that one could go for
>
>                     CL>> something like <its:domainRule selector="/h:html/h:body" ...
>
>                     CL>> domainMapping="FIN, 'A A-1 A1-A1X'"/>.
>
>                     CL>> However, this would require that additional information would have
>
>                     CL>> to be captured elsewhere (so that for example, the precedence
>
>                     CL>> 'A > A-1 > A1-A1X' could be captured).
>
>                       
>
>                     ITS doesn't prescribe what the right part of the mapping must be or
>
>                     how it should be used.
>
>                     It's really just a way to allow user-defined mechanisms to be
>
>                     connected to the input metadata.
>
>                     I suppose it is also beyond the scope of ITS.
>
>                 As I understand Christian, he does not ask us to prescribe a mapping, but
>
>                 "to accommodate heterogeneity": to allow people to formulate their own
>
>                 mapping.
>
>                   
>
>                 I think we do that: we don't make the usage of the mapping attribute
>
>                 mandatory. It is an optional attribute. If "our" mapping algorithm
>
>                 doesn't respond to a specific mapping approach, everybody can implement
>
>                 his own mapping.
>
>                   
>
>                 This is similar to matching of language tags, see
>
>                 http://tools.ietf.org/html/rfc4647#section-3.2
>
>                 "Language tag matching is a tool, and does not by itself specify a
>
>                 complete procedure for the use of language tags.  Such procedures are
>
>                 intimately tied to the application protocol in which they occur."
>
>                 The matching specification itself makes clear that there are many
>
>                 aspects that are left out for actually using language tags. But having
>
>                 no matching at all would mean even less interoperability, hence the
>
>                 "imperfect" matching scheme.
>
>                   
>
>                 Best,
>
>                   
>
>                 Felix
>
>                   
>
>                     cheers,
>
>                     -yves
>
>                       
>
>       
>

Received on Saturday, 19 January 2013 07:51:26 UTC