Re: [All] domain data category section proposal, please review

Hi Felix,
One question on the domainMapping example you give for the domain data 
category. This assumes the workflow has a single canonical set of IDs 
identifying 'auto', 'medicine', 'law', but this may not always be the 
case, e.g. where SMT engines are trained on a mix of parallel data with 
their own separate domain naming schemes for each corpus. So a simple naming 
scheme means that the workflow provider must ensure consistency of that 
scheme and that the document editor (often the client) has knowledge of 
that scheme.

So could the data category, as is, accommodate multiple naming schemes 
(e.g. from the client and from third parties) within the workflow by 
simply using a URL instead of a simple name? e.g.

domainMapping="automotive auto, medical medicine, 'criminal law' http://www.taus.org/domain/law, 'property law' http://www.client.com/domain-names/law"
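
To make the question concrete, here is a rough sketch (Python, purely illustrative) of how a consuming tool might parse such a value and resolve each target against its own engine domains. It assumes that pairs are comma separated, that the two parts of a pair are separated by whitespace, and that a part containing spaces is delimited by ' or " quotes, as in the example above. parse_domain_mapping, resolve and the ENGINE_DOMAINS table are hypothetical names, not part of the draft.

```python
import re

# One part of a mapping pair: a single-quoted string, a double-quoted
# string, or a run of characters with no whitespace, comma or quote
# (this covers plain names as well as URLs such as http://www.taus.org/...).
_PART = r"""'[^']*'|"[^"]*"|[^\s,'"]+"""
# A pair is two parts separated by whitespace, followed by a comma or the end.
_PAIR = re.compile(r"\s*(" + _PART + r")\s+(" + _PART + r")\s*(?:,|$)")


def _unquote(part):
    """Strip one pair of matching quotes, if present."""
    if len(part) >= 2 and part[0] == part[-1] and part[0] in "'\"":
        return part[1:-1]
    return part


def parse_domain_mapping(value):
    """Split a domainMapping string into a list of (source, target) pairs."""
    value = value.strip()
    pairs = []
    pos = 0
    while pos < len(value):
        m = _PAIR.match(value, pos)
        if m is None:
            raise ValueError("malformed domainMapping near: %r" % value[pos:])
        pairs.append((_unquote(m.group(1)), _unquote(m.group(2))))
        pos = m.end()
    return pairs


# Hypothetical table of engine-side domains, keyed by the right-hand side
# of the mapping (either a plain name or a scheme URL).
ENGINE_DOMAINS = {
    "auto": "Automotive",
    "medicine": "Medicine",
    "http://www.taus.org/domain/law": "Law",
}


def resolve(pairs, table):
    """Map each content-side domain to an engine domain (None if unknown)."""
    return {source: table.get(target) for source, target in pairs}


if __name__ == "__main__":
    value = ("automotive auto, medical medicine, "
             "'criminal law' http://www.taus.org/domain/law, "
             "'property law' http://www.client.com/domain-names/law")
    for source, engine in resolve(parse_domain_mapping(value), ENGINE_DOMAINS).items():
        print(f"{source} -> {engine}")
    # automotive -> Automotive
    # medical -> Medicine
    # criminal law -> Law
    # property law -> None
```

Because the targets are URLs, 'criminal law' and 'property law' stay distinct even though both schemes call the domain "law"; an engine that only knows the TAUS URL simply gets no match for the client's one, rather than a wrong match.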


cheers,
Dave

On 29/06/2012 07:52, Felix Sasaki wrote:
> Hi all,
>
> FYI, I wrote the domain section based on the initial proposal and this 
> thread, please have a look at 
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#domain
>
> This closes ACTION-144. I also updated
>
> http://www.w3.org/International/multilingualweb/lt/wiki/Implementation_Commitments#New_ITS_2.0_categories
> with a link to the section.
>
> Best,
>
> Felix
>
> 2012/6/27 Felix Sasaki <fsasaki@w3.org>
>
>     Declan, all, thanks a lot for your feedback. I think we are close
>     to consensus about this, and I have given myself an ACTION-144 to
>     put this into the draft by next week.
>
>     Best,
>
>     Felix
>
>
>     2012/6/26 Declan Groves <dgroves@computing.dcu.ie>
>
>         Felix,
>
>         Thanks for your proposal for the domain category, which I think
>         outlines the best approach for dealing with this complex
>         category, so good job!
>
>         The data-category-agnostic approach makes more sense and
>         allows for more flexibility, particularly for existing
>         commercial MT service providers who will already have their
>         own lists of pre-defined domain categories. I am not too
>         familiar with DCR, so I don't feel qualified to comment on
>         Arle's suggestion.
>
>         Using Dublin Core, however, is a good pointer due to
>         its fairly wide adoption (on this: is it worth providing a
>         URL to the relevant Dublin Core content?). I know that many
>         MT systems that do implement domain metadata do so using
>         high-level domains either taken directly from Dublin Core or
>         adapted from it (e.g. I think the LetsMT project uses Dublin
>         Core as a starting point for defining domains). One thing to
>         keep in mind is that the proposal should be as clear and
>         concise as possible. In terms of providing pointers to what
>         codes people can use, I think we are better off limiting this,
>         as promoting interoperability is key and providing a list of
>         alternative implementation strategies may over-complicate things.
>
>         It is good to emphasise the optional domainMapping attribute,
>         and I would perhaps add to the paragraph explaining
>         domainMapping that, although the attribute is optional, it is
>         recommended that a value for it be provided. For
>         our implementation, I expect to do something similar to
>         Thomas: create a mapping from the provided domain metadata to
>         the domains that are available for our trained systems.
>
>         typo: "In source content... " -> "In the source content..."
>               "no agreed upon set of value sets" -> "no agreed upon
>         value sets"
>
>         Declan
>
>
>
>         On 25 June 2012 15:43, Felix Sasaki <fsasaki@w3.org> wrote:
>
>             Hi Arle, Thomas, all,
>
>             thanks for your feedback. Thomas, I'll fix the typos you
>             found.
>
>             2012/6/25 Arle Lommel <arle.lommel@dfki.de>
>
>                 Was this an area where the ISO data category registry
>                 might come into play?
>
>
>             No - this proposal is "data category agnostic". The idea
>             is to provide a mechanism to map existing value lists
>             (like the one Thomas mentioned).
>
>                 That is, could we declare an agreed upon selection of
>                 fairly broad top-level domains to promote
>                 interoperability while still allowing for
>                 specification by users?
>
>
>
>             After our discussion in Dublin and quite a few mails about
>             this, I don't see agreement even on top-level domains; see
>             e.g. the summary at
>             http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0165.html
>             or David's proposal at
>             http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0079.html
>
>
>                 Unfortunately there is a lot of complexity around this
>                 issue in general that we will not resolve and that may
>                 indeed be fundamentally unresolvable. But perhaps, by
>                 using the DCR as an authoritative resource where domain
>                 ontologies can be declared and pointed to, we could at
>                 least provide a way for someone to share what they mean.
>
>
>
>             There are so many running systems using their own value
>             lists for domain that I wouldn't expect Lucy Software or
>             others to change their systems. The benefit they would
>             get with the proposal in this thread is that connecting
>             systems (e.g. MT + CMS) gets easier.
>
>             Of course one could point users to the codes they should
>             use. The Dublin Core subject field I have put into the
>             draft is such a pointer. In addition, I would be happy to
>             name DCR as another resource to look into, like the TAUS
>             top-level categories, the LetsMT top-level categories, etc.
>             That is, of course we want people to be aware of DCR.
>
>             I also saw your question regarding DCR in the other thread,
>             but I don't recall an area where we would have a direct
>             dependency either. But as I said above, it would be good to
>             inform readers of ITS 2.0 about where relying on DCR makes
>             sense.
>
>             A related question: if I want to refer to DCR in an HTML
>             "meta" element, how would the DCR "scheme" be identified?
>             Here is an example from Dublin Core:
>
>             <meta name="DCTERMS.issued" scheme="DCTERMS.W3CDTF"
>             content="2003-11-01" />
>
>
>             If there is an approach to do that with DCR, I think we
>             should have an example about it in ITS 2.0. Maybe you can
>             check with the DCR experts in Madrid?
>
>
>             Best,
>
>
>             Felix
>
>
>                 Arle
>
>                 -- 
>                 Arle Lommel
>                 Berlin, Germany
>                 Skype: arle_lommel
>                 Phone (US): +1 707 709 8650
>
>                 Sent from a mobile device. Please excuse any typos.
>
>                 On Jun 25, 2012, at 16:02, "Thomas Ruedesheim"
>                 <thomas.ruedesheim@lucysoftware.com> wrote:
>
>>                 Hi Felix,
>>                 I agree with your proposal. (There are just 2 typos
>>                 in the examples: "" in domainPointer attributes.)
>>                 Lucy's MT engine accepts a global SUBJECT_AREAS
>>                 parameter holding a list of domain names. Domains are
>>                 organized in a hierarchy.
>>                 Here is a short excerpt (first 2 levels):
>>                   General Vocabulary
>>                   Common Social Voc.
>>                     Art & Literature
>>                     Ecology, Environment Protection
>>                     Economy & Trade
>>                     Law & Legal Science
>>                     ...
>>                   Common Technical Voc.
>>                     Agriculture & Fishing
>>                     Civil Engineering
>>                     Data Processing
>>                     ...
>>                 We will read the metadata and apply the mapping. Of
>>                 course, the mapping is specific to the MT tool used.
>>                 Cheers,
>>                 Thomas
>>                 ------------------------------------------------------------------------
>>                 *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>>                 *Sent:* Monday, 25 June 2012 08:48
>>                 *To:* public-multilingualweb-lt@w3.org
>>                 *Subject:* [All] domain data category section
>>                 proposal, please review
>>
>>                 Hi all,
>>
>>                 I have created a proposal for the domain data
>>                 category, see attachment. This would resolve
>>                 ISSUE-11, with the input from ACTION-87 taken into
>>                 account.
>>
>>                 Declan, Thomas, I think this is esp. important for
>>                 you - we need to know whether an implementation as
>>                 described would be feasible and useful for you. Of
>>                 course, others, feel welcome to contribute.
>>
>>                 Please make comments in this thread - I will use them
>>                 to provide another version of the section.
>>
>>                 Thanks,
>>
>>                 Felix
>>
>>                 -- 
>>                 Felix Sasaki
>>                 DFKI / W3C Fellow
>>
>
>
>
>             -- 
>             Felix Sasaki
>             DFKI / W3C Fellow
>
>
>
>
>         -- 
>         Dr. Declan Groves
>         Research Integration Officer
>         Centre for Next Generation Localisation (CNGL)
>         Dublin City University
>
>         email: dgroves@computing.dcu.ie
>         phone: +353 (0)1 700 6906
>
>
>
>
>     -- 
>     Felix Sasaki
>     DFKI / W3C Fellow
>
>
>
>
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
>

Received on Wednesday, 4 July 2012 01:16:07 UTC