IssuesAndProposedFeatures

Issues and Proposed Features (For updating ITS 2.0)

Currently none. NOTE: when adding a proposal here, please use the structure shown below: provide the proposal and a rationale and, depending on how concrete the proposal is, proposed text and examples.

Proposal: Missing support for localizable items that are not translatable

Currently just a placeholder. Details are at https://www.w3.org/International/multilingualweb/lt/track/issues/95

Proposal: "Readiness" data category

See the thread at http://lists.w3.org/Archives/Public/public-i18n-its-ig/2013Aug/0029.html; the discussion can be followed from there.

Rationale

Readiness is an extension data category for ITS 2.0 (http://www.w3.org/2008/12/its-extensions). It indicates the readiness of a document for submission to l10n, i18n and other processes and provides an estimate of when it will be ready for a particular process.

Proposed Text

Definition

Readiness is an extension data category for ITS (http://www.w3.org/2008/12/its-extensions). It indicates the readiness of a document for submission to l10n, i18n and other processes and provides an estimate of when it will be ready for a particular process. This data category addresses various challenges:

  • It makes interoperability connections (Web services, RESTful) more flexible and independent by carrying the relevant information within the content itself, whether it is XML or XLIFF, independently of the connection, e.g. between a Content Management System (CMS) and a Translation Management System (TMS).
  • It enables the extensible specification of the process required for systems that take the published content directly as input.
  • It allows processes to be automated for both Global Management Systems (GMS) and Multilingual Publication Systems.


Implementation

The Readiness data category can be expressed with global rules, or locally on individual elements. For elements, the data category information inherits to the textual content of the element, including child elements and attributes.

Data category (identifier): Readiness (readiness)

  • Local usage: Yes
  • Global, rule-based selection: Yes
  • Global adding of information: No
  • Global pointing to existing information: Yes
  • Default values: No
  • Inheritance for element nodes: Textual content of element, including child elements and attributes

The date and time datatypes described here (updated, readyAt, and completeBy) were inspired by the latest version of ISO 8601. The lexical representation MUST be as follows: YYYY-MM-DDThh:mmTZD, where:

  • YYYY is a four-digit numeral that represents the year (each digit can be 0 to 9);
  • MM is a two-digit numeral that represents the month (from 01=January to 12=December);
  • DD is a two-digit numeral that represents the day of the month (01 to 28 if the month value equals 2 and the year is not a leap year, 01 to 29 if the month value equals 2 and the year is a leap year, 01 to 30 if the month value equals 4, 6, 9 or 11, and 01 to 31 if the month value equals 1, 3, 5, 7, 8, 10 or 12);
  • T is a separator indicating that time-of-day follows;
  • hh is a two-digit numeral that represents the hour (00 through 23, am/pm NOT allowed);
  • mm is a two-digit numeral that represents the minutes (00 through 59);
  • ss is a two-digit numeral that represents the seconds (00 through 59);
  • TZD represents the time zone designator with the value Z for UTC or +hh:mm or -hh:mm, with the hour magnitude limited to at most 14, and the minute magnitude limited to at most 59, except that if the hour magnitude is 14, the minute value must be 0, where:
    • hh is a two-digit numeral (with leading zeros as required) that represents the hours,
    • mm is a two-digit numeral that represents the minutes,
    • + indicates a nonnegative duration,
    • - indicates a nonpositive duration.

For example, 2014-01-10T12:00:00+05:00 is the same as 2014-01-10T07:00:00Z.
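
The lexical rules above can be checked mechanically. The following is a minimal Python sketch, not part of the proposal: the function name parse_readiness_datetime is hypothetical, and treating the seconds field as optional is an assumption made here so that both the hh:mm pattern and values written with seconds are accepted.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: ":ss" is optional, so both "hh:mm" and "hh:mm:ss" are accepted.
_LEXICAL = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})T([0-9]{2}):([0-9]{2})"
    r"(?::([0-9]{2}))?(Z|[+-][0-9]{2}:[0-9]{2})"
)


def parse_readiness_datetime(text):
    """Parse an updated/readyAt/completeBy value into an aware datetime."""
    match = _LEXICAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DDThh:mmTZD form: {text!r}")
    year, month, day, hour, minute = (int(match.group(i)) for i in range(1, 6))
    second = int(match.group(6) or 0)
    tzd = match.group(7)
    if tzd == "Z":
        offset = timedelta(0)
    else:
        tz_hours, tz_minutes = int(tzd[1:3]), int(tzd[4:6])
        # Hour magnitude at most 14, minutes at most 59, and 14 only with :00.
        if tz_hours > 14 or tz_minutes > 59 or (tz_hours == 14 and tz_minutes != 0):
            raise ValueError(f"time zone designator out of range: {tzd}")
        offset = timedelta(hours=tz_hours, minutes=tz_minutes)
        if tzd[0] == "-":
            offset = -offset
    # datetime() itself rejects impossible dates and times (e.g. 30 February).
    return datetime(year, month, day, hour, minute, second, tzinfo=timezone(offset))
```

Because the result carries its UTC offset, two values that denote the same instant, such as a +05:00 value and its Z equivalent, compare equal.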


GLOBAL: The readinessRule element contains the following:

  • A required selector attribute. It contains an absolute selector that selects the nodes to which this rule applies.
  • Exactly one of the following:
    • A readyToProcess attribute that contains a comma-separated list of values identifying the processes requested for the content [Review semantics of the list and Add Unicode Codes for languages], in the order in which they should be applied. For possible values, see the list below; user-defined values are also possible.
    • A readyToProcessRef attribute that contains an absolute selector pointing to an external specification of the processes that should be applied to the content, e.g. an executable workflow specification.
  • An optional updated attribute. It indicates the time at which the content to be processed was last updated. This enables checks to be performed on the update of the content to be processed.
  • An optional priority attribute that indicates the priority of the content for the process. For example, values could be selected from 1/4|2/4|3/4|4/4. The lowest priority would be 1/4 (1 out of 4, where 4 is the total number of priorities) and the highest would be 4/4 (urgent).
  • An optional readyAt attribute. It defines the time at which the content is ready for the process.
  • An optional completeBy attribute that provides a target date-time for completing the process.

Note:

The readyToProcessRef value can point to a default list of processes published at a stable URI, or to custom URIs published by users for concrete implementations.

Example 1: The Readiness data category used globally with HTML5

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>ITS 2.0 – Java Hello World!</title>
  <link href="Rules.xml" rel="its-rules"/>
 </head>
 <body>
  <section>
   <span id="languageSelector">
	<ul>
	 <li><a href="/en/index.html" translate="no">English</a></li>
	 <li><a href="/es/index.html" translate="no">Castellano</a></li>
	</ul>
   </span>
  </section>
  <section>
   <p><span> 
    Here is the code of a "Hello World!" application in Java
   </span></p>
   <p><code translate="no">
    class HelloWorldApp {
     public static void main(String[] args) {
      System.out.println("Hello World!"); // Display the string.
     }
    }
   </code></p>
   <p><span>
    The "Hello World!" application consists of two primary components: the HelloWorldApp class definition, and the main method. Now compile and run it!
   </span></p>
  </section>
 </body>
</html>

XML Rules: Rules.xml

<its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its" xmlns:h="http://www.w3.org/1999/xhtml">
<itsx:readinessRule readyToProcess="posteditQA" updated="no" priority="2/4" completeBy="2013-12-16T16:24+01:00" selector="//h:section/h:p/h:span" xmlns:itsx="http://www.w3.org/2008/12/its-extensions"/>
</its:rules>
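
How a consumer might apply such a global rule can be sketched in Python. This is an illustrative sketch only: the function name apply_readiness_rules is hypothetical, the document is assumed to be available as well-formed XHTML (elements in the http://www.w3.org/1999/xhtml namespace), and the standard library's ElementTree supports only a subset of XPath, so the absolute selector is rewritten to a relative one.

```python
import xml.etree.ElementTree as ET

NS = {
    "its": "http://www.w3.org/2005/11/its",
    "itsx": "http://www.w3.org/2008/12/its-extensions",
    "h": "http://www.w3.org/1999/xhtml",
}


def apply_readiness_rules(rules_xml, doc_xml):
    """Return (element, readiness info) pairs for every node a rule selects."""
    rules = ET.fromstring(rules_xml)
    doc = ET.fromstring(doc_xml)
    selected = []
    for rule in rules.findall("itsx:readinessRule", NS):
        processes = rule.get("readyToProcess", "").split(",")
        info = {
            # The list order is the order in which processes should be applied.
            "processes": [p.strip() for p in processes if p.strip()],
            "priority": rule.get("priority"),
            "completeBy": rule.get("completeBy"),
        }
        # Assumption: selectors start with "/"; ElementTree only evaluates
        # relative paths, so "//x" is turned into ".//x".
        for node in doc.findall("." + rule.get("selector"), NS):
            selected.append((node, info))
    return selected
```

A full ITS processor would use a complete XPath engine; the rewrite to a relative path cannot select the root element itself.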

Note: A default list of values can be: [For further work. Investigate encoding. Suggest subclassing from prov:activity]

Process: Definition

  • contentQuote: indicates that a quote or pricing is requested, not performance of the job
  • contentAlignment: the content is to be added to a Translation Memory (?)
  • contentL10N: localize the content (cultural and formal adaptation to the locale)
  • contentI18N: internationalize the content
  • contentDtp: desktop publishing of content
  • contentSubtitle: subtitling of content
  • contentVoiceOver: voice-over of content
  • sourceRewrite: rewrite the source content
  • hTranslate: human translation
  • mTranslate: machine translation
  • hTranscreate: human transcreation
  • posteditQA: human postediting of mTranslate
  • reviewQA: human review for quality assurance of the target text only, without the source text (see UNE 15038 “review”), for instance by an expert
  • reviseQA: human revision for quality assurance, examining the translation and comparing source and target (see UNE 15038 “revision”)
  • proofQA: human checking of proofs before publishing for quality assurance (see UNE 15038 “proofreading”)
  • sourceReview: review the source content
  • sourceTranscribe: transcribe the source content
  • sourceTransliteration: transliterate the source content
  • importCMSWF: ready for importing of delivery in the CMS
  • validationCMSWF: ready for validation of delivery in the CMS
  • publishCMSWF: ready for publishing of delivery in the CMS


LOCAL: The following local markup is available for the Readiness data category:

  • Exactly one of the following:
    • A readyToProcess attribute that contains a comma-separated list of values identifying the processes requested for the content [Review semantics of the list and Add Unicode Codes for languages], in the order in which they should be applied. For possible values, see the list above; user-defined values are also possible.
  • An optional updated attribute. It indicates the time at which the content to be processed was last updated. This enables checks to be performed on the update of the content to be processed.
  • An optional priority attribute that indicates the priority of the content for the process. For example, values could be selected from 1/4|2/4|3/4|4/4. The lowest priority would be 1/4 (1 out of 4, where 4 is the total number of priorities) and the highest would be 4/4 (urgent).
  • An optional readyAt attribute. It defines the time at which the content is ready for the process.
  • An optional completeBy attribute that provides a target date-time for completing the process.

Example 2: HTML with local inline markup

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8">
  <title>ITS 2.0 – Java Hello World!</title>
 </head>
 <body>
  <section>
   <span id="languageSelector">
	<ul>
	 <li><a href="/en/index.html" translate="no">English</a></li>
	 <li><a href="/es/index.html" translate="no">Castellano</a></li>
	</ul>
   </span>
  </section>
  <section>
   <p><span ready-to-process="posteditQA" updated="no" priority="2/4" complete-by="2013-12-16T16:24+01:00"> 
    Here is the code of a "Hello World!" application in Java
   </span></p>
   <p><code translate="no">
    class HelloWorldApp {
     public static void main(String[] args) {
      System.out.println("Hello World!"); // Display the string.
     }
    }
   </code></p>
   <p><span>
    The "Hello World!" application consists of two primary components: the HelloWorldApp class definition, and the main method. Now compile and run it!
   </span></p>
  </section>
 </body>
</html>
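
Local markup of this kind can be read without any XPath support. The following Python sketch is illustrative only: the class and function names are hypothetical, and the attribute names (ready-to-process, priority, complete-by) are taken from Example 2 as written.

```python
from html.parser import HTMLParser


class ReadinessCollector(HTMLParser):
    """Collect elements that carry a local ready-to-process attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "ready-to-process" not in attributes:
            return
        processes = (attributes["ready-to-process"] or "").split(",")
        self.found.append({
            "element": tag,
            # The list order is the order in which processes should be applied.
            "processes": [p.strip() for p in processes if p.strip()],
            "priority": attributes.get("priority"),
            "completeBy": attributes.get("complete-by"),
        })


def collect_readiness(html_text):
    """Return one dict per element with local Readiness markup."""
    collector = ReadinessCollector()
    collector.feed(html_text)
    collector.close()
    return collector.found
```

The sketch reports only the elements that carry the markup; inheritance to child elements, as described for this data category, would have to be layered on top.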

[Add examples for XML]

Proposal: Update to Localization Quality Issue

Placeholder for addition of comments received from Christian Lieske that were not already addressed in ITS 2.0

See: http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/2013Aug/0000.html

(Historical) Issues and Proposed Features (For updating ITS 1.0)

NOTE: The issues and proposed features below were made for ITS 1.0. Please do not update them; instead, enter new items for ITS 2.0 (which may refer to the items below).

Proposal: targetPointer

This feature would complement the global rule for the Translate data category.

Rationale

There is no way in the 1.0 translateRule element to handle multilingual documents designed to have a source text and one or more target texts. XML formats such as XLIFF, TMX, TS and others have this capability, but they cannot be processed without some kind of pre-processing by an ITS-aware tool.

The proposed additional attribute to the translateRule would simply provide a way to point to the node where the translation of the selected node should go or could be found.

Proposed Text

GLOBAL: The translateRule element contains the following:

  • A required selector attribute. It contains an XPath expression which selects the nodes to which this rule applies.
  • A required translate attribute with the value "yes" or "no".
  • An optional targetPointer attribute. It contains the relative XPath expression which selects the node where the translation of the node to which this rule applies should be located.

For example:

 <file>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
   xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
   <its:translateRule translate="no" selector="//file"/>
   <its:translateRule translate="yes" selector="//source"
    itsx:targetPointer="../target"/>
  </its:rules>
  <entry xml:id="one">
   <source>Text one of the source</source>
   <target>Text one of the target</target>
  </entry>
  <entry xml:id="two">
   <source>Text two of the source</source>
   <target></target>
  </entry>
 </file>
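
To illustrate how a tool could use targetPointer, here is a small Python sketch. It is not part of the proposal: the function name source_target_pairs is hypothetical, and because the standard library's ElementTree keeps no parent links, leading "../" steps in the pointer are resolved by hand through a parent map (only that simple pointer form is handled).

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
ITSX = "http://www.w3.org/2008/12/its-extensions"


def source_target_pairs(doc_xml):
    """Pair the text of each selected node with the text of its target node."""
    root = ET.fromstring(doc_xml)
    # ElementTree elements do not know their parent, so build a lookup table.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    pairs = []
    for rule in root.iter(f"{{{ITS}}}translateRule"):
        pointer = rule.get(f"{{{ITSX}}}targetPointer")
        if pointer is None:
            continue
        # Assumption: selectors start with "/"; "//x" is evaluated as ".//x".
        for node in root.findall("." + rule.get("selector")):
            context, path = node, pointer
            while path.startswith("../"):
                context, path = parent_of[context], path[3:]
            target = context.find(path)
            pairs.append((node.text, None if target is None else target.text))
    return pairs
```

For the document above, the first entry yields its source and target texts, and the second entry yields its source text with no target text yet.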

Proposal: idValue

This feature would complement the global rule for the Translate data category.

Rationale

There is no way in the 1.0 translateRule element to associate content with an identifier. Using identifiers with content is a very common activity in localization and follows the best practices for internationalization (See http://www.w3.org/TR/xml-i18n-bp/#DevUniqueID). For example, unique IDs can be used to leverage the same translation from one version of the document to another.

The attribute xml:id is the standard way of representing an identifier in ITS (See http://www.w3.org/TR/xml-i18n-bp/#AuthUniqueID). However, in some cases the document may be using other attributes.

Having a mechanism in place to associate a given identifier value for a selected node would be a significant improvement of the ITS capabilities.

The proposed additional attribute to the translateRule would simply provide a way to construct an identifier value based on the context of the selected node that should be identified.

Proposed Text

GLOBAL: The translateRule element contains the following:

  • A required selector attribute. It contains an XPath expression which selects the nodes to which this rule applies.
  • A required translate attribute with the value "yes" or "no".
  • An optional idValue attribute. It contains an XPath expression which constructs a string corresponding to the identifier of the node to which this rule applies. If the attribute xml:id is present for the selected node, the value of the xml:id attribute takes precedence over the idValue value.

For example:

 <file>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
   xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
   <its:translateRule translate="no" selector="//file"/>
   <its:translateRule translate="yes" selector="//source"
    itsx:idValue="../@name"/>
  </its:rules>
  <entry name="one">
   <source>Text one of the source</source>
  </entry>
  <entry name="two">
   <source>Text two of the source</source>
  </entry>
 </file>

The idValue attribute allows 'complex' values to be built from different attributes, elements or even hard-coded text. Any of the string functions offered by XPath can be used.

For example, in the document below, the two elements <text> and <desc> are translatable, but they have only one corresponding identifier, the name attribute in their parent element. To make sure you have a unique identifier for both the content of <text> and the content of <desc>, you can use the rules set in the example. The XPath expression concat(../@name, '_t') will give the identifier "id1_t" and the expression concat(../@name, '_d') will give the identifier "id1_d".

 <doc>
  <its:rules version="1.0" xmlns:its="http://www.w3.org/2005/11/its"
   xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
   <its:translateRule selector="//text" translate="yes" itsx:idValue="concat(../@name, '_t')"/>
   <its:translateRule selector="//desc" translate="yes" itsx:idValue="concat(../@name, '_d')"/>
  </its:rules>
  <msg name="id1">
   <text>Value of text</text>
   <desc>Value of desc</desc>
  </msg>
 </doc>
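
The identifier construction, including the precedence of xml:id, can be sketched in Python. This is an illustration under stated assumptions, not an implementation of the proposal: the function names are hypothetical, and instead of a real XPath engine the sketch evaluates only the expression forms used on this page (concat(...), quoted string literals, @attr and ../@attr).

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
ITSX = "http://www.w3.org/2008/12/its-extensions"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def evaluate(expr, node, parent_of):
    """Evaluate a tiny, hand-written subset of XPath string expressions."""
    expr = expr.strip()
    if expr.startswith("concat(") and expr.endswith(")"):
        # Naive split: string literals that contain commas are not supported.
        args = expr[len("concat("):-1].split(",")
        return "".join(evaluate(arg, node, parent_of) for arg in args)
    if len(expr) >= 2 and expr[0] in "'\"" and expr[-1] == expr[0]:
        return expr[1:-1]
    if expr.startswith("../@"):
        return parent_of[node].get(expr[4:], "")
    if expr.startswith("@"):
        return node.get(expr[1:], "")
    raise ValueError(f"unsupported expression in this sketch: {expr!r}")


def identifiers(doc_xml):
    """Return (element name, identifier) pairs for nodes selected by rules."""
    root = ET.fromstring(doc_xml)
    parent_of = {child: parent for parent in root.iter() for child in parent}
    result = []
    for rule in root.iter(f"{{{ITS}}}translateRule"):
        expr = rule.get(f"{{{ITSX}}}idValue")
        if expr is None:
            continue
        # Assumption: selectors start with "/"; "//x" is evaluated as ".//x".
        for node in root.findall("." + rule.get("selector")):
            # xml:id, when present, takes precedence over idValue.
            ident = node.get(XML_ID) or evaluate(expr, node, parent_of)
            result.append((node.tag, ident))
    return result
```

A real processor would hand the idValue expression to a full XPath implementation, which also makes the other XPath string functions available.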

Proposal: Local "Elements within Text"

See original proposal at http://lists.w3.org/Archives/Public/public-i18n-its-ig/2008Oct/0014.html

This would complement the current Global rule of the "Elements Within Text" data category.

Rationale

There is no local rule for the "Elements Within Text" data category. Having a local rule would allow ITS processors without XPath support to still distinguish elements that are nested or within text from other elements.

Proposed Text

LOCAL: The following local markup is available for the Elements Within Text data category:

A withinText attribute with the value "yes", "no" or "nested".

Example: the Elements Within Text data category expressed locally.

The itsx:withinText attribute indicates that the <bold> element should be treated as part of the flow of the <par> element content.

 <text
  xmlns:its="http://www.w3.org/2005/11/its"
  xmlns:itsx="http://www.w3.org/2008/12/its-extensions"
  its:version="1.0">
  <body>
   <par>Text with <bold itsx:withinText='yes'>bold</bold>.</par>
  </body>
 </text>
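
A processor without XPath support could use the local attribute as in the following Python sketch. The function name flow_text is hypothetical, and leaving out the content of children marked "no" or "nested" (or not marked at all) is a simplification made for this illustration.

```python
import xml.etree.ElementTree as ET

WITHIN_TEXT = "{http://www.w3.org/2008/12/its-extensions}withinText"


def flow_text(element):
    """Return the text flow of an element, including withinText='yes' children."""
    parts = [element.text or ""]
    for child in element:
        if child.get(WITHIN_TEXT) == "yes":
            # The child is part of the parent's flow, so its text is kept inline.
            parts.append(flow_text(child))
        # Text following the child always belongs to the parent's flow.
        parts.append(child.tail or "")
    return "".join(parts)
```

Applied to the <par> element of the example, the sketch keeps the word inside <bold> in the paragraph's text flow.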

Proposal: whiteSpaces

This feature would complement the global rule for the Translate data category.

Rationale

There is no way in the 1.0 translateRule element to indicate that the whitespace of some content must be preserved. The xml:space attribute provided for this by XML is seldom used, and its absence causes loss of information in many situations.

Having a mechanism in place to associate whitespace handling information with a selected node would be a significant improvement of the ITS capabilities.

Proposed Text

GLOBAL: The translateRule element contains the following:

  • A required selector attribute. It contains an XPath expression which selects the nodes to which this rule applies.
  • A required translate attribute with the value "yes" or "no".
  • An optional whiteSpaces attribute with a value "preserve" or "default". If the attribute xml:space is present for the selected node the value of the xml:space attribute takes precedence over the whiteSpaces value.

For example:

 <file>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
   xmlns:itsx="http://www.w3.org/2008/12/its-extensions">
   <its:translateRule translate="yes" selector="//source"
    itsx:whiteSpaces="preserve"/>
  </its:rules>
  <entry name="one">
   <source>Line one.
 Line 2.
 Line 3.</source>
  </entry>
 </file>

Proposal: "Context" data category

See original proposal at http://lists.w3.org/Archives/Public/public-i18n-its-ig/2008Oct/0015.html, thread at http://lists.w3.org/Archives/Public/public-i18n-its-ig/2008Nov/0002.html and thoughts in http://lists.w3.org/Archives/Public/public-i18n-its-ig/2010Sep/0003.html

Rationale

"Context" is an important concept for globalization-related processes such as human translation (in fact, for any communicative act). Unfortunately, the notion of "context" is not defined clearly. Sample definitions are the following:

  • The "Content Category" in which an expression appears; examples: the User Interface of a software program, the owner's manual for a device
  • The "Semantic Class" to which an expression belongs; example: a quotation - marked with a "cite" element - in an (X)HTML page
  • The "Structural Class" to which an expression belongs; example: a heading in an (X)HTML page
  • The "Presentational Class" to which an expression belongs; example: a string marked up with the "i" element in an (X)HTML page
  • The "Textual Environment" (sometimes referred to as "co-text"); examples: the paragraphs preceding or following an expression, the words preceding or following an expression
  • The "Locale-related Considerations" for an expression; example: considerations for choosing the correct wording (eg. in terms of honorifics) for Japanese
  • The "Production environment considerations" for an expression; example: company-specific standards and guidelines to which a text has to adhere
  • The "Related Objects" belonging to an expression (this is similar to the "Textual Environment"); example: all menu entries for a particular menu on a User Interface
  • ...

Given this ambiguity, the concept of "context" is clearly defined in ITS. The definition includes a set of categories for contexts. Examples:

  • string category (as possible incarnation of for example "Semantic Class" or "Structural Class"); examples: button, label, caption (see [a] in http://lists.w3.org/Archives/Public/public-i18n-its-ig/2008Oct/0015.html)
  • grouping category (as possible incarnation of for example "Textual Environment" or "Related Objects"); examples: the "alt" text for an "img" element, all menu entries for a particular menu on a User Interface
  • ...

Note: The examples above show that a single string may be related to several contexts (e.g. be classified as both "alt" and as "belonging to img X").

Proposed Text

ITS implements "context" based on ideas from RDFa 1.1

Example:

 <file>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0"
   xmlns:itsx="http://www.w3.org/2008/12/its-extensions" its:prefix="dc:http://purl.org/dc/terms">
   <itsx:contextRule category="semantic" property="dc:title" selector="//source" />
  </its:rules>
  <entry name="one">
   <source>Veni vidi vici</source>
  </entry>
 </file>

In this example, all "source" elements are categorized as "title". This is done by relating the "source" elements to a property from Dublin Core.

Proposal: localeSpecificContent

Add a data category for indicating that certain content is only relevant for a specific locale.

Rationale

In some contexts, a single content container (e.g. a file) "hosts" content for several locales/countries.

Example:

 <c>
  <item id="1">The German law ...</item>
  <item id="2">The French law ...</item>
  <item id="3">The Italian law ...</item>
 </c>

In this example, the content of the individual "item" elements is meant to be relevant for only one single locale/country: for Germany, France and Italy respectively. Accordingly, there may be a need to make localization-related processes "sensitive" to this information. For example, content that is only relevant for Italy may not have to be considered for a translation into French.

Currently, ITS does not include a data category for capturing "this is locale-specific". This type of data category can, however, help to improve localization processes. It can, for example, decrease translation effort and costs (since "Italy only" content may not need to be translated into French).

Note: The proposed data category is different from xml:lang. xml:lang is used to identify the natural or formal language in which the content is written. The proposed ITS data category will identify if specific content pertains only to a certain locale/country.

Proposed Text

GLOBAL: The localeSpecificContentRule element contains the following:

  • A required selector attribute. It contains an XPath expression which selects the nodes to which this rule applies. The selector identifies content that pertains only to a certain locale/country.

For example:

 <c>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">
   <its:localeSpecificContentRule selector="//*[@id='1']" locale="de-DE"/>
  </its:rules>
  <item id="1">The German law ...</item>
  <item id="2">The French law ...</item>
  <item id="3">The Italian law ...</item>
 </c>
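
How a localization process might become "sensitive" to this information can be sketched in Python. The function name items_relevant_for is hypothetical, and two points are assumptions of this sketch rather than part of the proposal: the locale attribute is read as shown in the example above, and locales are compared by exact string match.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"


def items_relevant_for(doc_xml, target_locale):
    """Return the text of <item> elements that are relevant for target_locale."""
    root = ET.fromstring(doc_xml)
    restricted = {}
    for rule in root.iter(f"{{{ITS}}}localeSpecificContentRule"):
        # Assumption: selectors start with "/"; "//x" is evaluated as ".//x".
        for node in root.findall("." + rule.get("selector")):
            restricted[node] = rule.get("locale")
    # Content without a rule is treated as relevant for every locale.
    return [
        item.text
        for item in root.iter("item")
        if restricted.get(item, target_locale) == target_locale
    ]
```

With a rule that marks item 1 as "de-DE" only, a translation into French would skip that item while still including the items that carry no locale restriction.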

LOCAL:

Proposal: data category for automated language processing

This data category captures information that it is acceptable to create target language content purely based on automated language processing (such as automated transliteration, or machine translation).

Rationale

Some content types, or content consumption scenarios lend themselves to fully automated language processing. Currently, the corresponding information cannot be captured.

Proposed Text

GLOBAL: The autoLanguageProcessingRule element contains the following:

  • A required "selector" attribute. It contains an XPath expression which selects the nodes to which this rule applies.
  • A required "process" attribute with the values "transliteration" or "machineTranslation".

For example:

 <file>
 <its:rules xmlns:its="http://www.w3.org/200x/yy/its" version="2.0">
  <its:autoLanguageProcessingRule process="transliteration" selector="//name"/>
 </its:rules>
        <credit type="author">
           <name>Shaun</name>
           <email>shaun@example.org</email>
        </credit>
 </file> 

LOCAL: The following local markup is available:

An "autoLanguageProcessing" attribute with the values "transliteration" or "machineTranslation".

For example:

 <file xmlns:its="http://www.w3.org/200x/yy/its">
        <credit type="author">
           <name its:autoLanguageProcessing="transliteration">Shaun</name>
           <email>shaun@example.org</email>
        </credit>
 </file>