Okapi Use Case - Quality Check

From MultilingualWeb-LT EC Project Wiki

1 Description

XML, HTML5 and XLIFF documents are loaded into CheckMate, a tool that performs various quality verifications on monolingual and bilingual documents.

The XML and HTML5 documents are extracted based on their ITS properties, and their ITS metadata are mapped into the extracted content. The XLIFF document is extracted as well, and its ITS-equivalent metadata are mapped in the same way.

In this demonstration, the Storage Size data category is supported in all three file formats, and CheckMate uses its information to verify whether the content exceeds the given storage size.
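The kind of test involved can be sketched as follows. This is only an illustration, not CheckMate's actual code: the function name and parameters are hypothetical, and it assumes the ITS 2.0 semantics in which the storage size is a number of bytes measured in a given storage encoding (UTF-8 when none is specified).

```python
def exceeds_storage_size(text: str, max_bytes: int, encoding: str = "utf-8") -> bool:
    """Return True if `text` needs more than `max_bytes` bytes in `encoding`.

    Hypothetical helper for illustration; not part of Okapi or CheckMate.
    """
    # The constraint is on the encoded byte length, not on the character count.
    return len(text.encode(encoding)) > max_bytes


if __name__ == "__main__":
    sample = "Résumé détaillé"  # 15 characters, 4 of them accented
    print(exceeds_storage_size(sample, 16))                # UTF-8
    print(exceeds_storage_size(sample, 16, "iso-8859-1"))  # single-byte encoding
```

The same string can pass or fail depending on the storage encoding, which is why the encoding has to be taken into account along with the size.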

2 Data categories

The following data categories are directly used:

  • Translate - The non-translatable content is protected.
  • Locale Filter - Only the parts in the scope of the locale filter are extracted; the others are treated as 'do not translate' content.
  • Element Within Text - The information is used to decide which elements are extracted as in-line codes and which as sub-flows.
  • Preserve Space - The information is used when comparing differences in leading and trailing white space.
  • Id Value - The id values are used to identify the entry with an issue.
  • Storage Size - The content is verified against the storage size constraints.
  • Allowed Characters - The content is verified against the pattern matching allowed characters.
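As an illustration of the last item, an Allowed Characters check could look roughly like the sketch below. It is not taken from CheckMate: the function name is hypothetical, and it assumes that the pattern is a single character class (such as `[a-zA-Z]` or `[^*+]`) that Python's `re` module can interpret directly, which holds for simple patterns but may not hold for every pattern allowed by ITS.

```python
import re


def find_disallowed(text: str, allowed_pattern: str) -> list[tuple[int, str]]:
    """Return (position, character) pairs for characters not matching the pattern.

    Hypothetical helper for illustration; `allowed_pattern` is assumed to be a
    single character class that Python's `re` module accepts as written.
    """
    char_re = re.compile(allowed_pattern)
    # Each character is tested on its own against the character class.
    return [(i, ch) for i, ch in enumerate(text) if not char_re.fullmatch(ch)]


if __name__ == "__main__":
    print(find_disallowed("abc*def", "[^*+]"))
    print(find_disallowed("Hello", "[a-zA-Z]"))
```

Returning the positions, rather than a simple pass/fail flag, makes it possible to report where the issue is in the entry.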

3 Benefits

  • The ITS markup provides the key information that drives the extraction in both XML and HTML5.
  • The set of ITS metadata carried in the files allows the three file formats to be handled the same way by the verification tool.

Note: additional data categories will be implemented, both to allow additional types of verification and to store the issues found.