Methods to Localize Content in Drupal 7
There are two methods available to localize Drupal content. Neither works perfectly, and both require non-core modules to be installed.
- The first (the "Node sets" method) has been available since Drupal 5. It uses one node per language, and nodes that are translations of each other share a common identifier.
- The second (the "Field Entities" method) has been available since Drupal 7. It stores the translations in multilingual fields of the same node, and it also seems to have been chosen as the way forward.
For more details on the differences, see this video: http://groups.drupal.org/node/165194
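The difference between the two storage models can be sketched with plain Python dictionaries. The records below are simplified, hypothetical examples, not Drupal's actual JSON output. The shared identifier of a node set is assumed to be a `tnid` key (the translation-set column of Drupal's node table), and the translatable title is assumed to be exposed as a field named `title_field`.

```python
from collections import defaultdict

# "Node sets": one node per language; translations share a "tnid" (assumed key).
node_set_nodes = [
    {"nid": 10, "tnid": 10, "language": "en", "title": "Hello"},
    {"nid": 11, "tnid": 10, "language": "fr", "title": "Bonjour"},
    {"nid": 12, "tnid": 12, "language": "en", "title": "Goodbye"},
]

def group_translation_sets(nodes):
    """Group one-node-per-language records by their shared identifier."""
    sets = defaultdict(dict)
    for node in nodes:
        sets[node["tnid"]][node["language"]] = node["title"]
    return dict(sets)

# "Field Entities": a single node holds every language, keyed by language
# code inside each translatable field ("title_field" is an assumed name).
field_entity_node = {
    "nid": 10,
    "title_field": {"en": [{"value": "Hello"}], "fr": [{"value": "Bonjour"}]},
}

def titles_by_language(node):
    """Read the title of every language stored in one node."""
    return {lang: items[0]["value"] for lang, items in node["title_field"].items()}

print(group_translation_sets(node_set_nodes))
print(titles_by_language(field_entity_node))
```

With node sets, finding all translations means querying several nodes; with field entities, they come back with one node.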
Findings from Fredrik/Yves when working on the Okapi filter for Drupal:
Field Entities Method
This method requires several non-core modules, which may themselves depend on other non-core modules.
With the "Field Entities" method, the title, body, and summary entries of each node are language-specific fields.
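A minimal sketch of how the translatable entries of one language could be collected from such a node. It assumes the node has been decoded from JSON into nested dictionaries of the form field → language code → list of items (one item per field delta), which is how Drupal 7 fields appear to be organized. The node content and the field name `title_field` are hypothetical.

```python
TRANSLATABLE_KEYS = ("value", "summary")

# Hypothetical node: each translatable field maps a language code to a list of items.
node = {
    "nid": 42,
    "title_field": {
        "en": [{"value": "Welcome"}],
        "fr": [{"value": "Bienvenue"}],
    },
    "body": {
        "en": [{
            "value": "<p>Hello <b>world</b></p>",
            "summary": "Hello",
            "format": "filtered_html",  # not text to translate, so not extracted
        }],
    },
}

def extract_entries(node, fields, lang):
    """Yield (field, delta, key, text) for every non-empty entry in `lang`."""
    for field in fields:
        for delta, item in enumerate(node.get(field, {}).get(lang, [])):
            for key in TRANSLATABLE_KEYS:
                text = item.get(key)
                if text:
                    yield field, delta, key, text

for entry in extract_entries(node, ["title_field", "body"], "en"):
    print(entry)
```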
Note: It is not yet clear how to translate tags, user menus, and possibly a few other content-related items.
"value" vs "safe_value" fields
Each field seems to have a "value" entry (or "summary", etc.) and a matching "safe_value" entry (or "safe_summary", etc.) in the node. The "value" entry is what the UI uses to let the user enter content; for the body, it can be in one of several formats. The "safe_value" entry seems to be the cleaned-up result of "value" and contains full HTML code, for example paragraph elements, anchors, etc.
At first it was not clear which entry should be translated. Using "value" as input means the entries are not always complete HTML, which makes them harder to parse as HTML. But using "safe_value" as input and saving the modified text to "value" does not produce the "safe_value" one would expect.
Based on Ronny's input, the solution is to always translate the "value" field.
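Following that rule, a translated item would be built by replacing only the "value" (and "summary") entries. The sketch below is hypothetical: the function name is illustrative, and it assumes the "safe_*" entries are derived by Drupal from "value" and the text format, so it leaves them out of the item to be saved rather than editing them.

```python
TRANSLATABLE_KEYS = ("value", "summary")

def apply_translation(item, translations):
    """Return a new field item with translated "value"/"summary" entries.

    `translations` maps "value" and/or "summary" to translated text.
    Entries starting with "safe_" are dropped (assumed to be regenerated
    by Drupal); other entries such as "format" are kept unchanged.
    """
    for key in translations:
        if key not in TRANSLATABLE_KEYS:
            raise ValueError(f"not a translatable key: {key}")
    updated = {k: v for k, v in item.items() if not k.startswith("safe_")}
    updated.update(translations)
    return updated

source_item = {
    "value": "Hello <em>world</em>",               # as entered in the UI
    "summary": "Hello",
    "format": "filtered_html",
    "safe_value": "<p>Hello <em>world</em></p>",   # cleaned-up full HTML
    "safe_summary": "<p>Hello</p>",
}
print(apply_translation(source_item, {"value": "Bonjour <em>monde</em>", "summary": "Bonjour"}))
```

Rejecting "safe_*" keys up front keeps a translation from ever being written to the derived entries.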
Extracting and Merging back
Access using the REST services
The Services module provides a REST API to access nodes and other resources.
Node access is poorly documented, and access to anything other than nodes (comments, tags, etc.) seems to be entirely undocumented.
It also seems that only the source-language values of the translatable fields can be updated through the API. So far we have not been able to create a new language entry through the API.
Another issue with REST access through the Services module is speed: extraction has to be done node by node, and merging requires at least two node accesses.
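The request pattern described here can be illustrated with a stand-in transport that only records requests. The URL layout `<base>/<endpoint>/node/<nid>.json` is an assumption (the endpoint path is whatever was configured for the Services REST server), as is sending the whole node back with PUT. `FakeTransport`, `node_url`, `extract`, and `merge` are hypothetical, and nothing here talks to a real server. Per the findings above, the merge only updates the source language.

```python
import json

class FakeTransport:
    """Hypothetical stand-in for HTTP calls: records each request and serves
    nodes from an in-memory dict instead of a Drupal site."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.requests = []

    def request(self, method, url, body=None):
        self.requests.append((method, url))
        nid = int(url.rsplit("/", 1)[1].split(".")[0])  # ".../node/42.json" -> 42
        if method == "GET":
            return json.loads(json.dumps(self.nodes[nid]))  # copy, like a real response
        if method == "PUT":
            self.nodes[nid] = json.loads(body)
            return {"nid": nid}
        raise ValueError(f"unsupported method: {method}")

def node_url(base, endpoint, nid):
    # Assumed URL layout; the endpoint path is set when configuring the server.
    return f"{base.rstrip('/')}/{endpoint}/node/{nid}.json"

def extract(transport, base, endpoint, nids):
    """Extraction: one GET per node, since nodes are fetched one at a time."""
    return {nid: transport.request("GET", node_url(base, endpoint, nid)) for nid in nids}

def merge(transport, base, endpoint, nid, field, lang, items):
    """Merge: read the current node, replace one language of one field,
    then write the whole node back (at least two requests per node)."""
    url = node_url(base, endpoint, nid)
    node = transport.request("GET", url)
    node.setdefault(field, {})[lang] = items
    return transport.request("PUT", url, body=json.dumps(node))

transport = FakeTransport({
    1: {"nid": 1, "body": {"en": [{"value": "Hello"}]}},
    2: {"nid": 2, "body": {"en": [{"value": "Goodbye"}]}},
})
nodes = extract(transport, "http://example.com", "rest", [1, 2])
merge(transport, "http://example.com", "rest", 1, "body", "en", [{"value": "Hello!"}])
print(len(transport.requests), "requests for 2 extractions and 1 merge")
```

For N nodes this pattern costs N requests to extract and at least 2N to merge, which is the scalability concern.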
Access using the backend database directly
It would be possible to access the underlying MySQL database directly.
However, this would require user rights on the database, and it would be too low-level: the database structure may change often. The Drupal layer provides access to the data that is better decoupled from how it is stored.
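To illustrate how low-level direct access would be, here is a sketch that queries an in-memory SQLite copy of a simplified per-field table. The table and column names follow Drupal 7's default SQL field storage as we understand it (`field_data_body` with `body_value`, `body_summary`, `body_format`); they are assumptions to verify against an actual site, and a real write would also have to deal with the matching revision table and Drupal's caches.

```python
import sqlite3

# Simplified stand-in for the per-field storage table of the "body" field.
# Column names embed the field name, so every field needs its own SQL.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE field_data_body (
        entity_type TEXT, bundle TEXT, deleted INTEGER,
        entity_id INTEGER, revision_id INTEGER, language TEXT,
        delta INTEGER, body_value TEXT, body_summary TEXT, body_format TEXT
    )""")
conn.executemany(
    "INSERT INTO field_data_body VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [("node", "article", 0, 42, 42, "en", 0, "Hello", "Hi", "filtered_html"),
     ("node", "article", 0, 42, 42, "fr", 0, "Bonjour", "Salut", "filtered_html")],
)

# Read the body of node 42 in every language, skipping deleted field data.
rows = conn.execute(
    "SELECT language, body_value FROM field_data_body "
    "WHERE entity_type = 'node' AND entity_id = ? AND deleted = 0 "
    "ORDER BY language",
    (42,),
).fetchall()
print(rows)
```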
Access using a new Drupal module
One could imagine a new Drupal module that handles most of the gathering of the content needing to be translated, and provides a specialized REST API to pull and push the content.
If the API allowed retrieving a list of nodes at once (rather than a single node), access would be much faster and more scalable. The content pulled and pushed could be organized in JSON fields.
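One possible shape for such a batch API is sketched below: a single JSON document carrying the source-language entries of many nodes, and a function that writes the translated entries back. The JSON layout, the unit-id scheme `nid/field/delta/key`, and both function names are hypothetical; they are not part of any existing Drupal module.

```python
import json

TRANSLATABLE_KEYS = ("value", "summary")

def build_batch(nodes, fields, source_lang, target_lang):
    """Collect the source-language entries of many nodes into one JSON document."""
    units = []
    for node in nodes:
        for field in fields:
            for delta, item in enumerate(node.get(field, {}).get(source_lang, [])):
                for key in TRANSLATABLE_KEYS:
                    if item.get(key):
                        units.append({"id": f"{node['nid']}/{field}/{delta}/{key}",
                                      "source": item[key]})
    return json.dumps({"source": source_lang, "target": target_lang, "units": units})

def apply_batch(nodes, batch_json):
    """Store each translated unit in the target language of its node.

    New target items copy the text format of the matching source item.
    """
    batch = json.loads(batch_json)
    by_nid = {node["nid"]: node for node in nodes}
    for unit in batch["units"]:
        if "target" not in unit:
            continue  # untranslated unit: leave the node unchanged
        nid, field, delta, key = unit["id"].split("/")
        delta = int(delta)
        field_data = by_nid[int(nid)].setdefault(field, {})
        source_items = field_data.get(batch["source"], [])
        target_items = field_data.setdefault(batch["target"], [])
        while len(target_items) <= delta:
            i = len(target_items)
            src = source_items[i] if i < len(source_items) else {}
            target_items.append({k: v for k, v in src.items() if k == "format"})
        target_items[delta][key] = unit["target"]

nodes = [
    {"nid": 1, "body": {"en": [{"value": "Hello", "summary": "Hi", "format": "filtered_html"}]}},
    {"nid": 2, "title_field": {"en": [{"value": "News"}]}},
]
batch = json.loads(build_batch(nodes, ["title_field", "body"], "en", "fr"))
glossary = {"Hello": "Bonjour", "Hi": "Salut", "News": "Nouvelles"}  # stand-in for a translator
for unit in batch["units"]:
    unit["target"] = glossary[unit["source"]]
apply_batch(nodes, json.dumps(batch))
print(nodes[0]["body"]["fr"], nodes[1]["title_field"]["fr"])
```

One pull and one push then cover any number of nodes, instead of one or more requests per node.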
The same module could also produce (and read back) XLIFF, but implementing this would cost more. Using the native Drupal content in JSON fields would be easy to implement.
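For comparison, here is a minimal sketch of emitting a batch as XLIFF 1.2 and reading the targets back with Python's standard `xml.etree.ElementTree`. It assumes a hypothetical batch layout (a dict with `source`, `target`, and `units` entries). HTML in the source text is simply escaped here; a real filter would need to segment it and map inline tags to XLIFF inline codes, which is where much of the extra cost would lie.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize XLIFF elements without a prefix

def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"

def batch_to_xliff(batch, original="drupal-nodes"):
    """Write one trans-unit per batch unit (source text only)."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": batch["source"],
        "target-language": batch["target"],
        "datatype": "html",
    })
    body = ET.SubElement(file_el, q("body"))
    for unit in batch["units"]:
        tu = ET.SubElement(body, q("trans-unit"), id=unit["id"])
        ET.SubElement(tu, q("source")).text = unit["source"]
    return ET.tostring(root, encoding="unicode")

def targets_from_xliff(xliff_text):
    """Return {unit id: target text} for every translated trans-unit."""
    root = ET.fromstring(xliff_text)
    result = {}
    for tu in root.iter(q("trans-unit")):
        target = tu.find(q("target"))
        if target is not None and target.text:
            result[tu.get("id")] = target.text
    return result

batch = {"source": "en", "target": "fr",
         "units": [{"id": "1/body/0/value", "source": "<p>Hello</p>"}]}
xliff = batch_to_xliff(batch)
print(xliff)
```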