W3C Glossary and Dictionary requirements

Here are early ideas, both technical and operational, for a combined "glossary" and "dictionary" project for W3C. Input goes to olivier.

This page is kept as historical background. Refer to the Glossary project page for up-to-date information.

Existing work

Here is a short list of "similar" work known to me, sorted into two categories: "data" and "technologies".

Data (at W3C)
Data (provided by W3C volunteer translators or translator groups)
- dictionary (French / fr_fr) submitted by Karl Dubost
- dictionary (German) submitted by Stefan Schmacher
- translation (Spanish) of WAI glossary submitted by SIDAR
- dictionary (French / fr_ca) submitted by Richard Parent
Technologies
- Terminological Markup Framework (TMF, ISO 16642, work of ISO/TC 37)
- Topic Maps (not directly related but worth a look)
- lots of XML / Database related resources

Architecture

Data Model

Since the previous draft, requirements for the data model have been clarified, mostly thanks to the pointers to the TMF (submitted by Laurent Romary). The plan would now be to go with DXLT, or something similar.
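To make the direction a bit more concrete, here is a minimal sketch of a concept-oriented model in the spirit of TMF, where one entry per concept holds language sections, which in turn hold terms. It is not DXLT itself: the class names, the `source` field, the concept ID and the sample definition are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TermSection:
    """One term (word or phrase) naming a concept in one language."""
    term: str
    source: str = ""  # hypothetical field: who or what the term came from


@dataclass
class LanguageSection:
    """Everything known about a concept in a single language."""
    lang: str  # language tag such as "en" or "fr-CA"
    definition: str = ""
    terms: list[TermSection] = field(default_factory=list)


@dataclass
class ConceptEntry:
    """One concept, identified by an ID and described in several languages."""
    concept_id: str
    languages: dict[str, LanguageSection] = field(default_factory=dict)

    def add_term(self, lang: str, term: str, definition: str = "") -> None:
        # Create the language section on first use, then attach the term.
        section = self.languages.setdefault(lang, LanguageSection(lang))
        if definition:
            section.definition = definition
        section.terms.append(TermSection(term))


# Made-up sample entry: the ID and the definition text are placeholders.
entry = ConceptEntry("c-user-agent")
entry.add_term("en", "user agent", "Software that retrieves and renders Web content.")
entry.add_term("fr", "agent utilisateur")
```

With this shape, a translation is just another language section on an existing concept, which is why a submitted translation has to be attached to a concept ID rather than stored on its own.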

Technology / Back-end

Still working on it.

Lookup

The Glossary will be available in several "views".
Each of the views listed below should be available in "full", "split alphabetically", and "one term" modes.

Definitions
Definitions, with details
Translations in a given language
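The three modes could be sketched roughly as follows, here for the plain "Definitions" view only. This is an illustration under assumptions, not a design decision: the function names, the in-memory dictionary and the sample entries are all hypothetical, and the real back-end is still undecided.

```python
from collections import defaultdict

# Hypothetical sample data: term -> definition.
GLOSSARY = {
    "attribute": "A name-value pair attached to an element.",
    "element": "A structural unit of a markup document.",
    "entity": "A named unit of replacement text.",
}


def full_view(glossary):
    """'Full' mode: every term, sorted alphabetically (case-insensitive)."""
    return sorted(glossary.items(), key=lambda kv: kv[0].lower())


def split_view(glossary):
    """'Split alphabetically' mode: terms grouped by their first letter."""
    pages = defaultdict(list)
    for term, definition in full_view(glossary):
        pages[term[0].upper()].append((term, definition))
    return dict(pages)


def one_term_view(glossary, term):
    """'One term' mode: a single entry, or None if the term is unknown."""
    definition = glossary.get(term)
    return None if definition is None else (term, definition)
```

The other two views ("with details" and "translations in a given language") would presumably reuse the same three modes over a different projection of the data.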

Contribution

As far as my analysis went, I can see four kinds of submissions:

A whole glossary, from a spec or from an existing activity glossary,
A whole dictionary, related to a spec or to W3C at large,
An additional term,
An additional translation for an existing term.

Contribution process

The "contribution" (input might be a better word) process will depend largely on the type of input.

The first kind, "whole glossary", will be done mostly within W3C, and little external contribution is expected. Those contributions are expected to be the first ones done (to populate the database).
The second kind might be the most difficult. Given the data model, mapping the terms to translate to their actual IDs does not seem straightforward (at this stage), so automatic submission and input into the DB is not easy to do.
However, I don't think there will be a lot of such submissions, and the process might as well be "translators get in touch with the glossary maintainers to work on their data and include them in the database".
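To illustrate why this mapping is awkward, here is a rough sketch of a semi-automatic import: rows whose source term matches exactly one known concept are accepted, and everything else is set aside for the maintainers and translators to sort out by hand. The index structure, the IDs and the function names are assumptions made for the example, not part of any agreed design.

```python
def normalise(term: str) -> str:
    """Fold case and collapse whitespace so near-identical terms compare equal."""
    return " ".join(term.casefold().split())


def match_dictionary(term_index, dictionary):
    """Split a submitted dictionary into matched and unmatched rows.

    term_index: normalised source term -> list of concept IDs (hypothetical)
    dictionary: list of (source term, translation) pairs
    """
    matched, needs_review = [], []
    for source, translation in dictionary:
        ids = term_index.get(normalise(source), [])
        if len(ids) == 1:
            # Unambiguous: the translation can be attached to the concept.
            matched.append((ids[0], translation))
        else:
            # Unknown term, or the same term defined in several places.
            needs_review.append((source, translation, ids))
    return matched, needs_review


# Made-up sample data.
term_index = {
    "user agent": ["c-001"],
    "entity": ["c-010", "c-011"],  # same term defined in two specs
}
submitted = [
    ("User  Agent", "agent utilisateur"),
    ("entity", "entité"),
    ("namespace", "espace de noms"),
]
matched, needs_review = match_dictionary(term_index, submitted)
```

In this sample only "user agent" is imported automatically; "entity" is ambiguous and "namespace" is unknown, so both end up in the manual queue.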
Adding a term, or a definition for a term, would be made through a specific interface. The "moderation" idea proposed in a former draft of this requirements document has been abandoned for a "sign-in" system.
The rationale for this choice is:
- The system would, eventually, allow people to edit their own data (so it needs to know who submitted what).
- There *is* a need to protect the system from bots and from possible bad behaviour. Accountability should help limit the risk of such problems.
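The "edit their own data" part of the rationale could look something like the sketch below, assuming every submission records who made it. The `Submission` class, its fields and the user names are hypothetical, and authentication itself (the actual sign-in) is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    term: str
    text: str
    submitted_by: str  # account name recorded at sign-in (hypothetical)


def can_edit(user, submission):
    """Only a signed-in user who made the submission may edit it."""
    return user is not None and user == submission.submitted_by


def edit(user, submission, new_text):
    """Apply an edit, refusing anonymous users and other people's entries."""
    if not can_edit(user, submission):
        raise PermissionError("only the original submitter may edit this entry")
    submission.text = new_text
    return submission
```

Here an anonymous visitor is represented by `None`, so bots that never sign in cannot change anything, and every stored change stays attributable to an account.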

Project Roadmap

Analysis of our needs (data structure, etc.), already started here
Check if existing XML grammars fit
Find appropriate software for the back-end
Code the front-end
Test phase
Data import (coordinated with different bodies)
Call for participation (users, translators, etc.)