W3C Glossary and Dictionary requirements
Here are early ideas, both technical and operational, for a combined
"glossary" and "dictionary" project for W3C. Input goes to Olivier.
This page is kept as historical background. Refer to the Glossary project page for up-to-date information.
Existing work
Here is a short list of "similar" work known to me, which I have sorted into two categories: "data" and "technologies".
- Data (at W3C)
- Data (provided by W3C volunteer translators or translator groups)
- Technologies
Architecture
Data Model
Since the previous draft, the requirements for the data model have been clarified, mostly thanks to the pointers to
TMF submitted by Laurent Romary. The current plan is to go with DXLT,
or something similar.
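To make the concept-oriented idea concrete, here is a minimal sketch of an in-memory entry. It assumes a structure loosely modeled on the TMF meta-model: one entry per concept, holding one language section per language, each holding its terms. The class and field names (`ConceptEntry`, `LanguageSection`, `TermSection`, `entry_id`, `source`) are hypothetical and do not reproduce DXLT element names.

```python
from dataclasses import dataclass, field


@dataclass
class TermSection:
    # One term (or translation) in a given language, with an optional note.
    term: str
    note: str = ""


@dataclass
class LanguageSection:
    # All terms for one concept in one language, plus that language's definition.
    lang: str  # hypothetical: a language tag such as "en" or "fr"
    definition: str = ""
    terms: list[TermSection] = field(default_factory=list)


@dataclass
class ConceptEntry:
    # One concept, identified by a stable ID, described in several languages.
    entry_id: str
    source: str  # hypothetical: the spec or glossary the concept comes from
    languages: dict[str, LanguageSection] = field(default_factory=dict)

    def add_term(self, lang: str, term: str, definition: str = "") -> None:
        # Create the language section on first use, then append the term.
        section = self.languages.setdefault(lang, LanguageSection(lang))
        if definition and not section.definition:
            section.definition = definition
        section.terms.append(TermSection(term))

    def terms_in(self, lang: str) -> list[str]:
        # Terms recorded for this concept in one language (empty if none).
        section = self.languages.get(lang)
        return [t.term for t in section.terms] if section else []


entry = ConceptEntry("uri", source="hypothetical-spec")
entry.add_term("en", "Uniform Resource Identifier",
               "A compact sequence of characters that identifies a resource.")
entry.add_term("fr", "identifiant uniforme de ressource")
print(entry.terms_in("fr"))  # ['identifiant uniforme de ressource']
```

Because translations hang off a concept ID rather than off an English word, every incoming term has to be matched to an existing ID before it can be stored.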
Technology / Back-end
Still working on it.
Lookup
The Glossary will be available in several "views".
Each of the views listed below should be available in three modes: full, split alphabetically, and single term.
- Definitions
- Definitions, with details
- Translations in a given language
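The three views and three modes could be served by a single lookup function. This is a sketch only: it assumes a flat in-memory list of entries with hypothetical fields (`term`, `definition`, `details`, `translations`), and it interprets "split alphabetically" as grouping entries by the first letter of the term.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: str
    definition: str
    details: str = ""
    translations: dict[str, str] = field(default_factory=dict)  # language tag -> term


def render(entry: Entry, view: str, lang: str = "") -> str:
    """Format one entry for one of the three views."""
    if view == "definitions":
        return f"{entry.term}: {entry.definition}"
    if view == "details":
        extra = f" ({entry.details})" if entry.details else ""
        return f"{entry.term}: {entry.definition}{extra}"
    if view == "translations":
        return f"{entry.term} -> {entry.translations.get(lang, '(untranslated)')}"
    raise ValueError(f"unknown view: {view}")


def lookup(entries: list[Entry], view: str, mode: str, lang: str = "", term: str = ""):
    """Return a view in one of three modes: full, alphabetical, one-term."""
    ordered = sorted(entries, key=lambda e: e.term.lower())
    if mode == "full":
        return [render(e, view, lang) for e in ordered]
    if mode == "alphabetical":
        # One "page" per initial letter, in alphabetical order.
        pages: dict[str, list[str]] = defaultdict(list)
        for e in ordered:
            pages[e.term[0].upper()].append(render(e, view, lang))
        return dict(pages)
    if mode == "one-term":
        return [render(e, view, lang) for e in ordered if e.term.lower() == term.lower()]
    raise ValueError(f"unknown mode: {mode}")


glossary = [
    Entry("URI", "Uniform Resource Identifier"),
    Entry("Markup", "Annotations added to text", details="See HTML",
          translations={"fr": "balisage"}),
]
print(lookup(glossary, "translations", "one-term", lang="fr", term="markup"))
```

Keeping views and modes orthogonal means any view can be paired with any mode (nine combinations) without writing nine separate pages.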
Contribution
As far as my analysis went, I can see four kinds of submissions:
- A whole glossary, from a spec, or from an existing activity glossary,
- A whole dictionary, related to a spec or to W3C at large,
- An additional term,
- An additional translation for an existing term.
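A back-end could represent these four kinds as an enumeration and use it to route each input to its handling path. The routing below follows the contribution process described in the next section. The names are hypothetical, and sending additional translations through the sign-in interface is an assumption, since the process section only covers terms and definitions explicitly.

```python
from enum import Enum


class SubmissionKind(Enum):
    WHOLE_GLOSSARY = "whole glossary"
    WHOLE_DICTIONARY = "whole dictionary"
    NEW_TERM = "additional term"
    NEW_TRANSLATION = "additional translation"


# Hypothetical handling paths: glossaries are imported within W3C, dictionaries
# are merged by hand together with the maintainers, and single terms go through
# the sign-in interface. Routing translations there too is an assumption.
ROUTES = {
    SubmissionKind.WHOLE_GLOSSARY: "internal import",
    SubmissionKind.WHOLE_DICTIONARY: "manual merge with maintainers",
    SubmissionKind.NEW_TERM: "signed-in web interface",
    SubmissionKind.NEW_TRANSLATION: "signed-in web interface",
}


def route(kind: SubmissionKind) -> str:
    """Return the handling path for a given kind of submission."""
    return ROUTES[kind]


print(route(SubmissionKind.WHOLE_DICTIONARY))
```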
Contribution process
The "contribution" process ("input" might be a better word) will depend largely on the type of input.
- The first kind, "whole glossary", will be handled mostly within W3C,
with little external contribution expected. These contributions are
expected to come first, to populate the database.
- The second kind might be the most difficult: given the data
model, mapping the terms to translate to their actual IDs does not seem
straightforward (at this stage), so automatic submission and input into
the database will not be easy.
However, I don't expect many such
submissions, and the process might as well be: "translators
get in touch with the glossary maintainers to work on their data and
include it in the database".
- Adding a term, or a definition for a term, would be done through a
dedicated interface. The "moderation" idea proposed in a former draft of
this requirements document has been abandoned in favour of a "sign-in" system.
The rationale for this choice is:
- The system would, eventually, allow people to edit their own data
(so it needs to know who submitted what)
- There *is* a need to protect the system from bots and from bad behaviour.
Accountability should help limit these risks.
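The two reasons behind the sign-in choice, knowing who submitted what and letting people edit only their own data, can be sketched as a small store. This assumes authentication happens elsewhere (the `user` argument is taken to be an already verified account name) and keeps everything in memory; `Contribution` and `ContributionStore` are hypothetical names, and the sketch does nothing beyond attribution to stop bots.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Contribution:
    contribution_id: int
    author: str  # signed-in account that submitted the item
    text: str
    submitted_at: datetime


class ContributionStore:
    def __init__(self) -> None:
        self._items: dict[int, Contribution] = {}
        self._next_id = 1

    def submit(self, user: Optional[str], text: str) -> int:
        # Anonymous submissions are refused: every item must be attributable.
        if not user:
            raise PermissionError("sign-in required")
        cid = self._next_id
        self._next_id += 1
        self._items[cid] = Contribution(cid, user, text, datetime.now(timezone.utc))
        return cid

    def edit(self, user: Optional[str], cid: int, new_text: str) -> None:
        # Users may only edit data they submitted themselves.
        item = self._items[cid]
        if user != item.author:
            raise PermissionError("only the original author may edit this item")
        item.text = new_text

    def by_author(self, user: str) -> list[Contribution]:
        # Everything one account has submitted, e.g. for review after abuse.
        return [c for c in self._items.values() if c.author == user]


store = ContributionStore()
cid = store.submit("alice", "markup (fr): balisage")
store.edit("alice", cid, "markup (fr): balisage, langage de balisage")
```

Recording the author on every item serves both goals at once: the same field that authorizes edits also makes each contribution traceable if something goes wrong.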
Project Roadmap
- Analysis of our needs (data structure, etc.), already started in this document
- Check whether existing XML grammars fit
- Find appropriate software for the back-end
- Code the front-end
- Test phase
- Data import (coordinating with different bodies)
- Call for participation (users, translators, etc.)