Re: Draft for review: Personal names around the world

Hi all.

I found the W3C personal names draft via my activity in the G+ nymwars
(e.g. https://plus.google.com/103112149634414554669/posts/WAu688n8JgZ
and https://plus.google.com/103112149634414554669/posts/KGn5ezKLTC4).

I'm thinking of making a simple website that, given a single Unicode
string full name as input, outputs a best guess as to:
a) the country(ies) of origin, language(s), gender, and other cultural
properties of the name
b) a logical-segment breakdown of the name (e.g. given name,
patronymic, matronymic, generation name, etc)
c) the variant forms of the name (e.g. formal, familiar,
transliterated) for use in various situations, like those Mark Davis
described
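
To make the shape of that output concrete, here is a rough sketch in
Python. Everything in it (Segment, NameAnalysis, analyze_name, the field
names) is hypothetical and made up for illustration; none of it comes
from the W3C draft or any existing product. It deliberately guesses
nothing: it only normalizes the input and splits on whitespace, leaving
every role as "unknown" until real data can back a guess.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Segment:
    text: str
    role: str = "unknown"      # later: "given", "patronymic", "generation", ...
    confidence: float = 0.0    # 0.0 means "no guess made"


@dataclass
class NameAnalysis:
    full_name: str                                  # the input, NFC-normalized
    cultures: list = field(default_factory=list)    # (a) origin/language guesses
    segments: list = field(default_factory=list)    # (b) logical breakdown
    variants: dict = field(default_factory=dict)    # (c) formal, familiar, ...


def analyze_name(raw: str) -> NameAnalysis:
    """Placeholder analysis: normalize, split on whitespace, guess nothing."""
    full = unicodedata.normalize("NFC", raw.strip())
    # Whitespace splitting is an assumption that fails for many scripts;
    # it is only here so the structure has something in it.
    segments = [Segment(part) for part in full.split()]
    return NameAnalysis(full_name=full, segments=segments,
                        variants={"as_entered": full})
```

A mononym simply yields a single segment, e.g. analyze_name("Sai") has
one Segment whose role is still "unknown".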

This is much like the IBM commercial product
http://www-01.ibm.com/software/data/infosphere/global-name-recognition/
Basically, I see that programmers want to chop up names, or get
them pre-chopped, and I would like to provide a model implementation that
does it non-stupidly.

Does anyone know where I can find some large, machine-parsable,
republishable databases of names from around the world? And would
any of you be interested in helping with this?

Also, in case you've not already seen it:
http://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/
— not sure whether it's the sort of thing you'd link to in a public
W3C spec (it's a bit snarky), but it's fairly incisive about this
issue.

Thanks,
- Sai

P.S. Personal investment in this issue: I'm mononymic. :-P

Received on Monday, 5 September 2011 13:07:54 UTC