Re: Wrong language identification

Unfortunately, in addition to sometimes guessing wrong, the language
guesser is also not deterministic: for the same document, it may guess
correctly on one check and incorrectly on another.

I realize that’s suboptimal. But as Jukka notes, it doesn’t guess wrong
very often; it does so in only a very limited number of cases. The
http://coinoter.com/miembros.html document is one such case.

  –Mike

P.S.  For anybody who’s curious, the actual language-guessing library we’re
using is https://github.com/shuyo/language-detection. There are several
alternatives we could be using instead, but after trying a number of them,
I ended up choosing that one because it hits the sweet spot: it is
relatively small and fast while guessing quite accurately in the vast
majority of cases.
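
To illustrate how a guesser can flip between two close languages such as
Spanish and Portuguese, here is a toy Python sketch. It is not the
library's code, and it assumes the nondeterminism comes from scoring a
random sample of the document's n-grams. The trigram weights, the NGRAMS
list, the guess() function, and the sample size are all made up for
illustration.

```python
import random

# Hypothetical trigram weights per language (invented for this sketch).
# "que" and " de" are equally typical of both languages, so they carry
# no signal; only "los" and "ade" separate the two profiles.
PROFILES = {
    "es": {"que": 5, " de": 5, "los": 9, "ade": 1},
    "pt": {"que": 5, " de": 5, "los": 1, "ade": 9},
}

# Hypothetical trigrams extracted from a short, ambiguous document.
NGRAMS = ["que", " de", "los", "ade"]


def guess(ngrams, rng, samples=3):
    """Score each language on a random sample of n-grams; return the best."""
    picked = [rng.choice(ngrams) for _ in range(samples)]
    scores = {
        lang: sum(profile.get(gram, 0) for gram in picked)
        for lang, profile in PROFILES.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Unseeded: repeated checks of the same input may disagree.
    print([guess(NGRAMS, random.Random()) for _ in range(10)])
    # Seeded: the same seed always yields the same guess.
    print(guess(NGRAMS, random.Random(42)))
```

In this sketch, passing a generator with a fixed seed makes the guess
repeatable, at the cost of always sampling the same subset of the input.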

"Jukka K. Korpela" <jukkakk@gmail.com>, 2020-08-18 13:35 +0300:
> Archived-At: <https://www.w3.org/mid/CAGHxYa6Zm9x+wafFxDJp+fWVpKUQsKJdBXcnXKtwTtyEgqfu3w@mail.gmail.com>
> 
> The validator uses a heuristic language guesser, which may guess wrong, but
> not often. When I just checked, the validator did not issue any error or
> warning message.
> 
> https://validator.w3.org/nu/?doc=http%3A%2F%2Fcoinoter.com%2Fmiembros.html
> 
> On Tue, 18 Aug 2020 at 12.41, Comintt Comtt <comintt@mail.com> wrote:
> 
> > Dear Sirs
> >
> > The validator says the language is Portuguese when it is actually Spanish.
> >
> > Sending screenshot.
> >
> > http://coinoter.com/miembros.html
> >

-- 
Michael[tm] Smith https://people.w3.org/mike

Received on Saturday, 29 August 2020 23:25:10 UTC