Hunspell: about suggesting the right spelling


Olivier R.-2
Hi,

I would like to understand how Hunspell tries to suggest the right spelling.

Here are some examples of the strange behaviour we get:


***** example 1 *****
_déterrer_ is the correct spelling of a verb ("to dig up" in English)

a. If I write: _détérer_
Hunspell suggests: déférer, détirer

b. If I write: _détèrer_ (the second accent is different)
Hunspell suggests: détirer, délétère, détourer, _déterrer_ and many
other words.
The fourth word is the correct one.

But why is Hunspell able to suggest it if I write _détèrer_, but not
if I write _détérer_?

e, é and è are defined as similar characters with the line
MAP eéèêë

If I write _détêrer_, _déterrer_ is suggested at the third position.
If I write _détërer_, _déterrer_ is suggested at the second position.


***** example 2 *****
_fumer_ is the correct spelling of a verb ("to smoke" in English)

If I write: _fûmmer_
Hunspell suggests: gemmer, nommer, gommer, sommer, pommer, fermer,
frimer, former, filmer, fûtier, enflammer, emmerdé, drummer, commerce,
emmerde

There is not one word close to the right one.

It should be easy for Hunspell to suggest _fumer_ with the lines:
MAP uùûü
REP mm m

But Hunspell believes that _gemmer_ is closer to _fûmmer_ than _fumer_.
Why?


***** end of examples *****


I just don't understand how Hunspell makes suggestions.

I tried, for example, removing the KEY line (see the Annex below).
With _détérer_, Hunspell now suggests many words instead of 2, and
the right one (_déterrer_) is in eighth position.
But this changes nothing for the other misspellings or
for _fûmmer_.


Best regards,
Olivier

Annex: Rules about suggestions in the French affix file:

TRY aàâäbcçdeéèêëfghiîïjklmnoôöpqrstuùûüvwxyzæœAÀÂÄBCÇDEÉÈÊËFGHIÎÏJKLMNOÔÖPQRSTUÙÛÜVWXYZÆŒáíÿñåóşăã

MAP aàâä
MAP eéèêë
MAP iîïy
MAP oôö
MAP uùûü
MAP cç
MAP AÀÂÄ
MAP EÉÈÊË
MAP IÎÏY
MAP OÔÖ
MAP UÙÛÜ
MAP CÇ

REP f ph
REP ph f
REP c qu
REP qu c
REP k qu
REP qu k
REP x ct
REP ct x
REP bb b
REP b bb
REP cc c
REP c cc
REP ff f
REP f ff
REP ll l
REP l ll
REP mm m
REP m mm
REP nn n
REP n nn
REP pp p
REP p pp
REP rr r
REP r rr
REP ss s
REP s ss
REP ss c
REP c ss
REP ss ç
REP ç ss
REP tt t
REP t tt
REP œ oe
REP oe œ
REP æ ae
REP ae æ
REP ai é
REP é ai
REP ai è
REP è ai
REP ai ê
REP ê ai
REP ei é
REP é ei
REP ei è
REP è ei
REP ei ê
REP ê ei
REP o au
REP au o
REP o eau
REP eau o

KEY azertyuiop|qsdfghjklmù|wxcvbn|aéz|yèu|iço|oàp|aqz|zse|edr|rft|tgy|yhu|uji|iko|olpm|qws|sxd|dcf|fvg|gbh|hnj

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Hunspell: about suggesting the right spelling

Marcin Miłkowski
Olivier R. wrote:

> ***** example 1 *****
> _déterrer_ is the correct spelling of a verb ("to dig up" in English)
>
> a. If I write: _détérer_
> Hunspell suggests: déférer, détirer
>
> b. If I write: _détèrer_ (the second accent is different)
> Hunspell suggests: détirer, délétère, détourer, _déterrer_ and many
> other words.
> The fourth word is the correct one.
>
> But why is Hunspell able to suggest it if I write _détèrer_, but not
> if I write _détérer_?

If I'm not mistaken, by default the number of single-letter replacements
determines the order of suggestions here. Now, two letters would have to be
changed in case a. to get the correct version; in case b. it's only one
letter. This seems to explain all the other cases as well.

Regards
Marcin



Re: Hunspell: about suggesting the right spelling

Olivier R.-2
In reply to this post by Olivier R.-2
I should have added that Hunspell works perfectly if there is only one
mistake in a word.

If I write: _déterer_, _détérrer_ or _deterrer_
Hunspell suggests: _déterrer_ (correct spelling)

If I write: _fûmer_ or _fummer_
Hunspell suggests first: _fumer_ (correct spelling)

But if there are several mistakes, the suggestions are often crazy.

Regards,
Olivier

Re: Hunspell: about suggesting the right spelling

Olivier R.-2
In reply to this post by Marcin Miłkowski
Hi,

Marcin Miłkowski wrote:

> if I'm not wrong, by default, the number of single letter replacement
> defines the order of suggestions here. Now, two letters would have to be
> changed in case a. to get the correct version; in case b. it's only one
> letter. This seems to explain all other cases as well.

In all the cases of example 1, the second accented e must be
changed to an unaccented e, and one r is missing.

Correct spelling: déterrer

             Hunspell suggestions:
détérer --> déterrer is not suggested (8th position if line KEY removed)
détèrer --> déterrer is at 4th position
détêrer --> déterrer is at 3rd position
détërer --> déterrer is at 2nd position
   ^^
   ||
   |`-----> one r is missing
   |
   `------> should be e


In the affixes file:
MAP eéèêë
REP r rr
REP rr r


Is that not strange?


Regards,
Olivier

Re: Hunspell: about suggesting the right spelling

Marcin Miłkowski
Olivier R. wrote:

> Hi,
>
> Marcin Miłkowski wrote:
>
>> if I'm not wrong, by default, the number of single letter replacement
>> defines the order of suggestions here. Now, two letters would have to
>> be changed in case a. to get the correct version; in case b. it's only
>> one letter. This seems to explain all other cases as well.
>
> In all the cases of example 1, the second e with an accent must be
> changed in an e with no accents, and one r is missing.

That makes two letters.

> Correct spelling: déterrer
>
>             Hunspell suggestions:
> détérer --> déterrer is not suggested (8th position if line KEY removed)
> détèrer --> déterrer is at 4th position
> détêrer --> déterrer is at 3rd position
> détërer --> déterrer is at 2nd position
>    ^^
>    ||
>    |`-----> one r is missing
>    |
>    `------> should be e

In all these cases, two letters must be changed. In terms of
Levenshtein distance (the standard measure of the difference between
strings), the correct form is actually a "worse" suggestion than other
forms that require a change of only one letter. Of course, the space of
corrections is not as uniform as Levenshtein distance suggests, so some
changes should be given preference over others. I don't know how to do
that other than by changing the TRY line and the REP lines.

I'm only saying that it looks as if the Levenshtein distance is used,
but surely there is another way to find better suggestions. Anyone?
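
To make the distance argument concrete, here is a minimal Python sketch of the classic Levenshtein distance applied to example 1. It only illustrates the measure itself; the function name is made up for this example, and nothing here is taken from Hunspell's source.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertion, deletion and substitution all cost 1."""
    # Normalise so that each accented letter is a single code point.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


for candidate in ("déférer", "détirer", "déterrer"):
    print(candidate, levenshtein("détérer", candidate))
```

With these unit costs, déférer and détirer are at distance 1 from détérer, while déterrer is at distance 2.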

Regards
Marcin

Re: Hunspell: about suggesting the right spelling

thomas.lange
In reply to this post by Olivier R.-2

Hi all,

Marcin Miłkowski wrote:

...

>> Correct spelling: déterrer
>>
>>             Hunspell suggestions:
>> détérer --> déterrer is not suggested (8th position if line KEY removed)
>> détèrer --> déterrer is at 4th position
>> détêrer --> déterrer is at 3rd position
>> détërer --> déterrer is at 2nd position
>>    ^^
>>    ||
>>    |`-----> one r is missing
>>    |
>>    `------> should be e
>
> In all these cases, two letters must be replaced. In terms of
> Levenshtein distance (the standard measure of the difference between
> strings), the correct form is actually a "worse" suggestion than other
> forms that require a change of only one letter. Of course, the space of
> corrections is not as uniform as Levenshtein suggests, so some changes
> should be given preference to others. I don't know how to do that
> besides changing the TRY line and REPs.
>
> I'm only saying that it looks as if the Levenshtein distance was used
> but there surely is another way to find better suggestions. Anyone?
>

For a reference:
http://en.wikipedia.org/wiki/Levenshtein_distance

Before looking for a completely different solution, I'd like to try
experimenting with a slight modification of that algorithm.

Usually the Levenshtein distance is calculated by counting changes with
the following weights (as in the link above)
(A)
  - adding a character:   +1
  - deleting a character: +1
  - changing a character: +1
But sometimes I have seen weights that prefer changes over insertions
and deletions; in those cases the weights were like this:
(B)
  - adding a character:   +2
  - deleting a character: +1
  - changing a character: +2

Because of this, and since the actual problem is only about getting a
better proposal when the character differs only in its 'decoration', I'd
like to suggest trying the following idea: shifting the weights.

Provided Hunspell uses weights like this (the actual values do not matter!)
  - adding a character:   +A
  - deleting a character: +D
  - changing a character: +C
then the weights should be calculated like this instead
  - adding a character:   +2*A
  - deleting a character: +2*D
  - changing a character: +2*C, if the characters differ by more than
    'decoration'
  - changing a character: +C, if the characters differ *only in*
    decoration

That way, changes like é to c will have double weight and changes like e
to é will have only single weight. The latter changes should thus be
preferred over other changes, and the respective suggestions should
therefore appear higher up in the list of proposals.

Obviously, the suggestion mechanism then needs to accept words at twice
the Levenshtein distance it accepted before.
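
The shifted weights can be sketched in Python as follows. This is only an illustration under assumptions: A = D = C = 1, and "decoration" is taken to mean that two characters share the same base letter once combining marks are stripped. The names `base_letter` and `weighted_distance` are invented for this example and are not Hunspell functions.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Strip combining marks, so that 'é' becomes 'e' and 'û' becomes 'u'."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def weighted_distance(a, b, add=1, delete=1, change=1, scale=2):
    """Edit distance with doubled weights, except for decoration-only changes."""
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)

    def substitution_cost(x, y):
        if x == y:
            return 0
        if base_letter(x) == base_letter(y):
            return change          # differs only in decoration: +C
        return scale * change      # any other change: +2*C

    previous = [j * scale * add for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        current = [i * scale * delete]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + scale * delete,               # delete ca: +2*D
                current[j - 1] + scale * add,               # insert cb: +2*A
                previous[j - 1] + substitution_cost(ca, cb),
            ))
        previous = current
    return previous[-1]


for wrong, candidates in (("fûmmer", ("fumer", "gommer")),
                          ("détérer", ("déterrer", "déférer", "détirer"))):
    for candidate in candidates:
        print(wrong, candidate, weighted_distance(wrong, candidate))
```

Under these assumed weights, fumer (3) ranks ahead of gommer (4) for fûmmer. For détérer, however, déférer and détirer (2 each) still rank ahead of déterrer (3), because the missing r is still charged as a full insertion.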


Regards,
Thomas




Re: Hunspell: about suggesting the right spelling

thomas.lange
In reply to this post by Olivier R.-2

One more comment:

If that idea basically proves to work but is not good enough, you
may try two more modifications:

a) use an even higher standard weight
   (that is, use +5*A instead of +2*A, ...)
b) you may consider changes like those in decoration to be completely
   negligible and give them a weight of 0.
   But since I don't know how a weight of 0 might affect the
   reliability of the algorithm, one needs to verify that the algorithm
   still behaves well with a weight of 0

Thomas

Re: Hunspell: about suggesting the right spelling

Olivier R.-2
In reply to this post by thomas.lange
Thomas Lange - Sun Germany - ham02 - Hamburg wrote:

> Because of this, and since the actual problem is only with getting a
> better proposal if the character differs only in its 'decoration' I'd
> like to suggest trying the following idea: shifting the weights.
>
> Provided hunspell uses weights like this (the actual values do not matter!)
>   - adding a character:   +A
>   - deleting a character: +D
>   - changing a character: +C
> then the weights should be calculated like this instead
>   - adding a character:   +2*A
>   - deleting a character: +2*D
>   - changing a character: +2*C, if the characters differ not just by
> 'decoration'
>   - changing a character: +C, if the characters differ *only in* the
> decoration
>
> That way changes like é to c will have double weight and changes like e
> to é will have only single weight. Thus the latter changes should be
> preferable compared to other changes and therefor the respective
> suggestions being higher up in the list of proposals.

I don't know if it's feasible, but we could also consider that adding or
removing a letter carries a weight of only 1 if a REP line says so.

Examples:
REP r rr
REP rr r

r --> rr     weight:1
rr --> r     weight:1
r --> f      weight:2 or more
fr --> f     weight:2 or more

Actually, what I want to know is why the MAP lines, which usually
describe which letters are similar (with diacritics or "decorations"),
and the REP lines (which usually describe common replacements) seem to
be ignored when the distance goes beyond 1.

IMHO, these lines offer a better way to find the correct spelling than
simply calculating the Levenshtein distance, however we calculate it.

I have a suggestion:
Maybe it would improve the spellchecker's suggestions if it first tried
to apply the MAP and REP rules, without calculating anything, and only
if it finds nothing, tried again with the Levenshtein distance.

Example:
_gommer_ and _fumer_ are both at a Levenshtein distance of 2 from
_fûmmer_ (wrong spelling), but Hunspell could find the correct spelling
_fumer_ just by substituting letters as described in the MAP and REP lines.
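
A rough Python sketch of this "MAP and REP first" idea, to show what I mean. Everything in it is an assumption made for illustration: the dictionary is a toy set, only a few of the REP lines are included, and the function names are invented. It does not describe what Hunspell does. The Levenshtein fallback is left out; the function simply returns an empty list when the rules find nothing.

```python
# Lowercase MAP groups from the affix file, plus a small subset of its REP lines.
MAP_GROUPS = ["aàâä", "eéèêë", "iîïy", "oôö", "uùûü", "cç"]
REP_RULES = [("mm", "m"), ("m", "mm"), ("rr", "r"), ("r", "rr")]

# Toy dictionary, for illustration only.
TOY_DICTIONARY = {"fumer", "gommer", "gemmer", "déterrer", "déférer", "détirer"}


def one_step(word):
    """All strings reachable by applying one MAP or one REP rule at one position."""
    results = set()
    for i, ch in enumerate(word):
        for group in MAP_GROUPS:
            if ch in group:
                for other in group:
                    if other != ch:
                        results.add(word[:i] + other + word[i + 1:])
    for old, new in REP_RULES:
        start = word.find(old)
        while start != -1:
            results.add(word[:start] + new + word[start + len(old):])
            start = word.find(old, start + 1)
    return results


def map_rep_suggestions(word, dictionary, max_steps=2):
    """Apply MAP/REP rules breadth-first; return the first dictionary hits found."""
    frontier = {word}
    seen = {word}
    for _ in range(max_steps):
        frontier = {w for f in frontier for w in one_step(f)} - seen
        seen |= frontier
        hits = sorted(frontier & dictionary)
        if hits:
            return hits
    return []  # a real implementation would fall back to a distance search here


print(map_rep_suggestions("fûmmer", TOY_DICTIONARY))
print(map_rep_suggestions("détérer", TOY_DICTIONARY))
```

With this toy data, fûmmer leads to fumer (û to u, then mm to m) and détérer leads to déterrer (é to e, then r to rr), while gommer, déférer and détirer are never reached because no MAP or REP rule produces them.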

Regards,
Olivier

Re: Hunspell: about suggesting the right spelling

Németh László-2
In reply to this post by Olivier R.-2
Hi,

2009/2/24 Olivier R. <[hidden email]>:
> Hi,
>
> I would like to understand how hunspell tries to suggest the right spelling.

It uses a mix of different suggestion algorithms (some of them are
dictionary-based).
The base TRY algorithm searches for all suggestions at a Levenshtein
distance of 1 from the misspelled word.

>
> Here is some examples of the strange behaviour we get:
>
>
> ***** example 1 *****
> _déterrer_ is the correct spelling of a verb ("to dig up" in English)
>
> a. If I write: _détérer_
> Hunspell suggests: déférer, détirer
>
> b. If I write: _détèrer_ (the second accent is different)
> Hunspell suggests: détirer, délétère, détourer, _déterrer_ and a lot of
> others words.
> The fourth word is the correct one.
>
> But why Hunspell is able to suggest it if I write _détèrer_, but is not able
> to do the same if I write détérer.

When the TRY algorithm finds a successful substitution, there
is no dictionary-based search.
The main reason is time efficiency; future versions of
Hunspell won't have this limitation.
In fact, the next Hunspell in OOo uses dictionary-based search despite
successful TRY suggestions when these TRY suggestions contain
only deletions and insertions:

$ ~/hunspell-1.1.12/src/tools/hunspell -d fr_FR
Hunspell 1.1.12
dééterrer
& dééterrer 1 0: déterrer

éterrer
& éterrer 2 0: terrer, déterrer

$ ~/hunspell-1.2.8/src/tools/hunspell -d fr_FR
Hunspell 1.2.8
dééterrer
& dééterrer 5 0: déterrer, déterreur, déterrement, déterrage, déterrée

éterrer
& éterrer 6 0: terrer, déterrer, déterreur, éternuer, éterniser, éternelle

>
> e, é and è are defined as similar characters with the line
> MAP eéèêë
>
> If I write _détêrer_, _déterrer_ is suggested at the third position.
> If I write _détërer_, _déterrer_ is suggested at the second position.
>
>
> ***** example 2 *****
> _fumer_ is the correct spelling of a verb ("to smoke" in English)
>
> If I write: _fûmmer_
> Hunspell suggests: gemmer, nommer, gommer, sommer, pommer, fermer, frimer,
> former, filmer, fûtier, enflammer, emmerdé, drummer, commerce, emmerde
>
> There is not one word close to the right one.
>
> It should be easy for Hunspell to suggest _fumer_ with the lines:
> MAP uùûü
> REP mm m
>
> But Hunspell believes that _gemmer_ is closer to _fûmmer_ than _fumer_.
> Why?

Unfortunately, MAP and REP data are not yet used by the dictionary-based
suggestion algorithm, so û is a completely different character for the
n-gram dictionary-based suggestion algorithm. Words with "mm" also
get higher n-gram values here than words with "me". The high n-gram
value of gemmer etc., and the equal word length and matching characters
at the same positions in fermer etc., win.
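
The effect can be illustrated with a deliberately simplified n-gram overlap score in Python. This is a sketch under assumptions, not the real scoring: it just counts, for n = 1 to 3, how many n-grams of the misspelled word occur anywhere in the candidate, and it ignores the length and position criteria mentioned above. The function name is invented for the example.

```python
def ngram_score(misspelled, candidate, max_n=3):
    """Count the n-grams (n = 1..max_n) of misspelled that occur in candidate."""
    score = 0
    for n in range(1, max_n + 1):
        for i in range(len(misspelled) - n + 1):
            if misspelled[i:i + n] in candidate:
                score += 1
    return score


for candidate in ("gemmer", "fumer"):
    print(candidate, ngram_score("fûmmer", candidate))
```

Here gemmer scores 9 and fumer scores 8: û matches nothing in either word, and the "mm" and "mme" n-grams count only for gemmer.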

Using PHONE could help here, but the PHONE algorithm doesn't support
accented characters in the current Hunspell version. I hope this will
be fixed within a few months. Also

Best regards,
László
