hunspell dictionary extension by Google


hunspell dictionary extension by Google

Daniel Naber-9
Hi,

Google uses hunspell in its Chrome browser. I happened to notice that they
are trying to extend the dictionaries with new words. These extensions
might be interesting for the original authors. Look for ".dic_delta" files
here:

http://src.chromium.org/viewvc/chrome/trunk/src/chrome/third_party/hunspell/dictionaries/

More information:
http://blog.chromium.org/2009/02/spell-check-dictionary-improvements.html

Regards
 Daniel

--
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: hunspell dictionary extension by Google

ge-7
There are 80470 German words there.
However, when I download them, all non-ASCII characters
are replaced by the same character,
both on screen and in the downloaded file.

Has anyone managed to download them properly?

thanks, eleonora



Re: hunspell dictionary extension by Google

Daniel Naber-9
In reply to this post by Daniel Naber-9
On Sunday 15 February 2009, ge wrote:

> Anyone had success to get them properly downloaded?

I can download them using the "(download)" link. Some files (e.g.
de_DE.dic) are in Latin-1; after recoding them to UTF-8 they look okay.
I'm not sure it makes sense to look at the .dic files, though: all changes
are supposed to be in the _delta files.
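
The recoding step mentioned above could be sketched roughly like this in
Python. This is only an illustration: the function name is made up, and it
assumes the input really is ISO-8859-1 (Latin-1). If the matching .aff file
declares its encoding with a SET line, that line would presumably need to
be changed to UTF-8 as well.

```python
from pathlib import Path


def recode_latin1_to_utf8(src: Path, dst: Path) -> None:
    """Rewrite a Latin-1 encoded file as UTF-8 (hypothetical helper).

    Works on raw bytes so that line endings are left untouched.
    """
    text = src.read_bytes().decode("latin-1")
    dst.write_bytes(text.encode("utf-8"))
```

Usage would be something like
recode_latin1_to_utf8(Path("de_DE.dic"), Path("de_DE.utf8.dic")),
where the output file name is just an example.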

Regards
 Daniel

--
http://www.danielnaber.de


Re: hunspell dictionary extension by Google

ge-7
In reply to this post by Daniel Naber-9
Thanks, now I could download them properly.

-eleonora



Re: hunspell dictionary extension by Google

Marcin Miłkowski
In reply to this post by Daniel Naber-9
Hi,

A note for other dictionary developers: for some languages they seem to
be using outdated versions. For Polish, the dictionary is indeed very old
(it is missing some 40 thousand entries from our current release, so the
delta files are mostly useless). You might want to ping them to include
newer versions.

Regards
Marcin



Re: hunspell dictionary extension by Google

thomas.lange
In reply to this post by Daniel Naber-9

Hi,

Daniel Naber wrote:

> Hi,
>
> Google uses hunspell in its Chrome browser. I coincidentally found they try
> to extend the dictionary with new words. These extensions might be
> interesting for the original authors. Look for ".dic_delta" files here:
>
> http://src.chromium.org/viewvc/chrome/trunk/src/chrome/third_party/hunspell/dictionaries/
>
> More information:
> http://blog.chromium.org/2009/02/spell-check-dictionary-improvements.html

Looking at two of the delta files, it seems they are simply a list
of add-on entries for hunspell. Thus even Chrome does not seem to have
the ability to remove or fix misspelled words in the dictionary
content... (unless they get removed by a patch)
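
To illustrate why such a delta can only add words: a minimal Python
sketch, assuming (I have not verified this) that a .dic_delta file is a
bare list of entries, one per line, while a regular hunspell .dic file
starts with a word-count line. The function name is hypothetical.

```python
def merge_delta(dic_lines: list[str], delta_lines: list[str]) -> list[str]:
    """Append delta entries to .dic lines and refresh the count header.

    Assumes dic_lines[0] is the hunspell word-count line and that the
    delta is a plain list of entries (an assumption about .dic_delta).
    """
    entries = [line.strip() for line in dic_lines[1:] if line.strip()]
    seen = set(entries)
    for line in delta_lines:
        entry = line.strip()
        if entry and entry not in seen:
            entries.append(entry)
            seen.add(entry)
    # Nothing is ever removed: a wrong entry in the base list stays in.
    return [str(len(entries)), *entries]
```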

As for the .bdic file, I have no idea about that one. ^^°


I still like the old idea to replace the OOo user-dictionaries by
hunspell dictionaries (or at least a format that hunspell can read).
But that would require hunspell to read 'negative dictionaries' (often
called exception dictionaries) and to provide user-supplied suggestions
for those entries. If we then can also have means for a 'Language All'
dictionary then we could replace the user-dictionaries by hunspell
compatible ones, and that would be a nice thing to do I believe.

Does someone feel tempted? ^_-


Thomas



Re: hunspell dictionary extension by Google

ge-7
> Looking at two of the delta files it seems like they are simply a list
> off add-on entries to hunspell. Thus even chrome does not seem to have
> the ability to remove/fix misspelled words from the dictionary
> content... (unless they get removed by a patch)

Removing misspelled words must always remain under the
control of the dictionary maintainer, for obvious reasons.
The same goes for adding missing words.

> As for the .bdic file, I have no idea about that one. ^^°

I assume it is some internal Google format, like aspell's internal
format. We can safely ignore it.

> I still like the old idea to replace the OOo user-dictionaries by
> hunspell dictionaries (or at least a format that hunspell can read).
> But that would require hunspell to read 'negative dictionaries' (often
> called exception dictionaries) and to provide user-supplied suggestions
> for those entries.

Hunspell does a good job at suggestions. (I personally never
use them, but others who do say so.) Hunspell
does not need any "bad word list" to produce suggestions;
it relies on the replacement patterns in the affix file.
Could you please give an example, so I can understand what you mean here?

>If we then can also have means for a 'Language All'
> dictionary then we could replace the user-dictionaries by hunspell
> compatible ones, and that would be a nice thing to do I believe.

Please explain what you mean by a "language all" dictionary,
ideally with some examples.

Thanks, eleonora

Re: hunspell dictionary extension by Google

thomas.lange
In reply to this post by Daniel Naber-9

Hi,

[hidden email] wrote:

>> Looking at two of the delta files it seems like they are simply a list
>> off add-on entries to hunspell. Thus even chrome does not seem to have
>> the ability to remove/fix misspelled words from the dictionary
>> content... (unless they get removed by a patch)
>
> Removing misspelled words must always remain under
> control of dictionary maintainer for obvious reasons.
> And also the adding of missing words.
>
>> As for the .bdic file, I have no idea about that one. ^^°
>
> I assume, some internal google format, like aspell's internal
> format. We can safely ignore that.
>
>> I still like the old idea to replace the OOo user-dictionaries by
>> hunspell dictionaries (or at least a format that hunspell can read).
>> But that would require hunspell to read 'negative dictionaries' (often
>> called exception dictionaries) and to provide user-supplied suggestions
>> for those entries.
>
> Hunspell does a good job at suggestions. (I personally never
> use them, but others, who use them, state the above.) Hunspell
> does not need any "bad word list" to list the suggestions,
> it relies on the replacement patterns in the affix file.
> Could you please give an example, to understand what you mean here?


Hunspell may not need that, but users do. ^_-

There are two basic usages:


- First: exception dictionaries usually consist of correct words that
you don't want to use in your text or context for some reason.

a) Consider writing a fairy tale for children: there are a lot of words
in regular English that you don't want to appear in it. (For that we
might be better off with an English child-safe dictionary, but it could
also be done with a larger exception dictionary.)

b) You (or your company) may have a list of words that you are not to
use in your public documents.
Or, of two possible and valid choices, you may want to use only one.
For example in German, according to the latest spelling reform, we can
write dolphin as either 'Delfin' or 'Delphin'. Both are valid, but you
don't want both to appear in a single text. One way to solve this is to
declare one of them as an exception (and to provide the other as a
suggestion).
Those words can then be added to an exception dictionary, and henceforth
the spell checker should complain about them.
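
As a rough sketch of this first usage in Python: the names below are made
up for illustration and this is not how OOo or hunspell implements it. The
base dictionary is modelled as a plain set of words and the exception
dictionary as a mapping from an unwanted word to its preferred
replacements.

```python
def check_with_exceptions(word, base_words, exceptions):
    """Return (accepted, replacements) for a word.

    base_words: set of words the ordinary dictionary accepts.
    exceptions: dict mapping an unwanted word to preferred replacements.
    """
    if word in exceptions:
        # Rejected even though the base dictionary considers it correct.
        return False, list(exceptions[word])
    return word in base_words, []
```

With base_words = {"Delfin", "Delphin"} and
exceptions = {"Delphin": ["Delfin"]}, 'Delfin' is accepted while
'Delphin' is flagged with 'Delfin' offered as the replacement.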


- Second: It allows the user to customize the spelling suggestions.

If, for example, you tend to make the typo 'rigth', you could add that
word to an exception dictionary with a single suggestion ('right'). One
would then expect the spell checker to return only that one (and none
from its dictionary base), or at least to put that word at the top of
the suggestion list.

And of course you should be allowed to provide more than one suggestion
(OOo currently does not allow that, though). Again, the list should
either replace the list returned by hunspell, or hunspell should put it
at the top of the words it has found itself.
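
The two behaviours described here (replace the engine's list, or put the
user's suggestions on top) could look roughly like this in Python. The
function name and the 'replace' switch are hypothetical, and the engine's
suggestions are simply passed in as a list.

```python
def merge_suggestions(user, engine, replace=False):
    """Combine user-supplied suggestions with the spell checker's own.

    replace=True returns only the user's list (when there is one);
    otherwise user entries come first, followed by the engine's
    remaining suggestions without duplicates.
    """
    if replace and user:
        return list(user)
    merged = list(user)
    for suggestion in engine:
        if suggestion not in merged:
            merged.append(suggestion)
    return merged
```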


>>If we then can also have means for a 'Language All'
>> dictionary then we could replace the user-dictionaries by hunspell
>> compatible ones, and that would be a nice thing to do I believe.
>
> Please explain, what do you mean with "language all" dictionary,
> best with some examples.

A 'Language All' dictionary would be a list of words that are correct
as written in ALL languages (usually because they don't get translated).
Common examples are names of people or companies.
E.g.
  OpenOffice.org
  ASCII
  HTML
  Thomas
  Alva
  Edison
If you are writing multilingual documents, or if you have a server
installation with a number of multilingual users, you can add all those
words that are spelled the same regardless of the text's language to
a single dictionary instead of creating a dictionary for each of those
languages.
Then, for every language and word, the spell checker always has to
look in the 'Language All' dictionaries as well before
declaring a word misspelled.
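
In code, the lookup order described above might be sketched like this
(Python, with made-up names; dictionaries are modelled as plain sets of
words and inflection is ignored completely):

```python
def is_known(word, lang, per_language, all_languages):
    """Accept a word found in the language's own list or the shared list.

    per_language: dict mapping a language code to its set of words.
    all_languages: set of words valid regardless of language.
    """
    return word in per_language.get(lang, set()) or word in all_languages
```

For example, with per_language = {"de": {"Haus"}} and
all_languages = {"HTML", "Edison"}, 'HTML' is accepted for any language
code, while 'Haus' is accepted only for "de".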


Regards,
Thomas



Re: hunspell dictionary extension by Google

ge-7
Thomas,

Thanks for the very clear explanations.

> - First: exception dictionaries usually consist of correct words that
> you don't like to use in your text or context for some reason.
>
> a) please consider writing a fairy tale for children to read, there are
> a lot of words in regular English that you don't want to appear in
> there. (Though for that we may better have an English-Child-Safe
> dictionary). But it could also be done by a larger exception dictionary.
>
> b) You (or your company) may have a list of words that you are not to
> use in your public documents.
> Or maybe of two possible and valid choices you still want to use only
> one. For example in German, according to the latest spelling reform, we
> can either write dolphin as 'Delfin' or 'Delphin' both are valid, but
> you don't want them both to appear in a single text. One way to solve
> this is to declare one of them as an exception (and to provide the other
> as suggestion).
> Those words can then be added to an exception dictionary and hence forth
> the spell checker should complain about them.

I think that could be solved if hunspell were able to read
more than one dictionary when the speller class is initialized
or, even better, at run time. Then arbitrary user
dictionaries could be enabled that either prevent certain words
from being accepted or add certain words as correct.
It would then be up to the dictionary provider's imagination
what to add.

I do not know how László sees this; he might have some comments
about it.
 

> - Second: It allows the user to customize the spelling suggestions.
>
> If for example you tend to make the typo 'rigth' then you could add that
> word to an exception dictionary and by providing only a single
> suggestion ('right') one would expect the spell checker to return onyl
> that one (and none from it's dictionary base) or at least to put that
> single word at the top of the suggestion list.
>
> And of course you should be allowed to make more than one suggestion
> (OOo currently does not allow for that though), and again the list
> should replace the list returned by hunspell or hunspell should add that
> word list at the top of the words itself has found.

Understood, but I have no idea here. László knows this very well; he
might want to comment on this as well.

> >>If we then can also have means for a 'Language All'
> >> dictionary then we could replace the user-dictionaries by hunspell
> >> compatible ones, and that would be a nice thing to do I believe.
> >
> > Please explain, what do you mean with "language all" dictionary,
> > best with some examples.
>
> A 'Language All' dictionary will be a list of words that are correct
> that way in ALL languages (usually because they won't get translated).
> Common examples are peoples or company names.
> E.g.
>   OpenOffice.org
>   ASCII
>   HTML
>   Thomas
>   Alva
>   Edison
> If you are writing multilingual documents or if you have a server
> installation with a number of multi lingual users, you can add all those
> words that would be spelled the same regardless of the texts language in
> a single dictionary instead of creating a dictionary for each of those
> languages.
> And then, for every language and word the spell checker has always to
> look up into those dictionaries of 'Language All' as well before
> deciding to declare a word as misspelled.

Yes, that is also a nice suggestion, and could be added to
the first request, since an additional dictionary would solve it.

Please consider, however, that even German inflects
words, so for example Edison should also be recognized
as Edisons in German.

For Hungarian (or Turkish, Finnish, Estonian, Basque, Persian, etc.)
the situation is more acute, because Edison has roughly
2500 derived forms in Hungarian. So if Edison needs to be recognized
as a correct word in Hungarian, it is far more productive to add
that word to the Hungarian .dic list with the proper affix flags.
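
To make the affix point concrete, here is a much simplified Python sketch
of how a flagged .dic entry expands into inflected forms. The flag 'S' and
its rule are invented for this example and are not taken from any real
affix file; real hunspell rules have richer conditions than the plain
"ends with" test used here.

```python
def expand(entry, suffix_rules):
    """Expand 'stem/FLAGS' into the stem plus every form its flags allow.

    suffix_rules maps a flag character to (strip, add, condition) tuples;
    condition is a required word ending ('' means the rule always applies).
    """
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for flag in flags:
        for strip, add, condition in suffix_rules.get(flag, []):
            if not stem.endswith(condition):
                continue
            if strip and not stem.endswith(strip):
                continue
            base = stem[: len(stem) - len(strip)] if strip else stem
            forms.append(base + add)
    return forms
```

With suffix_rules = {"S": [("", "s", "")]}, the entry "Edison/S" covers
both 'Edison' and 'Edisons', whereas a bare "Edison" in a shared list
covers only the one form.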

Also, for historical reasons, some German or Danish cities have
names in Hungarian that differ from the German or Danish
ones. Therefore a list of German city names is not usable
in Hungarian.

Regards: eleonora

Re: hunspell dictionary extension by Google

Németh László-2
Hi,

I just noticed the new user dictionary UI of OpenOffice.org 3. I
have an old issue about solving the user dictionary problems
(http://qa.openoffice.org/issues/show_bug.cgi?id=61525), so I'm
interested in it. Hunspell will be able to handle the exception
format and its suggestions, and the Language All option too. I will
write about the tasks and their possible solutions in Hunspell.

Regards,
László



Re: hunspell dictionary extension by Google

R.J. Baars
In reply to this post by Daniel Naber-9
Laszlo, all,

The use of the 'all' dictionary will require validation by all language
teams. And, as mentioned before, the inflections could be quite different.

We might need a way to validate these words for all languages, or a way
for each language to choose whether to adopt the 'all' dictionary.

Limiting it to proper names only will reduce complexity.
But still, how will we check the validity of the words for all languages?
Or will we add a 'valid for' flag to each word, specifying the language(s)
it is for?
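
A 'valid for' flag of that kind could be modelled roughly as follows
(Python sketch; the data layout and names are hypothetical, and the sample
entries are only for illustration):

```python
def is_valid_for(word, lang, shared_entries):
    """Check a word against a shared list with per-word language sets.

    shared_entries maps a word to the set of language codes it has been
    validated for; None means 'validated for every language'.
    """
    if word not in shared_entries:
        return False
    langs = shared_entries[word]
    return langs is None or lang in langs
```

For instance, with shared_entries = {"HTML": None, "London": {"en", "de"}},
'London' would be accepted in English and German text but not in a
language whose team has not adopted it.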






Re: hunspell dictionary extension by Google

Olivier R.-2
Hi,

I think the "language all" dictionary will be a nest of issues. :)

In French, we often rename foreign proper names.
You can't even be sure that a company name will be the same.
Vauxhall cars in the UK are called Opel in France.

Examples:

London               --> Londres
Frankfurt            --> Francfort
Edinburgh            --> Édimbourg
AIDS                 --> SIDA
Vauxhall             --> Opel
Christopher Columbus --> Christophe Colomb

Unlike some other languages, French has no inflections for
proper names.
For example, there is no 's' at the end of the plural form.
The Edisons          --> les Edison
So, if someone writes "les Edisons", the mistake won't be recognized
because of this "language all" dictionary, if it allows inflections of
proper names.

This "all language" dictionary would probably interfere with other
words, and the spellchecker would suggest a wrong spelling for a word
similar to those in the "all language" dictionary.


I am also afraid it will be a dictionary centered on English-American
culture.

In the few examples given, there is already one entry that I cannot
identify.

What/who is Alva? A skateboarder?


Regards,
Olivier


Re: hunspell dictionary extension by Google

Harold Fuchs-6
On 17/02/2009 18:04, Olivier R. wrote:

> I am also afraid it will be an english-american-centered culture
> dictionary.

Most true English people would also object strongly to a "shared"
American/English dictionary. We don't have "centers" or "theaters" or
"colors" and the metal is not "aluminum". We have "centres", "theatres",
"colours" and "aluminium". We don't go "traveling" but "travelling".
Before I was born I wasn't a "fetus" but a "foetus". The American
versions are just plain wrong in English, not just "acceptable
variants". Google finds many lists of differences. As a first example,
go to http://www2.gsu.edu/~wwwesl/egw/jones/differences.htm

Canadians, Australians and many other "English speaking" countries have
their own variations or mixtures. Hence separate Canadian English and
Australian English dictionaries for OOo, to mention just two.

I don't know who dreamed up this "all language" dictionary but I think
it's one of the most facile, imbecilic ideas I've heard in a very long
time. Have the developers nothing better to do with their time?

--
Harold Fuchs
London, England
Please reply *only* to [hidden email]




Re: hunspell dictionary extension by Google

ge-7
In reply to this post by Daniel Naber-9
>>
I don't know who dreamed up this "all language" dictionary but I think
it's one of the most facile, imbecilic ideas I've heard in a very long
time. Have the developers nothing better to do with their time?
<<

Strongly agree!

-eleonora




Re: hunspell dictionary extension by Google

Németh László-2
In reply to this post by Daniel Naber-9
2009/2/17 ge <[hidden email]>:
>>>
> I don't know who dreamed up this "all language" dictionary but I think
> it's one of the most facile, imbecilic ideas I've heard in a very long
> time. Have the developers nothing better to do with their time?
> <<
>
> Strongly agree!

Hi,

I'm afraid there is a misunderstanding here. The "All language" user
dictionary is an existing feature of OpenOffice.org (see [All] in
Tools→Options→Language Settings→Writing Aids→User-defined
dictionaries), so this is a backward-compatibility issue.

Regards,
László



Re: hunspell dictionary extension by Google

Harold Fuchs-6
On 17/02/2009 20:57, Németh László wrote:

> 2009/2/17 ge <[hidden email]>:
>  
>> I don't know who dreamed up this "all language" dictionary but I think
>> it's one of the most facile, imbecilic ideas I've heard in a very long
>> time. Have the developers nothing better to do with their time?
>> <<
>>
>> Strongly agree!
>>    
>
> Hi,
>
> I'm afraid, there is a misunderstanding here. "All language" user
> dictionary is an existing feature of Openoffice.org (see [All] in the
> Tools→Options→Language Settings→Writing Aids→User-defined
> dictionaries), so this is a back compatibility issue.
>
> Regards,
> László
>
>  
<snip>

In that case please explain (a) the usage/purpose of these dictionaries
and (b) how you think it is that the "misunderstanding" arose. For (a)
please give useful examples.

--
Harold Fuchs
London, England
Please reply *only* to [hidden email]


Re: hunspell dictionary extension by Google

ge-7
> In that case please explain (a) the usage/purpose of these dictionaries
> and (b) how you think it is that the "misunderstanding" arose. For (a)
> please give useful examples.

I do not think there is any "misunderstanding". In my OpenOffice, there are several general-purpose dictionaries: sun, soffice and debian. They contain words like Adabas, Solaris and the like. These dictionaries cannot be edited. Words in these dictionaries will not be flagged as wrong in a text; however, no endings (inflections) are accepted, so their usefulness is very limited (for example, eSun is OK, eSuns is not).
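
A minimal sketch of the behaviour described above, assuming these vendor dictionaries act as flat word lists with exact-match lookup only. The set contents and the function name `is_accepted` are illustrative assumptions, not OpenOffice.org's actual code:

```python
# Hypothetical sketch: a flat word list with exact-match lookup only.
# The entries are sample words mentioned in this thread.
VENDOR_WORDS = {"Adabas", "Solaris", "eSun"}


def is_accepted(word, word_list=VENDOR_WORDS):
    """Accept a word only if it appears verbatim in the list.

    No affix rules are applied, so inflected forms are rejected.
    """
    return word in word_list


print(is_accepted("eSun"))   # listed verbatim
print(is_accepted("eSuns"))  # inflected form, not listed
```

With a list like this, every inflected form would have to be added as a separate entry, which is why the lack of endings limits the usefulness.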

There is also an ignore list, which can be edited. I assume its purpose is to hold words that you do not want to appear in the list of suggestions (not sure).

Personally, I do not think these dictionaries are very useful, and I stand by your previous comment.

-eleonora

Re: hunspell dictionary extension by Google

Németh László-2
Hi,

The most useful task of the user-defined "All" dictionary is to ensure
correct spell checking and hyphenation of the users' personal names and
other proper names.

You are right, there is a tendency among vendors to add
language-dependent, unused or questionable words to all languages, but
this is not related to the planned new implementation of the
user-defined dictionaries (fixing the problems with suggestions and
the missing capitalization, affixation and multi-word expression
support). Please report dictionary problems to the vendors: for
example, the word "deinstall" in the vendor-defined dictionary
"soffice" and "daughterboard" in "sun" are not international
words. I have filed an issue about these problems:
http://www.openoffice.org/issues/show_bug.cgi?id=99359. Thomas Lange
and I wrote about the possible new implementation of user-defined
dictionary handling, changing the subject, because we need better
user-defined dictionary support, too. I think all problematic words
can be removed from the vendor-defined dictionaries. If all else
fails, you can simply switch off the vendor-defined
dictionaries in the Options. I'm sorry about this misunderstanding.
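
As an illustration of what affixation support in a user-defined dictionary could look like, here is a minimal sketch in which a user entry borrows the endings of a model word. All names and sample data are hypothetical. This is not hunspell's or OpenOffice.org's implementation, only one possible reading of "affixation support":

```python
# Hypothetical sketch: user entries may name a model word whose endings they borrow.
# Assumed sample data: endings that the model word "Sun" is allowed to take.
MODEL_SUFFIXES = {"Sun": ["s", "'s"]}

# User entries: word -> model word (None means exact match only).
USER_ENTRIES = {"eSun": "Sun", "Adabas": None}


def accepted_forms(word, model):
    """Return the set of spellings accepted for one user entry."""
    forms = {word}
    if model is not None:
        forms.update(word + suffix for suffix in MODEL_SUFFIXES.get(model, []))
    return forms


def is_accepted(token):
    """Accept a token if any user entry generates it."""
    return any(
        token in accepted_forms(word, model)
        for word, model in USER_ENTRIES.items()
    )


print(is_accepted("eSuns"))    # accepted via the endings of the model word
print(is_accepted("Adabass"))  # no model word, so only "Adabas" itself is accepted
```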

Regards,
László




Re: hunspell dictionary extension by Google

thomas.lange
In reply to this post by Daniel Naber-9

Hi all,

R.J. Baars wrote:

> The use of the 'all'-dictionary will require validation by all language
> teams.  And, as mentioned before, the flexes could be quite different.

You got me wrong!
I don't want anyone to provide a 'language-all' dictionary by default.

What I would like is for hunspell to support them. If that were the case
AND if user-dictionaries were compatible with hunspell, THEN the user
himself could add words/names to some very personal 'language-all'
user-dictionary. That's what my concern is about.


A 'language-all' dictionary provided by default is a little bit
questionable since I don't really know, for example, how English
names are written in Chinese or Japanese.
Probably one of two things will happen there:

1) The name is written as in English if the readers can understand that
2) The name will be written in Chinese or Japanese characters in a way
that, when read, they will sound like the name does when read in English.

Only in the first case will it be OK to have names already supplied in a
'default' dictionary. Thus the choice should probably be on the user's
side. And that could best be done if it is he who makes those entries.

Also the reverse approach, having Chinese names romanized, won't work
well with a 'language-all' dictionary since there is more than one
method to romanize Chinese or Japanese names. Here also the best choice
will probably be to leave that to the user.


Regards,
Thomas



Re: hunspell dictionary extension by Google

thomas.lange
In reply to this post by Daniel Naber-9

Hi eleonora,

> I don't know who dreamed up this "all language" dictionary but

I don't know that myself. ^^

> I think
> it's one of the most facile, imbecilic ideas I've heard in a very long
> time. Have the developers nothing better to do with their time?


Not exactly (remember that it is only about user-dictionaries, not about
default hunspell dictionaries). For example, if you have a look in the
user-dictionary named 'sun', you will find the following words:
- Adabas
- OpenOffice.org
- Webcast
...
It is a very convenient way to make sure that e.g. 'OpenOffice.org' will
always be regarded as correct even though it may not be part of any
hunspell dictionary.
'Language-all' dictionaries do make sense as a way to customize the existing
hunspell dictionaries. They get used/created either by the provider
(here as part of OOo) or by the user himself.
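
The layering described here can be sketched as follows. This is a minimal illustration, assuming a per-language word set plus one 'language-all' user word set that is consulted for every language. The data and the names `LANGUAGE_WORDS`, `ALL_LANGUAGES_USER_WORDS` and `is_correct` are made up for the example and do not reflect the actual OOo or hunspell code:

```python
# Hypothetical sketch: a 'language-all' user dictionary layered over
# language-specific dictionaries. All data below is sample data.
LANGUAGE_WORDS = {
    "en-GB": {"colour", "centre"},
    "de-DE": {"Farbe", "Zentrum"},
}

# Words that should be accepted regardless of the document language.
ALL_LANGUAGES_USER_WORDS = {"Adabas", "OpenOffice.org", "Webcast"}


def is_correct(word, language):
    """Check the language dictionary first, then the 'language-all' list."""
    in_language = word in LANGUAGE_WORDS.get(language, set())
    return in_language or word in ALL_LANGUAGES_USER_WORDS


print(is_correct("OpenOffice.org", "en-GB"))  # accepted via the 'all' list
print(is_correct("OpenOffice.org", "de-DE"))  # accepted via the 'all' list
print(is_correct("colour", "de-DE"))          # only listed for en-GB
```

The point of the sketch is that the 'language-all' list only adds words on top of each language dictionary and never replaces it.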

On the other hand, as already mentioned elsewhere, I don't think they
make much sense as a regular dictionary source for the spell checker. They
should be maintained only by the provider or user.

And the only reason I am asking for hunspell support for them is to
allow an implementation change of the user-dictionary format to one
directly readable by hunspell. This is purely so as not to lose
functionality when changing the format.


Thomas
