Slow dictionary load

Slow dictionary load

Alan Yaniger
Hi list-members,

Using DicOO, I installed the Hebrew dictionaries on OOo 2.2.

They take a long time to load when I have autospellchecking on. If I
open OOo and begin typing in Hebrew, I get stuck for about 15 seconds
while the dictionaries load. I'm using a Compaq Presario 900 running
Windows XP. The English dictionaries are also installed, and if I close
the Quickstarter, open OOo and type in English, there is no delay at all.

Many users have complained about this. How might I improve the loading
of the dictionaries? Is the problem in the way the dictionaries were
created? Is it in the OOo code? If so, where should I look?

Thanks,
Alan

Re: Slow dictionary load

Daniel Naber
On Friday 27 April 2007 10:18, Alan Yaniger wrote:

> Is the problem in the way the dictionaries were
> created?

I suggest you download hunspell and use it to check a very small text. This
way you can see if the problem is in OOo or in the spellchecker component
(hunspell). You could also compare with myspell. As hunspell has more
features than myspell, it might be slower.
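
If you want to separate the load time from the lookup time, a small C++
program along these lines will do it (an untested sketch; it assumes
hunspell's C++ API -- Hunspell(affpath, dicpath) and spell() -- whose
exact signatures vary a bit between releases, and the he_IL file names
are just examples):

    // build with something like: g++ timing.cpp -lhunspell
    // (the exact library name varies by version/distribution)
    #include <chrono>
    #include <iostream>
    #include <hunspell/hunspell.hxx>

    int main() {
        using clk = std::chrono::steady_clock;

        clk::time_point t0 = clk::now();
        Hunspell h("he_IL.aff", "he_IL.dic");  // dictionary load + hash build
        clk::time_point t1 = clk::now();
        bool ok = h.spell("test");             // a single lookup
        clk::time_point t2 = clk::now();

        std::cout << "load:  "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count()
                  << " ms\n"
                  << "check: "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count()
                  << " ms (spell=" << ok << ")\n";
    }

If the "load" number dominates, the time goes into building the
dictionary, not into checking words.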

Regards
 Daniel

--
http://www.danielnaber.de

Re: Slow dictionary load

Jancs
In reply to this post by Alan Yaniger
Quoting Alan Yaniger <[hidden email]>:

> They take a long time to load when I have autospellchecking on. If I
> open OOo, and begin typing in Hebrew, I get stuck for about 15 seconds

It may also happen due to the way the dictionary is built. Latvian had
the same behaviour in its first versions...

Janis
***

Re: Slow dictionary load

Alan Yaniger
Hi Janis,

What did you change to overcome the problem in Latvian?

Thanks,
Alan

Re: Slow dictionary load

Alan Yaniger
In reply to this post by Daniel Naber-9
Hi Daniel,

Thanks for your reply. I downloaded Hunspell and checked a very small
text with the Hebrew dictionaries. There was a considerable delay before
hunspell exited. When I checked the same Hebrew text, or a similarly
small English text, using the English dictionaries, hunspell exited
immediately.
Could the size of the dictionaries be the reason for the delay? Here are
the sizes of the Hebrew and English dictionaries:

    386,182  he_IL.aff
  3,103,184  he_IL.dic

      3,045  en_US.aff
    696,131  en_US.dic

Alan

Re: Slow dictionary load

Jancs
In reply to this post by Alan Yaniger
Quoting Alan Yaniger <[hidden email]>:

> What did you change to overcome the problem in Latvian?

Actually, not very much regarding the affixes. The heavy work was
filtering out non-existent but "spellchecker-correct" words that I had
used as roots. As a result, the unmunched dictionary shrank from about
230 MB to ~70 MB, and the loading time decreased accordingly: from "why
so long?" to not noticeable compared to the load time of OOo itself.

Janis
***

Re: Slow dictionary load

Marcin Miłkowski
In reply to this post by Alan Yaniger
Hi Alan,

I don't think that's the reason. The Polish dictionary file is about
4 MB and it loads fast (the affix file, however, is about 200 KB). Check
it yourself. Note that it's not UTF-8 but ISO-8859-2. Maybe UTF-8 makes
loading slower?

Regards,
Marcin


Re: Slow dictionary load

Alan Yaniger
Hi Marcin,

The Hebrew dictionary isn't UTF-8 either, but ISO-8859-8. Could the
difference in the size of the affix files make that much of a difference?

Alan

Re: Slow dictionary load

Alan Yaniger
In reply to this post by Marcin Miłkowski
Hi Marcin, Janis, Eleonora,

I did some debugging in the hunspell code, and found that the size of
the Hebrew dictionaries was the cause of the delay, similar to Janis's
problem in Latvian. The files are read line by line, and he_IL.dic has
329,326 entries, which is far more than the other dictionaries I tried.
The main bottleneck was not in reading the files from the disk, but in
building the hash tables in hashmgr.cxx in add_word(). When I shortened
he_IL.dic to the size of the Hungarian dictionary, it took the same
amount of time to load Hebrew and Hungarian. The same held for Hebrew
and English US.
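
For anyone following along, the insertion pattern in add_word() is
roughly this (a simplified sketch written from memory, with made-up
names -- not the actual hunspell source):

    #include <cstdlib>
    #include <cstring>

    struct hentry {
        char*   word;
        hentry* next;                  // collision chain
    };

    struct HashTable {
        hentry** slots;
        unsigned tablesize;

        explicit HashTable(unsigned n)
            : slots((hentry**)calloc(n, sizeof(hentry*))), tablesize(n) {}

        unsigned hash(const char* w) const {
            unsigned h = 0;
            while (*w) h = h * 31 + (unsigned char)*w++;
            return h % tablesize;
        }

        // Called once per .dic line: allocate an entry, then walk the
        // collision chain to its end. With an undersized table the
        // chains grow long and loading degenerates toward O(n^2).
        void add_word(const char* w) {
            hentry* e = new hentry{strdup(w), nullptr};
            hentry** p = &slots[hash(w)];
            while (*p) p = &(*p)->next;
            *p = e;
        }
    };

    int main() {
        HashTable t(329326);           // sized from the .dic header count
        t.add_word("example");
        t.add_word("entries");
    }

Even with a correctly sized table, doing this 329,326 times adds up,
which is consistent with what I measured.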

To Hunspell developers out there: is there any way to make the building
of the hash tables more efficient?

Alan

Re: Slow dictionary load

Laurent Godard
Hi

> To Hunspell developers out there: is there any way to make the building
> of the hash tables more efficient?
>

I may be experiencing the same problem on a project I'm working on. The
Gascon file is 6.5 million lines before the affix file is applied, and
I'm not sure I'll be able to reduce it by a factor of 20.

Laurent


--
Laurent Godard <[hidden email]> - Ingénierie OpenOffice.org -
http://www.indesko.com
Nuxeo Enterprise Content Management >> http://www.nuxeo.com -
http://www.nuxeo.org
Livre "Programmation OpenOffice.org", Eyrolles 2004-2006

Re: Slow dictionary load

ge
In reply to this post by Alan Yaniger
Alan,

The size of the second Hungarian dictionary is:

   lines    words    characters
   22068   124931   622546 hu_HU.aff
  873355   873348 26481165 hu_HU.dic
  895423   998279 27103711 total

The .dic contains 873378 words; by character count it is about 8 times
larger than the Hebrew one, and the .aff is roughly twice as big as the
Hebrew one.

I assume you used the first Hungarian dictionary, the one with the
small word count, for your test.

I use the second one all the time, and it loads in less than one second
for me, so I do not understand the effect you describe.
-eleonora


Re: Slow dictionary load

Alan Yaniger
Eleonora,

Yes, I used a different dictionary from yours. The hu_HU.dic I used has
96,461 lines. Apparently the Hungarian dictionary available through
DicOO isn't the latest.

Perhaps your hardware is faster than mine. On my slower(?) hardware, I
see a significant difference between building the hash table for large
dictionaries and for smaller ones. Many users have complained about OOo
"getting stuck" while the dictionaries load, so I think it would be
useful if the Hunspell developers could improve performance here.

Alan

Re: Slow dictionary load

Kevin B. Hendricks
Hi,

Just a thought ... did you remember to place a count of the number of
entries as the first line of the *.dic file? Without that count, a hash
table of the wrong size is created, and time is then spent walking the
hash-table chains instead of finding an empty (or nearly empty) slot.

If you do not put the count on the first line, you will end up with
very slow hash-building times.
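
If the count is missing, a throwaway program like this will add one
(a sketch only; it assumes the input really has no count line yet, and
the file names are placeholders):

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <vector>

    int main() {
        std::ifstream in("he_IL.dic");     // input *without* a count line
        std::vector<std::string> lines;
        std::string line;
        while (std::getline(in, line))
            lines.push_back(line);

        std::ofstream out("he_IL.fixed.dic");
        out << lines.size() << '\n';       // the entry count goes first
        for (const std::string& l : lines)
            out << l << '\n';
        std::cout << "wrote " << lines.size() << " entries\n";
    }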

Kevin

Re: Slow dictionary load

Kevin B. Hendricks
In reply to this post by Alan Yaniger
Hi Alan,

If you did place the count on the top line (so a properly sized hash
table is created), then perhaps the only potential speedup is to change
hunspell to mmap a file containing the previously created hash table,
similar to what ispell uses.

The only real problem is that all binary formats like that have endian
issues across architectures, which makes things quite difficult. That
is why, with myspell, I decided to build the hash table on the fly, so
to speak. There are no binary compatibility issues that way.
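
For what it's worth, the reading side of the mmap idea would look
roughly like this (a POSIX-only sketch; the cache file name and the
magic value are made up for illustration). The magic word up front lets
you detect an endian or format mismatch and rebuild the cache instead
of misreading it:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdint>
    #include <cstdio>

    static const uint32_t kMagic = 0x48554E53;   // "HUNS" in the writer's byte order

    int main() {
        int fd = open("he_IL.hash", O_RDONLY);   // previously serialized table
        if (fd < 0) return 1;                    // no cache yet: build and save one

        struct stat st;
        if (fstat(fd, &st) != 0) { close(fd); return 1; }
        void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        close(fd);
        if (base == MAP_FAILED) return 1;

        // A writer on a different architecture stores the magic with the
        // opposite byte order, so this comparison catches the mismatch.
        if (st.st_size < (off_t)sizeof(uint32_t) ||
            *static_cast<const uint32_t*>(base) != kMagic) {
            munmap(base, st.st_size);
            return 2;                            // foreign/stale cache: rebuild
        }

        std::printf("mapped %lld bytes of prebuilt table\n", (long long)st.st_size);
        // ... interpret the rest of the mapping as the hash table ...
        munmap(base, st.st_size);
        return 0;
    }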

Another source of delay when starting up the spell-checker is when the
user has checked the "check word in all languages" option but doesn't
realize that they have a large number of dictionaries that have to be
loaded when the first misspelt word is checked.

Obviously, for creating hash tables from large .dic files, available
memory is an issue. How much memory do you have available on your
machine?

Kevin


Re: Slow dictionary load

Peter B. West

Kevin,

Would it make sense to pre-read the file to determine the number of
entries in the case where there is no count at the beginning?

I'm not familiar with mmap, but if the file were created on the user's
machine at the first invocation, mmapping from that file for subsequent
accesses would solve the architecture issues, wouldn't it?

Peter

--
Peter B. West <http://cv.pbw.id.au/>
Folio <http://defoe.sourceforge.net/folio/>

Re: Slow dictionary load

Alan Yaniger
In reply to this post by Kevin B. Hendricks
Hi Kevin,

Thanks for your input. There is a count of the number of entries on the
top line of the Hebrew dictionary, so that's not a problem.

On the machine I'm working on now, the OOo installation doesn't have
"check all languages" marked.

There's plenty of memory, as the following output of "free" shows:

             total       used       free     shared    buffers     cached
Mem:       8109956     981068    7128888          0      88780     710764
-/+ buffers/cache:     181524    7928432
Swap:      5815488          0    5815488

The installed dictionaries are English US and Hebrew. If I type in
English, there is no noticeable delay, and misspelled words are marked
in red. If I then start typing in Hebrew, there is a 5-second delay
during which OOo seems "stuck" while the hash table is built.

Thanks,
Alan
