How to make the wordlist file and affix file?

Choi, JiHui
Hello, all

I'd like to make a Korean spell checker for OOo.
We don't have any public dictionary, spell checker, word list, or
affix file.
(There are some dictionaries for StarDict, but I'm not sure their
licenses are suitable.)

So I'd like to make one.
I guess that first I need a word list and an affix file for hunspell,
myspell, or ispell.
But I couldn't find any information about the first steps for making
those.

Is there anyone who can help me?

--
Regards,
JiHui Choi
-------------------------------------------------------------------------------------
http://Mr-Dust.pe.kr
http://GIMP.kr,  http://OpenOffice.or.kr,  http://Ubuntu.or.kr

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: How to make the wordlist file and affix file?

Goran Rakic
On Sun, 23 Nov 2008 at 07:14 +0900, JiHui Choi wrote:
>
> So- I'd like to make one.
> I guess at first I need a wordlist and affix file for hunspell or
> myspell or ispell.
>
> But I couldn't find any information about the first step to make those.
>
> Is anyone who helps me?
>

The affix file specifies rules that can be applied to words in the word
list to create new words. For some languages this can significantly
shrink the word list and give better coverage; for others it is of
little use. There is syntax for rules that delete, add, or insert
characters. Rules are applied to a specific class of words (think of
forming the plural of English nouns by adding 's', or by removing 'y'
and adding 'ies').
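
To make the plural example concrete, here is a minimal Python sketch of what a suffix rule does: it strips some characters from the end of a word and adds others, but only when the word matches a condition. The `SuffixRule` class, the `apply_rules` function, and the three sample rules are illustrative assumptions. They are not hunspell's implementation and not its file syntax; the manual linked in this post describes the real format.

```python
import re
from dataclasses import dataclass


@dataclass
class SuffixRule:
    """One suffix rule: strip `strip` from the end, append `add`, if `condition` matches."""
    strip: str      # characters removed from the end of the word ("" for none)
    add: str        # characters appended afterwards
    condition: str  # regular expression that the end of the word must match


def apply_rules(word, rules):
    """Return every form produced by the rules whose condition matches the word."""
    forms = []
    for rule in rules:
        if not re.search(rule.condition + "$", word):
            continue
        if rule.strip and not word.endswith(rule.strip):
            continue
        stem = word[: len(word) - len(rule.strip)] if rule.strip else word
        forms.append(stem + rule.add)
    return forms


# Hypothetical "plural" class for English nouns (deliberately incomplete).
PLURAL = [
    SuffixRule(strip="", add="s", condition="[^sxzhy]"),      # cat -> cats
    SuffixRule(strip="", add="s", condition="[aeiou]y"),      # day -> days
    SuffixRule(strip="y", add="ies", condition="[^aeiou]y"),  # city -> cities
]

print(apply_rules("cat", PLURAL))   # ['cats']
print(apply_rules("day", PLURAL))   # ['days']
print(apply_rules("city", PLURAL))  # ['cities']
```

With rules like these, the word list only needs the base form plus a marker for the class, which is where the reduction in list size comes from.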

A minimal affix file (without any rules) consists of two lines:
SET UTF-8
TRY asfjlasjfl

The first line gives the encoding used for the word list, and the
second should list every letter of your language, sorted by frequency
(the letters above are only a placeholder).

If you want to write some rules, you can look here for the syntax:
http://tinyurl.com/hunspell-manual
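
The letters for the TRY line do not have to be typed by hand; they could be counted from the word list itself. The sketch below is one possible way to do that in Python. The function names `try_line` and `minimal_aff` are made up for this example, and the tie-breaking order is an arbitrary choice.

```python
from collections import Counter


def try_line(words):
    """Build a TRY line: every letter that occurs in `words`, most frequent first."""
    counts = Counter(ch for word in words for ch in word if ch.isalpha())
    # Sort by descending count, then by character so that ties are deterministic.
    letters = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "TRY " + "".join(letters)


def minimal_aff(words):
    """Return the text of a two-line affix file (no rules) for the given words."""
    return "SET UTF-8\n" + try_line(words) + "\n"


print(minimal_aff(["banana", "band"]))
# SET UTF-8
# TRY anbd
```

For Hangul, each precomposed syllable counts as one character here. Whether syllables or individual jamo are the better unit for Korean is a design decision that this sketch does not settle.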

The word list is just that: one word per line, optionally followed by
the affix rule classes that should be applied to that word. On the
first line you should write the approximate number of words. You can
use Wikipedia or other texts in your language available on the Internet
to build a basic corpus.
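
As a rough illustration of the word list layout described above, the following Python sketch turns raw text into a count line followed by one word per line. The `build_dic` name and the `min_count` threshold (used to drop words seen only once, which are often typos) are assumptions made for this example; a real list would need manual review and proper handling of the language's morphology.

```python
import re
from collections import Counter


def build_dic(text, min_count=2):
    """Return word list content: a word count on the first line, then one word per line."""
    # \w+ matches runs of word characters in any script; isalpha() then drops
    # tokens that contain digits or underscores.
    tokens = [t for t in re.findall(r"\w+", text) if t.isalpha()]
    counts = Counter(tokens)
    words = sorted(w for w, n in counts.items() if n >= min_count)
    return "\n".join([str(len(words))] + words) + "\n"


sample = "the cat saw the cat and the dog"
print(build_dic(sample))
# 2
# cat
# the
```

The result can be written out with `open("kr.dic", "w", encoding="utf-8")` so that the encoding agrees with the SET line of the affix file.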


Once you have kr.aff and kr.dic, you need to package them as an
extension.

There is a wiki page that describes which additional files you need to
create:
http://wiki.services.openoffice.org/wiki/Extension_Dictionaries

You will need a dictionaries.xcu with one DICT_SPELL entry, and a
description.xml to name your extension.

The best way to do the packaging is to download an existing dictionary
extension from http://extensions.services.openoffice.org/dictionary,
rename it to ".zip", extract it, and then edit its contents.
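
If you prefer a script to renaming by hand, the unpacking step could look like the Python sketch below. It relies on the extension being an ordinary zip archive, as the renaming trick implies; Python's `zipfile` module does not care about the file extension, so no renaming is needed. The function name `unpack_extension` and the file names in the usage note are hypothetical.

```python
import zipfile


def unpack_extension(oxt_path, target_dir):
    """Extract a dictionary extension (a zip archive) and return the names of its files."""
    with zipfile.ZipFile(oxt_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())
```

Calling `unpack_extension("some-dictionary.oxt", "unpacked")` on a downloaded extension lists what it contains, which shows where the .aff and .dic files, dictionaries.xcu, and description.xml sit before you replace them with your own.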


I doubt that there is an ispell/aspell Korean dictionary, as they don't
support full Unicode. I will be glad to help you with packaging, but
the word list is something you have to make yourself.


Regards,
Goran Rakic
OpenOffice.org Serbian
native-lang project lead




Re: How to make the wordlist file and affix file?

Ruud Baars-2
In reply to this post by Choi, JiHui
You could start with a simple word list alone.

Have you got a (huge) list of words yet?




Re: How to make the wordlist file and affix file?

Choi, JiHui
Hello, all

Thanks to your help, I made a long word list, an affix file, and an
extension for OOo:
http://openoffice.or.kr/team/doku.php?id=%EA%B3%B5%EA%B0%9C_%ED%95%9C%EA%B8%80_%EB%A7%9E%EC%B6%A4%EB%B2%95_%EA%B2%80%EC%82%AC%EA%B8%B0

Thanks a million, all of you!!!

But it is still at a very basic stage. I think I need to study my
language more.

--
Regards,
JiHui Choi
-------------------------------------------------------------------------------------
http://Mr-Dust.pe.kr
http://GIMP.kr,  http://OpenOffice.or.kr,  http://Ubuntu.or.kr


Re: How to make the wordlist file and affix file?

Ruud Baars-2
If your language has compounding (sticking words together), you might
want to look at the compounding options of hunspell. These are
complicated, but good for the flexibility of the word list, and for
compression too.
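
To illustrate why compounding helps with the size of the list, here is a small Python sketch of the basic idea: a word is accepted if it can be split into two or more entries that are already in the list. This only illustrates the concept and is not how hunspell implements it. The real behaviour is controlled by affix file options such as COMPOUNDFLAG and COMPOUNDMIN, described in the hunspell manual. The `is_compound` function and its `min_len` parameter are assumptions made for this example.

```python
def is_compound(word, parts, min_len=2):
    """Return True if `word` can be split into two or more entries of `parts`."""

    def split(rest, used):
        if not rest:
            # The whole word is consumed; require at least two parts.
            return used >= 2
        for end in range(min_len, len(rest) + 1):
            if rest[:end] in parts and split(rest[end:], used + 1):
                return True
        return False

    return split(word, 0)


WORDS = {"book", "shelf", "case"}
print(is_compound("bookshelf", WORDS))  # True
print(is_compound("bookshel", WORDS))   # False
print(is_compound("book", WORDS))       # False: a single part is not a compound
```

With a check like this, "bookshelf" and "bookcase" do not need their own entries, which is the compression effect mentioned above.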

Have a good time !

Ruud

