Re: chmorph and analyze

ge-7
Hello, László,

Thanks for the explanation. I have some additional questions:

1. I also could not get any meaningful result from the generate method of analyze.
Input:
alma körte
alma almás
alma al
alma alma
Output:
generate(alma, körte) = NO DATA
generate(alma, almás) = NO DATA
generate(alma, al) = NO DATA
generate(alma, alma) = NO DATA
Could you please give a working example?

2. In fact, analyze and stem return the same result:
Input: almát
Output:
analyze(almát) =  st:almok -
analyze(almát) =  st:alma -
stem(almát) = almok
stem(almát) = alma
If these methods always return the same result, why are there two methods?

Thanks in advance: eleonora


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: chmorph and analyze

ge-7
Hello László,

Thanks for the detailed explanations. I suggest adding these 4 emails to the FAQ of the Hunspell tools, since other users might also need these explanations.
Proper instructions on how to build a 'gen' dictionary would also be useful.

Thanks, eleonora


Re: chmorph and analyze

Németh László-2
Hello Eleonóra,

You need a dictionary with morphological data, for example for English
(http://www.openoffice.org/nonav/issues/showattachment.cgi/59629/english-dictionaries-2008-01-23.zip):

$ ~/hunspell-1.2.8/src/tools/analyze \
    ~/english-dictionaries-2008-01-23/en_US.{aff,dic} /dev/stdin
men
> men
analyze(men) =  st:man ts:Ns
stem(men) = man
mouse men
generate(mouse, men) = mice

There are also different Hunspell dictionaries in the Magyar Ispell
distribution. For morphological analysis and generation, and also for
better stemming, you need one of the "gen" dictionary variants (they
also contain morphological data fields):

~/magyarispell-1.4$ ~/hunspell-1.2.8/src/tools/analyze \
    hu_HU_u8_gen.{aff,dic} /dev/stdin
almát
> almát
analyze(almát) =  st:alom po:noun ts:PLUR ts:NOM is:POSS_SG_3 is:ACC
analyze(almát) =  st:alma po:noun ts:NOM is:ACC
stem(almát) = alom
stem(almát) = alma
alma körte
generate(alma, körte) = alma
generate(alma, körte) = alom
alma almás
generate(alma, almás) = almás
generate(alma, almás) = almos
alma al
generate(alma, al) = NO DATA
alma alma
generate(alma, alma) = alma
generate(alma, alma) = alom
generate(alma, alma) = almája
generate(alma, alma) = alomja
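
In the transcript above, each stem() line matches the "st:" field of the corresponding analyze() line, while analyze() additionally reports the other morphological fields. A minimal Python sketch of that relationship follows. It only parses the text output shown above; the helper names (parse_fields, stems_from_analyses) are hypothetical and are not part of Hunspell.

```python
import re

# Matches transcript lines such as "analyze(almát) =  st:alma po:noun ..."
LINE_RE = re.compile(r"^(analyze|stem)\((.+?)\) =\s*(.*)$")


def parse_fields(analysis):
    """Split 'st:alma po:noun ts:NOM is:ACC' into ordered (tag, value) pairs."""
    pairs = []
    for token in analysis.split():
        tag, sep, value = token.partition(":")
        if sep:  # ignore tokens that are not "tag:value"
            pairs.append((tag, value))
    return pairs


def stems_from_analyses(analyses):
    """Collect the value of every 'st:' field, in order of appearance."""
    return [value
            for analysis in analyses
            for tag, value in parse_fields(analysis)
            if tag == "st"]


# Output copied from the Hungarian "gen" dictionary session above.
transcript = """\
analyze(almát) =  st:alom po:noun ts:PLUR ts:NOM is:POSS_SG_3 is:ACC
analyze(almát) =  st:alma po:noun ts:NOM is:ACC
stem(almát) = alom
stem(almát) = alma
"""

analyses, stems = [], []
for line in transcript.splitlines():
    match = LINE_RE.match(line)
    if not match:
        continue
    kind, _word, result = match.groups()
    (analyses if kind == "analyze" else stems).append(result.strip())

# Compare the "st:" fields of the analyses with the reported stems.
print(stems_from_analyses(analyses) == stems)
```

With a dictionary that carries no morphological data, analyze() has little more than the "st:" field to report, which would explain why the two methods looked identical in the original question.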

Regards,
László





Re: chmorph and analyze

Németh László-2
Hello,

2009/3/24 ge <[hidden email]>:
> Hello László,
>
> Thanks for the detailed explanations. I suggest adding these 4 emails to the FAQ of the Hunspell tools, since other users might also need these explanations.
> Proper instructions on how to build a 'gen' dictionary would also be useful.

Thank you for the proposal. The "gen" dictionaries of the Magyar
Ispell distribution are generated by a simple "make". For generating the
English morphological dictionary, I have attached a script here:
http://www.openoffice.org/issues/show_bug.cgi?id=19563

A very short introduction to stemming and morphological dictionary
development for OOo 3.1:

Most stemming will work with recent Hunspell dictionaries.
For irregular dictionary items, you can use the "st:" field to
specify the stem (use a tab instead of a space for backward compatibility):

----- dic file ------
best st:good

For morphological generation, you have to add the morphological categories
of the affixes and dictionary items using the "ds:", "is:", and "ts:"
fields (derivational suffix, inflectional suffix, terminal suffix), or
list allomorphs with "al:" fields, as in the attached patches. An example
of the "al:" fields:

3
best st:good is:comp2
better st:good is:comp1
good al:better al:best ts:0

Note about "ts:0": morphological generation needs explicit suffix
fields on the dictionary items.
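
The dictionary fragment above can also be produced programmatically. The following Python sketch is only an illustration: format_dic is a hypothetical helper, not a Hunspell tool. It writes the entry count on the first line and puts a tab between each word and its first field, following the backward-compatibility note above. Separating the remaining fields with single spaces is an assumption taken from the example.

```python
def format_dic(entries):
    """Render (word, fields) pairs as .dic text.

    The first line is the entry count. A tab separates the word from its
    morphological fields; the fields themselves are joined by spaces
    (an assumption based on the example in this message).
    """
    lines = [str(len(entries))]
    for word, fields in entries:
        if fields:
            lines.append(word + "\t" + " ".join(fields))
        else:
            lines.append(word)
    return "\n".join(lines) + "\n"


# The three items from the "al:" example: two irregular forms that point
# to their stem, and the stem itself listing its allomorphs.
entries = [
    ("best", ["st:good", "is:comp2"]),
    ("better", ["st:good", "is:comp1"]),
    ("good", ["al:better", "al:best", "ts:0"]),
]

dic_text = format_dic(entries)
print(dic_text, end="")
```

The resulting text could be saved as the .dic file next to a matching .aff file and then checked with the analyze tool as shown earlier in this thread.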

See also the manual and test dictionaries of the Hunspell distribution.

Regards,
László
