chmorph

ge-7
How can I get properly 8-bit encoded morphological dictionaries?
The ones I downloaded from
ftp://ftp.mokk.bme.hu/Tool/Hunmorph/Resources/Morphdb.hu/morphdb-hu-20060525.tgz
(morphdb_hu.aff and morphdb_hu.dic) are obviously not in an 8-bit encoding.
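
One way to check what encoding the downloaded files are actually in is to ask iconv whether they decode as UTF-8. This is only a sketch: is_valid_utf8 is a made-up helper name, and it assumes an iconv that exits with a non-zero status on invalid input (GNU iconv does). A file that fails the check is most likely in some 8-bit encoding such as ISO-8859-2; a file that passes is either UTF-8 or plain ASCII.

```shell
#!/bin/sh
# is_valid_utf8 is a hypothetical helper, not part of hunmorph.
# It succeeds if the file decodes cleanly as UTF-8.
# Pure ASCII files also pass, because ASCII is a subset of UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

for f in morphdb_hu.aff morphdb_hu.dic; do
    [ -f "$f" ] || continue          # skip files that are not present
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8 (or plain ASCII)"
    else
        echo "$f: not valid UTF-8, probably an 8-bit encoding"
    fi
done
```

If the files already pass this check, running them through iconv -f latin2 -t utf-8 would encode them a second time and corrupt the accented characters.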

Can I convert them to the proper form? If yes, how?

I tried:
en@anonymous:~/program/humorph$ cat morphdb_hu.aff | iconv -f latin2 -t utf-8 > morphdb_hu.aff.u8
en@anonymous:~/program/humorph$ cat morphdb_hu.dic | iconv -f latin2 -t utf-8 > morphdb_hu.dic.u8


In the *.aff.u8 file I replaced SET ISO8859-2 with SET UTF-8.
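
The conversion steps above can be collected into one small script so they are easy to repeat. This is a sketch of what was tried, not a confirmed fix: convert_pair is a made-up helper name, and it is an assumption that chmorph and analyze accept UTF-8 resources at all.

```shell
#!/bin/sh
# convert_pair is a hypothetical helper, not part of hunmorph.
# It converts BASE.aff and BASE.dic from ISO-8859-2 (Latin-2) to UTF-8,
# writing BASE.aff.u8 and BASE.dic.u8, then updates the SET line in the
# converted affix file so the declared encoding matches the bytes.
convert_pair() {
    base=$1
    for ext in aff dic; do
        iconv -f ISO-8859-2 -t UTF-8 "$base.$ext" > "$base.$ext.u8" || return 1
    done
    # Rewrite the encoding declaration via a temporary file (portable sed).
    sed 's/^SET ISO8859-2/SET UTF-8/' "$base.aff.u8" > "$base.aff.u8.tmp" &&
        mv "$base.aff.u8.tmp" "$base.aff.u8"
}

# Usage (assumes morphdb_hu.aff and morphdb_hu.dic are in the current directory):
#   convert_pair morphdb_hu
```

The originals are left untouched, so the Latin-2 and UTF-8 variants can be tested side by side.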

The result is still no good:

en@anonymous:~/program/humorph$ echo program | chmorph *hu.aff.u8 *hu.dic.u8 /dev/stdin NOM ACC
program

en@anonymous:~/program/humorph$ echo program | chmorph *hu.aff.u8 *hu.dic.u8 /dev/stdin NOM POSS
program

en@anonymous:~/program/humorph$ echo program asztalt |./analyze *hu.aff.u8 *hu.dic.u8 /dev/stdin
generate(program, asztalt) = NO DATA

-eleonora


