Hunspell - morphological analysis

Hunspell - morphological analysis

ge-7
There is another tool for morphological analysis, hunmorph:
http://mokk.bme.hu/resources/hunmorph

Also, if you want to do morphological analysis, the dictionary/affix
pair has to be completed with
- grammatical gender (if any)
- word type (verb, substantive, adjective, etc.)
- others (if any)

For English, analysers that look at the position in the sentence are the
tool of choice (because lots of verbs are also substantives, e.g. walk,
play, etc.)

-eleonora



Hi,

Has anyone tried to use Hunspell for morphological analysis? In our grammar
checker development (CoGrOO), we are using a morphological dictionary we
wrote. But it is big, mainly because we didn't care about redundant data.
Using Hunspell affixes would solve this and, even better, the grammar
checker and spell checker would share the same dictionary.
Does anyone know whether OOo makes any interface to its Hunspell available,
so that a grammar checker could use it to query the dictionaries?
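
To make the redundancy point concrete, here is a minimal sketch of the idea in Python. It is not the Hunspell file format or API: the rule table, the flags "N" and "V", and the tag names are all made up for illustration, and the English rules are deliberately oversimplified. It only shows how a few stems plus shared suffix rules can stand in for a full list of inflected forms while still giving a lemma and a tag for each form.

```python
import re
from collections import defaultdict
from typing import NamedTuple


class SuffixRule(NamedTuple):
    strip: str      # characters removed from the end of the stem
    add: str        # characters appended after stripping
    condition: str  # regex that the end of the stem must match
    tag: str        # made-up morphological label for the derived form


# Hypothetical affix classes keyed by a flag (loosely inspired by suffix rules).
RULES = {
    "N": [SuffixRule("", "s", r"[^s]", "plural")],
    "V": [
        SuffixRule("", "s", r"[^s]", "3sg"),
        SuffixRule("", "ed", r"[^e]", "past"),
        SuffixRule("", "ing", r"[^e]", "gerund"),
    ],
}

# The compact dictionary: one entry per (stem, flag, part of speech).
STEMS = [
    ("walk", "N", "noun"), ("walk", "V", "verb"),
    ("play", "N", "noun"), ("play", "V", "verb"),
    ("cook", "N", "noun"), ("cook", "V", "verb"),
]


def expand(stem, flag, pos):
    """Yield (surface form, lemma, tag) for a stem and its affix class."""
    yield stem, stem, f"{pos}:base"
    for rule in RULES[flag]:
        if stem.endswith(rule.strip) and re.search(rule.condition + "$", stem):
            base = stem[: len(stem) - len(rule.strip)]
            yield base + rule.add, stem, f"{pos}:{rule.tag}"


def build_index(stems):
    """Map every generated surface form to all of its (lemma, tag) analyses."""
    index = defaultdict(set)
    for stem, flag, pos in stems:
        for form, lemma, tag in expand(stem, flag, pos):
            index[form].add((lemma, tag))
    return index


INDEX = build_index(STEMS)


def analyze(word):
    """Return the set of (lemma, tag) analyses, empty if the word is unknown."""
    return INDEX.get(word, set())


print(sorted(analyze("walks")))
```

Note that a form like "walks" comes back with two analyses (noun plural and verb third person), so the ambiguity still has to be resolved by whatever uses the dictionary.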

Thanks!

William

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Hunspell - morphological analysis

Marcin Miłkowski
Hi Eleonora,

> For English position in sentence analysers are the tool of choice
> (because lots of verbs are substantives, e.g
> walk
> play, etc...)

For POS-tagging this is the tool of choice but not so for grammar
checking. We had a statistical sentence-level POS tagger in LanguageTool
but it had serious drawbacks: for some incorrect sentences, it simply
assigned the POS tags that a correct sentence would have had, so we had no
access to the real, incorrect POS tags. In practice, it turned out that a
dictionary-based POS tagger is better for grammar checking, takes less
space, and works faster (it's open source; you can look at the sources in
the LanguageTool CVS). And when you can look at the surrounding context,
the ambiguity of tagging isn't much of a problem. You can also use some
rules to disambiguate ambiguous tags in such cases.
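
A rough sketch of what such a dictionary lookup plus context rules can look like, in Python. This is not LanguageTool's code: the tiny lexicon, the Penn-style tag names, and the two rules are assumptions made up for the example. The point is that the lookup returns every tag the dictionary knows for a word, and a rule only narrows the set when the neighbouring word gives a reason to.

```python
# Hypothetical mini-lexicon: each word maps to ALL tags it can carry.
LEXICON = {
    "the": {"DT"},
    "a": {"DT"},
    "they": {"PRP"},
    "to": {"TO"},
    "long": {"JJ"},
    "walk": {"NN", "VB"},
    "play": {"NN", "VB"},
}


def tag(tokens):
    """Dictionary lookup only: no guess about whether the sentence is correct."""
    return [(tok, set(LEXICON.get(tok.lower(), {"UNKNOWN"}))) for tok in tokens]


def disambiguate(tagged):
    """Narrow noun/verb ambiguity using the tags of the previous token."""
    result = []
    for i, (tok, tags) in enumerate(tagged):
        prev_tags = result[i - 1][1] if i > 0 else set()
        if {"NN", "VB"} <= tags:
            if prev_tags in ({"DT"}, {"JJ"}):
                tags = {"NN"}   # after a determiner or adjective: noun
            elif prev_tags in ({"PRP"}, {"TO"}):
                tags = {"VB"}   # after a pronoun or "to": verb
        result.append((tok, tags))
    return result


for sentence in ("They walk", "a long walk", "walk"):
    print(disambiguate(tag(sentence.split())))
```

When no rule applies (the last example), the word simply keeps both tags, and a grammar rule can still match against either of them.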

Of course, in theory, you could try to train a statistical POS tagger on
bad and correct sentences but as far as I know such work hasn't been done.

Anyway, hunspell (hunmorph) is not the best tool for English - my
solution is not based on affixes, it's purely dictionary-based, and
hunmorph is not a statistical tagger.

Best,
Marcin



Re: Hunspell - morphological analysis

ge-7
Hi Marcin,
> Anyway, hunspell (hunmorph) is not the best tool for English - my
> solution is not based on affixes, it's purely dictionary-based, and
> hunmorph is not a statistical tagger.

I agree with you: hunspell, as the name says, is a spelling tool and not a morphological analysis tool. I do not know hunmorph. I just wanted to stress that for English a purely dictionary-based analysis is not enough, because of the many identical verb/substantive pairs like play, walk, cook, etc.

On the other hand, English can be happily analysed without affixes, while for other languages affixes are vital (Hungarian, Turkish, Persian, Estonian, etc.)

You speak about "your solution".
What is it? Is it a morphological analysis tool or a grammar checker? Only for English or also for other languages? Where is it?

-eleonora

Re: Hunspell - morphological analysis

Marcin Miłkowski
[hidden email] wrote:

> You speak about "your solution".
> What is it? Is it a morphological analysis tool or a grammar checker?

Dictionary-based POS-tagger for LanguageTool, using finite-state
automata format for storing data (one of the most efficient dictionary
formats, in terms of speed and space). Most languages supported by LT
use such dictionaries now.
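
To give an idea of why this format is compact, here is a small Python sketch of an acyclic finite-state automaton for a word list. It is not the format or the code LanguageTool uses: it builds a plain trie first and then merges equivalent states bottom-up, which is a simpler (and slower) approach than the incremental construction that dedicated tools use, and it stores bare word forms only. All names are made up for the example.

```python
class State:
    """One automaton state: outgoing edges by character, plus a final flag."""
    __slots__ = ("edges", "final")

    def __init__(self):
        self.edges = {}
        self.final = False


def build_trie(words):
    """Build a trie: shared prefixes are stored once, suffixes are not."""
    root = State()
    for word in words:
        node = root
        for ch in word:
            node = node.edges.setdefault(ch, State())
        node.final = True
    return root


def minimize(state, registry):
    """Merge equivalent states bottom-up so shared suffixes are stored once."""
    for ch, child in list(state.edges.items()):
        state.edges[ch] = minimize(child, registry)
    # Two states are equivalent if they agree on finality and lead to the
    # same canonical states on the same characters.
    key = (state.final,
           tuple(sorted((ch, id(t)) for ch, t in state.edges.items())))
    return registry.setdefault(key, state)


def accepts(root, word):
    """Follow the word through the automaton; accept only in a final state."""
    node = root
    for ch in word:
        node = node.edges.get(ch)
        if node is None:
            return False
    return node.final


def count_states(root):
    """Count distinct states reachable from the root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.edges.values())
    return len(seen)


WORDS = ["walk", "walks", "walked", "walking",
         "play", "plays", "played", "playing"]

trie = build_trie(WORDS)
trie_states = count_states(trie)
automaton = minimize(trie, {})
print("trie states:", trie_states, "minimized states:", count_states(automaton))
```

Because "walk" and "play" take the same endings, their suffix states collapse into one shared set after minimization, and lookup stays a single pass over the characters of the word.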

I use a combination of scripts that re-use the 12dicts word lists and AGID
files to get part-of-speech information, clean it up, and add some entries
I wrote manually. The overall solution is quite hybrid but quite fast and
efficient. Bugs are there but that's life.

> Only for English or also for other languages? Where is it?

This is a part of LanguageTool (Java version). All sources are in the
CVS (look in resources/en). Two files have to be downloaded separately
(infl.txt and part-of-speech.txt from 12dicts and AGID), but this should
be specified in the sources.

We could of course release it separately if anyone else needs a nicely
wrapped package instead of dirty CVS ;)

Best,
Marcin


Re: Hunspell - morphological analysis

William Colen
Hi Marcin,

I've tried to create a Portuguese dictionary using LanguageTool, but
without success. Can you please help me?
Do you use Lametyzator to read the FSA file generated by fsa_build?
(http://www.eti.pg.gda.pl/katedry/kiw/pracownicy/Jan.Daciuk/personal/fsa.html)

How can I create an input file for fsa_build? How should it be formatted?

Thanks!
William
