Lingucomponent Sub-Project: Grammar Checking

Lindie
 
 
Hi, my user name is PumkinPie.

I am a 40-something stay-at-home mom with a lot of time on my hands. I am from South Africa but currently live in Abu Dhabi (U.A.E.). My home language is English, but I am fluent in Afrikaans as well.
 
I would like to get involved with your project if you still have something to delegate.
 
Thank you
 
Lindie
 
 
 

Re: Lingucomponent Sub-Project: Grammar Checking

thomas.lange

Hi Lindie,

> Hi My user name is PumkinPie
>  
>  
> I am a 40 something stay at home mom with a lot of time on my hands.  I
> am from South Africa, but currently live in Abu Dhabi (U.A.E.).  My home
> language is English but I am fluent in Afrikaans as well.
>  
> I would like to get involved with your project if you still have
> something to delegate.

Well, if you do not want to develop a grammar checker on your own, please
have a look at LanguageTool:
http://extensions.services.openoffice.org/node/2119

If you open the options page of LanguageTool (LT) (I think it was under
'Tools/Language Tools') you can see a large number of rules per language.
If you can come up with a set of rules that would work with Afrikaans it
might be possible to integrate them in LT.

But first ask Marcin Miłkowski (see CC) about that. Maybe I have a
conceptual misunderstanding of LT.

Regards,
Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Lingucomponent Sub-Project: Grammar Checking

Carlos Menezes
Dear all,

CoGrOO is another option, but it relies on several layers of statistical
natural language processing, such as a part-of-speech tagger, a chunker,
and a syntactic parser. These modules have to be trained on an annotated
corpus, which is difficult to obtain (such corpora are rare and expensive).
Regards,

Carlos Menezes


2009/3/4 Thomas Lange - Sun Germany - ham02 - Hamburg <[hidden email]>

>
> Hi Lindie,
>
> > Hi My user name is PumkinPie
> >
> >
> > I am a 40 something stay at home mom with a lot of time on my hands.  I
> > am from South Africa, but currently live in Abu Dhabi (U.A.E.).  My home
> > language is English but I am fluent in Afrikaans as well.
> >
> > I would like to get involved with your project if you still have
> > something to delegate.
>
> Well if you do not want to develop a grammar checker on your own please
> have a look at LanguageTool.
> http://extensions.services.openoffice.org/node/2119
>
> If you open the options page of LanguageTool (LT) (I think it was under
> 'Tools/Language Tools') you can see a large number of rules per language.
> If you can come up with a set of rules that would work with Afrikaans it
> might be possible to integrate them in LT.
>
> But first ask Marcin Miłkowski (see CC) about that. Maybe I have a
> conceptual misunderstanding of LT.
>
> Regards,
> Thomas

Loading one dictionary for different language variants

Simon Brouwer
Hi all,

The official spelling of Dutch is the same for the Netherlands and for
Belgium, so we have a single dictionary for Dutch. In OpenOffice.org 2,
I saw that if you install the same dictionary for nl_NL and for
nl_BE, it is loaded into memory twice. For this reason we have the
dictionary installed for nl_NL only. This is, however, awkward for our
Belgian users, so we're considering changing this.
Can anyone here tell me whether installing the same dictionary for multiple
language variants has significant drawbacks, such as longer startup
time and/or slower response? In particular, does OpenOffice.org 3
still load the dictionary into memory more than once?

--
Kind regards,
Simon Brouwer.

| http://nl.openoffice.org | http://www.opentaal.org |



Re: Lingucomponent Sub-Project: Grammar Checking

Marcin Miłkowski
In reply to this post by thomas.lange
Thomas Lange - Sun Germany - ham02 - Hamburg wrote:

> Hi Lindie,
>
>> Hi My user name is PumkinPie
>>  
>>  
>> I am a 40 something stay at home mom with a lot of time on my hands.  I
>> am from South Africa, but currently live in Abu Dhabi (U.A.E.).  My home
>> language is English but I am fluent in Afrikaans as well.
>>  
>> I would like to get involved with your project if you still have
>> something to delegate.
>
> Well if you do not want to develop a grammar checker on your own please
> have a look at LanguageTool.
> http://extensions.services.openoffice.org/node/2119
>
> If you open the options page of LanguageTool (LT) (I think it was under
> 'Tools/Language Tools') you can see a large number of rules per language.
> If you can come up with a set of rules that would work with Afrikaans it
> might be possible to integrate them in LT.
>
> But first ask Marcin Miłkowski (see CC) about that. Maybe I have a
> conceptual misunderstanding of LT.

No, you don't :)

Basically, to have rules for Afrikaans, you can start quite easily. The
process is described on the page:

http://www.languagetool.org/development/#newlanguage

The description there is probably a bit too technical and not entirely
accurate: you don't have to be a programmer to write rules. And, most
importantly, easy rules (like common typos in context, e.g., "Their is a
house" instead of "There is a house") do not require any of the expensive
resources that Carlos mentioned. You can start by copying some English
rules to a new grammar.xml file. The rule syntax is described here:

http://www.languagetool.org/development/#xmlrules

but mostly on the wiki:

http://languagetool.wikidot.com/

Currently, the new file must be named in the format "rules-xx-name.xml",
where "xx" is the two-character language code and "name" is the full
name of the language in English. Example: rules-en-English.xml.
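
To give an idea of what such a rule looks like, here is a minimal sketch of
a rule file containing one simple context rule for the "Their is" typo
mentioned above. The category name, rule id and rule name are made up for
illustration, and the element names follow the XML rule syntax as documented
on the pages linked above, so please check them against the current
documentation before relying on them.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: category name, rule id and rule name are hypothetical. -->
<rules lang="en">
  <category name="Commonly confused words">
    <rule id="THEIR_IS" name="their is (there is)">
      <!-- Matches the two-token sequence "their is". -->
      <pattern>
        <token>their</token>
        <token>is</token>
      </pattern>
      <!-- Text shown to the user, with the proposed replacement. -->
      <message>Did you mean <suggestion>there is</suggestion>?</message>
      <!-- Example sentences document what the rule should and should not flag. -->
      <example type="incorrect"><marker>Their is</marker> a house.</example>
      <example type="correct">There is a house.</example>
    </rule>
  </category>
</rules>
```

A rule like this needs no tagger or annotated corpus: it only compares
tokens. For Afrikaans, the same structure would presumably be reused with
the "lang" attribute and the tokens changed accordingly.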

Then you can run the LanguageTool standalone version, or via WebStart,
that is by clicking this link:

http://www.languagetool.org/webstart/LanguageTool.jnlp

Click File > Open rule file. Click "Add" to open a new rule file and see
if it works.

We will be updating the wiki with instructions soon. Adding a new language
is not yet very easy, but it is not as hard as it might seem. Contributing
to existing languages is a lot easier, however...

Regards
Marcin


Re: Loading one dictionary for different language variants

Olivier R.-2
In reply to this post by Simon Brouwer
Hi Simon,

Simon Brouwer wrote:

> The official spelling of Dutch is the same for the Netherlands and for
> Belgium, so we have a single dictionary for Dutch. In OpenOffice.org 2,
> I had seen that if you install the same dictionary for nl_NL and for
> nl_BE it is loaded in memory twice. For this reason we have the
> dictionary installed for nl_NL only. This is however awkward for our
> Belgian users, so we're considering to change this.
> Can anyone here tell me if installing the same dictionary for multiple
> language variant the drawbacks are significant, such as longer startup
> time and/or slower response? In this case, does OpenOffice.org 3
> actually still load the dictionary into memory more than once?

In the French dictionary extension, the dictionaries (spelling,
thesaurus and hyphenation) are assigned to six localizations:
France, Belgium, Switzerland, Canada, Monaco, Luxembourg

In the dictionaries.xcu file, there is:

<prop oor:name="Locales" oor:type="oor:string-list">
     <value>fr-FR fr-BE fr-CA fr-CH fr-MC fr-LU</value>
</prop>

I don't think that means the 3 dictionaries are loaded 6 times each.

Regards,
Olivier

--

== N'écrivez pas à cette adresse. Réservée aux listes de discussion. ==
** Do not reply at this address. Mailing-list only. **


Re: Loading one dictionary for different language variants

thomas.lange
In reply to this post by Simon Brouwer

Hi all,

> Hi all,
>
> The official spelling of Dutch is the same for the Netherlands and for
> Belgium, so we have a single dictionary for Dutch. In OpenOffice.org 2,
> I had seen that if you install the same dictionary for nl_NL and for
> nl_BE it is loaded in memory twice. For this reason we have the
> dictionary installed for nl_NL only. This is however awkward for our
> Belgian users, so we're considering to change this.
> Can anyone here tell me if installing the same dictionary for multiple
> language variant the drawbacks are significant, such as longer startup
> time and/or slower response? In this case, does OpenOffice.org 3
> actually still load the dictionary into memory more than once?
>


Actually, the current implementation in
lingucomponent\source\spellcheck\spell\sspellimp.cxx is not optimized:
a new Hunspell object is created for each locale the first time that
locale is used for spelling.
Thus, if your extension implements a single dictionary that supports more
than one locale, e.g. nl-BE and nl-NL, the content will be read into
memory twice when you actually use both languages for spelling.
As long as you use only one of those languages, the content will
be read only once.

However, for the time being (unless told otherwise), I think it is
usually a valid assumption that a single user will write in only one
language/locale. And if he were to use two or more, then the
other ones will most likely be completely different, e.g. one may use
nl-NL and en-US along with fr-FR. But I would guess it is pretty
unlikely that the same user is going to use nl-NL AND nl-BE.
Thus I think the missing optimization is a venial sin.
Or am I mistaken about that?


Regards,
Thomas



Re: Loading one dictionary for different language variants

thomas.lange
In reply to this post by Simon Brouwer

Hi again,

> Hi all,
>
> The official spelling of Dutch is the same for the Netherlands and for
> Belgium, so we have a single dictionary for Dutch. In OpenOffice.org 2,
> I had seen that if you install the same dictionary for nl_NL and for
> nl_BE it is loaded in memory twice. For this reason we have the
> dictionary installed for nl_NL only. This is however awkward for our
> Belgian users, so we're considering to change this.
> Can anyone here tell me if installing the same dictionary for multiple
> language variant the drawbacks are significant, such as longer startup
> time and/or slower response? In this case, does OpenOffice.org 3
> actually still load the dictionary into memory more than once?
>


I forgot:
As for the drawbacks: there are none (aside from needing twice
the memory if you are actually going to use both languages).

The slower response time was an issue with the context menu for
misspelled words in older office versions. Up until around OOo 2.1, I
think, we used to check the word with each installed dictionary at
that point. Thus having many of them installed would
slow down the process, and most or all of the dictionaries got loaded
into memory. But that has long since changed as well. It should be
better since 2.4, I think.


Thomas


Re: Loading one dictionary for different language variants

Simon Brouwer
Hi Thomas, all,

Thomas Lange - Sun Germany - ham02 - Hamburg wrote:

> Hi again,
>
>  
>> Hi all,
>>
>> The official spelling of Dutch is the same for the Netherlands and for
>> Belgium, so we have a single dictionary for Dutch. In OpenOffice.org 2,
>> I had seen that if you install the same dictionary for nl_NL and for
>> nl_BE it is loaded in memory twice. For this reason we have the
>> dictionary installed for nl_NL only. This is however awkward for our
>> Belgian users, so we're considering to change this.
>> Can anyone here tell me if installing the same dictionary for multiple
>> language variant the drawbacks are significant, such as longer startup
>> time and/or slower response? In this case, does OpenOffice.org 3
>> actually still load the dictionary into memory more than once?
>>
>>    
>
>
> If forgot:
> For the drawbacks: there is none. (Aside from that you will need twice
> the memory if you are actually going to use both languages.)
>
> The slower response time was an issue with the context menu for
> misspelled words in older office versions. Since (I think it was up
> until around in OOo 2.1) we used to check the word with each installed
> dictionary at that point. And thus having many of them installed would
> slow down the process and most or all of the dictionaries got loaded
> into memory. But that has long since changed as well. It should be
> better since 2.4 I think.
>  
Thanks for the clarification. We will change it then so that the
dictionary installs for both nl_NL and nl_BE.

This means that both the dictionary included in the (Dutch) build, and
its copy in the extension repository, must be updated. Should I open
separate issues for this? To whom should I assign them?
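
By analogy with the French entry Olivier quoted from dictionaries.xcu, the
change for Dutch would presumably amount to listing both locales in the
"Locales" property of the Dutch dictionary entry. This is only a sketch:
the surrounding node and the other properties of the entry are omitted,
and the exact contents of the Dutch file are an assumption.

```xml
<!-- Sketch: only the Locales property is shown, mirroring the French example in this thread. -->
<prop oor:name="Locales" oor:type="oor:string-list">
     <value>nl-NL nl-BE</value>
</prop>
```

As Thomas explained, the dictionary content would then be read into memory
twice only if a user actually spell-checks text in both nl-NL and nl-BE.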

--
Kind regards,
Simon Brouwer.

| http://nl.openoffice.org | http://www.opentaal.org |

