Initial Version Of Hunspell en_US and en_CA dictionaries Available

Kevin Atkinson
An initial version of the official Hunspell en_US and en_CA dictionaries,
created from SCOWL, is now available.  You can find it at
http://wordlist.sourceforge.net/.  There you will find a zip file with a
README and dictionary data.  No additional metadata is provided, i.e. it is
not an extension or anything like that.  This version should work with
any version of OpenOffice that uses Hunspell (and maybe even older
versions which used MySpell, but this is completely untested).

I am going to leave it to someone else to create an OOo 3.x extension and
the like.  If you are interested in doing this please email me privately.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Mathias Bauer
Hi Kevin,

Kevin Atkinson wrote:

> An initial version of the official Hunspell en_US and en_CA dictionaries,
> created from SCOWL, is now available.  You can find it at
> http://wordlist.sourceforge.net/.  There you will find a zip file with a
> README and dictionary data.  No additional metadata is provided, ie it is
> not an extension or anything like that.  This version should work any with
> any version of OpenOffice that used Hunspell (and maybe even older
> versions which used MySpell, but this is completely untested).
>
> I am going to leave it to someone else to create an OOo 3.x extension and
> the like.  If you are interested in doing this please email me privately.

If you don't mind, I will take the files and update the files we have in
the OOo repository. So we will have a rebuilt version of the bundled
dictionary extension in 3.1.

I'm still waiting for a plan for how to upload all pre-bundled dictionaries
to our repository. But that's the second step.

Regards,
Mathias

--
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
Please don't reply to "[hidden email]".
I use it for the OOo lists and only rarely read other mails sent to it.

Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Kevin Atkinson
On Fri, 5 Dec 2008, Mathias Bauer wrote:

> Kevin Atkinson wrote:
>
>> An initial version of the official Hunspell en_US and en_CA dictionaries,
>> created from SCOWL, is now available.  You can find it at
>> http://wordlist.sourceforge.net/.  There you will find a zip file with a
>> README and dictionary data.  No additional metadata is provided, ie it is
>> not an extension or anything like that.  This version should work any with
>> any version of OpenOffice that used Hunspell (and maybe even older
>> versions which used MySpell, but this is completely untested).
>>
>> I am going to leave it to someone else to create an OOo 3.x extension and
>> the like.  If you are interested in doing this please email me privately.
>
> If you don't mind, I will take the files and update the files we have in
> the OOo repository. So we will have a rebuilt version of the bundled
> dictionary extension in 3.1.
Yes please do.

Sometime in the near future Németh László will have another update, but
since his changes are not ready yet he told me to go ahead and release
what I have.


Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Németh László-2
Hi,

2008/12/5 Kevin Atkinson <[hidden email]>

> On Fri, 5 Dec 2008, Mathias Bauer wrote:
>
>> Kevin Atkinson wrote:
>>
>>> An initial version of the official Hunspell en_US and en_CA dictionaries,
>>> created from SCOWL, is now available.  You can find it at
>>> http://wordlist.sourceforge.net/.  There you will find a zip file with a
>>> README and dictionary data.  No additional metadata is provided, ie it is
>>> not an extension or anything like that.  This version should work any with
>>> any version of OpenOffice that used Hunspell (and maybe even older
>>> versions which used MySpell, but this is completely untested).
>>>
>>> I am going to leave it to someone else to create an OOo 3.x extension and
>>> the like.  If you are interested in doing this please email me privately.
>>
>> If you don't mind, I will take the files and update the files we have in
>> the OOo repository. So we will have a rebuilt version of the bundled
>> dictionary extension in 3.1.
>
> Yes please do.
>
> Sometime in the near future Németh László will have another update, but
> since his changes are not ready yet he told me to go ahead and release what
> I have.


It seems that within a few days I will finish the morphological extension of the
en_US dictionary (converting the morphological data of the Wordlist project).

> I'm still waiting for a plan how to upload all pre-bundled dictionaries
> to our repository. But that's the second step.

I think it's possible to upload all official extensions (those accepted and
preferred by the NLP projects) from
http://extensions.services.openoffice.org/ automatically, with the
required database and updating mechanism/script in
the repository. Unfortunately, the English variants have no NLP projects yet
(this was a topic of the last OOo conference). For American English, I will
make an official en_US extension maintained by the Wordlist project (members).

Regards,
László




Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Mathias Bauer
Hi László,

Németh László wrote:

> It seems, within a few days I will finish the morphological extension of the
> en_US dictionary (converting the morphological data of Wordlist project).

Great!

>> I'm still waiting for a plan how to upload all pre-bundled dictionaries
>> to our repository. But that's the second step.
>
> I think, it's possible to upload all official (accepted and preferred by NLP
> projects) extensions from
> http://extensions.services.openoffice.org/ automatically with the
> required database and updating mechanism/script in
> the repository.
Sorry, I was talking about how we can bring all dictionaries that are
committed to svn and bundled with the different localized builds into
the extension repository. This still needs some work.

> Unfortunately, English (variants) have no NLP projects yet
> (this was a topic of the last OOo conference). For American English, I will
> make an official en_US extension maintained by Wordlist project (members).

The OOo bundled dictionary does intentionally have only an "en"
dictionary, not a "en-US", "en-GB" etc. because they share hyphenation
and thesaurus. I would like to keep it that way and so I would like to
update the bundled dictionary with your new files. And of course in case
of updates I would also like to provide "en" dictionaries, not "en-US".
Does that make sense?

Regards,
Mathias

--
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
Please don't reply to "[hidden email]".
I use it for the OOo lists and only rarely read other mails sent to it.



Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Németh László-2
Hi Mathias,

2008/12/8 Mathias Bauer <[hidden email]>

> Hi László,
>
> Németh László wrote:
>
> > It seems, within a few days I will finish the morphological extension of
> > the en_US dictionary (converting the morphological data of Wordlist project).
>
> Great!


I will send it tomorrow. (I had to fix some problems in the CWS, too.)


>> I'm still waiting for a plan how to upload all pre-bundled dictionaries
>> to our repository. But that's the second step.
>
> I think, it's possible to upload all official (accepted and preferred by
> NLP projects) extensions from
> http://extensions.services.openoffice.org/ automatically with the
> required database and updating mechanism/script in
> the repository.

> Sorry, I was talking about how we can bring all dictionaries that are
> committed to svn and are bundled with the different localized builds to
> the extension repository. This needs some work to do.


It seems some bundled dictionary extensions are attached to the relevant
issues for verification, so they are already stored in an open repository.
They still need an updated link or registration on the extensions.services site.

>
>
> > Unfortunately, English (variants) have no NLP projects yet
> > (this was a topic of the last OOo conference). For American English, I
> > will make an official en_US extension maintained by Wordlist project
> > (members).
>
> The OOo bundled dictionary does intentionally have only an "en"
> dictionary, not a "en-US", "en-GB" etc. because they share hyphenation
> and thesaurus. I would like to keep it that way and so I would like to
> update the bundled dictionary with your new files. And of course in case
> of updates I would also like to provide "en" dictionaries, not "en-US".
> Does that make sense?


No, it doesn't. But using common (i.e. British) hyphenation patterns is a
mistake, because American and British English have different hyphenation
rules, see http://www.tex.ac.uk/cgi-bin/texfaq2html?label=oddhyphen. There
is an improved en_US pattern file in the latest Hyphen distribution,
http://downloads.sourceforge.net/hunspell/hyphen-2.4.tar.gz (also attached
here: http://www.openoffice.org/issues/show_bug.cgi?id=90028).

Regards,
László




Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Mathias Bauer
Németh László wrote:

> Hi Mathias,
>
> 2008/12/8 Mathias Bauer <[hidden email]>
>
>> Hi László,
>>
>> Németh László wrote:
>>
>> > It seems, within a few days I will finish the morphological extension of
>> > the en_US dictionary (converting the morphological data of Wordlist project).
>>
>> Great!
>
>
> I will send it next day. (I had to fix some problems in the CWS, too.)
>
>
>>> I'm still waiting for a plan how to upload all pre-bundled dictionaries
>>> to our repository. But that's the second step.
>>
>> I think, it's possible to upload all official (accepted and preferred by
>> NLP projects) extensions from
>> http://extensions.services.openoffice.org/ automatically with the
>> required database and updating mechanism/script in
>> the repository.
>
>> Sorry, I was talking about how we can bring all dictionaries that are
>> committed to svn and are bundled with the different localized builds to
>> the extension repository. This needs some work to do.
>
>
> It seems, some bundled dictionary extensions are attached to the relevant
> issues for verification, so they are already stored in an open repository.
> They need an updated link or registration on extensions.services site yet.
>
>>
>>
>> > Unfortunately, English (variants) have no NLP projects yet
>> > (this was a topic of the last OOo conference). For American English, I
>> > will make an official en_US extension maintained by Wordlist project
>> > (members).
>>
>> The OOo bundled dictionary does intentionally have only an "en"
>> dictionary, not a "en-US", "en-GB" etc. because they share hyphenation
>> and thesaurus. I would like to keep it that way and so I would like to
>> update the bundled dictionary with your new files. And of course in case
>> of updates I would also like to provide "en" dictionaries, not "en-US".
>> Does that make sense?
>
>
> No, it doesn't. But using common (i. e. British) hyphenation patterns is a
> mistake, because American and British English have different hyphenation
> rules, see http://www.tex.ac.uk/cgi-bin/texfaq2html?label=oddhyphen. There
> is an improved en_US pattern file in the last hyphen distribution:
> http://downloads.sourceforge.net/hunspell/hyphen-2.4.tar.gz, (also attached
> here: http://www.openoffice.org/issues/show_bug.cgi?id=90028).

Well, there are also Canadian, Australian and South African English that
we are serving with our extension. We have spelling dictionaries for
them, but no hyphenation dictionaries and no thesaurus. So my take on
that was: better to use the en-US thesaurus and the en-GB hyphenation than
nothing!

So how can we serve the users best? We don't offer any English
builds other than en-US, so we have to bundle the other English dictionaries
with the en-US build as well. Having 4 extensions instead of one would clutter
the extension manager UI and - as I said - waste space. So for me
staying with one "english-all" dictionary extension is the best option.

Ciao,
Mathias

--
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
Please don't reply to "[hidden email]".
I use it for the OOo lists and only rarely read other mails sent to it.


Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Robert Black-2
My impression is also that an "all in one" English dictionary approach is
a good way to handle this issue.  One extension in the extension manager is
much cleaner and less confusing for users.

However, I thought that there is nothing to stop there being two different
hyphenation dictionaries in there: one for US only, one for the other English
locales.  I thought it is just a matter of making an extra entry in the
dictionaries.xcu file for this? I am looking at OOo v3.0.0 to
draw this conclusion, so maybe something has changed.

Robert


Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

thomas.lange
In reply to this post by Kevin Atkinson

Hello Robert

Robert Black wrote:

> My impression is also that an "all in one" English dictionaries approach is
> a good way to handle this issue.  One extension in the extension manager is
> much cleaner and less confusing for users.
>
> However, I thought that there is nothing to stop there being two different
> Hyphenation dictionaries in there: one for US only, one for other locales of
> English.  I thought it is just a matter of making an extra entry in the
> dictionaries.xcu file for this? I am looking at the OOov3.0.0 version to
> make this conclusion, so maybe something has changed.

Yes, that should work, at least with OOo 3.0.1.
That version adds a fix for a bug that could crash the Office if a
hyphenation dictionary supported more than one locale.
See issue 94523.

With OOo 3.0.1 you should be fine with making two entries like

        <node oor:name="My_HyphDic_en_US" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/hyph_en_US.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_HYPH</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>en-US</value>
            </prop>
        </node>

and

        <node oor:name="My_HyphDic_en_other" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/hyph_en_US.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_HYPH</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>en-GB en-CA en-AU en-IE</value>
            </prop>
        </node>

But you need to explicitly list all locales for the second dictionary.
There is currently no way to specify something like
  - use it for all English locales but en-US
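
Since every locale has to be spelled out, one way to keep the two entries
consistent is to generate them from a single locale list. The following is
only a sketch in Python, not anything shipped with OOo: the helper name
hyph_node, the ENGLISH_LOCALES list and the file name hyph_en_other.dic (a
hypothetical second pattern file) are assumptions made for illustration.

```python
from xml.sax.saxutils import escape

# Hypothetical list of the English locales the extension should serve.
ENGLISH_LOCALES = ["en-US", "en-GB", "en-CA", "en-AU", "en-IE"]

# Same node layout as the dictionaries.xcu entries shown above.
NODE_TEMPLATE = """\
<node oor:name="{name}" oor:op="fuse">
    <prop oor:name="Locations" oor:type="oor:string-list">
        <value>%origin%/{dic}</value>
    </prop>
    <prop oor:name="Format" oor:type="xs:string">
        <value>DICT_HYPH</value>
    </prop>
    <prop oor:name="Locales" oor:type="oor:string-list">
        <value>{locales}</value>
    </prop>
</node>"""


def hyph_node(name, dic, locales):
    """Render one hyphenation entry with an explicit, space-separated locale list."""
    return NODE_TEMPLATE.format(
        name=escape(name, {'"': "&quot;"}),
        dic=escape(dic),
        locales=escape(" ".join(locales)),
    )


# "All English locales but en-US" has to be expanded explicitly.
other_locales = [loc for loc in ENGLISH_LOCALES if loc != "en-US"]

us_entry = hyph_node("My_HyphDic_en_US", "hyph_en_US.dic", ["en-US"])
other_entry = hyph_node("My_HyphDic_en_other", "hyph_en_other.dic", other_locales)

print(us_entry)
print(other_entry)
```

Adding a locale then means editing one list, and the en-US entry can never
end up listed in both dictionaries by accident.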

And please be sure to take note of the entry named "About node names for
the dictionaries" in the wiki:
http://wiki.services.openoffice.org/w/index.php?title=Extension_Dictionaries


Regards,
Thomas


Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

thomas.lange
In reply to this post by Kevin Atkinson

Thomas Lange - Sun Germany - ham02 - Hamburg wrote:

> Hello Robert
>
> Robert Black wrote:
>
>> My impression is also that an "all in one" English dictionaries approach is
>> a good way to handle this issue.  One extension in the extension manager is
>> much cleaner and less confusing for users.
>>
>> However, I thought that there is nothing to stop there being two different
>> Hyphenation dictionaries in there: one for US only, one for other locales of
>> English.  I thought it is just a matter of making an extra entry in the
>> dictionaries.xcu file for this? I am looking at the OOov3.0.0 version to
>> make this conclusion, so maybe something has changed.
>
> Yes, that should work.
> At least with OOo 3.0.1.
> There is a fix added for a bug that may crash the Office if there was a
> hyphenation dictionary supporting more than one locale.
> See issue 94523.
>
> With OOo 3.0.1 you should be fine with making two entries like
>
>         <node oor:name="My_HyphDic_en_US" oor:op="fuse">
>             <prop oor:name="Locations" oor:type="oor:string-list">
>                 <value>%origin%/hyph_en_US.dic</value>
>             </prop>
>             <prop oor:name="Format" oor:type="xs:string">
>                 <value>DICT_HYPH</value>
>             </prop>
>             <prop oor:name="Locales" oor:type="oor:string-list">
>                 <value>en-US</value>
>             </prop>
>         </node>
>
> and
>
>         <node oor:name="My_HyphDic_en_other" oor:op="fuse">
>             <prop oor:name="Locations" oor:type="oor:string-list">
>                 <value>%origin%/hyph_en_US.dic</value>
>             </prop>
>             <prop oor:name="Format" oor:type="xs:string">
>                 <value>DICT_HYPH</value>
>             </prop>
>             <prop oor:name="Locales" oor:type="oor:string-list">
>                 <value>en-GB en-CA en-AU en-IE</value>
>             </prop>
>         </node>


Err... the second entry should of course have looked like

            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/hyph_en_other.dic</value>
            </prop>



Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Robert Black-2
In reply to this post by thomas.lange
Thomas

I am glad you detailed that, as that is exactly how I thought it should
work.

Out of interest, the hyphenation dictionary in the current standard English
dictionary extension uses multiple locales (as does the
associated thesaurus).  So I am surprised to hear there is a bug associated
with this. From my own personal use I have never experienced any problem
with this, although I see from the issue you mentioned that it involved the
Russian dictionary (which I have never used).

From the standard English dictionary extension:

        <node oor:name="HyphDic_en-GB" oor:op="fuse">
            <prop oor:name="Locations" oor:type="oor:string-list">
                <value>%origin%/hyph_en_GB.dic</value>
            </prop>
            <prop oor:name="Format" oor:type="xs:string">
                <value>DICT_HYPH</value>
            </prop>
            <prop oor:name="Locales" oor:type="oor:string-list">
                <value>en-GB en-US en-ZA</value>
            </prop>
        </node>

Regards
Robert Black



Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Németh László-2
In reply to this post by Németh László-2
Hi,

2008/12/9 Németh László <[hidden email]>

> > It seems, within a few days I will finish the morphological extension of
> > the en_US dictionary (converting the morphological data of Wordlist
> > project).
>

Dictionaries (based on Wordlist Hunspell dictionaries version 2008-12-05):
http://www.openoffice.org/nonav/issues/showattachment.cgi/58734/hunspell-en-morph-20081212.zip

 Screenshot:
http://www.openoffice.org/nonav/issues/showattachment.cgi/58739/spacemen.png
(suggestions for "astronauts": spacemen, cosmonauts, travelers; note that the
mostly British "traveller" and its plural form aren't in the en_US spelling
and morphological dictionary)

(Issue: http://www.openoffice.org/issues/show_bug.cgi?id=19563)

Linux, Windows OpenOffice.org test builds for morphological dictionary
developments: ftp://ftp.fsf.hu/OpenOffice.org_hu/devel/

en_US Hyphenation dictionary:
http://www.openoffice.org/nonav/issues/showattachment.cgi/54640/hyph_en_US.dic

README:
http://www.openoffice.org/nonav/issues/showattachment.cgi/58744/README_hyph_en_US.txt

Issue: http://www.openoffice.org/issues/show_bug.cgi?id=90028

Sorry for this short list, I will write an introduction to the new features
of spell checking, hyphenation and thesaurus handling, too.

Regards,
László

Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

thomas.lange
In reply to this post by Kevin Atkinson

Hello Robert,

> I am glad you detailed that, as that is exactly how I thought it should
> work.
>
> Out of interest, the current standard English dictionary extension
> hyphenation dictionary currently uses multiple locales (and the same for the
> associated thesaurus).  So I am surprised to hear there is a bug associated
> with this. From my own personal use I have never experienced any problem
> with this. Although, I see from the Issue you mentioned that it involved the
> Russian dictionary (which I never used).
>
> From the standard English dictionary extension:
>
>         <node oor:name="HyphDic_en-GB" oor:op="fuse">
>             <prop oor:name="Locations" oor:type="oor:string-list">
>                 <value>%origin%/hyph_en_GB.dic</value>
>             </prop>
>             <prop oor:name="Format" oor:type="xs:string">
>                 <value>DICT_HYPH</value>
>             </prop>
>             <prop oor:name="Locales" oor:type="oor:string-list">
>                 <value>en-GB en-US en-ZA</value>
>             </prop>
>         </node>

Well, as with any index-out-of-bounds access, the result may be a crash or
may appear to work, depending on the memory content. Also, some operating
systems are more tolerant of that kind of error (which from my point of view
is troublesome; a nice clean crash would be the best way to find and address
such issues).
It might also just be the case that, without the fix, only the
first hyphenation dictionary ends up available for use, and that should then
be fine without any crash potential. (I would need to look more closely
into the old code to see about that one.)

So if you write such dictionaries for multiple languages to be used with
OOo 3.0.1, you may want to cross-check that all the languages
are available and that you get the results you expect for each language.
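
The real cross-check has to happen inside the Office itself, but the
configuration side can be sanity-checked beforehand. Below is a minimal sketch
in Python that lists which locales the hyphenation entries of a
dictionaries.xcu claim. The wrapper element "root", the function names and the
expected-locale list are assumptions for illustration only, and the oor
namespace URI is assumed to match what real dictionaries.xcu files declare, so
verify it against an actual file.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for the oor: prefix; check it against a real dictionaries.xcu.
OOR = "http://openoffice.org/2001/registry"

# The entry quoted above, wrapped in a hypothetical root that declares the prefixes.
SAMPLE = """<root xmlns:oor="http://openoffice.org/2001/registry"
      xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <node oor:name="HyphDic_en-GB" oor:op="fuse">
    <prop oor:name="Locations" oor:type="oor:string-list">
      <value>%origin%/hyph_en_GB.dic</value>
    </prop>
    <prop oor:name="Format" oor:type="xs:string">
      <value>DICT_HYPH</value>
    </prop>
    <prop oor:name="Locales" oor:type="oor:string-list">
      <value>en-GB en-US en-ZA</value>
    </prop>
  </node>
</root>"""


def hyphenation_locales(xcu_text):
    """Map each locale to the names of the hyphenation nodes that list it."""
    root = ET.fromstring(xcu_text)
    claimed = {}
    for node in root.iter("node"):
        # Collect the direct <prop> children as name -> text of the first <value>.
        props = {}
        for prop in node.findall("prop"):
            value = prop.find("value")
            text = value.text if value is not None and value.text else ""
            props[prop.get(f"{{{OOR}}}name")] = text.strip()
        if props.get("Format") != "DICT_HYPH":
            continue  # not a hyphenation dictionary entry
        for locale in props.get("Locales", "").split():
            claimed.setdefault(locale, []).append(node.get(f"{{{OOR}}}name"))
    return claimed


def check_coverage(xcu_text, expected):
    """Return (locales with no hyphenation entry, locales listed by several entries)."""
    claimed = hyphenation_locales(xcu_text)
    missing = sorted(set(expected) - set(claimed))
    duplicated = sorted(loc for loc, names in claimed.items() if len(names) > 1)
    return missing, duplicated


missing, duplicated = check_coverage(SAMPLE, ["en-GB", "en-US", "en-ZA", "en-AU"])
print("missing:", missing)
print("listed more than once:", duplicated)
```

This only inspects the configuration text; it says nothing about whether a
given OOo build actually loads the dictionary for each listed locale.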


Regards,
Thomas



Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Robert Black-2
Thomas,

Totally agree with what you said about index out of bounds crashes in
regards to operating systems. Hiding these crashes is a terrible idea. I
didn't see any detail on the issue to know it was this kind of crash,
because the summary just says "Fix for stacktrace 9383", but I will take
your word on it.  :)

I am not really involved in making any dictionaries for OOo. I merely helped
out the author of the Australian dictionary recently with some technical
issues. As far as I know, Mathias is handling this, although I would be
happy to help if required.

In fact, I am looking to contribute as a developer for OOo but am not sure
where I should begin. I am a moderately experienced C++ programmer. I have
worked in remotely distributed development environments commercially. As I
mentioned above, I have become reasonably familiar with the dictionaries
from the extension point of view, but have not yet seen any of the code base.
I was hoping to just single out a few very simple bugs to get started. From a
user perspective I mainly use Writer and Calc. Could you give me any advice
on where to start?

Thank you.

Regards
Robert Black



Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Németh László-2
In reply to this post by Németh László-2
2008/12/12 Németh László <[hidden email]>

> Hi,
>
> 2008/12/9 Németh László <[hidden email]>
>
>> It seems, within a few days I will finish the morphological extension of
>> the en_US dictionary (converting the morphological data of Wordlist
>> project).
>>
>
> Dictionaries (based on Wordlist Hunspell dictionaries version 2008-12-05):
> http://www.openoffice.org/nonav/issues/showattachment.cgi/58734/hunspell-en-morph-20081212.zip
>


A new release with fixed morphological codes of the comparative affixes:

http://www.openoffice.org/nonav/issues/showattachment.cgi/58765/hunspell-en-morph-20081212v2.zip


>
>  Screenshot:
> http://www.openoffice.org/nonav/issues/showattachment.cgi/58739/spacemen.png
> (suggestions for "astronauts": spacemen, cosmonauts, travelers; note: the
> mostly British "traveller" and its plural form aren't in the en_US spelling
> and morphological dictionary)
>
> (Issue: http://www.openoffice.org/issues/show_bug.cgi?id=19563)
>
> Linux, Windows OpenOffice.org test builds for morphological dictionary
> developments: ftp://ftp.fsf.hu/OpenOffice.org_hu/devel/
>
> en_US Hyphenation dictionary:
> http://www.openoffice.org/nonav/issues/showattachment.cgi/54640/hyph_en_US.dic
>
> README:
> http://www.openoffice.org/nonav/issues/showattachment.cgi/58744/README_hyph_en_US.txt
>
> Issue: http://www.openoffice.org/issues/show_bug.cgi?id=90028
>
> Sorry for this short list, I will write an introduction to the new features
> of spell checking, hyphenation and thesaurus handling, too.
>
> Regards,
> László
>
>

Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

Németh László-2
In reply to this post by thomas.lange
Hi,

I have attached the new version of the hyphenation and spelling dictionaries
to the Issue 97403 (http://qa.openoffice.org/issues/show_bug.cgi?id=97403).

Changes to the hyphenation dictionaries:

- The English hyphenation dictionaries will use the correct RIGHTHYPHENMIN=3
instead of the recent RIGHTHYPHENMIN=2 (after the integration of the new
hyphenator in CWS hunspell4thesaurus).
- The new patterns also forbid the bad hyphenations of words with an
apostrophe (*can='t, *abaser='s, *o'c=lock etc.).
- The \hyphenation section of the original TeX British English hyphenation
patterns was also converted (now how-ever, through-out etc. are hyphenated
by OOo in British English texts, as they are by TeX).
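
To illustrate what the RIGHTHYPHENMIN change means in practice, here is a
minimal sketch that drops candidate break points leaving too few characters
after the break. It is not the hyphenator's code: the function name, the
default left minimum of 2 and the candidate break points in the example are
assumptions chosen for illustration.

```python
def filter_break_points(hyphenated, left_min=2, right_min=3, mark="="):
    """Keep only breaks with at least left_min characters before them
    and at least right_min characters after them."""
    parts = hyphenated.split(mark)
    word = "".join(parts)

    # Collect the character offsets of the breaks that satisfy both limits.
    kept = []
    pos = 0
    for part in parts[:-1]:
        pos += len(part)
        if pos >= left_min and len(word) - pos >= right_min:
            kept.append(pos)

    # Rebuild the word with only the surviving break marks.
    pieces = []
    prev = 0
    for p in kept:
        pieces.append(word[prev:p])
        prev = p
    pieces.append(word[prev:])
    return mark.join(pieces)


# Hypothetical candidate breaks for "however".
print(filter_break_points("how=ev=er", right_min=2))
print(filter_break_points("how=ev=er", right_min=3))
```

With a right minimum of 2 both breaks survive; with 3 the break before the
final "er" is dropped, leaving only how=ever.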

Regards,
László

2008/12/11 Thomas Lange - Sun Germany - ham02 - Hamburg <
[hidden email]>

>
> Thomas Lange - Sun Germany - ham02 - Hamburg wrote:
>
> > Hello Robert
> >
> > Robert Black wrote:
> >
> >> My impression is also that an "all in one" English dictionaries approach is
> >> a good way to handle this issue.  One extension in the extension manager is
> >> much cleaner and less confusing for users.
> >>
> >> However, I thought that there is nothing to stop there being two different
> >> Hyphenation dictionaries in there: one for US only, one for other locales of
> >> English.  I thought it is just a matter of making an extra entry in the
> >> dictionaries.xcu file for this? I am looking at the OOov3.0.0 version to
> >> make this conclusion, so maybe something has changed.
> >
> > Yes, that should work.
> > At least with OOo 3.0.1.
> > There is a fix added for a bug that may crash the Office if there was a
> > hyphenation dictionary supporting more than one locale.
> > See issue 94523.
> >
> > With OOo 3.0.1 you should be fine with making two entries like
> >
> >         <node oor:name="My_HyphDic_en_US" oor:op="fuse">
> >             <prop oor:name="Locations" oor:type="oor:string-list">
> >                 <value>%origin%/hyph_en_US.dic</value>
> >             </prop>
> >             <prop oor:name="Format" oor:type="xs:string">
> >                 <value>DICT_HYPH</value>
> >             </prop>
> >             <prop oor:name="Locales" oor:type="oor:string-list">
> >                 <value>en-US</value>
> >             </prop>
> >         </node>
> >
> > and
> >
> >         <node oor:name="My_HyphDic_en_other" oor:op="fuse">
> >             <prop oor:name="Locations" oor:type="oor:string-list">
> >                 <value>%origin%/hyph_en_US.dic</value>
> >             </prop>
> >             <prop oor:name="Format" oor:type="xs:string">
> >                 <value>DICT_HYPH</value>
> >             </prop>
> >             <prop oor:name="Locales" oor:type="oor:string-list">
> >                 <value>en-GB en-CA en-AU en-IE</value>
> >             </prop>
> >         </node>
>
>
> Err... the second entry should of course have looked like
>
>            <prop oor:name="Locations" oor:type="oor:string-list">
>                 <value>%origin%/hyph_en_other.dic</value>
>            </prop>
>
>
> >
> > But you need to explicitly list all locales for the second dictionary.
> > There is currently no way to specify sth. like
> >   - use it for all English locales but en-US
> >
> > And please be sure to take note of the entry named "About node names for
> > the dictionaries" in the wiki:
> >
> > http://wiki.services.openoffice.org/w/index.php?title=Extension_Dictionaries
> >
> >
> > Regards,
> > Thomas
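
Since every locale has to be listed explicitly for the second dictionary, the
two entries quoted above could be generated from a single locale list rather
than maintained by hand. The sketch below assumes exactly that; the helper
name `hyph_node` and the `ENGLISH_LOCALES` list are hypothetical, and the node
names and file names are the ones used in the quoted example.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical list of the English locales the extension should serve.
ENGLISH_LOCALES = ["en-US", "en-GB", "en-CA", "en-AU", "en-IE"]


def hyph_node(name, dic_file, locales):
    """Build one hyphenation entry in the layout quoted in this thread."""
    return "\n".join([
        f'<node oor:name={quoteattr(name)} oor:op="fuse">',
        '    <prop oor:name="Locations" oor:type="oor:string-list">',
        f'        <value>%origin%/{escape(dic_file)}</value>',
        '    </prop>',
        '    <prop oor:name="Format" oor:type="xs:string">',
        '        <value>DICT_HYPH</value>',
        '    </prop>',
        '    <prop oor:name="Locales" oor:type="oor:string-list">',
        f'        <value>{escape(" ".join(locales))}</value>',
        '    </prop>',
        '</node>',
    ])


# "All English locales but en-US" has to be spelled out explicitly.
other_locales = [loc for loc in ENGLISH_LOCALES if loc != "en-US"]

us_node = hyph_node("My_HyphDic_en_US", "hyph_en_US.dic", ["en-US"])
other_node = hyph_node("My_HyphDic_en_other", "hyph_en_other.dic", other_locales)

print(us_node)
print(other_node)
```

Adding a locale to the list then updates the second entry automatically, which
keeps the explicit enumeration from drifting out of date.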

Re: Initial Version Of Hunspell en_US and en_CA dictionaries Available

thomas.lange
In reply to this post by Kevin Atkinson

Hi Robert,

Are you still interested in working in the area of dictionaries?

It took us a while to come up with an idea related to dictionaries that
we would like to change and which might be interesting to you:

We would like to change our user-dictionary format to be the same as that of
Hunspell dictionaries. This would roughly require the following:
- Hunspell needs to have support for exception dictionaries with
  suggestions implemented.
- We need a language 'ALL' for Hunspell dictionaries, since our
  user-dictionaries are capable of that.
- For the hyphenation information in the user-dictionaries we need a
  separate file to keep it in.
- The related code in OOo needs to be changed to make proper use of
  the new dictionary format and of the new hyphenation-specific
  user-dictionaries.
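
To make the third point more concrete, here is a rough sketch of splitting
user-dictionary entries into a Hunspell-style word list and a separate
hyphenation file. It relies on two assumptions that are not settled above:
that hyphenation points in the input are marked with "=", and that the
separate file simply repeats the marked words one per line. The only part
taken from the Hunspell format itself is that a .dic file starts with a
word-count line followed by one word per line.

```python
def split_user_words(entries, mark="="):
    """Return (dic_text, hyph_text) for a list of user-dictionary entries.

    dic_text:  Hunspell-style word list (count line, then one word per line).
    hyph_text: hypothetical side file holding only the entries that carry
               hyphenation marks, unchanged.
    """
    words = []
    hyphenated = []
    for entry in entries:
        # The spelling dictionary gets the word without hyphenation marks.
        words.append(entry.replace(mark, ""))
        if mark in entry:
            hyphenated.append(entry)

    dic_text = "\n".join([str(len(words))] + words) + "\n"
    hyph_text = "".join(line + "\n" for line in hyphenated)
    return dic_text, hyph_text


dic_text, hyph_text = split_user_words(["OpenOffice", "hy=phen=a=tion"])
print(dic_text)
print(hyph_text)
```

How exception entries with suggestions and the 'ALL' language would be
represented is left open here, since that depends on what Hunspell ends up
supporting.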

The first two tasks will probably require that you get in touch with
László Németh, the owner and maintainer of Hunspell (an open-source project
itself), to see if and how exception dictionaries can be implemented.

If you are still interested, please drop me a note and I will ask László
whether he has the time to introduce you to the respective parts of
Hunspell.


Regards,
Thomas



