Dictionary update process

Dictionary update process

Lars Aronsson

This concerns mostly the QA and distribution chain, but it has
implications for the linguistic components as well.  As designers
of spell checkers and other language tools, we need to see the
whole environment in which they operate.

In December a new Swedish dictionary was produced and ready for
inclusion in OOo.  After much waiting, possible misunderstandings,
and reminders, it has now been scheduled (by "mh" [1]) for
inclusion in OOo 2.3.  There are no licensing problems.  The only
problem is that the previous Swedish dictionary (from 2003,
actually very close to the Swedish ispell dictionary from 1997) is
very poor, so the update is much needed.  Since December, the new
dictionary has missed both version 2.1 and version 2.2 of OOo.

Can we now be sure that 2.3 will come out with the new dictionary?
Can something go wrong that delays this improvement further?

How can we improve this release process so that future updates of
the dictionary are handled faster?

Is there still a chance to have the new dictionary included
in version 2.2 already?

For users of operating systems with automatic updates (e.g. Ubuntu
Linux), is there any way that OOo can give priority to such
updates?  Right now I have no idea how the OOo-Ubuntu connection
works.  My own standard Ubuntu installation still offers OOo 2.0.4
and I have no clue when they are going to offer 2.1.

When I'm lost in the dark without information, I take pleasure in
conspiracy theories.  This time I added a note in Bugzilla saying
that Sun Sweden has an interest in delaying the improvement of
OpenOffice.  This is of course mere speculation, for which I have
no proof.  It is however a fact that they market StarOffice in
Swedish with the argument [2] that it has a professional-grade
spell checker, which OpenOffice so far hasn't had.  And it is a
fact that this issue [1] was finally pushed forward only after I
mentioned Scott McNealy's name.  (That's a useful name.)

[1] http://qa.openoffice.org/issues/show_bug.cgi?id=62268
[2] http://se.sun.com/press/feature_stories/2005/051004/sida2.html

In the future, I think we need to push out new dictionaries more
often, e.g. to include new names from politics, news and media.  If
there is going to be a 3 month delay for every update, perhaps the
spell checker should be redesigned so that it doesn't rely only on
the dictionary in the distribution, but also checks a live server
for today's updates.  Such updates could even be built as a
subscription service.  It might also become important that other
software (e.g. the Firefox browser) uses the same spell checker
and that all of them share a common spell checking daemon on the
local computer.
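
A minimal sketch of the merge step such a design would need, assuming
the update has already been downloaded as text in the same layout as a
Hunspell .dic file (an entry count on the first line, then one
"word/FLAGS" entry per line).  The function names are hypothetical, the
network part is left out, and the parsing ignores details such as
escaped slashes and morphological fields:

```python
def parse_dic(text):
    """Parse simplified .dic text into a {word: flags} mapping."""
    entries = {}
    # The first line holds the entry count, so it is skipped.
    for line in text.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        word, _, flags = line.partition("/")
        entries[word] = flags
    return entries


def merge_updates(base, updates):
    """Return a new mapping in which update entries override base entries."""
    merged = dict(base)
    merged.update(updates)
    return merged


def render_dic(entries):
    """Write the mapping back out as .dic text with a fresh entry count."""
    lines = [str(len(entries))]
    for word in sorted(entries):
        flags = entries[word]
        lines.append(f"{word}/{flags}" if flags else word)
    return "\n".join(lines) + "\n"
```

Keeping the distributed dictionary untouched and layering the daily
update on top of it means a bad update can be dropped without
reinstalling anything.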

The above should also apply to thesauri and grammar checkers.


--
  Lars Aronsson ([hidden email])
  Aronsson Datateknik - http://aronsson.se

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Dictionary update process

nemeth-2
Hi,

First, I am happy that you have made a much
better Swedish dictionary with Hunspell's compound handling.
I plan to implement a new feature in Hunspell to handle triple consonants in
special Swedish and Norwegian compounding (also with hyphenation).

The main problem is that we have no "core" developers
for Lingucomponent with CWS and Issue Tracker access.
Unofficial builds in particular (like Swedish?) suffer
from a lack of maintenance.

I will update the English dictionaries next month,
and I hope I will be able to add the Swedish dictionary, too.
But it would be much better if the maintainer of the Swedish build or you
could do it.

I have now added the dictionary to the Dictionary Wiki page:
http://wiki.services.openoffice.org/wiki/Dictionaries#Swedish_.28Sweden.29

With a little effort, you can also make a pack file (see the last Swedish
pack file). The DicOOo maintainers can publish the pack file online; simply
write to Laurent Godard or Daniel Naber.

Regards,

Laci



Re: Dictionary update process

ge-7
In reply to this post by Lars Aronsson
Lars Aronsson <[hidden email]> wrote:
> In the future, I think we need to push out new dictionaries more
> often, e.g. including new names from politics, news and media.

Unfortunately you completely misunderstand what dictionaries
for languages are for. Your restless, nervous activities are rather
counterproductive.

-eleonora



Re: Dictionary update process

Laurent Godard-3
In reply to this post by Lars Aronsson
Hi

Just ask here for inclusion in the DicOOo repository when it is ready.
If I do not respond, it is for lack of time, so ask again.

If your spell checker is ready (.dic, .aff, readme + licence in a zip
file), tell me and I'll do it.

No conspiracy here; I just did not know about or notice your needs.

Moreover, these are two different things.
One is inclusion in OOo by default, and this is what will happen in 2.3.
The other is availability on demand through DicOOo, provided the
package is OK.

I'll have a look at the wiki link, but there may be a problem:
what happens with versions of OOo older than 2.0.2 when we use this
spell checker?


Laurent

--
Laurent Godard <[hidden email]> - Ingénierie OpenOffice.org -
http://www.indesko.com
Nuxeo Enterprise Content Management >> http://www.nuxeo.com -
http://www.nuxeo.org
Livre "Programmation OpenOffice.org", Eyrolles 2004-2006


Re: Dictionary update process

nemeth-2
Quoting Laurent Godard <[hidden email]>:

> Hi
>
> Just ask here for inclusion in the DicOOo repository when it is ready.
> If I do not respond, it is for lack of time, so ask again.
>
> If your spell checker is ready (.dic, .aff, readme + licence in a zip
> file), tell me and I'll do it.
>
> No conspiracy here; I just did not know about or notice your needs.
>
> Moreover, these are two different things.
> One is inclusion in OOo by default, and this is what will happen in 2.3.
> The other is availability on demand through DicOOo, provided the
> package is OK.
>
> I'll have a look at the wiki link, but there may be a problem:
> what happens with versions of OOo older than 2.0.2 when we use this
> spell checker?

It would be good to handle different versions of dictionaries
(i.e. dictionaries for different OpenOffice.org versions) with DicOOo,
optionally with an automatic query of the OOo version plus a
dependency field in the pack file (or by using OOo dictionary extensions
instead of pack files).

Even without version handling, the problem is not too big:
most users run localised OOo versions with the
proper dictionaries or newer OOo versions (>=2.0.2 for Hunspell),
and Hunspell dictionaries are nearly 100% compatible with MySpell.
For example, the old Swedish dictionary accepts a million wrong compounds.
MySpell with the new Hunspell dictionary doesn't accept compounds
(~7% of the words in Swedish texts).
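
A minimal sketch of what such a version check could look like.  The
field name min_ooo_version and the list layout are hypothetical and do
not reflect the real pack file format; only the 2.0.2 threshold for
Hunspell comes from the text above:

```python
def parse_version(text):
    """Turn a dotted version string such as "2.0.2" into (2, 0, 2)."""
    return tuple(int(part) for part in text.split("."))


def compatible_packs(packs, ooo_version):
    """Return the names of the packs usable with the given OOo version."""
    installed = parse_version(ooo_version)
    return [
        pack["name"]
        for pack in packs
        if installed >= parse_version(pack["min_ooo_version"])
    ]


# Hypothetical pack list: the MySpell minimum version is a placeholder.
PACKS = [
    {"name": "sv_SE (MySpell)", "min_ooo_version": "1.0"},
    {"name": "sv_SE (Hunspell)", "min_ooo_version": "2.0.2"},
]
```

With this list, compatible_packs(PACKS, "2.0.1") offers only the
MySpell entry, while "2.0.4" or "2.1" offers both.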

Regards

Laci


Re: Dictionary update process

Laurent Godard-3
Hi

>
> It would be good to handle different versions of dictionaries
> (i.e. dictionaries for different OpenOffice.org versions) with DicOOo,
> optionally with an automatic query of the OOo version plus a
> dependency field in the pack file (or by using OOo dictionary extensions
> instead of pack files).
>

We already maintain a separate list for thesauri,
so it would be possible to detect whether the Hunspell service is available
and then offer the correct list.

Laurent

--
Laurent Godard <[hidden email]> - Ingénierie OpenOffice.org -
http://www.indesko.com
Nuxeo Enterprise Content Management >> http://www.nuxeo.com -
http://www.nuxeo.org
Livre "Programmation OpenOffice.org", Eyrolles 2004-2006


Re: Dictionary update process

Lars Aronsson
In reply to this post by Laurent Godard-3
Laurent Godard wrote:

> Just ask here for inclusion in the DicOOo repository when it is ready.
> If I do not respond, it is for lack of time, so ask again.
>
> If your spell checker is ready (.dic, .aff, readme + licence in a
> zip file), tell me and I'll do it.

The archive at http://hem.bredband.net/dsso1/sv_SE.zip
was made available on December 7, 2006 and contains three files:
sv_SE.dic, sv_SE.aff and README_sv_SE.txt
The latter contains the full LGPL text in English.
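
A minimal sketch of how such a package could be checked before
submission, assuming the only requirements are the ones Laurent lists
(a .dic file, an .aff file and a readme).  The function names are
hypothetical, and the example archive is built in memory with
placeholder content rather than downloaded:

```python
import io
import zipfile

REQUIRED_SUFFIXES = (".dic", ".aff")


def missing_parts(zip_file):
    """Return the required parts that the archive does not contain."""
    with zipfile.ZipFile(zip_file) as archive:
        names = [name.lower() for name in archive.namelist()]
    missing = [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(name.endswith(suffix) for name in names)
    ]
    if not any("readme" in name for name in names):
        missing.append("readme")
    return missing


def build_example_archive(file_names):
    """Build an in-memory zip holding placeholder files with these names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in file_names:
            archive.writestr(name, "placeholder")
    buffer.seek(0)
    return buffer
```

For an archive holding sv_SE.dic, sv_SE.aff and README_sv_SE.txt,
missing_parts returns an empty list.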

> Moreover, these are two different things.
> One is inclusion in OOo by default, and this is what will happen in 2.3.

Yes, and this is what I'm really asking about.  In my experience
users find DicOOo too complicated.  They install the newest
standard distribution and find the included Swedish spelling
dictionary (ispell from 1997) is a joke.  This is a problem.

Does the DicOOo mechanism change in 2.1, 2.2 or 2.3 so that it
becomes more automated and streamlined?

> I'll have a look at the wiki link, but there may be a problem:
> what happens with versions of OOo older than 2.0.2 when we use this
> spell checker?

The new dictionary package uses Hunspell and is specifically for
newer versions of OOo.  I agree it could be nice to provide an
improved dictionary also for older versions, but this is not a
priority for me personally.  When people attack me because of the
bad spell checker, I can get away with telling them to install a
newer version of OOo.  The sad thing is that a fresh install of
OOo 2.1 or 2.2 won't do. Well, what's another year? I can wait.


--
  Lars Aronsson ([hidden email])
  Aronsson Datateknik - http://aronsson.se


Re: Dictionary update process

Mathias Bauer
Lars Aronsson wrote:

> Does the DicOOo mechanism change in 2.1, 2.2 or 2.3 so that it
> becomes more automated and streamlined?

IMHO a better approach would be to deploy dictionaries as extensions and,
in case they rely on a particular spell checker in the installation, use
a dependency on it or at least on the OOo version to make sure that the
extension is only installed if this spell checker is available.

Unfortunately the dictionary handling currently has some bugs with respect
to extensibility (if missing support for something can be seen as a bug)
that have to be fixed. But if this could help to solve this and similar
problems in the future, and if Laurent had some time to help us with the
DicOOo wizard, we could think about giving this a higher priority.

Ciao,
Mathias

--
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
Please don't reply to "[hidden email]".
I use it for the OOo lists and only rarely read other mails sent to it.


Re: Dictionary update process

Laurent Godard-3
Hi

>> Does the DicOOo mechanism change in 2.1, 2.2 or 2.3 so that it
>> becomes more automated and streamlined?
>
> IMHO a better approach would be to deploy dictionaries as extensions and,
> in case they rely on a particular spell checker in the installation, use
> a dependency on it or at least on the OOo version to make sure that the
> extension is only installed if this spell checker is available.
>

As I said before, I agree.

> Unfortunately the dictionary handling currently has some bugs with respect
> to extensibility (if missing support for something can be seen as a bug)
> that have to be fixed. But if this could help to solve this and similar
> problems in the future, and if Laurent had some time to help us with the
> DicOOo wizard, we could think about giving this a higher priority.
>

I am going through very hard times these days.
I'll be on vacation next week and will try to do something at the DicOOo level:
- modify DicOOo so that it can distinguish between MySpell and
Hunspell (probably two lists maintained at first, as for the thesaurus).
Laci, what is the service to check for to be sure Hunspell is installed?

- upload the dictionary (perhaps even before that)

Regarding the process of downloading dictionaries manually, we have the
same problem in French due to licensing issues.
Users do not complain that much (but some, yes, complain).

More next week

Laurent



--
Laurent Godard <[hidden email]> - Ingénierie OpenOffice.org -
http://www.indesko.com
Nuxeo Enterprise Content Management >> http://www.nuxeo.com -
http://www.nuxeo.org
Livre "Programmation OpenOffice.org", Eyrolles 2004-2006


Re: Dictionary update process

Mathias Bauer
Laurent Godard wrote:

>> Unfortunately the dictionary handling currently has some bugs with respect
>> to extensibility (if missing support for something can be seen as a bug)
>> that have to be fixed. But if this could help to solve this and similar
>> problems in the future, and if Laurent had some time to help us with the
>> DicOOo wizard, we could think about giving this a higher priority.
>>
>
> I am going through very hard times these days.
> I'll be on vacation next week and will try to do something at the DicOOo level:
> - modify DicOOo so that it can distinguish between MySpell and
> Hunspell (probably two lists maintained at first, as for the thesaurus).
> Laci, what is the service to check for to be sure Hunspell is installed?
>
> - upload the dictionary (perhaps even before that)
>
> Regarding the process of downloading dictionaries manually, we have the
> same problem in French due to licensing issues.
> Users do not complain that much (but some, yes, complain).

No need to hurry, Laurent. If you can offer your support
once we have finished the necessary changes in a CWS, that would be
enough. We will have to do some work on our side before this.

As you agreed to this, I'm fine with everything for the moment. I will
talk with Thomas Lange about a possible schedule for the necessary work.
When this is done, we will create a CWS and come back to the list. Until
then have a good vacation and don't let the hard times get you down.

Ciao,
Mathias

--
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
Please don't reply to "[hidden email]".
I use it for the OOo lists and only rarely read other mails sent to it.
