Anyone familiar with the ICU?

Anyone familiar with the ICU?

ge-7
>>
This would be useful e.g. for German where there are correct word parts
like
  "Arbeits- und Verwaltungsrecht"
BTW: Is there any other language where hyphens/dashes should be handled
similarly?
<<
To the BTW question:
Yes, in Hungarian you say
munka- és államigazgatási jog
for the above ("labour and administrative law").

-eleonora



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Anyone familiar with the ICU?

thomas.lange

Hello Ruud,

> >
> > Hi all,
> >
> > Does anyone know how to modify the ICU (i.e. probably the word.txt file)
> > to allow for pre- and postfix "HYPHEN-MINUS" and "EN DASH" as part of
> > the word (in order to get them passed on to the spell checker as well)?
> >
> > This would be useful e.g. for German where there are correct word parts
> > like
> >   "Arbeits- und Verwaltungsrecht"
>
> Thomas, for German (as far as I know) and Dutch (I am sure about that) the
> - at the end of a word is only correct when the word-part it is after is a
> correct starter of a compound.
> So I would not like every dash at the end of a word to be assumed correct.
> I am busy working it out using compounding flags in hunspell. So I just
> need the ending dash to be accepted as part of a word.
> The same applies to the starting dash for ending parts from compounds ..
> (less used, but allowed).
>  
My question is NOT about whether such a word is correct or not. That is not
the concern of the breakiterator (i.e. ICU).

But the ICU should treat hyphen/dash as part of the word when they are
at the start or end (at least in German, ...) because otherwise the
spell checker will never get to see the word "Arbeits-" but only
"Arbeits" and thus it can not verify that "Arbeits-" is actually correct.

Of course, whether a word with a leading or trailing dash is correct or not
will be left entirely to the spell checker and its dictionaries.


Thomas




Re: Anyone familiar with the ICU?

Per Eriksson-2
Hello,

The Swedish community has waited a long time for such a feature,
one that forwards the hyphen to the spell checker.

We don't know what is expected from us. We would like this feature
activated for Swedish.

What are the risks involved with activating this that our lingu team
must know about? Our members working with the dictionary are experts and
would be well able to answer such questions.

:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.:.
Best Regards
Per Eriksson
Lead Swedish Native Lang Project
OpenOffice.org Community
Phone: +46 70 560 10 33
Email: [hidden email]
Web: http://sv.openoffice.org/



Thomas Lange - Sun Germany - ham02 - Hamburg wrote:

> Hello Ruud,
>
>  
>>> Hi all,
>>>
>>> Does anyone know how to modify the ICU (i.e. probably the word.txt file)
>>> to allow for pre- and postfix "HYPHEN-MINUS" and "EN DASH" as part of
>>> the word (in order to get them passed on to the spell checker as well)?
>>>
>>> This would be useful e.g. for German where there are correct word parts
>>> like
>>>   "Arbeits- und Verwaltungsrecht"
>>>      
>> Thomas, for German (as far as I know) and Dutch (I am sure about that) the
>> - at the end of a word is only correct when the word-part it is after is a
>> correct starter of a compound.
>> So I would not like every dash at the end of a word to be assumed correct.
>> I am busy working it out using compounding flags in hunspell. So I just
>> need the ending dash to be accepted as part of a word.
>> The same applies to the starting dash for ending parts from compounds ..
>> (less used, but allowed).
>>  
>>    
> My question is NOT about whether such a word is correct or not. That is not
> the concern of the breakiterator (i.e. ICU).
>
> But the ICU should treat hyphen/dash as part of the word when they are
> at the start or end (at least in German, ...) because otherwise the
> spell checker will never get to see the word "Arbeits-" but only
> "Arbeits" and thus it can not verify that "Arbeits-" is actually correct.
>
> Of course, whether a word with a leading or trailing dash is correct or not
> will be left entirely to the spell checker and its dictionaries.
>
>
> Thomas
>

Re: Anyone familiar with the ICU?

ge-7
In reply to this post by ge-7
>>
But the ICU should treat hyphen/dash as part of the word when they are
at the start or end (at least in German, ...) because otherwise the
spell checker will never get to see the word "Arbeits-" but only
"Arbeits" and thus it can not verify that "Arbeits-" is actually correct.
<<

Checking constructs like "Arbeits- und Verwaltungsrecht"
is a job for the grammar checker, not
for the spell checker. The spell checker, as Ruud pointed out,
has no problem with words ending in a dash.

-eleonora




Re: Anyone familiar with the ICU?

thomas.lange
In reply to this post by Per Eriksson-2

Hello Per,

> Hello,
>
> The Swedish community has waited a long time for such a feature,
> one that forwards the hyphen to the spell checker.
>
> We don't know what is expected from us. We would like this feature
> activated for Swedish.
>
> What are the risks involved with activating this that our lingu team
> must know about? Our members working with the dictionary are experts and
> would be well able to answer such questions.
>  
I don't know for sure, since I'm not familiar with the internals of
Hunspell in this respect, and I have never created an
affix dictionary myself. Sorry. My part is only to get the hyphens to
the spell checker as part of the words.

László and maybe some others should know about what dictionary
maintainers may need to do because of this change.


Thomas




Re: Anyone familiar with the ICU?

thomas.lange
In reply to this post by ge-7

Hi Eleonora,

> >>
> But the ICU should treat hyphen/dash as part of the word when they are
> at the start or end (at least in German, ...) because otherwise the
> spell checker will never get to see the word "Arbeits-" but only
> "Arbeits" and thus it can not verify that "Arbeits-" is actually correct.
> <<
>
> To check  "Arbeits- und Verwaltungsrecht" type constructs
> is a job for the grammar checker and not
> for the spell checker. The spell checker - as Ruud pointed out-
> has no problem with words closed by a dash.
>
>  

That is not correct!

If the problem could be handled by the grammar checker alone, everything
would be fine and we would have no need to fix issue 64400.
But unfortunately we are not yet that far along the planned road, and it
will still take us a great deal of time. Currently spell checking and
grammar checking are still two completely unrelated processes and thus
grammar checking has no way to influence (namely overrule) results that
are reported by the spell checker.

Thus for example the current scenario for the German text
  "Arbeits- und Verwaltungsrecht."
is like this:
- the grammar checker should not find any problem at all
- but the spell checker gets to see the following three words:
   1) "Arbeits"
   2) "und"
   3) "Verwaltungsrecht."
  Because of the still open issue 64400 the spell checker will have to
  check the word "Arbeits" and that one does not exist. Thus it will be
  marked as incorrect. :-(
 
But if issue 64400 gets fixed and the dash becomes part of the word to
check, then the spell checker gets to check "Arbeits-", and that word
should exist in a German spell check dictionary. And AFAIK it already
does exist like this in the current German dictionary. The only thing
preventing the dictionary and the spell checker from dealing with this
properly is that currently leading/trailing dashes are not treated as
part of the word for the purpose of spell checking.
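
For illustration only, here is a minimal Python sketch of the two scenarios described above. It assumes a toy word list in place of the German dictionary (the thread only says "Arbeits-" is believed to be listed there already), and the helpers `tokens` and `flagged` are hypothetical: they approximate word splitting with a regular expression and are not the ICU or OOo breakiterator. Trailing sentence punctuation is simply dropped here.

```python
import re

# Toy stand-in for the German spell-check dictionary (assumption for this
# sketch): the thread says "Arbeits-" should already be listed there.
GERMAN_WORDS = {"Arbeits-", "und", "Verwaltungsrecht"}

LETTERS = r"[^\W\d_]+"  # one or more Unicode letters


def tokens(text: str, keep_edge_hyphens: bool) -> list[str]:
    """Hypothetical word splitter; not the ICU/OOo breakiterator."""
    core = rf"{LETTERS}(?:-{LETTERS})*"  # inner hyphens stay inside the word
    pattern = rf"-?{core}-?" if keep_edge_hyphens else core
    return re.findall(pattern, text)


def flagged(text: str, keep_edge_hyphens: bool) -> list[str]:
    """Words the spell checker would mark as incorrect."""
    return [w for w in tokens(text, keep_edge_hyphens) if w not in GERMAN_WORDS]


sentence = "Arbeits- und Verwaltungsrecht."
print(flagged(sentence, keep_edge_hyphens=False))  # ['Arbeits']
print(flagged(sentence, keep_edge_hyphens=True))   # []
```

Without edge hyphens the checker only ever sees "Arbeits" and flags it; with them, the same word list accepts the sentence.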


Thomas




Re: Anyone familiar with the ICU?

ge-7
In reply to this post by ge-7
Thomas,

Thanks for the detailed explanation.
I agree with your conclusion.

-eleonora



Re: Anyone familiar with the ICU?

Németh László-2
In reply to this post by thomas.lange
Hi,

2009/6/10 Thomas Lange - Sun Germany - ham02 - Hamburg <[hidden email]>:

>
> Hello Per,
>> Hello,
>>
>> The Swedish community has waited a long time for such a feature,
>> one that forwards the hyphen to the spell checker.
>>
>> We don't know what is expected from us. We would like this feature
>> activated for Swedish.
>>
>> What are the risks involved with activating this that our lingu team
>> must know about? Our members working with the dictionary are experts and
>> would be well able to answer such questions.
>>
> I don't know for sure, since I'm not familiar with the internals of
> Hunspell in this respect, and I have never created an
> affix dictionary myself. Sorry. My part is only to get the hyphens to
> the spell checker as part of the words.
>
> László and maybe some others should know about what dictionary
> maintainers may need to do because of this change.

There is no risk, by definition. Hunspell has a default word-break mode
that does the same as the old ICU breakiterator (it is applied after spell
checking of the hyphenated word fails). See BREAK in the Hunspell manual.

I still have to check the EN DASH support (hyphen-minus conversion for 8-bit
dictionaries and automatic word break with UTF-8 dictionaries).
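
A rough Python emulation of the BREAK fallback may help dictionary maintainers see why existing dictionaries keep working. It assumes the default break patterns `-`, `^-` and `-$` (where `^` and `$` anchor a pattern to the start or end of the word) and uses a plain set instead of a real dictionary lookup. `check`, `DICTIONARY` and the recursion limit are illustrative assumptions; real Hunspell additionally applies affix rules and compounding.

```python
# Toy dictionary (assumption): real Hunspell also applies affix rules and
# compounding, which this sketch ignores.
DICTIONARY = {"au", "bain", "marie"}

# Assumed default break patterns: "-" splits inside the word, "^-" strips a
# leading hyphen and "-$" strips a trailing one.
BREAK_PATTERNS = ["-", "^-", "-$"]
MAX_DEPTH = 10  # illustrative guard against runaway recursion


def check(word: str, depth: int = 0) -> bool:
    """Accept a listed word; otherwise retry on the pieces left by BREAK."""
    if word in DICTIONARY:
        return True
    if not word or depth >= MAX_DEPTH:
        return False
    for pattern in BREAK_PATTERNS:
        if pattern.startswith("^"):
            sep = pattern[1:]
            if word.startswith(sep) and check(word[len(sep):], depth + 1):
                return True
        elif pattern.endswith("$"):
            sep = pattern[:-1]
            if word.endswith(sep) and check(word[:-len(sep)], depth + 1):
                return True
        else:
            head, sep, tail = word.partition(pattern)
            if sep and head and tail and check(head, depth + 1) and check(tail, depth + 1):
                return True
    return False


print(check("bain-marie"))  # True: "bain" and "marie" are both listed
print(check("bain-"))       # True: trailing hyphen stripped
print(check("bain-xyz"))    # False: "xyz" is not listed
```

With such a fallback, a hyphenated word that is not listed as a whole is still checked part by part, which is roughly what the old breakiterator achieved by splitting the word before the spell checker saw it.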

Regards,
László

>
>
> Thomas
>

Re: Anyone familiar with the ICU?

Per Eriksson-2
In reply to this post by thomas.lange
Hi Thomas

Thomas Lange - Sun Germany - ham02 - Hamburg wrote:

> If the problem could be handled by the grammar checker alone, everything
> would be fine and we would have no need to fix issue 64400.
> But unfortunately we are not yet that far along the planned road, and it
> will still take us a great deal of time. Currently spell checking and
> grammar checking are still two completely unrelated processes and thus
> grammar checking has no way to influence (namely overrule) results that
> are reported by the spell checker.
>
> Thus for example the current scenario for the German text
>   "Arbeits- und Verwaltungsrecht."
> is like this:
> - the grammar checker should not find any problem at all
> - but the spell checker gets to see the following three words:
>    1) "Arbeits"
>    2) "und"
>    3) "Verwaltungsrecht."
>   Because of the still open issue 64400 the spell checker will have to
>   check the word "Arbeits" and that one does not exist. Thus it will be
>   marked as incorrect. :-(
>  
> But if issue 64400 gets fixed and the dash becomes part of the word to
> check, then the spell checker gets to check "Arbeits-", and that word
> should exist in a German spell check dictionary. And AFAIK it already
> does exist like this in the current German dictionary. The only thing
> preventing the dictionary and the spell checker from dealing with this
> properly is that currently leading/trailing dashes are not treated as
> part of the word for the purpose of spell checking.
>  

Thanks for the update. Is this feature activated per language or globally?

Does this feature affect other features/functions/specifications in
OpenOffice.org?

Is there anything we are still uncertain or hesitant about?

Best
Per



Re: Anyone familiar with the ICU?

Javier SOLA
In reply to this post by ge-7
In Spanish, when a hyphen is used in place of a comma, it is attached to the
words -as in this case- whereas in English it is detached from them - as in
this example - so in Spanish it must be split off when detecting word
boundaries. I do not know how this works in German. Do you use hyphens
as parentheses, as in English or Spanish, or do you never do that?

This would have to be language dependent.

Word boundary analysis is in the OpenOffice.org SVN (even though it was taken
from ICU). It calls ICU, but this modification should be done in OOo,
specifically for German (you need to create a new word boundary analysis
rules file for German). Karl is the person in charge.

Javier

ge wrote:

> Thomas,
>
> Thanks for the detailed explanation.
> I agree with you in the conclusion.
>
> -eleonora
>

Re: Anyone familiar with the ICU?

thomas.lange
In reply to this post by Per Eriksson-2
Hi Per,

> Hi Thomas
>
> Thomas Lange - Sun Germany - ham02 - Hamburg wrote:
>> If the problem could be handled by the grammar checker alone, everything
>> would be fine and we would have no need to fix issue 64400.
>> But unfortunately we are not yet that far along the planned road, and it
>> will still take us a great deal of time. Currently spell checking and
>> grammar checking are still two completely unrelated processes and thus
>> grammar checking has no way to influence (namely overrule) results that
>> are reported by the spell checker.
>>
>> Thus for example the current scenario for the German text
>>   "Arbeits- und Verwaltungsrecht."
>> is like this:
>> - the grammar checker should not find any problem at all
>> - but the spell checker gets to see the following three words:
>>    1) "Arbeits"
>>    2) "und"
>>    3) "Verwaltungsrecht."
>>   Because of the still open issue 64400 the spell checker will have to
>>   check the word "Arbeits" and that one does not exist. Thus it will be
>>   marked as incorrect. :-(
>>  
>> But if issue 64400 gets fixed and the dash becomes part of the word to
>> check, then the spell checker gets to check "Arbeits-", and that word
>> should exist in a German spell check dictionary. And AFAIK it already
>> does exist like this in the current German dictionary. The only thing
>> preventing the dictionary and the spell checker from dealing with this
>> properly is that currently leading/trailing dashes are not treated as
>> part of the word for the purpose of spell checking.
>>  
>
> Thanks for the update. Is this feature activated per language or
> globally?

The mid-hyphen as part of the word will be activated globally.
As for the pre- and post-hyphen, those will be available only per
language upon request.
I think most languages will not make use of this.


>
> Does this feature affect other features/functions/specifications in
> OpenOffice.org?
It should affect only features that use the dictionary mode of the
breakiterator (anything else would be a bug).
The dictionary mode is used for spell checking, thesaurus and
hyphenation. Thus the respective dictionaries may influence the results.
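
The policy Thomas describes (inner hyphens kept for every language, leading/trailing hyphens only for languages that ask for them) and Javier's Spanish concern can be illustrated together. This is a hypothetical sketch: `EDGE_HYPHEN_LANGUAGES` and `words_for_checking` are invented names, the language codes are example values rather than an actual OOo configuration, and the regular expression only approximates real word-boundary rules.

```python
import re

# Hypothetical per-language switch for leading/trailing hyphens; the codes
# below are example values, not an actual OOo configuration.
EDGE_HYPHEN_LANGUAGES = {"de", "nl", "sv"}

LETTERS = r"[^\W\d_]+"
# Inner hyphens are kept for every language (the "global" mid-hyphen rule).
CORE = rf"{LETTERS}(?:-{LETTERS})*"


def words_for_checking(text: str, lang: str) -> list[str]:
    """Split text into words, keeping edge hyphens only for opted-in languages."""
    pattern = rf"-?{CORE}-?" if lang in EDGE_HYPHEN_LANGUAGES else CORE
    return re.findall(pattern, text)


print(words_for_checking("Arbeits- und Verwaltungsrecht", "de"))
# ['Arbeits-', 'und', 'Verwaltungsrecht']
print(words_for_checking("el guion -como en este caso- va pegado", "es"))
# ['el', 'guion', 'como', 'en', 'este', 'caso', 'va', 'pegado']
```

If "es" were added to the set, the Spanish sentence would yield "-como" and "caso-", which is the problem Javier raises and why the edge rule has to be opt-in per language.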


Thomas




Re: Anyone familiar with the ICU?

Ruud Baars-2
Thomas Lange wrote:

> Hi Per,
>> Hi Thomas
>>
>> Thomas Lange - Sun Germany - ham02 - Hamburg wrote:
>>> If the problem could be handled by the grammar checker alone, everything
>>> would be fine and we would have no need to fix issue 64400.
>>> But unfortunately we are not yet that far along the planned road,
>>> and it
>>> will still take us a great deal of time. Currently spell checking and
>>> grammar checking are still two completely unrelated processes and thus
>>> grammar checking has no way to influence (namely overrule) results that
>>> are reported by the spell checker.
>>>
>>> Thus for example the current scenario for the German text
>>>   "Arbeits- und Verwaltungsrecht."
>>> is like this:
>>> - the grammar checker should not find any problem at all
>>> - but the spell checker gets to see the following three words:
>>>    1) "Arbeits"
>>>    2) "und"
>>>    3) "Verwaltungsrecht."
>>>   Because of the still open issue 64400 the spell checker will have to
>>>   check the word "Arbeits" and that one does not exist. Thus it will be
>>>   marked as incorrect. :-(
>>>  
>>> But if issue 64400 gets fixed and the dash becomes part of the word to
>>> check, then the spell checker gets to check "Arbeits-", and that word
>>> should exist in a German spell check dictionary. And AFAIK it already
>>> does exist like this in the current German dictionary. The only thing
>>> preventing the dictionary and the spell checker from dealing with this
>>> properly is that currently leading/trailing dashes are not treated as
>>> part of the word for the purpose of spell checking.
>>>  
>>
>> Thanks for the update. Is this feature activated per language or
>> globally?
>
> The mid-hyphen as part of the word will be activated globally.
> As for the pre- and post-hyphen, those will be available only per
> language upon request.
Thomas, should the request come from the localization team? If so, I will
forward the issue to them. If not, consider this the request for Dutch.
There is no problem with the 'in between sentences', since these require
a long dash with spaces around it.

> I think most languages will not make use of this.
>
>
>>
>> Does this feature affect other features/functions/specifications in
>> OpenOffice.org?
> It should affect only features that use the dictionary mode of the
> breakiterator (anything else would be a bug).
> The dictionary mode is used for spell checking, thesaurus and
> hyphenation. Thus the respective dictionaries may influence the results.
>
>
> Thomas
>

Re: Anyone familiar with the ICU?

thomas.lange

Hi Ruud,

> ...
>>>
>>> Thanks for the update. Is this feature activated per language or
>>> globally?
>>
>> The mid-hyphen as part of the word will be activated globally.
>> As for the pre- and post-hyphen, those will be available only per
>> language upon request.
> Thomas, should the request come from the localization team? If so, I
> will forward the issue to them. If not, consider this the request for
> Dutch. There is no problem with the 'in between sentences', since these
> require a long dash with spaces around it.

If you want the special pre- and postfix handling for hyphens as well,
just add a respective line to issue 64400 (see my latest comment there).

Thomas




Re: Anyone familiar with the ICU?

Ruud Baars-2
Thomas Lange wrote:

>
> Hi Ruud,
>
>> ...
>>>>
>>>> Thanks for the update. Is this feature activated per language or
>>>> globally?
>>>
>>> The mid-hyphen as part of the word will be activated globally.
>>> As for the pre- and post-hyphen, those will be available only per
>>> language upon request.
>> Thomas, should the request come from the localization team? If so, I
>> will forward the issue to them. If not, consider this the request for
>> Dutch. There is no problem with the 'in between sentences', since
>> these require a long dash with spaces around it.
>
> If you want the special pre- and postfix handling for hyphens as well,
> just add a respective line to issue 64400 (see my latest comment there).
I passed this one on to the localisation team. We will have to prepare a
3.2 dictionary anyway. Since we are on the verge of releasing a new
dictionary for Dutch, we could create a 3.2-ready version of it.
It is a bit of extra work, but it would be helpful for the users.
Could you inform us as soon as 3.2 (any prerelease) has implemented this
feature, so we can start preparing and testing?


>
> Thomas
>

Re: Anyone familiar with the ICU?

thomas.lange

>
> Could you inform us as soon as 3.2 (any prerelease) has implemented
> this feature, so we can start preparing and testing?
Sure. It should become available around m52.

Thomas




Re: Anyone familiar with the ICU?

Mathias Bauer
In reply to this post by Ruud Baars-2
Ruud Baars wrote:

> Thomas Lange wrote:
>>
>> Hi Ruud,
>>
>>> ...
>>>>>
>>>>> Thanks for the update. Is this feature activated per language or
>>>>> globally?
>>>>
>>>> The mid-hyphen as part of the word will be activated globally.
>>>> As for the pre- and post-hyphen, those will be available only per
>>>> language upon request.
>>> Thomas, should the request come from the localization team? If so, I
>>> will forward the issue to them. If not, consider this the request for
>>> Dutch. There is no problem with the 'in between sentences', since
>>> these require a long dash with spaces around it.
>>
>> If you want the special pre- and postfix handling for hyphens as well,
>> just add a respective line to issue 64400 (see my latest comment there).
> I passed this one on to the localisation team. We will have to prepare a
> 3.2 dictionary anyway. Since we are on the verge of releasing a new
> dictionary for Dutch, we could create a 3.2-ready version of it.
> It is a bit of extra work, but it would be helpful for the users.
> Could you inform us as soon as 3.2 (any prerelease) has implemented this
> feature, so we can start preparing and testing?

I wonder whether that means that we must provide two dictionary
extensions then: one for 3.0/3.1 and one for 3.2. Or will this new
dictionary work fine in 3.1 as well, since it will just contain a few words
that wouldn't be passed to the spell checker in 3.1 anyway?

Regards,
Mathias

--
Mathias Bauer (mba) - Project Lead OpenOffice.org Writer
OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
Please don't reply to "[hidden email]".
I use it for the OOo lists and only rarely read other mails sent to it.



Re: Anyone familiar with the ICU?

Ruud Baars-2
Mathias Bauer wrote:

> Ruud Baars wrote:
>
>  
>> Thomas Lange wrote:
>>    
>>> Hi Ruud,
>>>
>>>      
>>>> ...
>>>>        
>>>>>> Thanks for the update. Is this feature activated per language or
>>>>>> globally?
>>>>>>            
>>>>> The mid-hyphen as part of the word will be activated globally.
>>>>> As for the pre- and post-hyphen, those will be available only per
>>>>> language upon request.
>>>>>          
>>>> Thomas, should the request come from the localization team? If so, I
>>>> will forward the issue to them. If not, consider this the request for
>>>> Dutch. There is no problem with the 'in between sentences', since
>>>> these require a long dash with spaces around it.
>>>>        
>>> If you want the special pre- and postfix handling for hyphens as well,
>>> just add a respective line to issue 64400 (see my latest comment there).
>>>      
>> I passed this one on to the localisation team. We will have to prepare a
>> 3.2 dictionary anyway. Since we are on the verge of releasing a new
>> dictionary for Dutch, we could create a 3.2-ready version of it.
>> It is a bit of extra work, but it would be helpful for the users.
>> Could you inform us as soon as 3.2 (any prerelease) has implemented this
>> feature, so we can start preparing and testing?
>>    
>
> I wonder whether that means that we must provide two dictionary
> extensions then: one for 3.0/3.1 and one for 3.2. Or will this new
> dictionary work fine in 3.1 as well, since it will just contain a few words
> that wouldn't be passed to the spell checker in 3.1 anyway?
>  
Currently we generate the dictionary by splitting an entry like 'au bain-marie'
into 'au', 'bain' and 'marie'.
In the 3.2 version it should be 'au' and 'bain-marie', which is better.
This list will not be fully functional in 3.1 unless we still add
'bain' and 'marie', but then the spell checking is of lower quality than
could be achieved...
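
The difference between the two word lists can be sketched as follows. This is a hypothetical illustration of the generation step described above, not the Dutch project's actual build script; `word_list_entries` and its `split_hyphens` flag are invented names.

```python
def word_list_entries(phrases: list[str], split_hyphens: bool) -> list[str]:
    """Turn dictionary phrases into single-word entries (hypothetical helper)."""
    entries: list[str] = []
    for phrase in phrases:
        for word in phrase.split():  # words are separated by whitespace
            # A 3.1-style list also splits at hyphens, because pre-3.2 word
            # splitting breaks "bain-marie" apart before spell checking.
            parts = word.split("-") if split_hyphens else [word]
            for part in parts:
                if part and part not in entries:
                    entries.append(part)
    return entries


phrases = ["au bain-marie"]
print(word_list_entries(phrases, split_hyphens=True))   # ['au', 'bain', 'marie']
print(word_list_entries(phrases, split_hyphens=False))  # ['au', 'bain-marie']
```

In the 3.1-style list, 'bain' and 'marie' become independent entries, so they would presumably also be accepted on their own elsewhere in a text, which is the quality loss mentioned above.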

So in fact, although it still has to be tested since the change hasn't
been committed yet, the worst case is that there will be a 3.2 and a
3.1 version...
We are, however, planning to keep the 3.1 version working in 3.2 (using
BREAK - in Hunspell). This also needs a test.
But with the new add-on structure there is no problem, is there?
A minimal version could be specified.
> Regards,
> Mathias
>
>  


Re: Anyone familiar with the ICU?

Mathias Bauer
Ruud Baars wrote:

> Mathias Bauer wrote:
>> Ruud Baars wrote:
>>
>>  
>>> Thomas Lange wrote:
>>>    
>>>> Hi Ruud,
>>>>
>>>>      
>>>>> ...
>>>>>        
>>>>>>> Thanks for the update. Is this feature activated per language or
>>>>>>> globally?
>>>>>>>            
>>>>>> The mid-hyphen as part of the word will be activated globally.
>>>>>> As for the pre- and post-hyphen, those will be available only per
>>>>>> language upon request.
>>>>>>          
>>>>> Thomas, should the request come from the localization team? If so, I
>>>>> will forward the issue to them. If not, consider this the request for
>>>>> Dutch. There is no problem with the 'in between sentences', since
>>>>> these require a long dash with spaces around it.
>>>>>        
>>>> If you want the special pre- and postfix handling for hyphens as well,
>>>> just add a respective line to issue 64400 (see my latest comment there).
>>>>      
>>> I passed this one on to the localisation team. We will have to prepare a
>>> 3.2 dictionary anyway. Since we are on the verge of releasing a new
>>> dictionary for Dutch, we could create a 3.2-ready version of it.
>>> It is a bit of extra work, but it would be helpful for the users.
>>> Could you inform us as soon as 3.2 (any prerelease) has implemented this
>>> feature, so we can start preparing and testing?
>>>    
>>
>> I wonder whether that means that we must provide two dictionary
>> extensions then: one for 3.0/3.1 and one for 3.2. Or will this new
>> dictionary work fine in 3.1 as well, since it will just contain a few words
>> that wouldn't be passed to the spell checker in 3.1 anyway?
>>  
> Currently we generate the dictionary by splitting an entry like 'au bain-marie'
> into 'au', 'bain' and 'marie'.
> In the 3.2 version it should be 'au' and 'bain-marie', which is better.
> This list will not be fully functional in 3.1 unless we still add
> 'bain' and 'marie', but then the spell checking is of lower quality than
> could be achieved...
>
> So in fact, although it still has to be tested since the change hasn't
> been committed yet, the worst case is that there will be a 3.2 and a
> 3.1 version...
> We are, however, planning to keep the 3.1 version working in 3.2 (using
> BREAK - in Hunspell). This also needs a test.
> But with the new add-on structure there is no problem, is there?
> A minimal version could be specified.

Right, from the development side there is no technical problem with
that. As you wrote, it's the opposite: our new extension mechanism makes that
much easier than in OOo 2.x.

I just wanted to have that clarified and make sure that we have
everything in place and tested early enough. And I wanted to discuss the
user problems that this might create.

For me the need to have two dictionaries just means that we must make
sure that both are offered for download in the extensions
repository. We can decide later on when it's time to remove the "old"
version (maybe if we have fewer than n downloads in a certain time
frame). OOo versions containing a Dutch dictionary will of course always
prebundle the new version, starting with 3.2.

The tricky part is updating. We should not change the extension
identifier, so that people who have installed the Dutch extension
manually (because it was not pre-bundled in their OOo version) get an
update notification. AFAIK the notification process is smart enough not
to notify if the new extension requires a newer OOo version than the
user has installed. But that should be tested.
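
The update rule expected above can be sketched as a simple decision function. Everything here is an assumption for illustration: `parse_version`, `should_notify` and the version numbers are hypothetical, and the real OOo extension manager may compare versions differently, which is exactly what still needs testing.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn "3.2" or "3.2.0" into a comparable tuple; trailing zeros are dropped."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def should_notify(installed_ext: str, offered_ext: str,
                  ooo_version: str, offered_min_ooo: str) -> bool:
    """Offer an update only if it is newer AND the user's OOo meets its minimum."""
    newer = parse_version(offered_ext) > parse_version(installed_ext)
    compatible = parse_version(ooo_version) >= parse_version(offered_min_ooo)
    return newer and compatible


# Hypothetical versions: a 3.2-only Dutch dictionary offered to two users.
print(should_notify("1.0", "2.0", ooo_version="3.1", offered_min_ooo="3.2"))  # False
print(should_notify("1.0", "2.0", ooo_version="3.2", offered_min_ooo="3.2"))  # True
```

Dropping trailing zeros keeps "3.2" and "3.2.0" equal; a plain tuple comparison would otherwise rank (3, 2) below (3, 2, 0).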

Regards,
Mathias


Changed hyphen policy for Dutch (and other languages)

Ruud Baars-2
Together with one of the OOo people, we have tested our newest
dictionary for Dutch using the new way OOo 3.2 treats words with
hyphens in the middle.

I am happy to inform you that our tests were successful.
And indeed, it is a promising improvement for Dutch.

It will be even more so once we are finally able to finish our dictionary that
uses compounding algorithms.

Thanks.

Ruud


Re: Changed hyphen policy for Dutch (and other languages)

thomas.lange

Hello Ruud,

Ruud Baars wrote:

> Together with one of the OOo people, we have tested our newest
> dictionary for Dutch using the new way OOo 3.2 treats words with
> hyphens in the middle.
>
> I am happy to inform you that our tests were successful.
> And indeed, it is a promising improvement for Dutch.
>
> It will be even more so once we are finally able to finish our dictionary that
> uses compounding algorithms.
>  

That's great news! :-)

Thomas


