Spellchecker: one-letter error not detected

Spellchecker: one-letter error not detected

Laurent Godard-3
Hi

Single letters do not seem to be detected as errors, even when they are
not in the dictionary.

For example, in French, any of e, i, u or é standing alone is not a valid
word, yet it is treated as valid.
Other languages are also affected.

Can you reproduce this?
Is it a known bug?
Where in the source should I dig?

Thanks

laurent


--
Laurent Godard <[hidden email]> - Ingénierie OpenOffice.org -
http://www.indesko.com
Nuxeo Enterprise Content Management >> http://www.nuxeo.com -
http://www.nuxeo.org
Livre "Programmation OpenOffice.org", Eyrolles 2004-2006

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Spellchecker: one-letter error not detected

thomas.lange

Hi Laurent,


> Single letters do not seem to be detected as errors, even when they are
> not in the dictionary.
>
> For example, in French, any of e, i, u or é standing alone is not a valid
> word, yet it is treated as valid.
> Other languages are also affected.
>
> Can you reproduce this?
> Is it a known bug?
> Where in the source should I dig?

I can only speak for Writer here.
It is a very old decision and I do not know the exact reason for it but
there is code that actually only calls the spell checker for words
longer than one character.

If you would like to have that changed, we should ask Frank whether he
sees a problem with changing that behavior.

Basically I would expect no problem. But maybe we would still want to
exclude punctuation characters. I can readily imagine that this may be
the original reason to exclude single characters from spell checking.
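To illustrate the behavior described above, here is a minimal Python sketch. It is not the actual Writer code. The function names and the toy word list are assumptions made for illustration. The first guard mimics "only words longer than one character"; the second is one hypothetical way to check single letters while still excluding punctuation.

```python
import unicodedata

# Toy French word list (assumption); a real checker queries its dictionary.
DICTIONARY = {"a", "à", "y", "ô", "le", "chat"}

def should_spell_check_old(word: str) -> bool:
    """Mimics the guard described above: only words longer than one character."""
    return len(word) > 1

def should_spell_check_new(word: str) -> bool:
    """Hypothetical replacement: check single letters too, but skip punctuation."""
    if not word:
        return False
    if len(word) == 1:
        # Unicode general categories starting with "P" are punctuation.
        return not unicodedata.category(word).startswith("P")
    return True

def misspelled(words, should_check):
    """Return the words that get checked and are missing from the dictionary."""
    return [w for w in words if should_check(w) and w.lower() not in DICTIONARY]

words = ["le", "chat", "e", "é", ",", "a"]
print(misspelled(words, should_spell_check_old))  # [] : single letters are never checked
print(misspelled(words, should_spell_check_new))  # ['e', 'é'] : the comma is still skipped
```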

->FME: What do you think Frank?


Thomas


Re: Spellchecker: one-letter error not detected

Laurent Godard-3
Hi Thomas,

> I can only speak for Writer here.
> It is a very old decision and I do not know the exact reason for it but
> there is code that actually only calls the spell checker for words
> longer than one character.
>

Thanks a lot for this clear response

> If you would like to have that changed, we should ask Frank whether he
> sees a problem with changing that behavior.
>
> Basically I would expect no problem. But maybe we would still want to
> exclude punctuation characters. I can readily imagine that this may be
> the original reason to exclude single characters from spell checking.
>

Yes for punctuation, but spell checking one-character words is
mandatory in many languages (I am not saying all).

> ->FME: What do you think Frank?
>

Thanks a lot Thomas

Laurent



Re: Spellchecker: one-letter error not detected

Lars Aronsson
In reply to this post by Laurent Godard-3
Laurent Godard wrote:

> Single letters do not seem to be detected as errors, even when they are
> not in the dictionary.

A similar question: Has anybody tried to spell check whitespace?
That is, to write software that approves comma followed by
whitespace, but detects an error if whitespace is followed by
comma.  Today's spell checkers (back to ispell) all seem to split
the text into words and non-words and then only spell check the
words.  Wouldn't it make just as much sense to spell check the
non-words?  There are rules for each language. For example,
whitespace before colon is OK in French, but mostly wrong in
English and Swedish.
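A rough sketch of what such a whitespace check could look like, in Python. The rule table and the function name are hypothetical, and the rules cover only the two cases mentioned above (whitespace before a comma, and whitespace before a colon depending on the language).

```python
import re

# Hypothetical per-language rule: is whitespace allowed before a colon?
SPACE_BEFORE_COLON_OK = {"fr": True, "en": False, "sv": False}

def check_spacing(text: str, lang: str):
    """Return sorted (offset, message) pairs for spacing problems."""
    errors = []
    # Whitespace directly before a comma is flagged for every language here.
    for m in re.finditer(r"\s+,", text):
        errors.append((m.start(), "whitespace before comma"))
    # Whitespace before a colon is flagged only where the language forbids it.
    if not SPACE_BEFORE_COLON_OK.get(lang, True):
        for m in re.finditer(r"\s+:", text):
            errors.append((m.start(), "whitespace before colon"))
    return sorted(errors)

print(check_spacing("Hello , world", "en"))
print(check_spacing("Note : ok", "fr"))
print(check_spacing("Note : ok", "en"))
```

Note that the check needs the raw text, not a list of words, because the error lies in what sits between the words.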

You could also want to spell-check numbers so that decimals and
thousands are properly written 1,234,567.89 and not 1,23,4567.89.  
(For Swedish, that would be decimal comma and non-breakable space
for the thousands instead.) Or if it is an IP address
123.456.78.90 the software could remark that 456 is out of range.
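As a sketch of the two number checks, assuming English-style grouping and plain dotted-quad IPv4 addresses (the function names are made up for this example):

```python
import re

# English-style grouping: 1 to 3 leading digits, then groups of exactly
# three digits separated by commas, then an optional decimal part.
GROUPED_EN = re.compile(r"[0-9]{1,3}(,[0-9]{3})*(\.[0-9]+)?")

# Four dot-separated runs of digits; the range is checked separately.
DOTTED_QUAD = re.compile(r"[0-9]+(\.[0-9]+){3}")

def valid_grouped_number(token: str) -> bool:
    """True if the token is a correctly grouped English-style number."""
    return GROUPED_EN.fullmatch(token) is not None

def bad_ip_octets(token: str):
    """Return the octets of a dotted quad that fall outside 0..255."""
    if DOTTED_QUAD.fullmatch(token) is None:
        return []  # not shaped like an IPv4 address, nothing to report
    return [part for part in token.split(".") if int(part) > 255]

print(valid_grouped_number("1,234,567.89"))  # True
print(valid_grouped_number("1,23,4567.89"))  # False
print(bad_ip_octets("123.456.78.90"))        # ['456']
```

A Swedish variant would need a different pattern (decimal comma, non-breaking space between groups), which is why such rules would have to be per-language.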

A more clever spell checker could also detect 31st April as an
error.  And even Monday June 19, 2007, since that is a Tuesday.
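For the date examples, a small Python sketch using the standard `datetime` module, which assumes the Gregorian calendar. The function name and the message strings are illustrative only.

```python
import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def check_date(year, month, day, claimed_weekday=None):
    """Return an error message, or None if the date looks consistent."""
    try:
        date = datetime.date(year, month, day)
    except ValueError:
        # Raised for impossible dates such as 31 April.
        return "no such date"
    actual = WEEKDAYS[date.weekday()]  # weekday(): Monday is 0
    if claimed_weekday is not None and claimed_weekday != actual:
        return f"{claimed_weekday} should be {actual}"
    return None

print(check_date(2007, 4, 31))            # no such date
print(check_date(2007, 6, 19, "Monday"))  # Monday should be Tuesday
```

The hard part in practice is not this check but recognizing the date in running text and knowing which calendar the author means.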

Will such tests be part of a planned grammar checker?


--
  Lars Aronsson ([hidden email])
  Aronsson Datateknik - http://aronsson.se


Re: Spellchecker: one-letter error not detected

Daniel Naber-9
On Monday 18 June 2007 05:00, Lars Aronsson wrote:

> A similar question: Has anybody tried to spell check whitespace?
> That is, to write software that approves comma followed by
> whitespace,

LanguageTool (www.languagetool.org) checks this already.

Regards
 Daniel

--
http://www.danielnaber.de


Re: Spellchecker: one-letter error not detected

thomas.lange
In reply to this post by Laurent Godard-3

Hello Lars,

> Laurent Godard wrote:
>
>> Single letters do not seem to be detected as errors, even when they are
>> not in the dictionary.
>
> A similar question: Has anybody tried to spell check whitespace?

No. This is not possible with the current spell checking design.
The break iterator is used to obtain the text word by word. And even if it
is configured to return punctuation as well, it does not include
whitespace. (At least this is the state of affairs as I know it.)
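A toy Python stand-in for this word-by-word iteration shows why whitespace cannot be checked this way. The function below is not the break iterator API, only a regex-based imitation with an assumed `include_punctuation` switch.

```python
import re

def iterate_words(text: str, include_punctuation: bool = False):
    """Yield word tokens, optionally punctuation too, but never whitespace."""
    pattern = r"\w+|[^\w\s]" if include_punctuation else r"\w+"
    return re.findall(pattern, text)

good = iterate_words("Hello, world", include_punctuation=True)
bad = iterate_words("Hello , world", include_punctuation=True)
print(good)  # ['Hello', ',', 'world']
print(bad)   # ['Hello', ',', 'world']
```

Both inputs produce the same tokens, so a checker that only ever sees the tokens has no way to tell the misplaced space from the correct one.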

> That is, to write software that approves comma followed by
> whitespace, but detects an error if whitespace is followed by
> comma.  Today's spell checkers (back to ispell) all seem to split
> the text into words and non-words and then only spell check the
> words.  Wouldn't it make just as much sense to spell check the
> non-words?  There are rules for each language. For example,
> whitespace before colon is OK in French, but mostly wrong in
> English and Swedish.

In general it does not make sense, because of the various spell checker
implementations I have encountered, none ever featured this for
word-by-word spell checking. And word-by-word is the limitation of the
current spell checker API.
A sentence-based spell or grammar checker should be able to do that.
But whether it actually does will depend on the specific implementation.


> You could also want to spell-check numbers so that decimals and
> thousands are properly written 1,234,567.89 and not 1,23,4567.89.  
> (For Swedish, that would be decimal comma and non-breakable space
> for the thousands instead.) Or if it is an IP address
> 123.456.78.90 the software could remark that 456 is out of range.

Spell checking numbers is already possible (results depend on the
spell checker used).
There is an option for this (something like "check words with numbers"),
and by default it is turned off.

> A more clever spell checker could also detect 31st April as an
> error. And even Monday June 19, 2007, since that is a Tuesday.

I think here you are expecting too much from a spell or grammar checker.
Even though it can be done, it is very unlikely.
It is similar to checking "green sun" and "blue apple", and nobody is going
to do that, especially since such constructs might be OK in literature
(e.g. fiction).

Also, there are other calendars in use, e.g. Julian or Byzantine,
and the day of the week varies in those. But it should still be OK to
write something like "Monday June 19, 2007 in Julian calendar corresponds
to ... in the Gregorian calendar." Checking that would require actual
understanding of the sentence.

> Will such tests be part of a planned grammar checker?

Don't know...
What a particular grammar checker chooses to check, and what not, is up
to its implementation.
I can only tell you that the API would allow for such things,
since the checker gets passed the whole sentence (unlike what
happens for spell checking) and may report errors anywhere in it.


Regards,
Thomas



Re: Spellchecker: one-letter error not detected

Laurent Godard-3
In reply to this post by thomas.lange
Hi Thomas, Hi Frank, Hi all

> Basically I would expect no problem. But maybe we would still want to
> exclude punctuation characters. I can readily imagine that this may be
> the original reason to exclude single characters from spell checking.
>
> ->FME: What do you think Frank?

Can we move forward on this topic? I see it as a major drawback for spell checking.

any opinion ?
where to look in the sources ?
any already opened issue ?

Laurent



Re: Spellchecker: one-letter error not detected

Frank Meies
Hi all,

On 06/19/07 12:01, Laurent Godard wrote:

> Hi Thomas, Hi Frank, Hi all
>
>> Basically I would expect no problem. But maybe we would still want to
>> exclude punctuation characters. I can readily imagine that this may be
>> the original reason to exclude single characters from spell checking.
>>
>> ->FME: What do you think Frank?
>
> Can we move forward on this topic? I see it as a major drawback for spell checking.
>
> any opinion ?
> where to look in the sources ?

sw/source/core/txtnode/txtedt.cxx, search for 'rWord.Len() > 1'

Regards,

Frank

--
Frank Meies (fme) - OpenOffice.org Writer
OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS


Re: Spellchecker: one-letter error not detected

Laurent Godard-3
Hi Frank

>> any opinion ?
>> where to look in the sources ?
>
> sw/source/core/txtnode/txtedt.cxx, search for 'rWord.Len() > 1'
>

thanks a lot
Any expected impact? What about punctuation, as Thomas noted?

Laurent




Re: Spellchecker: one-letter error not detected

Frank Meies
Hi Laurent,

On 06/19/07 12:18, Laurent Godard wrote:

>
>>> any opinion ?
>>> where to look in the sources ?
>>
>> sw/source/core/txtnode/txtedt.cxx, search for 'rWord.Len() > 1'
>>
>
> thanks a lot
> Any expected impact? What about punctuation, as Thomas noted?

Don't know. If all spell checkers consider them as 'valid' words,
there's no need to exclude punctuation characters.

Regards,

Frank



Re: Spellchecker: one-letter error not detected

Marcin Miłkowski
In reply to this post by Laurent Godard-3
Laurent Godard wrote:

> Hi Frank
>
>>> any opinion ?
>>> where to look in the sources ?
>>
>> sw/source/core/txtnode/txtedt.cxx, search for 'rWord.Len() > 1'
>>
>
> thanks a lot
> Any expected impact? What about punctuation, as Thomas noted?

Well, this can slow down the spell-checking process a little. I'm not
sure if whitespace would be forwarded to the spellchecker as well after
removing 'rWord.Len() > 1', but if so, we should check if all checkers,
i.e., StarOffice and Hunspell, would deal with that correctly.

Regards,
Marcin


Re: Spellchecker: one-letter error not detected

thomas.lange
In reply to this post by Laurent Godard-3

Hi,


Frank Meies wrote:

> Hi Laurent,
>
> On 06/19/07 12:18, Laurent Godard wrote:
>
>>
>>>> any opinion ?
>>>> where to look in the sources ?
>>>
>>> sw/source/core/txtnode/txtedt.cxx, search for 'rWord.Len() > 1'
>>>
>>
>> thanks a lot
>> Any expected impact? What about punctuation, as Thomas noted?

I just checked by disabling the respective code.
It seems we are lucky: punctuation will still be skipped. Some other
characters, like CTRL 'minus', CTRL 'space', <, > and @, also seem to be
no problem.
But in the end this may be language-specific because of the
break iterator. I checked only with English.

> Don't know. If all spell checkers consider them as 'valid' words,
> there's no need to exclude punctuation characters.

Definitely not all spell checkers will consider punctuation or characters
like +, @, < to be valid.


Given the little test mentioned above, I see no specific problem
right now with changing the behavior.
If anyone is willing to provide me with an issue for that, I'm willing to
fix it. The issue is just for reference, so that I'm not changing this out
of the blue.


Regards,
Thomas



Re: Spellchecker: one-letter error not detected

Laurent Godard-3
Hi Thomas

> Given the little test mentioned above, I see no specific problem
> right now with changing the behavior.
> If anyone is willing to provide me with an issue for that, I'm willing to
> fix it. The issue is just for reference, so that I'm not changing this out
> of the blue.
>

I opened
http://www.openoffice.org/issues/show_bug.cgi?id=78734

let us know

Laurent

