[SoC] Grammar checker API

[SoC] Grammar checker API

Bruno Sant'Anna-2
Hi,

While discussing this project with Carlos Menezes, we decided to summarize the topics raised in previous e-mails. Feel free to comment.

1. Grammar Checker API, now:
  1. It makes sense to work with just one language for now, so foreign words in the text should be ignored.
  2. The grammar checker should run in a separate thread so that it does not block OpenOffice.
  3. The grammar checker should be able to check inside table cells, text headers and footers, enumerations and text boxes (Drawing Objects).
  4. The grammar checker should determine where sentences end, because that is not trivial (e.g., abbreviations). So OpenOffice should just hand the grammar checker an entire block of text, such as a paragraph.
  5. OpenOffice should be able to replace the wrong sentences.
  6. I think we should create a unified user interface that any grammar checker can use.
  7. Automatic checking should run in the background and mark wrong sentences with a wavy line. It could be enabled and disabled, like the spell checker.
  8. The API should pass a paragraph (for example) to the grammar checker, which should return a list. If there is no mistake in the paragraph, the list should be empty; otherwise the list should contain:
    1. Where the mistake is in the paragraph (initial index + final index).
    2. A list of suggestions to correct that mistake (this list can be empty if the checker is not prepared to guess).
    3. A comment about the mistake, e.g. what a grammar book would say about it.

2. Grammar Checker API, future:
  1. Suppose it is possible to manage several languages in a text and there is a Language Guessing API. Then, when OpenOffice discovers the language of a sentence, it automatically loads the grammar checker for the corresponding language.
  2. Optimize memory allocation, input/output and processing.
  3. Correct possible bugs.
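
To make point 1.8 concrete, here is a minimal sketch of what the returned list could look like. It is only an illustration: `GrammarError`, `check_paragraph` and the toy "the the" rule are hypothetical names and behaviour, not part of any existing OpenOffice API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GrammarError:
    """One mistake found in a paragraph (hypothetical structure)."""
    start: int                      # index of the first character of the mistake
    end: int                        # index one past the last character
    suggestions: List[str] = field(default_factory=list)  # may be empty
    comment: str = ""               # explanation, e.g. what a grammar book says


def check_paragraph(paragraph: str) -> List[GrammarError]:
    """Toy checker: flags the doubled word "the the" and nothing else."""
    errors = []
    needle = "the the"
    pos = paragraph.find(needle)
    while pos != -1:
        errors.append(GrammarError(pos, pos + len(needle), ["the"], "Repeated word."))
        pos = paragraph.find(needle, pos + len(needle))
    return errors
```

An empty list means the paragraph has no mistakes; each entry carries the three pieces of information listed in 1.8.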

Bruno Sant'Anna


Re: [SoC] Grammar checker API

Daniel Naber-4
On Tuesday 23 May 2006 22:15, Bruno Sant'Anna wrote:

>   5. OpenOffice should be able to replace the wrong sentences.

This might seem obvious, but the original styles and layout should be kept
when text is replaced.

> 1. Where is the mistake in the paragraph (initial index + final index).
>       2. A list of suggestions to correct that mistake (this list can
>       be empty if checker is not prepared to guess).
>       3. A comment about mistake, e.g. what a grammar book should say
>       about it.

There should also be a way for the user to disable the rule, e.g. because
it is triggered too often (for correct sentences). Maybe one could even
have an option "accept here" which doesn't deactivate the rule but avoids
the display of the error message at this position. Of course this is not
that trivial...

regards
 Daniel

--
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [SoC] Grammar checker API

Marcin Miłkowski
In reply to this post by Bruno Sant'Anna-2
Bruno Sant'Anna wrote:
>    6. I think we should create an unified User Interface, for any
>       grammar checker use it.

Think also about setting general options for any grammar checker - like
setting the required language register (colloquial, official, etc. - it
should be up to the Grammar Checker how many registers and which
registers it actually supports). Now, should the register list be
defined generally for all languages or set by a grammar checker for a
particular language? Hard to say which is best.
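
One of the two options, letting each checker declare its own registers while a shared options dialog merely lists them, could be sketched as follows. All names here (`RegisterAwareChecker`, `set_register`) are hypothetical and only illustrate the idea.

```python
from typing import List, Optional


class RegisterAwareChecker:
    """Hypothetical checker that declares which registers it supports."""

    def __init__(self, language: str, registers: List[str]):
        self.language = language
        self.registers = list(registers)   # e.g. ["colloquial", "official"]
        self.active: Optional[str] = None  # no register selected yet

    def set_register(self, register: str) -> bool:
        """Activate a register if supported; report whether it was accepted."""
        if register in self.registers:
            self.active = register
            return True
        return False
```

The alternative, one register list shared by all languages, would move the list out of the checker and into the host application.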

Best regards,
Marcin Miłkowski (developing a Polish checker for LanguageTool right now
;) )


Re: [SoC] Grammar checker API

Jancs
In reply to this post by Bruno Sant'Anna-2
Quoting Bruno Sant'Anna <[hidden email]>:

> 2. Grammar Checker API, future:
>    1. Let's suppose it's possible to manage several languages in a text
>    and there is a Language Guessing API. Then, when OpenOffice discover
>    language of a sentence, it automatically loads grammar checker to
>    correspondent language.
>     2. Optimize memory allocation, input/output and processing.

everything sounds +/- OK, but the question is:

if the API is designed as described in point 1, what kind of titanic work
will be needed to bring point 2 to life?

Even now I see performance problems with OO-2.02, which has no grammar
checker and only 4 languages of spelling, hyphenation and thesaurus, no
matter whether on Linux or Windows. I should add that my machine is not
among the weakest (AMD64-3800/1GB RAM).

Janis
***


Re: [SoC] Grammar checker API

Joan Moratinos
In reply to this post by Bruno Sant'Anna-2
Bruno Sant'Anna wrote:

> Hi,
>
> Talking about this program with Carlos Menezes, we started to think
> about summarizing the topics open in previous e-mails. Feel free to
> comment ok?
>
> 1. Grammar Checker API, now:
>
>    1. It makes sense working with just one language now; so, foreign
>       words in the text should be ignored.
>    2. The grammar checker should run in a different thread to not block
>       OpenOffice.
>    3. The grammar checker should be able to check inside table cells,
>       text headers and footers, enumerations and text boxes (Drawing
>       Objects).
>    4. The grammar checker should determine end of the sentences, because
>       it is not so trivial (e.g., abbreviations). So, OpenOffice should
>       just provide to the grammar checker an entire block of text, like
>       a paragraph.

For the automatic checking in the background:
I have noticed that the Spanish grammar checker for MSWord tries to
check every time the user types a character that is a "candidate" for
ending a sentence (for example, a dot). If the user goes on typing in
the same paragraph, eventually some fragments are checked again (it seems
that there are "hard" ends, which can't be changed by the following text,
and "soft" ends, which depend on the text that follows (for example, an
abbreviation can appear at the end of the sentence or in the middle)). I
think that we should check the grammar as soon as possible, not only once
the whole paragraph has been typed.
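
A rough sketch of the "hard" versus "soft" distinction, under simplifying assumptions that are mine and not observed behaviour of any product: a candidate is a `.`, `!` or `?` followed by whitespace or the end of the text, and a candidate is "soft" only when it closes an entry of a small, made-up abbreviation list.

```python
import re
from typing import List, Tuple

# Assumed, tiny abbreviation list; a real checker would ship one per language.
ABBREVIATIONS = {"e.g.", "etc.", "Dr.", "Mr."}


def classify_sentence_ends(text: str) -> List[Tuple[int, str]]:
    """Return (index, kind) for each sentence-end candidate in text.

    "hard": '!' or '?', or a '.' that does not close a known abbreviation.
    "soft": a '.' that closes a known abbreviation, so whether the sentence
            really ends there depends on the text that follows.
    """
    ends = []
    # Only punctuation followed by whitespace or end of text is a candidate,
    # which skips the inner dot of "e.g.".
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        i = match.start()
        token = text[: i + 1].split()[-1]   # last word up to this character
        kind = "soft" if text[i] == "." and token in ABBREVIATIONS else "hard"
        ends.append((i, kind))
    return ends
```

A background checker could run as soon as a "hard" end is typed and re-check the fragment around a "soft" end once more text arrives.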

>    5. OpenOffice should be able to replace the wrong sentences.

The checker should preserve formatting, footnotes, etc. Ideally these
things should not be passed to the checker (the footnotes and the like
could be passed when the paragraph or the sentence that includes them
has been checked, for example), but if the user chooses to accept a
suggestion, the format (e.g. italics), the footnotes, etc. should remain
in their original places. Perhaps we could pass "markers" embedded in the
paragraph text and then return them in the corrected text to "align" the
original and the checked sentences.
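
To illustrate the alignment problem, here is a small sketch. Note that it does not implement the embedded-marker idea; it swaps in a simpler technique, keeping formatting as separate (start, end, attribute) spans and shifting them when a suggestion is applied. The function name and the span representation are assumptions made for the example.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, attribute), end exclusive


def apply_suggestion(text: str, spans: List[Span],
                     start: int, end: int, replacement: str):
    """Replace text[start:end] and shift formatting spans accordingly.

    Simplifying assumption: each span lies entirely before, entirely after,
    or entirely around the replaced range; partial overlaps are rejected.
    """
    delta = len(replacement) - (end - start)
    new_text = text[:start] + replacement + text[end:]
    new_spans = []
    for s, e, attr in spans:
        if e <= start:                 # span before the edit: unchanged
            new_spans.append((s, e, attr))
        elif s >= end:                 # span after the edit: shifted
            new_spans.append((s + delta, e + delta, attr))
        elif s <= start and e >= end:  # span encloses the edit: stretched
            new_spans.append((s, e + delta, attr))
        else:
            raise ValueError("partially overlapping span not handled in this sketch")
    return new_text, new_spans
```

For example, replacing "go" with "goes" in "He go to Rome daily" moves an italic span on "Rome" two characters to the right, so the word stays italic after the correction.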

>    6. I think we should create an unified User Interface, for any
>       grammar checker use it.

I think that this user interface should be optional. A grammar checker
is a candidate for great complexity and we should not be constrained to
a predefined UI. For example, the grammar checker I'm developing
(http://www.einescat.org) uses its own UI, and can be eventually used
from clients other than OOo. For me (in my particular case) it would be
better not being bound to any user interface.

>    7. Automatic checking should run in background and marking the wrong
>       sentences with a wavy line. It could be enabled and disabled, like
>       Spell Checker.

We should consider different colors for different usages (grammar
mistakes, style recommendations, etc.).

>    8. The API should provide a paragraph (for example) to grammar
>       checker and this one should return a list. If there is no mistake
>       in this paragraph, the list should be empty,  else the list should
>       contain:
>          1. Where is the mistake in the paragraph (initial index + final
>             index).
>          2. A list of suggestions to correct that mistake (this list can
>             be empty if checker is not prepared to guess).
>          3. A comment about mistake, e.g. what a grammar book should say
>             about it.

A paragraph can contain several mistakes. We should proceed as in the
spell checker. First the checker could return only the limits of the
mistakes, so that OOo marks them. Only when the user asks for suggestions
or explanations should the checker provide them. Often the user will
correct the mistakes without asking for suggestions or explanations.
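
The two-phase idea could look roughly like this. The class, its method names and the toy "a apple" rule are hypothetical; the counter exists only to show that suggestions are computed on demand.

```python
from typing import List, Tuple


class LazyChecker:
    """Toy checker with a two-phase interface (hypothetical names)."""

    def __init__(self):
        self.suggestion_calls = 0   # counts how often the second phase ran

    def find_errors(self, paragraph: str) -> List[Tuple[int, int]]:
        """Phase 1: return only the (start, end) limits of each mistake."""
        limits = []
        needle = "a apple"
        pos = paragraph.find(needle)
        while pos != -1:
            limits.append((pos, pos + len(needle)))
            pos = paragraph.find(needle, pos + 1)
        return limits

    def suggest(self, paragraph: str, start: int, end: int) -> List[str]:
        """Phase 2: compute suggestions only when the user asks for them."""
        self.suggestion_calls += 1
        return ["an" + paragraph[start + 1:end]]
```

OOo would call `find_errors` to draw the wavy lines and call `suggest` only when the user opens the context menu on a marked range.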

>
>
> 2. Grammar Checker API, future:
>
>    1. Let's suppose it's possible to manage several languages in a text
>       and there is a Language Guessing API. Then, when OpenOffice
>       discover language of a sentence, it automatically loads grammar
>       checker to correspondent language.
>    2. Optimize memory allocation, input/output and processing.
>    3. Correct possible bugs.
>
>
> Bruno Sant'Anna
>


Re: [SoC] Grammar checker API

Bruno Sant'Anna-2
In reply to this post by Daniel Naber-4


On 5/23/06, Daniel Naber <[hidden email]> wrote:
> On Tuesday 23 May 2006 22:15, Bruno Sant'Anna wrote:
>
>>  5. OpenOffice should be able to replace the wrong sentences.
>
> This might seem obvious, but the original styles and layout should be kept
> when text is replaced.

Yes, as Thomas said in the first e-mail, the API has to take care of these details and preserve the text format. It will be tricky, but of course it is a priority; I'll remember this when I start coding.

>> 1. Where is the mistake in the paragraph (initial index + final index).
>> 2. A list of suggestions to correct that mistake (this list can
>> be empty if checker is not prepared to guess).
>> 3. A comment about mistake, e.g. what a grammar book should say
>> about it.
>
> There should also be a way for the user to disable the rule, e.g. because
> it is triggered too often (for correct sentences). Maybe one could even
> have an option "accept here" which doesn't deactivate the rule but avoids
> the display of the error message at this position. Of course this is not
> that trivial...

Something like "ignore this" and "ignore all"... I haven't thought it through yet, but I'm thinking of treating the return value as an object (since OOo is developed in C++). The constructor of a class Checked should receive the result returned by a grammar checker and hold a boolean "checked", initially false. If the user marks the result as checked, this boolean becomes true and the grammar checker API stops showing this rule. For an "ignore all" setup, a method ignoreAll receives a rule index, looks for every occurrence of that rule in the text block, and sets their "checked" flags to true. We have to think about it.
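
A minimal sketch of that object design, written in Python for brevity rather than C++. `Checked` and the ignore-all method follow the names used in the post; the `CheckedBlock` container and the string `rule_id` (instead of a rule index) are assumptions added for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Checked:
    """Wraps one result returned by a grammar checker."""
    rule_id: str
    start: int
    end: int
    checked: bool = False   # True once the user chose to ignore this report


class CheckedBlock:
    """All results for one text block (hypothetical container)."""

    def __init__(self, results: List[Checked]):
        self.results = results

    def ignore(self, index: int) -> None:
        """'Ignore this': hide a single report."""
        self.results[index].checked = True

    def ignore_all(self, rule_id: str) -> None:
        """'Ignore all': hide every report produced by the given rule."""
        for result in self.results:
            if result.rule_id == rule_id:
                result.checked = True

    def visible(self) -> List[Checked]:
        """Reports the UI should still underline."""
        return [r for r in self.results if not r.checked]
```

This covers only one text block; remembering an "accept here" decision across re-checks would need something more stable than character positions.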


Re: [SoC] Grammar checker API

Jonathon Blake
In reply to this post by Bruno Sant'Anna-2
Bruno wrote:

> The grammar checker should determine end of the sentences, because it is not so trivial (e.g., abbreviations). So, OpenOffice should just provide to the grammar checker an entire block of text, like a paragraph.

I'd suggest rephrasing that paragraph as:

"OpenOffice.org provides text blocks --- usually the entire paragraph
--- to the grammar checker.  The grammar checker has to verify all
punctuation.  This includes things like abbreviations, quoted text
within sentences, bulleted items, and sentence endings."

> I think we should create an unified User Interface, for any grammar checker use it.

>  2. Grammar Checker API, future:
> Let's suppose it's possible to manage several languages in a text and there is a Language Guessing API. Then, when OpenOffice discover language of a sentence, it automatically loads grammar checker to correspondent language.

Have the Grammar Checker API send the grammar checker the language
that the styles in the paragraph are configured for.   If the
characters are in the CJKV range, then that is the returned language.
If the characters are in either a BiDi, or Indus Valley Script range,
then return that language.  Otherwise return the Western language.

The grammar checker should flag characters outside of the appropriate
Unicode subrange, without changing them to something else. [Have it
mark the characters as a spelling error.]
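
A rough sketch of the range test described above. The code-point ranges are a deliberately coarse selection of Unicode blocks, and reading "Indus Valley Script" as the Indic blocks (Devanagari through Sinhala) is an assumption on my part; the function names are hypothetical.

```python
from collections import Counter
from typing import List


def script_class(ch: str) -> str:
    """Classify one character by coarse Unicode block ranges."""
    cp = ord(ch)
    # CJK Unified Ideographs, Hiragana/Katakana, Hangul Syllables
    if 0x4E00 <= cp <= 0x9FFF or 0x3040 <= cp <= 0x30FF or 0xAC00 <= cp <= 0xD7AF:
        return "cjk"
    if 0x0590 <= cp <= 0x06FF:      # Hebrew and Arabic blocks
        return "bidi"
    if 0x0900 <= cp <= 0x0DFF:      # Devanagari through Sinhala
        return "indic"
    return "western"


def paragraph_script(text: str) -> str:
    """Most common script class among the letters of a paragraph."""
    counts = Counter(script_class(c) for c in text if c.isalpha())
    return counts.most_common(1)[0][0] if counts else "western"


def out_of_range(text: str, expected: str) -> List[int]:
    """Indexes of letters that fall outside the expected script class."""
    return [i for i, c in enumerate(text)
            if c.isalpha() and script_class(c) != expected]
```

The indexes returned by `out_of_range` are the characters that would be flagged without being changed.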

>Automatic checking should run in background

Useful, only if it does not interfere with creating the document.  If
I am writing in German, I don't want it to pop up and capitalize
Esperanto expressions that I am using.  Or worse yet, converting
Ladino to Hebrew.

xan

jonathon
--
Ethical conduct is a vice.
Corrupt conduct is a virtue.

Motto of Nacarima.

Re: [SoC] Grammar checker API

Jonathon Blake
In reply to this post by Bruno Sant'Anna-2
Bruno wrote:

>  Something like, "ignore this" and "ignore all", ... I haven't thought about it,

That would also make it easier to create "custom" style checkers.
EG:
* Mark as wrong all uses of first person singular;
* Mark as correct all uses of third person plural;
* Mark as wrong all uses of the active voice;
* Mark as correct all uses of the passive voice;
[To apply the style guidelines of a company I used to work for.]
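
Such house-style rules could be expressed as named, switchable patterns. This is only a sketch: the rule names and the crude pronoun patterns are made up for illustration, and detecting active or passive voice would need real linguistic analysis rather than a regular expression.

```python
import re
from typing import List, Tuple

# Hypothetical house-style rules: rule name -> (pattern, enabled).
STYLE_RULES = {
    "first-person-singular": (re.compile(r"\b(?:I|me|my|mine)\b"), True),
    "third-person-plural": (re.compile(r"\b(?:they|them|their)\b", re.IGNORECASE), False),
}


def style_violations(text: str) -> List[Tuple[str, int, int]]:
    """Return (rule name, start, end) for every match of an enabled rule."""
    hits = []
    for name, (pattern, enabled) in STYLE_RULES.items():
        if not enabled:             # rules marked "correct" are skipped
            continue
        for match in pattern.finditer(text):
            hits.append((name, match.start(), match.end()))
    return hits
```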

Big business is as interested in documents that conform to their style
guidelines, as they are in documents that are grammatically correct.

xan

jonathon
--
Ethical conduct is a vice.
Corrupt conduct is a virtue.

Motto of Nacarima.

Re: [SoC] Grammar checker API

Bruno Sant'Anna-2
In reply to this post by Joan Moratinos


On 5/24/06, Joan Moratinos <[hidden email]> wrote:
> Bruno Sant'Anna wrote:
>
>> Hi,
>>
>> Talking about this program with Carlos Menezes, we started to think
>> about summarizing the topics open in previous e-mails. Feel free to
>> comment ok?
>>
>> 1. Grammar Checker API, now:
>>
>>    1. It makes sense working with just one language now; so, foreign
>>       words in the text should be ignored.
>>    2. The grammar checker should run in a different thread to not block
>>       OpenOffice.
>>    3. The grammar checker should be able to check inside table cells,
>>       text headers and footers, enumerations and text boxes (Drawing
>>       Objects).
>>    4. The grammar checker should determine end of the sentences, because
>>       it is not so trivial (e.g., abbreviations). So, OpenOffice should
>>       just provide to the grammar checker an entire block of text, like
>>       a paragraph.
>
> For the automatic checking in the background:
> I have noticed that the Spanish grammar checker for MSWord tries to
> check every time the user types a character that is a "candidate" for
> ending a sentence (for example, a dot). If the user goes on typing in
> the same paragraph, eventually some fragments are checked again (it seems
> that there are "hard" ends, which can't be changed by the following text,
> and "soft" ends, which depend on the text that follows (for example, an
> abbreviation can appear at the end of the sentence or in the middle)). I
> think that we should check the grammar as soon as possible, not only once
> the whole paragraph has been typed.

As we discussed before, letting OOo determine the end of sentences is difficult. I think the right time to start checking is after every Return key press. Letting the grammar checker analyse whole blocks is safer; the checker makes fewer mistakes that way.

>>    5. OpenOffice should be able to replace the wrong sentences.
>
> The checker should preserve formatting, footnotes, etc. Ideally these
> things should not be passed to the checker (the footnotes and the like
> could be passed when the paragraph or the sentence that includes them
> has been checked, for example), but if the user chooses to accept a
> suggestion, the format (e.g. italics), the footnotes, etc. should remain
> in their original places. Perhaps we could pass "markers" embedded in the
> paragraph text and then return them in the corrected text to "align" the
> original and the checked sentences.

Hmm... I think the API can deal with that. My idea is to keep grammar checkers away from these details, so that they only analyse the text and suggest corrections. It could be difficult to have a grammar checker deal with indexes, text positions, underlining, etc.

>>    6. I think we should create an unified User Interface, for any
>>       grammar checker use it.
>
> I think that this user interface should be optional. A grammar checker
> is a candidate for great complexity and we should not be constrained to
> a predefined UI. For example, the grammar checker I'm developing
> (http://www.einescat.org) uses its own UI, and can be eventually used
> from clients other than OOo. For me (in my particular case) it would be
> better not being bound to any user interface.

We have discussed this before. There is a problem: today every grammar checker uses its own user interface. Now imagine you want to use two or more grammar checkers at the same time; should each one have its own UI? I don't think that is good. I know a single user interface cannot allow fine tuning of each grammar checker, but I'm proposing a unified UI with the most common options. We are open to discussion here, OK?

>>    7. Automatic checking should run in background and marking the wrong
>>       sentences with a wavy line. It could be enabled and disabled, like
>>       Spell Checker.
>
> We should consider different colors for different usages (grammar
> mistakes, style recommendations, etc.).

That can be for a future API =). I'll remember this...

>>    8. The API should provide a paragraph (for example) to grammar
>>       checker and this one should return a list. If there is no mistake
>>       in this paragraph, the list should be empty,  else the list should
>>       contain:
>>          1. Where is the mistake in the paragraph (initial index + final
>>             index).
>>          2. A list of suggestions to correct that mistake (this list can
>>             be empty if checker is not prepared to guess).
>>          3. A comment about mistake, e.g. what a grammar book should say
>>             about it.
>
> A paragraph can contain several mistakes. We should proceed as in the
> spell checker. First the checker could return only the limits of the
> mistakes, so that OOo marks them. Only when the user asks for suggestions
> or explanations should the checker provide them. Often the user will
> correct the mistakes without asking for suggestions or explanations.

Yes, you are correct: users may often just correct the sentences themselves. But the paragraph is analysed just once per change (after a change or a Return key press, as I said before), and a single check should provide all the information about the analysed block. That does not mean everything will be shown to the user; it will just be stored somewhere (an object in memory) for the user interface to deal with.
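
The "check once per change, keep the results in memory" idea could be sketched like this. `ParagraphCache` and the integer paragraph ids are hypothetical; the checker is any function that maps a paragraph to its list of results.

```python
from typing import Callable, Dict, List, Tuple

Result = Tuple[int, int, str]   # (start, end, comment)


class ParagraphCache:
    """Stores the full check result per paragraph until its text changes."""

    def __init__(self, checker: Callable[[str], List[Result]]):
        self._checker = checker
        self._cache: Dict[int, Tuple[str, List[Result]]] = {}
        self.runs = 0   # how many times the checker actually ran

    def results_for(self, paragraph_id: int, text: str) -> List[Result]:
        cached = self._cache.get(paragraph_id)
        if cached is None or cached[0] != text:
            # New or changed paragraph: analyse once and remember everything.
            self.runs += 1
            cached = (text, self._checker(text))
            self._cache[paragraph_id] = cached
        return cached[1]
```

The user interface can then read positions, suggestions and comments from the stored results without triggering another analysis.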


Re: [SoC] Grammar checker API

thomas.lange
In reply to this post by Bruno Sant'Anna-2

Hello Bruno,

Well, first things first:
Congratulations on being accepted as one of the projects for the Google
Summer of Code! :-)

->Lacci: Hi, Lacci. I'm not sure if you already noticed that we have
started a discussion about a grammar checking API and, last but not
least, the integration of grammar checkers in OOo.
The focus should currently be on the integration (i.e. how will it look
to the user in the end?), especially if there is more than one
grammar checker available. I think this should be the first topic
because we need to make clear where we want to go and identify the
problems on the way before deciding on an API.
So if you have time I would be glad if you could share your thoughts.


> 1. Grammar Checker API, now:
>
>    1. It makes sense working with just one language now; so, foreign
>       words in the text should be ignored.

From the API view agreed!

From the UI view I'm a bit unsure here. Since spell checking currently
works with different languages in one sentence, it would look a bit
like a regression from the user's point of view if that text were just
skipped.


>    2. The grammar checker should run in a different thread to not block
>       OpenOffice.

You mean when grammar checking is done automatically (in the background
like automatic spell checking) only?

>    3. The grammar checker should be able to check inside table cells,
>       text headers and footers, enumerations and text boxes (Drawing
>       Objects).

Sure.
The question is: should it be able to do so because it knows of the
existence of such objects and is able to retrieve/modify them on its
own? Or should the existence of such objects be completely hidden from
the grammar checker? For example by means of an abstract API to iterate
through and modify the text of a document.
And pushing that question one step further:
Is the grammar checker's implementation to iterate through the text, or
should there be a different object that iterates through the text and
calls the grammar checker to process it?
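
The second variant, where a separate object walks the document and the checker only ever sees plain text, might look like this. The dictionary-based `Document` and both function names are assumptions made for the sketch; a real document model is far richer.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Document = Dict[str, List[str]]   # container kind -> list of text blocks
Error = Tuple[int, int]           # (start, end) within one block


def iter_text_blocks(doc: Document) -> Iterator[Tuple[str, int, str]]:
    """Yield (container kind, block index, text) for every text block."""
    for kind, blocks in doc.items():
        for index, text in enumerate(blocks):
            yield kind, index, text


def check_document(doc: Document,
                   checker: Callable[[str], List[Error]]) -> Dict[Tuple[str, int], List[Error]]:
    """Call the checker on each block; it never learns where the text lives."""
    report = {}
    for kind, index, text in iter_text_blocks(doc):
        errors = checker(text)
        if errors:
            report[(kind, index)] = errors
    return report
```

With this split, table cells, footers and text boxes are handled entirely by the iterating side.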

>    4. The grammar checker should determine end of the sentences, because
>       it is not so trivial (e.g., abbreviations). So, OpenOffice should
>       just provide to the grammar checker an entire block of text, like
>       a paragraph.

Doing it this way would of course be easiest from the application's view.
First, it does not need to determine the end of a sentence, and secondly,
paragraphs are the easiest units to access.

But I somewhat doubt the ability of a grammar checker to identify the end
of a sentence in a mixed-language text. For example, if an English grammar
checker encounters the upside-down question mark following the Spanish
word at the end. Thus I'm wondering if the API should allow for a
suggested-end-of-sentence when calling the grammar checker. Then, if the
implementation encounters unknown characters, it has at least a hint.
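
One possible policy for such a hint, sketched with hypothetical names: the checker looks for terminators it knows, but never past the suggested end, and falls back to the hint when it finds none. Other policies (for instance treating the hint as purely advisory) are equally conceivable.

```python
from typing import Optional

KNOWN_TERMINATORS = ".!?"   # what this toy checker recognizes


def find_sentence_end(text: str, start: int,
                      suggested_end: Optional[int] = None) -> int:
    """Return the index one past the end of the sentence starting at start."""
    limit = len(text) if suggested_end is None else min(suggested_end, len(text))
    for i in range(start, limit):
        if text[i] in KNOWN_TERMINATORS:
            return i + 1            # the checker found its own sentence end
    return limit                    # nothing recognized: trust the hint
```

The caller could obtain `suggested_end` from a break iterator, so the two mechanisms complement each other.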

BTW: The I18N break-iterator is not that bad with abbreviations. I think
it has a list of those. But citations and similar things might pose a
huge problem to it.

And another question would be:
Having the grammar checker being called with sentences, does it mean
that when an error is found the whole paragraph is presented to the user
(could be really large!), or does the UI only display the sentence
where the error occurred?

Displaying less than a sentence seems somewhat bad to me because
sometimes the user will want to solve an error by rearranging
the sentence. And quitting the UI because only the wrong word was
displayed seems annoying. And allowing the original document to be
modified while the dialog is being displayed may be somewhat
troublesome to implement.


>    5. OpenOffice should be able to replace the wrong sentences.

;-)

>    6. I think we should create an unified User Interface, for any
>       grammar checker use it.

+1.

Of course this will not prevent someone's grammar checker from coming
along with its own UI.
It only makes the implementation easier if the UI is already there, and
to the user all the grammar checkers will look the same, thus avoiding a
possible source of confusion.

>    7. Automatic checking should run in background and marking the wrong
>       sentences with a wavy line. It could be enabled and disabled, like
>       Spell Checker.

+1.
Someone once mentioned the idea of at least two different kinds of lines.
One for what the grammar checker knows for sure is wrong. And the other
one for "this is probably wrong" (e.g. outdated words like "thy" or
"thee" in English). This of course going along with an option that
allows the user to specify if he likes to have both types displayed or
only the I'm-100%-sure-it-is-wrong parts.
The reasoning was AFAIR that it is most annoying to the user to get
errors reported that are no errors.
I found that idea quite compelling...
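
A small sketch of how the two kinds of reports and the user option could fit together. `Certainty`, `Report` and `reports_to_display` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Certainty(Enum):
    CERTAIN = "certain"      # the checker is sure this is wrong
    POSSIBLE = "possible"    # probably wrong, e.g. an outdated word


@dataclass
class Report:
    start: int
    end: int
    certainty: Certainty


def reports_to_display(reports: List[Report], show_possible: bool) -> List[Report]:
    """Apply the user option: always show certain errors, optionally the rest."""
    return [r for r in reports
            if r.certainty is Certainty.CERTAIN or show_possible]
```

The UI would then pick a line style per `Certainty` value when underlining the returned reports.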


>    8. The API should provide a paragraph (for example) to grammar
>       checker and this one should return a list. If there is no mistake
>       in this paragraph, the list should be empty,  else the list should
>       contain:

A list of what?
Suggestions on how to correct the first encountered error?

Or did you mean a list of all errors? Or even something else?


>          1. Where is the mistake in the paragraph (initial index + final
>             index).
>          2. A list of suggestions to correct that mistake (this list can
>             be empty if checker is not prepared to guess).
>          3. A comment about mistake, e.g. what a grammar book should say
>             about it.

Having listed point 1 here as part of the list seems to suggest that a
list of all errors was meant to be returned...
When I talked about this to people implementing grammar checkers last
year, all of them said to stop at the first error, since once that error
is corrected the whole sentence has to be checked again anyway.
Thus there would be no need to report further errors.
Also (as sometimes happens with compilers) consider one single error
triggering reports of several errors following it. If that one gets fixed
all the other ones will vanish as well. Thus the list may already be
obsolete once the first error has been fixed.
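
The "stop at the first error, fix it, check again" cycle can be sketched as a loop. The two replacement rules are toy data and the function names are hypothetical; in practice the user, not the program, would choose each fix.

```python
from typing import Optional, Tuple

# Toy rules: (wrong text, replacement).
RULES = [("a apple", "an apple"), ("the the", "the")]


def first_error(sentence: str) -> Optional[Tuple[int, int, str]]:
    """Return (start, end, fix) for the earliest error, or None if clean."""
    best = None
    for wrong, fix in RULES:
        pos = sentence.find(wrong)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, pos + len(wrong), fix)
    return best


def correct_until_clean(sentence: str, max_rounds: int = 10) -> Tuple[str, int]:
    """Fix one error per round and re-check the whole sentence each time."""
    rounds = 0
    while rounds < max_rounds:
        error = first_error(sentence)
        if error is None:
            break
        start, end, fix = error
        sentence = sentence[:start] + fix + sentence[end:]
        rounds += 1
    return sentence, rounds
```

Each round re-checks the whole sentence, so reports that were only a side effect of an earlier error never reach the user.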


> 2. Grammar Checker API, future:
>
>    1. Let's suppose it's possible to manage several languages in a text
>       and there is a Language Guessing API. Then, when OpenOffice
>       discover language of a sentence, it automatically loads grammar
>       checker to correspondent language.

Here it is a bit like the snake biting its tail:
How is the language guessing to be presented with a sentence to operate
on (in order to decide which grammar checker is to be used), when the
grammar checker is already required to identify the end of the sentence?

Either it only guesses the language of the paragraph, which may
consist of several complete sentences in various languages, or we
still need the I18N break-iterator (or something similar) to identify
the sentence.


Regards,
Thomas

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: [SoC] Grammar checker API

thomas.lange
In reply to this post by Bruno Sant'Anna-2

Hi Daniel,

>>   5. OpenOffice should be able to replace the wrong sentences.
>
> This might seem obvious, but the original styles and layout should be kept
> when text is replaced.
>
>>       1. Where is the mistake in the paragraph (initial index + final index).
>>       2. A list of suggestions to correct that mistake (this list can
>>       be empty if checker is not prepared to guess).
>>       3. A comment about mistake, e.g. what a grammar book should say
>>       about it.
>
> There should also be a way for the user to disable the rule, e.g. because
> it is triggered to often (for correct sentences). Maybe one could even
> have an option "accept here" which doesn't deactivate the rule but avoids
> the display of the error message at this position. Of course this is not
> that trivial...

This brings back the issue of:
Should a grammar checker come along with its own UI to allow
customization of options, or are we to compile in advance a set of options
that can be used?
I'm quite confident we here on the ML can compile a suitable list of
options for Western languages. But I'm wondering about the possible or
required options for e.g. Asian languages...

Also, a German grammar checker comes with only a single option
that selects one of about 5 levels of how strictly the text should be
checked, without giving much further detail. And having a large set of
options that does not get used at all would look odd as well.

Thomas


Re: [SoC] Grammar checker API

thomas.lange
In reply to this post by Bruno Sant'Anna-2

Hi all,

> Quoting Bruno Sant'Anna <[hidden email]>:
>
>> 2. Grammar Checker API, future:
>>    1. Let's suppose it's possible to manage several languages in a text
>>    and there is a Language Guessing API. Then, when OpenOffice discover
>>    language of a sentence, it automatically loads grammar checker to
>>    correspondent language.
>>     2. Optimize memory allocation, input/output and processing.
>
> everything sounds +/- OK, but the question is:
>
> if the api will be designed as described in p.1, what kind of titanic work will
> be needeed to bring to life p.2?

Well, I would not be surprised if implementing it in full took about 1 or
even 2 years.
That's why we need to clarify first what we would like to have in the end, or
we will waste lots of time and work.
Hopefully we also agree on something like an order of things to do and have a
single grammar checker for a single language acceptably integrated in
the near future (several months at most). For this I think the most
essential thing we are missing now is an API that allows the grammar
checker to mark the wrong text parts.
With this done we could already have CoGrOO fully functional on its own.

And what we need to do after that is to make some kind of transition to
what was described in point 2. For example by providing a common UI or
an API to iterate over the text that eliminates the need for the grammar
checker to know about document internals like headers or tables etc.
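
A rough Python sketch of such an iteration API, assuming a toy document model. `TextBlock`, `ToyDocument` and `check_document` are invented for this example and do not correspond to existing OpenOffice interfaces; the point is only that the checker receives plain text and never touches headers or tables itself.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List


@dataclass
class TextBlock:
    origin: str  # "header", "body", "table-cell", ...; only the host application uses this
    text: str    # plain text handed to the grammar checker


class ToyDocument:
    """Stand-in for a document with several kinds of text containers."""

    def __init__(self, header: str, body: List[str], table_cells: List[str]) -> None:
        self.header = header
        self.body = body
        self.table_cells = table_cells

    def text_blocks(self) -> Iterator[TextBlock]:
        """Yield every checkable block; the checker never sees the containers."""
        yield TextBlock("header", self.header)
        for paragraph in self.body:
            yield TextBlock("body", paragraph)
        for cell in self.table_cells:
            yield TextBlock("table-cell", cell)


def check_document(doc: ToyDocument, check: Callable[[str], bool]) -> List[TextBlock]:
    """Return the blocks for which the checker reports a problem."""
    return [block for block in doc.text_blocks() if check(block.text)]
```

If document internals change, only `text_blocks` has to follow; the checker callback stays the same.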


Thomas

Re: [SoC] Grammar checker API

thomas.lange
In reply to this post by Bruno Sant'Anna-2

Hi all,

> For the automatic checking in the background:
> I have noticed that the Spanish grammar checker for MSWord tries to
> check everytime the user types a character that is a "candidate" for
> ending a sentence (for example, a dot). If the user goes on typing on
> the same paragraph, eventualy some fragments are checked again (it seems
> like there are "hard" ends, that can't be changed by the following text,
> and "soft" ends, that depend on the text that follows (for example, an
> abbreviation can appear at the end of the sentence or in the middle)). I
> think that we should check the grammar as soon as possible, not when all
> the paragraph has been typed.

The reasonable choice would probably be to check again when a word was
modified, added or deleted.
Similar things already happen for spell checking when a word gets modified.


>>    5. OpenOffice should be able to replace the wrong sentences.
>
> The checker should preserve formating, footnotes, etc. Ideally these
> things should not be passed to the checker (the footnotes and the like
> could be passed when the paragraph or the sentence that includes them
> has been checked, for example), but if the user chooses to accept a
> suggestion, the format (i.e. italics), the footnotes, etc. should remain
> in the original places. Perhaps we could pass "markers" embedded in the
> paragraph text and then return them in the corrected text to "align" the
> original and the checked sentences.

We may get a comment from Oliver on this next week when he is back,
because he already had to deal with such issues for the
sentence-based spell checking dialog.

>>    6. I think we should create an unified User Interface, for any
>>       grammar checker use it.
>
> I think that this user interface should be optional. A grammar checker
> is a candidate for great complexity and we should not be constrained to
> a predefined UI. For example, the grammar checker I'm developing
> (http://www.einescat.org) uses its own UI, and can be eventually used
> from clients other than OOo. For me (in my particular case) it would be
> better not being bound to any user interface.

If only a single grammar checker were in use, I would agree at once.
But considering an environment where numerous grammar checkers will be
installed, I think having a large number of different UIs is not a good
idea. Such an environment can easily exist at universities. Consider them
installing all available grammar checkers to support all their students
from different countries.

>>    7. Automatic checking should run in background and marking the wrong
>>       sentences with a wavy line. It could be enabled and disabled, like
>>       Spell Checker.
>
> We should consider different colors for different usages (grammar
> mistakes, style recommendations, etc.).

Several types of lines to mark text for various reasons sounds OK to me.
(See one of my other posts).
I would like to mention though that last year most people pointed out that
styles should not be handled by a grammar checker at all.

I found the idea of checking and correcting styles quite interesting.
But maybe it should be a component of its own.
This probably just depends on whether something like that is supported by
grammar checkers. I myself have not yet seen a grammar checker that cared about
styles.

Just to make sure what I meant when talking about styles:
I was referring to text attributes like bold, italic, font size or color
etc.

Maybe you are referring to the style of language being used, e.g.:
- ancient
- technical
- vulgar
- official document
- informal
...


Regards,
Thomas

Re: [SoC] Grammar checker API

Jonathon Blake
Thomas wrote:

> I myself have not yet seen a grammar checker that cared about styles.

I know I was referring to writing style, i.e. the style of the language being used.

xan

jonathon
--
Ethical conduct is a vice.
Corrupt conduct is a virtue.

Motto of Nacarima.

Re: [SoC] Grammar checker API

thomas.lange
In reply to this post by Bruno Sant'Anna-2

Hi,

>
>     For the automatic checking in the background:
>     I have noticed that the Spanish grammar checker for MSWord tries to
>     check everytime the user types a character that is a "candidate" for
>     ending a sentence (for example, a dot). If the user goes on typing on
>     the same paragraph, eventualy some fragments are checked again (it seems
>     like there are "hard" ends, that can't be changed by the following text,
>     and "soft" ends, that depend on the text that follows (for example, an
>     abbreviation can appear at the end of the sentence or in the middle)). I
>     think that we should check the grammar as soon as possible, not when all
>     the paragraph has been typed.
>
>
> As we discussed before, letting the OO determine the end of sentences is
> difficult. I think the right  time to start checking is after every
> Return Key press. Letting the grammar checkers analyse blocks is more
> secure, the grammar checker can commit few mistakes when we act like it.

Just looking for the Return key won't work.
If you are inserting a new sentence in a paragraph, or cutting or pasting text,
that key will not be pressed. You also need to keep in mind that text
may get changed via the API as well.

To put it somewhat loosely, I would say "whenever the text changed by at
least a word".
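
As a small illustration of "whenever the text changed by at least a word", here is a Python sketch that compares the word list of a paragraph with the one that was last checked. `RecheckTracker` is a hypothetical helper; it does not matter whether the change came from typing, pasting or an API call. Note the simplification: a half-typed word also counts as a change, so a real implementation would probably wait for a word boundary or a short pause.

```python
from typing import Dict, List


class RecheckTracker:
    """Remembers the word list that was last checked for each paragraph."""

    def __init__(self) -> None:
        self._last_words: Dict[int, List[str]] = {}

    def needs_check(self, paragraph_id: int, text: str) -> bool:
        """Return True (and remember the new state) if any word changed."""
        words = text.split()
        if self._last_words.get(paragraph_id) == words:
            return False  # only whitespace changed, or nothing at all
        self._last_words[paragraph_id] = words
        return True
```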


>     >    5. OpenOffice should be able to replace the wrong sentences.
>
>     The checker should preserve formating, footnotes, etc. Ideally these
>     things should not be passed to the checker (the footnotes and the like
>     could be passed when the paragraph or the sentence that includes them
>     has been checked, for example), but if the user chooses to accept a
>     suggestion, the format (i.e. italics), the footnotes, etc. should remain
>     in the original places. Perhaps we could pass "markers" embedded in the
>     paragraph text and then return them in the corrected text to "align"
>     the original and the checked sentences.
>
>
> hum... I think API can deal with it, my idea is not letting grammar
> checkers deal with these details, only analyse and suggests corrections.
> It could be difficult letting a grammar checker deal with indexes, text
> positions, underlining etc.

That would be my suggestion as well. Let the grammar checker care about
checking text only. That way it will be easier to provide a new one, and
existing ones will require fewer changes if document internals change.

But I'm quite sure that CoGrOO currently has to handle a lot of this by
itself right now. ^_~
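
To make the division of labour concrete, here is a hedged Python sketch in which the host strips anchored objects (footnote anchors and the like) before calling the checker and maps the reported span back afterwards. The placeholder character and the names `strip_anchors` and `to_raw_span` are assumptions made for this example, not a description of how OOo stores paragraphs.

```python
from typing import List, Tuple

ANCHOR = "\ufffc"  # assumed placeholder for a footnote or other anchored object


def strip_anchors(raw: str) -> Tuple[str, List[int]]:
    """Return the plain text plus, for each plain index, its index in the raw text."""
    plain_chars: List[str] = []
    index_map: List[int] = []
    for i, ch in enumerate(raw):
        if ch != ANCHOR:
            plain_chars.append(ch)
            index_map.append(i)
    return "".join(plain_chars), index_map


def to_raw_span(index_map: List[int], start: int, end: int) -> Tuple[int, int]:
    """Map a non-empty [start, end) span of the plain text back to raw indexes."""
    return index_map[start], index_map[end - 1] + 1
```

The checker only ever sees the plain string; underlining and replacement are done by the host with the mapped indexes, so formatting and footnotes stay where they are.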

> Yes, You are correct, the users may in several times just correct the
> sentences, but the process of analysing the Paragraph is processed just
> once per change (after a change or a return key press, as I told
> before). And a single check should provide all information regarding the
> block analysed, IT not means that everything will be showed to the user,
> it will just be stored in some place (an object in memory) for the User
> Interface deal with it.

I hope you don't intend to create such objects hidden in memory for
later use by the auto checking as well. I think it will be far too easy
to have many such objects created by accident (e.g. wrong language
set at paragraph). Caching a very limited number of results will be OK
though.
The main question is whether it is not much more efficient (in time and memory)
for auto checking if the suggestions are not retrieved.
AFAIK, for spell checkers at least, retrieving suggestions is a
considerable task compared to only finding out that something is wrong somewhere.

Thus it might be a good idea to keep only one (or at most a very limited
number) of those something-is-wrong-somewhere results, if that can be used to
speed up the suggestion retrieval.
But most likely that will not be of much use either when you cannot
cache all of them (which is likely a memory no-go). And caching a small
number will probably not help either, since with automatic grammar
checking those will be obsolete almost instantly.

Thus currently me thinks there is no use in retrieving or caching any
data beyond the minimum required results to mark the text portion when
running in the background.
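
The split between cheap background marking and expensive suggestion retrieval could look roughly like this in Python. `TwoPhaseChecker` and the two toy callbacks are hypothetical; whether a real checker can actually detect errors without computing suggestions is exactly the open question above.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) of the wrong text portion


class TwoPhaseChecker:
    """Cheap detection for background marking, suggestions only on request."""

    def __init__(self, detect: Callable[[str], Optional[Span]],
                 suggest: Callable[[str, Span], List[str]]) -> None:
        self._detect = detect
        self._suggest = suggest
        self.suggest_calls = 0  # counts how often the expensive step ran

    def mark(self, paragraph: str) -> Optional[Span]:
        """Background path: only find out where something is wrong."""
        return self._detect(paragraph)

    def suggestions(self, paragraph: str, span: Span) -> List[str]:
        """Interactive path: run the expensive step for one marked span."""
        self.suggest_calls += 1
        return self._suggest(paragraph, span)


def toy_detect(paragraph: str) -> Optional[Span]:
    pos = paragraph.find("a apple")
    return (pos, pos + len("a apple")) if pos >= 0 else None


def toy_suggest(paragraph: str, span: Span) -> List[str]:
    return ["an apple"]
```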

TL->Bruno: Please check if I'm wrong here.


Thomas

Re: [SoC] Grammar checker API

Daniel Naber-4
On Friday 26 May 2006 16:44, Thomas Lange wrote:

> Thus currently me thinks there is no use in retrieving or caching any
> data beyond the minimum required results to mark the text portion when
> running in the background.

Are we talking about showing the error corrections in a context menu? Then
it doesn't seem to be possible to get those on demand, as it may take 1-2
seconds to get the correction and the UI shouldn't block that long.

Regards
 Daniel

--
http://www.danielnaber.de

Re: [SoC] Grammar checker API

Daniel Naber-4
In reply to this post by thomas.lange
On Friday 26 May 2006 15:36, Thomas Lange wrote:

> I'm quite confident we here on the ML can compile a suitable list of
> options for Western languages.

Well, LanguageTool has rules that mark phrases like "ohne Gewehr" (instead
of "ohne Gewähr"). This might be right or wrong, so every single rule can
be disabled. Also rules may need to be made configurable. So no, I don't
think we can agree on a list of options.

Regards
 Daniel

--
http://www.danielnaber.de

Re: [SoC] Grammar checker API

Jonathon Blake
Daniel wrote:

> Also rules may need to be made configurable. So no, I don't think we
> can agree on a list of options.

How about a standard call in the grammar UI, for configuring the
grammar checker, to selectively turn off/on every rule that is used by
the grammar checker? [This is assuming that the same UI will be used,
regardless of which grammar checking program is used.]
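
Such a standard call could be as small as "list your rules" plus "switch rule X on or off". A Python sketch under that assumption; `Rule`, `RuleRegistry` and any rule identifiers passed to it are invented for illustration and are not taken from LanguageTool, CoGrOO or any other checker.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    rule_id: str        # stable identifier chosen by the checker
    description: str    # text the shared UI can show next to a check box
    enabled: bool = True


class RuleRegistry:
    """What a checker might expose so a shared UI can list and toggle its rules."""

    def __init__(self, rules: List[Rule]) -> None:
        self._rules: Dict[str, Rule] = {r.rule_id: r for r in rules}

    def list_rules(self) -> List[Rule]:
        return list(self._rules.values())

    def set_enabled(self, rule_id: str, enabled: bool) -> None:
        self._rules[rule_id].enabled = enabled

    def is_enabled(self, rule_id: str) -> bool:
        return self._rules[rule_id].enabled
```

The UI does not need to know what a rule means; it only shows the descriptions and passes the identifiers back.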

xan

jonathon
--
Ethical conduct is a vice.
Corrupt conduct is a virtue.

Motto of Nacarima.

Re: [SoC] Grammar checker API

Bruno Sant'Anna-2
In reply to this post by thomas.lange


On 5/26/06, Thomas Lange <[hidden email]> wrote:

Hello Bruno,

Well, first things first:
Congratulations on being accepted as one of the projects for the Google
Summer of Code! :-)


Hi Thomas,

Haha, thanks a lot. As I said before, I'll put a great effort into this project.
It will be a great experience for me.


> 1. Grammar Checker API, now:
>
>    1. It makes sense working with just one language now; so, foreign
>       words in the text should be ignored.

From the API view agreed!

From the UI view I'm a bit unsure here. Since spell checking of different
languages in one sentence currently works, it looks a bit
like a regression from the user's point of view if that text is just
skipped.

For the user interface we can create something visual that shows the user a message like "Words in other languages are not being checked yet." I think that will surely give users confidence.

>    2. The grammar checker should run in a different thread to not block
>       OpenOffice.

You mean when grammar checking is done automatically (in the background
like automatic spell checking) only?

No, not just in the background. I am planning to implement both automatic checking and interactive checking, and we can create threads for both of them. The idea of using threads is to not block OOo: when a user requests interactive checking it does not matter much, but when an automatic check starts it must run in the background while the main process (OOo) continues.

I want to add one thing here about the two modes (automatic and interactive). When a user requests the interactive mode (e.g. by clicking a "Check Grammar" button), the API provides the whole current text, not just a block. I think that is safe: it can be slow, but the user is prepared to wait since he asked for the check. In automatic checking, the API sends a paragraph to the checker after every change to it. I was also thinking about setting a time limit, for example 60 seconds. What do you think?
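
A time limit for a background check can be sketched in Python with `concurrent.futures`. `check_with_limit` and `slow_check` are hypothetical names. One caveat of this sketch: a Python thread cannot be killed, so a check that overruns keeps running and its result is simply dropped.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as CheckTimeout
from typing import Callable, List, Optional


def check_with_limit(check: Callable[[str], List[str]],
                     paragraph: str,
                     limit_seconds: float) -> Optional[List[str]]:
    """Run the check in a worker thread; return None if it exceeds the limit."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(check, paragraph)
    try:
        return future.result(timeout=limit_seconds)
    except CheckTimeout:
        return None  # give up waiting; the worker finishes on its own
    finally:
        executor.shutdown(wait=False)


def slow_check(paragraph: str) -> List[str]:
    """Toy checker that takes longer than a short limit."""
    time.sleep(0.3)
    return []
```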

>    3. The grammar checker should be able to check inside table cells,
>       text headers and footers, enumerations and text boxes (Drawing
>       Objects).

Sure.
The question is should it be able to do so because it knows of the
existence of such objects and is able to retrieve/modify those on its
own?

I think in this case the rules change a bit. The safe method here applies while the user is editing such an object. Just an example: when the user stops typing for a period of 4 seconds, the API sends that little piece of text to the grammar checker.

Or should the existence of such objects be completely hidden to the
grammar checker?

As I said before, we have to figure out how to deal with it. But I think the text inside table cells and enumerations should be checked for sure.

For example by means of an abstract API to iterate
through and modify the text of a document.
And pushing that question one step further:
Is the grammar checker's implementation to iterate through the text or
should there be a different object that iterates through the text and
calls the grammar checker to process it?

For me the second one: it can take care of details like formatting. Letting the grammar checkers act directly on the document would be dangerous for the text formatting.

>    4. The grammar checker should determine end of the sentences, because
>       it is not so trivial ( e.g., abbreviations). So, OpenOffice should
>       just provide to the grammar checker an entire block of text, like
>       a paragraph.

Doing it this way would of course be easiest from the application's view.
First it does not need to determine the end of a sentence and secondly
paragraphs are the easiest units to access.

But I somewhat doubt the ability of a grammar checker to identify the end of
a sentence in a mixed-language text. For example if an English grammar
checker encounters the upside-down question mark following the Spanish
word at the end. Thus I'm wondering if the API should allow for a
suggested-end-of-sentence when calling the grammar checker. Thus if the
implementation encounters unknown characters it has at least a hint.

BTW: The I18N break-iterator is not that bad with abbreviations. I think
it has a list of those. But citations and similar things might pose a
huge problem to it.
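
To illustrate why abbreviations make sentence ends non-trivial, here is a deliberately naive Python sketch of an abbreviation-aware end finder. The tiny abbreviation list and the name `sentence_ends` are assumptions, and this is not how the I18N break-iterator works internally. It also shows the remaining weakness: an abbreviation that really does end a sentence is never reported as an end.

```python
from typing import List, Set

ABBREVIATIONS: Set[str] = {"e.g.", "i.e.", "etc.", "Dr.", "Mr."}  # tiny sample list


def sentence_ends(text: str, abbreviations: Set[str] = ABBREVIATIONS) -> List[int]:
    """Return the index just past each '.', '!' or '?' that ends a sentence."""
    ends: List[int] = []
    for i, ch in enumerate(text):
        if ch not in ".!?":
            continue
        # The whitespace-delimited token that ends at this character.
        token_start = text.rfind(" ", 0, i) + 1
        token = text[token_start:i + 1]
        if token in abbreviations:
            continue  # e.g. the final dot of "Dr."
        if i + 1 < len(text) and not text[i + 1].isspace():
            continue  # a dot inside a token, e.g. the first dot of "e.g."
        ends.append(i + 1)
    return ends
```

The returned indexes could also serve as the suggested-end-of-sentence hints mentioned above, which a checker is free to override.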

Question: Can grammar checkers use I18N break-iterator?

And another question would be:
Having the grammar checker being called with sentences, does it mean
when an error is found the whole paragraph is presented to the user
(could be really large!) or does the UI only display the sentence of
where the error occurred?

The sentence, for sure. For this we will have a list with start and end position indexes. =)

Displaying less than a sentence seems somewhat bad to me because
sometimes the user will possibly like to solve an error by rearranging
the sentence.

Yes, I agree with you. I think grammar checkers should work on whole sentences: if a part of a sentence is wrong, the sentence is considered wrong.

And quitting the UI because only the wrong word was
displayed seems to be annoying. And allowing the original document to be
modified in parallel to the dialog being displayed may be somewhat
troublesome to implement.

I think the safe way to implement changes is by showing dialogs. Even in automatic checking, which just shows the mistakes, the user has to right-click on a mistake and a dialog appears. Have you thought of another way to do it?

>    5. OpenOffice should be able to replace the wrong sentences.

;-)

>    6. I think we should create an unified User Interface, for any
>       grammar checker use it.

+1.

Of course this will not prevent someone's grammar checker from coming along
with its own UI.
It only makes the implementation easier if the UI is already there, and
to the user all the grammar checkers will look the same, thus avoiding a
possible source of confusion.

>    7. Automatic checking should run in background and marking the wrong
>       sentences with a wavy line. It could be enabled and disabled, like
>       Spell Checker.

+1.
Someone once mentioned the idea of at least two different kinds of lines.
One for what the grammar checker knows for sure is wrong. And the other
one for "this is probably wrong" (e.g. outdated words like "thy" or
"thee" in English). This of course going along with an option that
allows the user to specify if he likes to have both types displayed or
only the I'm-100%-sure-it-is-wrong parts.
The reasoning was AFAIR that it is most annoying to the user to get
errors reported that are no errors.
I found that idea quite compelling...

It can be done but I'm not sure if every grammar checker will implement it.
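
A small Python sketch of the two kinds of lines with a user option to hide the uncertain ones. `Certainty`, `Marking` and `visible_markings` are hypothetical names; a checker that cannot tell the two cases apart could simply report everything as definite.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Certainty(Enum):
    DEFINITE = "definite"   # the checker is sure this is wrong
    POSSIBLE = "possible"   # probably wrong, e.g. an outdated word


@dataclass
class Marking:
    start: int
    end: int
    certainty: Certainty


def visible_markings(markings: List[Marking], show_possible: bool) -> List[Marking]:
    """Keep definite errors always; keep possible ones only if the user wants them."""
    return [m for m in markings
            if show_possible or m.certainty is Certainty.DEFINITE]
```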

>    8. The API should provide a paragraph (for example) to grammar
>       checker and this one should return a list. If there is no mistake
>       in this paragraph, the list should be empty,  else the list should
>       contain:

A list of what?

A list containing objects, for example:

object mistake
{
    int startpos; // start position of the sentence
    int endpos; // end position of the sentence
    string guessed_sentence; // the corrected sentence guessed by the grammar checker
    string rule_tip; // the grammar rule comment
    boolean checked; // flag: whether the user wants to ignore it or not
}

Suggestions on how to correct the first encountered error?

Or did you mean a list of all errors? Or even something else?


>          1. Where is the mistake in the paragraph (initial index + final
>             index).
>          2. A list of suggestions to correct that mistake (this list can
>             be empty if checker is not prepared to guess).
>          3. A comment about mistake, e.g. what a grammar book should say
>             about it.

Listing point 1 here as part of the list seems to suggest that a
list of all errors was meant to be returned...
When I talked about this with people implementing grammar checkers last
year, all of them said to stop at the first error, since once that error
is corrected the whole sentence has to be checked again anyway.
Thus there would be no need to report further errors.
Also (as sometimes happens with compilers), a single error may
trigger reports of several errors following it. If that one gets fixed,
all the other ones will vanish as well. Thus the list may already be
obsolete once the first error is fixed.

That is true; yes, stopping at the first error would be better.

> 2. Grammar Checker API, future:
>
>    1. Let's suppose it's possible to manage several languages in a text
>       and there is a Language Guessing API. Then, when OpenOffice
>       discover language of a sentence, it automatically loads grammar
>       checker to correspondent language.

Here it is a bit like the snake biting its tail:
How is the language guessing to be presented with a sentence to operate
on (in order to define which grammar checker is to be used), when the
grammar checker is already required to identify the end of the sentence?

Either it is only guessing the language of the paragraph, which may
consist of several complete-sentences-in-various-languages. Or we
still need the I18N breakiterator (or something similar) to identify the sentence.


Regards,
Thomas
