I tried to download OOo 3.1, but after searching from page to page I could only find OOo 3.0 to download. After installing OOo 3.0, I tried to use the temporary Unicode normalisation in Hunspell, but it did not work. My input conversion table is shown below.
ICONV ọ ọ
ICONV ọ̀ ọ̀
ICONV ọ́ ọ́
ICONV ṣ ṣ
ICONV ẹ̀ ẹ̀
ICONV ẹ́ ẹ́
ICONV ẹ ẹ
The characters in the second column were written in this sequence: the letter first, followed by the tone mark ( ), and the underdot (.) last. The characters in the third column were written in this sequence: the letter first, followed by the underdot (.), and then the tone mark ( ). OpenOffice.org Writer did not recognise the two forms as the same character.
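To make the two orderings concrete, here is a minimal Python sketch (not part of Hunspell or OOo; it only uses the standard `unicodedata` module). It assumes the letter is "o" with a grave tone mark (U+0300) and a dot below (U+0323), and shows that the two typed sequences differ as raw code points but become identical after NFC normalisation.

```python
import unicodedata

# The letter "o" with a grave accent (U+0300) and a dot below (U+0323),
# typed in the two orders described above (assumed example characters).
tone_first = "o\u0300\u0323"   # letter, tone mark, underdot
dot_first = "o\u0323\u0300"    # letter, underdot, tone mark

# The raw code point sequences are different strings.
print(tone_first == dot_first)

# After NFC normalisation both orderings give the same sequence.
nfc_tone_first = unicodedata.normalize("NFC", tone_first)
nfc_dot_first = unicodedata.normalize("NFC", dot_first)
print(nfc_tone_first == nfc_dot_first)

# Show the code points of the normalised form.
print([f"U+{ord(c):04X}" for c in nfc_tone_first])
```

An ICONV table that is meant to cover such input would need one entry per non-normalised ordering that users can actually type.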
Really, this is not only a spell checking problem. OpenOffice.org has
problems with both the visual and the functional equivalence of
characters. For example, here is the result of the Find all ä
operation on ÄÄää, i.e. on the "A U+0308 (COMBINING DIARESIS) Ä a
U+0308 ä" character sequence:
It would be good to solve this problem in future OpenOffice.org
versions by automatic Unicode normalization, and also by OpenType support.
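As an illustration of the search problem (a sketch only, not how OOo implements Find), the following Python snippet builds the "A U+0308, Ä, a U+0308, ä" sequence and counts matches for the precomposed ä with and without normalisation. The helper name `count_normalized` is made up for this example.

```python
import unicodedata

def count_normalized(haystack: str, needle: str) -> int:
    """Count occurrences of needle after bringing both strings to NFC."""
    return unicodedata.normalize("NFC", haystack).count(
        unicodedata.normalize("NFC", needle)
    )

# A + U+0308, precomposed Ä, a + U+0308, precomposed ä
text = "A\u0308\u00c4a\u0308\u00e4"

# A plain code point search only sees the precomposed ä.
naive_hits = text.count("\u00e4")

# Searching the normalised text also finds the decomposed one.
normalized_hits = count_normalized(text, "\u00e4")

print(naive_hits, normalized_hits)
```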
Hunspell 1.2.x (which I hope will be in OOo 3.1) has a temporary
solution for Unicode normalization (canonical and compatibility), the
optional input/output conversion:
ICONV Ä Ä
ICONV ä ä
ICONV 가 ᄀ ᅡ
ICONV ﬁ fi
The first three conversions are canonical normalizations: two compositions
and a Hangul decomposition. The conversion of the ﬁ ligature is a
compatibility normalization (but spell checking of words with
f-ligatures also needs fixed word breaking in OOo).
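For readers who want to check which normalization form each of these conversions corresponds to, here is a small Python sketch using the standard `unicodedata` module. It only reproduces the mappings; it does not involve Hunspell itself, and the variable names are just for illustration.

```python
import unicodedata

# Canonical composition (NFC): A + COMBINING DIAERESIS -> precomposed Ä
composed_a = unicodedata.normalize("NFC", "A\u0308")

# Canonical decomposition (NFD): Hangul syllable U+AC00 -> two jamo
decomposed_ga = unicodedata.normalize("NFD", "\uac00")

# Compatibility normalization (NFKC): the fi ligature U+FB01 -> "fi"
plain_fi = unicodedata.normalize("NFKC", "\ufb01")

# Canonical normalization alone leaves the ligature untouched.
canonical_fi = unicodedata.normalize("NFC", "\ufb01")

for label, value in [
    ("NFC  A+U+0308", composed_a),
    ("NFD  U+AC00", decomposed_ga),
    ("NFKC U+FB01", plain_fi),
    ("NFC  U+FB01", canonical_fi),
]:
    print(label, [f"U+{ord(c):04X}" for c in value])
```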
Conversion of the spell checking suggestions to
the original composed form:
OCONV ᄀ ᅡ 가
OCONV fi ﬁ
Special spell checking requirements need special solutions. For
example, German typography uses f-ligatures only within words, but not
at compound word boundaries, so the previous OCONV fi ﬁ conversion is
not right for German. A redundant dictionary with non-suggested
decomposed forms, together with dictionary words containing ligatures,
helps to check the correct typography of a German text:
--- affix file ---
REP fi ﬁ
REP ﬁ fi
--- dictionary file ----
Hyphenation of both composed and decomposed characters is possible
in OpenOffice.org by using redundant hyphenation patterns.
Compatibility equivalent ligatures can be handled by non-standard
For thesauri, a temporary solution is to use redundant items or
The upcoming Hunspell-based stemming in the OOo thesaurus can also
handle the normalization problem temporarily.
ICONV input conversion or explicit stems (
--- dic file ---
) can give the normalized stems to the thesaurus component.
Maybe a new Hunspell tool could help the spelling dictionary
developers by the automatic generation of the ICONV normalization
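
A rough Python sketch of what such a generator might look like follows. It is not an existing Hunspell tool, and the function names `equivalent_spellings` and `iconv_table` are invented here. It assumes each target is a single base letter plus combining marks, and that the affix file wants the NFC form as the conversion result. It also emits a leading count line (`ICONV n`), which the hunspell(5) affix format expects as far as I know, although the examples in this thread omit it.

```python
import itertools
import unicodedata

def equivalent_spellings(target: str) -> list[str]:
    """List non-NFC spellings that are canonically equivalent to target.

    Assumes target is one base character followed by combining marks.
    """
    nfc = unicodedata.normalize("NFC", target)
    nfd = unicodedata.normalize("NFD", target)
    base, marks = nfd[0], nfd[1:]
    variants = set()
    for perm in itertools.permutations(marks):
        # Try every split: compose the first k marks, leave the rest loose.
        for k in range(len(perm) + 1):
            head = unicodedata.normalize("NFC", base + "".join(perm[:k]))
            candidate = head + "".join(perm[k:])
            # Keep only spellings that really normalise to the target.
            if candidate != nfc and unicodedata.normalize("NFC", candidate) == nfc:
                variants.add(candidate)
    return sorted(variants)

def iconv_table(targets: list[str]) -> list[str]:
    """Build ICONV lines (count line first) mapping variants to NFC."""
    rows = [
        (variant, unicodedata.normalize("NFC", target))
        for target in targets
        for variant in equivalent_spellings(target)
    ]
    lines = [f"ICONV {len(rows)}"]
    lines += [f"ICONV {src} {dst}" for src, dst in rows]
    return lines

if __name__ == "__main__":
    # o with dot below and grave accent, as in the Yoruba example.
    for line in iconv_table(["\u1ecd\u0300"]):
        print(line)
```

For the Yoruba example this yields three entries, one for each way of typing the letter that is not already in NFC. Whether Hunspell's conversion handles all of them as intended would still need to be tested with a real dictionary.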
2009/1/5 Stephan Bergmann <[hidden email]>:
> On 01/02/09 09:51, F Wolff wrote:
>> Hallo all
>> We recently had a discussion on a list for African localisation about
>> the utility of having Unicode normalisation automatically done in
>> Hunspell, so that creators of spell checkers wouldn't need to worry
>> about that.
>> Is this a feature that would be useful to more people? Is there
>> something generic in OOo that handles normalisation issues for other
>> purposes? (searching, thesaurus, indexes, etc.) I can think of many
>> places where it could be relevant.
>> I'm curious to hear what other people think.
> I brought this up years ago as point 4 of
> nothing became of it back then...