syllable and word.....

Cthar
Hi,

Dzongkha text flows continuously, with no spaces between words. A Dzongkha
word consists of one or more syllables.
In a multisyllable word, the syllables are separated by the Tibetan
inter-syllabic mark called Tsheg [Unicode: U+0F0B].
The Tsheg is a small dot, typed on the Dzongkha keyboard with the [Space
Bar].
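The syllable structure described above can be sketched in a few lines of Python: splitting on U+0F0B recovers the syllables, but nothing in the string itself says where one word ends and the next begins. The function name `split_syllables` is hypothetical, and the sample word is *rdzong kha* spelled with escape codes; this is a minimal illustration, not part of any spell checker.

```python
TSHEG = "\u0F0B"  # U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG


def split_syllables(text: str) -> list[str]:
    """Split on the tsheg and drop empty pieces (e.g. from a trailing tsheg)."""
    return [syllable for syllable in text.split(TSHEG) if syllable]


# "rdzong kha": two syllables joined by a tsheg.
dzongkha = "\u0F62\u0FAB\u0F7C\u0F44" + TSHEG + "\u0F41"
print(len(split_syllables(dzongkha)))  # 2
```

The result is a flat list of syllables: the word boundaries are exactly the information a syllable-by-syllable checker never sees.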

So, the basic problem with the Dzongkha spell checker is that the Tsheg causes
Hunspell to check Dzongkha text syllable by syllable rather than word by word.
If we instead store syllables rather than words in the .dic file,
then a multitude of invalid syllable combinations would be accepted as valid
words.

An analogous example in English is Latin-borrowed phrases such as
"ad hoc" and "alma mater". If we list "ad", "hoc", "alma", and "mater"
separately in the .dic file, then combinations such as "ad alma",
"ad mater", and "alma hoc" would also be accepted.
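To make the over-acceptance concrete, here is a small Python sketch using the English stand-ins above, with the space playing the role of the Tsheg. The two dictionaries and both function names are hypothetical; the point is only that a syllable-level check accepts every combination of listed pieces, while a word-level check accepts only the listed words.

```python
from itertools import product

# Hypothetical dictionaries, for illustration only.
SYLLABLE_DIC = {"ad", "hoc", "alma", "mater"}  # pieces stored separately
WORD_DIC = {("ad", "hoc"), ("alma", "mater")}  # whole words stored as units


def accepted_by_syllables(phrase: str) -> bool:
    """Syllable-by-syllable check: each piece only has to be listed."""
    return all(part in SYLLABLE_DIC for part in phrase.split())


def accepted_by_words(phrase: str) -> bool:
    """Word-level check: the whole sequence must be a listed word."""
    return tuple(phrase.split()) in WORD_DIC


for phrase in ["ad hoc", "alma mater", "ad mater", "alma hoc"]:
    print(f"{phrase!r}: syllables={accepted_by_syllables(phrase)}, "
          f"words={accepted_by_words(phrase)}")

# Every ordered pair of listed pieces passes the syllable check.
print(len(list(product(SYLLABLE_DIC, repeat=2))))  # 16
```

With four listed pieces, 16 two-piece sequences pass the syllable check while only 2 are real words, and the gap grows quickly with dictionary size and word length.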

I have seen the ICU BreakIterator, ZWSP (zero-width space), etc. mentioned.
How do these work? Are there any links about them?
How should I go about this? Any ideas and suggestions are greatly appreciated.
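One common approach for scripts written without spaces (ICU's BreakIterator, for example, uses dictionary-based word breaking for Thai) is to segment the syllable stream against a word list before spell checking, and optionally mark the boundaries found with a ZWSP (U+200B), which is invisible when rendered. The sketch below is a greedy longest-match segmenter, assuming a word list stored as tuples of syllables; `segment`, `mark_boundaries`, and the demo data are hypothetical, and this is not how Hunspell itself works. Syllables that begin no listed word are reported as unknown, which is where a spell checker would flag an error.

```python
TSHEG = "\u0F0B"  # U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG
ZWSP = "\u200B"   # U+200B ZERO WIDTH SPACE: invisible word-boundary marker


def segment(syllables, words):
    """Greedy longest-match segmentation of a syllable list.

    `words` is a hypothetical word list: a set of tuples of syllables.
    Returns (segmented words, syllables that begin no listed word).
    """
    max_len = max((len(w) for w in words), default=1)
    found, unknown = [], []
    i = 0
    while i < len(syllables):
        # Try the longest candidate first, shrinking down to one syllable.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = tuple(syllables[i:i + n])
            if candidate in words:
                found.append(candidate)
                i += n
                break
        else:
            # No listed word starts here: treat this syllable as misspelled.
            unknown.append(syllables[i])
            found.append((syllables[i],))
            i += 1
    return found, unknown


def mark_boundaries(found, syllable_sep=TSHEG):
    """Rejoin syllables, adding a ZWSP after the separator at each word boundary."""
    return (syllable_sep + ZWSP).join(syllable_sep.join(w) for w in found)


# Demo with the Latin stand-ins from the post.
word_list = {("ad", "hoc"), ("alma", "mater")}
found, unknown = segment(["alma", "mater", "ad", "mater"], word_list)
print(found)    # [('alma', 'mater'), ('ad',), ('mater',)]
print(unknown)  # ['ad', 'mater']
print(repr(mark_boundaries(found, syllable_sep=" ")))
```

Greedy matching can pick a long word that blocks a better overall split; a dynamic-programming variant that minimizes the number of words (often called maximal matching) is a common refinement.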

Thanks in advance
C. Norbu.