Hyphenation dictionary question


thomas.lange

Hi all,

Because of a patch I have to take care of, I need a basic understanding
of the meaning of entries in a hyphenation dictionary.

If I look into the en-US hyphenation dictionary, for example, there are entries like

.e2a2r
.u4n5k2
a4c2a2r
am2i4no
4and
an5e2st.

What is the meaning of those?
At first I thought each entry was a word part (substring) where the
numbers denote possible hyphenation points and their values the quality of
those hyphenation points.
But that does not seem to be true. At least, I do not know any word with the
substring 'ear' that can be hyphenated after each of those characters.
The same goes for 'acar'. And what is the meaning of the '.' characters?

Can someone shed some light on this?


Note: The actual problem is with an Indic script, where I need to check
whether entries like
1ઐ1
ल2्2
will be processed correctly by the hyphenator. (But I thought a Western
example would be more readable for most subscribers.)
Here the specific problem arises from the characters not being
represented by single bytes...
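To illustrate the multi-byte issue, here is a small Python sketch that reads a pattern entry twice: once per Unicode code point and once per raw UTF-8 byte. The helper names `parse_pattern` and `parse_pattern_bytes` are hypothetical, and this is not how any particular hyphenator is implemented. It only shows that each of these Indic characters occupies three bytes in UTF-8, so digit positions counted in bytes differ from positions counted in characters.

```python
# Hypothetical illustration (not the hyphenator's actual code) of why
# multi-byte characters matter when reading entries such as "ल2्2".

def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Return (letters, values): values[i] is the digit written before
    letters[i] (values[-1] is the digit after the last letter), 0 if none.
    Iterating over a str walks Unicode code points, one per character."""
    letters: list[str] = []
    values: list[int] = [0]
    for ch in pattern:
        if "0" <= ch <= "9":          # ASCII digits only
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def parse_pattern_bytes(pattern: bytes) -> tuple[bytes, list[int]]:
    """The same loop over raw UTF-8 bytes: every byte counts as a 'letter'."""
    letters = bytearray()
    values: list[int] = [0]
    for b in pattern:
        if 0x30 <= b <= 0x39:         # b"0" .. b"9"
            values[-1] = b - 0x30
        else:
            letters.append(b)
            values.append(0)
    return bytes(letters), values


if __name__ == "__main__":
    for pat in ("1ઐ1", "ल2्2"):
        letters, values = parse_pattern(pat)
        raw_letters, raw_values = parse_pattern_bytes(pat.encode("utf-8"))
        print(pat, len(letters), values, len(raw_letters), raw_values)
```

Both readings put the digits at the same character boundaries (UTF-8 continuation bytes never look like ASCII digits). What changes is the unit used to count them: "ल2्2" has 2 letters but 6 letter bytes. A hyphenator therefore has to index the word and the patterns in the same unit.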


Thomas


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Hyphenation dictionary question

Olivier R.-2
Hi Thomas,

Thomas Lange - Sun Germany - ham02 - Hamburg wrote:

> .e2a2r
> .u4n5k2
> a4c2a2r
> am2i4no
> 4and
> an5e2st.
>
> What is the meaning of those?
> At first I thought each entry was a word part (substring) where the
> numbers denote possible hyphenation points and their values the quality of
> those hyphenation points.
> But that does not seem to be true. At least, I do not know any word with the
> substring 'ear' that can be hyphenated after each of those characters.
> The same goes for 'acar'. And what is the meaning of the '.' characters?

- odd numbers: can hyphenate
- even numbers: cannot hyphenate
- dots: beginning or end of a word.

The highest number wins.
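These rules can be turned into a short program. The following is a minimal Python sketch of Liang's pattern scheme (the approach introduced for TeX, whose pattern format these dictionaries share), not the Hyphen library's actual code. Every pattern that occurs in the word contributes its digits to the gaps it covers, with dots marking the word boundaries. The highest value at each gap wins, and an odd value allows a break. The sketch assumes single-digit values and ignores minimum fragment lengths and non-standard hyphenation, which a real hyphenator handles. `parse_pattern`, `hyphenate` and the toy pattern list are hypothetical and made up for illustration; they are not taken from the en-US dictionary.

```python
def parse_pattern(pattern: str) -> tuple[str, list[int]]:
    """Split a pattern like 'hen5at' into ('henat', [0, 0, 0, 5, 0, 0]).
    values[i] is the digit before letters[i]; the last entry is the digit after."""
    letters: list[str] = []
    values: list[int] = [0]
    for ch in pattern:
        if "0" <= ch <= "9":
            values[-1] = int(ch)
        else:
            letters.append(ch)
            values.append(0)
    return "".join(letters), values


def hyphenate(word: str, patterns: list[str]) -> list[str]:
    """Split `word` at the gaps whose combined pattern value is odd."""
    table = dict(parse_pattern(p) for p in patterns)
    dotted = "." + word.lower() + "."        # dots mark the word boundaries
    # points[k] is the value for the gap just before dotted[k]
    points = [0] * (len(dotted) + 1)
    for start in range(len(dotted)):
        for end in range(start + 1, len(dotted) + 1):
            values = table.get(dotted[start:end])
            if values is None:
                continue
            for offset, value in enumerate(values):
                # Overlapping patterns: the highest number wins.
                points[start + offset] = max(points[start + offset], value)
    pieces, last = [], 0
    # The gap before word[i] is points[i + 1]; only gaps strictly inside
    # the word are considered.
    for i in range(1, len(word)):
        if points[i + 1] % 2 == 1:              # odd: a break is allowed
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


if __name__ == "__main__":
    toy = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
    print("-".join(hyphenate("hyphenation", toy)))   # hy-phen-ation
    print("-".join(hyphenate("ear", [".e2a2r"])))    # ear
```

The ".e2a2r" call shows why no word is broken around 'ear' by that entry: its digits are even, so they forbid breaks at those gaps unless another matching pattern puts a higher odd value there.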

You should read that:
http://hunspell.sourceforge.net/tb87nemeth.pdf


Regards,
Olivier

--

** Do not reply at this address. Mailing-list only. **

Re: Hyphenation dictionary question

F Wolff-2
On Tue, 2009-03-17 at 16:48 +0100, Olivier R. wrote:

> Hi Thomas,
>
> Thomas Lange - Sun Germany - ham02 - Hamburg wrote:
>
> > .e2a2r
> > .u4n5k2
> > a4c2a2r
> > am2i4no
> > 4and
> > an5e2st.
> >
> > What is the meaning of those?
> > At first I thought each entry was a word part (substring) where the
> > numbers denote possible hyphenation points and their values the quality of
> > those hyphenation points.
> > But that does not seem to be true. At least, I do not know any word with the
> > substring 'ear' that can be hyphenated after each of those characters.
> > The same goes for 'acar'. And what is the meaning of the '.' characters?
>
> - odd numbers: can hyphenate
> - even numbers: cannot hyphenate
> - dots: beginning or end of a word.
>
> The highest number wins.
>
> You should read that:
> http://hunspell.sourceforge.net/tb87nemeth.pdf

I hadn't seen this before. It looks like the best resource at the
moment. At the time I worked on these things there was less
documentation available, and we started a page on our wiki collecting
some information. I guess it will now only serve as a small supplement,
but perhaps the gotcha I explain on the page will be useful to you.

http://translate.sourceforge.net/wiki/guide/hyphenation

I have also added the new link to that page. I had never actually seen
any news about it. This means I can finally fix several hyphenation bugs in
Afrikaans. We have requirements very similar to Dutch in terms of the
handling of the dïäerësës.

Keep well
Friedel


--
Recently on my blog:
http://translate.org.za/blogs/friedel/en/content/video-virtaals-functionality


Re: Hyphenation dictionary question

Németh László-2
Hi,

2009/3/18 F Wolff <[hidden email]>:

> On Tue, 2009-03-17 at 16:48 +0100, Olivier R. wrote:
>> Hi Thomas,
>>
>> Thomas Lange - Sun Germany - ham02 - Hamburg wrote:
>>
>> > .e2a2r
>> > .u4n5k2
>> > a4c2a2r
>> > am2i4no
>> > 4and
>> > an5e2st.
>> >
>> > What is the meaning of those?
>> > At first I thought each entry was a word part (substring) where the
>> > numbers denote possible hyphenation points and their values the quality of
>> > those hyphenation points.
>> > But that does not seem to be true. At least, I do not know any word with the
>> > substring 'ear' that can be hyphenated after each of those characters.
>> > The same goes for 'acar'. And what is the meaning of the '.' characters?
>>
>> - odd numbers: can hyphenate
>> - even numbers: cannot hyphenate
>> - dots: beginning or end of a word.
>>
>> The highest number wins.
>>
>> You should read that:
>> http://hunspell.sourceforge.net/tb87nemeth.pdf
>
> I hadn't seen this before. It looks like the best resource at the
> moment. At the time I worked on these things there was less
> documentation available, and we started a page on our wiki collecting
> some information. I guess it will now only serve as a small supplement,
> but perhaps the gotcha I explain on the page will be useful to you.
>
> http://translate.sourceforge.net/wiki/guide/hyphenation
>
> I have also added the new link to that page. I had never actually seen
> any news about it. This means I can finally fix several hyphenation bugs in
> Afrikaans. We have requirements very similar to Dutch in terms of the
> handling of the dïäerësës.

Unfortunately, we still need a fix on the OpenOffice.org/Writer side for the
alternative hyphenation of words with a diaeresis:
http://www.openoffice.org/issues/show_bug.cgi?id=71608

News about Hyphen 2.4 (integrated into OOo 3.1):
http://markmail.org/message/7grelc6xisoxh4hq

Hyphen 2.4, the improved hyphenator of OOo 3.1:
http://sourceforge.net/project/showfiles.php?group_id=143754&package_id=231949

There is other hyphenation news, too. One of the most important items
is that the serious en_US hyphenation problems of OOo 3.x (including
the older issue with hyphenation and apostrophes) will be
fixed in OpenOffice.org 3.1:
http://www.openoffice.org/issues/show_bug.cgi?id=90028
Check the hyphenation patterns attached to
http://www.openoffice.org/issues/show_bug.cgi?id=97403, which handle
apostrophes (as described in
http://www.openoffice.org/issues/show_bug.cgi?id=23015,
http://www.openoffice.org/issues/show_bug.cgi?id=72996 and
http://www.openoffice.org/issues/show_bug.cgi?id=90028).

The hyphenation dialog of OpenOffice.org does not yet support Unicode or
alternative hyphenation, but Caolan McNamara has just fixed the
Unicode problem:
http://www.openoffice.org/issues/show_bug.cgi?id=100273.

I have also filed a new issue about the missing pattern generator for
the new features of the hyphenator:
http://www.openoffice.org/issues/show_bug.cgi?id=100302. Such a pattern
generator would be an important step toward supporting hyphenation
in non-English languages.

Regards,
László

