Hunspell SFX rule to delete any character


Goran Rakic
Hi László, all,

Is it possible to delete any character (or any list of characters,
possibly given by regexp) in the Hunspell SFX rule? For example:

SFX A Y 2
SFX A 1 e . is:Ns is:Cg
SFX A 1 i . is:Ns is:Cd

paprika -> paprike is:Ns is:Cg

Or do I have to rewrite this into one SFX entry for every possible
ending character?


On the same topic, is it possible to copy a character? In the rule above
this could give an option to remove any character and then add it back
to the output followed by some suffix.


Is it also possible to rewrite a morphology tag? For example, having po:N
in the dictionary entry and po:A in the affix rule? Currently the second
tag gets appended to the end.


Thanks,
Goran Rakic



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Hunspell SFX rule to delete any character

Ruud Baars-2
Goran Rakic wrote:
> Hi László, all,
>
> Is it possible to delete any character (or any list of characters,
> possibly given by regexp) in the Hunspell SFX rule? For example:
>
> SFX A Y 2
> SFX A 1 e . is:Ns is:Cg
> SFX A 1 i . is:Ns is:Cd
>  
Change this into

SFX A Y 2
SFX A a e . is:Ns is:Cg
SFX A a i . is:Ns is:Cd

And the char will be dropped.
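
To make the effect of such a rule concrete, here is a minimal Python sketch of what "strip a, add e" does to a word. It is only an illustration of the rule's semantics, not Hunspell's implementation; the helper name apply_suffix is hypothetical.

```python
def apply_suffix(word, strip, add):
    """Remove `strip` from the end of `word`, then append `add`.

    As in affix files, "0" in either field stands for the empty string.
    Returns None when the word does not end with the strip string.
    """
    strip = "" if strip == "0" else strip
    add = "" if add == "0" else add
    if not word.endswith(strip):
        return None
    stem = word[: len(word) - len(strip)]
    return stem + add


if __name__ == "__main__":
    print(apply_suffix("paprika", "a", "e"))  # the rule "SFX A a e ."
```

A word that does not end in the strip string (for example one ending in -o) is simply not affected by this rule, which is why a separate entry per ending is needed today.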

> paprika -> paprike is:Ns is:Cg
>
> Or do I have to rewrite this into one SFX entry for every possible
> ending character?
>
>
> On the same topic, is it possible to copy a character? In the rule above
> this could give an option to remove any character and then add it back
> to the output followed by some suffix.
>
>
> Is it also possible to rewrite a morphology tag? For example, having po:N
> in the dictionary entry and po:A in the affix rule? Currently the second
> tag gets appended to the end.
>
>
> Thanks,
> Goran Rakic


Re: Hunspell SFX rule to delete any character

Goran Rakic
On Sun, 17 Jan 2010 at 15:54 +0100, Ruud Baars wrote:
> Change this into
> ...
> And the char will be dropped.

Thanks Ruud, I know the syntax, sorry if my question was not clear.

I was asking if it is possible to write a rule that will drop any
character so I do not have to make one rule for stripping -a, another
for stripping -o and so on.
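
Until a wildcard exists, the per-character entries can at least be generated rather than written by hand. The following is a rough Python sketch of that workaround; the function name expand_strip_any and the choice to use the stripped character as the condition are my assumptions, not something prescribed by Hunspell.

```python
def expand_strip_any(flag, add, endings, morph=""):
    """Emit one SFX entry per possible final character.

    flag    -- affix flag, e.g. "A"
    add     -- suffix to append after stripping
    endings -- iterable of characters that may be stripped
    morph   -- optional morphological fields appended to every entry
    """
    endings = list(endings)
    lines = [f"SFX {flag} Y {len(endings)}"]  # header with the entry count
    for ch in endings:
        # Strip `ch`, add `add`, and require the word to end in `ch`.
        fields = ["SFX", flag, ch, add, ch]
        if morph:
            fields.append(morph)
        lines.append(" ".join(fields))
    return lines


if __name__ == "__main__":
    print("\n".join(expand_strip_any("A", "e", "ao", "is:Ns is:Cg")))
```

The list of endings has to be supplied per language; the sketch does nothing to reduce the size of the resulting affix file, it only removes the manual work.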

Best regards,
Goran Rakic



Re: Hunspell SFX rule to delete any character

Ruud Baars-2
Goran Rakic wrote:

> On Sun, 17 Jan 2010 at 15:54 +0100, Ruud Baars wrote:
>  
>> Change this into
>> ...
>> And the char will be dropped.
>>    
>
> Thanks Ruud, I know the syntax, sorry if my question was not clear.
>
> I was asking if it is possible to write a rule that will drop any
> character so I do not have to make one rule for stripping -a, another
> for stripping -o and so on.
>  
I am sorry, I misunderstood. Your request is a good one. Since 0 already
means "strip nothing", it seems reasonable to allow a number in that field
as the count of characters to drop.
It would reduce the number of entries in the new Dutch affix file as well.

Maybe Laszlo will be willing to add this to the code.
Are you willing to add this to the feature request list, Laszlo?
Ruud

> Best regards,
> Goran Rakic

Re: Hunspell SFX rule to delete any character

ge-7
For the sake of precision:

SFX D   y     ied        [^aeiou]y

Field
-----
1     SFX         - indicates this is a suffix
2     D           - is the flag (a single character) that names this affix
3     y           - the string of chars to strip off before adding affix
                         (a 0 here indicates the NULL string)
4     ied         - the string of affix characters to add
                         (a 0 here indicates the NULL string)
5     [^aeiou]y   - the conditions which must be met before the affix
                    can be applied

The third field indicates the **string of chars** to strip (not just
a single character).
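
As a rough illustration of the five fields, the Python sketch below parses the example entry and applies it to a word. It is a simplification under stated assumptions: the names SuffixRule, parse_sfx and apply_rule are hypothetical, and the condition is assumed to use only syntax that is also valid in Python's re module (literal characters, ".", "[...]" and "[^...]").

```python
import re
from typing import NamedTuple, Optional


class SuffixRule(NamedTuple):
    flag: str       # field 2
    strip: str      # field 3 ("0" becomes the empty string)
    add: str        # field 4 ("0" becomes the empty string)
    condition: str  # field 5


def parse_sfx(line: str) -> SuffixRule:
    """Split one SFX entry into its fields; extra fields are ignored."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "SFX":
        raise ValueError(f"not an SFX entry: {line!r}")
    _, flag, strip, add, condition = parts[:5]
    return SuffixRule(flag,
                      "" if strip == "0" else strip,
                      "" if add == "0" else add,
                      condition)


def apply_rule(rule: SuffixRule, word: str) -> Optional[str]:
    """Return the affixed form, or None if the rule does not apply."""
    # The condition is checked against the end of the unmodified word.
    if not re.search(rf"(?:{rule.condition})$", word):
        return None
    if not word.endswith(rule.strip):
        return None
    return word[: len(word) - len(rule.strip)] + rule.add


if __name__ == "__main__":
    rule = parse_sfx("SFX D   y     ied        [^aeiou]y")
    print(apply_rule(rule, "try"), apply_rule(rule, "play"))
```

Note that the strip field is compared as a whole string, which matches the point above: it may be longer than one character.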

Goran is asking for a wildcard such as '?' in the strip field that would
strip any single character; ?a? would then mean: strip any
3-character string whose middle character is an 'a'.
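
The proposed semantics can be sketched in a few lines of Python. To be clear, this '?' wildcard is hypothetical and is not part of the Hunspell affix format discussed in this thread; the function name strip_with_wildcards is likewise invented for the example.

```python
import re


def strip_with_wildcards(word, pattern, add):
    """Strip a suffix described by `pattern`, where '?' matches any one character.

    All other characters in `pattern` match themselves literally.
    Returns None when the end of `word` does not fit the pattern.
    """
    regex = "".join("." if ch == "?" else re.escape(ch) for ch in pattern)
    match = re.search(rf"(?:{regex})$", word, flags=re.DOTALL)
    if match is None:
        return None
    return word[: match.start()] + add


if __name__ == "__main__":
    print(strip_with_wildcards("paprika", "?", "e"))
    print(strip_with_wildcards("salad", "?a?", "X"))
```

Because '?' is consumed as a wildcard here, a literal question mark in the strip string would need some escaping convention, which this sketch does not define.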

The only problem I can see arises if a language uses ? as a regular
character. Maybe that could be handled with some REP mechanism.

I do not quite understand the reason for this request. Affix handling is
optimized, and even big affix files, much bigger than ever required for
Slavic or German languages, are now handled at a reasonable speed.

Ruud Baars wrote:
> I am sorry I misunderstood. Your request is a good one. Since 0 means
> nothing, it is reasonable to suggest a number as the number of chars to
> be dropped.
> It would reduce the number of items in the new Dutch affix file as well.



Re: Hunspell SFX rule to delete any character

Goran Rakic
On Mon, 18 Jan 2010 at 11:11 +0200, ge wrote:
>
> I do not quite understand the reason for this request. Affix handling is
> optimized, and even big affix files, much bigger than ever required for
> Slavic or German languages, are now handled at a reasonable speed.

It would help me write a compiler that translates a Unitex FST-like
inflection grammar into affix rule definitions.

I do not know what benefit this would bring to others, but I would
like to know whether there is interest in such a feature.

Best regards,
Goran


