[SoC] [report] Component for guessing the language of text

Jocelyn Merand



Since my last email (about the working component, sent on July 17th), I had not been able to run it again. I thought "something irrational is going on". In fact, a zombie process was running and blocking everything (unopkg etc.). Because I kept putting my laptop into hibernation instead of shutting it down, that process survived until last night, when I discovered it and killed it.

I am sorry to publish this report at least 5 days late.

Fortunately, I can now attach to this email an archive of a component that really works. You will notice that it does not handle multi-guessing (this is not a serious problem, and I wanted to implement it, but this issue ate all my time). You will also notice that guesses for short texts are quite poor; this is the reason for the planned Unicode work. I also need to find a better way to pass the text to the C++ method (I currently use an ASCII one!).
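To illustrate why an ASCII entry point is a problem, here is a minimal sketch, assuming the text arrives as UTF-16 (which is how UNO strings store characters) and that the guesser could be changed to accept UTF-8 bytes. The function names utf16ToAscii and utf16ToUtf8 are hypothetical and are not part of the component: the first mimics what an ASCII-only interface does to accented text, the second keeps every character.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical: what an ASCII-only interface effectively does.
// Every character outside 0..127 is replaced by '?', so the guesser
// loses exactly the accented letters that distinguish many languages.
std::string utf16ToAscii(const std::u16string& in) {
    std::string out;
    for (char16_t c : in) out += (c < 0x80) ? static_cast<char>(c) : '?';
    return out;
}

// Hypothetical alternative: encode UTF-16 as UTF-8 so nothing is lost.
// Surrogate pairs are combined; a lone surrogate becomes U+FFFD.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate: replacement character
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the French word "été" becomes "?t?" through the ASCII path, while the UTF-8 path keeps both "é" characters as the two-byte sequence C3 A9.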

To run it, please unzip the archive into your SDK directory and run:


cd <SDK_home>/LG/cpp_guesser
make clean
cd ..
make clean
make guesslangmain.run


(Of course, you must first set up your environment with setsdkenv.)


Windows users will have trouble with this: I have not yet adapted the makefiles for Windows. I will try to do it soon.



For the past week I have been trying to customise the component Thomas sent me. I have also worked on a set of classes meant to manage rules (including, of course, Unicode ones). I think I will be able to use them within a week.



