Thesaurus Server

Graham Lauder-3


A large corporate client of mine has offered to host and manage a "My
Thesaurus" setup on one of their company servers.  This includes using
one of their staff who is familiar with PHP and MySQL to manage it.  The
thesaurus will be for "Commonwealth" English, i.e. en_GB and its
variants en_NZ, en_ZA, en_AU and so forth.

Unfortunately my skills in this area are nil, and he would like to
discuss this  with one of the Lingucomponent Project Leads or someone
else more familiar with My Thesaurus than me.

Mail me direct and I'll pass on his email address.

--
Graham Lauder,

INGOTs Assessor Trainer
Moderator New Zealand
(International Grades in Office Technologies)
www.theingots.org
www.theingots.org.nz

OpenOffice.org MarCon (Marketing Contact) NZ
http://marketing.openoffice.org/contacts.html

www.ooogear.co.nz

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Thesaurus Server

Shaun McDonald
Hi Graham,
On 13 Mar 2007, at 16:46, Graham Lauder wrote:

>
>
> A large corporate client of mine has offered to host and manage a "My
> Thesaurus" setup on one of their company servers.  This includes using
> one of their staff who is familiar with PHP and MySQL to manage it.  The
> thesaurus will be for "Commonwealth" English, i.e. en_GB and its
> variants en_NZ, en_ZA, en_AU and so forth.

Please be aware of the issue:
<http://www.openoffice.org/issues/show_bug.cgi?id=66919> as the user  
interface elements of the various non-US English locales are not  
properly translated.

Also at the moment there are only 3 English variants available:
en-US
en-GB
en-ZA

Shaun



       
       
               
Re: Thesaurus Server

Graham Lauder-3
Shaun McDonald wrote:

> Hi Graham,
> On 13 Mar 2007, at 16:46, Graham Lauder wrote:
>
>>
>>
>> A large Corporate client of mine has offered to host and manage a "My
>> Thesaurus"  setup  on one of their company servers.  This includes using
>> one of their staff who is familiar with php and MYSQL to manage it.  The
>> Thesaurus will be for  "Commonwealth" English, ie  en_GB and it's
>> variants en_NZ , en_ZA, en_AU and so forth.
>
> Please be aware of the issue:
> <http://www.openoffice.org/issues/show_bug.cgi?id=66919> as the user
> interface elements of the various non-US English locales are not
> properly translated.
>
> Also at the moment there are only 3 English variants available:
> en-US
> en-GB
> en-ZA
>
> Shaun

Hey Shaun,

There is in fact an en_AU one that is based on the GB one.
(http://wiki.services.openoffice.org/wiki/Dictionaries#English_.28United_Kingdom.2C_....29)
(http://www.justlocal.com.au/clients/oooau/)
This is the one we've used in the company's custom OOo build and it
works fine.  The reason we need a specific NZ one is that there are many
Maori words that have made their way into everyday use in NZ.  So, for
instance, in an NZ thesaurus a synonym for "good" would be "kapai"; for
"food", "kai"; for "broken", "patu".  In an en_GB or en_US thesaurus these
would be nonsense.  All these words are included in the en_NZ spellcheck
dictionaries.  The idea is to set up a My Thesaurus en_NZ server and
then add the others as and if needed.

Most people tend to use the en_US interface with the en_other
dictionaries and thesaurus.  That's what we are doing at this point.



Cheers
GL

Re: Thesaurus Server

Marcin Miłkowski
Hi all,

There seems to be a big misunderstanding here: My Thesaurus isn't server
software and you really cannot set it up on a web server. What you can
do, however, to edit thesauruses, is install OpenThesaurus (see
www.openthesaurus.de), which has been used successfully to prepare OOo
thesauruses for many languages.

Best regards,
Marcin

Re: Thesaurus Server

Shaun McDonald
In reply to this post by Graham Lauder-3

On 14 Mar 2007, at 00:59, Graham Lauder wrote:

> Hey Shaun,
>
> There is in fact an en_AU one that is based on the GB one.
> (http://wiki.services.openoffice.org/wiki/Dictionaries#English_.28United_Kingdom.2C_....29)
> (http://www.justlocal.com.au/clients/oooau/)
> This is the one we've used in the companies custom OOo build and it
> works fine.  The reason we need a specific NZ one is that there are many
> Maori words that have made their way into everyday use in NZ.  So for
> instance in an NZ thesaurus, a synonym for "good" would be "kapai", for
> food; kai, for broken: patu.  in an en_UK or en_US thesaurus these would
> be nonsense.  All these words are included in the en_NZ spellcheck
> dictionaries.  The idea is to set up a My Thesaurus en_NZ server and
> then add the others as and if needed.
>
> Most people tend to use the en_US interface with the en_other
> dictionaries and Thes.  That's what we are doing at this point.
>

Thanks for the update. What I said still applies for the user  
interface at this time.

Shaun



               
Re: Thesaurus Server

Graham Lauder-3
In reply to this post by Marcin Miłkowski
Marcin Miłkowski wrote:

> Hi all,
>
> it seems to be a big misunderstanding here: My Thesaurus isn't a
> server software and you really cannot setup it on any web server. What
> you can do, however, to edit thesauruses, is to install Open Thesaurus
> (see www.openthesaurus.de) which is used successfully to prepare OOo
> thesauruses for many languages.
>
> Best regards,
> Marcin

Erk, sorry, yes, I actually meant OpenThesaurus.

There, you see, that just proves how ignorant I am with regard to this!  ;)

Cheers
G

Re: Thesaurus Server

Graham Lauder-3
In reply to this post by Graham Lauder-3
Quoting Daniel Naber <[hidden email]>:


> On Thursday 15 March 2007 03:45, you wrote:
>
> Hi Graham,
>
>> Should we add all these influences into an en_CW (Commonwealth)? I don't
>> know, although personally, I think there is a good case to be made for
>> it, but for that to happen we need the ability to add these words to the
>> thesaurus.  Toll NZ is offering to host such a facility on their
>> servers.  Is there one elsewhere?
>
> it's rather important to understand that the English thesaurus of
> OpenOffice.org is based on Wordnet (http://wordnet.princeton.edu/), i.e.
> it is not being developed using OpenThesaurus. So first the data would
> need to be imported into OpenThesaurus. However, as soon as there's a new
> version of Wordnet, it will be very difficult to merge those changes into
> the version in OpenThesaurus which has now been enhanced with Commonwealth
> words.

Actually, I think the important thing to understand is that the non-US
thesaurus is woeful: true, only marginally worse than the US one, but
noticeably so.  My client's staff all noted this as a serious concern,
only just short of a showstopper for adoption.

> I see two options
>
> 1. Ignore the problem above and accept that the thesauri diverge over time

Such is the way of language, and especially the English language.

> 2. Contact the Wordnet people and ask them if there's a way to contribute
> Commonwealth words in a way they are marked in the data, i.e. they can be
> filtered out for those who don't want them.
>
> 2 is clearly the better solution. Thanks for your offer to host the
> thesaurus, but the technical hosting is actually the easy part. What is
> needed for a good long-term solution is someone who's willing to work on
> solution 2 above.

Sorry, but solution 2 is not a solution at all.  Princeton is dealing with
th_en_US and, I presume, th_en_GB, not en_NZ or for that matter AU, ZA and
so on.

If OOo uses the latest Wordnet data, then it is demonstrably inadequate
as far as non-US English is concerned.  We need an open source solution.

> Regards
> Daniel


Re: Thesaurus Server

Marcin Miłkowski
Hi Graham,

Graham Lauder wrote:

>>> 2. Contact the Wordnet people and ask them if there's a way to contribute
>>> Commonwealth words in a way they are marked in the data, i.e. they can be
>>> filtered out for those who don't want them.
>>>
>>> 2 is clearly the better solution. Thanks for your offer to host the
>>> thesaurus, but the technical hosting is actually the easy part. What is
>>> needed for a good long-term solution is someone who's willing to work on
>>> solution 2 above.
>>  
>
> Sorry but solution 2 is not a solution at all.  Princeton is dealing with
> th_en_US and I presume th_en_GB not en_NZ or for that matter AU and ZA and so
> on

If you want to build an English thesaurus that is better than Wordnet,
then, well, good luck, but remember - you have been warned.

I'd recommend searching for local English Wordnets (or similar
linguistic projects), maybe there are Australian versions. Trying to
build a new thesaurus from scratch is simply futile. See:

http://www.globalwordnet.org/


Regards,
Marcin

Re: Thesaurus Server

Graham Lauder
On Saturday 17 March 2007 11:39, Marcin Miłkowski wrote:

> Hi Graham,
>
> Graham Lauder wrote:
> >>> 2. Contact the Wordnet people and ask them if there's a way to
> >>> contribute Commonwealth words in a way they are marked in the data,
> >>> i.e. they can be filtered out for those who don't want them.
> >>>
> >>> 2 is clearly the better solution. Thanks for your offer to host the
> >>> thesaurus, but the technical hosting is actually the easy part. What is
> >>> needed for a good long-term solution is someone who's willing to work
> >>> on solution 2 above.
> >
> > Sorry but solution 2 is not a solution at all.  Princeton is dealing with
> > th_en_US and I presume th_en_GB not en_NZ or for that matter AU and ZA
> > and so on
>
> If you want to build an English thesaurus that is better than Wordnet,
> then, well, good luck, but remember - you have been warned.

Warned?  About what?  If the Wordnet list is open source, then what is the
issue?

>
> I'd recommend searching for local English Wordnets (or similar
> linguistic projects), maybe there are Australian versions.

I've already done that, those that I've seen are small operations run by a
single enthusiast on a small backroom server that has a single point of
failure.  

My client is a large Multinational corporate that runs the largest
distribution network in NZ and Australia that includes Rail, Shipping, Air
and Road.  The ISS department with which I deal employs 70 highly skilled IT
staff.  Without exception, they identified the thesaurus as a weak link in
OOo and as such are offering to help solve this problem.

This is NOT a trivial offer; this is an opportunity for the project to develop
the Thesaurus component in OOo.  My client will assign space on their
servers, a project manager to look after the project, and so forth.

>Trying to
> build a new thesaurus from scratch is simply futile.

It's a good thing Mr Roget didn't think that.

>See:
>
> http://www.globalwordnet.org/


There is a problem in the thesaurus implementation in OOo. Setting OOo to
English (UK) in options>languages results in the thesaurus being greyed out.
The assumption then is that there is no thesaurus installed for that
localisation, i.e. English English.  Given that this is the source of all
English variations, not having a thesaurus is patently absurd.  Therefore
Wordnet is not doing the job and a fix is needed.

Right now all I see is barriers that need removing.

If therefore OpenThesaurus is a bad option, the assumption I take from what
you are saying is that setting up a local Wordnet is the best alternative.

Does that allow for community input?
What is the step from the Wordnet database to an installed thesaurus in OOo?
Where can I find someone who can exchange emails with my client's people to
get them under way?



Cheers
GL
--
"GET LEGAL - GET OPENOFFICE.ORG"
http://why.openoffice.org
ISO 26300 compliant

Graham Lauder,
OpenOffice.org MarCon (Marketing Contact) NZ
http://marketing.openoffice.org/contacts.html

INGOTs Assessor Trainer
(International Grades in Office Technologies)
www.theingots.org.nz

Re: Thesaurus Server

Marcin Miłkowski
Hi Graham,
>
> Warned?  about what?  If the Wordnet list is Opensource then what is the
> issue?

I understood that you wanted to start from _scratch_. Wordnet was sponsored
by a 20-million USD grant, and done by a team of really qualified
linguists. And it is one of the biggest achievements in computational
linguistics as such. So you know that trying to beat that requires a lot
of resources, and IT resources are really not so important.

If you want to extend Wordnet, then it's another story. Of course, it's
easier to do so.

>> I'd recommend searching for local English Wordnets (or similar
>> linguistic projects), maybe there are Australian versions.
>
> I've already done that, those that I've seen are small operations run by a
> single enthusiast on a small backroom server that has a single point of
> failure.  

The server is a minor issue. The major issue is how to start a team -
single enthusiasts could never achieve that with no remuneration.

>> Trying to
>> build a new thesaurus from scratch is simply futile.
>
> It's a good thing Mr Roget didn't think that.

But it's not the 19th century anymore. Roget's thesaurus is really worse
than Wordnet in linguistic terms. And in linguistics, you try to
bootstrap and reuse the data.

> If therefore Openthesaurus is a bad option, the assumption I take from what
> you are saying is; setting up a local Wordnet is the best alternative.  

It is not a bad option. I'm using OpenThesaurus myself. But if you want
to reuse Wordnet, you need to convert it into OpenThesaurus, and this is
a non-trivial task.

You cannot set up a local Wordnet without any software, as Wordnet is only
a file. You need an editing environment. You can use some other software
(there are many software packages for professional linguists, used for
building national wordnets, but they could be far too complicated for
an average user).

> What is the step from Wordnet Database to installed Thesaurus in OOo?

Conversion of the database. Take a look at scripts at Daniel Naber's site.

But note that this conversion does not allow any direct editing of Wordnet
or of the OOo thesaurus.

> Where can I find someone who can exchange emails with my clients people to get
> them under way

No idea. First you have to find someone who has some natural language
processing background, who is able to map Wordnet's relations onto the
MySQL database in OpenThesaurus, and who can decide whether some of the
relations are to be discarded or ported into the OpenThesaurus software.
I wouldn't start a project without finding the person who is able to do
that - these processes are non-trivial.

I would recommend that you contact linguistics (NLP) departments at
Australian universities. This is a task that would make good postgraduate work.

Regards,
Marcin
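
For readers following the thread, the "conversion of the database" step mentioned above can be pictured roughly as follows. This is only a sketch, not Daniel Naber's scripts. It assumes the plain-text layout of the OOo 2.x thesaurus files as I understand it: the .dat file starts with an encoding line, followed by `word|count` entries with one `(pos)|synonym|...` line per meaning, and the .idx file lists the encoding, the number of entries, and `word|byte offset` pairs. The function name `build_thesaurus` and the sample entries are made up for illustration.

```python
import io


def build_thesaurus(entries, encoding="UTF-8"):
    """Return (dat_bytes, idx_bytes) for a dict of word -> [(pos, [synonyms])].

    Assumed layout (check it against a real th_*_v2 file before relying on it):
      .dat: encoding line, then per entry "word|n" plus n "(pos)|syn|syn" lines
      .idx: encoding line, entry count, then "word|byte offset into .dat"
    """
    dat = io.BytesIO()
    dat.write(f"{encoding}\n".encode(encoding))
    index = []
    # Entries are written in sorted order so the index can be searched quickly.
    for word in sorted(entries):
        index.append((word, dat.tell()))  # offset where this entry starts
        meanings = entries[word]
        dat.write(f"{word}|{len(meanings)}\n".encode(encoding))
        for pos, synonyms in meanings:
            line = "|".join([f"({pos})", *synonyms])
            dat.write(f"{line}\n".encode(encoding))
    idx_lines = [encoding, str(len(index))]
    idx_lines += [f"{word}|{offset}" for word, offset in index]
    idx = ("\n".join(idx_lines) + "\n").encode(encoding)
    return dat.getvalue(), idx


# Illustrative data only; "kapai" and "kai" are the NZ examples from this thread.
SAMPLE = {
    "good": [("adj", ["fine", "kapai"])],
    "food": [("noun", ["kai"])],
}

dat_bytes, idx_bytes = build_thesaurus(SAMPLE)
print(idx_bytes.decode("UTF-8"))
```

The point of the sketch is that the installed thesaurus is a generated artefact: any editing has to happen in the source data (Wordnet, or an OpenThesaurus database), after which the files are regenerated.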

Re: Thesaurus Server

Daniel Naber-9
In reply to this post by Graham Lauder
On Sunday 01 April 2007 04:25, Graham wrote:

> What is the step from Wordnet Database to installed Thesaurus in OOo?
> Where can I find someone who can exchange emails with my clients people
> to get them under way

Marcin has already sent a comprehensive reply so I'd just like to add this:

The Global WordNet Association has a (very low traffic) mailing list where
you will find WordNet experts: http://www.globalwordnet.org/

WordNet as SQL already exists, see
http://wordnet.princeton.edu/links.shtml#SQL
The interesting question is how to follow the changes. Also see
http://wordnet.princeton.edu/links.shtml#mappings

Instead of exchanging emails in private I suggest using this mailing list,
as long as the discussion is not purely WordNet-specific.

Regards
 Daniel

--
http://www.danielnaber.de
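
One way to picture "following the changes" is to keep regional additions in a separate overlay rather than editing the upstream data, so that a new upstream release can simply be merged with the same overlay again. The sketch below is an assumption-laden illustration of that idea, not WordNet's or OpenThesaurus's actual data model: the `Synonym` class, the `regions` tag and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synonym:
    word: str
    # Locales the word is limited to; an empty set means "all English variants".
    regions: frozenset = frozenset()


def merge(upstream, overlay):
    """Combine upstream synonym lists with regional additions, leaving both inputs untouched."""
    merged = {head: list(syns) for head, syns in upstream.items()}
    for head, syns in overlay.items():
        bucket = merged.setdefault(head, [])
        known = {s.word for s in bucket}
        bucket.extend(s for s in syns if s.word not in known)
    return merged


def for_locale(merged, locale):
    """Filter out synonyms that are marked for other locales only."""
    return {
        head: [s.word for s in syns if not s.regions or locale in s.regions]
        for head, syns in merged.items()
    }


# Illustrative data: "kapai" is the NZ example given earlier in the thread.
upstream = {"good": [Synonym("fine")]}
overlay = {"good": [Synonym("kapai", frozenset({"en_NZ"}))]}

merged = merge(upstream, overlay)
print(for_locale(merged, "en_NZ"))
print(for_locale(merged, "en_US"))
```

Because the overlay is stored separately and tagged by locale, users who do not want the regional words can filter them out, which is the property asked for in option 2 earlier in the thread.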

Re: Thesaurus Server

Graham Lauder-3
In reply to this post by Marcin Miłkowski
On Saturday 31 March 2007 22:17, Marcin Miłkowski wrote:
> Hi Graham,
>
> > Warned?  about what?  If the Wordnet list is Opensource then what is the
> > issue?
>
> I understood you want to start from _scratch_.

I never said that.  In fact I said exactly the opposite.  I'm not sure how you
could have drawn that inference, I apologise if I am not being clear enough.

> Wordnet was sponsored by
> a 20-million USD grant, and done by a team of really qualified
> linguists. And it is one of the biggest achievements in computer
> linguistics as such. So you know that trying to beat that requires a lot
> of resources, and IT resources are really not so important.

But as I keep saying, and as many others keep repeating, the OOo
thesaurus that is based on this is seriously substandard, especially in
en_GB/NZ/AU/ZA.

>
> If you want to extend Wordnet, then it's another story. Of course, it's
> easier to do so.
>
> >> I'd recommend searching for local English Wordnets (or similar
> >> linguistic projects), maybe there are Australian versions.
> >
> > I've already done that, those that I've seen are small operations run by
> > a single enthusiast on a small backroom server that has a single point of
> > failure.
>
> The server is a minor issue. The major issue is how to start a team -
> single enthusiasts could never achieve that with no remuneration.

Excuse me if I'm beginning to sound frustrated.  As I already explained, we
have a team of people, paid people, that are willing to administer the setup.
They have project managers who will gather the linguists.  What you are
talking about is frankly rather trivial in a corporate space.  Assembling
project teams is a daily task in this environment.  You are seeing barriers
where there are none.

>
> >> Trying to
> >> build a new thesaurus from scratch is simply futile.
> >
> > It's a good thing Mr Roget didn't think that.
>
> But it's not 19th century anymore. Roget's thesaurus is really worse
> than Wordnet in linguistic terms.

But he didn't think it was futile to try, and that was the point.  It's about
attitude and nothing to do with comparisons of quality.

> And in linguistics, you try to
> bootstrap and reuse the data.

Of course, that is why I said we build from the en_GB wordnet if that is what
it requires.

>
> > If therefore Openthesaurus is a bad option, the assumption I take from
> > what you are saying is; setting up a local Wordnet is the best
> > alternative.
>
> It is not a bad option. I'm using OpenThesaurus myself. But if you want
> to reuse Wordnet, you need to convert it into OpenThesaurus, and this is
> a non-trivial task.

OK we seem to have gone in a circle. But the point is it's doable with the
right skills.  

>
> You cannot setup a local Wordnet without any software as Wordnet is only
> a file. You need an editing environment. You can use some other software
> (there are many software packages for professional linguists - used for
> building national wordnets - but they could be far too complicated for
> an average user).

There you go again, barriers.  Forget the barriers, I would like to see
solutions.

Problem:  OOo English thesaurus is demonstrably substandard.

Solution:  I don't know, that's why I'm asking.  We have a substantial
benefactor who is willing to commit resources, both financial and
material, to a thesaurus project, and whom I would like to be able to put in
touch with people in the OOo community who are more knowledgeable in this
area than I.


>
> > What is the step from Wordnet Database to installed Thesaurus in OOo?
>
> Conversion of the database. Take a look at scripts at Daniel Naber's site.
>
> But note that this conversion does not allow any direct edition Wordnet
> nor edition of OOo thesaurus.

Sorry, I'm not sure what you mean here.  I'll look at Daniel's scripts.

>
> > Where can I find someone who can exchange emails with my clients people
> > to get them under way
>
> No idea. First you have to find someone who has some natural language
> processing background, and is able to make mapping between Wordnet's
> relations to MySQL database in OpenThesaurus, and make a decision
> whether some of the relations are to be discarded or ported into
> OpenThesaurus software. I wouldn't start a project without finding the
> person who is able to do that - these processes are non-trivial.

As I already said, the technical skills are available.  Once the client knows
what skillset is needed he will assign the person most suitable for the job.

>
> I would recommend you to contact linguistics (NLP) departments at
> Australian universities.

Why Australia?  I'm not Australian and, surprising as it may seem, New Zealand
does have universities.

>This is a task that make a good postgraduate work.
>
> Regards,
> Marcin
>

Re: Thesaurus Server

Daniel Naber-9
In reply to this post by Graham Lauder
On Sunday 01 April 2007 04:25, Graham wrote:

> There is a  problem in the thesaurus implementation in OOo. Setting OOo
> to English (UK) in options>languages results in the thesaurus being
> greyed out.

I cannot reproduce this using OOo 2.2. Both GB and US are mapped to the
files th_en_US_v2.idx/dat in dictionary.lst. Can you send a small document
that demonstrates this problem?

regards
 Daniel

--
http://www.danielnaber.de
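
To check which thesaurus a given locale is mapped to, the relevant lines of dictionary.lst can be listed with a few lines of script. This is a sketch under assumptions: it presumes entries of the form `TYPE lang COUNTRY basename` with `#` starting a comment, and the sample excerpt below is made up for illustration, not copied from a real installation. The helper name `thesaurus_mappings` is hypothetical too.

```python
def thesaurus_mappings(dictionary_lst_text):
    """Return {locale: thesaurus base name} for the THES lines of a dictionary.lst."""
    mappings = {}
    for raw in dictionary_lst_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        fields = line.split()
        if len(fields) == 4 and fields[0] == "THES":
            _, lang, country, basename = fields
            mappings[f"{lang}_{country}"] = basename
    return mappings


# Hypothetical excerpt; a real dictionary.lst will contain different entries.
SAMPLE = """\
# type lang country basename
DICT en GB en_GB
HYPH en GB hyph_en_GB
THES en US th_en_US_v2
THES en GB th_en_US_v2
"""

found = thesaurus_mappings(SAMPLE)
for locale in ("en_US", "en_GB", "en_NZ"):
    print(locale, "->", found.get(locale, "no thesaurus mapped"))
```

A locale with no THES line (en_NZ in this made-up excerpt) would be one plausible reason for a greyed-out thesaurus, though that is a guess rather than a confirmed diagnosis.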

Re: Thesaurus Server

Marcin Miłkowski
In reply to this post by Graham Lauder-3
Graham Lauder wrote:
> On Saturday 31 March 2007 22:17, Marcin Miłkowski wrote:
>> Hi Graham,
>>
>>> Warned?  about what?  If the Wordnet list is Opensource then what is the
>>> issue?
>> I understood you want to start from _scratch_.
>
> I never said that.  In fact I said exactly the opposite.  I'm not sure how you
> could have drawn that inference, I apologise if I am not being clear enough.

Then it's really OK to proceed with your project. Sorry if I misunderstood.

> Excuse me if I'm beginning to sound frustrated,  as I already explained, we
> have a team of people, paid people, that are willing to administer the setup.  
> They have project managers who will gather the linguists.  What you are
> talking about is frankly rather trivial in a corporate space. Assembling
> project teams is a daily task in this environment. You are seeing barriers
> where there are none.

You're asking for advice so you're getting advice - I'm only saying that
gathering a skilled team is the most important issue here.

I understood that the company is only offering a technical environment,
and this technical environment is not really a bottleneck.

>>> If therefore Openthesaurus is a bad option, the assumption I take from
>>> what you are saying is; setting up a local Wordnet is the best
>>> alternative.
>> It is not a bad option. I'm using OpenThesaurus myself. But if you want
>> to reuse Wordnet, you need to convert it into OpenThesaurus, and this is
>> a non-trivial task.
>
> OK we seem to have gone in a circle. But the point is it's doable with the
> right skills.  

This is not a circle. See, OpenThesaurus is software, and Wordnet is
data. You need to convert the data to OpenThesaurus format (MySQL
database), and then possibly tweak OpenThesaurus code to map some of
Wordnet's relations (OpenThesaurus has only a limited set of relations).
And it's wiser to plan this beforehand, as adding relations afterward
could be harder.
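
A minimal sketch of the relation-mapping step described above, under stated assumptions: the symbols "@", "~", "!" and "&" are WordNet pointer symbols, but the target relation names and the `map_relations` helper are hypothetical placeholders, not the actual OpenThesaurus schema. The idea is to decide up front which Wordnet relations have a home in the target and to tally the ones that would be lost.

```python
from collections import Counter

# WordNet pointer symbols mapped to a HYPOTHETICAL reduced relation set.
# The names on the right are placeholders, not real OpenThesaurus columns.
POINTER_TO_TARGET = {
    "@": "hypernym",
    "~": "hyponym",
    "!": "antonym",
    "&": "similar",
}


def map_relations(pointers):
    """Split (source, symbol, target) triples into mapped rows and a tally
    of pointer symbols that have no counterpart in the target set."""
    mapped = []
    unmapped = Counter()
    for source, symbol, target in pointers:
        relation = POINTER_TO_TARGET.get(symbol)
        if relation is None:
            unmapped[symbol] += 1
        else:
            mapped.append((source, relation, target))
    return mapped, unmapped


if __name__ == "__main__":
    sample = [("dog", "@", "canine"), ("hot", "!", "cold"), ("tree", "#m", "forest")]
    rows, skipped = map_relations(sample)
    print(rows)
    print(dict(skipped))
```

Running the tally over the whole data set before any import shows how much information the limited relation set would drop, which is the planning step recommended above.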

>> You cannot set up a local Wordnet without any software, as Wordnet is only
>> a file. You need an editing environment. You can use some other software
>> (there are many software packages for professional linguists - used for
>> building national wordnets - but they could be far too complicated for
>> an average user).
>
> There you go again, barriers.  Forget the barriers, I would like to see
> solutions.

You'll find several tools to edit Wordnets online, but some of them are
too advanced to be useful for enriching Wordnet for Australian, New
Zealand, or other versions of English.

I'm suggesting that you should use OpenThesaurus. Is that clear now? And
you need a skilled NLP student to lead the team. Hire one, and the
problem is solved.

>> But note that this conversion does not allow any direct editing of Wordnet
>> or of the OOo thesaurus.
>
> Sorry, I'm not sure what you mean here, I'll look at Daniel's scripts.

Let me reiterate: the OOo thesaurus is only a file in a special format;
Daniel's scripts simply convert the original Wordnet format to the OOo
format. There is no special editor for OOo thesaurus files.
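
To make "a file in a special format" concrete, here is a rough sketch that writes a data file and its index in the layout I understand the th_*_v2 files to use: an encoding line, then one "word|count" line per entry followed by one "(pos)|synonym|..." line per meaning, with the index listing each word and the byte offset of its entry. Treat the layout details as an assumption to check against the real files. `build_thesaurus` is a hypothetical helper and is not one of Daniel's scripts.

```python
import io


def build_thesaurus(entries, encoding="UTF-8"):
    """Return (dat_bytes, idx_bytes) for entries shaped like
    {word: [(part_of_speech, [synonym, ...]), ...]}.

    Layout is an assumption based on the th_*_v2 files; verify before use.
    """
    dat = io.BytesIO()
    dat.write((encoding + "\n").encode(encoding))
    offsets = []
    for word in sorted(entries):
        meanings = entries[word]
        # The index points at the byte where this entry's header line starts.
        offsets.append((word, dat.tell()))
        dat.write(f"{word}|{len(meanings)}\n".encode(encoding))
        for pos, synonyms in meanings:
            line = "|".join([f"({pos})"] + list(synonyms))
            dat.write((line + "\n").encode(encoding))
    idx_lines = [encoding, str(len(offsets))]
    idx_lines += [f"{word}|{offset}" for word, offset in offsets]
    idx = ("\n".join(idx_lines) + "\n").encode(encoding)
    return dat.getvalue(), idx


if __name__ == "__main__":
    dat_bytes, idx_bytes = build_thesaurus(
        {"simple": [("adj", ["elementary", "uncomplicated"])]}
    )
    print(dat_bytes.decode("UTF-8"))
    print(idx_bytes.decode("UTF-8"))
```

Because the output is generated in one pass from some other source, any edit has to happen in that source and the files must then be regenerated, which is why a separate editing environment is needed.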

> As I already said, the technical skills are available.  Once the client knows
> what skillset is needed he will assign the person most suitable for the job.

An IT student or graduate who has completed NLP courses will do. It's
quite easy to set up the environment, but quite hard to update the data
when Wordnet gets updated; relations have to be mapped thoughtfully, but
it's doable (it could be good postgraduate work). That person can lead
the technical team. Several lexicographers or cognitive linguists could
also be needed if you want the result to be really good.

>> I would recommend that you contact linguistics (NLP) departments at
>> Australian universities.
>
> Why Australia? I'm not Australian and, surprising as it may seem, New Zealand
> does have universities.

I thought you were Australian, sorry for the confusion. Take any
university from a country that needs a modified thesaurus.

Regards,
Marcin


Re: Thesaurus Server

Jonathon-4
In reply to this post by Graham Lauder-3
Graham Lauder wrote:

> Excuse me if I'm beginning to sound frustrated,  as I already explained, we

And at about this point the only thing I can glean is:
* A company is willing to hire people to set up and maintain a server;
* A company is willing to hire people to create a thesaurus;
* A company is willing to give the thesaurus to OOo.

And that all you want to know is whether there are tools that will:
** Convert some list of words in a thesaurus list to a database;
** Convert a different list of words in a thesaurus list to that database;
** Allow somebody to add/edit/delete words in that database;
** Convert the output of that database into an OOo thesaurus;

And if those tools exist:
* What are they called;
* Where can they be obtained from;
* Who/what/where can support for them be obtained from;
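
For illustration only, the four steps listed above (two imports, editing, export) can be sketched with Python's built-in sqlite3 module standing in for MySQL. The table layout and all function names are hypothetical and do not describe OpenThesaurus or any existing tool.

```python
import sqlite3


def open_db():
    """Create an in-memory database with one hypothetical synonym table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE synonym ("
        " word TEXT, pos TEXT, synonym TEXT,"
        " UNIQUE (word, pos, synonym))"
    )
    return db


def import_list(db, rows):
    """Load (word, pos, synonym) rows; rows already present are skipped."""
    db.executemany("INSERT OR IGNORE INTO synonym VALUES (?, ?, ?)", rows)


def delete_synonym(db, word, synonym):
    """Remove one synonym of a word (the 'edit' step)."""
    db.execute(
        "DELETE FROM synonym WHERE word = ? AND synonym = ?", (word, synonym)
    )


def export_entries(db):
    """Group rows as {word: {pos: [synonyms]}}, ready for a file writer."""
    entries = {}
    query = "SELECT word, pos, synonym FROM synonym ORDER BY word, pos, synonym"
    for word, pos, synonym in db.execute(query):
        entries.setdefault(word, {}).setdefault(pos, []).append(synonym)
    return entries


if __name__ == "__main__":
    db = open_db()
    import_list(db, [("lorry", "noun", "truck")])                 # first list
    import_list(db, [("lorry", "noun", "truck"),                  # second list
                     ("lorry", "noun", "camion")])
    delete_synonym(db, "lorry", "camion")
    print(export_entries(db))
```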

Am I missing anything?

xan

jonathon
