Hierarchical Keyword Tree


Hierarchical Keyword Tree

Leonard Mada
 > German "Schlagwort" vs "Stichwort"
I do not know if there is an English equivalent for the two terms. I
believe English only has keywords, which are actually "Stichwoerter".
"Schlagwoerter" would be some kind of keywords, too (like those used in
indices), but there is no distinction between the two, as far as I
know.

What I want is an amalgam of both, and even more than that. Simple
keywords are too primitive and do not offer the desired advantages when
you want to search for something. For example, I recently searched for
the term "febrile neutropenia" on PubMed and retrieved 1883 results.
This search was not the most sensitive, though. Searching for "febrile"
and "neutropenia" yields 3500 results. Searching for "fever" and
"neutropenia" results in 3283 hits.

As the sensitivity of the search increases, the specificity drops.
Most of those documents would have been useless for me. And by the way,
"febrile neutropenia" is not such a common term. If you search for
something common, you would get one to two orders of magnitude more
search results.

There is definitely a need for something better, and I believe a form
of hierarchical keywords (or tags) could offer some relief, but the
subject needs more thorough thought.

As I described on the wiki page, take the endocarditis example
(infection of the heart valves):
 - in endocarditis the heart valves are most often infected (but not
exclusively):
  -- so most of the time endocarditis implies "heart valves", too
  -- sometimes I may want to search more extensively for heart valves;
one option would be to:
  -- add "heart valves" as a keyword to every article on endocarditis;
    --- but the keyword list would very quickly become huge (because I
would have to enter other terms as well, like cardiology, various
bacteria and many more)
    --- many terms apply only to some articles, so applying them
indiscriminately would result in a severe loss of specificity for the
search:
    --- e.g.: most cases of endocarditis cause bacteremia (bacteria in
the blood), but not all
    --- bacteremia can also cause endocarditis (i.e. be the reason for
endocarditis)
    --- however, I would add bacteremia as a keyword only when it is
specifically studied in the article (to maintain a high specificity)
    --- yet for a more general search on bacteremia, I would include
endocarditis, too, in my search protocol
    --- of course, the search could often be done without the
hierarchical tree, by manually including all the search strings in the
query, but the query would look odd and be difficult to understand (and
many users wouldn't even be able to write it correctly); you would
easily forget to include some indirect search term;
 - to expand your example: Nonfiction -> Guidebooks -> Cooking -> Asian
meals: I may want to search specifically for 'Asian meals'; another time
for Cooking (including Asian meals) and still another time !!only!! for
'Guidebooks' (excluding books on Cooking or any other specific
'Guide'-book, i.e. books on guidebooks in general). To apply this to
endocarditis: I may want to search on endocarditis, or on infection
(including endocarditis and other infections), or more generally for
articles dealing broadly with infections (but not with specific
infections, like endocarditis).
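
The search modes in the Nonfiction -> Guidebooks -> Cooking -> Asian
meals example can be sketched roughly as follows. This is only an
illustration, not an existing implementation: the class KeywordTree, the
function search and the article titles are all made up, and it assumes
each keyword has at most one parent.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordTree:
    """Maps each keyword to its parent keyword (None for a root)."""
    parent: dict = field(default_factory=dict)

    def add(self, keyword, parent=None):
        self.parent[keyword] = parent

    def descendants(self, keyword):
        """Return the keyword plus everything filed beneath it."""
        found = {keyword}
        changed = True
        while changed:
            changed = False
            for child, par in self.parent.items():
                if par in found and child not in found:
                    found.add(child)
                    changed = True
        return found


def search(articles, tree, keyword, include_narrower=True):
    """articles: dict of title -> set of keywords assigned to it.

    include_narrower=True  -> keyword and all narrower keywords
    include_narrower=False -> only articles tagged with the keyword itself
    """
    wanted = tree.descendants(keyword) if include_narrower else {keyword}
    return sorted(t for t, tags in articles.items() if tags & wanted)


tree = KeywordTree()
tree.add("Nonfiction")
tree.add("Guidebooks", "Nonfiction")
tree.add("Cooking", "Guidebooks")
tree.add("Asian meals", "Cooking")

# Hypothetical articles, each tagged only with its most specific keyword.
articles = {
    "Writing a good guidebook": {"Guidebooks"},
    "Basics of the kitchen": {"Cooking"},
    "Wok recipes": {"Asian meals"},
}

print(search(articles, tree, "Asian meals"))
print(search(articles, tree, "Cooking"))
print(search(articles, tree, "Guidebooks", include_narrower=False))
```

Each article carries only its most specific keyword; the tree supplies
the broader terms at search time, so the keyword lists stay short.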

 > "Stichworte" are not usually stored hierarchically
See the comment on sensitivity vs specificity. Adding every possible
keyword to the list would:
 - make these lists huge,
 - reduce the specificity, and
 - be notoriously cumbersome, since all those keywords have to be added
by hand (without forgetting one)

I believe that hierarchical keyword lists/trees could offer a very
powerful mechanism for such searches (because one would be able to
dynamically change the tree structure to best suit the particular
search).

Also, this way you do not always have to remember every keyword ("tag")
that should be included in the tree (the tree is simply there; no user
would create a new, very different tree for every new search; rather,
most trees would be used for a number of searches, and a new tree would
most often be a tweak of a previous tree, not a de novo invention).

I have more than 2500 articles on my PC. They are arranged
hierarchically in subdirectories. The problem is:
 - articles may belong to more than one directory (aka category)
  -- I would like to have more than one tree for my articles, but you
can't do this on a filesystem
 - I sometimes need to search more than one subdirectory from different
directory trees (this is indeed difficult to do on a file system)
 - there are many other limitations, but currently it's the best method
to organise so many articles
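
The one-file-one-directory limitation goes away if categories are kept
separately from the files. A rough sketch, assuming each tree is stored
as a child -> parent mapping; the category names, file names and the
helpers subtree and find are hypothetical:

```python
def subtree(parent_of, root):
    """All categories at or below root in a child -> parent mapping."""
    result = {root}
    frontier = [root]
    while frontier:
        current = frontier.pop()
        for child, parent in parent_of.items():
            if parent == current and child not in result:
                result.add(child)
                frontier.append(child)
    return result


# Two independent trees over the same collection (hypothetical categories).
by_disease = {"infection": None, "endocarditis": "infection",
              "bacteremia": "infection"}
by_organ = {"heart": None, "heart valves": "heart"}

# Each article carries categories from any tree; no file is duplicated.
articles = {
    "a1.pdf": {"endocarditis", "heart valves"},
    "a2.pdf": {"bacteremia"},
    "a3.pdf": {"heart"},
}


def find(articles, *category_sets, match_all=False):
    """Union (default) or intersection of several subtree searches."""
    combine = all if match_all else any
    return sorted(name for name, cats in articles.items()
                  if combine(cats & wanted for wanted in category_sets))


infections = subtree(by_disease, "infection")
cardiac = subtree(by_organ, "heart")
print(find(articles, infections, cardiac))                  # either tree
print(find(articles, infections, cardiac, match_all=True))  # both trees
```

A search over subdirectories from different trees then becomes a union
or intersection of subtree sets.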

When you have so many articles, the organization of them becomes a real
nightmare.

I believe that hierarchical keywords are a good start (!!and I do not
have any better idea right now!!). Therefore, I believe that a little
brainstorming would be quite useful.



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Hierarchical Keyword Tree

Bruce D'Arcus

On Sep 29, 2006, at 12:31 PM, Leonard Mada wrote:

> What I want is an amalgam of both, and even more than that. Simple
> keywords are too primitive and do not offer the wanted advantages when
> you want to search something. e.g. I recently searched for the term
> "febrile neutropenia" on Pubmed and retrieved 1883 search results.
> This search was not the most sensitive, though. Searching for
> "febrile" and "neutropenia" yields 3500 results. Searching for "fever"
> and "neutropenia" results in 3283 hits.

Thanks for this practical example, Leonard. It's a good one!

> As the sensitivity of the search increases, so drops the specificity.
> Most of those documents would have been useless for me. And by the
> way, "febrile neutropenia" is not such a common term. If you search
> for something common, you would have one-two orders of magnitude more
> search results.
>
> There is definitely a need for something better, and I believe a
> form of hierarchical keywords (or tags) could offer some relief, but
> there is definitely need for a more thorough thought on this subject.

Given all the stuff going on with RDF these days, take a look at SKOS:

<http://www.xml.com/pub/a/2005/06/22/skos.html>

It's basically a hierarchical concept vocabulary. So you define a
controlled taxonomy by creating concepts, and specifying their
relations. To then link into that taxonomy, you just have something
like:

        <dc:subject rdf:resource="http://ex.net/joe/concepts/foo"/>

My taxonomy is here (though not very well developed):

        <http://www.users.muohio.edu/darcusb/meta/topics>

Software that understands RDF and SKOS will then know that foo is, say,
a narrower concept than bar. I think that's what we all need.
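
To illustrate what "knowing that foo is narrower than bar" buys you,
here is a small sketch using plain tuples instead of a real RDF library.
The skos:broader and dc:subject URIs are the standard ones; the concept
"bar", the document URIs and both helper functions are assumptions made
for the example, and following skos:broader transitively is a choice of
this sketch.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
BASE = "http://ex.net/joe/concepts/"

# (subject, predicate, object) triples; the document URIs are made up.
triples = [
    (BASE + "foo", SKOS_BROADER, BASE + "bar"),
    ("urn:example:doc1", DC_SUBJECT, BASE + "foo"),
    ("urn:example:doc2", DC_SUBJECT, BASE + "bar"),
]


def narrower_closure(triples, concept):
    """The concept plus every concept that is transitively narrower."""
    seen = {concept}
    stack = [concept]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if p == SKOS_BROADER and o == current and s not in seen:
                seen.add(s)
                stack.append(s)
    return seen


def documents_about(triples, concept):
    """Documents whose dc:subject is the concept or a narrower one."""
    concepts = narrower_closure(triples, concept)
    return sorted(s for s, p, o in triples
                  if p == DC_SUBJECT and o in concepts)


print(documents_about(triples, BASE + "bar"))  # doc1 (via foo) and doc2
print(documents_about(triples, BASE + "foo"))  # doc1 only
```

A search for bar picks up the document tagged foo without that document
ever being tagged bar itself.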

The trick is how to do this in a GUI, and how to offer users both
useful constraints and flexibility.

I could imagine in, say, a web service environment, users defining
themselves as parts of particular communities, and then having
dedicated taxonomies that could be plugged in, all fairly seamlessly.

So imagine a general taxonomy, maybe hooked into WordNet and linked to
Wikipedia, and then more focused ones for different communities (law,
medicine, philosophy). You add and search by tags using auto-complete.
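
Auto-complete over several plugged-in taxonomies could be as simple as
a prefix lookup on a merged, sorted tag list. A minimal sketch, assuming
the taxonomies are just lists of strings; the class TagCompleter and the
sample tags are hypothetical:

```python
import bisect


class TagCompleter:
    """Prefix lookup over a sorted list of known tags."""

    def __init__(self, tags):
        # Case-insensitive matching; keep one entry per distinct tag.
        self._tags = sorted({t.lower() for t in tags})

    def complete(self, prefix, limit=10):
        prefix = prefix.lower()
        # First position whose tag sorts at or after the prefix.
        start = bisect.bisect_left(self._tags, prefix)
        matches = []
        for tag in self._tags[start:]:
            if not tag.startswith(prefix) or len(matches) == limit:
                break
            matches.append(tag)
        return matches


general = ["fever", "infection", "law", "medicine"]
medical = ["febrile neutropenia", "endocarditis", "bacteremia"]
completer = TagCompleter(general + medical)
print(completer.complete("fe"))
```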

> Also, this way you do not have always to remember every keyword
> ("tag") that should be included in the tree (the tree is simply there;
> no user would create for every new search a new, very different tree;
> rather, most trees would be used for a number of searches, and a new
> tree would most often be a tweak of a previous tree, not a de novo
> invention).

As a user I find this very attractive. The problem is, what happens
when the term you want doesn't exist?

Note, I am explicitly thinking of a multi-user -- internet-scale --
context here. Zotero is going live next week, and these guys are really
going to push innovation in this space. They are adding auto-complete
tags, and will later be adding server functionality and
social-networking/data merging and such. So I think we need to think in
internet-scale terms.

> I have over 2500 articles on my PC. They are arranged hierarchically
> in subdirectories. The problem is:
> - articles may belong to more than one directory (aka category)
>  -- I would like to have more than one tree for my articles, but you
> can't do this on a filesystem
> - I need sometime searches on more than one subdirectory from
> different directory trees (this is indeed difficult to do on a file
> system)
> - there are many other limitations, but currently it's the best method
> to organise so many articles
>
> When you have so many articles, the organization of them becomes a
> real nightmare.

I think you'll really appreciate Zotero. It has hierarchical
"collections", which are just virtual folders (though in the current
version, selecting a parent collection does not select the children;
hopefully they've fixed that).

So there, tags and collections are two different ways to organize
items.

> I believe that hierarchical keywords are a good start (!!and I do not
> have any better idea right now!!). Therefore, I believe that a little
> brainstorming would be quite useful.

Agreed. Take a look at Zotero next week and see what you think. I'm in
touch with the developers, so hopefully we can keep moving towards a
really innovative solution here where we all benefit.

Bruce


Re: Hierarchical Keyword Tree

Bruce D'Arcus

On Sep 30, 2006, at 10:55 AM, Bruce D'Arcus wrote:

> So imagine a general taxonomy, maybe hooked into wordnet and linked to
> Wikipedia, and then more focused ones for different communities (law,
> medicine, philosophy). You add and search by tags using auto-complete.

And I guess, realistically, there'd be a user taxonomy, which would be
any new term the user added (that was not chosen through auto-complete
selection). They might then be able to specify the relations between
their terms, and maybe the defined taxonomies.

Alternatively, in a more free-flowing approach, a community's taxonomy
is just the merged (and hopefully normalized) tags of its users, with
some ability to relate them.
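
A rough sketch of the "merged and normalized" idea, assuming that
normalization means no more than lowercasing and collapsing whitespace
(real tag merging would need more than that); the user names, tags and
function names are hypothetical:

```python
from collections import Counter


def normalize(tag):
    """Lowercase and collapse whitespace so trivial variants merge."""
    return " ".join(tag.lower().split())


def merge_tags(users):
    """users: dict of user name -> list of free-form tags.

    Returns a Counter of normalized tag -> number of users who used it.
    """
    counts = Counter()
    for tags in users.values():
        # A set, so each user counts at most once per tag.
        counts.update({normalize(t) for t in tags})
    return counts


users = {
    "alice": ["Endocarditis", "heart  valves"],
    "bob": ["endocarditis ", "Bacteremia"],
}
community = merge_tags(users)
print(community.most_common(1))
```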

Bruce


Re: Hierarchical Keyword Tree

Bruce D'Arcus
In reply to this post by Leonard Mada

Hah, just came across this after sending the last message. The "fly
through" comment refers to their use of -- you guessed it --
hierarchical tags! Not sure how they're doing it internally, but storage
is RDF, so I'd guess it's something like what I was talking about with
SKOS.

Bruce
