Re: Fwd: Re: [Gnumed-devel] phrase wheel


From: Karsten Hilbert
Subject: Re: Fwd: Re: [Gnumed-devel] phrase wheel
Date: Sun, 14 Sep 2003 18:18:03 +0200
User-agent: Mutt/1.3.22.1i

OK, phrase wheel 101 time.

When the user has typed only a few characters (say, fewer
than three) it does not make sense to look for true substrings
in the possible matches. But it does make sense to check for
those matches that *start* with the input.

When typing more it starts making sense to look for true
substrings. It can be argued that there's an intermediate
range where it does make sense to look for matches at *word
boundaries* inside phrases but not yet at *arbitrary* locations
inside the phrase. This would be when the user has typed, say,
3-4 characters. In this mode the following holds true:

"pain" matches "address@hidden ankle"
"pain" matches "left ankle painful"
"pain" matches "pain=severe"

"pain" DOES NOT match "repainted house"

When typing even more the phrase wheel starts matching even
true substrings such that eventually:

"pain" does match "repainted house"

(Of course, the thresholds are parameters.)
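
A rough sketch of the three modes, for illustration only. The
function name, the threshold names and their values (3 and 4) are
my assumptions taken from the "say" figures above, and the word
separator class assumes the separators listed further down in this
mail. It is not the actual phrase wheel code.

```python
import re

# Assumed thresholds; the mail only says "less than three" and "3-4".
START_ONLY_BELOW = 3     # fewer characters: match at phrase start only
WORD_BOUNDARY_UPTO = 4   # up to this many: also match at word boundaries

# Assumed separator class, built from the list given later in the mail.
WORD_SEPARATORS = re.compile(r'[- \t=+&:_@]+')


def matches(fragment, phrase,
            start_only_below=START_ONLY_BELOW,
            word_boundary_upto=WORD_BOUNDARY_UPTO):
    """Return True if fragment matches phrase in the mode its length selects."""
    if not fragment:
        return False
    # Mode 1: a match at the start of the phrase always counts.
    if phrase.startswith(fragment):
        return True
    if len(fragment) < start_only_below:
        return False
    # Mode 2: match at the start of any word inside the phrase.
    words = WORD_SEPARATORS.split(phrase)
    if any(word.startswith(fragment) for word in words):
        return True
    if len(fragment) <= word_boundary_upto:
        return False
    # Mode 3: long input, so true substrings count as well.
    return fragment in phrase
```

With these assumed thresholds "pain" (4 characters) is still in
word boundary mode and does not match "repainted house", while
"paint" (5 characters) does.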

Now, Richard pointed out that some characters need to be
ignored:

"?aneurysm" should lose the "?" for matching.

Those are stripped.

The next step is to check whether there are matches at word
boundaries inside the phrase (the intermediate mode described
above, used once a moderate number of characters has been
typed). One or more of the following characters, in any
combination and order, count as a word separator inside a phrase:

-           (drop-dead painful, perhaps debatable)
<space>     (pain severe)
<tab>       (pain   severe)
=           (pain=severe)
+           (pain+swelling)
&           (pain&swelling)
:           (pain:severe)
_           (pain_severity, this is debatable)
@           (address@hidden)

Hence:
> > default_word_separators = re.compile('[- \t=+&:_@]+')
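
To show how such a pattern can be used for matching at word
boundaries, here is a minimal sketch. It assumes the character
class holds exactly the nine separators listed above; split_words()
and matches_at_word_boundary() are hypothetical helper names, not
functions from the phrase wheel itself.

```python
import re

# Assumed character class: the nine separators listed above.
default_word_separators = re.compile('[- \t=+&:_@]+')


def split_words(phrase):
    """Split a phrase into words at runs of one or more separators."""
    # Leading or trailing separators yield empty strings; drop those.
    return [w for w in default_word_separators.split(phrase) if w]


def matches_at_word_boundary(fragment, phrase):
    """True if any word of the phrase starts with the fragment."""
    return any(w.startswith(fragment) for w in split_words(phrase))
```

Because of the trailing "+", a run such as " & " between two
words counts as a single separator.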

> > default_phrase_separators = re.compile('[;/|]+')
Those are used to separate input terms to be used as fragments
for matching. If in a lab request phrase wheel I type

"ESR;CRP;DBC" or "ESR/CRP/ASL/DBC" ...

I'd like the phrase wheel to deliver matches based on the
phrase I am in, say the "CRP" part of the input.
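
One way to find "the phrase I am in" is sketched below. It assumes
the widget can report a cursor position as a character offset into
the input; current_phrase() and its cursor parameter are
hypothetical names used only for this illustration.

```python
import re

default_phrase_separators = re.compile('[;/|]+')


def current_phrase(text, cursor):
    """Return the separator-delimited part of text that contains cursor."""
    start = 0
    for sep in default_phrase_separators.finditer(text):
        if sep.end() <= cursor:
            # Separator lies entirely before the cursor: the phrase
            # starts after it (unless a later separator also does).
            start = sep.end()
        else:
            # First separator at or after the cursor ends the phrase.
            return text[start:sep.start()]
    # No separator after the cursor: the phrase runs to the end.
    return text[start:]
```

For the input "ESR;CRP;DBC" with the cursor at offset 5 this
returns "CRP", which would then be the fragment used for matching.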

> > default_ignored_chars = re.compile("""[?!."'\\(){}\[\]<>~#*$%^]+""")
Well, those get ignored right from the beginning.
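
Stripping could look roughly like this. The pattern below is meant
to cover the same characters as the quoted one, written as a raw
string so that Python does not have to interpret the backslashes;
strip_ignored() is a hypothetical helper name.

```python
import re

# Same character set as the quoted pattern, as a raw string.
default_ignored_chars = re.compile(r"""[?!."'(){}\[\]<>~#*$%^]+""")


def strip_ignored(text):
    """Remove every ignored character before any matching is done."""
    return default_ignored_chars.sub('', text)
```

So strip_ignored("?aneurysm") gives "aneurysm", which is what then
goes into the matching steps.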

> I'm sorry but I don't understand the concept of these changes. Do you want
> to match groups of words / whole phrases ? 
Yes. Except that I treat it as one "word" separated by
phrase_separators. And it accumulates a "phrase match
probability" that does NOT break down into the probabilities
for the individual words (one can, of course, use single
word phrases, too).

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346



