Re: announcing thaiword.el?
From: Miles Bader
Subject: Re: announcing thaiword.el?
Date: Tue, 29 Mar 2005 17:35:15 +0900
On Mon, 28 Mar 2005 09:47:09 +0900 (JST), Kenichi Handa <address@hidden> wrote:
> To handle the regular expression "\\b" and "\\B" correctly
> for Thai, we need a bigger change in regex.c. For the
> moment, I have no idea how to do that.
Current extensions to "word syntax", using `word-separating-categories'
etc., seem to do the correct thing with regexps.[*] Perhaps some
extension to that mechanism would work.
For instance, what if entries in `word-separating-categories' could have an
optional predicate function -- in addition to the current (CAT1 . CAT2)
format, allow (CAT1 CAT2 PREDICATE-FUN), and only consider the entry to
match if PREDICATE-FUN (called with some appropriate args) also returns true?
Then for a case like Thai, where you want to do more complicated tests
to establish word boundaries inside sequences of non-delimited text, one
could use a "degenerate" entry in `word-separating-categories' with both
CAT1 and CAT2 the same, but also with a predicate attached to do the
more complicated test. I suppose that would slow down word matching
when the predicate is called, but it would only happen for text where
that is appropriate.
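To make the idea concrete, here is a rough Lisp model of how such an
entry might be tested. It is only a sketch: Emacs does not implement
the three-element format, the `my-' names are made up for this example,
calling the predicate with a buffer position is an assumed convention,
and the plain category membership test is a simplification of what the
C code actually does.

```elisp
;; Hypothetical sketch of the proposed entry format.  Nothing here is
;; part of Emacs; it only models the matching rule in Lisp.

(defun my-char-in-category-p (char category)
  "Return non-nil if CHAR belongs to CATEGORY in the current category table."
  (aref (char-category-set char) category))

(defun my-entry-matches-p (entry c1 c2 pos)
  "Return non-nil if ENTRY marks a word boundary between C1 and C2.
ENTRY is either (CAT1 . CAT2) or the proposed (CAT1 CAT2 PREDICATE-FUN).
A nil category matches any character.  PREDICATE-FUN, if present, is
called with POS, the position between the two characters (an assumed
calling convention)."
  (let* ((cat1 (car entry))
         ;; The extended format is a proper list, so its cdr is a cons.
         (extended (consp (cdr entry)))
         (cat2 (if extended (cadr entry) (cdr entry)))
         (pred (and extended (nth 2 entry))))
    (and (or (null cat1) (my-char-in-category-p c1 cat1))
         (or (null cat2) (my-char-in-category-p c2 cat2))
         ;; The predicate only runs when both categories already match,
         ;; so the extra cost is limited to the relevant text.
         (or (null pred) (funcall pred pos)))))
```

With this model, a "degenerate" Thai entry would look something like
(?t ?t my-thai-boundary-p), where `my-thai-boundary-p' is a hypothetical
function that does the dictionary-style test at the given position.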
-Miles
[*] I was surprised that this is true, and I don't understand why from
my quick look at regex.c :-/ ... But my simple tests seem to show
that it does really work. E.g., I can add '(?C . ?C) to
`word-separating-categories', and then a regexp search will suddenly
start considering every single kanji character as a standalone word.
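A test along these lines can be sketched as follows. The helper name is
made up, and the result may well depend on the Emacs version, so no
particular output is claimed here; the two numbers are simply the
positions where the first word is considered to end, without and with
the extra entry.

```elisp
;; Sketch of the experiment; `my-first-word-end' is a made-up helper.

(defun my-first-word-end (text extra-entries)
  "Return the end position of the first word that a regexp finds in TEXT.
EXTRA-ENTRIES are temporarily prepended to `word-separating-categories'.
Return nil if the search fails."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((word-separating-categories
           (append extra-entries word-separating-categories)))
      ;; Non-greedy match, so the search stops at the first place
      ;; where the regexp engine sees an end-of-word boundary.
      (when (re-search-forward "\\w+?\\>" nil t)
        (match-end 0)))))

;; Compare the default setting with the (?C . ?C) entry added.
(list (my-first-word-end "漢字漢字" nil)
      (my-first-word-end "漢字漢字" '((?C . ?C))))
```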
--
Do not taunt Happy Fun Ball.