Character group folding in searches


From: Artur Malabarba
Subject: Character group folding in searches
Date: Fri, 6 Feb 2015 11:04:03 -0200

This is a follow-up on a previous discussion regarding single quotes in Info.

I've been looking into ways of having the search functions fold
similar characters together. There are a few goals, which I'm listing
below to facilitate comparison of the possible approaches. Feel free to
mention other highly important goals, but please don't go into
high-level abstractions (such as letting the user define groups);
these can always be added later and are not relevant to this discussion.

1. Follow the `decomposition' char property. For instance, the
character "a" in the search string would match any one of "aãáâ" (and
so on). This is easy to do, and one of the patches below already shows
how. Note that this won't handle symbols that are actually composed of
multiple characters.

2. Follow an intuitive sense of similarity which is not defined in the
Unicode standard. For instance, an ASCII single quote in the search
string should match any type of single quote (there are about a dozen
that I know of).

3. Ignore modifier (non-spacing) characters. Another way of writing
"á" is to write "a" followed by a special non-spacing acute accent. This
kind of thing (a symbol composed of multiple characters) is not
handled by item 1, so I'm listing it as a separate point.

4. Perform the conversion both ways. That is, item 1 should work even
if the search string contains "á" instead of "a", and item 2 should match
an ASCII quote if the search string contains a curly quote. This is
mostly useful when the user copies a fancy string from somewhere and
pastes it into the search field.

5. It should work for any searching, not just isearch.
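
To make goals 1, 3, and 4 concrete, here is a rough sketch. It is in
Python rather than Emacs Lisp, purely for illustration, and it assumes
that canonical (NFD) decomposition from Python's `unicodedata' module is
a reasonable stand-in for the `decomposition' char property. The helper
names are hypothetical and do not come from any of the patches.

```python
import unicodedata

def base_char(ch: str) -> str:
    """Return the first character of ch's canonical decomposition.

    For a character with no decomposition this is ch itself.
    """
    return unicodedata.normalize("NFD", ch)[0]

def strip_marks(text: str) -> str:
    """Decompose text and drop non-spacing (category Mn) characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

precomposed = "\u00e1"   # "á" as a single code point (goal 1)
combining = "a\u0301"    # "a" followed by a combining acute accent (goal 3)

# Both spellings reduce to the same plain "a", so normalizing the search
# string and the text the same way also gives the two-way match of goal 4.
print(strip_marks(precomposed) == strip_marks(combining))
```

The point of the sketch is only that the precomposed and the combining
spelling need different handling: the first is a one-character mapping,
while the second requires looking at more than one character at a time.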


Goals 1, 2, and 3 are the most important (in my opinion).
Goals 1 and 2 can be achieved by all of the approaches below (even where the attached patch does not implement both yet), while the others vary.

-----------------------------------------------------------

I'm attaching three patches below; each represents a different way of
achieving part of the above.

* group-folding-with-regexp-lisp.patch

This one takes each input character and either keeps it verbatim or
transforms it into a regexp which matches the entire group that this
character represents. It is implemented in isearch.

+ It trivially handles goals 1, 2 and 3. Because regexps are quite
versatile, it is the only solution that handles item 3 (it allows each
character to match more than a single character).
+ Goal 4 can be achieved with a bit more work (the input just needs to
be normalized before turning it into a regexp).
- It is slower than the options below, but it should be fast enough for isearch.
- Goal 5 would take a lot more work. This character parsing would have
to be added to each of the search functions (not to mention it might be
too slow for lisp-code searches).

(Note that the attached patch doesn't actually do item 1. That is NOT
a limitation, it can do item 1 quite trivially. I simply haven't done
it yet.)
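
The idea behind this approach can be sketched as follows. This is a
Python illustration, not the code from the patch: the group table, the
range used for combining marks, and the function name are all
assumptions made for the example.

```python
import re

# Hypothetical fold groups: each key is a character the user may type,
# each value lists the characters it should match.
FOLD_GROUPS = {
    "a": "a\u00e0\u00e1\u00e2\u00e3\u00e4",      # a à á â ã ä
    "'": "'\u2018\u2019\u201b\u2032",            # ASCII quote and a few others
}

# Optional run of combining diacritical marks (U+0300..U+036F).
COMBINING = "[\u0300-\u036f]*"

def fold_regexp(search: str) -> str:
    """Turn a search string into a regexp that matches whole groups."""
    parts = []
    for ch in search:
        group = FOLD_GROUPS.get(ch)
        if group is None:
            # No group known: keep the character verbatim (quoted).
            parts.append(re.escape(ch))
        else:
            # Match any member of the group, plus trailing combining marks.
            parts.append("[" + re.escape(group) + "]" + COMBINING)
    return "".join(parts)

pattern = re.compile(fold_regexp("can't"))
print(bool(pattern.search("I can\u2019t")))   # curly quote in the text
```

Because each input character may expand to a pattern that consumes more
than one character of text, "a" followed by a combining accent is matched
as well, which is what item 3 asks for.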

* group-folding-with-case-table-lisp.patch

This patch is entirely in elisp. I've put it all inside `isearch.el'
for now, for the sake of simplicity, but it's not restricted to
isearch.

It creates a new case-table which performs group folding by borrowing
the case-folding machinery, so it is very fast. Then, group folding
can be achieved by running the search inside a `with-group-folding'
macro. There's also an example implementation which turns it on for
isearch by default.

+ It immediately satisfies items 1, 2, 4, and 5.
+ It is very fast.
- It has no simple way of achieving item 3.

(Note that the attached patch doesn't actually do item 2. That is NOT
a limitation, it can do item 2 quite trivially. I simply haven't done
it yet.)
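
As a rough analogy for how a one-to-one folding table behaves, here is a
Python sketch. It is not the case-table code from the patch: the table
contents and function names are hypothetical, and Python's
`str.translate' merely stands in for the canonicalization step that the
case-folding machinery performs.

```python
# Hypothetical folding table: every member of a group maps to one
# canonical character, exactly one character in and one character out.
FOLD_TABLE = str.maketrans({
    "\u00e1": "a", "\u00e3": "a", "\u00e2": "a",   # á ã â -> a
    "\u2018": "'", "\u2019": "'",                  # curly quotes -> '
})

def canon(text: str) -> str:
    """Map every character to the canonical member of its group."""
    return text.translate(FOLD_TABLE)

def folded_find(needle: str, haystack: str) -> int:
    """Search with both sides canonicalized; return an index or -1.

    The mapping is one-to-one per character, so an index into the
    canonical text is also a valid index into the original text.
    """
    return canon(haystack).find(canon(needle))

print(folded_find("\u00e1", "xa"))          # accented search, plain text
print(folded_find("\u00e1b", "a\u0301b"))   # combining accent in the text
```

Since both the search string and the text go through the same table, the
match works in both directions (item 4). The second call returns -1,
which shows the limitation behind item 3: a table that maps one character
to one character cannot absorb a separate combining character.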

* group-folding-with-case-table-C.patch

This patch defines a new char-table and uses it instead of
case_canon_table when the group-fold-search variable is non-nil.

This shares the advantages and disadvantages of the lisp patch above
but, in addition:
+ You don't need a `with-group-folding' macro; all you need is to wrap
the search in (let ((group-fold-search t)) ...), which is more in line
with how case-folding works.
- If the user decides to set `group-fold-search' to t, this can break
existing code (a disadvantage that the lisp version above does not
have).
- It adds two extra fields to every buffer object (the boolean
variable and the char table).

(Note that compiling this last patch gives a crashing executable for
me. I'm just putting it here to showcase the option.)

---------------------

My question is:

Do any of these options seem good enough? Which would you all like to explore?
I like the second one best, but goal 3 is quite important.

Attachment: group-folding-with-case-table-C.patch
Description: Text Data

Attachment: group-folding-with-regexp-lisp.patch
Description: Text Data

Attachment: group-folding-with-case-table-lisp.patch
Description: Text Data

