Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
From: Artur Malabarba
Subject: Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
Date: Fri, 6 Feb 2015 02:32:46 +0000

2015-02-05 22:54 GMT-02:00 Juri Linkov <address@hidden>:
>> Something essentially identical to this was being discussed here a
>> couple of weeks ago. Look for the thread "Single quotes in Info". I
>> wrote a small elisp solution for building this into isearch (which you
>> can find on the "scratch/isearch-character-group-folding" branch). It
>> took a different approach to yours, relating characters to regexp, but
>> it works.
>
> I see that your branch contains nothing more than what was already
> implemented a long time ago in bug#13041, where the major stumbling block
> was the inefficiency of the regexp-based solution. Could you help improve it?

I'll have a look. The code I wrote was fast enough for isearch, and I'm
starting to convince myself it was the best solution.

The motivation behind extending case-fold tables was to make it fast
enough to use on any search, and also to have it work in some very
corner-case situations. Combine this with the core dump I've hit
while trying to implement it, and you have a recipe for my
fast-diminishing motivation to do this.

>> The bright side is that I think this two-char way of writing Latin
>> accents is much less common (not 100% sure though, it's hard to tell
>> the difference). The downside is that I know nothing about other
>> languages, so maybe using two chars to represent one char is the
>> default behavior in some other languages?
>
> As https://emacs.stackexchange.com/q/7992/478 indicates,
> other languages require insertion/deletion of special characters
> like diacritics/accents from the search string/buffer for normalization.
>
> When looking for a solution, I recommend checking ucs-normalize.
> For example, evaluating:
>
> (require 'ucs-normalize)
> ucs-normalize-combining-chars
>
> you can see exactly the same characters
>
> 1616 1615 1619 1648 1618 1612 1613 1611 1617 1614
>
> mentioned in https://emacs.stackexchange.com/a/8001/478
>
> Using its corresponding regexp `ucs-normalize-combining-chars-regexp'
> is easy in isearch, e.g.:
>
> ;; Decomposition search for accented letters.
> (define-key isearch-mode-map "\M-sd" 'isearch-toggle-decomposition)
>
> (defun isearch-toggle-decomposition ()
>   "Toggle Unicode decomposition searching on or off."
>   (interactive)
>   (setq isearch-word (unless (eq isearch-word 'isearch-decomposition-regexp)
>                        'isearch-decomposition-regexp))
>   (if isearch-word (setq isearch-regexp nil))
>   (setq isearch-success t isearch-adjusted t)
>   (isearch-update))
>
> (defun isearch-decomposition-regexp (string &optional _lax)
>   "Return a regexp that matches decomposed Unicode characters in STRING."
>   ;; Drop the trailing "+" so ACCENTS matches one combining character.
>   (let ((accents (substring ucs-normalize-combining-chars-regexp 0 -1)))
>     (mapconcat
>      (lambda (c0)
>        ;; Quote each base character so "." or "*" is matched literally.
>        (concat (regexp-quote (string c0)) accents "?"))
>      (replace-regexp-in-string accents "" string) "")))
>
> (put 'isearch-decomposition-regexp 'isearch-message-prefix "deco ")
>
> But this is less efficient than properly implementing it using case tables.

There's probably a way of handling these in C code, but it'll have to
be done manually (translation tables won't do it). And by someone who
understands this more than me. :-)
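
For reference, here is a minimal sketch of the normalization idea discussed
above: decompose the text, then drop the combining characters before
comparing. It assumes ucs-normalize is available and that
`ucs-normalize-combining-chars-regexp' ends in "+", as the quoted code also
assumes. The names `my-strip-combining-marks' and `my-fold-equal-p' are
hypothetical helpers made up for illustration, not part of Emacs or of the
branch mentioned above.

```elisp
(require 'ucs-normalize)

(defun my-strip-combining-marks (string)
  "Return STRING in NFD form with its combining characters removed.
Hypothetical helper for illustration only."
  ;; Assumption: the library regexp ends in \"+\"; dropping it leaves a
  ;; pattern that matches a single combining character.
  (let ((accents (substring ucs-normalize-combining-chars-regexp 0 -1)))
    (replace-regexp-in-string
     accents ""
     ;; Decompose first so that precomposed letters such as U+00E9
     ;; become a base letter followed by a combining mark.
     (ucs-normalize-NFD-string string))))

(defun my-fold-equal-p (a b)
  "Return non-nil if A and B are equal once combining marks are ignored.
Hypothetical helper for illustration only."
  (string= (my-strip-combining-marks a)
           (my-strip-combining-marks b)))
```

With these definitions, (my-fold-equal-p "résumé" "resume") should return
non-nil whether the accents are precomposed or written as two characters.
This normalizes whole strings, so it shows the comparison itself and says
nothing about how to make a buffer search fast.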