emacs-devel

Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)


From: Eli Zaretskii
Subject: Re: extending case-fold-search to remove nonspacing marks (diacritics etc.)
Date: Fri, 06 Feb 2015 09:29:33 +0200

> From: Ted Zlatanov <address@hidden>
> Date: Thu, 05 Feb 2015 17:16:04 -0500
> 
> https://emacs.stackexchange.com/questions/7992/how-to-search-an-arabic-word-in-text-without-its-diacritics-accents
> suggested it would be useful if diacritics were ignored when searching
> for text in various situations. This is similar to `case-fold-search'
> but more generic. Here's what I suggested as the answer at the ELisp
> level:
> 
> #+begin_src emacs-lisp
> (defun kill-marks (string)
>   (concat (loop for c across string
>                 when (not (eq 'Mn (get-char-code-property c 'general-category)))
>                 collect c)))
> 
> (let* ((original1 "your Arabic string here")
>       (normalized1 (ucs-normalize-NFKD-string original1))
>       (original2 "your other Arabic string here")
>       (normalized2 (ucs-normalize-NFKD-string original2)))
>   (equal
>    (replace-regexp-in-string "." 'kill-marks normalized1)
>    (replace-regexp-in-string "." 'kill-marks normalized2)))
> #+end_src

That doesn't do what we want; it's only a partial solution to the
problem.  E.g., it doesn't equate the initial, medial, and final
variants of the letters used by Arabic and other Semitic scripts.
Moreover, you cannot even search for "a" and find "á", AFAICS.

The way to solve this correctly and generally was discussed here some
time ago, so if there are people here for whom this is an itch to
scratch, please let's do this as discussed there.  We already have all
the necessary information for that in Emacs databases.
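
To illustrate what "information in Emacs databases" could mean here,
below is a minimal sketch, not the design from the earlier discussion.
It assumes that the `decomposition' and `general-category' properties
returned by `get-char-code-property' are enough for the simple cases;
the names `my-fold-char' and `my-fold-string' are hypothetical.

#+begin_src emacs-lisp
;; Sketch only: `my-fold-char' and `my-fold-string' are made-up names.
(defun my-fold-char (char)
  "Return the first base character of CHAR's Unicode decomposition.
Follow the `decomposition' property recursively, ignoring a leading
tag symbol such as `compat', `initial', `medial' or `final'."
  (let ((dec (get-char-code-property char 'decomposition)))
    ;; A compatibility decomposition starts with a tag symbol; drop it.
    (when (symbolp (car dec))
      (setq dec (cdr dec)))
    ;; A character without a decomposition maps to itself.
    (if (or (null dec) (eq (car dec) char))
        char
      (my-fold-char (car dec)))))

(defun my-fold-string (string)
  "Return STRING with characters folded and nonspacing marks removed."
  (let (result)
    (dolist (c (string-to-list string))
      (let ((base (my-fold-char c)))
        ;; Drop anything whose general category is Mn (nonspacing mark).
        (unless (eq 'Mn (get-char-code-property base 'general-category))
          (push base result))))
    (concat (nreverse result))))
#+end_src

With this, (my-fold-string "á") should give "a", and a presentation
form such as U+FE91 should fold to its base letter U+0628.  Taking only
the first element of a decomposition is lossy for characters that
decompose into several base characters, and none of this is wired into
search, so treat it as a toy, not as a proposal.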

