Re: Character group folding in searches
From: Stefan Monnier
Subject: Re: Character group folding in searches
Date: Sun, 08 Feb 2015 09:03:23 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/25.0.50 (gnu/linux)
> I'm sorry, I don't understand how this will solve the use-cases
> brought up in this thread. Can you explain?
Every equivalence class selected by such a DFA can match any set of
strings that can be described by a regular expression, so it should be
more than sufficiently powerful.
> . exact match -- only exactly the same codepoints match
The DFA is trivial: it matches any one-char sequence (and only those)
and returns the char.
> . base-character match -- this ignores any combining marks,
> diacriticals, etc.
Admittedly, less trivial since we have to remember the base char after
matching it, while skipping subsequent combining marks and diacriticals.
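A minimal sketch of that two-state behavior, in Python rather than
Emacs code.  The function name is hypothetical, and decomposing
precomposed characters via NFD is an added assumption, not something
stated above:

```python
import unicodedata

def match_base_char(text, pos=0):
    """Return (base_char, end) for the class starting at text[pos], or None."""
    # State 1: we need a non-combining character to start from.
    if pos >= len(text) or unicodedata.combining(text[pos]):
        return None
    # Assumption: decompose so a precomposed char such as U+00E9 yields "e".
    base = unicodedata.normalize("NFD", text[pos])[0]
    end = pos + 1
    # State 2: loop over trailing combining marks; the remembered base
    # character is the result and is not changed by them.
    while end < len(text) and unicodedata.combining(text[end]):
        end += 1
    return base, end
```

Here the "remembered" base char is simply a local variable; in a
table-driven DFA it would have to be encoded in the state itself.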
> . matching ligatures, such as ffi and ﬃ
Straightforward.
> . ignoring punctuation, like string-collate-equalp does,
> i.e. "foobar" will match "foo.bar"
Easy: the DFA will simply loop back when it sees a ".".
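As a rough illustration (Python, hypothetical names), the "loop back"
amounts to consuming the ignorable char without emitting anything and
without leaving the current state.  Treating every Unicode "P*"
category as punctuation is an assumption made for this sketch:

```python
import unicodedata

def fold_ignoring_punctuation(text):
    """Yield the characters of text that are not punctuation."""
    for ch in text:
        if unicodedata.category(ch).startswith("P"):
            # Self-loop: consume the char, emit nothing, stay in the same state.
            continue
        yield ch

def equal_ignoring_punctuation(a, b):
    """Compare two strings after dropping punctuation from both."""
    return list(fold_ignoring_punctuation(a)) == list(fold_ignoring_punctuation(b))
```

With this, equal_ignoring_punctuation("foobar", "foo.bar") holds,
which is the behavior asked for in the quoted use-case.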
> . ignoring isolated zero-width or non-combining marks and
> directional controls
Same.
> I understand very well how these can be handled by several different
> char-tables, but you seem to say that a single char-table can do all
> this, and I don't see how.
Not sure what you mean by "single char-table" or why you think I said
something about single-vs-multiple char-tables.
A first implementation of DFAs could use char-tables internally (where
each node of the DFA is a char-table), but I think that's something
entirely different from what you mean by "different char-tables" or
"single char-table", since you'd choose one DFA (which may have any
number of char-tables inside).
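To illustrate "one DFA, several tables inside", here is a sketch in
Python where plain dicts stand in for char-tables.  All names are
hypothetical, and the ligature example (matching both "ffi" and the
single char U+FB03) is chosen only for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Accept:
    """Terminal marker carrying the equivalence class that was recognized."""
    result: str

def build_ligature_dfa():
    # Each node is a dict standing in for one char-table:
    # char -> next node, or an Accept marker.
    after_ff = {"i": Accept("ffi")}
    after_f = {"f": after_ff}
    start = {"f": after_f, "\ufb03": Accept("ffi")}  # U+FB03: one-char ligature
    return start

def run_dfa(start, text, pos=0):
    """Return (class, end) if the DFA accepts at text[pos:], else None."""
    node = start
    while pos < len(text):
        step = node.get(text[pos])
        if step is None:
            return None          # no transition: this DFA does not match here
        pos += 1
        if isinstance(step, Accept):
            return step.result, pos
        node = step
    return None                  # ran out of input before accepting
```

The caller only ever picks the DFA; the three tables inside it are an
implementation detail.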
> Now I'm completely confused: char-tables don't need this optimization,
> as you well know: they already are space-efficient for storing
> characters that map to the table's default value. So I probably
> misunderstand your whole idea, if it does need such an optimization.
A DFA can have hundreds of nodes (hence hundreds of char-tables if we
use char-tables for that), most of which map one or two chars to
a special value while all others are mapped to "the default", so there
can be significant gains from using a more specialized representation.
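One possible shape for such a specialized node, sketched in Python
with hypothetical names: a handful of explicit transitions plus a
default, instead of a full table per node.  This is only meant to show
the idea, not to make any claim about actual char-table sizes:

```python
class SparseNode:
    """DFA node that stores only its few non-default transitions."""
    __slots__ = ("chars", "targets", "default")

    def __init__(self, transitions, default=None):
        items = sorted(transitions.items())
        self.chars = tuple(c for c, _ in items)
        self.targets = tuple(t for _, t in items)
        self.default = default

    def lookup(self, ch):
        # With one or two entries, a linear scan is all that is needed.
        for c, target in zip(self.chars, self.targets):
            if c == ch:
                return target
        return self.default
```

A node that maps "f" and U+FB03 to special values would then hold two
entries, with every other char falling through to the default.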
>> PS: And this same kind of "char-table extended into a DFA" could be
>> useful for syntax-tables in order to provide much more flexible support
>> for multi-character comment markers or "paren-like nested elements".
> If that's your itch to scratch, I'm impatiently waiting for patches ;-)
It's been in the back of my mind for many years.
Stefan