Re: [PATCH] add compiled regexp primitive lisp object
From: dmcc2
Subject: Re: [PATCH] add compiled regexp primitive lisp object
Date: Wed, 31 Jul 2024 22:33:54 +0000
> On Tuesday, July 30th, 2024 at 09:02, Philip Kaludercic <philipk@posteo.net>
> wrote:
>
> No comments on the patch from me, I am just curious, did you notice any
> performance improvements? Or is this just cleaning up the codebase?
>
> --
> Philip Kaludercic on peregrine
I failed to provide context: very reasonable question! ^_^ This was spurred by
a discussion from the day before on how to introduce a lisp-level API for
composing search patterns
(https://lists.gnu.org/archive/html/emacs-devel/2024-07/msg01201.html), where I
concluded that codifying compiled regexps into a lisp object would be a useful
first step towards understanding the tradeoffs of introducing other matching
logic beyond regex-emacs.c. I received a reply
(https://lists.gnu.org/archive/html/emacs-devel/2024-07/msg01203.html)
indicating that patches would be the appropriate next step, and then got to
work. I was incredibly pleased by how delightful and straightforward it was
to create this first draft and wanted to share progress, but didn't think
any further than that before falling asleep ^_^!
(btw, the pdumper API is incredibly cool and much less complex than I expected.)
I think a useful prototype of this workstream would involve:
(1) add new Lisp_Regexp primitive object constructed via `make-regexp' (this
patch; done),
(2) store match-data in the Lisp_Regexp instead of a thread-local (done
locally) & extend match data accessors like `match-data' to extract from an
optional Lisp_Regexp arg (the way `match-string' accepts an optional string
arg),
(3) add new Lisp_Match primitive object (or maybe just use a list for now) for
match functions to write results into instead of mutating the Lisp_Regexp
match-data (I believe this will make regexp matching entirely
reentrant/thread-safe) & extend match data accessors to accept Lisp_Match as
well.
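As a rough illustration of steps (1) and (3), here is a minimal C sketch that
uses POSIX <regex.h> as a stand-in for regex-emacs.c. Everything named
sketch_* is hypothetical and not taken from the patch. The point is only the
shape of the idea: the pattern is compiled once into its own object and passed
as const, while each search writes its results into storage owned by the
caller.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

#define SKETCH_MAX_GROUPS 8

/* Hypothetical stand-in for Lisp_Regexp: owns the compiled pattern.  */
struct sketch_regexp {
  regex_t compiled;
};

/* Hypothetical stand-in for Lisp_Match: owned by the caller, so two
   searches with the same pattern never share result storage.  */
struct sketch_match {
  regmatch_t groups[SKETCH_MAX_GROUPS];
};

/* Compile PATTERN once.  Returns 0 on success, as regcomp does.  */
int sketch_make_regexp(struct sketch_regexp *re, const char *pattern) {
  assert(re != NULL && pattern != NULL);
  return regcomp(&re->compiled, pattern, REG_EXTENDED);
}

/* Search TEXT.  Returns 1 on a match and 0 otherwise.  Group offsets
   are written only to *OUT; RE itself is taken as const.  */
int sketch_search(const struct sketch_regexp *re, const char *text,
                  struct sketch_match *out) {
  assert(re != NULL && text != NULL && out != NULL);
  return regexec(&re->compiled, text, SKETCH_MAX_GROUPS, out->groups, 0) == 0;
}

/* Release the compiled pattern.  */
void sketch_free_regexp(struct sketch_regexp *re) {
  assert(re != NULL);
  regfree(&re->compiled);
}
```

In this sketch a caller keeps one sketch_match per search, so an earlier
result stays valid after a later search with the same sketch_regexp. Whether
the real Lisp_Match should be a new primitive object or a plain list is left
open, as in step (3).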
At that point, I am guessing it will be relatively easy to construct a
benchmark (construct 100 random regexps and search them in a loop) that
produces a very clear speedup and shows, via profiler output, that
recompilation is avoided. There are also likely to be benchmarks more
representative of typical Emacs workloads, which I would be delighted to
receive suggestions for.
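For a sense of what such a benchmark could look like, here is a hedged C
sketch, again using POSIX <regex.h> rather than regex-emacs.c. The pattern
shape, the constants, and all sketch_* names are assumptions made for
illustration. It builds 100 pseudo-random patterns, then searches them in a
loop twice: once recompiling on every search and once with patterns compiled
up front.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <time.h>

#define N_PATTERNS 100
#define PATTERN_LEN 5 /* three letters, '+', and the terminating NUL */

/* Fill PATTERNS with pseudo-random EREs such as "bca+".  A small LCG
   with a fixed seed keeps runs repeatable.  */
void sketch_fill_patterns(char patterns[][PATTERN_LEN]) {
  unsigned int state = 12345u;
  for (int i = 0; i < N_PATTERNS; i++) {
    for (int j = 0; j < 3; j++) {
      state = state * 1103515245u + 12345u;
      patterns[i][j] = (char)('a' + (state >> 16) % 3);
    }
    patterns[i][3] = '+';
    patterns[i][4] = '\0';
  }
}

/* Baseline: compile each pattern again for every single search.  */
long sketch_count_recompiling(char patterns[][PATTERN_LEN],
                              const char *text, int rounds) {
  long hits = 0;
  assert(rounds >= 0);
  for (int r = 0; r < rounds; r++)
    for (int i = 0; i < N_PATTERNS; i++) {
      regex_t re;
      if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
        continue;
      hits += regexec(&re, text, 0, NULL, 0) == 0;
      regfree(&re);
    }
  return hits;
}

/* Candidate: compile each pattern once, then reuse it in every round.  */
long sketch_count_precompiled(char patterns[][PATTERN_LEN],
                              const char *text, int rounds) {
  regex_t compiled[N_PATTERNS];
  int ok[N_PATTERNS];
  long hits = 0;
  assert(rounds >= 0);
  for (int i = 0; i < N_PATTERNS; i++)
    ok[i] = regcomp(&compiled[i], patterns[i], REG_EXTENDED | REG_NOSUB) == 0;
  for (int r = 0; r < rounds; r++)
    for (int i = 0; i < N_PATTERNS; i++)
      if (ok[i])
        hits += regexec(&compiled[i], text, 0, NULL, 0) == 0;
  for (int i = 0; i < N_PATTERNS; i++)
    if (ok[i])
      regfree(&compiled[i]);
  return hits;
}

/* Time both variants with clock() and print CPU seconds for each.  */
void sketch_report(const char *text, int rounds) {
  char patterns[N_PATTERNS][PATTERN_LEN];
  sketch_fill_patterns(patterns);
  clock_t t0 = clock();
  long a = sketch_count_recompiling(patterns, text, rounds);
  clock_t t1 = clock();
  long b = sketch_count_precompiled(patterns, text, rounds);
  clock_t t2 = clock();
  printf("recompiling: %ld hits, %.3f s\n", a, (double)(t1 - t0) / CLOCKS_PER_SEC);
  printf("precompiled: %ld hits, %.3f s\n", b, (double)(t2 - t1) / CLOCKS_PER_SEC);
}
```

Calling sketch_report with some sample text and a round count from a main
function prints both timings. Both variants should report the same hit count,
which is a cheap sanity check that only compilation cost differs. The numbers
say nothing about Emacs itself; an actual benchmark would go through the Lisp
search functions and a profiler.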
I think the next steps are clear enough, so I'm planning to ping this list
again when I have a working prototype achieving such a benchmark. Since the
inline diff seemed ok this time, I will also provide an inline diff for that
unless the diff exceeds +1000 lines (not expected), in which case I will attach
a patch file.
Thanks,
Danny