From: Alan Mackenzie
Subject: bug#19873: Ill-formed regular expression is constructed in forward-paragraph.
Date: Thu, 9 Mar 2017 21:04:45 +0000
User-agent: Mutt/1.7.2 (2016-11-26)

Hello, Marcin.
On Sun, Feb 26, 2017 at 17:44:51 +0100, Marcin Borkowski wrote:
> On 2015-02-15, at 10:31, Alan Mackenzie <acm@muc.de> wrote:
> > Hello, Emacs!
> > In forward-paragraph, L37, a regular expression is constructed as
> > follows:
> >     (let* ...
> >            (sp-parstart (concat "^[ \t]*\\(?:" parstart "\\|" parsep "\\)"))
> >            ...)
> > Here parstart and parsep are, more or less,
> > paragraph-{start,separate}.
> > The problem is that parstart and parsep themselves are likely to begin
> > with "[ \t]*" (the default values certainly do), so we have two
> > consecutive matchers for an arbitrary amount of whitespace. This causes
> > the regexp engine to run very slowly when a line starts with lots of WS
> > but doesn't match.
> > This problem seems to be the cause of bug #19846 (where holding down the
> > spacebar inside a C comment causes Emacs to seize up when auto-fill mode
> > is enabled).
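To make the problem concrete, here is that construction evaluated with what I believe are the stock values of paragraph-start ("\f\\|[ \t]*$") and paragraph-separate ("[ \t\f]*$"); check them in your own Emacs. The helper my-sp-parstart is hypothetical and only repeats the concat quoted above, so treat this as a sketch, not the real code path:

```elisp
;; Sketch only: `my-sp-parstart' is a hypothetical helper that repeats
;; the `concat' quoted above from forward-paragraph.
(defun my-sp-parstart (parstart parsep)
  "Combine PARSTART and PARSEP as forward-paragraph does."
  (concat "^[ \t]*\\(?:" parstart "\\|" parsep "\\)"))

;; Assumed stock values of paragraph-start and paragraph-separate.
;; The outer "[ \t]*" is directly followed, inside the group, by
;; "[ \t]*" or "[ \t\f]*": two adjacent unbounded blank matchers.
(my-sp-parstart "\f\\|[ \t]*$" "[ \t\f]*$")
;; => "^[ \t]*\\(?:\f\\|[ \t]*$\\|[ \t\f]*$\\)"
```

On a line of many blanks followed by a non-blank, a backtracking matcher can split the blanks between the outer and inner matchers in many ways before giving up, so the work grows roughly quadratically with the number of blanks.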
> Hi Alan, hi all,
> I put this bug on my todo-list some time ago and decided now to revisit
> it.
> I'm wondering what could be done about it. First of all, my Emacs has
> this as paragraph-start:
> "\f\\|[ \t]*$"
> and this as paragraph-separate:
> "[ \t\f]*$"
> and frankly speaking, I'm not sure why they differ at all (by default).
> Also, even though forward-paragraph checks for "^" at their beginning,
> they actually don't begin with that character (again, by default).
> My first thought is to add a check whether paragraph-start and
> paragraph-separate match something like
> "^\\^?\\[[[:space:]]+\\][+*]?"
> and if yes, make parstart/parsep equal to them, but without the matching
> part.
> WDYT?
My first reaction is "This is a good idea, but be very careful!". For
example, if paragraph-start and/or paragraph-separate begin with
"[ \t]+" (i.e. the paragraph start requires whitespace at BOL), that
requirement is lost if you remove matches of
"^\\^?\\[[[:space:]]+\\][+*]?" from them.
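The pitfall shows up in a small sketch that applies the check regexp "^\\^?\\[[[:space:]]+\\][+*]?" and strips whatever it matches. The names my-leading-ws-re and my-drop-leading-ws are hypothetical; this is an illustration, not a patch:

```elisp
;; Hypothetical names; the regexp is the check proposed above.
;; [:space:] follows the current syntax table, in which space and
;; tab normally have whitespace syntax.
(defconst my-leading-ws-re "^\\^?\\[[[:space:]]+\\][+*]?")

(defun my-drop-leading-ws (re)
  "Return RE with a leading match of `my-leading-ws-re' removed."
  (if (string-match my-leading-ws-re re)
      (substring re (match-end 0))
    re))

(my-drop-leading-ws "[ \t]*$")   ; => "$"   (harmless after "^[ \t]*")
(my-drop-leading-ws "[ \t]+foo") ; => "foo" (the "+" requirement is gone)
```

Once "foo" is placed after the outer "^[ \t]*", it also matches at BOL with no leading blank, which the original "[ \t]+foo" did not.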
I think this idea is workable, but you'll have to check for one or both
of paragraph-s{tart,eparate} starting with "[ \t]+". A good strategy
here might be to begin the target regexp with "^[ \t]*", then begin
whichever component(s) required whitespace with a single "[ \t]"
(without the "*" or "+").
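One possible reading of that strategy, sketched in Elisp. All names are hypothetical, and the rewrite is purely textual: it only recognises the exact prefixes "[ \t]*" and "[ \t]+", so, for example, the stock "[ \t\f]*$" passes through unchanged, and other operators following the prefix would need more thought:

```elisp
;; Hypothetical helpers sketching one reading of the strategy above.
(defun my-strip-leading-ws (re)
  "Rewrite a leading blank matcher in component RE (textual sketch)."
  (cond
   ;; Lazy forms such as "[ \t]*?" are left alone: a textual rewrite
   ;; would change their meaning.
   ((string-match-p "\\`\\[ \t\\][*+]\\?" re) re)
   ;; "[ \t]*": redundant after the outer "^[ \t]*", so drop it.
   ((string-prefix-p "[ \t]*" re)
    (substring re (length "[ \t]*")))
   ;; "[ \t]+": keep exactly one mandatory blank.
   ((string-prefix-p "[ \t]+" re)
    (concat "[ \t]" (substring re (length "[ \t]+"))))
   (t re)))

(defun my-sp-parstart-stripped (parstart parsep)
  "Build the combined regexp from PARSTART and PARSEP, blanks merged."
  (concat "^[ \t]*\\(?:" (my-strip-leading-ws parstart)
          "\\|" (my-strip-leading-ws parsep) "\\)"))

(my-sp-parstart-stripped "[ \t]*$" "[ \t]+x")
;; => "^[ \t]*\\(?:$\\|[ \t]x\\)"
```

A single "[ \t]" after "[ \t]*" should only cause linear backtracking, since for each split there is exactly one blank it can take.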
There may be other gotchas which I haven't thought about yet.
One needs a twisted mind to do this sort of thing properly, so I offer my
services to review your upcoming patch. ;-)
> --
> Marcin Borkowski
> http://octd.wmi.amu.edu.pl/en/Marcin_Borkowski
> Faculty of Mathematics and Computer Science
> Adam Mickiewicz University
--
Alan Mackenzie (Nuremberg, Germany).