bug-gnu-emacs

bug#13802: stack overflow in mm-add-meta-html-tag


From: Stefan Monnier
Subject: bug#13802: stack overflow in mm-add-meta-html-tag
Date: Sun, 24 Feb 2013 21:04:21 -0500
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux)

> I see a "Stack overflow in regexp matcher" error traceable back to
> lisp/gnus/mm-decode.el func ‘mm-add-meta-html-tag’ fragment:

>   (re-search-forward "\
>   <meta\\s-+http-equiv=[\"']?content-type[\"']?\\s-+content=[\"']\
>   text/\\(\\sw+\\)\\(?:\;\\s-*charset=\\(.+\\)\\)?[\"'][^>]*>" nil t)

Hmm... I don't see any obvious reason for a stack overflow unless the
text has some very long lines or a lot of space between elements.

> One idea (untested) is to replace the ".+" (used to match the charset)
> with a more specific pattern.  Perhaps "[^<>]+" or "\\sw+"?

I don't think that would help.  To avoid such overflow, you need to
reduce the backtracking, i.e. reduce the number of places where the
(rather simplistic) regexp optimizer thinks two options are possible.
The \s<CHAR> pattern is actually very poor in this respect, because the
optimizer can't know anything about the chars it matches (since that
depends on the syntax table, and even on text properties).
Conversely, replacing \\s- with [ \t\n] might help: this way, the
optimizer can see that the + repetition needs no backtracking, since a
char cannot match both a loop iteration and the "after the loop"
content.
Similarly, using [^;'\"]+ instead of \\sw+ would help, and replacing
.+ with [^'\"\n]+ might help as well.
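For concreteness, here is a rough sketch of the search with all three
substitutions applied.  It has not been tried against the real
mm-decode.el code; the names `my-meta-content-type-re' and
`my-find-meta-content-type' are hypothetical, and binding
`case-fold-search' to t is an assumption made so that "Content-Type"
matches.

```elisp
;; Sketch only: \\s- becomes [ \t\n], \\sw+ becomes [^;'\"]+, and .+
;; becomes [^'\"\n]+, as suggested above.  Names are hypothetical.
(defconst my-meta-content-type-re
  (concat "<meta[ \t\n]+http-equiv=[\"']?content-type[\"']?[ \t\n]+"
          "content=[\"']text/\\([^;'\"]+\\)"
          "\\(?:;[ \t\n]*charset=\\([^'\"\n]+\\)\\)?"
          "[\"'][^>]*>")
  "Match a <meta http-equiv=content-type ...> tag.
Group 1 is the subtype after \"text/\"; group 2 is the optional charset.")

(defun my-find-meta-content-type (html)
  "Return (SUBTYPE CHARSET) for the first content-type meta tag in HTML.
CHARSET is nil when the tag has none.  Return nil if nothing matches."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; Assumption: fold case so that Content-Type also matches.
    (let ((case-fold-search t))
      (when (re-search-forward my-meta-content-type-re nil t)
        (list (match-string 1) (match-string 2))))))
```

For example, (my-find-meta-content-type "<meta http-equiv=\"Content-Type\"
content=\"text/html; charset=utf-8\">") should return ("html" "utf-8").
Note that [^;'\"]+ accepts more chars than \\sw+ (spaces, for
instance), so "text/html ; charset=..." would put "html " in group 1.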

> Thinking more systematically, maybe Emacs should add a condition
> ‘stack-overflow/regexp’ (or something like that) such that code can
> ‘condition-case’ for it and try a fallback path.

In reality, such overflow should only ever happen if you have backrefs
in your regexp.


        Stefan




