emacs-devel

Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.


From: Daniel Colascione
Subject: Re: Dynamic modules: MODULE_HANDLE_SIGNALS etc.
Date: Sun, 3 Jan 2016 13:28:24 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 01/03/2016 01:07 PM, John Wiegley wrote:
>>>>>> Daniel Colascione <address@hidden> writes:
> 
>> It's not just a theoretical problem: I've spent lots of late nights staring
>> at stack traces, trying to figure out how a certain deadlock could be
>> possible, only to realize that the program had already crashed --- or would
>> have, if a seldom-tested bit of code hadn't checked for NULL and returned
>> without releasing a lock, causing a hang half an hour later.
> 
> I see. Isn't what you describe an argument against error handling in general,
> though? It too can mask the origin of serious problems.

It is. There's a difference between trying to paper over undefined
behavior generally, however, and reporting well-defined errors using a
safe mechanism. (The former invalidates the system's own invariants,
while the latter invalidates only the application's invariants.)

But yes, error handling in general can paper over bugs, and I've
certainly seen Emacs bugs similarly exacerbated by attempts to ignore
errors.

> What if we do this:
> 
>   1. When a serious error occurs that engages crash recovery, we pop up a
>      window in Emacs describing that a serious error occurred that would have
>      crashed Emacs --and that *nothing* should be trusted now. All the user
>      should do is save critical buffers and exit immediately.

The call to Fdo_auto_save tries to do that already. Fdo_auto_save isn't
async-signal-safe, so I'd rather fork a child process that calls
Fdo_auto_save and then exits, have the parent wait up to 500ms for the
child (not forever, in case the child deadlocks), kill the child, and
continue crashing. That, or provide a less elaborate, async-signal-safe,
pure C auto-save facility.

In any case, control flow shouldn't leave the signal handler when the
application is in an unpredictable state.
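A minimal sketch of the fork-and-wait approach, in C. Assumptions: `emergency_auto_save` is a hypothetical stand-in for Fdo_auto_save, the 10ms polling interval is arbitrary, and only SIGSEGV and SIGBUS are handled. The parent calls only async-signal-safe functions (fork, waitpid, nanosleep, kill, signal, raise). After the bounded wait it restores the default disposition and re-raises the signal, so control never returns to the code that faulted.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for Fdo_auto_save. It runs only in the forked
   child, where it cannot corrupt the crashing parent's state further. */
static void emergency_auto_save(void)
{
    static const char msg[] = "auto-saving in child\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void) n;
}

/* Fork a child that runs TASK and exits, waiting at most TIMEOUT_MS for it.
   Returns 0 if the child finished, 1 if it was killed after the timeout,
   and -1 if fork failed. */
static int run_bounded_child(void (*task)(void), long timeout_ms)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        task();
        _exit(0);
    }
    struct timespec tick = { 0, 10L * 1000 * 1000 };  /* poll every 10 ms */
    for (long waited = 0; waited < timeout_ms; waited += 10) {
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 0;
        nanosleep(&tick, NULL);
    }
    if (waitpid(pid, NULL, WNOHANG) == pid)
        return 0;
    kill(pid, SIGKILL);      /* the child may be deadlocked: don't wait forever */
    waitpid(pid, NULL, 0);   /* reap it */
    return 1;
}

/* Fatal-signal handler: auto-save in a child, then keep crashing. The
   re-raised signal is blocked while the handler runs, so it is delivered
   with the default action (process death) as soon as the handler returns. */
static void crash_handler(int sig)
{
    run_bounded_child(emergency_auto_save, 500);
    signal(sig, SIG_DFL);
    raise(sig);
}

static int install_crash_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 || sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}
```

The 500ms budget is the one proposed above; the point of the design is that the only code that touches possibly-corrupted state runs in a disposable process.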

>   2. When in such a state, M-x report-emacs-bug automatically includes a trace
>      for the location where the crash occurred. Of course, this assumes Emacs
>      is still functional enough to send e-mail.
> 
>> You're right that under Linux, programs need to prepare for the possibility
>> that they might suddenly cease to exist. We're talking about something
>> different here, which is the possibility that a program can *keep running*,
>> but in a damaged and undefined state.

Ideally, Emacs would, on crash (and after auto-save), spawn a copy of
itself with an error report pre-filled. Fork and exec work perfectly
fine in signal handlers.
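Spawning the reporting copy can be sketched the same way: fork, then execv in the child. The command line (`emacs -Q --eval` calling `report-emacs-bug`), the names `report_argv`, `spawn_reporter`, and `spawn_bug_report`, and the use of the Linux-specific /proc/self/exe to locate the running binary are all assumptions for illustration. The argument vector is prepared ahead of time so that nothing is allocated inside the handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical command line for the fresh Emacs: skip init files and open
   a bug-report buffer. Static, so the handler never allocates. */
static char *report_argv[] = {
    "emacs", "-Q", "--eval", "(report-emacs-bug \"Emacs crashed\")", NULL
};

/* fork + execv. The child calls only execv and _exit.
   Returns the child's pid, or -1 if fork failed. */
static pid_t spawn_reporter(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execv(path, argv);
        _exit(127);   /* exec failed; 127 mirrors what a shell reports */
    }
    return pid;
}

/* Hypothetical hook, called from the fatal-signal handler after the
   auto-save step. The crashing parent does not wait for the reporter;
   it goes on to die. /proc/self/exe is Linux-specific. */
static void spawn_bug_report(void)
{
    (void) spawn_reporter("/proc/self/exe", report_argv);
}
```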

> I was thinking the system itself is now running in a damaged and undefined
> state. When that happens, I often reboot since I can't really trust it
> anymore.
> 
>> I'm worried that it'll be hard to know if it bites us, particularly since
>> the problems I'm imagining are infrequent, unreproducible, and carry no
>> obvious signature that would show up in a user crash report.
> 
> If we use a window to pop up an alarm indicating, boldly, that Emacs is now
> UNSTABLE and should only be used to save files and exit -- maybe even noting
> how to abort Emacs to avoid typical cleanup actions -- we can start getting
> feedback on whether this feature really helps or hurts.

I think we need better crash reporting generally. Stack overflow is only
one instance of the general class of things that can go wrong.

But in any case, if we put Emacs into a state where the only thing a
user can do is save files, why not just save the files? There's no
guarantee that, after a crash, we can even display anything.

> I understand error handlers can mask problems, and that they've made your life
> more difficult as an engineer concerned with uncovering such causes. However,
> I'm disinclined to accept, a priori, that it will hurt before trying it out.

We have no information on how often Emacs crashes in the hands of real
users or how it crashes. A wait-and-see approach is just blind faith.

Nor has anybody addressed why other programs don't work this way. Other
programs avoid this kind of hackery for good reasons, which I've
detailed. We shouldn't ignore the lessons of everyone else. It's not for
lack of inspiration that nobody else does this.

One question that neither you nor Eli nor Paul has answered is why we
would try to recover from stack overflow but not from NULL dereferences.
Exactly the same arguments apply to both situations.

> When Emacs isn't being run under gdb (which it almost never is) it also
> doesn't give much useful information about what happened, and loses data. With
> the crash recovery logic, we should at least be able to provide a trace of
> where we were when the crash was detected, plus give the user a chance of
> reporting that data back to us. I see this as possibly *increasing* the amount
> of error information we receive, and not just masking or eliminating it.

Emacs should report its own crashes somehow *generally*, probably with
Breakpad.
