Re: wait_reading_process_ouput hangs in certain cases (w/ patches)

emacs-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: wait_reading_process_ouput hangs in certain cases (w/ patches)

From:	Matthias Dahl
Subject:	Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date:	Wed, 15 Nov 2017 15:03:27 +0100
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

Hello Paul...

On 14/11/17 22:54, Paul Eggert wrote:

> I doubt whether anyone understands the problem well enough, which is why
> I have been asking questions about the proposed solution - which as far
> as I can see, is more of a band-aid rather than a real fix.

What makes you think that, just out of curiosity? I poured quite a lot
of time into debugging this bug, reading up on the relevant C sources
and coming up with a "simple" enough that is easy to grasp and review.

This was by no means a 5-minute look and fix... which is nothing I would
ever do, period. If anything, I am usually too thorough for my own good.

> When this happens, it appears that the original accept-process-output
> acted by calling wait_reading_process (0, 0, 0, 0, Qnil, PROC, 0) where
> PROC is the ispell-process. First, is that correct? (If not, my
> remaining questions may be a wild goose chase....)

Exactly. You can also have a deeper look at things, as there is a full
and detailed backtrace posted a few messages back (emacs-bt-full.txt).

> This meant the original wait_reading_process did the following: set wait
> = INFINITY, run the timers (which apparently call wait_reading_process
> recursively), then check whether update_tick != process_tick (line 5182
> of process.c in commit 79108894dbcd642121466bb6af6c98c6a56e9233). Is
> update_tick equal to process_tick in the problematic call? I'll assume
> so, but please check this. (If not, my remaining questions may need to
> be changed.)

This is pretty much irrelevant, if I am not missing some huge bit piece
of the puzzle somewhere.

If the branch update_tick != process_tick is taken, there is nothing in
there that would eventually notice that the data from our wait_proc has
already been read.

If thread_select signaled that there is no more data available, we end
up with status_notify for our wait_proc -- and that will also only try
to read any remaining data, if at all.

The crucial part is: ALL data has been read from our wait_proc while we
were running timers or filters -- and no further data will become
available until there is some interaction again with the process. That
is the case with the ispell process.

wait_reading_... currently has no chance/code to detect such an event
since it solely relies on pselect calls and such -- which will come up
empty handed if no further data is available, and there is nothing to
change that.

[ ... I skipped the remaining questions for now. If you still think
      those are relevant, let me know and I will do my best to answer
      those as well. ...]

> The changes you're proposing essentially kick the code so that it
> pretends that it read some bytes, even though it didn't (because the
> bytes were actually read and processed by a subroutine), causing it to
> exit the loop (and return nonzero instead of zero -- why?). But isn't
> this kick what the update_tick != process_tick (line 5182) check is
> supposed to be doing? And if so, why isn't that check working for your
> case? Is it because the code is forgetting to increment a tick count?

The fix is no band-aid, as you put it earlier. To answer your questions:

1)
Yes, it gives wait_reading_... the possibility to exit the loop and
return >= 1, if the data for our wait_proc has been read while we were
running timers or filters.

2)
Returning >= 1 is semantically correct, imho. We were waiting for data
to become available. That data became available. Granted, it is not us
directly who read it and passed it to the filter, but still, the caller
expects us to signal if data become available... and that is exactly
what happened. Returning anything else wouldn't make much sense, imho
and probably (?) cause problems in this particular case...

3)
update_tick != processed_tick ... see earlier in this mail for why this
is not helping a bit with this particular case.

By the way, I do have ideas for solutions that have no corner-cases,
which is what you are asking for. My current solution will fail to
detect a read-back if exactly 2**(bit depth of counter) has been read
which is very (!) unlikely, but still. That's it... otherwise it will
work reliably.

But that requires that we change wait_reading_... to really only return
-1, 0 or 1 and give up returning the real amount of bytes. Otherwise we
either end up with the same corner-case all over again, just in a more
complex solution or with a way too complex solution for this problem.
No user is using the return value in such a way in-tree. And given that
those changes are going only to master, it gives all out-of-tree users
who might use that value differently, ample time to adjust -- or voice
their concerns.

Thanks for your thorough questions and review -- very much appreciated!

So long,
Matthias

-- 
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu

[Prev in Thread]

Current Thread

[Next in Thread]

Re: wait_reading_process_ouput hangs in certain cases (w/ patches), (continued)

Prev by Date: Re: mouse-drag-and-drop-region
Next by Date: Re: mouse-drag-and-drop-region
Previous by thread: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Next by thread: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Index(es):
- Date
- Thread