emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: wait_reading_process_ouput hangs in certain cases (w/ patches)


From: Matthias Dahl
Subject: Re: wait_reading_process_ouput hangs in certain cases (w/ patches)
Date: Tue, 7 Nov 2017 15:18:02 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0

Hello Eli...

On 06/11/17 17:34, Eli Zaretskii wrote:

> Sorry, I don't understand.  When we call emacs_read in other cases in
> that function, the result is an ssize_t, a signed type of the same
> width as size_t.  So why would you need to handle overflow, when
> reading via emacs_read doesn't?
> 
> And even if the result could overflow, we have the likes of
> INT_ADD_WRAPV to make handling of that easier and safer.
> 
> So I don't see why we should give up in this case.

The amount of output read (in total) from a process is tracked via the
following member variable of Lisp_Process:

  /* Byte-count for process output read from `infd'.  */
  unsigned long infd_num_bytes_read;

That gives us at least 4 GiB worth of data that can be read back and
tracked before it wraps around to 0.

When wait_reading_process_output starts, the following happens:

  unsigned long initial_wait_proc_num_bytes_read = (wait_proc) ?
                              wait_proc->infd_num_bytes_read : 0;

We save the current counter and compare it later on to see if data has
been read from that process without us noticing it, e.g.:

 if (wait_proc
     && wait_proc->infd_num_bytes_read !=
        initial_wait_proc_num_bytes_read)
 {
   got_some_output = 1;
   break;
 }

What might have happened now is that wait_proc->infd_num_bytes_read was
already at the edge of the maximum representable value range and wrapped
around during some recursive calls to wait_reading_process_output while
we still have a value in initial_wait_proc_num_bytes_read that has not
wrapped around. E.g.:

  wait_proc->infd_num_bytes_read = (2^32)-10
  initial_wait_proc_num_bytes_read = wait_proc->infd_num_bytes_read

  ... recursive calls happen, output is read ...

  wait_proc->infd_num_bytes_read = 256
  initial_wait_proc_num_bytes_read = (2^32)-10

  // this will be wrong
  bytes_read = wait_proc->infd_num_bytes_read -
               initial_wait_proc_num_bytes_read

That causes a problem -- even though this case is unlikely, it is still
possible and I don't think it should be dismissed.

To get how many bytes have been read for this current call, we have to
calculate the difference of initial_wait_proc_num_bytes_read and
wait_proc->infd_num_bytes_read. That naturally fails if there has been
a wrap around in wait_proc->infd_num_bytes_read.

gnulib's integer overflow helpers won't help in this case, at all.

That is the whole reason why I did not set got_some_output to the proper
byte count but simply to 1, to avoid this integer overflow situation
which, if dealt with properly, would have made things a bit more
complex and difficult.

Naturally you can still do a greater than comparison and choose the
operands (more or less) properly that way. Since that calculation does
happen at two different spots in the code and I generally perceived the
bytes_read return value as unnecessary as well as the fact that I did
not really like that whole idea, I forgo doing this and went with the
simple return value of 1.

But maybe this is the simplest of all solutions, keeping KISS in mind.

> I don't agree with this line of reasoning.  We are not developing a
> library whose users will be familiar only with the formal interface
> definition and nothing else.  We are talking about code that is being
> read, studied, and used by Emacs hack^H^H^H^Hdevelopers all the time.

Even the C interface? It is my impression that the C parts of Emacs are
usually what is being avoided at all costs. Besides that, I treated the
C parts like implementation details and kept strictly to the documented
interfaces, expecting no user to ever touch this.

Nevertheless, I agree, that consistency is important and I should have
made that more of a priority in this case, given what you said.

> Having a non-trivial function whose behavior is hard to describe and
> remember accurately makes using it error-prone, especially if the
> deviant behavior happens only in some corner use case that is hard to
> reproduce.

That is why it is important that the documentation and implementation of
a function actually align. In this case, no user should actually expect
the function to return the number of bytes read, just a positive or
negative number or zero... just like it is stated.

But, again, that might be a matter of perspective. For me, what is
documented is like a contract.

> Sorry, you lost me half-way through this description.  I actually
> meant to return only the increment of how many bytes were read since
> the last call, but I don't see why you'd need more than one counter
> that will be reset to zero once its value is consumed.

My suggestion was to have an in-flight bytes read counter, that will
only track the number of bytes read while /any/ wait for that specific
process is active.

To be able to properly convey that is through another variable that
will be non-zero if we have anything waiting for that process's output.

Eventually that in-flight bytes read counter needs to be reset to zero
again. This can only happen, if we have no other parties waiting for
this process's output higher up in the call chain as otherwise we would
loose information and make it difficult/impossible for them to detect
that something was actually read.

So, that "other variable" that is non-zero if we have anything waiting
for that process's output needs to actually track the number of parties
that are waiting, so we can actually know when we are the last one and
can safely reset the in-flight bytes read counter.

But like I already stated earlier, maybe simply checking if an overflow
happened through a greater than comparison is the simplest approach,
since this solution introduces even more complexity and corner-cases,
imho.

> It's a bug that has been there "forever", and the fix is too risky to
> have it in the middle of a pretest.  I don't think we understand well
> enough all of its implications, and reading from subprocesses is a
> very central and very delicate part of Emacs.  Sorry.

Ack. Even though this means that users running into this right now will
have to wait quite a while for a fix in a stable release, which is a
pity imho.

Thanks for all the feedback,
Matthias

PS. Sorry for the huge wall of text. :-( I just noticed how big this
    mail has gotten... yikes.

-- 
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu



reply via email to

[Prev in Thread] Current Thread [Next in Thread]