
[Savannah-hackers] Re: lsh-1.4 still having the same problem


From: Niels Möller
Subject: [Savannah-hackers] Re: lsh-1.4 still having the same problem
Date: 10 Jun 2002 10:03:43 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.2

Loic Dachary <address@hidden> writes:

> The upgrade to lsh-1.4 apparently did not create any new problems.
> However, when a connection dies the shell is still hanging.

Which processes? Are these processes with a pty (ordinary interactive
shells), processes without a pty (the usual case for non-interactive
processes like cvs over ssh), or both?

> I read your mail explaining that there is no safe way to notify the
> process about the connection that died.

That is in the pty case. When no pty is involved, the right thing is
to simply close the process' stdin. If that's not enough, one might try
other tricks like waiting a second or two, and then sending some
signal, but I don't want to do that unless it really is necessary.
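
As an illustration of the non-pty approach described above, here is a
minimal sketch in Python (not lshd's own code, which is C). The name
hang_up and the two-second grace period are assumptions made for the
example: close the child's stdin first, and send a signal only if the
child is still around after the grace period.

```python
import subprocess
import sys

def hang_up(child, grace_seconds=2.0):
    """Close the child's stdin; escalate to SIGTERM only if it lingers."""
    child.stdin.close()            # the child now reads EOF on stdin
    try:
        child.wait(timeout=grace_seconds)
        return "exited on EOF"
    except subprocess.TimeoutExpired:
        child.terminate()          # SIGTERM on POSIX systems
        child.wait()
        return "needed a signal"

# A child that reads stdin until EOF and then exits, much like `cat`.
reader = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.read()"],
    stdin=subprocess.PIPE,
)
result = hang_up(reader)
```

A process that is blocked on a read of stdin exits on the EOF alone;
only one that ignores stdin would reach the signal step.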

>       However, I'd like to understand why bash has file descriptors
> 0, 1, 2 bound to a unix socket. I was kind of expecting that they would
> be bound to a regular socket.

Sure. They can't be bound to a regular socket, because the only
regular socket involved is the ssh connection, on which all data is
encrypted. So lshd decrypts that data and then feeds it to the process
using pipes. And because ordinary pipes created with pipe() have some
special features (the PIPE_BUF property) that don't fit well with
non-blocking i/o, lshd uses socketpair() to create the pipes.

So the unix sockets that bash sees are connected to lshd, not to the
remote client.
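
To make the arrangement concrete, here is a small sketch in Python (an
illustration only, not how lshd is written). One end of a socketpair
becomes the child's stdin and stdout, and the parent keeps the other
end, playing the role of lshd. The child script and variable names are
made up for the example, and blocking i/o is used for brevity.

```python
import socket
import subprocess
import sys

# Child reports what kind of file its stdin is, then echoes one line back.
CHILD = (
    "import os, stat, sys\n"
    "kind = 'socket' if stat.S_ISSOCK(os.fstat(0).st_mode) else 'other'\n"
    "line = sys.stdin.readline()\n"
    "sys.stdout.write(kind + ' ' + line)\n"
)

server_end, child_end = socket.socketpair()   # a connected unix socket pair
child = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdin=child_end.fileno(),
    stdout=child_end.fileno(),
)
child_end.close()                     # only the child keeps this end open

server_end.sendall(b"hello\n")        # stands in for decrypted client data
server_end.shutdown(socket.SHUT_WR)   # the child sees EOF after this line
reply = b""
while chunk := server_end.recv(4096):
    reply += chunk
child.wait()
server_end.close()
```

The child finds a socket on fd 0, which is the same thing bash sees
when it is started this way.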

>       I guess that openssh uses some tricks to prevent this problem to
> happen. I also guess that you investigated how they manage to solve it.

Actually, I haven't looked closely at the openssh code, for various
reasons.

I think we need to address the pty and the non-pty cases separately.
The following is what is supposed to happen when the tcp connection
goes down:

1. lshd gets a read or write error on the fd. An exception is raised,
   and the connection is marked for close.

2. The fd's close callback is invoked, and it kills all resources
   associated with the connection or any channel on the connection.

3. That implies that all fd:s associated with the connection are
   closed. In this case the "write ^D on pty" trick never happens.
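
The three steps above can be sketched roughly as follows. This is a toy
model in Python with hypothetical names (Connection, on_close,
read_some), not lshd's actual data structures; socket pairs stand in
for both the tcp connection and the fds that feed a process.

```python
import socket

class Connection:
    """Hypothetical stand-in for the server's per-connection state."""

    def __init__(self, sock):
        self.sock = sock
        self.resources = []     # fds owned by channels on this connection
        self.closed = False

    def on_close(self):
        # Steps 2 and 3: the close callback releases every resource,
        # which closes all fds associated with the connection.
        for res in self.resources:
            res.close()
        self.resources.clear()
        self.sock.close()
        self.closed = True

    def read_some(self):
        # Step 1: a read error (or EOF) marks the connection for close.
        try:
            data = self.sock.recv(4096)
        except OSError:
            data = b""
        if not data:
            self.on_close()
        return data

client, server = socket.socketpair()             # plays the tcp connection
to_process, process_stdin = socket.socketpair()  # plays a channel's fd pair
conn = Connection(server)
conn.resources.append(to_process)

client.close()                           # the connection goes down
conn.read_some()                         # the server notices and tears down
eof_seen = process_stdin.recv(1) == b""  # the process side now reads EOF
process_stdin.close()
```

If all three steps work, a process blocked on a read of stdin gets EOF
as soon as the server's end is closed.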

It's hard to say, without more information, which of these steps is
broken. Some information that would be helpful, for both the pty and
the non-pty cases, is

* What the hanging processes are doing. They're probably blocking on a
  read on stdin, but one never knows... strace will tell this.

* Output from lshd -v --debug --trace.

* A simple way to reproduce the problem. Something like

    Run lsh server sleep 60
    On a different terminal, kill -9 the above lsh process.
    When sleep is finished, the shell hangs.

Best regards,
/Niels

PS. The disk on which I store my mail crashed a few days ago. The
    backup worked, but there may be a window of a few hours during
    which mail was lost. If you've sent me any more information which
    I haven't replied to, please send it again.


