Re: [Monotone-devel] monotone server hangs
From: Hugo Cornelis
Subject: Re: [Monotone-devel] monotone server hangs
Date: Tue, 29 Sep 2009 12:07:00 -0500
Overnight we upgraded our server to a Debian system running kernel
version 2.6.26. This seems to have solved the problem.
Thanks for all your help.
Hugo
On Thu, Sep 24, 2009 at 1:15 PM, Hugo Cornelis <address@hidden> wrote:
> On Thu, Sep 24, 2009 at 1:33 AM, Stephen Leake
> <address@hidden> wrote:
>> Hugo Cornelis <address@hidden> writes:
>>
>>> From time to time, we reinstall a developer PC from scratch, recreate
>>> the appropriate directory layout and pull the monotone repositories
>>> from scratch. Unfortunately pulling a large monotone repository for
>>> the first time often hangs the monotone server. This seems to
>>> happen in about 90% of attempts.
>>> There are no error messages from either side. Restarting the monotone
>>> server and interrupting the client allows one to retry to pull the
>>> repository.
>>>
>>> We would appreciate any help to solve this problem.
>>
>> Not a direct solution to the problem, but a good workaround is to
>> simply use scp to copy the monotone database on a "from scratch" setup.
>>
>
> We just had the same problem on a partial pull. When 'it' hangs, the
> server is blocked in a select() system call and the client is idle (I
> don't know which system call it is in).
>
> Here is the strace output on the server:
>
> address@hidden monotone-0.45]# strace -p 1871
> Process 1871 attached - interrupt to quit
> [ Process PID=1871 runs in 32 bit mode. ]
> select(11, [9 10], [9 10], [9 10], {20945, 988000}) = 2 (in [10], out [10], left {20671, 308000})
> select(11, [10], NULL, NULL, {21600, 0}) = 1 (in [10], left {21600, 0})
> recv(10, 0xffd733e9, 262143, 0) = -1 ETIMEDOUT (Connection timed out)
> write(3, "mtn-ns-sli: peer 129.111.247.96:"..., 88) = 88
> select(11, NULL, [10], NULL, {21600, 0}
>
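As a rough aid for reading the trace above: on Linux, select() updates its timeout argument with the time that was left, and strace prints that as "left {sec, usec}". Subtracting the two values from the first select() line gives how long that call waited. The sketch below only does that arithmetic; the helper name timeval_to_us is made up for illustration.

```python
def timeval_to_us(seconds, microseconds):
    """Convert a {sec, usec} pair as printed by strace into microseconds."""
    return seconds * 1_000_000 + microseconds


# Values copied from the first select() line of the server trace.
requested = timeval_to_us(20945, 988000)  # timeout passed in
remaining = timeval_to_us(20671, 308000)  # "left" value after the call returned

waited_us = requested - remaining
whole, frac = divmod(waited_us, 1_000_000)

print(f"first select() waited {whole}.{frac:06d} s")  # 274.680000 s
print(f"21600 s timeout = {21600 // 3600} h")         # 6 h
```

So the first select() sat for a little over four and a half minutes before returning, and the later calls are issued with a six-hour timeout.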
> The first select() system call is where it hangs for some time before
> continuing with the second select() system call. I am not an
> expert on select() programming, but the trace seems to say that data is
> ready to be read from the socket, which leads to the call to recv(), but
> this call times out. The write() system call is presumably for logging.
>
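A note on the select()/recv() sequence described above: select() reporting a socket as readable only means that a read would not block. That covers real data, an orderly close by the peer, and also a pending socket error such as ETIMEDOUT, which recv() then returns. The following is a minimal sketch of that pattern in Python, not monotone's actual code; the function name read_when_ready and the status strings are invented for this example, and only the buffer size 262143 is taken from the trace.

```python
import errno
import select
import socket


def read_when_ready(sock, timeout):
    """Wait until sock is readable, then attempt a single recv().

    Returns a (status, payload) tuple; the status names are made up
    for this sketch.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        # select() timed out: nothing to read and no pending error.
        return ("idle", None)
    try:
        data = sock.recv(262143)
    except OSError as exc:
        # "Readable" also covers a pending socket error (for example
        # ETIMEDOUT or ECONNRESET); recv() is where it surfaces.
        return ("error", errno.errorcode.get(exc.errno, str(exc.errno)))
    if data == b"":
        # Readable with zero bytes: the peer closed the connection.
        return ("eof", None)
    return ("data", data)


if __name__ == "__main__":
    a, b = socket.socketpair()
    print(read_when_ready(a, 0.1))   # nothing sent yet
    b.sendall(b"ping")
    print(read_when_ready(a, 0.1))   # data available
    b.close()
    print(read_when_ready(a, 0.1))   # peer closed
    a.close()
```

The demo at the bottom uses a local socket pair, so it exercises only the idle, data, and end-of-file cases; the error branch is the one that corresponds to the recv() = -1 ETIMEDOUT line in the server trace.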
> So after some time, the connection on the server side seems to time out
> (but the server is not ready, see below). At the client side, the
> connection still hangs. The client was a Mac in this case.
>
> I started a new pull on a different (Linux) machine. This connection
> hangs at the client side with the following output:
>
>
> [12:54] (0,8) ~ $ mtn --db /local_home/local_home/hugo/neurospaces_project/MTN/ns-sli.mtn pull --ticker=count repo-genesis3.cbi.utsa.edu:4692 "*"
> mtn: doing anonymous pull; use -kKEYNAME if you need authentication
> mtn: connecting to repo-genesis3.cbi.utsa.edu:4692
>
> For 'netstat -nap' I get the following output for this process:
>
>
> tcp 0 0 129.111.247.65:56980 129.115.117.89:4692 ESTABLISHED 814/mtn
>
> and for strace:
>
> [12:58] (0,9) ~ $ strace -p 814
> Process 814 attached - interrupt to quit
> select(7, [6], [], [6], {21388, 352000}
>
> This client has now been 'hanging' for about 20 minutes.
>
> The strace output on the server side did not change during these
> events, which I assume means that the first client is still blocking
> the server.
>
> Does anyone know how to proceed from here?
>
>
> --
>
> Hugo
>
>
> --
>
> Hugo Cornelis Ph.D.
>
> Neurospaces Project Architect
> http://www.neurospaces.org/
>
> Research Imaging Center
> University of Texas Health Science Center at San Antonio
> 7703 Floyd Curl Drive
> San Antonio, TX 78284-6240
>
> Phone: 210 567 8112
> Fax: 210 567 8152
>