monotone-devel

Re: [Monotone-devel] monotone server hangs


From: Hugo Cornelis
Subject: Re: [Monotone-devel] monotone server hangs
Date: Thu, 24 Sep 2009 13:15:13 -0500

On Thu, Sep 24, 2009 at 1:33 AM, Stephen Leake
<address@hidden> wrote:
> Hugo Cornelis <address@hidden> writes:
>
>> From time to time, we reinstall a developer PC from scratch, recreate
>> the appropriate directory layout and pull the monotone repositories
>> from scratch.  Unfortunately pulling a large monotone repository for
>> the first time often hangs the monotone server.  This error seems to
>> happen in 90% of the trials, although it does not happen always.
>> There are no error messages from either side.  Restarting the monotone
>> server and interrupting the client allows one to retry to pull the
>> repository.
>>
>> We would appreciate any help to solve this problem.
>
> Not a direct solution to the problem, but a good workaround is to
> simply use scp to copy the monotone database on a "from scratch" setup.
>

We just had the same problem on a partial pull.  When 'it' hangs, the
server is in a select() system call and the client is idle (I don't
know which system call the client is in).

Here is the strace output on the server:

address@hidden monotone-0.45]# strace -p 1871
Process 1871 attached - interrupt to quit
[ Process PID=1871 runs in 32 bit mode. ]
select(11, [9 10], [9 10], [9 10], {20945, 988000}) = 2 (in [10], out [10], left {20671, 308000})
select(11, [10], NULL, NULL, {21600, 0}) = 1 (in [10], left {21600, 0})
recv(10, 0xffd733e9, 262143, 0)         = -1 ETIMEDOUT (Connection timed out)
write(3, "mtn-ns-sli: peer 129.111.247.96:"..., 88) = 88
select(11, NULL, [10], NULL, {21600, 0}

The first select() system call is where it hangs for some time, and
then it continues with the second select() system call.  I am not an
expert on select() programming, but the second select() seems to say
that data is ready to be read from the socket, resulting in the call
to recv(), but this call fails with ETIMEDOUT.  The write() system
call is presumably for logging?
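To illustrate the select()/recv() pattern seen in the trace, here is a
minimal sketch in Python.  It is not monotone's actual code; the helper
name wait_and_read and its return values are made up for illustration.
The point it tries to show is that "readable" from select() only means
recv() will not block: the socket may deliver data, an end-of-stream,
or a pending error such as ETIMEDOUT.

```python
import select
import socket


def wait_and_read(sock, timeout, bufsize=4096):
    """Hypothetical helper: wait until sock is readable, then read once.

    Returns one of:
      ("timeout", None)   select() expired without the socket becoming readable
      ("data", bytes)     recv() delivered payload
      ("closed", b"")     the peer closed the connection
      ("error", errno)    recv() reported a pending socket error
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return ("timeout", None)
    try:
        data = sock.recv(bufsize)
    except OSError as exc:
        # A pending socket error (for example ETIMEDOUT) also marks the
        # socket as readable; it is only reported by the read that follows.
        return ("error", exc.errno)
    if data == b"":
        return ("closed", b"")
    return ("data", data)


if __name__ == "__main__":
    # A local socket pair stands in for the client/server connection.
    a, b = socket.socketpair()
    b.sendall(b"hello")
    print(wait_and_read(a, 1.0))
    b.close()
    print(wait_and_read(a, 1.0))
    a.close()
```

In the second call nothing was sent, yet select() still reports the
socket as readable, because the peer has closed it.  The ETIMEDOUT in
the server trace would take the "error" branch of this sketch.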

So after some time, the connection on the server side seems to time
out (but the server is not ready, see below).  At the client side, the
connection still hangs.  The client was a Mac in this case.
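The timevals in the server trace give a rough idea of the durations
involved.  The following is just a small arithmetic sketch in Python
(the helper name timeval_to_us is made up); it assumes the Linux
behaviour where select() updates the timeout to the time remaining,
which strace prints as "left {...}".

```python
def timeval_to_us(sec, usec):
    """Convert a (seconds, microseconds) timeval to whole microseconds."""
    return sec * 1_000_000 + usec


# First select(): timeout passed in, and what was left when it returned.
passed_us = timeval_to_us(20945, 988000)
left_us = timeval_to_us(20671, 308000)
elapsed_us = passed_us - left_us

# Timeout used by the later select() calls, in hours.
full_timeout_hours = 21600 / 3600

if __name__ == "__main__":
    print("first select() blocked for", elapsed_us / 1_000_000, "seconds")
    print("full select() timeout is", full_timeout_hours, "hours")
```

So the first select() blocked for roughly four and a half minutes
before returning, and the 21600 second timeout on the later calls
corresponds to six hours.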

I did a new pull on a different (Linux) machine.  This connection
hangs at the client side with the following output:


[12:54] (0,8) ~ $ mtn --db /local_home/local_home/hugo/neurospaces_project/MTN/ns-sli.mtn pull --ticker=count repo-genesis3.cbi.utsa.edu:4692 "*"
mtn: doing anonymous pull; use -kKEYNAME if you need authentication
mtn: connecting to repo-genesis3.cbi.utsa.edu:4692

For 'netstat -nap' I get the following output for this process:


tcp        0      0 129.111.247.65:56980    129.115.117.89:4692     ESTABLISHED 814/mtn

and for strace:

[12:58] (0,9) ~ $ strace -p 814
Process 814 attached - interrupt to quit
select(7, [6], [], [6], {21388, 352000}

This client has now been 'hanging' for about 20 minutes.

The strace output on the server side did not change during these
events, which I assume means that the first client is still blocking
the server.

Does anyone know how to continue from here?


-- 

Hugo


--

                    Hugo Cornelis Ph.D.

              Neurospaces Project Architect
                http://www.neurospaces.org/

                  Research Imaging Center
   University of Texas Health Science Center at San Antonio
                    7703 Floyd Curl Drive
                 San Antonio, TX  78284-6240

                    Phone: 210 567 8112
                      Fax: 210 567 8152



