monotone-devel

Re: [Monotone-devel] Fatal error: std::runtime_error


From: Jens Seidel
Subject: Re: [Monotone-devel] Fatal error: std::runtime_error
Date: Fri, 13 Apr 2007 22:34:25 +0200
User-agent: Mutt/1.5.13 (2006-08-11)

On Wed, Apr 11, 2007 at 04:16:07PM +0200, Markus Schiltknecht wrote:
> Jens Seidel wrote:
> >But now it failed again! strace output:
> >$ strace -p 9521
> >Process 9521 attached - interrupt to quit
> >select(7, [6], [], [6], {21439, 988000}
> 
> So monotone is waiting in a select()...

Please note that 21439 is the number of seconds select() waits before it
times out. The timeout initially defaults to 21600, which corresponds to
6 hours! A really large and useless timeout value, right?

I reduced netsync_timeout_seconds in constants.hh to 180 (3 minutes), and
now monotone reports a timeout while waiting for I/O and exits.
I tried this with version 0.34 on one of my mips boxes.
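
To make the timeout behaviour concrete, here is a minimal sketch of a select() call with a timeout. It is not monotone or Netxx code: wait_readable is a hypothetical helper made up for illustration, and it assumes a POSIX system. The timeval is rebuilt on every call because Linux select() may overwrite it with the time that was left.

```cpp
#include <cassert>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper (not part of monotone or Netxx): wait until fd becomes
// readable or timeout_seconds elapse.
// Returns 1 if fd is readable, 0 on timeout, -1 on error.
int wait_readable(int fd, long timeout_seconds)
{
    assert(fd >= 0 && fd < FD_SETSIZE);

    fd_set read_fds;
    FD_ZERO(&read_fds);
    FD_SET(fd, &read_fds);

    // Build a fresh timeval for every call: on Linux, select() may modify
    // it to hold the time that was not slept.
    timeval timeout;
    timeout.tv_sec = timeout_seconds;
    timeout.tv_usec = 0;

    int rc = select(fd + 1, &read_fds, nullptr, nullptr, &timeout);
    if (rc > 0)
        return 1;   // data is available
    if (rc == 0)
        return 0;   // timeout expired with no I/O
    return -1;      // select() failed, see errno
}
```

With a value such as 180 the caller gets the 0 result after three minutes of silence, whereas with 21600 the same call can block for up to six hours.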

The error occurred multiple times after fetching approximately 60 revisions.

> are you sure the connection to
> the server is fine? Maybe save a tcpdump and check that.

The last lines of
$ tcpdump host monotone.openembedded.org
are:
21:20:29.043385 IP linuxtogo.org.4691 > dream.sol.2012: . 25371638:25373026(1388) ack 487 win 6432 <nop,nop,timestamp 316580726 1586027>
21:20:29.045501 IP linuxtogo.org.4691 > dream.sol.2012: . 25373026:25374414(1388) ack 487 win 6432 <nop,nop,timestamp 316580726 1586027>
21:20:29.047568 IP linuxtogo.org.4691 > dream.sol.2012: . 25374414:25375802(1388) ack 487 win 6432 <nop,nop,timestamp 316580726 1586027>
21:20:29.077204 IP linuxtogo.org.4691 > dream.sol.2012: . 25375802:25377190(1388) ack 487 win 6432 <nop,nop,timestamp 316580776 1586077>
21:20:29.077301 IP dream.sol.2012 > linuxtogo.org.4691: . ack 25377190 win 1339 <nop,nop,timestamp 1586135 316580726>
21:20:29.134441 IP linuxtogo.org.4691 > dream.sol.2012: . 25377190:25378578(1388) ack 487 win 6432 <nop,nop,timestamp 316580833 1586135>
21:20:29.184178 IP dream.sol.2012 > linuxtogo.org.4691: . ack 25378578 win 730 <nop,nop,timestamp 1586242 316580833>
21:20:29.242905 IP linuxtogo.org.4691 > dream.sol.2012: . 25378578:25379966(1388) ack 487

But the error happened several minutes later!

I assume that this error occurs more often on a slow CPU, and my experience
is consistent with that. Is it possible that monotone fetches network
packets in larger blocks (cached), so that the data for revisions is
already available on the host before the revisions are processed? Once the
CPU has finished processing all available data, monotone tries to obtain
further network packets from the remote host, but the connection has
already expired because of inactivity (my modem disconnects).

I suggest you try monotone on a slow host, maybe by starting it in a
valgrind session (which also allows you to perform some additional checks).
If it is still too fast, compile a few Linux kernels in the background :-)

> Okay, I've tried to pull the openembedded repository from scratch. After 
> six hours, these are my results:
> 
> $ mtn --version
> monotone 0.33 (base revision: cfebc8eb7049def476cc5fd61fef64eb14120e68)
> 
> $ time mtn -d oe.db pull monotone.openembedded.org "*"
> mtn: doing anonymous pull; use -kKEYNAME if you need authentication
> mtn: connecting to monotone.openembedded.org
> mtn: finding items to synchronize:
> mtn:  bytes in | bytes out | certs in | revs in
> mtn:    16.1 k |       462 |        0 |       0
> mtn:  bytes in | bytes out |      certs in |       revs in
> mtn:    63.4 M |       462 | 42,548/64,630 | 10,686/16,148
> mtn: error: I/O failure while talking to peer monotone.openembedded.org, 
> disconnecting

> real    137m32.758s
> user    102m21.388s
> sys     1m4.308s

Why the hell did it terminate after only 2 hours if the timeout is set
to 6 hours?

Call stack once it hangs:

#0  0x2afe504c in select () from /lib/libc.so.6
#1  0x00d90e34 in Netxx::Probe_impl::probe (this=0x117d878, address@hidden, 
rt=0) at ../netxx/probe_select.cxx:217
#2  0x00d8ae7c in Netxx::Probe::ready (this=0x7f8990b4, address@hidden, rt=0) 
at ../netxx/probe.cxx:130
#3  0x00a2aa98 in call_server (role=sink_role, address@hidden, address@hidden, 
address@hidden, address@hidden, default_port=4691, 
    timeout_seconds=21600) at ../netsync.cc:2424
#4  0x00a2c0f8 in run_netsync_protocol (voice=client_voice, role=sink_role, 
address@hidden, address@hidden, address@hidden, 
    address@hidden) at ../netsync.cc:3240
#5  0x00787d3c in commands::cmd_pull::exec (this=0x10f8728, address@hidden, 
address@hidden) at ../cmd_netsync.cc:152
#6  0x0076de18 in commands::process (address@hidden, address@hidden, 
address@hidden) at ../commands.cc:237
#7  0x00ca2b14 in cpp_main (argc=5, argv=0x7f899dc4) at ../monotone.cc:260
#8  0x00ca53a8 in main (argc=5, argv=0x7f899dc4) at ../unix/main.cc:155


217             if ( (rc = select(max_fd+1, rd_fdptr, wr_fdptr, er_fdptr, timeout_ptr)) > 0) {
218                 std::set<socket_type>::const_iterator all_it=all_sockets.begin(), all_end=all_sockets.end();
219                 Probe::ready_type ready_bits;
220
221                 for (; all_it!=all_end; ++all_it) {
(gdb) print *rd_fdptr
$2 = {fds_bits = {512, 0 <repeats 31 times>}}
(gdb) print *wr_fdptr
$4 = {fds_bits = {0 <repeats 32 times>}}
(gdb) print er_fdptr
$6 = {fds_bits = {512, 0 <repeats 31 times>}}
(gdb) print *timeout_ptr
$8 = {tv_sec = 20635, tv_usec = 44000} # approximately 6 hours!!!
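
For reading the gdb values above, here is a small sketch that decodes an fds_bits array into descriptor numbers and turns a tv_sec value into hours, minutes and seconds. The helper names descriptors_in and as_hms are made up for this example. It assumes the glibc-style fd_set layout on a 32-bit target, where descriptor d is stored as bit (d % 32) of word (d / 32).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Assumption: glibc-style fd_set with 32-bit words, where descriptor d is
// bit (d % 32) of word (d / 32). Returns the descriptors whose bit is set.
std::vector<int> descriptors_in(const std::vector<std::uint32_t>& fds_bits)
{
    std::vector<int> fds;
    for (std::size_t word = 0; word < fds_bits.size(); ++word)
        for (int bit = 0; bit < 32; ++bit)
            if (fds_bits[word] & (std::uint32_t{1} << bit))
                fds.push_back(static_cast<int>(word * 32 + bit));
    return fds;
}

// Format a number of seconds as "<h>h <mm>m <ss>s".
std::string as_hms(long seconds)
{
    char buf[32];
    std::snprintf(buf, sizeof buf, "%ldh %02ldm %02lds",
                  seconds / 3600, (seconds % 3600) / 60, seconds % 60);
    return buf;
}
```

Under that layout, fds_bits = {512, 0, ...} means only descriptor 9 is being watched (512 is 1 << 9), and as_hms(20635) gives "5h 43m 55s".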

Hope this helps,
Jens



