From: Thomas W. Ekberg
Subject: [Duplicity-talk] Duplicity hangs
Date: Tue, 31 Oct 2017 17:24:11 +0000
This is a follow-on to a previous submission where duplicity (under duply) hangs. The duplicity hangs occurred on 7/11, 9/20 and 10/30. That is 71 days between #1 and #2, and 40 days between #2 and #3.
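As a quick check of those intervals, here is a minimal sketch. It assumes all three dates fall in 2017, which is taken from the Date header of this message rather than stated explicitly.

```python
from datetime import date

# Hang dates from the message; the year 2017 is an assumption based on the Date header.
hangs = [date(2017, 7, 11), date(2017, 9, 20), date(2017, 10, 30)]

# Days between each consecutive pair of hangs.
gaps = [(later - earlier).days for earlier, later in zip(hangs, hangs[1:])]
print(gaps)  # [71, 40]
```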
You can see the hang in this ps command:
$ ps wwaux|grep duply
transfe+ 7764 0.0 0.0 4508 852 ? Ss Oct30 0:00 /bin/sh -c duply db_backup pre+bkp+post >> /var/log/duply
transfe+ 7766 0.0 0.0 13056 3420 ? S Oct30 0:00 bash /usr/bin/duply db_backup pre+bkp+post
transfe+ 8776 0.0 0.0 231436 31732 ? Sl Oct30 0:58 python2 /usr/bin/duplicity --name duply_db_backup --encrypt-key 11D39D41 --sign-key 11D39D41 --verbosity 2 --full-if-older-than 1W --exclude-filelist /home/transfers/.duply/db_backup/exclude /var/tmp/pg_dumps scp://address@hidden//mnt/backup/backups/db3.labmed.uw.edu/db_backup
Duply and Duplicity have been running since yesterday morning, over 33 hours ago.
Next I noted that disk usage on the remote partition is at 37%, so no problem there. The local partition has 11TB free at 2% usage.
One responder suggested running strace. When I did that on the parent duplicity process I got this:
$ sudo strace -p8776
strace: Process 8776 attached
futex(0x19ce500, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff
and it just sat there. Then I did the same on the child process (LWP in ps) and got this:
$ sudo strace -p8781
strace: Process 8781 attached
restart_syscall(<... resuming interrupted poll ...>) = 0
poll([{fd=4, events=POLLIN}], 1, 100) = 0 (Timeout)
poll([{fd=4, events=POLLIN}], 1, 100) = 0 (Timeout)
with the poll timeouts repeating quickly. Occasionally, after about 150 poll timeouts (roughly 15 seconds at 100 ms each), a sendto call would appear like this:
poll([{fd=4, events=POLLOUT}], 1, 100) = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "It\372\261\306\277{V\256\374\351'f\231t\201\245Y\311\343\251`\333\246\224\314\366Q\200\327\331]"..., 64, 0, NULL, 0) = 64
and the repeating poll timeouts would continue. The second argument to sendto, if you drop the high bit of each byte, starts with
Itz1F?{V.|i'f^Yt^A%YIc)`. Not very enlightening.
I also ran sudo strace -f -p8776, which produced the same kind of output as above, except that it combined both traces and identified the PID on each line.
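For anyone who wants to repeat the high-bit exercise on other sendto buffers, here is a minimal sketch. The helper name drop_high_bit is made up for illustration, and the caret notation for control characters (for example ^Y for 0x19) is an assumed convention. The payload is only the start of the buffer shown in the strace output.

```python
def drop_high_bit(data):
    """Clear bit 7 of every byte and render control characters in caret notation."""
    out = []
    for byte in bytearray(data):
        low = byte & 0x7F          # drop the high bit
        if low < 0x20:
            out.append("^" + chr(low + 0x40))  # e.g. 0x19 -> ^Y
        elif low == 0x7F:
            out.append("^?")
        else:
            out.append(chr(low))
    return "".join(out)

# First bytes of the sendto buffer, copied from the strace output (octal escapes).
payload = b"It\372\261\306\277{V\256\374\351'f\231t\201\245Y\311\343\251`"
print(drop_high_bit(payload))  # Itz1F?{V.|i'f^Yt^A%YIc)`
```

Since the traffic goes over scp, the buffer is presumably encrypted, so no rendering of it is likely to be readable.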
Eventually I would kill the strace command and it would detach normally.
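The pattern in the two traces (one thread blocked, another polling a descriptor with a 100 ms timeout) can be reproduced with a small stand-alone sketch. This is not duplicity's code. The function name poll_until_readable and the use of a socketpair are assumptions made purely for illustration.

```python
import select
import socket
import threading

def poll_until_readable(sock, timeout_ms=100, max_polls=5):
    """Poll sock for input; return how many polls timed out before data arrived."""
    poller = select.poll()
    poller.register(sock.fileno(), select.POLLIN)
    timeouts = 0
    for _ in range(max_polls):
        events = poller.poll(timeout_ms)  # empty list corresponds to "= 0 (Timeout)"
        if events:
            break
        timeouts += 1
    return timeouts

# Nothing is ever written to the peer, so every poll times out.
a, b = socket.socketpair()
result = []
worker = threading.Thread(target=lambda: result.append(poll_until_readable(a)))
worker.start()
worker.join()  # the main thread blocks here until the worker gives up
print(result)  # [5]
a.close()
b.close()
```

On Linux, a thread blocked in join() typically shows up in strace as a futex wait, which would be consistent with what the parent process shows above.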
So it looks like the duplicity process is waiting for its thread to complete, while the thread polls fd 4 and keeps timing out. I don't know what file descriptor 4 is for. The exception is the occasional sendto, which is preceded by a poll for POLLOUT that returns immediately instead of timing out.
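One way to find out what fd 4 refers to is to read the symlink under /proc on Linux. Here is a minimal sketch. The helper name describe_fd is made up, and the demo inspects a socket opened by the script itself, because the hung process only exists on the affected machine.

```python
import os
import socket

def describe_fd(pid, fd):
    """Return the target of /proc/<pid>/fd/<fd> (Linux only)."""
    return os.readlink("/proc/{}/fd/{}".format(pid, fd))

# Demo on our own process: a socket shows up as "socket:[<inode>]".
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(describe_fd(os.getpid(), sock.fileno()))
sock.close()

# For the hung process one would call describe_fd(8776, 4), run as root or as
# the owning user. Threads share the descriptor table, so the parent PID is enough.
```

If the result is a socket, the inode number can be matched against the output of a tool such as lsof or ss to see which connection it belongs to.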
Does this make sense to anyone?
Tom
Ekberg