rdiff-backup-users
Re: [rdiff-backup-users] Problem with Detection of Multiple rdiff-backup instances


From: Steven Willoughby
Subject: Re: [rdiff-backup-users] Problem with Detection of Multiple rdiff-backup instances
Date: Thu, 24 Sep 2009 19:16:48 -0600
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Dean Cording wrote:
I've come across an issue with the way that rdiff-backup ensures that only one server is accessing a backup dataset.
...
Recently I had a backup fail, probably because of a network outage. All subsequent backups refuse to run because rdiff-backup believes the failed rdiff-backup instance is still running - even though this is clearly impossible because it is a totally different instance of the virtual server.

This had me stumped for a while but I finally figured out what is happening.

Because I start a new virtual server instance each time and I run the backup from a script, everything happens in a consistent order. As a result the instance of rdiff-backup running on the server for each backup session almost always has the same PID. So when a backup fails, the subsequent backup looks at the metadata, finds the PID of the failed backup and sees that that PID is still running - not realising that the other instance is actually itself.

A cursory look at regress.py seems to confirm this behavior. Specifically, in check_pids() it says:

    if pid is not None and pid_running(pid):

This could say:

    if pid is not None and pid != os.getpid() and pid_running(pid):
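The proposed condition can be sketched as a standalone function. This is only an illustration, not the code in regress.py: `other_instance_running` is a hypothetical name, and `pid_running` here is a minimal POSIX-only stand-in for whatever rdiff-backup actually does. Note that the PID comparison uses `!=`, since `is not` tests object identity and is not reliable for integers.

```python
import os


def pid_running(pid):
    """Stand-in check (POSIX only): does a process with this PID exist?"""
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def other_instance_running(pid):
    """Hypothetical helper: True only if pid is a live process other than us."""
    return pid is not None and pid != os.getpid() and pid_running(pid)
```

With this check, a stale PID in the metadata that happens to equal the current process's own PID is no longer treated as a competing instance.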


I'm not sure of a way of working around this problem as the virtual machine is always started from a known state and hasn't been running long enough to build up any entropy to generate unique random numbers between different sessions.

The current time adds a little randomness. A silly workaround would be to call the following perl script before running rdiff-backup:

#!/usr/bin/perl
`/bin/true` for 0..int(rand(100));

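This spawns a random number of short-lived processes (between 1 and 100), so the PID counter advances by a different amount each time. rdiff-backup should then get a different PID than the failed run, which should stop your job from failing continuously.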
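If perl is not available on the virtual server, roughly the same trick can be sketched in Python. This is an untested-on-your-setup sketch that assumes a POSIX system which hands out PIDs sequentially; `burn_pids` is a made-up name, and it forks throwaway children rather than running /bin/true.

```python
import os
import random


def burn_pids(max_spawns=100):
    """Fork a random number of children that exit at once, using up PIDs."""
    count = random.randint(1, max_spawns)
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately without cleanup handlers
        os.waitpid(pid, 0)  # parent: reap the child so no zombie is left
    return count


if __name__ == "__main__":
    burn_pids()
```

Run it in the backup script just before rdiff-backup is started, the same way as the perl version.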

Steven
