[rdiff-backup-users] [PATCH] Optimization for --check-destination
From: Josh Nisly
Subject: [rdiff-backup-users] [PATCH] Optimization for --check-destination
Date: Wed, 25 Jun 2008 20:24:03 +0600
User-agent: Thunderbird 2.0.0.14 (X11/20080505)
Actually, since the optimization affects the step that determines whether
the destination needs checking, which every backup goes through, it speeds
up all backups, not just --check-destination.
Here is what happens: we walk the rdiff-backup-data directory looking for
current_mirror files, but for each file in the directory we instantiate an
RORPath object, which goes to the server to get file information. The
check doesn't need any of that information, since it works on the
filename alone.
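A minimal sketch of the access pattern described above, outside rdiff-backup. `CountingConnection` and `find_current_mirrors_old` are hypothetical stand-ins (not rdiff-backup's API): each `stat()` call models one network round trip, and the file names are made up for illustration.

```python
class CountingConnection:
    """Hypothetical stand-in for a remote connection; counts round trips."""

    def __init__(self, files):
        self.files = files          # name -> fake metadata
        self.round_trips = 0

    def stat(self, name):
        # Each call models one network round trip to the server.
        self.round_trips += 1
        return self.files[name]


def find_current_mirrors_old(conn, names):
    """Old pattern: fetch metadata for every entry, then look only at the name."""
    found = []
    for name in names:
        conn.stat(name)  # metadata is fetched but never used by the check
        if name.startswith("current_mirror."):
            found.append(name)
    return found


listing = ["current_mirror.2008-06-25T13:40:45+06:00.data"]
listing += ["session_statistics.%d.data" % i for i in range(500)]
conn = CountingConnection({name: {} for name in listing})
print(find_current_mirrors_old(conn, listing))
print(conn.round_trips)  # 501: one round trip per directory entry
```

The round-trip count tracks the length of the listing, not the number of matches, which is why the cost grows with every backup session.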
Because the cost comes from the rdiff-backup-data directory, the size of
the repository is irrelevant; what matters is how many times it has been
backed up, and the slowdown grows linearly with that number.
This patch factors out the logic that decides, from the filename alone,
whether a file is an increment, which removes the need to go to the remote
end for every file. It still goes to the remote end for each file that
matches, but most of the time there are only one or two matches.
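The name-only classification can be sketched as a pure Python 3 function. It mirrors the `get_incfile_info` logic in the patch below, with two assumptions: `looks_like_timestring` is a hypothetical stand-in for rdiff-backup's `Time.stringtotime()` check (here approximated with `datetime.fromisoformat`, Python 3.7+), and the compressed flag is returned as `True`/`False` rather than `1`/`None`.

```python
from datetime import datetime

INC_TYPES = {"snapshot", "dir", "missing", "diff", "data"}


def looks_like_timestring(s):
    """Hypothetical stand-in for rdiff-backup's Time.stringtotime() check."""
    try:
        datetime.fromisoformat(s)
    except ValueError:
        return False
    return True


def get_incfile_info(basename):
    """Classify a filename without touching the filesystem or the network.

    Returns None, or a tuple (is_compressed, timestr, inc_type, base).
    """
    parts = basename.split(".")
    compressed = parts[-1] == "gz"
    if compressed:
        if len(parts) < 4:
            return None
        timestr, ext = parts[-3], parts[-2]
        base = ".".join(parts[:-3])
    else:
        if len(parts) < 3:
            return None
        timestr, ext = parts[-2], parts[-1]
        base = ".".join(parts[:-2])
    if not looks_like_timestring(timestr) or ext not in INC_TYPES:
        return None
    return (compressed, timestr, ext, base)


print(get_incfile_info("current_mirror.2008-06-25T13:40:45+06:00.data"))
# (False, '2008-06-25T13:40:45+06:00', 'data', 'current_mirror')
print(get_incfile_info("foo.2008-06-25T13:40:45+06:00.diff.gz"))
# (True, '2008-06-25T13:40:45+06:00', 'diff', 'foo')
print(get_incfile_info("notes.txt"))
# None
```

Because the function only inspects a string, it can run on the local side against a directory listing, and only files it accepts need a remote lookup.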
I've been backing up to one repository daily for a year and a half. Over
my 300 ms latency link, --check-destination runs for over 30 minutes and
then says, "Fatal Error: Destination dir does not need checking." With
this patch, it takes about one minute.
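A rough back-of-the-envelope check of those timings, assuming (as a simplification) that each directory entry costs exactly one 300 ms round trip; instantiating an RORPath may in practice cost more or fewer. With $N$ entries and round-trip time $t_{\mathrm{RTT}}$:

$$
T \approx N \cdot t_{\mathrm{RTT}}
\quad\Longrightarrow\quad
N \gtrsim \frac{30 \times 60\ \mathrm{s}}{0.3\ \mathrm{s}} = 6000
$$

Spread over roughly $1.5 \times 365 \approx 548$ daily sessions, that is on the order of 11 round trips per session under this assumption.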
Thanks,
JoshN
--- rdiff_backup/rpath.py 10 Jun 2008 13:14:52 -0000 1.120
+++ rdiff_backup/rpath.py 25 Jun 2008 13:40:45 -0000
@@ -297,6 +300,26 @@
 	assert rpath.conn is Globals.local_connection
 	return open(rpath.path, "rb")
+def get_incfile_info(basename):
+	"""Returns None or tuple of
+	(is_compressed, timestr, type, and basename)"""
+	dotsplit = basename.split(".")
+	if dotsplit[-1] == "gz":
+		compressed = 1
+		if len(dotsplit) < 4: return None
+		timestring, ext = dotsplit[-3:-1]
+	else:
+		compressed = None
+		if len(dotsplit) < 3: return None
+		timestring, ext = dotsplit[-2:]
+	if Time.stringtotime(timestring) is None: return None
+	if not (ext == "snapshot" or ext == "dir" or
+			ext == "missing" or ext == "diff" or ext == "data"):
+		return None
+	if compressed: basestr = ".".join(dotsplit[:-3])
+	else: basestr = ".".join(dotsplit[:-2])
+	return (compressed, timestring, ext, basestr)
+
 class RORPath:
 	"""Read Only RPath - carry information about a path
 		Also sets various inc information used by the *inc* functions.
 		"""
-		if self.index: dotsplit = self.index[-1].split(".")
-		else: dotsplit = self.base.split(".")
-		if dotsplit[-1] == "gz":
-			self.inc_compressed = 1
-			if len(dotsplit) < 4: return None
-			timestring, ext = dotsplit[-3:-1]
+		if self.index: basename = self.index[-1]
+		else: basename = self.base
+
+		inc_info = get_incfile_info(basename)
+
+		if inc_info:
+			self.inc_compressed, self.inc_timestr, \
+				self.inc_type, self.inc_basestr = inc_info
+			return 1
 		else:
-			self.inc_compressed = None
-			if len(dotsplit) < 3: return None
-			timestring, ext = dotsplit[-2:]
-		if Time.stringtotime(timestring) is None: return None
-		if not (ext == "snapshot" or ext == "dir" or
-				ext == "missing" or ext == "diff" or ext == "data"):
 			return None
-		self.inc_timestr = timestring
-		self.inc_type = ext
-		if self.inc_compressed: self.inc_basestr = ".".join(dotsplit[:-3])
-		else: self.inc_basestr = ".".join(dotsplit[:-2])
-		return 1
 	def isinccompressed(self):
 		"""Return true if inc file is compressed"""
--- rdiff_backup/restore.py 7 Jul 2007 22:43:34 -0000 1.60
+++ rdiff_backup/restore.py 25 Jun 2008 13:41:36 -0000
@@ -47,8 +64,10 @@
 	inc_list = []
 	for filename in parent_dir.listdir():
-		inc = parent_dir.append(filename)
-		if inc.isincfile() and inc.getincbase_str() == basename:
+		inc_info = rpath.get_incfile_info(filename)
+		if inc_info and inc_info[3] == basename:
+			inc = parent_dir.append(filename)
+			assert inc.isincfile()
 			inc_list.append(inc)
 	return inc_list
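The restore.py change above can be sketched outside rdiff-backup to show where the round trips go. Everything here is hypothetical: `CountingDir` stands in for the remote parent directory (its `lookup()` plays the role of `parent_dir.append(filename)`), and `inc_base` is a simplified name parser that skips the timestamp validation the real code performs.

```python
INC_TYPES = {"snapshot", "dir", "missing", "diff", "data"}


class CountingDir:
    """Hypothetical remote directory; every method call is one round trip."""

    def __init__(self, names):
        self.names = list(names)
        self.round_trips = 0

    def listdir(self):
        self.round_trips += 1
        return list(self.names)

    def lookup(self, name):
        # Stands in for parent_dir.append(filename): one remote round trip.
        self.round_trips += 1
        return name


def inc_base(filename):
    """Simplified parser: 'base.TIME.ext[.gz]' -> 'base', else None."""
    parts = filename.split(".")
    if parts[-1] == "gz":
        parts = parts[:-1]
    if len(parts) < 3 or parts[-1] not in INC_TYPES:
        return None
    return ".".join(parts[:-2])


def get_inclist(parent_dir, basename):
    """Filter on the name locally; only matching names cost a remote lookup."""
    inc_list = []
    for filename in parent_dir.listdir():
        if inc_base(filename) == basename:
            inc_list.append(parent_dir.lookup(filename))
    return inc_list


names = ["current_mirror.2008-06-25T13:40:45+06:00.data"]
names += ["session_statistics.2008-06-%02dT00:00:00+06:00.data" % d
          for d in range(1, 25)]
d = CountingDir(names)
print(get_inclist(d, "current_mirror"))
print(d.round_trips)  # 2: one listdir() plus one lookup for the single match
```

One design note: the sketch keeps every side-effecting call out of an `assert`, because Python drops assert statements when run with `-O`. In the patch, `isincfile()` also sets the `inc_*` attributes, so calling it inside `assert` ties that side effect to assertions being enabled.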