From: Gregor Zattler
Subject: [rdiff-backup-users] gzip --rsyncable (was: Re: New User Seeking Some Clarification)
Date: Fri, 30 Jan 2004 00:34:32 +0100
User-agent: Mutt/1.5.5.1+cvs20040105i
Hi rdiff-backup-users,
* Gregor Zattler <address@hidden> [28. Jan. 2004]:
> Hi Ben,
> * Ben Escoto <address@hidden> [27. Jan. 2004]:
> > >>>>> Alan <address@hidden>
> > >>>>> wrote the following on Fri, 9 Jan 2004 13:35:37 -0800
> [...]
> > > until I realized that because of the
> > > bzip the .sql file was completely different each time, so the entire
> > > file was transfered as an increment. When I removed the bzip part of
> > > the process the base file was larger, but the increments were much
> > > smaller because they were simply text diffs of new/changed data, not a
> > > binary diff of an entirely changed file.
>
> > I think there is a patch to gzip floating around that adds an option
> > to reset the buffer at certain clever intervals. The end result is
> > that similar data gzipped stays similar---one extra byte at the
> > beginning doesn't result in two totally separate gzip archives.
>
> This is in Debian unstable since almost one year:
I tested it. The results are as expected:
I rdiff-backup-ed a directory /tmp/testdir which
contained three files: a 46 MB mbox, the same mbox as a bzip2
compressed file and as a gzip compressed file:
0 pit:/tmp/testdir$ ls -Al
total 57084
-rw-r--r-- 1 grfz grfz 46154878 2004-01-29 23:35 mbox
-rw-r--r-- 1 grfz grfz 5438580 2004-01-29 23:38 mbox.bz2
-rw-r--r-- 1 grfz grfz 6779264 2004-01-29 23:36 mbox.gz
0 pit:/tmp$ rdiff-backup -b testdir testdir-backup
I modified the mbox slightly by deleting two unimportant headers in
the first mail, rebuilt the two compressed files and ran
rdiff-backup again:
0 pit:/tmp/testdir$ cat mbox |gzip -9 >mbox.gz;cat mbox |bzip2 -9 >mbox.bz2; ls -Al
total 57084
-rw-r--r-- 1 grfz grfz 46154774 2004-01-29 23:40 mbox
-rw-r--r-- 1 grfz grfz 5437410 2004-01-29 23:42 mbox.bz2
-rw-r--r-- 1 grfz grfz 6779265 2004-01-29 23:40 mbox.gz
0 pit:/tmp$ rdiff-backup -b testdir testdir-backup
0 pit:/tmp$ ls -Al testdir-backup/rdiff-backup-data/increments/
0 pit:/tmp/testdir-backup/rdiff-backup-data/increments$ ls -al
total 11968
drwx------ 2 grfz grfz 100 2004-01-29 23:43 ./
drwx------ 3 grfz grfz 600 2004-01-29 23:43 ../
-rw-r--r-- 1 grfz grfz 6779788 2004-01-29 23:36 mbox.gz.2004-01-29T23:38:12+01:00.diff
-rw-r--r-- 1 grfz grfz 5438999 2004-01-29 23:38 mbox.bz2.2004-01-29T23:38:12+01:00.diff
-rw-r--r-- 1 grfz grfz 4803 2004-01-29 23:35 mbox.2004-01-29T23:38:12+01:00.diff.gz
In fact the increments of both compressed files are bigger than
the original compressed files.
I then deleted the first backup and did it again, this time with
the --rsyncable option:
0 pit:/tmp/testdir$ rm mbox.gz
rm: remove regular file `mbox.gz'? y
0 pit:/tmp/testdir$ cat mbox |gzip -9 --rsyncable >mbox.gz
0 pit:/tmp/testdir$ ls -Al
total 57372
-rw-r--r-- 1 grfz grfz 46154774 2004-01-29 23:40 mbox
-rw-r--r-- 1 grfz grfz 5437410 2004-01-29 23:42 mbox.bz2
-rw-r--r-- 1 grfz grfz 7076210 2004-01-29 23:46 mbox.gz
0 pit:/tmp$ rdiff-backup -b testdir testdir-backup
I deleted two headers in the first mail and did it again:
0 pit:/tmp/testdir$ cat mbox |gzip -9 --rsyncable >mbox.gz;cat mbox |bzip2 -9 >mbox.bz2; ls -Al
total 57372
-rw-r--r-- 1 grfz grfz 46154537 2004-01-29 23:46 mbox
-rw-r--r-- 1 grfz grfz 5437393 2004-01-29 23:48 mbox.bz2
-rw-r--r-- 1 grfz grfz 7076151 2004-01-29 23:47 mbox.gz
0 pit:/tmp$ rdiff-backup -b testdir testdir-backup
0 pit:/tmp$ ls -Al testdir-backup/rdiff-backup-data/increments/
total 5340
-rw-r--r-- 1 grfz grfz 4810 2004-01-29 23:40 mbox.2004-01-29T23:46:22+01:00.diff.gz
-rw-r--r-- 1 grfz grfz 5437829 2004-01-29 23:42 mbox.bz2.2004-01-29T23:46:22+01:00.diff
-rw-r--r-- 1 grfz grfz 7558 2004-01-29 23:46 mbox.gz.2004-01-29T23:46:22+01:00.diff
So while gzip -9 --rsyncable produced a slightly bigger archive
(~ 4.38 %), the mbox.gz increment in the second test case was far
smaller than in the first test case without the --rsyncable
option (7558 bytes instead of 6779788 bytes). The size of the
increment was roughly the same as that of the uncompressed mbox.
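
For anyone who wants to check the arithmetic, here is a small sketch
that recomputes both figures from the byte counts in the listings
above. The variable names and the pct_more helper are made up for
this illustration; only the numbers come from the test.

```shell
#!/bin/sh
# Sizes in bytes, copied from the ls listings in this mail.
plain_gz=6779265       # mbox.gz built with gzip -9
rsyncable_gz=7076210   # mbox.gz built with gzip -9 --rsyncable
plain_inc=6779788      # mbox.gz increment without --rsyncable
rsyncable_inc=7558     # mbox.gz increment with --rsyncable

# pct_more A B: by how many percent B is larger than A (two decimals).
# Hypothetical helper, not part of gzip or rdiff-backup.
pct_more() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f", (b - a) * 100 / a }'
}

overhead=$(pct_more "$plain_gz" "$rsyncable_gz")
# Integer division is good enough for an order-of-magnitude figure.
ratio=$((plain_inc / rsyncable_inc))

echo "archive overhead of --rsyncable: ${overhead} %"
echo "increment without --rsyncable is about ${ratio} times bigger"
```

The two archives were built from slightly different versions of the
mbox, so the overhead figure is only approximate.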
Sadly the last stable gzip version (1.2.4) is several years old,
so it is totally unclear when this feature will become widely
available.
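
Until then, a backup script could probe whether the local gzip
accepts the option before relying on it. This is only a sketch: the
function name has_rsyncable is invented here, and it assumes that a
gzip without the patch rejects the unknown option with a non-zero
exit status.

```shell
#!/bin/sh
# has_rsyncable: succeed if the installed gzip accepts --rsyncable.
# Hypothetical helper; it compresses empty input and discards the result,
# so it only tests whether the option is recognized.
has_rsyncable() {
    gzip --rsyncable -c </dev/null >/dev/null 2>&1
}

if has_rsyncable; then
    GZIP_FLAGS="-9 --rsyncable"
else
    GZIP_FLAGS="-9"   # fall back to plain gzip, bigger increments
fi
echo "using: gzip $GZIP_FLAGS"
```

The dump step would then be something like
"cat mbox | gzip $GZIP_FLAGS >mbox.gz", with $GZIP_FLAGS left unquoted
on purpose so that it splits into separate options.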
Ciao; Gregor