--- Begin Message ---
Subject: Bug#556114: patch
Date: Wed, 25 Nov 2009 14:00:58 +0100
Hi,
I have created the following small patch, which solves the problem as
far as possible without a major rework.
Basically, instead of recording the start time of the backup in the
incremental file list, it computes a time based on the most recent
modification time of all files that are in the archive, and that were
modified *before* the backup started. The computation consists of
adding one second, and then truncating the fractional part, unless
that would make the time larger than the start time of the backup, in
which case it is left unchanged. Effectively, if the filesystem is not
being modified, the recorded timestamp now deterministically depends on
the actual state of the filesystem instead of on the arbitrary time tar
happens to be invoked.
The result is that the next-level incremental backup will contain all
files modified since the youngest file that was in the backup, provided
that file was modified at least 1 second before the dump started. Files
modified later than that (very shortly before, during, or after the
backup) will always be included in the next incremental backup.
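To make the computation concrete, here is a minimal sketch in C. It is
not the patch itself: the names ts_cmp, note_file_time and recorded_time
are made up for illustration, and the caller is assumed to initialise
the running maximum to a value older than any file.

```c
#include <assert.h>
#include <time.h>

/* Three-way comparison of two timestamps (negative, zero, positive). */
int
ts_cmp (struct timespec a, struct timespec b)
{
  if (a.tv_sec != b.tv_sec)
    return a.tv_sec < b.tv_sec ? -1 : 1;
  if (a.tv_nsec != b.tv_nsec)
    return a.tv_nsec < b.tv_nsec ? -1 : 1;
  return 0;
}

/* Hypothetical helper: fold one file timestamp into the running
   maximum.  Timestamps at or after the start of the backup are
   ignored, so *newest always stays strictly before START.  */
void
note_file_time (struct timespec *newest, struct timespec t,
                struct timespec start)
{
  assert (newest != NULL);
  if (ts_cmp (t, *newest) > 0 && ts_cmp (t, start) < 0)
    *newest = t;
}

/* Hypothetical helper: the timestamp to record in the incremental
   file list.  Add one second and drop the fractional part, unless
   the newest file already falls in the same second as START, in
   which case the value is left unchanged.  */
struct timespec
recorded_time (struct timespec newest, struct timespec start)
{
  if (newest.tv_sec < start.tv_sec)
    {
      newest.tv_sec += 1;
      newest.tv_nsec = 0;
    }
  return newest;
}
```

With a backup starting at T.100, a newest file at (T-2).500 gives a
recorded time of (T-1).000, while a newest file at T.050 is recorded
unchanged as T.050.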
This patch also takes care of another problem: If the filesystem has a
1-second resolution for its times, and if a backup starts at time T.100
(T seconds + 100 milliseconds), and finishes at time T.300, and if
directly thereafter, at T.400, a file is modified, its modification
time will be stored (with 1-second resolution) as T.000. I.e. it will
have a modification time earlier than the time of the backup (which tar
stores as T.100). Therefore, that file will not be included in the next
incremental backup.
Similarly, with the same 1-second resolution, suppose tar starts at
T.100 and stores a file at T.200, and that file is then modified at
T.300, i.e. after it was dumped but before tar terminates (which may be
at T+1000). The filesystem will record a modification time of T.000,
and again, the next-level incremental backup will not include the file.
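The two scenarios above can be replayed with a small sketch in C. It
assumes that the filesystem truncates modification times to whole
seconds, and that a file is picked up by the next incremental backup
when its stored time is at or after the recorded time. store_mtime and
in_next_incremental are hypothetical helpers, not tar functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Assumed filesystem behaviour: modification times are stored with
   1-second resolution, i.e. the fractional part is dropped.  */
struct timespec
store_mtime (struct timespec t)
{
  assert (t.tv_nsec >= 0 && t.tv_nsec < 1000000000L);
  t.tv_nsec = 0;
  return t;
}

/* Assumed selection rule: a file goes into the next incremental
   backup if its stored mtime is at or after the recorded time.  */
bool
in_next_incremental (struct timespec mtime, struct timespec recorded)
{
  if (mtime.tv_sec != recorded.tv_sec)
    return mtime.tv_sec > recorded.tv_sec;
  return mtime.tv_nsec >= recorded.tv_nsec;
}
```

Taking T = 1000: a file modified at T.400 is stored as T.000. Against a
recorded start time of T.100 it is skipped. Against a recorded time
derived from the newest archived file (say (T-4).000, or T.000 if that
file was written in the same second), it is included.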
The patch was made against tar 1.20, and it passes all tests. One of the
test cases required a small fix. The patch also applies to tar 1.22,
passing all tests as well.
Issues not solved by this patch:
- For filesystems with a better time resolution than 1 second, more
files may be included in incremental backups than strictly necessary,
but never more than 1 second worth. IMHO this is preferable to the
alternative of some files not being included on filesystems with a
larger time resolution. Especially since this is not limited to the
snapshot case, but can (will) also happen during regular dumps - see
above.
- For filesystems with a worse time resolution than 1 second, files
may still be left out of an incremental backup. In that case, the
behavior is no worse than it is now.
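The second limitation can be illustrated with a sketch in C, assuming a
filesystem that rounds modification times down to a multiple of its
granularity. All helper names (store_mtime_coarse, recorded_time,
in_next_incremental, later_change_missed) are hypothetical, and the
times used are arbitrary example values.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Assumed filesystem behaviour: round down to a multiple of
   GRANULARITY seconds (non-negative times only).  */
struct timespec
store_mtime_coarse (struct timespec t, time_t granularity)
{
  struct timespec stored;
  assert (granularity > 0);
  stored.tv_sec = t.tv_sec - t.tv_sec % granularity;
  stored.tv_nsec = 0;
  return stored;
}

/* Recorded time as described in this mail: next full second, unless
   the newest file shares its second with the start of the dump.  */
struct timespec
recorded_time (struct timespec newest, struct timespec start)
{
  if (newest.tv_sec < start.tv_sec)
    {
      newest.tv_sec += 1;
      newest.tv_nsec = 0;
    }
  return newest;
}

/* Assumed selection rule: stored mtime at or after recorded time.  */
bool
in_next_incremental (struct timespec mtime, struct timespec recorded)
{
  if (mtime.tv_sec != recorded.tv_sec)
    return mtime.tv_sec > recorded.tv_sec;
  return mtime.tv_nsec >= recorded.tv_nsec;
}

/* Scenario: newest archived file written at 1000.3, dump starts at
   1001.2, another file is modified at 1001.5.  Returns true if that
   later modification is left out of the next incremental backup.  */
bool
later_change_missed (time_t granularity)
{
  struct timespec newest = store_mtime_coarse (
    (struct timespec) {.tv_sec = 1000, .tv_nsec = 300000000}, granularity);
  struct timespec start = {.tv_sec = 1001, .tv_nsec = 200000000};
  struct timespec later = store_mtime_coarse (
    (struct timespec) {.tv_sec = 1001, .tv_nsec = 500000000}, granularity);
  return !in_next_incremental (later, recorded_time (newest, start));
}
```

With a granularity of 2 seconds the later change is stored as 1000.000,
which is before the recorded time of 1001.000, so it is missed. With a
granularity of 1 second it is stored as 1001.000 and is included.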
Use it as you see fit. Comments etc. are welcome.
Regards,
Rogier
------------------------------------------------------------------------
diff -Naur tar-1.20/src/buffer.c tar-1.20-rjg1/src/buffer.c
--- tar-1.20/src/buffer.c 2008-02-04 11:36:51.000000000 +0100
+++ tar-1.20-rjg1/src/buffer.c 2009-11-25 11:24:10.000000000 +0100
@@ -175,6 +175,29 @@
gettime (&start_time);
volume_start_time = start_time;
last_stat_time = start_time;
+ deterministic_start_time.tv_sec = TYPE_MINIMUM (time_t);
+ deterministic_start_time.tv_nsec = 0;
+}
+
+void
+update_deterministic_start_time (struct timespec t)
+{
+ /* The deterministic start time is a timestamp that can be
+ deterministically computed from the 'input' (i.e. filesystem
+ status).
+ Theoretically, the dump could have been started at any time
+ between the last change to the filesystem before the actual
+ start, and the first change after the actual start.
+ The deterministic time will be that of the last change
+ before the start of the dump.
+ Using this value instead of the actual start time makes
+ the file list of a listed-incremental backup deterministic
+ as well
+ */
+
+ if (timespec_cmp (t,deterministic_start_time) > 0
+ && timespec_cmp (t,start_time) < 0)
+ deterministic_start_time = t;
}
void
diff -Naur tar-1.20/src/common.h tar-1.20-rjg1/src/common.h
--- tar-1.20/src/common.h 2008-04-14 14:03:12.000000000 +0200
+++ tar-1.20-rjg1/src/common.h 2009-11-25 11:24:10.000000000 +0100
@@ -304,6 +304,9 @@
/* Timestamps: */
GLOBAL struct timespec start_time; /* when we started execution */
+GLOBAL struct timespec deterministic_start_time; /* alternative start-time,
+ deterministically computed from
+ contents of filesystem */
GLOBAL struct timespec volume_start_time; /* when the current volume was
opened*/
GLOBAL struct timespec last_stat_time; /* when the statistics was last
@@ -406,6 +409,7 @@
void archive_read_error (void);
off_t seek_archive (off_t size);
void set_start_time (void);
+void update_deterministic_start_time (struct timespec t);
void mv_begin (struct tar_stat_info *st);
void mv_end (void);
diff -Naur tar-1.20/src/create.c tar-1.20-rjg1/src/create.c
--- tar-1.20/src/create.c 2009-11-25 07:45:14.000000000 +0100
+++ tar-1.20-rjg1/src/create.c 2009-11-25 11:24:10.000000000 +0100
@@ -1553,6 +1553,13 @@
if (!is_dir && dump_hard_link (st))
return;
+ /* File will be dumped (albeit with some exceptions) - record time */
+ if (listed_incremental_option)
+ {
+ update_deterministic_start_time(st->mtime);
+ update_deterministic_start_time(st->ctime);
+ }
+
if (is_dir || S_ISREG (st->stat.st_mode) || S_ISCTG (st->stat.st_mode))
{
bool ok;
diff -Naur tar-1.20/src/incremen.c tar-1.20-rjg1/src/incremen.c
--- tar-1.20/src/incremen.c 2008-04-14 14:03:13.000000000 +0200
+++ tar-1.20-rjg1/src/incremen.c 2009-11-25 11:24:10.000000000 +0100
@@ -19,6 +19,7 @@
#include <system.h>
#include <hash.h>
+#include <time.h>
#include <quotearg.h>
#include "common.h"
@@ -1285,11 +1286,37 @@
fprintf (fp, "%s-%s-%d\n", PACKAGE_NAME, PACKAGE_VERSION,
TAR_INCREMENTAL_VERSION);
+ /* A level n+1 dump will include files modified after or *at* the time
+ recorded in the directory file. Therefore, record a timestamp that
+ is higher than the time of the newest file in the archive, but less
+ than the start time of the dump.
+ In order to make the time deterministic, it should be incremented
+ with as little as possible, preferably with the filesystem's time
+ resolution, which might be 1 msec, 1 usec, 1 sec, or 2 sec ! Also,
+ before comparison, the start time of the dump should be rounded down
+ to match the filesystem's resolution.
+ That would add significant complexity. It may even be (probably is)
+ impossible to reliably determine the resolution.
+ So, this code just assumes the resolution is 1 second, and increments
+ the recorded time to the next full second, if that remains less than
+ the start time in seconds.
+ Note that this will break if the filesystem's resolution is larger
+ than 1 second, the result being that modified files may be left out of
+ a level n+1 incremental dump.
+ The more reliable solution would be to record more metadata for every
+ file, but that is a nontrivial change.
+ */
+ if (deterministic_start_time.tv_sec < start_time.tv_sec)
+ {
+ deterministic_start_time.tv_sec += 1;
+ deterministic_start_time.tv_nsec = 0;
+ }
+
s = (TYPE_SIGNED (time_t)
- ? imaxtostr (start_time.tv_sec, buf)
- : umaxtostr (start_time.tv_sec, buf));
+ ? imaxtostr (deterministic_start_time.tv_sec, buf)
+ : umaxtostr (deterministic_start_time.tv_sec, buf));
fwrite (s, strlen (s) + 1, 1, fp);
- s = umaxtostr (start_time.tv_nsec, buf);
+ s = umaxtostr (deterministic_start_time.tv_nsec, buf);
fwrite (s, strlen (s) + 1, 1, fp);
if (! ferror (fp) && directory_table)
diff -Naur tar-1.20/tests/rename01.at tar-1.20-rjg1/tests/rename01.at
--- tar-1.20/tests/rename01.at 2007-06-27 15:30:32.000000000 +0200
+++ tar-1.20-rjg1/tests/rename01.at 2009-11-25 11:24:34.000000000 +0100
@@ -32,6 +32,7 @@
genfile --file foo/file2
mkdir foo/bar
genfile --file foo/bar/file
+sleep 1
echo "Creating base archive"
tar -g incr -cf arch.1 -v foo
-------------------------------------------------------------------------------
--- End Message ---