Re: cvs repository files being corrupted
From: Mark D. Baushke
Subject: Re: cvs repository files being corrupted
Date: Mon, 13 Feb 2006 09:11:32 -0800
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Jeremy Birch <jeremy.birch@pulsic.com> writes:
> Dear CVS,
>
> We have a fairly large CVS repository with a few hundred binary files
> and thousands of ASCII ones. On a fairly regular basis (and
> increasingly over the last month or so) we have noticed repository
> (*,v) files being corrupted and thus failing to checkout.
>
> The symptom is that the ,v file has a large number of NULLs (several
> hundred) just after the @@ marker in the file, i.e. something that
> should look a bit like
>
> <snip>
> desc
> @@
>
>
> 1.69
> log
> @#3077:UNTESTABLE dont move track unless its width is increasing
> @
> text
> @ <the whole of the HEAD version of this file>
> <snip>
>
> looks more like
> <snip>
> desc
> @@
> <loads of NULLs>
> <the HEAD version missing the first N characters, where N seems to be
> roughly the same as the number of NULLs>
> <snip>
I have not seen this problem in many years; the last time was around
cvs 1.9. Back then, the users were checking out the files over NFS, UDP
checksums were not enabled, and there was a flaky Ethernet bridge in the
network between the clients and the NFS home of the repository. The NUL
bytes were always in multiples of the UDP block size (typically 8192
bytes).
This is one of the reasons I recommend that people never use NFS as a
component of their CVS repository.
>
>
> Obviously this is a real pain, having to retrieve a backed up version
> and re-apply whatever changes have got screwed in the meanwhile - this
> is doubly a pain for large binary files which seem to be more
> vulnerable.
>
> We initially thought this was only triggered by tagging, but we have
> also seen it happen on commits of individual ASCII files.
>
> The server runs Linux, and the clients are generally Linux and XP. We
> have seen corruption caused by both sets of clients, so the common
> factor is the server.
Where is the repository itself hosted? Is it on a disk local to your
GNU/Linux server, or is it mounted over NFS, Samba, or some other remote
filesystem?
If you are indeed using local disk, then I would suspect the disk
controller on the system.
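One way to answer the "where is the repository hosted" question on a Linux server is to look up the repository path in /proc/mounts. The following is a minimal Python sketch under that assumption (Linux, /proc mounted); `fs_type_for`, `is_remote` and the `REMOTE_FS_TYPES` set are hypothetical names chosen for illustration, and the set of "remote" types is not exhaustive.

```python
import os

# Filesystem types treated as remote here. This set is an assumption for
# illustration, not an exhaustive list.
REMOTE_FS_TYPES = {"nfs", "nfs4", "cifs", "smbfs", "smb3"}

def _unescape(field):
    # /proc/mounts writes space, tab, newline and backslash as octal escapes.
    for esc, ch in (("\\040", " "), ("\\011", "\t"),
                    ("\\012", "\n"), ("\\134", "\\")):
        field = field.replace(esc, ch)
    return field

def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the text of /proc/mounts. The longest matching mount
    point wins; among equal-length matches the later line wins, because a
    later mount on the same point hides the earlier one. Symlinks are not
    resolved here, so pass os.path.realpath(path) if that matters.
    """
    path = os.path.abspath(path)
    best_mnt, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mnt, fstype = _unescape(fields[1]), fields[2]
        inside = path == mnt or path.startswith(mnt.rstrip("/") + "/")
        if inside and len(mnt) >= len(best_mnt):
            best_mnt, best_type = mnt, fstype
    return best_type

def is_remote(path, mounts_text):
    # True if the containing mount has one of the assumed remote types.
    return fs_type_for(path, mounts_text) in REMOTE_FS_TYPES
```

On the server, something like `fs_type_for(os.path.realpath("/path/to/cvsroot"), open("/proc/mounts").read())`, with the real repository path substituted, would report `nfs` if the repository lives on an NFS mount.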
>
> I have looked through the bug email list and the issues fixed in
> recent versions and nothing looks like this issue (other than some
> suggestion that our machine's memory is dodgy, which does not convince
> me).
>
> What it feels like is that locking is failing, i.e. two server processes
> are accessing the file at a critical time and the file is truncated
> (by the other process) part of the way through the write by the main
> process - this gives the packing with NULLs, and the tail of correct
> data.
This is not how things work. A new version of filename,v is written by
exclusively creating a new file (,filename,) and writing the updated
contents of filename,v into that file. Only after everything is complete
is ,filename, renamed to filename,v; on a local filesystem that rename
replaces the old file in one step, so other processes never see a
half-written filename,v. In normal repositories, the CVS locks are kept
in the same directory as the filename,v file. If you have a LockDir
specified, they may be in another directory, but CVS should still do the
right thing if you are using the client/server :ext: method.
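The exclusive-create-then-rename sequence described above can be sketched as follows. This is a minimal Python illustration assuming a local POSIX filesystem; `replace_rcs_file` is a hypothetical helper, not CVS's actual code, and it leaves out CVS's lock handling entirely.

```python
import os

def replace_rcs_file(rcs_path, new_contents):
    """Replace `rcs_path` (e.g. 'foo.c,v') with `new_contents` (bytes).

    Sketch only: writes ',foo.c,' next to 'foo.c,v', then renames it over
    the original. Repository locking is assumed to be handled elsewhere.
    """
    directory, name = os.path.split(rcs_path)
    base = name[:-2] if name.endswith(",v") else name
    temp_path = os.path.join(directory, "," + base + ",")
    # O_EXCL makes the create fail if ',foo.c,' already exists, instead of
    # silently overwriting another writer's temporary file.
    # Mode 0o444 because RCS files are conventionally read-only.
    fd = os.open(temp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o444)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(new_contents)
    except BaseException:
        os.unlink(temp_path)
        raise
    # rename() within one directory swaps in the new file in a single step:
    # a reader opening 'foo.c,v' gets either the old or the new contents.
    os.rename(temp_path, rcs_path)
```

Note that the live `foo.c,v` is never opened for writing, which is why a second process cannot truncate it partway through a write in this scheme.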
The fact that the number of NUL bytes equals the number of missing bytes
implies that whole file blocks are being corrupted at a time. If the
number of NUL bytes is a multiple of 512 and the corruption starts at an
offset that is a multiple of 512, then it is possible that you also have
a flaky disk subsystem.
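To apply these checks to a damaged ,v file, one could scan it for runs of NUL bytes and test each run against the 512-byte and 8192-byte block sizes mentioned in this thread. The Python sketch below is a hypothetical helper, not a CVS tool; the 64-byte minimum run length is an arbitrary assumption, and files checked in as binary (-kb) can legitimately contain NUL bytes, so hits in those files need a closer look.

```python
import re

def nul_runs(data, min_len=64):
    """Return (offset, length) for every run of at least `min_len` NUL bytes."""
    pattern = rb"\x00{%d,}" % min_len
    return [(m.start(), m.end() - m.start()) for m in re.finditer(pattern, data)]

def is_block_aligned(offset, length, block):
    # True if the run both starts on and spans whole multiples of `block`.
    return offset % block == 0 and length % block == 0

def report(path, min_len=64):
    """Describe each NUL run in the file at `path`, flagging block alignment."""
    with open(path, "rb") as f:
        data = f.read()
    lines = []
    for off, n in nul_runs(data, min_len):
        tags = [f"{b}-aligned" for b in (512, 8192) if is_block_aligned(off, n, b)]
        lines.append(f"{path}: {n} NUL bytes at offset {off} {' '.join(tags)}".rstrip())
    return lines
```

A run flagged `512-aligned` fits the disk-subsystem explanation; one flagged `8192-aligned` as well fits the NFS-over-UDP case described earlier.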
> Is this a plausible explanation?
No.
> Is this a known bug?
Not this particular bug.
The following fix went into cvs 1.12.13 and cvs 1.11.21:

  * Thanks to Rahul Bhargava <rahul@wandisco.com>, heavily loaded systems
    suffering from a disk crash or power failure will not lose data they
    claimed to have committed.
> How do we fix it?
Provide more information about your setup.
>
> We are currently using version 1.11.14 for our server, and will be
> trying 1.11.21 instead to see if that fixes it, but we can see no
> entry in the release notes covering this issue.
Well, 1.11.21 does fix some kinds of corruption by forcing a sync to the
filesystem.
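This thread does not say exactly which calls 1.11.21 makes, so the following is only the generic POSIX recipe for making a rename-based update survive a crash, assuming a local filesystem: fsync() the new file before renaming it, then fsync() the containing directory so the rename itself reaches the disk. `sync_and_rename` is a hypothetical helper written in Python, not CVS code; directory fsync works on Linux but not on Windows.

```python
import os

def sync_and_rename(temp_path, final_path):
    """fsync `temp_path`, rename it to `final_path`, then fsync the directory."""
    # 1. Push the new file's data to stable storage before it gets the real
    #    name, so a crash cannot leave 'foo.c,v' pointing at unwritten blocks.
    fd = os.open(temp_path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)
    os.rename(temp_path, final_path)
    # 2. Push the directory entry change (the rename) to stable storage.
    dfd = os.open(os.path.dirname(os.path.abspath(final_path)), os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

The cost is extra disk waits on every commit, which is presumably why such syncing matters most on heavily loaded systems of the kind the fix above mentions.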
-- Mark
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3 (FreeBSD)
iD8DBQFD8L3ECg7APGsDnFERAhPGAJ91yVXoIdtfhf2JQg09i7wRB929qACgsmp6
YFRxeAlsj4eowaPRlLXzCg8=
=m3Cm
-----END PGP SIGNATURE-----