Re: [Gluster-devel] Selfheal is not working? Once more


From: Kevan Benson
Subject: Re: [Gluster-devel] Selfheal is not working? Once more
Date: Wed, 30 Jul 2008 14:42:07 -0700
User-agent: Thunderbird 2.0.0.14 (X11/20080421)


Previous quoted posts removed for brevity...

Martin Fick wrote:
It does seem like it would be fairly easy to add another metadata attribute to each file/directory that would hold a checksum for it. That way, AFR itself could be configured to check or compute the checksum any time the file is read or written. Since this would slow AFR down, I would suggest a configuration option to turn it on. If the checksum is wrong, AFR could heal from the other brick's copy, provided that brick's checksum is correct.
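The brick-selection logic this implies can be sketched as a small pure function (Python, purely for illustration; nothing here is actual AFR code):

```python
def choose_heal_source(computed_a, stored_a, computed_b, stored_b):
    """Given computed vs. stored checksums on two bricks, decide which
    copy (if either) can serve as the heal source."""
    a_ok = computed_a == stored_a
    b_ok = computed_b == stored_b
    if a_ok and not b_ok:
        return "a"          # heal brick B from brick A
    if b_ok and not a_ok:
        return "b"          # heal brick A from brick B
    if a_ok and b_ok:
        return None         # both copies healthy, nothing to do
    raise RuntimeError("both copies corrupt; keep them for inspection")
```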

Another alternative would be to create an offline checksummer that adds such an attribute where it does not exist and verifies the checksum where it does. If the check fails, the checksummer would simply delete the file and its attributes (and potentially the directory attributes up the tree) so that AFR will then heal it.

The only modification needed in AFR to support this would be to delete the checksum attribute any time the file or directory is updated, so that the offline checksummer recreates it instead of thinking the file is corrupt. In fact, even this could be eliminated, making the offline checksummer completely "self-powered": any time it calculates a checksum, it could copy the glusterfs version and timestamp attributes to two new "checksummer" attributes. If these become out of date, the checksummer knows to recompute the checksum instead of assuming that the file has been corrupted.

The one risk with this is that if a file gets corrupted on both nodes, it will get deleted on both nodes, leaving no corrupted copy to at least look at. This too could be overcome by saving any deleted files in a separate "trash can" and emptying it once the files in it have been healed: sort of a self-cleaning lost+found directory.


I know this may not be the answer you were looking for, but I hope it helps clarify things a little.

A while back I seem to remember someone talking about eventually creating a fsck.glusterfs utility. Since corruption on an underlying server node would (hopefully) not be a common problem, a dedicated tool that could be run when prudent seems like a good approach. If the underlying data on a node is suspected of corruption, run the normal fsck on that node, then run fsck.glusterfs on the share; a dedicated utility can apply a much more comprehensive set of checks and repairs than would be feasible in normal AFR file processing.

--

-Kevan Benson
-A-1 Networks



