
Re: [Gluster-devel] Self-heal with partial files


From: Kevan Benson
Subject: Re: [Gluster-devel] Self-heal with partial files
Date: Thu, 04 Oct 2007 10:32:25 -0700
User-agent: Thunderbird 2.0.0.6 (X11/20070728)

Krishna Srinivas wrote:
On 10/4/07, Kevan Benson <address@hidden> wrote:
Is self-heal supposed to work with partial files?  I have an issue where
self-heal isn't happening on some servers with AFR and unify in an HA
setup I developed.  Two servers, two clients, with all AFR and unify done
on the client side.

If I kill a connection while a large file is being written, the
glusterfs mount waits out the timeout period (10 seconds in my
case) and then finishes writing the file to the still-active server.
This results in a full file on one server and a partial file on the
other (the one I temporarily cut traffic to, to simulate a
crash/network problem).  If I then re-enable the disabled server and read
data from the problematic file, it doesn't self-heal, and the full file
is never copied over to the server holding the partial file.

Anything written entirely while a server is offline (i.e., a file the
offline server has no knowledge of) is correctly created on that server
when the file is read, so the problem seems specific to files that are
only partially written to one server.

Can someone comment on the particular conditions that trigger a self-heal?
Is there something I can do to force a self-heal at this point?  (I repeat
that reading data from the file does not work.)  I know I can use rsync
and some foo to fix this, but that becomes less and less feasible as the
mount grows and the time for rsync to compare the two sides
lengthens.
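
(For illustration only, the kind of rsync pass alluded to above might look
like the sketch below; the backend export path /data/export and the host
name server2 are made-up examples, not paths from this setup:)

    # Push the backend copies from the up-to-date server to the other one,
    # comparing file contents rather than timestamps, which is what gets
    # slow as the export grows.  -X carries extended attributes such as
    # trusted.afr.version across (needs root for the trusted.* namespace).
    rsync -avX --checksum /data/export/ server2:/data/export/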




Hi Kevan,

It should have worked fine in your case. What version of glusterfs are
you using? Just before you do the second read (or rather the open) that
should have triggered self-heal, can you run getfattr -n trusted.afr.version <file>
on both the partial file and the full file in the backend and give the output?

Thanks
Krishna
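
(For reference, a minimal sketch of that check, run as root directly against
the backend export rather than the glusterfs mount; the path
/data/export/bigfile is a made-up example:)

    # Read the AFR version attribute from the backend copy of the file.
    # trusted.* attributes are only visible to root.
    getfattr -n trusted.afr.version -e text /data/export/bigfile
    # Output when the attribute is present looks something like:
    #   # file: data/export/bigfile
    #   trusted.afr.version="1"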


Glusterfs TLA 504, fuse-2.7.0-gfs4.

The trusted.afr.version attribute doesn't exist on the partial file, but it does exist on the complete file (with value "1"). From what I just tested, it doesn't look like it's set until the file operation is complete (it doesn't exist during writing). Are files without this attribute assumed to have a value of "0" or something similar, to ensure they participate in self-heal correctly?

It doesn't look like it: if I append data to the file, the partial copy gets assigned trusted.afr.version=1, while the complete file's trusted.afr.version is incremented to 2. Self-heal now works for that file, and on a read of file data the partial file is updated with all the data and its trusted.afr.version is set to 2.
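
(A sketch of what that state looks like on the two backends; the paths are
made-up examples, and setting trusted.afr.version by hand is an untested
assumption here, not something verified in this thread:)

    # On the server holding the complete file, after the append:
    getfattr -n trusted.afr.version -e text /data/export/bigfile
    #   trusted.afr.version="2"

    # On the server holding the partial file:
    getfattr -n trusted.afr.version -e text /data/export/bigfile
    #   trusted.afr.version="1"

    # In principle the same version mismatch could be created by hand on the
    # complete copy to make the next open trigger self-heal (untested):
    setfattr -n trusted.afr.version -v 2 /data/export/bigfile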



