On Fri, Jan 23, 2009 at 9:34 AM, Raghavendra G <address@hidden> wrote:
> Avati,
>
> ls/cd works fine for the test described by Nicolas. In fact, when I killed
> both glusterfs servers I got ENOTCONN, but when I started one of the
> servers again, 'ls' worked fine.
>
> regards,
> On Fri, Jan 23, 2009 at 6:22 AM, Anand Avati <address@hidden> wrote:
>>
>> Nicolas,
>> Are you running any specific apps on the mountpoint? Or is it just
>> regular ls/cd kind of commands?
>>
>> Raghu,
>> Can you try to reproduce this in our lab?
>>
>> Thanks,
>> Avati
>>
>> On Wed, Jan 21, 2009 at 9:22 PM, nicolas prochazka <address@hidden> wrote:
>> > Hello,
>> > I think I have localized the problem more precisely:
>> >
>> > volume last
>> > type cluster/replicate
>> > subvolumes brick_10.98.98.1 brick_10.98.98.2
>> > end-volume
>> >
>> > If I shut down 10.98.98.2, 10.98.98.1 is OK after the timeout.
>> > If I shut down 10.98.98.1, 10.98.98.2 is not OK after the timeout; it only
>> > becomes ready again when 10.98.98.1 comes back.
>> >
>> > Now, if I change the order to: subvolumes brick_10.98.98.2 brick_10.98.98.1
>> > the situation is reversed.
>> >
>> > In the AFR documentation you say that, by default, AFR considers the first
>> > subvolume as the sole lock server.
>> > Perhaps the bug comes from there: when the lock server is down, the other
>> > clients do not work?
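>> >
>> > For illustration, a minimal sketch of the replicate volume with more than
>> > one lock server, so that losing a single brick would not take away the only
>> > lock server. The *-lock-server-count option names are taken from the
>> > 2.0-era AFR documentation and are an assumption here, not something
>> > verified in this thread:
>> >
>> > volume last
>> >   type cluster/replicate
>> >   subvolumes brick_10.98.98.1 brick_10.98.98.2
>> >   # assumed option names: hold locks on both subvolumes, not only the first
>> >   option data-lock-server-count 2
>> >   option entry-lock-server-count 2
>> >   option metadata-lock-server-count 2
>> > end-volume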
>> >
>> > Regards,
>> > Nicolas Prochazka
>> >
>> >
>> > 2009/1/19 nicolas prochazka <address@hidden>
>> >>
>> >> It is on a private network.
>> >> I'm going to try to simulate this issue in a virtual QEMU environment
>> >> and will get back to you.
>> >> Thanks a lot for your great work.
>> >> Nicolas
>> >>
>> >> 2009/1/19 Anand Avati <address@hidden>
>> >>>
>> >>> nicolas,
>> >>> It is hard for us to debug with such a brief description. Would it be
>> >>> possible for us to inspect the system over a remote login while the
>> >>> error is occurring?
>> >>>
>> >>> avati
>> >>>
>> >>> On Mon, Jan 19, 2009 at 8:32 PM, nicolas prochazka <address@hidden> wrote:
>> >>> > Hi again,
>> >>> > With tla855, if I change the network card's IP, the 'ls' test now works
>> >>> > after the timeout, so that is big progress.
>> >>> > But if I stop the server with a hard powerdown (switching it off, like a
>> >>> > crash), the problem persists. I do not understand the difference between
>> >>> > a network cut and a powerdown.
>> >>> >
>> >>> > Regards,
>> >>> > Nicolas Prochazka
>> >>> >
>> >>> > 2009/1/19 nicolas prochazka <address@hidden>
>> >>> >>
>> >>> >> Hi,
>> >>> >> Do you have more information about this bug?
>> >>> >> I do not understand how AFR works. With my initial configuration, if I
>> >>> >> change the IP of the network card on server B (from 10.98.98.2 =>
>> >>> >> 10.98.98.4) during the test, 'ls' works on the client and the servers
>> >>> >> (A, C) after some timeout. But some programs seem to block the whole
>> >>> >> system: if I run my own program or qemu, for example, 'ls' no longer
>> >>> >> responds, and if I change the IP back (from 10.98.98.4 => 10.98.98.2)
>> >>> >> everything becomes OK again.
>> >>> >>
>> >>> >> Regards,
>> >>> >> Nicolas Prochazka
>> >>> >>
>> >>> >>
>> >>> >> 2009/1/14 Krishna Srinivas <address@hidden>
>> >>> >>>
>> >>> >>> Nicolas,
>> >>> >>>
>> >>> >>> It might be a bug. Let me try to reproduce the problem here and get
>> >>> >>> back to you.
>> >>> >>>
>> >>> >>> Krishna
>> >>> >>>
>> >>> >>> On Wed, Jan 14, 2009 at 6:59 PM, nicolas prochazka <address@hidden> wrote:
>> >>> >>> > Hello again,
>> >>> >>> > To finish with this issue, here is the information I can send you:
>> >>> >>> > if I stop glusterfsd (on server B) before stopping that server
>> >>> >>> > (hard poweroff by pressing the on/off switch), the problem does not
>> >>> >>> > occur. If I hard-poweroff without stopping gluster (a real crash),
>> >>> >>> > the problem does occur.
>> >>> >>> > Regards
>> >>> >>> > Nicolas Prochazka.
>> >>> >>> >
>> >>> >>> > 2009/1/14 nicolas prochazka <address@hidden>
>> >>> >>> >>
>> >>> >>> >> Hi again,
>> >>> >>> >> I am continuing my tests. In my case, if a file is open on the
>> >>> >>> >> gluster mount when one AFR server is stopped, the gluster mount can
>> >>> >>> >> no longer be accessed (it hangs?) on that client. Any other client
>> >>> >>> >> (C, for example) that did not have a file open during the stop is
>> >>> >>> >> not affected; there I can do an 'ls' or open files once the
>> >>> >>> >> transport timeout has passed.
>> >>> >>> >> If I kill the process that is using the file, then I can use the
>> >>> >>> >> gluster mount point without problem.
>> >>> >>> >>
>> >>> >>> >> Regards,
>> >>> >>> >> Nicolas Prochazka.
>> >>> >>> >>
>> >>> >>> >> 2009/1/12 nicolas prochazka <address@hidden>
>> >>> >>> >>>
>> >>> >>> >>> For your attention: it seems this problem occurs only when files
>> >>> >>> >>> are open and in use on the gluster mount point.
>> >>> >>> >>> I use big computation files (~10G) that are mostly read; in that
>> >>> >>> >>> case the problem occurs.
>> >>> >>> >>> If I use only small files that are created from time to time, no
>> >>> >>> >>> problem occurs and the gluster mount can use the other AFR server.
>> >>> >>> >>>
>> >>> >>> >>> Regards,
>> >>> >>> >>> Nicolas Prochazka
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>>
>> >>> >>> >>> 2009/1/12 nicolas prochazka <address@hidden>
>> >>> >>> >>>>
>> >>> >>> >>>> Hi,
>> >>> >>> >>>> I have tried setting
>> >>> >>> >>>>   option transport-timeout 5
>> >>> >>> >>>> in protocol/client, so a maximum of 10 seconds before gluster
>> >>> >>> >>>> returns to a normal situation?
>> >>> >>> >>>> No success; I am still in the same situation: an 'ls /mnt/gluster'
>> >>> >>> >>>> does not respond even after more than 10 minutes, and I cannot use
>> >>> >>> >>>> the gluster mount again except by killing the glusterfs process.
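>> >>> >>> >>>>
>> >>> >>> >>>> For reference, a minimal sketch of where that timeout goes in the
>> >>> >>> >>>> client vol file quoted further down in this thread, assuming the
>> >>> >>> >>>> standard protocol/client transport-timeout option (value in
>> >>> >>> >>>> seconds):
>> >>> >>> >>>>
>> >>> >>> >>>> volume brick_10.98.98.1
>> >>> >>> >>>>   type protocol/client
>> >>> >>> >>>>   option transport-type tcp/client
>> >>> >>> >>>>   option remote-host 10.98.98.1
>> >>> >>> >>>>   option remote-subvolume brick
>> >>> >>> >>>>   # assumed: give up on an unresponsive server after 5 seconds,
>> >>> >>> >>>>   # so AFR should unblock after at most ~2 x 5 = 10 seconds
>> >>> >>> >>>>   option transport-timeout 5
>> >>> >>> >>>> end-volume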
>> >>> >>> >>>>
>> >>> >>> >>>> Regards
>> >>> >>> >>>> Nicolas Prochazka
>> >>> >>> >>>>
>> >>> >>> >>>>
>> >>> >>> >>>>
>> >>> >>> >>>> 2009/1/12 Raghavendra G <address@hidden>
>> >>> >>> >>>>>
>> >>> >>> >>>>> Hi Nicolas,
>> >>> >>> >>>>>
>> >>> >>> >>>>> How much time did you wait before concluding that the mount
>> >>> >>> >>>>> point was not working? afr waits for a maximum of
>> >>> >>> >>>>> (2 * transport-timeout) seconds before sending a reply back to
>> >>> >>> >>>>> the application. Can you wait for some time and check whether
>> >>> >>> >>>>> this is the issue you are facing?
>> >>> >>> >>>>>
>> >>> >>> >>>>> regards,
>> >>> >>> >>>>>
>> >>> >>> >>>>> On Mon, Jan 12, 2009 at 7:49 PM, nicolas prochazka <address@hidden> wrote:
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> Hi.
>> >>> >>> >>>>>> I've set up this model to test Gluster:
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> + 2 servers (A, B)
>> >>> >>> >>>>>>     - with glusterfsd server (glusterfs--mainline--3.0--patch-842)
>> >>> >>> >>>>>>     - with glusterfs client
>> >>> >>> >>>>>>   (server conf file below)
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> + 1 server C in client mode only.
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> My issue:
>> >>> >>> >>>>>> If C opens a big file with this client configuration and I then
>> >>> >>> >>>>>> stop server A (or B), the gluster mount point on server C seems
>> >>> >>> >>>>>> to block; I cannot do an 'ls -l', for example.
>> >>> >>> >>>>>> Is this normal? Since C opened its file on A or B, does it block
>> >>> >>> >>>>>> when that server goes down?
>> >>> >>> >>>>>> I thought that with client-side AFR the client could reopen the
>> >>> >>> >>>>>> file/block on the other server; am I wrong?
>> >>> >>> >>>>>> Should I use the HA translator?
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> Regards,
>> >>> >>> >>>>>> Nicolas Prochazka.
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> volume brickless
>> >>> >>> >>>>>> type storage/posix
>> >>> >>> >>>>>> option directory /mnt/disks/export
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> volume brick
>> >>> >>> >>>>>> type features/posix-locks
>> >>> >>> >>>>>> option mandatory on   # enables mandatory locking on all files
>> >>> >>> >>>>>> subvolumes brickless
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> volume server
>> >>> >>> >>>>>> type protocol/server
>> >>> >>> >>>>>> subvolumes brick
>> >>> >>> >>>>>> option transport-type tcp
>> >>> >>> >>>>>> option auth.addr.brick.allow 10.98.98.*
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>> ---------------------------
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> client config
>> >>> >>> >>>>>> volume brick_10.98.98.1
>> >>> >>> >>>>>> type protocol/client
>> >>> >>> >>>>>> option transport-type tcp/client
>> >>> >>> >>>>>> option remote-host 10.98.98.1
>> >>> >>> >>>>>> option remote-subvolume brick
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> volume brick_10.98.98.2
>> >>> >>> >>>>>> type protocol/client
>> >>> >>> >>>>>> option transport-type tcp/client
>> >>> >>> >>>>>> option remote-host 10.98.98.2
>> >>> >>> >>>>>> option remote-subvolume brick
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> volume last
>> >>> >>> >>>>>> type cluster/replicate
>> >>> >>> >>>>>> subvolumes brick_10.98.98.1 brick_10.98.98.2
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> volume iothreads
>> >>> >>> >>>>>> type performance/io-threads
>> >>> >>> >>>>>> option thread-count 2
>> >>> >>> >>>>>> option cache-size 32MB
>> >>> >>> >>>>>> subvolumes last
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> volume io-cache
>> >>> >>> >>>>>> type performance/io-cache
>> >>> >>> >>>>>> option cache-size 1024MB # default is 32MB
>> >>> >>> >>>>>> option page-size 1MB #128KB is default option
>> >>> >>> >>>>>> option force-revalidate-timeout 2 # default is 1
>> >>> >>> >>>>>> subvolumes iothreads
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> volume writebehind
>> >>> >>> >>>>>> type performance/write-behind
>> >>> >>> >>>>>> option aggregate-size 256KB # default is 0bytes
>> >>> >>> >>>>>> option window-size 3MB
>> >>> >>> >>>>>> option flush-behind on # default is 'off'
>> >>> >>> >>>>>> subvolumes io-cache
>> >>> >>> >>>>>> end-volume
>> >>> >>> >>>>>>
>> >>> >>> >>>>>>
>> >>> >>> >>>>>> _______________________________________________
>> >>> >>> >>>>>> Gluster-devel mailing list
>> >>> >>> >>>>>> address@hidden
>> >>> >>> >>>>>> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>> >>> >>> >>>>>
>> >>> >>> >>>>>
>> >>> >>> >>>>>
>> >>> >>> >>>>> --
>> >>> >>> >>>>> Raghavendra G
>> >>> >>> >>>>>
>> >>> >>> >>>>
>> >>> >>> >>>
>> >>> >>> >>
>> >>> >>> >
>> >>> >>> >
>> >>> >>
>> >>> >
>> >>> >
>> >>
>> >
>> >
>
>
>
> --
> Raghavendra G
>
>