Re: [Gluster-devel] The return of the all-null pending matrix
From: Emmanuel Dreyfus
Subject: Re: [Gluster-devel] The return of the all-null pending matrix
Date: Tue, 23 Jul 2013 02:19:34 +0200
User-agent: MacSOUP/2.7 (unregistered for 2376 days)
Vijay Bellur <address@hidden> wrote:
> I have not been able to re-create the problem in my setup. I think it
> would be a good idea to track this bug and address it. For now, can we
> not use the volume set mechanism to disable eager-locking?
Our exchanges went off-list after this message. I am reposting here
the last 100k lines of the log, captured in debug mode:
http://ftp.espci.fr/shadow/manu/log
Relevant part:
[2013-07-22 15:36:22.923866] D [afr-lk-common.c:447:transaction_lk_op]
0-gfs34-replicate-0: lk op is for a transaction
[2013-07-22 15:36:22.924484] D [client-rpc-fops.c:2789:client_fdctx_destroy]
0-gfs34-client-0: sending release on fd
[2013-07-22 15:36:22.924560] D [client-rpc-fops.c:2789:client_fdctx_destroy]
0-gfs34-client-1: sending release on fd
[2013-07-22 15:36:22.943156] D
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1:
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:22.943202] D
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1:
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:22.943236] D [afr-self-heal-common.c:887:afr_mark_sources]
0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:22.943271] D
[afr-self-heal-data.c:794:afr_lookup_select_read_child_by_txn_type]
0-gfs34-replicate-1: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po:
Possible split-brain
[2013-07-22 15:36:22.943305] D
[afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type]
0-gfs34-replicate-1: returning read_child: 1
[2013-07-22 15:36:22.943336] D [afr-common.c:1380:afr_lookup_select_read_child]
0-gfs34-replicate-1: Source selected as 1 for
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:22.943374] D
[afr-common.c:1117:afr_lookup_build_response_params] 0-gfs34-replicate-1:
Building lookup response from 1
[2013-07-22 15:36:22.943409] D [afr-common.c:1265:afr_detect_self_heal_by_iatt]
0-gfs34-replicate-1: size differs for
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:22.943444] D
[afr-common.c:1291:afr_detect_self_heal_by_split_brain_status]
0-gfs34-replicate-1: split brain detected during lookup of
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po.
[2013-07-22 15:36:22.943478] D [afr-common.c:1426:afr_launch_self_heal]
0-gfs34-replicate-1: background data self-heal triggered. path:
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po, reason:
lookup detected pending operations
[2013-07-22 15:36:23.272807] D
[afr-self-heal-metadata.c:486:afr_sh_metadata_post_nonblocking_inodelk_cbk]
0-gfs34-replicate-1: Non Blocking metadata inodelks done for
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po. Proceeding to FOP
[2013-07-22 15:36:23.272868] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool
is full. Callocing mem
[2013-07-22 15:36:23.272900] D
[afr-self-heal-common.c:1930:afr_sh_common_lookup] 0-gfs34-replicate-1: looking
up /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po on subvolume gfs34-client-2
[2013-07-22 15:36:23.272986] D
[afr-self-heal-common.c:1930:afr_sh_common_lookup] 0-gfs34-replicate-1: looking
up /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po on subvolume gfs34-client-3
[2013-07-22 15:36:23.273596] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool
is full. Callocing mem
[2013-07-22 15:36:23.273752] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool
is full. Callocing mem
[2013-07-22 15:36:23.273792] D
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1:
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.273829] D
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1:
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.273862] D [afr-self-heal-common.c:887:afr_mark_sources]
0-gfs34-replicate-1: Number of sources: 2
[2013-07-22 15:36:23.273895] D [afr-lk-common.c:452:transaction_lk_op]
0-gfs34-replicate-1: lk op is for a self heal
[2013-07-22 15:36:23.276705] D
[afr-self-heal-metadata.c:61:afr_sh_metadata_done] 0-gfs34-replicate-1:
proceeding to data check on /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.278390] D
[afr-self-heal-data.c:1158:afr_sh_data_post_nonblocking_inodelk_cbk]
0-gfs34-replicate-1: Non Blocking data inodelks done for
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po by 5c3e47ba. Proceeding to
self-heal
[2013-07-22 15:36:23.278520] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool
is full. Callocing mem
[2013-07-22 15:36:23.278540] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool
is full. Callocing mem
[2013-07-22 15:36:23.280422] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool
is full. Callocing mem
[2013-07-22 15:36:23.281824] D [mem-pool.c:422:mem_get] 0-mem-pool: Mem pool
is full. Callocing mem
[2013-07-22 15:36:23.282746] D
[afr-self-heal-data.c:686:afr_sh_data_fxattrop_fstat_done] 0-gfs34-replicate-1:
Pending matrix for: 5c3e47ba
[2013-07-22 15:36:23.282798] D
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1:
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.282831] D
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1:
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.282862] D [afr-self-heal-common.c:887:afr_mark_sources]
0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.282897] E
[afr-self-heal-common.c:197:afr_sh_print_split_brain_log] 0-gfs34-replicate-1:
Unable to self-heal contents of
'/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po'
(possible split-brain). Please delete the file from all but the preferred
subvolume.- Pending matrix: [ [ 0 0 ] [ 0 0 ] ]
[2013-07-22 15:36:23.282931] D [afr-self-heal-data.c:336:afr_sh_data_fail]
0-gfs34-replicate-1: finishing failed data selfheal of
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.282962] D [afr-lk-common.c:452:transaction_lk_op]
0-gfs34-replicate-1: lk op is for a self heal
[2013-07-22 15:36:23.283575] E
[afr-self-heal-common.c:2212:afr_self_heal_completion_cbk] 0-gfs34-replicate-1:
background data self-heal failed on
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.283636] D
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1:
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.283669] D
[afr-self-heal-common.c:138:afr_sh_print_pending_matrix] 0-gfs34-replicate-1:
pending_matrix: [ 0 0 ]
[2013-07-22 15:36:23.283700] D [afr-self-heal-common.c:887:afr_mark_sources]
0-gfs34-replicate-1: Number of sources: -1
[2013-07-22 15:36:23.283730] D
[afr-self-heal-data.c:794:afr_lookup_select_read_child_by_txn_type]
0-gfs34-replicate-1: /manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po:
Possible split-brain
[2013-07-22 15:36:23.283763] D
[afr-self-heal-data.c:825:afr_lookup_select_read_child_by_txn_type]
0-gfs34-replicate-1: returning read_child: 1
[2013-07-22 15:36:23.283794] D [afr-common.c:1380:afr_lookup_select_read_child]
0-gfs34-replicate-1: Source selected as 1 for
/manu/netbsd/usr/src/lib/libterminfo/obj/tparm.po
[2013-07-22 15:36:23.283828] D
[afr-common.c:1117:afr_lookup_build_response_params] 0-gfs34-replicate-1:
Building lookup response from 1
[2013-07-22 15:36:23.284755] W [afr-open.c:213:afr_open] 0-gfs34-replicate-1:
failed to open as split brain seen, returning EIO
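
With a log of this size, a small script can pick out the files that are reported as split-brain even though every pending counter is zero. The following is only a minimal sketch in Python. It assumes the log format shown above (records starting with a bracketed timestamp, possibly wrapped over several lines) and the wording of the afr_sh_print_split_brain_log message. The function names join_records and all_null_split_brains are hypothetical helpers, not part of GlusterFS.

```python
import re

# A record starts with a bracketed timestamp such as [2013-07-22 15:36:23.282897].
RECORD_START = re.compile(r"^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\]")

# Assumed wording, copied from the error message in the excerpt above.
SPLIT_BRAIN = re.compile(
    r"Unable to self-heal contents of\s+'(?P<path>[^']+)'.*?"
    r"Pending matrix:\s*(?P<matrix>\[.*\])"
)


def join_records(lines):
    """Re-join log records that were wrapped over several lines."""
    record = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) and record:
            yield " ".join(record)
            record = []
        record.append(line)
    if record:
        yield " ".join(record)


def all_null_split_brains(lines):
    """Return paths reported as split-brain while every pending count is 0."""
    paths = []
    for record in join_records(lines):
        match = SPLIT_BRAIN.search(record)
        if match is None:
            continue
        counts = [int(n) for n in re.findall(r"-?\d+", match.group("matrix"))]
        if counts and all(c == 0 for c in counts):
            paths.append(match.group("path"))
    return paths


if __name__ == "__main__":
    import sys

    # Usage: python find_all_null.py < log
    for path in all_null_split_brains(sys.stdin):
        print(path)
```

Run against the excerpt above, this should list tparm.po, since the reported matrix is [ [ 0 0 ] [ 0 0 ] ]. A split-brain with non-zero pending counts would be skipped.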
--
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
address@hidden