Re: [Gluster-devel] gluster doesn't like Oracle's FSINFO RPC call


From: Niels de Vos
Subject: Re: [Gluster-devel] gluster doesn't like Oracle's FSINFO RPC call
Date: Sat, 13 Apr 2013 00:50:33 +0200
User-agent: Mutt/1.5.20 (2009-12-10)

On Fri, Apr 12, 2013 at 03:58:04PM -0400, Michael Brown wrote:
> KERBOOM
> 
> address@hidden ~]$ sudo mount -a -t nfs
> [sudo] password for michael:
> mount: fearless1:/gv0 failed, reason given by server: No such file or
> directory
> mount: fearless1:/gv0/fleming1/db0/ALTUS_config failed, reason given by
> server: unknown nfs status return value: 22
> mount: fearless1:/gv0/fleming1/db0/ALTUS_data failed, reason given by
> server: unknown nfs status return value: 22
> mount: fearless1:/gv0/fleming1/db0/ALTUS_flash failed, reason given by
> server: unknown nfs status return value: 22
> mount.nfs: mount point /db/flash_recovery_area/ALTUS/onlinelog does not
> exist
> 
> nfs.log:
> [2013-04-12 15:55:16.507084] E [nfs3.c:305:__nfs3_get_volume_id]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo+0x22c)
> [0x7f45bfbb852c]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo_reply+0x29)
> [0x7f45bfbb2ce9]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51)
> [0x7f45bfbb2481]))) 0-nfs-nfsv3: invalid argument: xl
> [2013-04-12 15:55:16.538560] E [nfs3.c:4706:nfs3_fsinfo] 0-nfs-nfsv3:
> Bad Handle
> [2013-04-12 15:55:16.538580] W [nfs3-helpers.c:3389:nfs3_log_common_res]
> 0-nfs-nfsv3: XID: 242c1550, FSINFO: NFS: 10001(Illegal NFS file handle),
> POSIX: 14(Bad address)
> [2013-04-12 15:55:16.538617] E [nfs3.c:305:__nfs3_get_volume_id]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo+0x22c)
> [0x7f45bfbb852c]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_fsinfo_reply+0x29)
> [0x7f45bfbb2ce9]
> (-->/usr/lib64/glusterfs/3.3.1/xlator/nfs/server.so(nfs3_request_xlator_deviceid+0x51)
> [0x7f45bfbb2481]))) 0-nfs-nfsv3: invalid argument: xl
> 
> (I tried both with and without modifying your uint32_t size to a
> 'int32_t size' to correct the signedness of the argument)
> 
> Get ahold of me in IRC and let's get this figured out. I've got a
> debugger attached.

23:51 < ndevos> Supermathie: ah, I've thought of the error in my 
   suggestion - that function is used to encode and decode
23:52 < ndevos> which means that the size parameter must be set 
   correctly - the .data_len attribute contains the size when encoding, 
   and should be overwritten when decoding
23:53 < ndevos> KERBOOM happens when an idea is only half looked at :-/

Maybe something like the attached patch works better? It should encode/decode 
both the length and the fhandle value. Compile tested only.
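
For readers without the attachment, here is a minimal stand-alone sketch of
the idea. It is not the attached patch and not the glusterfs or Sun RPC code:
it only shows one routine serving both directions, so that the length is an
input when encoding and is overwritten when decoding. The names stream, fh,
codec_u32, codec_fh and fh_roundtrip_demo are made up for illustration. The
wire layout (32-bit big-endian length, then the bytes padded to a multiple of
four, at most 64 bytes for an NFSv3 handle) follows the XDR variable-length
opaque encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FH_MAX 64               /* same value as NFS3_FHSIZE */

enum op { OP_ENCODE, OP_DECODE };

/* Hypothetical stand-in for an XDR stream: one buffer, one cursor. */
struct stream {
    enum op op;
    unsigned char *buf;
    size_t pos;
    size_t cap;
};

/* Hypothetical stand-in for nfs_fh3: a length plus the opaque bytes. */
struct fh {
    uint32_t len;
    unsigned char val[FH_MAX];
};

/* Encode or decode one big-endian 32-bit word, advancing the cursor. */
static int codec_u32(struct stream *s, uint32_t *v)
{
    assert(s->pos <= s->cap);
    if (s->cap - s->pos < 4)
        return 0;
    unsigned char *p = s->buf + s->pos;
    if (s->op == OP_ENCODE) {
        p[0] = (unsigned char)(*v >> 24);
        p[1] = (unsigned char)(*v >> 16);
        p[2] = (unsigned char)(*v >> 8);
        p[3] = (unsigned char)*v;
    } else {
        *v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
             ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    }
    s->pos += 4;
    return 1;
}

/* Length word first, then the bytes, zero-padded to a multiple of four. */
static int codec_fh(struct stream *s, struct fh *fh)
{
    /* Encoding: fh->len is the input. Decoding: fh->len is overwritten. */
    if (!codec_u32(s, &fh->len) || fh->len > FH_MAX)
        return 0;
    size_t padded = ((size_t)fh->len + 3) & ~(size_t)3;
    if (s->cap - s->pos < padded)
        return 0;
    if (s->op == OP_ENCODE) {
        memset(s->buf + s->pos, 0, padded);
        memcpy(s->buf + s->pos, fh->val, fh->len);
    } else {
        memcpy(fh->val, s->buf + s->pos, fh->len);
    }
    s->pos += padded;
    return 1;
}

/* Encode a 5-byte handle, decode it again, and return the number of
 * wire bytes used (0 on any failure or mismatch). */
size_t fh_roundtrip_demo(void)
{
    unsigned char wire[128];
    struct fh out = { 5, { 0xde, 0xad, 0xbe, 0xef, 0x01 } };
    struct fh in = { 0, { 0 } };
    struct stream enc = { OP_ENCODE, wire, 0, sizeof wire };
    struct stream dec = { OP_DECODE, wire, 0, sizeof wire };

    if (!codec_fh(&enc, &out) || !codec_fh(&dec, &in))
        return 0;
    if (in.len != out.len || memcmp(in.val, out.val, out.len) != 0)
        return 0;
    return enc.pos;
}
```

Calling fh_roundtrip_demo() from a main() should report 12 wire bytes for the
5-byte handle: 4 for the length word plus 5 data bytes padded up to 8.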

Niels

> 
> M.
> 
> On 13-04-12 11:32 AM, Niels de Vos wrote:
> > On Fri, Apr 12, 2013 at 05:23:08PM +0200, Niels de Vos wrote:
> >> On Thu, Apr 11, 2013 at 12:37:30PM -0400, Michael Brown wrote:
> >>> That actually broke everything (including Linux trying to mount NFS).
> >>>
> >>> I've modified it slightly to be:
> >>>
> >>> bool_t
> >>> xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
> >>> {
> >>>         if (!xdr_bytes (xdrs, (char **)&objp->data.data_val, (u_int *)
> >>> &objp->data.data_len, NFS3_FHSIZE))
> >>>                 if (!xdr_opaque (xdrs, &objp, (u_int *)
> >>> &objp->data.data_len))
> >>>                         return FALSE;
> >>>         return TRUE;
> >>> }
> >>>
> >>> (i.e. only call the xdr_opaque function if the xdr_bytes decode fails)
> >> Nah, that won't work. The xdr_* functions are modifying the position of 
> >> the cursor in the XDR-stream. Subsequent reads will continue where the 
> >> previous one finished.
> >>
> >> What you probably need to do is something like this:
> >>
> >> xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
> >> {
> >>    uint32_t size;
> >>
> >>    if (!xdr_int (xdrs, &size))
> >>            if (!xdr_opaque (xdrs, (u_int *)&objp->data.data_len, size))
> > ^ that should be objp->data.data_val of course :-/
> >
> >>                    return FALSE;
> >>    return TRUE;
> >> }
> >>
> >> That will read the size of the fhandle first, to determine how long the 
> >> opaque 
> >> fhandle is, and use that size to read it.
> >>
> >> Cheers,
> >> Niels
> >>
> >>> But I get no change in behaviour.
> >>>
> >>> Also get these warnings:
> >>>
> >>> xdr-nfs3.c: In function 'xdr_nfs_fh3':
> >>> xdr-nfs3.c:197: warning: passing argument 2 of 'xdr_opaque' from
> >>> incompatible pointer type
> >>> /usr/include/rpc/xdr.h:313: note: expected 'caddr_t' but argument is of
> >>> type 'struct nfs_fh3 **'
> >>> xdr-nfs3.c:197: warning: passing argument 3 of 'xdr_opaque' makes
> >>> integer from pointer without a cast
> >>> /usr/include/rpc/xdr.h:313: note: expected 'u_int' but argument is of
> >>> type 'u_int *'
> >>>
> >>> M.
> >>>
> >>> On 13-04-11 07:42 AM, Niels de Vos wrote:
> >>>> My guess is that this (untested) change would fix it, can you try that?
> >>>>
> >>>> --- a/rpc/xdr/src/xdr-nfs3.c
> >>>> +++ b/rpc/xdr/src/xdr-nfs3.c
> >>>> @@ -184,7 +184,7 @@ xdr_specdata3 (XDR *xdrs, specdata3 *objp)
> >>>>  bool_t
> >>>>  xdr_nfs_fh3 (XDR *xdrs, nfs_fh3 *objp)
> >>>>  {
> >>>> -         if (!xdr_bytes (xdrs, (char **)&objp->data.data_val, (u_int *) 
> >>>> &objp->data.data_len, NFS3_FHSIZE))
> >>>> +         if (!xdr_opaque (xdrs, &objp, (u_int *) &objp->data.data_len))
> >>>>                   return FALSE;
> >>>>          return TRUE;
> >>>>  }
> >>>>
> >>>>
> >>>> HTH,
> >>>> Niels
> >>>>
> >>>>> All I get out of gluster is:
> >>>>> [2013-04-08 12:54:32.206312] E [nfs3.c:4741:nfs3svc_fsinfo] 0-nfs-nfsv3:
> >>>>> Error decoding arguments
> >>>>>
> >>>>>
> >>>>> I've attached abridged packet captures and text explanations of the
> >>>>> packets (thanks to wireshark).
> >>>>>
> >>>>> Can someone please look at this and determine if it's gluster's parsing
> >>>>> of the RPC call to blame, or if it's Oracle?
> >>>>>
> >>>>> This is the same setup on which I reported the NFS race condition bug.
> >>>>> It does have that patch applied.
> >>>>> Details:
> >>>>> http://lists.gnu.org/archive/html/gluster-devel/2013-04/msg00014.html
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Michael
> >>>>>
> >>>>> -- 
> >>>>> Michael Brown               | `One of the main causes of the fall of
> >>>>> Systems Consultant          | the Roman Empire was that, lacking zero,
> >>>>> Net Direct Inc.             | they had no way to indicate successful
> >>>>> ☎: +1 519 883 1172 x5106    | termination of their C programs.' - Firth
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>> _______________________________________________
> >>>>> Gluster-devel mailing list
> >>>>> address@hidden
> >>>>> https://lists.nongnu.org/mailman/listinfo/gluster-devel
> >>>
> >>>
> >> -- 
> >> Niels de Vos
> >> Sr. Software Maintenance Engineer
> >> Support Engineering Group
> >> Red Hat Global Support Services
> >>
> 
> 
> 

-- 
Niels de Vos
Sr. Software Maintenance Engineer
Support Engineering Group
Red Hat Global Support Services

Attachment: 0001-nfs-encode-decode-fhandles-as-opaque-and-not-as-byte.patch
Description: Text document
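
On the point quoted above, that the xdr_* functions move the cursor and the
next read continues where the previous one stopped: the stand-alone sketch
below shows why "try one decoder, fall back to another" misreads the stream
unless the position is restored first. It does not use the real XDR library;
cursor, read_u32, try_strict and fallback_sees are hypothetical names, and the
wire bytes are invented. The real library has xdr_getpos() and xdr_setpos()
for saving and restoring a position, though whether a given stream type
supports them would need checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read-only stream: a buffer and a cursor into it. */
struct cursor {
    const unsigned char *buf;
    size_t pos;
    size_t len;
};

/* Read one big-endian 32-bit word; the cursor advances on success. */
static int read_u32(struct cursor *c, uint32_t *v)
{
    assert(c->pos <= c->len);
    if (c->len - c->pos < 4)
        return 0;
    const unsigned char *p = c->buf + c->pos;
    *v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
         ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    c->pos += 4;
    return 1;
}

/* First attempt: read a length word and reject it if it exceeds max.
 * Note that it can fail after the cursor has already moved. */
static int try_strict(struct cursor *c, uint32_t max, uint32_t *len)
{
    return read_u32(c, len) && *len <= max;
}

/* Returns the first word the fallback decoder sees after the strict
 * attempt fails, with or without rewinding the cursor in between. */
uint32_t fallback_sees(int rewind)
{
    static const unsigned char wire[] = {
        0x00, 0x00, 0x00, 0x08,     /* length word: 8 */
        0xaa, 0xbb, 0xcc, 0xdd,     /* first half of the payload */
        0x01, 0x02, 0x03, 0x04      /* second half of the payload */
    };
    struct cursor c = { wire, 0, sizeof wire };
    uint32_t len = 0, word = 0;
    size_t saved = c.pos;

    if (!try_strict(&c, 4, &len)) { /* fails: 8 > 4, yet 4 bytes are gone */
        if (rewind)
            c.pos = saved;          /* restore before trying again */
        read_u32(&c, &word);
    }
    return word;
}
```

Without the rewind, the fallback starts four bytes in and takes payload bytes
for the length word; with it, the fallback sees the real length.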

