gluster-devel

Re: [Gluster-devel] Patch for "Striped" read from AFR volumes


From: Krishna Srinivas
Subject: Re: [Gluster-devel] Patch for "Striped" read from AFR volumes
Date: Tue, 1 Jan 2008 00:50:24 +0530

No. Even for large files, associating each file with a single node is
conceptually better, because we need to take advantage of the kernel's
read-ahead algorithm.

This will fail if the majority of the opened files get scheduled to the
same node (since we decide the read child based on the inode number).
But by the "law of averages" this is fine and will work quite well.

Thanks
Krishna

On Jan 1, 2008 12:21 AM, Gareth Bult <address@hidden> wrote:
> Hi,
>
> If I understand what you're saying, you effectively want to tie a specific 
> file to a specific node?
>
> The proposed patch (at least in concept) might work well for large files,
> but if you tie files to nodes, a large file would not gain any of the
> benefit of these striped reads...
>
> ???
>
>
>
> ----- Original Message -----
> From: "Krishna Srinivas" <address@hidden>
> To: "Csibra Gergo" <address@hidden>
> Cc: address@hidden
> Sent: Monday, December 31, 2007 6:14:24 PM (GMT) Europe/London
> Subject: Re: [Gluster-devel] Patch for "Striped" read from AFR volumes
>
> Hi Csibra,
>
> The patch contribution is really appreciated. I did not verify the
> correctness of the code, but I can see that you are doing round-robin (RR)
> of readv() calls. However, making reads round-robin will (theoretically)
> decrease performance, as we won't be taking advantage of the kernel's
> read-ahead algorithm. The better approach is to always read a given file
> from the same child (even on the next open) but to read different files
> from different children. A good way of deciding which child to read from
> is (inode_number % child_count); this change is in the TLA repository.
> Could you test how your patch performs against the TLA source?
>
> Check doc/translator-option.txt for the AFR options (option read-subvolume).
>
> A better way to define striped reads would be: if a read request comes in
> for 1 MB, get 0.5 MB from the first child and 0.5 MB from the second child,
> and combine the results. However, with this approach we are not sure about
> the performance gain either.
>
> Thanks
> Krishna
>
> On Dec 31, 2007 9:44 PM, Csibra Gergo <address@hidden> wrote:
> > Hi,
> >
> > Apply the following patch to read AFR volumes like RAID0 volumes. The
> > current implementation of AFR reads every block from the first child
> > if it is available. This simple patch cycles through all available
> > children instead: each afr_readv call reads from the child after the
> > one used by the previous call. So if you have 4 children, the first
> > block is read from the 1st, the next from the 2nd, then the 3rd, then
> > the 4th, and then it starts again from the 1st.
> >
> > To apply this patch:
> > cd xlators/cluster/afr/src
> > patch -p0 <afr_striped_read_1.3.7.diff
> > make
> > make install
> >
> > The patch is also available here:
> > http://www.csibra.hu/glusterfs/afr_striped_read_1.3.7.diff
> >
> > As you can see, this patch is against version 1.3.7.
> >
> > Here's the patch:
> > >>>>CUT HERE<<<<
> > *** /root/afr.c 2007-10-17 17:40:37.000000000 +0200
> > --- afr.c       2007-12-31 16:51:38.000000000 +0100
> > ***************
> > *** 2448,2453 ****
> > --- 2448,2469 ----
> >         if (afrfdp->fdstate[i])
> >           break;
> >         }
> > +       if(i == pvt->child_count) {
> > +         // if we reached the last child, check whether an unread child remains
> > +         data_t *fr = dict_get(local->fd->ctx, "first_read");
> > +       if(fr) {
> > +         int32_t frd = data_to_int32(fr);
> > +         // frd holds the index of the first child that was read
> > +         if(frd > 0) {
> > +           // if the first child read was not the first physical child, restart the child search
> > +           i = 0;
> > +           for (; i < pvt->child_count; i++) {
> > +             if (afrfdp->fdstate[i])
> > +               break;
> > +           }
> > +         }
> > +       }
> > +       }
> >         if (i < pvt->child_count) {
> >                 STACK_WIND (frame,
> >                     afr_readv_cbk,
> > ***************
> > *** 2492,2501 ****
> >     local->size = size;
> >     local->fd = fd;
> >
> > !   for (i = 0; i < child_count; i++) {
> >       if (afrfdp->fdstate[i] && pvt->state[i])
> >         break;
> >     }
> >     if (i == child_count) {
> >       STACK_UNWIND (frame, -1, ENOTCONN, NULL, 0, NULL);
> >     } else {
> > --- 2508,2548 ----
> >     local->size = size;
> >     local->fd = fd;
> >
> > !   int32_t next_child, first_read = 0;
> > !   data_t *nxtc = dict_get(fd->ctx, "next_child");
> > !   if(nxtc) {
> > !     next_child = data_to_int32(nxtc);
> > !   } else {
> > !     next_child = -1;
> > !     first_read = 1;
> > !   }
> > !   next_child++;
> > !   if(next_child == child_count) {
> > !     next_child = 0;
> > !   }
> > !
> > !   for (i = next_child; i < child_count; i++) {
> >       if (afrfdp->fdstate[i] && pvt->state[i])
> >         break;
> >     }
> > +
> > +   if(i == child_count) {
> > +     i = 0;
> > +     for (i = 0; i < child_count; i++) {
> > +       if (afrfdp->fdstate[i] && pvt->state[i])
> > +       break;
> > +     }
> > +     if(i == child_count) {
> > +       next_child = 0;
> > +     } else {
> > +       next_child = i;
> > +     }
> > +   }
> > +   dict_set(fd->ctx, "next_child", data_from_int32(next_child));
> > +   if(first_read) {
> > +       dict_set(fd->ctx, "first_read", data_from_int32(i));
> > +   }
> > +
> >     if (i == child_count) {
> >       STACK_UNWIND (frame, -1, ENOTCONN, NULL, 0, NULL);
> >     } else {
> >
> >
> >
> > _______________________________________________
> > Gluster-devel mailing list
> > address@hidden
> > http://lists.nongnu.org/mailman/listinfo/gluster-devel
> >
>