gluster-devel


From: Anand Avati
Subject: Re: [Gluster-devel] Change in glusterfs[master]: bd: posix/multi-brick support to BD xlator
Date: Thu, 05 Sep 2013 16:50:17 -0700
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130329 Thunderbird/17.0.5

On 09/01/2013 11:26 AM, M. Mohan Kumar (Code Review) wrote:
Hello Anand Avati, Gluster Build System,

I'd like you to reexamine a change.  Please visit

     http://review.gluster.org/4809

to look at the new patch set (#4).

Change subject: bd: posix/multi-brick support to BD xlator
......................................................................

bd: posix/multi-brick support to BD xlator

The current BD xlator (block backend) has a few limitations, such as:
* Creation of directories is not supported
* Only a single brick is supported
* It does not use extended attributes (and client gfid) like the posix xlator
* Creation of special files (symbolic links, device nodes, etc.) is not
supported

The basic limitation of not allowing directory creation prevents
oVirt/VDSM from consuming the BD xlator as part of a Gluster domain, since
VDSM creates multi-level directories when GlusterFS is used as the storage
backend for VM images.

To overcome these limitations, a new BD xlator with the following
improvements is proposed:

* A new hybrid BD xlator that handles both regular files and block device files
* The volume will have both POSIX and BD bricks. Regular files are
   created on POSIX bricks; block devices are created on the BD brick (VG)
* The BD xlator leverages the existing POSIX xlator for most POSIX calls and
   hence sits above the POSIX xlator
* A block device file is differentiated from a regular file by an extended attribute
* The xattr 'user.glusterfs.bd' (BD_XATTR) maps a
   posix file to a Logical Volume (LV).
* When a client sends a request to set BD_XATTR on a posix file, a new
   LV is created and mapped to the posix file. So every block device will
   have a representative file in the POSIX brick with 'user.glusterfs.bd'
   (BD_XATTR) and 'user.glusterfs.bd.size' (BD_XATTR_SIZE) set.
* From then on, all operations on this file result in LV operations.

New BD xlator code is placed in xlators/storage/bd directory.

For example, opening a file that has BD_XATTR set results in opening
the LV block device, and reading from the file results in reading the
corresponding LV block device.

When the BD xlator gets a request to set BD_XATTR via a setxattr call, it
creates an LV and stores information about this LV in the xattrs of the
posix file. The xattrs "user.glusterfs.bd" and "user.glusterfs.bd.size"
identify that the posix file is mapped to a BD.
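The setxattr path described above can be sketched roughly as follows. This is a minimal, self-contained C illustration, not the BD xlator's code: `struct posix_file`, `bd_type_from_value` and `bd_setxattr_sketch` are hypothetical names, the LV creation itself is left as a comment, and the accepted values "lv" and "thin" are taken from the changelog further down.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BD_XATTR "user.glusterfs.bd"

enum bd_type { BD_NONE, BD_LV, BD_THIN };

/* Hypothetical stand-in for the state kept per posix file. */
struct posix_file {
    enum bd_type type;  /* what BD_XATTR says the file is mapped to */
    uint64_t     size;  /* what BD_XATTR_SIZE ("user.glusterfs.bd.size") would hold */
};

/* Translate the BD_XATTR value into an LV type ("lv" or "thin"). */
static enum bd_type bd_type_from_value(const char *value)
{
    if (strcmp(value, "lv") == 0)
        return BD_LV;
    if (strcmp(value, "thin") == 0)
        return BD_THIN;
    return BD_NONE;
}

/* Returns 0 on success, -1 if the request is rejected. */
static int bd_setxattr_sketch(struct posix_file *f, const char *name,
                              const char *value, uint64_t default_size)
{
    assert(f != NULL && name != NULL && value != NULL);
    if (strcmp(name, BD_XATTR) != 0)
        return 0;               /* any other xattr: would be passed down to posix */
    if (f->type != BD_NONE)
        return -1;              /* file is already mapped to an LV */
    enum bd_type t = bd_type_from_value(value);
    if (t == BD_NONE)
        return -1;              /* unknown LV type requested */
    /* A real xlator would create the LV in the brick's VG at this point. */
    f->type = t;
    f->size = default_size;     /* then persist BD_XATTR and BD_XATTR_SIZE */
    return 0;
}
```

Only BD_XATTR is intercepted; every other xattr falls through, which mirrors the idea that the BD xlator sits above posix and handles only what concerns block devices.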

Usage:
Server side:
address@hidden ~]# gluster volume create bdvol device vg \
    host1:/storage/vg1_info?vg1 host2:/storage/vg2_info?vg2
This creates a distributed gluster volume 'bdvol' with Volume Group vg1,
using posix brick /storage/vg1_info on host1, and Volume Group vg2, using
posix brick /storage/vg2_info on host2.
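Each brick argument packs a host, a posix path and a VG name into a single string. A minimal C sketch of splitting such an argument, assuming the format is exactly `host:/path?vg` with the `?vg` suffix optional; `struct brick_spec` and `parse_brick_spec` are hypothetical and are not glusterd's actual parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical representation of one brick argument. */
struct brick_spec {
    char host[256];
    char path[1024];
    char vg[128];   /* empty string when no "?vg" suffix is given */
};

/* Split "host:/path?vg" into its parts. Returns false on malformed input. */
static bool parse_brick_spec(const char *spec, struct brick_spec *out)
{
    assert(spec != NULL && out != NULL);
    const char *colon = strchr(spec, ':');
    if (colon == NULL || colon == spec || colon[1] != '/')
        return false;                 /* need "host:" followed by an absolute path */

    size_t host_len = (size_t)(colon - spec);
    if (host_len >= sizeof(out->host))
        return false;
    memcpy(out->host, spec, host_len);
    out->host[host_len] = '\0';

    const char *path = colon + 1;
    const char *qmark = strchr(path, '?');
    size_t path_len = qmark ? (size_t)(qmark - path) : strlen(path);
    if (path_len >= sizeof(out->path))
        return false;
    memcpy(out->path, path, path_len);
    out->path[path_len] = '\0';

    out->vg[0] = '\0';
    if (qmark != NULL) {
        const char *vg = qmark + 1;
        size_t vg_len = strlen(vg);
        if (vg_len == 0 || vg_len >= sizeof(out->vg))
            return false;             /* "?" must be followed by a VG name */
        memcpy(out->vg, vg, vg_len + 1);  /* copy including the NUL */
    }
    return true;
}
```

For `host1:/storage/vg1_info?vg1` this yields host `host1`, posix path `/storage/vg1_info` and VG `vg1`.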

address@hidden ~]# gluster volume start bdvol

Client side:
address@hidden ~]# mount -t glusterfs host1:/bdvol /media
address@hidden ~]# touch /media/posix
This creates the regular posix file 'posix' on either the host1:/vg1 or the
host2:/vg2 brick.

address@hidden ~]# mkdir /media/image
address@hidden ~]# touch /media/image/lv1
This also creates the regular posix file 'lv1' on either the host1:/vg1 or
the host2:/vg2 brick.

address@hidden ~]# setfattr -n "user.glusterfs.bd" -v "lv" /media/image/lv1
address@hidden ~]#
The above setxattr creates a new LV in the corresponding brick's VG
and sets 'user.glusterfs.bd' to 'lv' and 'user.glusterfs.bd.size'
to the default extent size.

address@hidden ~]# truncate -s5G /media/image/lv1
This resizes the LV mapped to 'lv1' to 5G.
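LVM allocates LVs in whole physical extents, so a requested size is normally rounded up to the next extent boundary. The C sketch below shows that arithmetic; it assumes the BD xlator simply lets LVM's usual rounding apply on truncate, and it assumes LVM's default 4 MiB extent size, whereas a real VG's extent size may differ and would have to be queried. `bd_round_to_extent` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: vgcreate's default physical extent size (4 MiB). */
#define DEFAULT_EXTENT_SIZE (4ULL * 1024 * 1024)

/* Round a requested size up to a whole number of extents (at least one).
 * Returns 0 if the rounded size would overflow 64 bits. */
static uint64_t bd_round_to_extent(uint64_t requested, uint64_t extent)
{
    assert(extent > 0);
    uint64_t extents = requested / extent + (requested % extent != 0);
    if (extents == 0)
        extents = 1;                  /* an LV cannot be zero-sized */
    if (extents > UINT64_MAX / extent)
        return 0;
    return extents * extent;
}
```

A 5G request is already a multiple of 4 MiB (1280 extents), so it is kept as-is, while a 1-byte request becomes one full extent.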

Changes from previous version V3:
* Added FUSE support for full/linked clones
* Added support to merge snapshots and provide information about their origin
* bd_map xlator removed
* The iatt structure is stored in inode_ctx; the iatt is cached and updated
during fsync/flush
* Added aio support
* The volume's type and capabilities are exported through getxattr

Changes from version 2:
* Used inode_context to cache the BD size and to check whether a loc/fd
   refers to a BD.
* Added GlusterFS server-offloaded copy and snapshot through the setxattr
   FOP; libgfapi is modified as part of this.
* BD xlator supports stripe
* During unlink, if an LV file is still open, it is added to a delete
   list, and bd_del_thread deletes it from this list when the last
   reference to that file is closed.
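The deferred-delete rule above (unlink of an open LV is postponed until the last close) can be illustrated with a small reference-counting sketch in C. All names here are hypothetical. Unlike the description, which uses a background bd_del_thread, this sketch deletes inline on the last release and has no locking, so it only shows the bookkeeping, not the threading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-LV state. */
struct lv_entry {
    int              open_refs;   /* outstanding opens of the LV */
    bool             unlinked;    /* unlink arrived while still open */
    bool             removed;     /* LV actually deleted */
    struct lv_entry *next;        /* link in the pending-delete list */
};

static struct lv_entry *del_list; /* LVs waiting for their last close */

static void lv_delete(struct lv_entry *lv)
{
    lv->removed = true;           /* the real LV removal would happen here */
}

static void lv_unlink(struct lv_entry *lv)
{
    if (lv->open_refs == 0) {
        lv_delete(lv);            /* nobody has it open: delete right away */
        return;
    }
    lv->unlinked = true;          /* still open: queue it for later */
    lv->next = del_list;
    del_list = lv;
}

/* Called on each close; the last close of an unlinked LV triggers deletion. */
static void lv_release(struct lv_entry *lv)
{
    assert(lv->open_refs > 0);
    if (--lv->open_refs > 0 || !lv->unlinked)
        return;
    for (struct lv_entry **pp = &del_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == lv) {
            *pp = lv->next;       /* drop it from the pending list */
            break;
        }
    }
    lv_delete(lv);
}
```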

Changes from previous version:
* gfid is used as name of LV
* '?' is used to specify the VG name when creating a BD volume with volume
   create or add-brick, e.g. gluster volume create volname host:/path?vg
* open-behind issue is fixed
* A replicate brick can be added dynamically, and LVs from the source brick
   are replicated to the destination brick
* A distribute brick can be added dynamically, and the rebalance operation
   distributes existing LVs/files to the new brick
* Thin provisioning support added.
* bd_map xlator support retained
* setfattr -n user.glusterfs.bd -v "lv" creates a regular LV and
   setfattr -n user.glusterfs.bd -v "thin" creates a thin LV
* Capability and backend information added to gluster volume info (and --xml) so
   that management tools can exploit the BD xlator.
* Tracing support for the BD xlator added
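Since the gfid is used as the LV name, the 16-byte gfid has to be rendered as text. The C sketch below assumes the name is the canonical 8-4-4-4-12 lowercase hex UUID form (GlusterFS has its own helper for this, uuid_utoa, which the sketch does not use); `gfid_to_lv_name` is hypothetical. The resulting 36 characters (hex digits and hyphens) are all valid in LVM names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format a 16-byte gfid as 8-4-4-4-12 lowercase hex (36 chars + NUL). */
static void gfid_to_lv_name(const uint8_t gfid[16], char out[37])
{
    assert(gfid != NULL && out != NULL);
    char *p = out;
    for (int i = 0; i < 16; i++) {
        if (i == 4 || i == 6 || i == 8 || i == 10)
            *p++ = '-';                       /* group separators */
        p += sprintf(p, "%02x", gfid[i]);     /* two hex digits per byte */
    }
    *p = '\0';
    assert(strlen(out) == 36);
}
```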

TODO:
* Add support to display snapshots for a given LV
* Display posix filename for list-origin instead of gfid

Change-Id: I00d32dfbab3b7c806e0841515c86c3aa519332f2
Signed-off-by: M. Mohan Kumar <address@hidden>
---
M configure.ac
M xlators/storage/Makefile.am
A xlators/storage/bd/Makefile.am
A xlators/storage/bd/src/Makefile.am
A xlators/storage/bd/src/bd-helper.c
A xlators/storage/bd/src/bd.c
A xlators/storage/bd/src/bd.h
7 files changed, 638 insertions(+), 1 deletion(-)


   git pull ssh://git.gluster.org/glusterfs refs/changes/09/4809/4



Mohan,
In general, apart from the specific comments on the individual patches, we should probably squash some of the patches into a smaller set (down from 15):

0 - remove old bd_map xlator
1 - implement basic new bd xlator (include everything not listed below)
2 - other translators' changes to support BD
3 - add snapshot/clone support
4 - add aio support

There are many instances in the patch set where an earlier patch does things a certain way and a later patch changes it. All this seems quite redundant and makes a new feature that adds code from scratch hard to review.

Avati



