qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] Align file accesses with cache=off (O_DIRECT)


From: Laurent Vivier
Subject: Re: [Qemu-devel] [PATCH] Align file accesses with cache=off (O_DIRECT)
Date: Tue, 29 Apr 2008 17:48:24 +0200

Le mardi 29 avril 2008 à 16:49 +0200, Kevin Wolf a écrit :
> Hi Laurent,
> 
> Laurent Vivier schrieb:
> > But if we want too keep simplicity without memcpy(), we could only
> > de-activate O_DIRECT on pread() or pwrite() :
> 
> You're right, this approach is simpler, better and makes the patch 
> smaller. I've attached a new version of the patch.
> 
> However, now that you've pointed me to your patch I realize that my 
> patch might be simple but it is incomplete as well. Normal read/write 
> operation on qcow images should work fine (only metadata is unaligned 
> and there pread is used). Snapshots don't work though because they end 
> up in aio requests which are not routed to raw_pread.
> 
> Disabling O_DIRECT for a single aio request is impossible (after all, 
> aio is asynchronous), and disabling it for at least one aio request is 

Perhaps I'm wrong, but I think it is possible: the only consequence is
the asynchronous I/O becomes synchronous...

> going to be ugly. So maybe we better turn O_DIRECT off for snapsnot 
> saving/loading, even if it's not the generic fix I wanted to have when I 
> started.

I don't think it is a good idea:

In linux world, there are three reasons to use O_DIRECT:

1- to use linux AIO (not POSIX AIO).

2- to avoid a buffer copy between user- and kernel- space
(performance ?)

3- to increase reliability: by using O_DIRECT you are sure your data are
on the disk when the write is over and your system can now crash (if it
wants).

And I think reliability is better when the snapshot is being saved...

> I'm still undecided, though. What do you think?

Is it possible to align the last AIO ?

And see comments below

> Index: block-raw-posix.c
> ===================================================================
> --- block-raw-posix.c.orig
> +++ block-raw-posix.c
> @@ -77,6 +77,7 @@
>  typedef struct BDRVRawState {
>      int fd;
>      int type;
> +    int flags;
>      unsigned int lseek_err_cnt;
>  #if defined(__linux__)
>      /* linux floppy specific */
> @@ -95,6 +96,7 @@ static int raw_open(BlockDriverState *bs
>      BDRVRawState *s = bs->opaque;
>      int fd, open_flags, ret;
>  
> +    s->flags = flags;
>      s->lseek_err_cnt = 0;

I think you should store open_flags instead of flags (see below).

>      open_flags = O_BINARY;
> @@ -141,7 +143,14 @@ static int raw_open(BlockDriverState *bs
>  #endif
>  */
>  
> -static int raw_pread(BlockDriverState *bs, int64_t offset,
> +/* 
> + * offset and count are in bytes, but must be multiples of 512 for
> files 
> + * opened with O_DIRECT. buf must be aligned to 512 bytes then.
> + *
> + * This function may be called without alignment if the caller
> ensures
> + * that O_DIRECT is not in effect.
> + */
> +static int raw_pread_aligned(BlockDriverState *bs, int64_t offset,
>                       uint8_t *buf, int count)
>  {
>      BDRVRawState *s = bs->opaque;
> @@ -194,7 +203,14 @@ label__raw_read__success:
>      return ret;
>  }
>  
> -static int raw_pwrite(BlockDriverState *bs, int64_t offset,
> +/* 
> + * offset and count are in bytes, but must be multiples of 512 for
> files 
> + * opened with O_DIRECT. buf must be aligned to 512 bytes then.
> + *
> + * This function may be called without alignment if the caller
> ensures
> + * that O_DIRECT is not in effect.
> + */
> +static int raw_pwrite_aligned(BlockDriverState *bs, int64_t offset,
>                        const uint8_t *buf, int count)
>  {
>      BDRVRawState *s = bs->opaque;
> @@ -230,6 +246,69 @@ label__raw_write__success:
>      return ret;
>  }
>  
> +
> +#ifdef O_DIRECT
> +/* 
> + * offset and count are in bytes and possibly not aligned. For files
> opened 
> + * with O_DIRECT, necessary alignments are ensured before calling 
> + * raw_pread_aligned to do the actual read.
> + */
> +static int raw_pread(BlockDriverState *bs, int64_t offset,
> +                     uint8_t *buf, int count)
> +{
> +    BDRVRawState *s = bs->opaque;
> +
> +    if (unlikely((s->flags & BDRV_O_DIRECT) &&
> +            (offset % 512 != 0 || (uintptr_t) buf % 512))) {
> +
> +        int flags, ret;
> +
> +        // Temporarily disable O_DIRECT for unaligned access
> +        flags = fcntl(s->fd, F_GETFL);
> +        fcntl(s->fd, F_SETFL, flags & ~O_DIRECT);
> +        ret = raw_pread_aligned(bs, offset, buf, count);
> +        fcntl(s->fd, F_SETFL, flags);
> +
> +        return ret;

if you store open_flag instead of flags, you can do:

if (unlikely((s->open_flags & O_DIRECT) && 
                (offset % 512 || (uintptr_t) buf % 512))) {

        fcntl(s->fd, F_SETFL, s->open_flags & ~O_DIRECT);
        ret = raw_pread_aligned(bs, offset, buf, count);
        fcntl(s->fd, F_SETFL, s->open_flags);
}

> +    } else {
> +        return raw_pread_aligned(bs, offset, buf, count);
> +    }
> +}
> +
> +/* 
> + * offset and count are in bytes and possibly not aligned. For files
> opened 
> + * with O_DIRECT, necessary alignments are ensured before calling 
> + * raw_pwrite_aligned to do the actual write.
> + */
> +static int raw_pwrite(BlockDriverState *bs, int64_t offset,
> +                      const uint8_t *buf, int count)
> +{
> +    BDRVRawState *s = bs->opaque;
> +
> +    if (unlikely((s->flags & BDRV_O_DIRECT) &&
> +            (offset % 512 != 0 || (uintptr_t) buf % 512))) {
> +
> +        int flags, ret;
> +
> +        // Temporarily disable O_DIRECT for unaligned access
> +        flags = fcntl(s->fd, F_GETFL);
> +        fcntl(s->fd, F_SETFL, flags & ~O_DIRECT);
> +        ret = raw_pwrite_aligned(bs, offset, buf, count);
> +        fcntl(s->fd, F_SETFL, flags);

ditto

> +        return ret;
> +    } else {
> +        return raw_pwrite_aligned(bs, offset, buf, count);
> +    }
> +}
> +
> +#else
> +#define raw_pread raw_pread_aligned
> +#define raw_pwrite raw_pwrite_aligned
> +#endif
> +
> +
>  /***********************************************************/
>  /* Unix AIO using POSIX AIO */
>  
> 
Regards,
Laurent
-- 
------------- address@hidden ---------------
"The best way to predict the future is to invent it."
- Alan Kay





reply via email to

[Prev in Thread] Current Thread [Next in Thread]