qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Outreachy project task: Adding QEMU block layer APIs resembling Linu


From: Sam Li
Subject: Re: Outreachy project task: Adding QEMU block layer APIs resembling Linux ZBD ioctls.
Date: Thu, 2 Jun 2022 13:43:31 +0800

Hi Stefan,

Stefan Hajnoczi <stefanha@gmail.com> 于2022年6月1日周三 19:43写道:
>
> On Wed, 1 Jun 2022 at 06:47, Damien Le Moal
> <damien.lemoal@opensource.wdc.com> wrote:
> >
> > On 6/1/22 11:57, Sam Li wrote:
> > > Hi Stefan,
> > >
> > > Stefan Hajnoczi <stefanha@gmail.com> 于2022年5月30日周一 19:19写道:
> > >
> > >
> > >>
> > >> On Mon, 30 May 2022 at 06:09, Sam Li <faithilikerun@gmail.com> wrote:
> > >>>
> > >>> Hi everyone,
> > >>> I'm Sam Li, working on the Outreachy project which is to add zoned
> > >>> device support to QEMU's virtio-blk emulation.
> > >>>
> > >>> For the first goal, adding QEMU block layer APIs resembling Linux ZBD
> > >>> ioctls, I think the naive approach would be to introduce a new stable
> > >>> struct zbd_zone descriptor for the library function interface. More
> > >>> specifically, what I'd like to add to the BlockDriver struct are:
> > >>> 1. zbd_info as zone block device information: includes numbers of
> > >>> zones, size of logical blocks, and physical blocks.
> > >>> 2. zbd_zone_type and zbd_zone_state
> > >>> 3. zbd_dev_model: host-managed zbd, host-aware zbd
> > >>> With those basic structs, we can start to implement new functions as
> > >>> bdrv*() APIs for BLOCK*ZONE ioctls.
> > >>>
> > >>> I'll start to finish this task based on the above description. If
> > >>> there is any problem or something I may miss in the design, please let
> > >>> me know.
> > >>
> > >> Hi Sam,
> > >> Can you propose function prototypes for the new BlockDriver callbacks
> > >> needed for zoned devices?
> > >
> > > I have made some modifications based on Damien's device in design part
> > > 1 and added the function prototypes in design part 2. If there is any
> > > problem or part I missed, please let me know.
> > >
> > > Design of Block Layer APIs in BlockDriver:
> > > 1. introduce a new stable struct zbd_zone descriptor for the library
> > > function interface.
> > >   a. zbd_info as zone block device information: includes numbers of
> > > zones, size of blocks, write granularity in byte(minimal write size
> > > and alignment
> > >     - write granularity: 512e SMRs: writes in units of physical block
> > > size, 4096 bytes; NVMe ZNS write granularity is equal to the block
> > > size.
> > >     - zone descriptor: start, length, capacity, write pointer, zone type
> > >   b. zbd_zone_type
> > >     - zone type: conventional, sequential write required, sequential
> > > write preferred
> > >   c. zbd_dev_model: host-managed zbd, host-aware zbd
> >
> > This explanation is a little hard to understand. It seems to be mixing up
> > device level information and per-zone information. I think it would be a
> > lot simpler to write a struct definition to directly illustrate what you
> > are planning.
> >
> > It is something like this ?
> >
> > struct zbd_zone {
> >         enum zone_type  type;
> >         enum zone_cond  cond;
> >         uint64_t        start;
> >         uint32_t        length;
> >         uint32_t        cap;
> >         uint64_t        wp;
> > };
> >
> > strcut zbd_dev {
> >         enum zone_model model;
> >         uint32_t        block_size;
> >         uint32_t        write_granularity;
> >         uint32_t        nr_zones
> >         struct zbd_zone *zones; /* array of zones */
> > };
> >
> > If yes, then my comments are as follows.
> >
> > For the device struct: It may be good to have also the maximum number of
> > open zones and the maximum number of active zones.
> >
> > For the zone struct: You may need to add a read-write lock per zone to be
> > able to write lock zones to ensure a sequential write pattern (virtio
> > devices can be multi-queue and so writes may be coming in from different
> > contexts) and to correctly emulate zone append operations with an atomic
> > update of the wp field.
> >
> > These need to be integrated into the generic block driver interface in
> > include/block/block_int-common.h or include/block/block-common.h.
>
> QEMU's block layer has a few ways of exposing information about block devices:
>
>     int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi);
>     ImageInfoSpecific *(*bdrv_get_specific_info)(BlockDriverState *bs,
> Error **errp);
>
> These fetch information from the BlockDriver and are good when a small
> amount of data is reported occassionally and consumed by the caller.
>
> For data that is continuously accessed or that could be large, it may
> be necessary for the data to reside inside BlockDriverState so that it
> can be accessed in place (without copying):
>
>     void (*bdrv_refresh_limits)(BlockDriverState *bs, Error **errp);
>
> QEMU uses this for the BlockLimits struct (BlockDriverState::bl) that
> is continuously accessed by the block layer while processing I/O
> requests. The "refresh" function updates the data in case the
> underlying storage device has changed somehow. If no update function
> is necessary then data can simply be populated during .bdrv_open() and
> no new BlockDriver callback needs to be added.
>
> So in the simplest case BlockDriverState can be extended with a struct
> zbd_dev field that is populated during .bdrv_open(). If the
> BlockDriver doesn't support zones then the zbd_dev.nr_zones field is 0
> or the model field indicates that this is not a zoned storage device.
>
> However, a BlockBackend (not BlockDriverState!) API will be needed to
> expose this data to users like the hw/block/virtio-blk.c emulation
> code or the qemu-io-cmds.c utility that can be used for testing. A
> BlockBackend has a root pointer to a BlockDriverState graph (for
> example, qcow2 on top of file-posix). It will be necessary to
> propagate zoned storage information from the leaf BlockDriverState all
> the way up to the BlockBackend. In simple cases the BB root points
> directly to the file-posix BDS that has Linux ZBD support but the
> design needs to account for additional BDS graph nodes.

I think a simple way to think BlockBackend APIs is to use following
callbacks: blk_aio_zone_mgmt() -> blk_aio_prwv() +
blk_aio_zone_mgmt_entry() -> blk_co_do_zone_mgmt() -> blk_zone_mgmt().
The last function call will call bdrv_co_zone_mgmt() in
block/file-posix.c. If I understand the additional case correctly, the
BlockBackend API can expose the zone information to the virtio-blk
emulation now.

Besides, comparing blk_aio_flush() with blk_flush() in block-backend.c
which lead to include/block/block-io.h, we may need consider the case
when calling block layer API from non-coroutine context. Meanwhile,
using bdrv_co_writev()/bdrv_co_readv() instead of read-write lock per
zone may be a option too.

If there is any problem, please let me know.

Best regards,
Sam

>
> In order to make progress on this interface I suggest looking at the
> virtio-blk spec extension for zoned storage and thinking what the
> BlockBackend API should look like that hw/block/virtio-blk.c will use.
> Then it may be easier to decide how to report zone information from
> BlockDriverState.
>
>
> Stefan



reply via email to

[Prev in Thread] Current Thread [Next in Thread]