qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI


From: ronnie sahlberg
Subject: Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI
Date: Wed, 21 Sep 2011 19:48:22 +1000

Stefan,
Thanks for your review.
I feel that iscsi would be beneficial for several usecases.

I have implemented all changes you pointed out, except two, and resent
an updated patch to the list.
Please see below.


regards
ronnie sahlberg


On Mon, Sep 12, 2011 at 7:14 PM, Stefan Hajnoczi
<address@hidden> wrote:
> On Sat, Sep 10, 2011 at 02:23:30PM +1000, Ronnie Sahlberg wrote:
>
> Looking good.  I think this is worth merging because it does offer
> benefits over host iSCSI.
>
>> +static void
>> +iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
>> *command_data,
>> +                    void *private_data)
>> +{
>> +}
>> +
>> +static void
>> +iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
>> +{
>> +    IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
>> +    IscsiLun *iscsilun = acb->iscsilun;
>> +
>> +    acb->status = -ECANCELED;
>> +    acb->common.cb(acb->common.opaque, acb->status);
>> +    acb->canceled = 1;
>> +
>> +    iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
>> +                                     iscsi_abort_task_cb, NULL);
>> +}
>
> The asynchronous abort task call looks odd.  If a caller allocates a
> buffer and issues a read request, then we need to make sure that the
> request is really aborted by the time .bdrv_aio_cancel() returns.
>
> If I understand the code correctly, iscsi_aio_cancel() returns
> immediately but the read request will still be in progress.  That means
> the caller could now free their data buffer and the read request will
> overwrite that unallocated memory.

You are right.
I have added a new function to libiscsi that will also release the
task structure from libiscsi as well.
So we should no longer have a window where we might have a cancelled
task in flight writing the data to an already released buffer once the
cancelled data buffers start arriving on the socket.

>
>> +static void
>> +iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
>> +                     void *command_data, void *opaque)
>> +{
>> +    IscsiAIOCB *acb = opaque;
>> +
>> +    trace_iscsi_aio_write10_cb(iscsi, status, acb, acb->canceled);
>> +
>> +    if (acb->buf != NULL) {
>> +        free(acb->buf);
>> +    }
>
> Please just use g_free(acb->buf).  g_free(NULL) is defined as a nop so
> the check isn't necessary.  Also, this code uses free(3) when it shoulde.
> use g_free(3).
>

Done.

>> +
>> +    if (acb->canceled != 0) {
>> +        qemu_aio_release(acb);
>> +        scsi_free_scsi_task(acb->task);
>> +        acb->task = NULL;
>> +        return;
>> +    }
>> +
>> +    acb->status = 0;
>> +    if (status < 0) {
>> +        error_report("Failed to write10 data to iSCSI lun. %s",
>> +                     iscsi_get_error(iscsi));
>> +        acb->status = -EIO;
>> +    }
>> +
>> +    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
>> +    scsi_free_scsi_task(acb->task);
>> +    acb->task = NULL;
>> +}
>> +
>> +static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
>> +{
>> +    return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
>> +}
>> +
>> +static BlockDriverAIOCB *
>> +iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
>> +                 QEMUIOVector *qiov, int nb_sectors,
>> +                 BlockDriverCompletionFunc *cb,
>> +                 void *opaque)
>> +{
>> +    IscsiLun *iscsilun = bs->opaque;
>> +    struct iscsi_context *iscsi = iscsilun->iscsi;
>> +    IscsiAIOCB *acb;
>> +    size_t size;
>> +    int fua = 0;
>> +
>> +    /* set FUA on writes when cache mode is write through */
>> +    if (!(bs->open_flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))) {
>> +        fua = 1;
>> +    }
>
> FUA needs to reflect the guest semantics - does this disk have an
> enabled write cache?  When bs->open_flags has BDRV_O_CACHE_WB, then the
> guest knows it needs to send flushes because there is a write cache:
>
> if (!(bs->open_flags & BDRV_O_CACHE_WB)) {
>    fua = 1;
> }
>
> BDRV_O_NOCACHE is just for local files and sets the O_DIRECT hint.  It
> doesn't affect the cache semantics that the guest sees.
>

Done. When I first started building the patch a while ago, both flags
were needed. I missed when the second flag became obsolete upstream.


>> +/*
>> + * We support iscsi url's on the form
>> + * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
>> + */
>> +static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
>> +{
>> +    IscsiLun *iscsilun = bs->opaque;
>> +    struct iscsi_context *iscsi = NULL;
>> +    struct iscsi_url *iscsi_url = NULL;
>> +    struct IscsiTask task;
>> +    int ret;
>> +
>> +    if ((BDRV_SECTOR_SIZE % 512) != 0) {
>> +        error_report("iSCSI: Invalid BDRV_SECTOR_SIZE. "
>> +                     "BDRV_SECTOR_SIZE(%lld) is not a multiple "
>> +                     "of 512", BDRV_SECTOR_SIZE);
>> +        return -EINVAL;
>> +    }
>
> Another way of saying this is:
> QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE % 512 != 0);
>
> The advantage is that the build fails instead of waiting until iscsi is
> used at runtime until the failure is detected.
>
> What will happen if BDRV_SECTOR_SIZE is not a multiple of 512?
>

I did not do this change.
If blocksize is not multiple of 512,  the iscsi backend will just
refuse to work since having a read-modify-write cycle across the
network would be extremely costly.
I dont think iscsi should cause the entire build to fail.

While it is difficult to imagine a different blocksize, it is not
impossible I guess.
If someone comes up with a non-512-multiple blocksize and a good
reason for one, I dont want to be the guy that broke the build.


>> +
>> +    memset(iscsilun, 0, sizeof(IscsiLun));
>> +
>> +    /* Should really append the KVM name after the ':' here */
>> +    iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:");
>> +    if (iscsi == NULL) {
>> +        error_report("iSCSI: Failed to create iSCSI context.");
>> +        ret = -ENOMEM;
>> +        goto failed;
>> +    }
>> +
>> +    iscsi_url = iscsi_parse_full_url(iscsi, filename);
>> +    if (iscsi_url == NULL) {
>> +        error_report("Failed to parse URL : %s %s", filename,
>> +                     iscsi_get_error(iscsi));
>> +        ret = -ENOMEM;
>
> -EINVAL?

Done.

>
>> +static BlockDriver bdrv_iscsi = {
>> +    .format_name     = "iscsi",
>> +    .protocol_name   = "iscsi",
>> +
>> +    .instance_size   = sizeof(IscsiLun),
>> +    .bdrv_file_open  = iscsi_open,
>> +    .bdrv_close      = iscsi_close,
>> +
>> +    .bdrv_getlength  = iscsi_getlength,
>> +
>> +    .bdrv_aio_readv  = iscsi_aio_readv,
>> +    .bdrv_aio_writev = iscsi_aio_writev,
>> +    .bdrv_aio_flush  = iscsi_aio_flush,
>
> block.c does not emulate .bdrv_flush() using your .bdrv_aio_flush()
> implementation.  I have sent a patch to add this emulation.  This will
> turn bdrv_flush(iscsi_bs) into an actual operation instead of a nop :).

I did not implement this since I understood from the discussion that a
patch is imminent and it is required for existing backend(s?) anyway.


>
>> diff --git a/trace-events b/trace-events
>> index 3fdd60f..4e461e5 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -494,3 +494,10 @@ escc_sunkbd_event_in(int ch) "Untranslated keycode 
>> %2.2x"
>>  escc_sunkbd_event_out(int ch) "Translated keycode %2.2x"
>>  escc_kbd_command(int val) "Command %d"
>>  escc_sunmouse_event(int dx, int dy, int buttons_state) "dx=%d dy=%d 
>> buttons=%01x"
>> +
>> +# block/iscsi.c
>> +disable iscsi_aio_write10_cb(void *iscsi, int status, void *acb, int 
>> canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_writev(void *iscsi, int64_t sector_num, int nb_sectors, 
>> void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque 
>> %p acb %p"
>> +disable iscsi_aio_read10_cb(void *iscsi, int status, void *acb, int 
>> canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_readv(void *iscsi, int64_t sector_num, int nb_sectors, 
>> void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque 
>> %p acb %p"
>
> It is no longer necessary to prefix trace event definitions with
> "disable".

Done.


>
> The stderr and simple backends now start up with all events disabled by
> default.  The set of events can be set using the -trace
> events=<events-file> option or on the QEMU monitor using set trace-event
> <name> on.
>
> Stefan
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]