qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI


From: ronnie sahlberg
Subject: Re: [Qemu-devel] [PATCH] This patch adds a new block driver : iSCSI
Date: Thu, 15 Sep 2011 09:08:03 +1000

Stefan,

Thanks for your review.
I will do the changes you recommend and send an updated patch over the weekend.

Could you advice the best path for handling the .bdrv_flush  and the
blocksize questions?


I think it is reasonable to just not support iscsi at all for
blocksize that is not multiple of 512 bytes
since a read-modify-write cycle across a network would probably be
prohibitively expensive.

.bdrv_flush() I can easily add a synchronous implementation of this
unless your patch is expected to be merged
in the near future.


regards
ronnie sahlberg

On Mon, Sep 12, 2011 at 7:14 PM, Stefan Hajnoczi
<address@hidden> wrote:
> On Sat, Sep 10, 2011 at 02:23:30PM +1000, Ronnie Sahlberg wrote:
>
> Looking good.  I think this is worth merging because it does offer
> benefits over host iSCSI.
>
>> +static void
>> +iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void 
>> *command_data,
>> +                    void *private_data)
>> +{
>> +}
>> +
>> +static void
>> +iscsi_aio_cancel(BlockDriverAIOCB *blockacb)
>> +{
>> +    IscsiAIOCB *acb = (IscsiAIOCB *)blockacb;
>> +    IscsiLun *iscsilun = acb->iscsilun;
>> +
>> +    acb->status = -ECANCELED;
>> +    acb->common.cb(acb->common.opaque, acb->status);
>> +    acb->canceled = 1;
>> +
>> +    iscsi_task_mgmt_abort_task_async(iscsilun->iscsi, acb->task,
>> +                                     iscsi_abort_task_cb, NULL);
>> +}
>
> The asynchronous abort task call looks odd.  If a caller allocates a
> buffer and issues a read request, then we need to make sure that the
> request is really aborted by the time .bdrv_aio_cancel() returns.
>
> If I understand the code correctly, iscsi_aio_cancel() returns
> immediately but the read request will still be in progress.  That means
> the caller could now free their data buffer and the read request will
> overwrite that unallocated memory.
>

Right.
I will need to remove the read pdu from the local initiator side too
so that if the read is in flight and we receive data for it, that this
data is discarded
and not written into the possibly no longer valid buffers.
I will do this change in the next version of the patch.

>> +static void
>> +iscsi_aio_write10_cb(struct iscsi_context *iscsi, int status,
>> +                     void *command_data, void *opaque)
>> +{
>> +    IscsiAIOCB *acb = opaque;
>> +
>> +    trace_iscsi_aio_write10_cb(iscsi, status, acb, acb->canceled);
>> +
>> +    if (acb->buf != NULL) {
>> +        free(acb->buf);
>> +    }
>
> Please just use g_free(acb->buf).  g_free(NULL) is defined as a nop so
> the check isn't necessary.  Also, this code uses free(3) when it should
> use g_free(3).

Will do.

>
>> +
>> +    if (acb->canceled != 0) {
>> +        qemu_aio_release(acb);
>> +        scsi_free_scsi_task(acb->task);
>> +        acb->task = NULL;
>> +        return;
>> +    }
>> +
>> +    acb->status = 0;
>> +    if (status < 0) {
>> +        error_report("Failed to write10 data to iSCSI lun. %s",
>> +                     iscsi_get_error(iscsi));
>> +        acb->status = -EIO;
>> +    }
>> +
>> +    iscsi_schedule_bh(iscsi_readv_writev_bh_cb, acb);
>> +    scsi_free_scsi_task(acb->task);
>> +    acb->task = NULL;
>> +}
>> +
>> +static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
>> +{
>> +    return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
>> +}
>> +
>> +static BlockDriverAIOCB *
>> +iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
>> +                 QEMUIOVector *qiov, int nb_sectors,
>> +                 BlockDriverCompletionFunc *cb,
>> +                 void *opaque)
>> +{
>> +    IscsiLun *iscsilun = bs->opaque;
>> +    struct iscsi_context *iscsi = iscsilun->iscsi;
>> +    IscsiAIOCB *acb;
>> +    size_t size;
>> +    int fua = 0;
>> +
>> +    /* set FUA on writes when cache mode is write through */
>> +    if (!(bs->open_flags & (BDRV_O_CACHE_WB|BDRV_O_NOCACHE))) {
>> +        fua = 1;
>> +    }
>
> FUA needs to reflect the guest semantics - does this disk have an
> enabled write cache?  When bs->open_flags has BDRV_O_CACHE_WB, then the
> guest knows it needs to send flushes because there is a write cache:
>
> if (!(bs->open_flags & BDRV_O_CACHE_WB)) {
>    fua = 1;
> }
>
> BDRV_O_NOCACHE is just for local files and sets the O_DIRECT hint.  It
> doesn't affect the cache semantics that the guest sees.

Thanks.  I'll do this change.

>
>> +/*
>> + * We support iscsi url's on the form
>> + * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
>> + */
>> +static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
>> +{
>> +    IscsiLun *iscsilun = bs->opaque;
>> +    struct iscsi_context *iscsi = NULL;
>> +    struct iscsi_url *iscsi_url = NULL;
>> +    struct IscsiTask task;
>> +    int ret;
>> +
>> +    if ((BDRV_SECTOR_SIZE % 512) != 0) {
>> +        error_report("iSCSI: Invalid BDRV_SECTOR_SIZE. "
>> +                     "BDRV_SECTOR_SIZE(%lld) is not a multiple "
>> +                     "of 512", BDRV_SECTOR_SIZE);
>> +        return -EINVAL;
>> +    }
>
> Another way of saying this is:
> QEMU_BUILD_BUG_ON(BDRV_SECTOR_SIZE % 512 != 0);
>
> The advantage is that the build fails instead of waiting until iscsi is
> used at runtime until the failure is detected.

I can add this. But is it better to fail the whole build than to
return an error at runtime if/when iscsi is used?

>
> What will happen if BDRV_SECTOR_SIZE is not a multiple of 512?

iscsi will not work.
I dont think it is worth implementing a read-modify-write cycle for iscsi.
Performance would be so poor for this case that I think it is better
to just not support it at all.

>
>> +
>> +    memset(iscsilun, 0, sizeof(IscsiLun));
>> +
>> +    /* Should really append the KVM name after the ':' here */
>> +    iscsi = iscsi_create_context("iqn.2008-11.org.linux-kvm:");
>> +    if (iscsi == NULL) {
>> +        error_report("iSCSI: Failed to create iSCSI context.");
>> +        ret = -ENOMEM;
>> +        goto failed;
>> +    }
>> +
>> +    iscsi_url = iscsi_parse_full_url(iscsi, filename);
>> +    if (iscsi_url == NULL) {
>> +        error_report("Failed to parse URL : %s %s", filename,
>> +                     iscsi_get_error(iscsi));
>> +        ret = -ENOMEM;
>
> -EINVAL?

Will do.

>
>> +static BlockDriver bdrv_iscsi = {
>> +    .format_name     = "iscsi",
>> +    .protocol_name   = "iscsi",
>> +
>> +    .instance_size   = sizeof(IscsiLun),
>> +    .bdrv_file_open  = iscsi_open,
>> +    .bdrv_close      = iscsi_close,
>> +
>> +    .bdrv_getlength  = iscsi_getlength,
>> +
>> +    .bdrv_aio_readv  = iscsi_aio_readv,
>> +    .bdrv_aio_writev = iscsi_aio_writev,
>> +    .bdrv_aio_flush  = iscsi_aio_flush,
>
> block.c does not emulate .bdrv_flush() using your .bdrv_aio_flush()
> implementation.  I have sent a patch to add this emulation.  This will
> turn bdrv_flush(iscsi_bs) into an actual operation instead of a nop :).

Should I add a synchronous .bdrv_flush ? That is very easy for me to add.
Or should I just wait for your patch to be merged ?


>
>> diff --git a/trace-events b/trace-events
>> index 3fdd60f..4e461e5 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -494,3 +494,10 @@ escc_sunkbd_event_in(int ch) "Untranslated keycode 
>> %2.2x"
>>  escc_sunkbd_event_out(int ch) "Translated keycode %2.2x"
>>  escc_kbd_command(int val) "Command %d"
>>  escc_sunmouse_event(int dx, int dy, int buttons_state) "dx=%d dy=%d 
>> buttons=%01x"
>> +
>> +# block/iscsi.c
>> +disable iscsi_aio_write10_cb(void *iscsi, int status, void *acb, int 
>> canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_writev(void *iscsi, int64_t sector_num, int nb_sectors, 
>> void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque 
>> %p acb %p"
>> +disable iscsi_aio_read10_cb(void *iscsi, int status, void *acb, int 
>> canceled) "iscsi %p status %d acb %p canceled %d"
>> +disable iscsi_aio_readv(void *iscsi, int64_t sector_num, int nb_sectors, 
>> void *opaque, void *acb) "iscsi %p sector_num %"PRId64" nb_sectors %d opaque 
>> %p acb %p"
>
> It is no longer necessary to prefix trace event definitions with
> "disable".

Will fix this.

>
> The stderr and simple backends now start up with all events disabled by
> default.  The set of events can be set using the -trace
> events=<events-file> option or on the QEMU monitor using set trace-event
> <name> on.
>
> Stefan
>



reply via email to

[Prev in Thread] Current Thread [Next in Thread]