[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
From: |
Alex Pyrgiotis |
Subject: |
Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic |
Date: |
Fri, 19 Feb 2016 13:50:00 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.1 |
Hi Paolo,
Sorry for the late reply and thanks for the review. See my comments inline:
On 02/11/2016 01:17 PM, Paolo Bonzini wrote:
>
>
> On 16/12/2015 17:55, Alex Pyrgiotis wrote:
>> +/*
>> + * Create a QEMUIOVector from a scatter-gather list.
>> + *
>> + * This function does not copy the data of the scatter-gather list.
>> Instead, it
>> + * uses the dma_memory_map() function to map physical memory regions of the
>> + * virtual device (as interpreted by the guest kernel) into the address
>> space
>> + * of the QEMU process, in order to have access to the data.
>> + */
>> +static void dma_map_sg(DMAAIOCB *dbs)
>
> In special cases where the QEMUSGList includes MMIO regions, dma_map_sg
> might not be able to map the whole list. In this case, for regular I/O
> it is possible to break the operation in multiple steps---in fact, this
> breaking of requests is the main purpose of most of the code in
> dma-helpers.c.
You're right. I've written a comment about that yet, somehow, the
implications of rescheduling a DMA operation flew over my head. Great.
So, I've tried to grasp the current situation and this is what I've got
so far. Please correct me where I'm wrong.
Guest kernel space:
1. The guest kernel wants to send a SCSI request to a (para)virtual SCSI
controller.
2. This SCSI request involves a list of physical address ranges.
Whether the request only refers to a single contiguous physical
range, or whether it is a scatter-gather list depends on the
implementation of the kernel's Low-level Device Driver (LLDD) and
the capabilities of the SCSI controller.
3. Some of these ranges may include MMIO regions. Since I'm not aware
of a scenario where this happens, can you provide an example?
4. The LLDD writes the SCSI CDB to the SCSI controller's queue and
informs the SCSI controller by writing to some sort of doorbell
register.
QEMU/Hardware space:
5. The SCSI controller code will create a QEMUSGList that points to
the memory regions of the SCSI request. This QEMUSGList will also
include the MMIO regions.
6. The QEMU device implementation, e.g. scsi-block, chooses to use
the dma_* interface.
7. The dma_blk_read/write() code will ultimately attempt to map all the
memory regions pointed by the QEMUSGList in order to create a
QEMUIOVector.
8. At some point during the mapping loop, the code will encounter an
MMIO region. Since reading and writing from/to an MMIO region
requires special handling, e.g., we need to call
MemoryRegion->ops->write(), we cannot include it in our read/write
system call to the host kernel.
9. This leads to a partial read/write and the mapping loop will resume
once the partial read/write() has finished.
Are we in the same page so far? If so, let's dig into the
dma_blk_read/write() logic. The following is a simplified pseudocode:
allocate QEMUIOVector
callback:
# Reset any changes to QEMUIOVector from previous operations
for address in QEMUIOVector:
unmap address
reset QEMUIOVector
# Check if the previous IO operation was the last one and if so,
# call the completion callback of the user and return.
if operation is completed:
dma_complete()
return
offset = adjust read/write offset
# Map each sg of QEMUSGList and add it to the QEMUIOVector
for sg in QEMUSGList:
address = dma_memory_map(sg)
if address == NULL:
break
add address to QEMUIOVector
# When either of the following operations have finished, restart
# this DMA operation.
if nothing mapped:
# If we can't map anything, we need to retry
create reschedule_dma callback
cpu_register_map_client(callback)
else:
# Else, we can perform a partial read/write()
do readv/writev(offset, QEMUIOVector, callback)
Are the above OK? If so, I have some questions:
a) Is an MMIO region one of the reasons why we can't map an sg?
b) At which point will the relevant ops->write() method for the MMIO
region be called when we have to DMA into the region?? Is it handled
implicitly in dma_memory_map()?
c) I'm not quite sure about the logic of the "nothing mapped" section.
Correct me if I'm wrong, but what I think it does is that it
registers a callback (reschedule_dma) once some sort of mapping has
completed. What kind of mapping is this? Is there anything more to
it?
> However, it is not possible to do the same for ioctls. This is actually
> the reason why no one has ever tried to make scsi-generic do anything
> but bounce-buffering. I think that your code breaks horribly in this
> case, and I don't see a way to fix it, except for reverting to bounce
> buffering.
>
> This would require major changes in your patches, and I'm not sure
> whether they are worth it for the single use case of tape devices...
Well, I wouldn't narrow it down to tape devices. The scsi-generic
interface is the universal interface for all SCSI devices and the only
interface that is fully passthrough. So this patch would effectively
boost the baseline performance of SCSI devices. I think it's worth a try.
Thanks,
Alex