Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic


From: Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH 1/9] dma-helpers: Expose the sg mapping logic
Date: Thu, 25 Feb 2016 11:22:05 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.5.0


On 25/02/2016 11:10, Alex Pyrgiotis wrote:
> All normal regions in a QEMUSGList point to an address range in the
> guest's RAM. The MMIO regions of QEMU's virtual devices, however, do not
> correspond to such an address range, so QEMU must create a bounce buffer
> to represent them. This bounce buffer is added in the I/O vector which
> contains the rest of the mapped addresses and is later given to a
> readv()/writev() call.

Correct.
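
(As an aside, in iovec terms the bounced region simply becomes one more
element of the vector handed to readv()/writev(). The snippet below is
purely illustrative; the buffers are placeholders, not QEMU code.)

    #include <sys/types.h>
    #include <sys/uio.h>

    /* Illustration only: the bounce buffer's address ends up as just
     * another entry in the I/O vector handed to readv()/writev(). */
    static char guest_ram[4096];   /* stands in for a directly mapped RAM range */
    static char bounce_buf[512];   /* stands in for the MMIO bounce buffer      */

    static ssize_t submit_read(int fd)
    {
        struct iovec iov[2] = {
            { .iov_base = guest_ram,  .iov_len = sizeof(guest_ram)  },
            { .iov_base = bounce_buf, .iov_len = sizeof(bounce_buf) },
        };
        return readv(fd, iov, 2);  /* writev() for the other direction */
    }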

>>> 9. This leads to a partial read/write and the mapping loop will resume
>>>    once the partial read/write() has finished.
> 
> The MMIO region is the trigger for a partial read/write, but it's not
> the actual reason. The actual reason is that there is only *one*
> *global* bounce buffer. This means that if it's in use or we need
> to use it twice, we will have to wait.

Yes.
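
(For reference, the mechanism in exec.c boils down to a single static
buffer guarded by an in-use flag. The sketch below paraphrases the idea
with made-up names; it is not the actual code.)

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* One bounce buffer for the whole process: a second non-direct
     * mapping has to wait until the first one has been unmapped, which
     * is what forces the partial read/write described above. */
    typedef struct {
        void  *buffer;
        size_t len;
        bool   in_use;
    } BounceBuffer;

    static BounceBuffer bounce;

    static void *bounce_map(size_t len)
    {
        if (bounce.in_use) {
            return NULL;           /* busy: the caller must wait and resume */
        }
        bounce.in_use = true;
        bounce.len = len;
        bounce.buffer = malloc(len);
        return bounce.buffer;
    }

    static void bounce_unmap(void *p)
    {
        if (p == bounce.buffer) {
            free(bounce.buffer);
            bounce.buffer = NULL;
            bounce.in_use = false;
        }
    }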

>>>> However, it is not possible to do the same for ioctls.  This is actually
>>>> the reason why no one has ever tried to make scsi-generic do anything
>>>> but bounce-buffering. I think that your code breaks horribly in this
>>>> case, and I don't see a way to fix it, except for reverting to bounce
>>>> buffering.
> 
> Sure, you're right, there's no sensible way to break an ioctl()
> operation into many. However, I'd argue that we shouldn't need to, as it
> would be much better if the DMA operation was never restarted in the
> first place. Instead, if we dealt with the bigger issue of the global
> bounce buffer, we could kill two birds with one stone.
> 
> I see that there was an attempt [1] to replace the global bounce buffer
> with something more dynamic. In short, the objections [2] were the
> following:
> 
> 1. It introduced locking/unlocking a global mutex in the hot path as
>    well as a hash table lookup.
> 2. It allowed for unbounded memory allocations.
> 
> An improvement that would address (1) is to get rid of any global state:
> 
> Since the mapping operation takes place in the context of a DMA
> operation, we could provide a ctx-type struct to the dma_memory_(un)map
> --> address_space_(un)map functions that would contain a hash table. If
> any memory allocations were needed, we would track them using that hash
> table, which would require no locks. Moreover, if the initialization of
> the hash table hurts performance in the general case, we could instead
> use a skip list, provided the number of memory allocations is small
> (e.g. < 100).
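
(Sketched very roughly, the quoted proposal amounts to something like
the context below; all names are invented for illustration, and GLib's
GHashTable is used only because QEMU already links against GLib.)

    #include <glib.h>

    /* Hypothetical per-DMA-operation context: bounce allocations made
     * while mapping this one operation are tracked here, so there is no
     * global state and nothing to lock. */
    typedef struct {
        GHashTable *bounce_bufs;   /* mapped pointer -> bounce allocation */
    } DMAMapContext;

    static DMAMapContext *dma_map_context_new(void)
    {
        DMAMapContext *ctx = g_new0(DMAMapContext, 1);
        ctx->bounce_bufs = g_hash_table_new_full(g_direct_hash, g_direct_equal,
                                                 NULL, g_free);
        return ctx;
    }

    static void dma_map_context_free(DMAMapContext *ctx)
    {
        g_hash_table_destroy(ctx->bounce_bufs);
        g_free(ctx);
    }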

You don't need a hash table either if you manage the bounce buffer list
per DMA transfer, and the simplest way to achieve that is to move the
bounce buffer from exec.c to dma-helpers.c entirely.

The patch could first introduce address_space_map_direct that never uses
the bounce buffer.  dma-helpers.c can call address_space_map_direct and,
if it fails, proceed to allocate (and fill if writing to the device) a
bounce buffer.  Since the QEMUSGList is mapped and unmapped
beginning-to-end, you can just use a FIFO queue.  The FIFO queue stores
a (QEMUSGList, buffer) tuple.  When unmapping a QEMUSGList you check if
it matches the head of the queue; if it does, you write back the
contents of the bounce buffer (for reads from the device) and free it.
If it doesn't match, you call address_space_unmap.
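
A self-contained sketch of that bookkeeping, keeping one queue per DMA
transfer so the QEMUSGList half of the tuple stays implicit.  Every name
below is made up for the example; the real code would sit in
dma-helpers.c and wrap address_space_map_direct/address_space_unmap:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>

    /* Stand-ins for the direct-mapping path; in this toy sketch every
     * region is treated as non-direct so the bounce path is exercised. */
    static void *try_map_direct(void *addr, size_t len)
    {
        (void)addr; (void)len;
        return NULL;
    }
    static void unmap_direct(void *p, size_t len) { (void)p; (void)len; }

    typedef struct BounceEntry {
        void *orig;                 /* guest-visible address of the element */
        void *bounce;               /* temporary buffer handed to the I/O   */
        size_t len;
        struct BounceEntry *next;
    } BounceEntry;

    typedef struct {
        BounceEntry *head, *tail;   /* FIFO: mapped and unmapped in order */
        unsigned nentries, max_entries;
    } DMABounceQueue;

    /* Map one scatter-gather element; fall back to a bounce buffer when
     * the element cannot be mapped directly (e.g. it covers MMIO). */
    static void *dma_map_one(DMABounceQueue *q, void *addr, size_t len,
                             bool write_to_device)
    {
        void *p = try_map_direct(addr, len);
        if (p) {
            return p;
        }
        if (q->nentries >= q->max_entries) {
            return NULL;            /* bound the memory allocation */
        }
        BounceEntry *e = calloc(1, sizeof(*e));
        e->orig = addr;
        e->len = len;
        e->bounce = malloc(len);
        if (write_to_device) {
            memcpy(e->bounce, addr, len);   /* fill before writing out */
        }
        if (q->tail) {
            q->tail->next = e;
        } else {
            q->head = e;
        }
        q->tail = e;
        q->nentries++;
        return e->bounce;
    }

    /* Unmapping happens in the same order.  If the buffer is the head of
     * the queue it was bounced: write it back (for reads from the device)
     * and free it.  Otherwise it was a direct mapping. */
    static void dma_unmap_one(DMABounceQueue *q, void *p, size_t len,
                              bool read_from_device)
    {
        BounceEntry *e = q->head;
        if (e && e->bounce == p) {
            if (read_from_device) {
                memcpy(e->orig, e->bounce, len);
            }
            q->head = e->next;
            if (!q->head) {
                q->tail = NULL;
            }
            q->nentries--;
            free(e->bounce);
            free(e);
            return;
        }
        unmap_direct(p, len);
    }

Because mapping and unmapping follow the QEMUSGList order, the head check
is enough to tell bounced elements from direct ones, and the max_entries
field is one way to bound the memory allocation, as discussed below.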

Then, once the bounce buffer is implemented within dma-helpers.c, you
remove address_space_map and rename address_space_map_direct to
address_space_map.  cpu_register_map_client goes away.

The unbounded memory allocation can be avoided by bounding the number of
entries in the queue.  In the case of scsi-generic you could just as
well allow INT_MAX entries, because scsi-generic would do unbounded
memory allocation anyway for the bounce buffer.

Modulo the "& BDRV_SECTOR_MASK" issue, this actually seems simpler than
what this series was doing.

Paolo


