From: Andrea Arcangeli
Subject: Re: [Qemu-devel] [RFC 1/2] pci-dma-api-v1
Date: Sun, 30 Nov 2008 19:04:08 +0100

On Fri, Nov 28, 2008 at 09:03:06PM +0200, Blue Swirl wrote:
> There's also lio_listio that provides for vectored AIO.

I discussed this in my answer to Jamie: there is no LIO_READV/LIO_WRITEV,
so there is still no way to submit a 'struct iovec' to the kernel with it,
which is a must for performance with cache=off.
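
For reference, this is roughly how lio_listio has to be used today: every
struct aiocb carries a single aio_buf/aio_nbytes pair and the only opcodes
are LIO_READ/LIO_WRITE/LIO_NOP, so the best you can do with an iovec is
split it into one aiocb per segment. The helper below is only an
illustration of that limitation, not code from the patch:

#include <aio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/uio.h>

/* illustrative only: splitting an iovec into one aiocb per segment is the
   best lio_listio can do, since there is no LIO_READV/LIO_WRITEV opcode */
static int submit_iov_with_lio(int fd, off_t offset,
                               const struct iovec *iov, int iovcnt)
{
    struct aiocb *cbs = calloc(iovcnt, sizeof(*cbs));
    struct aiocb **list = calloc(iovcnt, sizeof(*list));
    int i, ret;

    for (i = 0; i < iovcnt; i++) {
        cbs[i].aio_fildes = fd;
        cbs[i].aio_offset = offset;           /* each segment needs its own offset */
        cbs[i].aio_buf = iov[i].iov_base;     /* one flat buffer per request */
        cbs[i].aio_nbytes = iov[i].iov_len;
        cbs[i].aio_lio_opcode = LIO_READ;
        list[i] = &cbs[i];
        offset += iov[i].iov_len;
    }
    /* one call, but iovcnt separate requests, not one vectored request */
    ret = lio_listio(LIO_WAIT, list, iovcnt, NULL);
    free(list);
    free(cbs);
    return ret;
}

Each of those segments ends up as an independent request, which is exactly
what you don't want when the guest hands you a single dma transfer.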

> >  > Anthony's second version:
> >  > http://lists.gnu.org/archive/html/qemu-devel/2008-04/msg00077.html

Actually this version of the emulated bdrv_writev/readv should run faster,
thanks to malloc+memcpy, than one with no memcpy at all but more syscalls.
I opted instead for an emulated bdrv_aio_readv/writev that does true
zerocopy. It doesn't make a whole lot of difference though, as neither
emulation should ever run on a host kernel that supports the readv/writev
syscalls; they only exist so we can test the rest of the zerocopy dma api.
The bdrv_aio_readv/writev support has to be kept separate from the pci dma
api anyway, and I fully intend to drop my version of bdrv_aio_readv/writev,
since I think all qemu targets support at least the pthread posix API and
the readv/writev syscalls, so hacks like my current _em shouldn't be needed.
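
To be clear about what the emulation buys: the idea is just to push the
whole iovec into a worker thread and let the host kernel's readv scatter
the data straight into the guest pages. This is only a minimal sketch of
that idea, not the _em code from the patch; the names are made up and all
the qemu aiocb plumbing, thread pooling and completion signalling back to
the iothread are left out:

#include <pthread.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* hypothetical emulation of a vectored aio read: one detached thread per
   request, lseek+readv on the host fd, then the completion callback.
   a shared fd would need per-request locking or a dup'ed fd, and the real
   code has to hand completion back to the iothread instead of calling the
   callback from the worker */
struct emu_aio_req {
    int fd;
    off_t offset;
    struct iovec *iov;
    int iovcnt;
    void (*complete)(void *opaque, int ret);
    void *opaque;
};

static void *emu_aio_worker(void *arg)
{
    struct emu_aio_req *req = arg;
    ssize_t ret;

    /* readv fills the guest pages directly: zerocopy, at the cost of an
       extra thread and an lseek per request */
    if (lseek(req->fd, req->offset, SEEK_SET) == (off_t)-1)
        ret = -1;
    else
        ret = readv(req->fd, req->iov, req->iovcnt);
    req->complete(req->opaque, ret < 0 ? -1 : 0);
    return NULL;
}

static int emu_aio_submit(struct emu_aio_req *req)
{
    pthread_t tid;

    if (pthread_create(&tid, NULL, emu_aio_worker, req))
        return -1;
    return pthread_detach(tid);
}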

> Perhaps you could point out why the previous attempts failed, but
> yours won't? ;-)

One can always hope to be luckier? ;)

Seriously, just apply my last patch to your qemu tree (kvm gets a reject in
the Makefile.target ppc section, but it will work on kvm too for x86*
targets) and try to test it. As an example, also look at the IDE code
below; quite an improvement over the current code IMHO, and it keeps all
aiocb knowledge outside of the dma API itself, as it has to be.

static int build_dma_sg(BMDMAState *bm)
{
    /* bus master IDE physical region descriptor: 32-bit buffer address,
       16-bit byte count in the low half of the second dword (0 means 64K),
       bit 31 set on the last entry of the table */
    struct {
        uint32_t addr;
        uint32_t size;
    } prd;
    int len;
    int idx;

    for (idx = 1; idx <= IDE_DMA_BUF_SECTORS; idx++) {
        cpu_physical_memory_read(bm->cur_addr, (uint8_t *)&prd, 8);
        bm->cur_addr += 8;
        bm->sg[idx-1].addr = le32_to_cpu(prd.addr);
        prd.size = le32_to_cpu(prd.size);
        len = prd.size & 0xfffe;
        if (len == 0)
            len = 0x10000;
        bm->sg[idx-1].len = len;
        /* end of table (with a fail safe of one page) */
        if ((prd.size & 0x80000000) ||
            (bm->cur_addr - bm->addr) >= 4096)
            break;
    }
    if (idx > IDE_DMA_BUF_SECTORS)
        printf("build_dma_sg: too many sg entries\n");
    return idx;
}

static void ide_dma_complete(void *opaque, int ret)
{
    BMDMAState *bm = opaque;
    IDEState *s = bm->ide_if;

    bm->bdrv_aio_iov = NULL;
    bm->ide_if = NULL;
    bm->aiocb = NULL;
    /* end of transfer ? */
    if (s->nsector == 0 && !ret) {
        s->status = READY_STAT | SEEK_STAT;
        ide_set_irq(s);
        bm->status &= ~BM_STATUS_DMAING;
        bm->status |= BM_STATUS_INT;
    } else {
        ide_dma_error(s);
        printf("ide_dma_complete error: nsector %d err %d\n", s->nsector, ret);
    }
}

static int ide_dma_submit(void *opaque, struct iovec *dma_iov,
                          int iovcnt, size_t len,
                          BlockDriverCompletionFunc dma_cb,
                          void *dma_cb_param)
{
    BMDMAState *bm = opaque;
    IDEState *s = bm->ide_if;
    size_t sectors;
    int64_t sector_num;

    sectors = len >> 9;
    if (s->nsector < sectors)
        return -3000;
    sector_num = ide_get_sector(s);
    ide_set_sector(s, sector_num  + sectors);
    s->nsector -= sectors;

#ifdef DEBUG_AIO
    printf("ide_dma_submit_write: sector_num=%lld n=%d\n", sector_num, sectors);
#endif
    bm->aiocb = bm->bdrv_aio_iov(s->bs, sector_num, dma_iov, iovcnt, len,
                                 dma_cb, dma_cb_param);
    if (!bm->aiocb)
        return -3001;

    return 0;
}
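
To show how the pieces are meant to hang together: build_dma_sg only
produces guest-physical sg entries, the dma api maps them and calls
ide_dma_submit with a host 'struct iovec', and ide_dma_complete runs once
the aio finishes. Something along these lines, where pci_dma_sg_start is a
made-up name standing in for the real entry point from the pci dma api
patch:

static void ide_dma_start_sketch(BMDMAState *bm)
{
    int nr_sg = build_dma_sg(bm);

    /* pci_dma_sg_start is hypothetical, only to show the shape of the call:
       the dma api owns the guest memory mapping, the IDE code only ever
       sees guest-physical sg entries plus the two callbacks above, and the
       aiocb stays entirely on the IDE side inside ide_dma_submit */
    pci_dma_sg_start(bm->sg, nr_sg,
                     ide_dma_submit, bm,     /* called with the host iovec */
                     ide_dma_complete, bm);  /* called when the aio is done */
}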



