qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH 3/5] Implement fw_cfg DMA interface


From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH 3/5] Implement fw_cfg DMA interface
Date: Thu, 6 Aug 2015 22:49:12 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.1.0

On 08/06/15 13:01, Marc Marí wrote:
> Based on the specifications on docs/specs/fw_cfg.txt
> 
> This interface is an addon. The old interface can still be used as usual.
> 
> For x86 arch, this addon will use one extra port (0x512). For ARM, it will
> use 8 more bytes, following the actual implementation.
> 
> Based on Gerd Hoffman's initial implementation.
> 
> Signed-off-by: Marc Marí <address@hidden>
> ---
>  hw/arm/virt.c             |   2 +-
>  hw/nvram/fw_cfg.c         | 212 
> +++++++++++++++++++++++++++++++++++++++++++---
>  include/hw/nvram/fw_cfg.h |  12 ++-
>  3 files changed, 211 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 4846892..374660c 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -612,7 +612,7 @@ static void create_fw_cfg(const VirtBoardInfo *vbi)
>      hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
>      char *nodename;
>  
> -    fw_cfg_init_mem_wide(base + 8, base, 8);
> +    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
>  
>      nodename = g_strdup_printf("/address@hidden" PRIx64, base);
>      qemu_fdt_add_subnode(vbi->fdt, nodename);
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index 88481b7..7399008 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -23,6 +23,7 @@
>   */
>  #include "hw/hw.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/dma.h"
>  #include "hw/isa/isa.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/sysbus.h"
> @@ -30,10 +31,13 @@
>  #include "qemu/error-report.h"
>  #include "qemu/config-file.h"
>  
> -#define FW_CFG_SIZE 2
> +#define FW_CFG_SIZE 3

I see a problem with this. FW_CFG_SIZE controls two things:

- the size of the region that hosts the combined ioport range, for the
  ioport-mapped device. See fw_cfg_io_realize().

- the size of the region that hosts the control (ie. selector)
  register, for the memory-mapped device. See fw_cfg_mem_realize() --
  "fwcfg.ctl".

Setting FW_CFG_SIZE to 3 is incorrect for both cases, but for different
reasons.

- For the ioport-mapped device, the port range should be extended by 4,
  not 1.

  I do realize that the current code works, ie. it is not enforced by
  the memory API that a 4 byte wide outl() at 0x512 actually fit in the
  containing region -- only the start address and the size (treated in
  isolation) matters.

  Sometimes this is even "exploited" for the greater good. For example
  the 4-byte wide ioport at 0xCF8 (PCI config address) happily coexists
  with the 1-byte wide ioport at 0xCF9 (reset control register). The
  overlap doesn't confuse anything because the access is dispatched by
  start address, and the size of the access is "just" auxiliary
  information.

  However, I think this is ugly nonetheless. If we intend for the guest
  to issue outl() against 0x512, then we should extend the region up to
  and including 0x515.

  - Separate problem to be mentioned here: according to the
    documentation, the DMA address register should receive the address
    of the FWCfgDmaAccess structure. Now consider a 16 GB guest that
    allocates that structure above 4GB. There is simply no IO
    instruction ("outq") that could transfer that address in one go.

    However, fw_cfg_comb_write(), even the patched version, insists on
    that. (See it lower down in the patch.) If there's an access at
    0x512, fw_cfg_dma_transfer() is immediately fired off at the
    control structure that exists at the address. Too bad that the four
    most significant bytes of the address will always be zero, because
    the guest has no means to communicate otherwise.

    While this is probably not an issue for SeaBIOS, it certainly would
    be for OVMF. (Not that I intend to hook the feature into OVMF (ie.
    x86) any time soon, but still.) If this is a limitation by design,
    that's fine, but then it should be documented. (In OVMF low memory
    can be easily allocated, but whoever writes that code will have to
    know.)

- FW_CFG_SIZE := 3 is also a problem for the memory mapped device. The
  size of the traditional selector port -- see fw_cfg_mem_realize() --
  should not change.

The upshot is that in this patch series you should split apart the two
uses of FW_CFG_SIZE. In fact, at the moment it should be more correctly
called "FW_CFG_CTL_SIZE". At the moment FW_CFG_SIZE is used for
fw_cfg_io_realize() too because the control register's size dominates /
determines the entire ioport range.

So, you could consider dropping FW_CFG_SIZE altogether, and open-code
the separate sizes in the respective functions (fw_cfg_io_realize,
fw_cfg_mem_realize.)

Let's continue:

>  #define FW_CFG_NAME "fw_cfg"
>  #define FW_CFG_PATH "/machine/" FW_CFG_NAME
>  
> +#define FW_CFG_VERSION      1
> +#define FW_CFG_VERSION_DMA  2
> +

As discussed, this should be a bitmap of features instead.

>  #define TYPE_FW_CFG     "fw_cfg"
>  #define TYPE_FW_CFG_IO  "fw_cfg_io"
>  #define TYPE_FW_CFG_MEM "fw_cfg_mem"
> @@ -42,6 +46,11 @@
>  #define FW_CFG_IO(obj)  OBJECT_CHECK(FWCfgIoState,  (obj), TYPE_FW_CFG_IO)
>  #define FW_CFG_MEM(obj) OBJECT_CHECK(FWCfgMemState, (obj), TYPE_FW_CFG_MEM)
>  
> +/* FW_CFG_DMA_CONTROL bits */
> +#define FW_CFG_DMA_CTL_ERROR   0x01
> +#define FW_CFG_DMA_CTL_READ    0x02
> +#define FW_CFG_DMA_CTL_MASK    0x03
> +
>  typedef struct FWCfgEntry {
>      uint32_t len;
>      uint8_t *data;
> @@ -59,6 +68,10 @@ struct FWCfgState {
>      uint16_t cur_entry;
>      uint32_t cur_offset;
>      Notifier machine_ready;
> +
> +    bool       dma_enabled;
> +    AddressSpace *dma_as;
> +    dma_addr_t dma_addr;
>  };
>  
>  struct FWCfgIoState {
> @@ -75,7 +88,7 @@ struct FWCfgMemState {
>      FWCfgState parent_obj;
>      /*< public >*/
>  
> -    MemoryRegion ctl_iomem, data_iomem;
> +    MemoryRegion ctl_iomem, data_iomem, dma_iomem;
>      uint32_t data_width;
>      MemoryRegionOps wide_data_ops;
>  };
> @@ -294,6 +307,108 @@ static void fw_cfg_data_mem_write(void *opaque, hwaddr 
> addr,
>      } while (i);
>  }
>  
> +static void fw_cfg_dma_transfer(FWCfgState *s)
> +{
> +    dma_addr_t len;
> +    uint8_t *ptr;
> +    FWCfgDmaAccess *dma;
> +    int arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
> +    FWCfgEntry *e = &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];
> +
> +    len = sizeof(*dma);
> +    dma = dma_memory_map(s->dma_as, s->dma_addr, &len,
> +                                DMA_DIRECTION_FROM_DEVICE);
> +
> +    if (!dma || !len) {
> +        return;
> +    }
> +
> +    if (dma->control & FW_CFG_DMA_CTL_ERROR) {
> +        return;
> +    }
> +
> +    if (!(dma->control & FW_CFG_DMA_CTL_READ)) {
> +        return;
> +    }
> +
> +    while (dma->length > 0) {
> +        if (s->cur_entry == FW_CFG_INVALID || !e->data ||
> +                                    s->cur_offset >= e->len) {

whitespace damage

I'm skipping the loop body below for now (I'm too tired and I'd like to
focus on the interface first):

> +            len = dma->length;
> +            if (dma->address) {
> +                ptr = dma_memory_map(s->dma_as, dma->address, &len,
> +                                     DMA_DIRECTION_FROM_DEVICE);
> +                if (!ptr || !len) {
> +                    dma->control |= FW_CFG_DMA_CTL_ERROR;
> +                    goto out;
> +                }
> +
> +                memset(ptr, 0, len);
> +
> +                dma_memory_unmap(s->dma_as, ptr, len,
> +                                 DMA_DIRECTION_FROM_DEVICE, len);
> +            }
> +
> +            dma->address += len;
> +            dma->length  -= len;
> +        } else {
> +            if (dma->length <= e->len) {
> +                len = dma->length;
> +            } else {
> +                len = e->len;
> +            }
> +
> +            if (e->read_callback) {
> +                e->read_callback(e->callback_opaque, s->cur_offset);
> +            }
> +
> +            if (dma->address) {
> +                ptr = dma_memory_map(s->dma_as, dma->address, &len,
> +                                     DMA_DIRECTION_FROM_DEVICE);
> +                if (!ptr || !len) {
> +                    dma->control |= FW_CFG_DMA_CTL_ERROR;
> +                    goto out;
> +                }
> +
> +                memcpy(ptr, &e->data[s->cur_offset], len);
> +
> +                dma_memory_unmap(s->dma_as, ptr, len,
> +                                 DMA_DIRECTION_FROM_DEVICE, len);
> +            }
> +
> +            s->cur_offset += len;
> +
> +            dma->address += len;
> +            dma->length  -= len;
> +
> +            dma->control = 0;
> +        }
> +    }
> +
> +    trace_fw_cfg_read(s, 0);
> +
> +out:
> +    dma_memory_unmap(s->dma_as, dma, sizeof(*dma),
> +                                DMA_DIRECTION_FROM_DEVICE, sizeof(*dma));
> +    return;
> +
> +}
> +
> +static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
> +                                 uint64_t value, unsigned size)
> +{
> +    FWCfgState *s = opaque;
> +
> +    s->dma_addr = be64_to_cpu(value);
> +    fw_cfg_dma_transfer(s);
> +}

So, this is similar to the ioport size limitation I described at the
top. Namely,  I think that an Aarch32 guest won't be able to transfer a
64-bit value with a single MMIO access. (I believe double-width store
instructions do exist, but they cannot be virtualized well. They trap,
but the instruction syndrome register won't give enough info to the
hypervisor.)

Therefore, the address of the dma control structure should be passed in
two 32-bit wide accesses, both for the ioport mapping and the mmio
mapping. This can be done in two ways:
- write the two halves to the same register, and use a latch to
  identify each 2nd access
- use different addresses.

The latch sucks, because the guest has no way to bring the register to a
known good state. Therefore:

In the ioport mapped case, the port range should go up to 0x519, and two
outl's are going to be necessary in the guest. The documentation should
spell out which outl (@ 0x512 or @ 0x516) will trigger the actual transfer.

(I vaguely recall that someone already described this, but I can't find
the message!)

In the memory mapped case: same thing.

> +
> +static bool fw_cfg_dma_mem_valid(void *opaque, hwaddr addr,
> +                                  unsigned size, bool is_write)
> +{
> +    return is_write && addr == 0;
> +}
> +

so this should change...

>  static bool fw_cfg_data_mem_valid(void *opaque, hwaddr addr,
>                                    unsigned size, bool is_write)
>  {
> @@ -321,12 +436,20 @@ static uint64_t fw_cfg_comb_read(void *opaque, hwaddr 
> addr,
>  static void fw_cfg_comb_write(void *opaque, hwaddr addr,
>                                uint64_t value, unsigned size)
>  {
> -    switch (size) {
> +    FWCfgState *s;
> +
> +    switch (addr) {
> +    case 0:
> +        fw_cfg_select(opaque, (uint16_t)value);
> +        break;
>      case 1:
>          fw_cfg_write(opaque, (uint8_t)value);
>          break;
>      case 2:
> -        fw_cfg_select(opaque, (uint16_t)value);
> +        /* FWCfgDmaAccess address */
> +        s = opaque;
> +        s->dma_addr = le64_to_cpu(value);
> +        fw_cfg_dma_transfer(s);
>          break;
>      }
>  }
> @@ -334,7 +457,11 @@ static void fw_cfg_comb_write(void *opaque, hwaddr addr,
>  static bool fw_cfg_comb_valid(void *opaque, hwaddr addr,
>                                    unsigned size, bool is_write)
>  {
> -    return (size == 1) || (is_write && size == 2);
> +    FWCfgState *s = opaque;
> +
> +    return (is_write && size == 2 && addr == 0) ||
> +            (size == 1 && addr == 1) ||
> +            (s->dma_enabled && is_write && addr == 2);
>  }

I get the idea.
- Write the selector (2 byte wide) at ioport offset 0,
- or read the data (1 byte wide) at ioport offset 1,
- or, if DMA is enabled, write the DMA control struct's address at
  ioport offset 2, "as wide as you can". Problem is, that's not wide
  enough.

Additionally, this incurs a guest visible change, for a corner case.
Before the patch, it used to be valid for the guest to outb() at ioport
offset 0. It was dispatched to fw_cfg_write(), which meant "ignore
silently".

We could have dropped the fw_cfg_write() function earlier (when we
turned it into the currently seen no-op). We didn't, because then such a
write access would have injected a guest-visible fault (at least on some
target architectures).

And that's what the above achieves too (probably involuntarily): an
outb() at ioport offset 0 is now a guest visible trap (because
fw_cfg_comb_valid() rejects it).

Even though this is quite obscure, I think we should preserve the
original behavior.

>  
>  static const MemoryRegionOps fw_cfg_ctl_mem_ops = {
> @@ -361,6 +488,12 @@ static const MemoryRegionOps fw_cfg_comb_mem_ops = {
>      .valid.accepts = fw_cfg_comb_valid,
>  };
>  
> +static const MemoryRegionOps fw_cfg_dma_mem_ops = {
> +    .write = fw_cfg_dma_mem_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid.accepts = fw_cfg_dma_mem_valid,
> +};
> +

The docs should say that the address of the DMA control struct is taken
as big endian. (And, in two halves, see above.)

>  static void fw_cfg_reset(DeviceState *d)
>  {
>      FWCfgState *s = FW_CFG(d);
> @@ -401,6 +534,22 @@ static bool is_version_1(void *opaque, int version_id)
>      return version_id == 1;
>  }
>  
> +static bool fw_cfg_dma_enabled(void *opaque)
> +{
> +    FWCfgState *s = opaque;
> +
> +    return s->dma_enabled;
> +}
> +
> +static VMStateDescription vmstate_fw_cfg_dma = {
> +    .name = "fw_cfg/dma",
> +    .needed = fw_cfg_dma_enabled,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(dma_addr, FWCfgState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
>  static const VMStateDescription vmstate_fw_cfg = {
>      .name = "fw_cfg",
>      .version_id = 2,
> @@ -410,6 +559,10 @@ static const VMStateDescription vmstate_fw_cfg = {
>          VMSTATE_UINT16_HACK(cur_offset, FWCfgState, is_version_1),
>          VMSTATE_UINT32_V(cur_offset, FWCfgState, 2),
>          VMSTATE_END_OF_LIST()
> +    },
> +    .subsections = (const VMStateDescription*[]) {
> +        &vmstate_fw_cfg_dma,
> +        NULL,
>      }
>  };
>  
> @@ -595,7 +748,6 @@ static void fw_cfg_init1(DeviceState *dev)
>      qdev_init_nofail(dev);
>  
>      fw_cfg_add_bytes(s, FW_CFG_SIGNATURE, (char *)"QEMU", 4);
> -    fw_cfg_add_i32(s, FW_CFG_ID, 1);
>      fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
>      fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)(display_type == 
> DT_NOGRAPHIC));
>      fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
> @@ -607,22 +759,42 @@ static void fw_cfg_init1(DeviceState *dev)
>      qemu_add_machine_init_done_notifier(&s->machine_ready);
>  }
>  
> -FWCfgState *fw_cfg_init_io(uint32_t iobase)
> +FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, AddressSpace *dma_as)
>  {
>      DeviceState *dev;
> +    FWCfgState *s;
>  
>      dev = qdev_create(NULL, TYPE_FW_CFG_IO);
>      qdev_prop_set_uint32(dev, "iobase", iobase);
> +
> +    s = FW_CFG(dev);
>      fw_cfg_init1(dev);
>  
> -    return FW_CFG(dev);
> +    if (dma_as) {
> +        /* 64 bits for the address field */
> +        s->dma_as = dma_as;
> +        s->dma_enabled = true;
> +
> +        fw_cfg_add_i32(s, FW_CFG_ID, FW_CFG_VERSION_DMA);
> +    } else {
> +        fw_cfg_add_i32(s, FW_CFG_ID, FW_CFG_VERSION);
> +    }

(should be a feature bitmap)

> +
> +    return s;
>  }
>  
> -FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
> -                                 uint32_t data_width)
> +FWCfgState *fw_cfg_init_io(uint32_t iobase)
> +{
> +    return fw_cfg_init_io_dma(iobase, NULL);
> +}
> +
> +FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
> +                                 hwaddr data_addr, uint32_t data_width,
> +                                 hwaddr dma_addr, AddressSpace *dma_as)
>  {
>      DeviceState *dev;
>      SysBusDevice *sbd;
> +    FWCfgState *s;
>  
>      dev = qdev_create(NULL, TYPE_FW_CFG_MEM);
>      qdev_prop_set_uint32(dev, "data_width", data_width);
> @@ -633,13 +805,25 @@ FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, 
> hwaddr data_addr,
>      sysbus_mmio_map(sbd, 0, ctl_addr);
>      sysbus_mmio_map(sbd, 1, data_addr);
>  
> -    return FW_CFG(dev);
> +    s = FW_CFG(dev);
> +
> +    if (dma_addr && dma_as) {
> +        s->dma_as = dma_as;
> +        s->dma_enabled = true;
> +        sysbus_mmio_map(sbd, 2, dma_addr);
> +        fw_cfg_add_i32(s, FW_CFG_ID, FW_CFG_VERSION_DMA);
> +    } else {
> +        fw_cfg_add_i32(s, FW_CFG_ID, FW_CFG_VERSION);
> +    }

(should be a feature bitmap)


> +
> +    return s;
>  }
>  
>  FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr data_addr)
>  {
>      return fw_cfg_init_mem_wide(ctl_addr, data_addr,
> -                                fw_cfg_data_mem_ops.valid.max_access_size);
> +                                fw_cfg_data_mem_ops.valid.max_access_size,
> +                                0, NULL);
>  }
>  
>  
> @@ -725,6 +909,10 @@ static void fw_cfg_mem_realize(DeviceState *dev, Error 
> **errp)
>      memory_region_init_io(&s->data_iomem, OBJECT(s), data_ops, FW_CFG(s),
>                            "fwcfg.data", data_ops->valid.max_access_size);
>      sysbus_init_mmio(sbd, &s->data_iomem);
> +
> +    memory_region_init_io(&s->dma_iomem, OBJECT(s), &fw_cfg_dma_mem_ops,
> +                          FW_CFG(s), "fwcfg.dma", sizeof(dma_addr_t));

I'd prefer (2 * sizeof(uint32_t)).

> +    sysbus_init_mmio(sbd, &s->dma_iomem);
>  }
>  
>  static void fw_cfg_mem_class_init(ObjectClass *klass, void *data)
> diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
> index e60d3ca..4ce51e2 100644
> --- a/include/hw/nvram/fw_cfg.h
> +++ b/include/hw/nvram/fw_cfg.h
> @@ -61,6 +61,12 @@ typedef struct FWCfgFiles {
>      FWCfgFile f[];
>  } FWCfgFiles;
>  
> +typedef struct FWCfgDmaAccess {
> +    uint64_t address;
> +    uint32_t length;
> +    uint32_t control;
> +} FWCfgDmaAccess;
> +

Should be QEMU_PACKED, if only for documentation purposes.

Thanks
Laszlo

>  typedef void (*FWCfgCallback)(void *opaque, uint8_t *data);
>  typedef void (*FWCfgReadCallback)(void *opaque, uint32_t offset);
>  
> @@ -77,10 +83,12 @@ void fw_cfg_add_file_callback(FWCfgState *s, const char 
> *filename,
>                                void *data, size_t len);
>  void *fw_cfg_modify_file(FWCfgState *s, const char *filename, void *data,
>                           size_t len);
> +FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, AddressSpace *dma_as);
>  FWCfgState *fw_cfg_init_io(uint32_t iobase);
>  FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr data_addr);
> -FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
> -                                 uint32_t data_width);
> +FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
> +                                 hwaddr data_addr, uint32_t data_width,
> +                                 hwaddr dma_addr, AddressSpace *dma_as);
>  
>  FWCfgState *fw_cfg_find(void);
>  
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]