qemu-riscv
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation


From: Frank Chang
Subject: Re: [PATCH v2 03/15] hw/riscv: add RISC-V IOMMU base emulation
Date: Thu, 16 May 2024 15:13:20 +0800

On Mon, May 13, 2024 at 8:37 PM Daniel Henrique Barboza <dbarboza@ventanamicro.com> wrote:
Hi Frank,


On 5/8/24 08:15, Daniel Henrique Barboza wrote:
> Hi Frank,
>
> I'll reply with that I've done so far. Still missing some stuff:
>
> On 5/2/24 08:37, Frank Chang wrote:
>> Hi Daniel,
>>
>> Daniel Henrique Barboza <dbarboza@ventanamicro.com> 於 2024年3月8日 週五 上午12:04寫道:
>>>
>>> From: Tomasz Jeznach <tjeznach@rivosinc.com>
>>>
>>> The RISC-V IOMMU specification is now ratified as-per the RISC-V
>>> international process. The latest frozen specifcation can be found
>>> at:
>>>
>>> https://github.com/riscv-non-isa/riscv-iommu/releases/download/v1.0/riscv-iommu.pdf
>>>
>>> Add the foundation of the device emulation for RISC-V IOMMU, which
>>> includes an IOMMU that has no capabilities but MSI interrupt support and
>>> fault queue interfaces. We'll add add more features incrementally in the
>>> next patches.
>>>
>>> Co-developed-by: Sebastien Boeuf <seb@rivosinc.com>
>>> Signed-off-by: Sebastien Boeuf <seb@rivosinc.com>
>>> Signed-off-by: Tomasz Jeznach <tjeznach@rivosinc.com>
>>> Signed-off-by: Daniel Henrique Barboza <dbarboza@ventanamicro.com>
>>> ---
>>>   hw/riscv/Kconfig         |    4 +

(...)

>>> +
>>> +    s->iommus.le_next = NULL;
>>> +    s->iommus.le_prev = NULL;
>>> +    QLIST_INIT(&s->spaces);
>>> +    qemu_cond_init(&s->core_cond);
>>> +    qemu_mutex_init(&s->core_lock);
>>> +    qemu_spin_init(&s->regs_lock);
>>> +    qemu_thread_create(&s->core_proc, "riscv-iommu-core",
>>> +        riscv_iommu_core_proc, s, QEMU_THREAD_JOINABLE);
>>
>> In our experience, using QEMU thread increases the latency of command
>> queue processing,
>> which leads to the potential IOMMU fence timeout in the Linux driver
>> when using IOMMU with KVM,
>> e.g. booting the guest Linux.
>>
>> Is it possible to remove the thread from the IOMMU just like ARM, AMD,
>> and Intel IOMMU models?
>
> Interesting. We've been using this emulation internally in Ventana, with
> KVM and VFIO, and didn't experience this issue. Drew is on CC and can talk
> more about it.
>
> That said, I don't mind this change, assuming it's feasible to make it for this
> first version.  I'll need to check it how other IOMMUs are doing it.


I removed the threading and it seems to be working fine without it. I'll commit this
change for v3.

>
>
>
>>
>>> +}
>>> +
>
> (...)
>
>>> +
>>> +static AddressSpace *riscv_iommu_find_as(PCIBus *bus, void *opaque, int devfn)
>>> +{
>>> +    RISCVIOMMUState *s = (RISCVIOMMUState *) opaque;
>>> +    PCIDevice *pdev = pci_find_device(bus, pci_bus_num(bus), devfn);
>>> +    AddressSpace *as = NULL;
>>> +
>>> +    if (pdev && pci_is_iommu(pdev)) {
>>> +        return s->target_as;
>>> +    }
>>> +
>>> +    /* Find first registered IOMMU device */
>>> +    while (s->iommus.le_prev) {
>>> +        s = *(s->iommus.le_prev);
>>> +    }
>>> +
>>> +    /* Find first matching IOMMU */
>>> +    while (s != NULL && as == NULL) {
>>> +        as = riscv_iommu_space(s, PCI_BUILD_BDF(pci_bus_num(bus), devfn));
>>
>> For pci_bus_num(),
>> riscv_iommu_find_as() can be called at the very early stage
>> where software has no chance to enumerate the bus numbers.

I investigated and this doesn't seem to be a problem. This function is called at the
last step of the realize() steps of both riscv_iommu_pci_realize() and
riscv_iommu_sys_realize(), and by that time the pci_bus_num() is already assigned.
Other iommus use pci_bus_num() into their own get_address_space() callbacks like
this too.

Hi Daniel,

IIUC, pci_bus_num() by default is assigned to pcibus_num():

static int pcibus_num(PCIBus *bus)
{
    if (pci_bus_is_root(bus)) {
        return 0; /* pci host bridge */
    }
    return bus->parent_dev->config[PCI_SECONDARY_BUS];
}

If the bus is not the root bus, it tries to read the bus' parent device's
secondary bus number (PCI_SECONDARY_BUS) field in the PCI configuration space.
This field should be programmable by the SW during PCIe enumeration.
But I don't think SW has a chance to be executed before riscv_iommu_sys_realize() is called,
since it's pretty early before CPU's execution unless RISC-V IOMMU is hot-plugged.
Even if RISC-V IOMMU is hot-plugged, I think riscv_iommu_sys_realize() is still called
before SW aware of the existence of IOMMU on the PCI topology tree.

Do you think this matches your observation? 

Regards,
Frank Chang
 


Thanks,


Daniel


>
> I'll see how other IOMMUs are handling their iommu_find_as()
>
>
> Thanks,
>
>
> Daniel
>
>
>>
>>
>>
>>
>>> +        s = s->iommus.le_next;
>>> +    }
>>> +
>>> +    return as ? as : &address_space_memory;
>>> +}
>>> +
>>> +static const PCIIOMMUOps riscv_iommu_ops = {
>>> +    .get_address_space = riscv_iommu_find_as,
>>> +};
>>> +
>>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>>> +        Error **errp)
>>> +{
>>> +    if (bus->iommu_ops &&
>>> +        bus->iommu_ops->get_address_space == riscv_iommu_find_as) {
>>> +        /* Allow multiple IOMMUs on the same PCIe bus, link known devices */
>>> +        RISCVIOMMUState *last = (RISCVIOMMUState *)bus->iommu_opaque;
>>> +        QLIST_INSERT_AFTER(last, iommu, iommus);
>>> +    } else if (bus->iommu_ops == NULL) {
>>> +        pci_setup_iommu(bus, &riscv_iommu_ops, iommu);
>>> +    } else {
>>> +        error_setg(errp, "can't register secondary IOMMU for PCI bus #%d",
>>> +            pci_bus_num(bus));
>>> +    }
>>> +}
>>> +
>>> +static int riscv_iommu_memory_region_index(IOMMUMemoryRegion *iommu_mr,
>>> +    MemTxAttrs attrs)
>>> +{
>>> +    return attrs.unspecified ? RISCV_IOMMU_NOPASID : (int)attrs.pasid;
>>> +}
>>> +
>>> +static int riscv_iommu_memory_region_index_len(IOMMUMemoryRegion *iommu_mr)
>>> +{
>>> +    RISCVIOMMUSpace *as = container_of(iommu_mr, RISCVIOMMUSpace, iova_mr);
>>> +    return 1 << as->iommu->pasid_bits;
>>> +}
>>> +
>>> +static void riscv_iommu_memory_region_init(ObjectClass *klass, void *data)
>>> +{
>>> +    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
>>> +
>>> +    imrc->translate = riscv_iommu_memory_region_translate;
>>> +    imrc->notify_flag_changed = riscv_iommu_memory_region_notify;
>>> +    imrc->attrs_to_index = riscv_iommu_memory_region_index;
>>> +    imrc->num_indexes = riscv_iommu_memory_region_index_len;
>>> +}
>>> +
>>> +static const TypeInfo riscv_iommu_memory_region_info = {
>>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
>>> +    .name = TYPE_RISCV_IOMMU_MEMORY_REGION,
>>> +    .class_init = riscv_iommu_memory_region_init,
>>> +};
>>> +
>>> +static void riscv_iommu_register_mr_types(void)
>>> +{
>>> +    type_register_static(&riscv_iommu_memory_region_info);
>>> +    type_register_static(&riscv_iommu_info);
>>> +}
>>> +
>>> +type_init(riscv_iommu_register_mr_types);
>>> diff --git a/hw/riscv/riscv-iommu.h b/hw/riscv/riscv-iommu.h
>>> new file mode 100644
>>> index 0000000000..6f740de690
>>> --- /dev/null
>>> +++ b/hw/riscv/riscv-iommu.h
>>> @@ -0,0 +1,141 @@
>>> +/*
>>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>>> + *
>>> + * Copyright (C) 2022-2023 Rivos Inc.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along
>>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#ifndef HW_RISCV_IOMMU_STATE_H
>>> +#define HW_RISCV_IOMMU_STATE_H
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qom/object.h"
>>> +
>>> +#include "hw/riscv/iommu.h"
>>> +
>>> +struct RISCVIOMMUState {
>>> +    /*< private >*/
>>> +    DeviceState parent_obj;
>>> +
>>> +    /*< public >*/
>>> +    uint32_t version;     /* Reported interface version number */
>>> +    uint32_t pasid_bits;  /* process identifier width */
>>> +    uint32_t bus;         /* PCI bus mapping for non-root endpoints */
>>> +
>>> +    uint64_t cap;         /* IOMMU supported capabilities */
>>> +    uint64_t fctl;        /* IOMMU enabled features */
>>> +
>>> +    bool enable_off;      /* Enable out-of-reset OFF mode (DMA disabled) */
>>> +    bool enable_msi;      /* Enable MSI remapping */
>>> +
>>> +    /* IOMMU Internal State */
>>> +    uint64_t ddtp;        /* Validated Device Directory Tree Root Pointer */
>>> +
>>> +    dma_addr_t cq_addr;   /* Command queue base physical address */
>>> +    dma_addr_t fq_addr;   /* Fault/event queue base physical address */
>>> +    dma_addr_t pq_addr;   /* Page request queue base physical address */
>>> +
>>> +    uint32_t cq_mask;     /* Command queue index bit mask */
>>> +    uint32_t fq_mask;     /* Fault/event queue index bit mask */
>>> +    uint32_t pq_mask;     /* Page request queue index bit mask */
>>> +
>>> +    /* interrupt notifier */
>>> +    void (*notify)(RISCVIOMMUState *iommu, unsigned vector);
>>> +
>>> +    /* IOMMU State Machine */
>>> +    QemuThread core_proc; /* Background processing thread */
>>> +    QemuMutex core_lock;  /* Global IOMMU lock, used for cache/regs updates */
>>> +    QemuCond core_cond;   /* Background processing wake up signal */
>>> +    unsigned core_exec;   /* Processing thread execution actions */
>>> +
>>> +    /* IOMMU target address space */
>>> +    AddressSpace *target_as;
>>> +    MemoryRegion *target_mr;
>>> +
>>> +    /* MSI / MRIF access trap */
>>> +    AddressSpace trap_as;
>>> +    MemoryRegion trap_mr;
>>> +
>>> +    GHashTable *ctx_cache;          /* Device translation Context Cache */
>>> +
>>> +    /* MMIO Hardware Interface */
>>> +    MemoryRegion regs_mr;
>>> +    QemuSpin regs_lock;
>>> +    uint8_t *regs_rw;  /* register state (user write) */
>>> +    uint8_t *regs_wc;  /* write-1-to-clear mask */
>>> +    uint8_t *regs_ro;  /* read-only mask */
>>> +
>>> +    QLIST_ENTRY(RISCVIOMMUState) iommus;
>>> +    QLIST_HEAD(, RISCVIOMMUSpace) spaces;
>>> +};
>>> +
>>> +void riscv_iommu_pci_setup_iommu(RISCVIOMMUState *iommu, PCIBus *bus,
>>> +         Error **errp);
>>> +
>>> +/* private helpers */
>>> +
>>> +/* Register helper functions */
>>> +static inline uint32_t riscv_iommu_reg_mod32(RISCVIOMMUState *s,
>>> +    unsigned idx, uint32_t set, uint32_t clr)
>>> +{
>>> +    uint32_t val;
>>> +    qemu_spin_lock(&s->regs_lock);
>>> +    val = ldl_le_p(s->regs_rw + idx);
>>> +    stl_le_p(s->regs_rw + idx, (val & ~clr) | set);
>>> +    qemu_spin_unlock(&s->regs_lock);
>>> +    return val;
>>> +}
>>> +
>>> +static inline void riscv_iommu_reg_set32(RISCVIOMMUState *s,
>>> +    unsigned idx, uint32_t set)
>>> +{
>>> +    qemu_spin_lock(&s->regs_lock);
>>> +    stl_le_p(s->regs_rw + idx, set);
>>> +    qemu_spin_unlock(&s->regs_lock);
>>> +}
>>> +
>>> +static inline uint32_t riscv_iommu_reg_get32(RISCVIOMMUState *s,
>>> +    unsigned idx)
>>> +{
>>> +    return ldl_le_p(s->regs_rw + idx);
>>> +}
>>> +
>>> +static inline uint64_t riscv_iommu_reg_mod64(RISCVIOMMUState *s,
>>> +    unsigned idx, uint64_t set, uint64_t clr)
>>> +{
>>> +    uint64_t val;
>>> +    qemu_spin_lock(&s->regs_lock);
>>> +    val = ldq_le_p(s->regs_rw + idx);
>>> +    stq_le_p(s->regs_rw + idx, (val & ~clr) | set);
>>> +    qemu_spin_unlock(&s->regs_lock);
>>> +    return val;
>>> +}
>>> +
>>> +static inline void riscv_iommu_reg_set64(RISCVIOMMUState *s,
>>> +    unsigned idx, uint64_t set)
>>> +{
>>> +    qemu_spin_lock(&s->regs_lock);
>>> +    stq_le_p(s->regs_rw + idx, set);
>>> +    qemu_spin_unlock(&s->regs_lock);
>>> +}
>>> +
>>> +static inline uint64_t riscv_iommu_reg_get64(RISCVIOMMUState *s,
>>> +    unsigned idx)
>>> +{
>>> +    return ldq_le_p(s->regs_rw + idx);
>>> +}
>>> +
>>> +
>>> +
>>> +#endif
>>> diff --git a/hw/riscv/trace-events b/hw/riscv/trace-events
>>> new file mode 100644
>>> index 0000000000..42a97caffa
>>> --- /dev/null
>>> +++ b/hw/riscv/trace-events
>>> @@ -0,0 +1,11 @@
>>> +# See documentation at docs/devel/tracing.rst
>>> +
>>> +# riscv-iommu.c
>>> +riscv_iommu_new(const char *id, unsigned b, unsigned d, unsigned f) "%s: device attached %04x:%02x.%d"
>>> +riscv_iommu_flt(const char *id, unsigned b, unsigned d, unsigned f, uint64_t reason, uint64_t iova) "%s: fault %04x:%02x.%u reason: 0x%"PRIx64" iova: 0x%"PRIx64
>>> +riscv_iommu_pri(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova) "%s: page request %04x:%02x.%u iova: 0x%"PRIx64
>>> +riscv_iommu_dma(const char *id, unsigned b, unsigned d, unsigned f, unsigned pasid, const char *dir, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u #%u %s 0x%"PRIx64" -> 0x%"PRIx64
>>> +riscv_iommu_msi(const char *id, unsigned b, unsigned d, unsigned f, uint64_t iova, uint64_t phys) "%s: translate %04x:%02x.%u MSI 0x%"PRIx64" -> 0x%"PRIx64
>>> +riscv_iommu_cmd(const char *id, uint64_t l, uint64_t u) "%s: command 0x%"PRIx64" 0x%"PRIx64
>>> +riscv_iommu_notifier_add(const char *id) "%s: dev-iotlb notifier added"
>>> +riscv_iommu_notifier_del(const char *id) "%s: dev-iotlb notifier removed"
>>> diff --git a/hw/riscv/trace.h b/hw/riscv/trace.h
>>> new file mode 100644
>>> index 0000000000..b88504b750
>>> --- /dev/null
>>> +++ b/hw/riscv/trace.h
>>> @@ -0,0 +1,2 @@
>>> +#include "trace/trace-hw_riscv.h"
>>> +
>>> diff --git a/include/hw/riscv/iommu.h b/include/hw/riscv/iommu.h
>>> new file mode 100644
>>> index 0000000000..403b365893
>>> --- /dev/null
>>> +++ b/include/hw/riscv/iommu.h
>>> @@ -0,0 +1,36 @@
>>> +/*
>>> + * QEMU emulation of an RISC-V IOMMU (Ziommu)
>>> + *
>>> + * Copyright (C) 2022-2023 Rivos Inc.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify
>>> + * it under the terms of the GNU General Public License as published by
>>> + * the Free Software Foundation; either version 2 of the License.
>>> + *
>>> + * This program is distributed in the hope that it will be useful,
>>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>> + * GNU General Public License for more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along
>>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#ifndef HW_RISCV_IOMMU_H
>>> +#define HW_RISCV_IOMMU_H
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qom/object.h"
>>> +
>>> +#define TYPE_RISCV_IOMMU "riscv-iommu"
>>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUState, RISCV_IOMMU)
>>> +typedef struct RISCVIOMMUState RISCVIOMMUState;
>>> +
>>> +#define TYPE_RISCV_IOMMU_MEMORY_REGION "riscv-iommu-mr"
>>> +typedef struct RISCVIOMMUSpace RISCVIOMMUSpace;
>>> +
>>> +#define TYPE_RISCV_IOMMU_PCI "riscv-iommu-pci"
>>> +OBJECT_DECLARE_SIMPLE_TYPE(RISCVIOMMUStatePci, RISCV_IOMMU_PCI)
>>> +typedef struct RISCVIOMMUStatePci RISCVIOMMUStatePci;
>>> +
>>> +#endif
>>> diff --git a/meson.build b/meson.build
>>> index c59ca496f2..75e56f3282 100644
>>> --- a/meson.build
>>> +++ b/meson.build
>>> @@ -3361,6 +3361,7 @@ if have_system
>>>       'hw/rdma',
>>>       'hw/rdma/vmw',
>>>       'hw/rtc',
>>> +    'hw/riscv',
>>>       'hw/s390x',
>>>       'hw/scsi',
>>>       'hw/sd',
>>> --
>>> 2.43.2
>>>
>>>

reply via email to

[Prev in Thread] Current Thread [Next in Thread]