[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH 26/77] ppc/pnv: Add skeletton PowerNV platform
From: |
David Gibson |
Subject: |
Re: [Qemu-ppc] [PATCH 26/77] ppc/pnv: Add skeletton PowerNV platform |
Date: |
Tue, 24 Nov 2015 13:43:51 +1100 |
User-agent: |
Mutt/1.5.23 (2015-06-09) |
On Tue, Nov 24, 2015 at 12:45:48PM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2015-11-20 at 19:21 +1100, David Gibson wrote:
> > On Wed, Nov 11, 2015 at 11:27:39AM +1100, Benjamin Herrenschmidt
> > wrote:
> > > No devices yet, not even an interrupt controller, just to get
> > > started.
> > >
> > > Signed-off-by: Benjamin Herrenschmidt <address@hidden>
> > > ---
> > > default-configs/ppc64-softmmu.mak | 1 +
> > > hw/ppc/Makefile.objs | 2 +
> > > hw/ppc/pnv.c | 600
> > > ++++++++++++++++++++++++++++++++++++++
> > > include/hw/ppc/pnv.h | 36 +++
> > > 4 files changed, 639 insertions(+)
> > > create mode 100644 hw/ppc/pnv.c
> > > create mode 100644 include/hw/ppc/pnv.h
> >
> > Many of my comments below may be made irrelevant by later patches in
> > the series.
>
> Heh, well there is where the "meat" of the new platform starts showing
> up :-)
>
> .../...
>
> > > +#define _FDT(exp) \
> > > + do { \
> > > + int ret = (exp); \
> > > + if (ret < 0) { \
> > > + fprintf(stderr, "qemu: error creating device tree: %s:
> > > %s\n", \
> > > + #exp, fdt_strerror(ret)); \
> > > + exit(1); \
> > > + } \
> > > + } while (0)
> >
> > We should probably make a file where helper routines used by both
> > spapr.c and pnv.c can live.
>
> Probably but I'd see that as a later cleanup rather than doing it
> in this series...
Ok.
>
> .../...
>
> > > + if (pcc->l1_dcache_size) {
> > > + _FDT((fdt_property_cell(fdt, "d-cache-size",
> > > pcc->l1_dcache_size)));
> > > + } else {
> > > + fprintf(stderr, "Warning: Unknown L1 dcache size for cpu\n");
> >
> > Hmm (note to self) should probably change a bunch of these both in
> > spapr.c and pnv.c from explicit fprintfs() to modern error_report()
> > and similar.
>
> That's a train I completely missed, but yes.
>
> .../...
>
> > > + }
> > > + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-server#s",
> > > + servers_prop, sizeof(servers_prop))));
> > > + _FDT((fdt_property(fdt, "ibm,ppc-interrupt-gserver#s",
> > > + gservers_prop, sizeof(gservers_prop))));
> > > +
> > > + _FDT((fdt_end_node(fdt)));
> > > +}
> > > +
> > > +static void *powernv_create_fdt(PnvSystem *sys, uint32_t initrd_base,
> > > uint32_t initrd_size)
> > > +{
> >
> > So.. does it make sense for qemu to create a device tree for powernv,
> > rather than leaving it to Opal?
>
> Well, OPAL only creates a device-tree if you are on an FSP machine in
> which case it expects a complex data structure (HDAT) coming from the
> FSP to use as a source of info.
>
> On OpenPower machines, which is closer to what we simulate here, we
> do get a device-tree as an input in OPAL, it's generated by HostBoot.
>
> Now, I am not running HostBoot in qemu because most of what it does
> is completely irrelevant to an emulated system (training the various
> links, initializing the memory buffer chips, etc...).
>
> However we do need to pass a number of platform information to OPAL
> which HB does via the device-tree, such as which cores are enabled,
> the memory map configured for PCI, which PHBs are enabled, etc... so
> creating a DT in qemu makes sense, it mimmics HB in essence.
>
> OPAL will enrich that device-tree before starting Linux.
Ok. Some comments mentioning that you're simulating the exit state
from HostBoot would be good then.
> > > + /*
> > > + * Add info to guest to indentify which host is it being run on
> > > + * and what is the uuid of the guest
> > > + */
> > > + if (kvmppc_get_host_model(&buf)) {
> > > + _FDT((fdt_property_string(fdt, "host-model", buf)));
> > > + g_free(buf);
> > > + }
> > > + if (kvmppc_get_host_serial(&buf)) {
> > > + _FDT((fdt_property_string(fdt, "host-serial", buf)));
> > > + g_free(buf);
> > > + }
> >
> > Since you're emulating a "bare metal" machine, surely the host
> > properties aren't relevant here.
>
> They may or may not. But yes, I can take that out.
>
> > > + buf = g_strdup_printf(UUID_FMT, qemu_uuid[0], qemu_uuid[1],
> > > + qemu_uuid[2], qemu_uuid[3], qemu_uuid[4],
> > > + qemu_uuid[5], qemu_uuid[6], qemu_uuid[7],
> > > + qemu_uuid[8], qemu_uuid[9], qemu_uuid[10],
> > > + qemu_uuid[11], qemu_uuid[12], qemu_uuid[13],
> > > + qemu_uuid[14], qemu_uuid[15]);
> > > +
> > > + _FDT((fdt_property_string(fdt, "vm,uuid", buf)));
> > > + g_free(buf);
> > > +
> > > + _FDT((fdt_begin_node(fdt, "chosen")));
> > > + _FDT((fdt_property(fdt, "linux,initrd-start",
> > > + &start_prop, sizeof(start_prop))));
> > > + _FDT((fdt_property(fdt, "linux,initrd-end",
> > > + &end_prop, sizeof(end_prop))));
> > > + _FDT((fdt_end_node(fdt)));
> > > +
> > > + _FDT((fdt_property_cell(fdt, "#address-cells", 0x2)));
> > > + _FDT((fdt_property_cell(fdt, "#size-cells", 0x2)));
> > > +
> > > + /* cpus */
> > > + _FDT((fdt_begin_node(fdt, "cpus")));
> > > + _FDT((fdt_property_cell(fdt, "#address-cells", 0x1)));
> > > + _FDT((fdt_property_cell(fdt, "#size-cells", 0x0)));
> > > +
> > > + CPU_FOREACH(cs) {
> > > + powernv_create_cpu_node(fdt, cs, smt);
> > > + }
> > > +
> > > + _FDT((fdt_end_node(fdt)));
> > > +
> > > + /* Memory */
> > > + _FDT((powernv_populate_memory(fdt)));
> > > +
> > > + /* /hypervisor node */
> > > + if (kvm_enabled()) {
> > > + uint8_t hypercall[16];
> > > +
> > > + /* indicate KVM hypercall interface */
> > > + _FDT((fdt_begin_node(fdt, "hypervisor")));
> > > + _FDT((fdt_property_string(fdt, "compatible", "linux,kvm")));
> > > + if (kvmppc_has_cap_fixup_hcalls()) {
> > > + /*
> > > + * Older KVM versions with older guest kernels were broken
> > > with the
> > > + * magic page, don't allow the guest to map it.
> > > + */
> > > + kvmppc_get_hypercall(first_cpu->env_ptr, hypercall,
> > > + sizeof(hypercall));
> > > + _FDT((fdt_property(fdt, "hcall-instructions", hypercall,
> > > + sizeof(hypercall))));
> > > + }
> > > + _FDT((fdt_end_node(fdt)));
> > > + }
> >
> > And a hypercall interface surely doesn't make sense for powernv.
>
> It's qemu paravirt, it exist on G5 too :-) It's for PR KVM, it allows
> to speed up some bits and pieces. But yeah we don't yet really
> "support" it at this point. However we might.
Ah, yes, I forgot about that.
> > > +
> > > + _FDT((fdt_end_node(fdt))); /* close root node */
> > > + _FDT((fdt_finish(fdt)));
> > > +
> > > + return fdt;
> > > +}
> > > +
> > > +static void powernv_cpu_reset(void *opaque)
> > > +{
> > > + PowerPCCPU *cpu = opaque;
> > > + CPUState *cs = CPU(cpu);
> > > + CPUPPCState *env = &cpu->env;
> > > +
> > > + cpu_reset(cs);
> > > +
> > > + env->spr[SPR_PIR] = ppc_get_vcpu_dt_id(cpu);
> > > + env->spr[SPR_HIOR] = 0;
> > > + env->gpr[3] = FDT_ADDR;
> > > + env->nip = 0x10;
> > > + env->msr |= MSR_HVB;
> > > +}
> >
> > So, I believe the qemu-ishly correct way of doing this is to have the
> > cpu initialization in the cpu code, rather than the platform code, as
> > much as possible. On PAPR we kind of get away with initialization in
> > the platform code on the grounds that it's a paravirt platform, but
> > powernv doesn't have that excuse.
> >
> > But this may well be stuff that changes in later patches, so..
>
> Well no, not really. But here too, we mimmic the state as coming out of
> HostBoot, which isn't quite the same thing. We need to provide the FDT
> entry, etc...
>
> The "real" reset state of a P8 isn't something we can easily
> simulate...
>
> It runs some microcode from a SEEPROM with a small microcontroller
> which initializes a core, which then runs some HB code off it's L3
> cache etc... really not something we want to do in qemu at least
> for now.
>
> So the initial state here is somewhat in between full virt and
> paravirt, we simulate a platform that has been partially initialized by
> HostBoot, to the state it has when it enters OPAL.
Ok, that makes sense, but I think it needs a bit more explanation in
the code to that effect.
> > > +static const VMStateDescription vmstate_powernv = {
> > > + .name = "powernv",
> > > + .version_id = 1,
> > > + .minimum_version_id = 1,
> > > +};
> >
> > It might be best to leave out the vmstate entirely until you're ready
> > to implement migration, rather than having a partial, probably not
> > working migration implementation.
>
> Ok.
>
> > > +
> > > +static void pnv_create_chip(PnvSystem *sys, unsigned int chip_no)
> > > +{
> > > + PnvChip *chip = &sys->chips[chip_no];
> > > +
> > > + if (chip_no >= PNV_MAX_CHIPS) {
> > > + return;
> > > + }
> > > +
> > > + /* XXX Improve chip numbering to better match HW */
> > > + chip->chip_id = chip_no;
> >
> > I think modern qemu conventions would suggest creating the chips as
> > QOM objects rather than having a fixed array.
>
> Yeah, more code & much larger memory footprint for the same result :-)
>
> I can look into it but it's low priority. I still want to rework some
> of that chip stuff in future patches anyway.
>
> > > +}
> > > +
> > > +static void ppc_powernv_init(MachineState *machine)
> > > +{
> > > + ram_addr_t ram_size = machine->ram_size;
> > > + const char *cpu_model = machine->cpu_model;
> > > + const char *kernel_filename = machine->kernel_filename;
> > > + const char *initrd_filename = machine->initrd_filename;
> > > + uint32_t initrd_base = 0;
> > > + long initrd_size = 0;
> > > + PowerPCCPU *cpu;
> > > + CPUPPCState *env;
> > > + MemoryRegion *sysmem = get_system_memory();
> > > + MemoryRegion *ram = g_new(MemoryRegion, 1);
> > > + sPowerNVMachineState *pnv_machine = POWERNV_MACHINE(machine);
> > > + PnvSystem *sys = &pnv_machine->sys;
> > > + long fw_size;
> > > + char *filename;
> > > + void *fdt;
> > > + int i;
> > > +
> > > + /* init CPUs */
> > > + if (cpu_model == NULL) {
> > > + cpu_model = kvm_enabled() ? "host" : "POWER8";
> > > + }
> > > +
> > > + for (i = 0; i < smp_cpus; i++) {
> > > + cpu = cpu_ppc_init(cpu_model);
> > > + if (cpu == NULL) {
> > > + fprintf(stderr, "Unable to find PowerPC CPU definition\n");
> > > + exit(1);
> > > + }
> > > + env = &cpu->env;
> > > +
> > > + /* Set time-base frequency to 512 MHz */
> > > + cpu_ppc_tb_init(env, TIMEBASE_FREQ);
> > > +
> > > + /* MSR[IP] doesn't exist nowadays */
> > > + env->msr_mask &= ~(1 << 6);
> > > +
> > > + qemu_register_reset(powernv_cpu_reset, cpu);
> > > + }
> > > +
> > > + /* allocate RAM */
> > > + memory_region_allocate_system_memory(ram, NULL, "ppc_powernv.ram",
> > > ram_size);
> > > + memory_region_add_subregion(sysmem, 0, ram);
> > > +
> > > + /* XXX We should decide how many chips to create based on #cores and
> > > + * Venice vs. Murano vs. Naples chip type etc..., for now, just
> > > create
> > > + * one chip. Also creation of the CPUs should be done per-chip
> > > + */
> > > + sys->num_chips = 1;
> > > +
> > > + /* Create only one PHB for now until I figure out what's wrong
> > > + * when I create more (resource assignment failures in Linux)
> > > + */
> > > + pnv_create_chip(sys, 0);
> > > +
> > > + if (bios_name == NULL) {
> > > + bios_name = FW_FILE_NAME;
> > > + }
> > > + filename = qemu_find_file(QEMU_FILE_TYPE_BIOS, bios_name);
> > > + fw_size = load_image_targphys(filename, 0, FW_MAX_SIZE);
> > > + if (fw_size < 0) {
> > > + hw_error("qemu: could not load OPAL '%s'\n", filename);
> > > + exit(1);
> > > + }
> > > + g_free(filename);
> > > +
> > > +
> > > + if (kernel_filename == NULL) {
> > > + kernel_filename = KERNEL_FILE_NAME;
> > > + }
> > > + filename = qemu_find_file(QEMU_FILE_TYPE_BIOS,
> > > kernel_filename);
> >
> > The commit withe Opal image should go in before this, no?
>
> Now this is a bit of an open discussion at the moment :-)
>
> The way OPAL is built on OPP machines today is by essentially building
> a complete flash image which contains HostBoot, OPAL and the petitboot-
> based bootloader which contains a Linux kernel etc...
>
> We could create a target without HB and with a slimmed down Linux but
> it would still probably be about 12MB I reckon, if not more. It feels a
> bit "big" to ship as a binary as part of qemu...
>
> We would also have to add code to qemu to "find" OPAL in that image,
> and then add a model for the flash controller.
>
> The other option is to bundle just OPAL itself. However that means
> you can't go anywhere without a -kernel argument, which would then
> be either a petitboot-based bootloader or your actual target kernel.
Hm, ok. But in order for this to be usable, we need some way to get a
suitable image. So medium term, I think it makes sense to include
both opal and PetitBoot, so you can boot the guest like a real
machine.
However, including only Opal and requiring -kernel would be a
reasonable interim step.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
signature.asc
Description: PGP signature
- Re: [Qemu-ppc] [PATCH 31/77] ppc/xics: Remove unused xics_set_irq_type(), (continued)
- [Qemu-ppc] [PATCH 35/77] ppc/xics: Move xics_set_nr_irqs() to xics_spapr.c and xics_kvm.c, Benjamin Herrenschmidt, 2015/11/10
- [Qemu-ppc] [PATCH 34/77] ppc/xics: An ICS with offset 0 is assumed to be uninitialized, Benjamin Herrenschmidt, 2015/11/10
- [Qemu-ppc] [PATCH 39/77] ppc/xics: Add xics to the monitor "info pic" command, Benjamin Herrenschmidt, 2015/11/10
- [Qemu-ppc] [PATCH 32/77] ppc/xics: Replace "icp" with "xics" in most places, Benjamin Herrenschmidt, 2015/11/10
- [Qemu-ppc] [PATCH 27/77] ppc/pnv: Add XSCOM infrastructure, Benjamin Herrenschmidt, 2015/11/10
- [Qemu-ppc] [PATCH 40/77] ppc/pnv: Wire up XICS native with PowerNV platform, Benjamin Herrenschmidt, 2015/11/10