qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 2/2] target/arm: kvm: Handle DABT with no valid ISS


From: Andrew Jones
Subject: Re: [PATCH v2 2/2] target/arm: kvm: Handle DABT with no valid ISS
Date: Fri, 7 Feb 2020 09:19:50 +0100

On Thu, Feb 06, 2020 at 09:48:05PM +0000, Beata Michalska wrote:
> On Wed, 5 Feb 2020 at 16:57, Andrew Jones <address@hidden> wrote:
> >
> > On Wed, Jan 29, 2020 at 08:24:41PM +0000, Beata Michalska wrote:
> > > On ARMv7 & ARMv8 some load/store instructions might trigger a data abort
> > > exception with no valid ISS info to be decoded. The lack of decode info
> > > makes it at least tricky to emulate those instruction which is one of the
> > > (many) reasons why KVM will not even try to do so.
> > >
> > > Add support for handling those by requesting KVM to inject external
> > > dabt into the quest.
> > >
> > > Signed-off-by: Beata Michalska <address@hidden>
> > > ---
> > >  target/arm/cpu.h     |  2 ++
> > >  target/arm/kvm.c     | 96 
> > > ++++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  target/arm/kvm32.c   |  3 ++
> > >  target/arm/kvm64.c   |  3 ++
> > >  target/arm/kvm_arm.h | 19 +++++++++++
> > >  5 files changed, 123 insertions(+)
> > >
> > > diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> > > index c1aedbe..e04a8d3 100644
> > > --- a/target/arm/cpu.h
> > > +++ b/target/arm/cpu.h
> > > @@ -558,6 +558,8 @@ typedef struct CPUARMState {
> > >          uint8_t has_esr;
> > >          uint64_t esr;
> > >      } serror;
> > > +    /* Status field for pending external dabt */
> > > +    uint8_t ext_dabt_pending;
> > >
> > >      /* State of our input IRQ/FIQ/VIRQ/VFIQ lines */
> > >      uint32_t irq_line_state;
> > > diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> > > index 8d82889..e7bc9b7 100644
> > > --- a/target/arm/kvm.c
> > > +++ b/target/arm/kvm.c
> > > @@ -37,6 +37,7 @@ const KVMCapabilityInfo 
> > > kvm_arch_required_capabilities[] = {
> > >
> > >  static bool cap_has_mp_state;
> > >  static bool cap_has_inject_serror_esr;
> > > +static bool cap_has_inject_ext_dabt; /* KVM_CAP_ARM_INJECT_EXT_DABT */
> >
> > nit: the KVM_CAP_ARM_INJECT_EXT_DABT comment is unnecessary
> 
> Might be - I just find it handy when looking for  related details.
> I will remove that one though.
> 
> >
> > >
> > >  static ARMHostCPUFeatures arm_host_cpu_features;
> > >
> > > @@ -62,6 +63,12 @@ void kvm_arm_init_serror_injection(CPUState *cs)
> > >                                      KVM_CAP_ARM_INJECT_SERROR_ESR);
> > >  }
> > >
> > > +void kvm_arm_init_ext_dabt_injection(CPUState *cs)
> > > +{
> > > +    cap_has_inject_ext_dabt = kvm_check_extension(cs->kvm_state,
> > > +                                    KVM_CAP_ARM_INJECT_EXT_DABT);
> > > +}
> > > +
> > >  bool kvm_arm_create_scratch_host_vcpu(const uint32_t *cpus_to_try,
> > >                                        int *fdarray,
> > >                                        struct kvm_vcpu_init *init)
> > > @@ -216,6 +223,11 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
> > >          ret = -EINVAL;
> > >      }
> > >
> > > +    if (kvm_check_extension(s, KVM_CAP_ARM_NISV_TO_USER))
> > > +        if (kvm_vm_enable_cap(s, KVM_CAP_ARM_NISV_TO_USER, 0)) {
> > > +            warn_report("Failed to enable DABT NISV cap");
> > > +        }
> > > +
> >
> > Missing {} around the outer block.
> 
> Checkpatch didn't complain ...
> Will fix that.
> 
> >
> > As KVM_CAP_ARM_INJECT_EXT_DABT is a VM capability then I think we should
> > set cap_has_inject_ext_dabt here, like cap_has_mp_state gets set. I see
> > you've followed the pattern used for cap_has_inject_serror_esr, but that
> > looks wrong too since KVM_CAP_ARM_INJECT_SERROR_ESR is also a VM
> > capability. The way it is now we just keep setting
> > cap_has_inject_serror_esr to the same value, NR_VCPUS times.
> >
> You are totally right - I have completely missed that point! Thanks.
> 
> > >      return ret;
> > >  }
> > >
> > > @@ -598,6 +610,10 @@ int kvm_put_vcpu_events(ARMCPU *cpu)
> > >          events.exception.serror_esr = env->serror.esr;
> > >      }
> > >
> > > +    if (cap_has_inject_ext_dabt) {
> > > +        events.exception.ext_dabt_pending = env->ext_dabt_pending;
> > > +    }
> > > +
> > >      ret = kvm_vcpu_ioctl(CPU(cpu), KVM_SET_VCPU_EVENTS, &events);
> > >      if (ret) {
> > >          error_report("failed to put vcpu events");
> > > @@ -627,6 +643,8 @@ int kvm_get_vcpu_events(ARMCPU *cpu)
> > >      env->serror.has_esr = events.exception.serror_has_esr;
> > >      env->serror.esr = events.exception.serror_esr;
> > >
> > > +    env->ext_dabt_pending = events.exception.ext_dabt_pending;
> > > +
> >
> > afaict from Documentation/virt/kvm/api.txt and the KVM code you cannot
> > get this state. Therefore the above line (and extra stray blank line)
> > should be dropped.
> >
> That's true, though this is a lightweight way of resetting the vcpu state.
> We would have to do that otherwise to mark that this case has been handled
> and that the abort is no longer pending.

There's no reason to wait until get-vcpu-events time to reset this.
According to the KVM documentation (and the code) the event is
immediately delivered at vcpu_ioctl(KVM_SET_VCPU_EVENTS) time.
So I think the kvm_put_vcpu_events() patch should be

 if (env->ext_dabt_pending) {
     events.exception.ext_dabt_pending = env->ext_dabt_pending;
     env->ext_dabt_pending = 0;
 }

(env->ext_dabt_pending will only be non-zero if we have the capability)

> 
> > >      return 0;
> > >  }
> > >
> > > @@ -634,6 +652,7 @@ void kvm_arch_pre_run(CPUState *cs, struct kvm_run 
> > > *run)
> > >  {
> > >  }
> > >
> > > +
> >
> > stray blank line
> >
> > >  MemTxAttrs kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
> > >  {
> > >      ARMCPU *cpu;
> > > @@ -699,6 +718,11 @@ int kvm_arch_handle_exit(CPUState *cs, struct 
> > > kvm_run *run)
> > >              ret = EXCP_DEBUG;
> > >          } /* otherwise return to guest */
> > >          break;
> > > +    case KVM_EXIT_ARM_NISV:
> > > +        /* External DABT with no valid iss to decode */
> > > +        ret = kvm_arm_handle_dabt_nisv(cs, run->arm_nisv.esr_iss,
> > > +                                       run->arm_nisv.fault_ipa);
> > > +        break;
> > >      default:
> > >          qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
> > >                        __func__, run->exit_reason);
> > > @@ -833,3 +857,75 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
> > >  {
> > >      return (data - 32) & 0xffff;
> > >  }
> > > +
> > > +int kvm_arm_handle_dabt_nisv(CPUState *cs, uint64_t esr_iss,
> > > +                             uint64_t fault_ipa)
> > > +{
> > > +    ARMCPU *cpu = ARM_CPU(cs);
> > > +    CPUARMState *env = &cpu->env;
> > > +    uint32_t ins, ins_fetched;
> >
> > ins_fetched is a bool
> 
> Actually that's int (as per cpu_memory_rw_debug)

But that's not the way you're using it here. It you really want to
ensure the return value is converted to a bool, then you could do

 ins_fetched = !!cpu_memory_rw_debug()

> >
> > > +
> > > +    /*
> > > +     * Hacky workaround for kernels that for aarch32 guests, instead of 
> > > expected
> > > +     * external data abort, inject the IMPLEMENTATION DEFINED exception 
> > > with the
> > > +     * lock-down. This is actually handled by the guest which results in
> > > +     * re-running the faulting instruction.
> > > +     * This intends to break the vicious cycle.
> > > +     */
> >
> > This doesn't seem like the right thing to do. What happens on real
> > hardware in this case? If an OS would get into a "vicious cycle" on
> > real hardware then it should get into one on KVM too.
> >
> That's the problem. that would not happen on a real hardware.
> We might end up here due to a KVM bug (which has been fixed recently)
> when the injected
> abort is not the one expected (as of not the one that would be
> triggered by hw in this
> particular case).
> Details in : https://patchwork.kernel.org/patch/11358083/

If KVM can be fixed (and in this case already is fixed - 21aecdbd7f3a),
then doesn't it make more sense to just fix KVM, than to add a workaround
to QEMU?

> 
> > > +    if (!is_a64(env)) {
> > > +        static uint8_t setback;
> > > +
> > > +        /*
> > > +         * The state has not been synchronized yet, so if this is 
> > > re-occurrence
> > > +         * of the same abort triggered by guest, the status for pending 
> > > external
> > > +         * abort should not get cleared yet
> > > +         */
> > > +        if (unlikely(env->ext_dabt_pending)) {
> > > +            if (setback) {
> > > +                error_report("Most probably triggered kernel issue with"
> > > +                             " injecting external data abort.");
> > > +                error_printf("Giving up trying ...\n");
> > > +                /* Here is where the fun comes to an end */
> > > +                return -EINVAL;
> > > +            }
> > > +        }
> > > +        setback = env->ext_dabt_pending;
> > > +    }
> > > +
> > > +    kvm_cpu_synchronize_state(cs);
> > > +   /*
> > > +    * ISS [23:14] is invalid so there is a limited info
> > > +    * on what has just happened so the only *useful* thing that can
> > > +    * be retrieved from ISS is WnR & DFSC (though in some cases WnR
> > > +    * might be less of a value as well)
> > > +    */
> > > +
> > > +    /*
> > > +     * Get current PC before it will get updated to exception vector 
> > > entry
> > > +     */
> > > +    target_ulong ins_addr = is_a64(env) ? env->pc : env->regs[15];
> >
> > ins_addr should be declared above
> >
> > But what are we doing? pc is a guest virtual address. Oh, I see...
> >
> > > +
> > > +    /*
> > > +     * Get the faulting instruction
> > > +     * Instructions have a fixed length of 32 bits
> > > +     * and are always little-endian
> > > +     */
> > > +    ins_fetched = !cpu_memory_rw_debug(cs, ins_addr, (uint8_t *) &ins,
> > > +                                       sizeof(uint32_t), 0);
> >
> > ... we're trying to actual walk the KVM guest's page tables. That
> > seems a bit odd, and the "_debug" function name used for it makes
> > it seem even more odd.
> >
> > > +
> > > +    error_report("Data abort exception with no valid ISS generated by "
> > > +                 "guest memory access at physical address: 0x" 
> > > TARGET_FMT_lx,
> > > +                 (target_ulong)fault_ipa);
> > > +
> > > +    error_printf(ins_fetched ? "%s : 0x%" PRIx32 " (ins encoded)\n"  : 
> > > "%s\n",
> > > +                 "KVM unable to emulate faulting instruction",
> > > +                 (!ins_fetched ? 0 : ldl_le_p(&ins)));
> > > +    error_printf("Injecting a data abort exception into guest.\n");
> >
> > The guest shouldn't be able to generate three lines of errors on the host
> > whenever it wants. That's a security bug. One trace point here seems like
> > the most we should do. Or, after these three lines we should kill the
> > guest.
> >
> You have a point here, this can indeed be exploited by the malicious
> guest. Not sure if killing it is necessary here.
> I could limit the logging though that gets tricky for aborts triggered by
> user-space processes which would probably leave the guest running
> without much of an issue there.
> I can y get rid of logging the additional info and keep the single
> statement of an instruction not being emulated but that still leaves an
> open door for potential exploits...
> 
> > Actually, I don't think we should attempt to get the instruction at all.
> > We should do what the KVM documenation suggests we do when we get
> > this exit. We should either do a user selected action: one of suspend,
> > dump, restart, or inject a dabt (as is done below). The last choice hopes
> > that the guest will handle it in some graceful way, or that it'll crash
> > with all the information needed for a post-mortem crash investigation.
> >
> > And I don't think we should do the "very brave" option of trying to
> > emulate the instruction, even if we identify it as a valid MMIO address.
> > That just sounds like a huge can of worms.
> >
> > Does QEMU already have a way for users to select a
> > don't-know-how-to-handle-guest-exception behavior?
> >
> 
> The function is attempting to inject the external data abort. The rest is for
> the aftermath analysis to easy figuring out what has happened
> which will not be so easy in case of user-space process triggering
> the DABT with no valid ISS. There is no attempt of emulating the instruction
> just simply saying the guest tried to access this address with this 
> instruction
> which couldn't be handled. Might be excessive in some cases
> (like misbehaving kernel) but it also might be handy on those rare
> occasions when there is not much to analyze.
> But as per the issue you have raised above I will rework that.
> 
> Not sure if there is a mechanism to specify the 'preferred' behavior though.

I think a mechanism for a preferred behavior would be nice, because
all options make sense, depending on the use case. I'm not sure what
the right default is, though. Maybe this dabt injection, but there's
a good chance it'll just loop here. So maybe QEMU should just abort.
Aborting should be the safest default (from the host's PoV).

Thanks,
drew


> 
> BR
> 
> 
> Beata
> 
> 
> > > +    /*
> > > +     * Set pending ext dabt and trigger SET_EVENTS so that
> > > +     * KVM can inject the abort
> > > +     */
> > > +    env->ext_dabt_pending = 1;
> > > +
> > > +    return 0;
> > > +}
> >
> > Thanks,
> > drew
> >
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]