qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] i386/kvm: Set return value after handling KVM_EXIT_HYPERCALL


From: Sean Christopherson
Subject: Re: [PATCH] i386/kvm: Set return value after handling KVM_EXIT_HYPERCALL
Date: Thu, 12 Dec 2024 14:11:16 -0800

On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> On 12/12/24 20:13, Sean Christopherson wrote:
> > On Thu, Dec 12, 2024, Paolo Bonzini wrote:
> > > If ret is less than zero, will stop the VM anyway as
> > > RUN_STATE_INTERNAL_ERROR.
> > > 
> > > If this has to be fixed in QEMU, I think there's no need to set anything
> > > if ret != 0; also because kvm_convert_memory() returns -1 on error and
> > > that's not how the error would be passed to the guest.
> > > 
> > > However, I think the right fix should simply be this in KVM:
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 83fe0a78146f..e2118ba93ef6 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -10066,6 +10066,7 @@ unsigned long __kvm_emulate_hypercall(struct 
> > > kvm_vcpu *vcpu, unsigned long nr,
> > >                   }
> > >                   vcpu->run->exit_reason        = KVM_EXIT_HYPERCALL;
> > > +         vcpu->run->ret                = 0;
> > 
> >             vcpu->run->hypercall.ret
> > 
> > >                   vcpu->run->hypercall.nr       = KVM_HC_MAP_GPA_RANGE;
> > >                   vcpu->run->hypercall.args[0]  = gpa;
> > >                   vcpu->run->hypercall.args[1]  = npages;
> > > 
> > > While there is arguably a change in behavior of the kernel both with
> > > the patches in kvm-coco-queue and with the above one, _in practice_
> > > the above change is one that userspace will not notice.
> > 
> > I agree that KVM should initialize "ret", but I don't think '0' is the right
> > value.  KVM shouldn't assume userspace will successfully handle the 
> > hypercall.
> > What happens if KVM sets vcpu->run->hypercall.ret to a non-zero value, e.g. 
> > -KVM_ENOSYS?
> 
> Unfortunately QEMU is never writing vcpu->run->hypercall.ret, so the guest
> sees -KVM_ENOSYS; this is basically the same bug that Binbin is fixing, just
> with a different value passed to the guest.
> 
> In other words, the above one-liner is pulling the "don't break userspace"
> card.

But how is anything breaking userspace?  QEMU needs to opt-in to intercepting
KVM_HC_MAP_GPA_RANGE, and this has been KVM's behavior since commit 0dbb11230437
("KVM: X86: Introduce KVM_HC_MAP_GPA_RANGE hypercall").

Ah, "ret" happens to be deep in the union and KVM zero allocates vcpu->run, so
QEMU gets lucky and "ret" happens to be zero because no other non-fatal 
userspace
exit on x86 happens to need as many bytes.  Hilarious.

FWIW, if TDX marshalls hypercall state into KVM's "normal" registers, then KVM's
shenanigans with vcpu->run->hypercall.ret might go away?  Though regardless of
what happens on that front, I think it makes to explicitly initialize "ret" to
*something*.

I checked our VMM, and it does the right thing, so I don't have any objection
to explicitly zeroing "ret".  Though it needs a comment explaining that it's a
terrible hack for broken userspace ;-)



reply via email to

[Prev in Thread] Current Thread [Next in Thread]