qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally h


From: Matthew Ogilvie
Subject: Re: [Qemu-devel] [PATCH KVM v2 1/4] KVM: fix i8254 IRQ0 to be normally high
Date: Thu, 10 Jan 2013 23:40:07 -0700
User-agent: Mutt/1.5.21 (2010-09-15)

On Tue, Jan 08, 2013 at 09:31:19AM +0200, Gleb Natapov wrote:
> On Mon, Jan 07, 2013 at 06:17:22PM -0600, address@hidden wrote:
> > On Mon, 7 Jan 2013 11:39:18 +0200, Gleb Natapov <address@hidden> wrote:
> > > On Wed, Dec 26, 2012 at 10:39:53PM -0700, Matthew Ogilvie wrote:
> > >> Reading the spec, it is clear that most modes normally leave the IRQ
> > >> output line high, and only pulse it low to generate a leading edge.
> > >> Especially the most commonly used mode 2.
> > >> 
> > >> The KVM i8254 model does not try to emulate the duration of the pulse at
> > >> all, so just swap the high/low settings it to leave it high most of
> > >> the time.
> > >> 
> > >> This fix is a prerequisite to improving the i8259 model to handle
> > >> the trailing edge of an interupt request as indicated in its spec:
> > >> If it gets a trailing edge of an IRQ line before it starts to service
> > >> the interrupt, the request should be canceled.
> > >> 
> > >> See http://bochs.sourceforge.net/techspec/intel-82c54-timer.pdf.gz
> > >> or search the net for 23124406.pdf.
> > >> 
> > >> Risks:
> > >> 
> > >> There is a risk that migrating a running guest between versions
> > >> with and without this patch will lose or gain a single timer
> > >> interrupt during the migration process.  The only case where
> > > Can you elaborate on how exactly this can happen? Do not see it.
> > > 
> > 
> > KVM 8254: In the corrected model, when the count expires, the model
> > briefly pulses output low and then high again, with the low to high
> > transition being what triggers the interrupt.  In the old model,
> > when the count expires, the model expects the output line
> > to already be low, and briefly pulses it high (triggering the
> > interrupt) and then low again.  But if the line was already
> > high (because it migrated from the corrected model),
> > this won't generate a new leading edge (low to high) and won't
> > trigger a new interrupt (the first post-back-migration pulse turns
> > into a simple trailing edge instead of a pulse).
> > 
> > Unless there is something I'm missing?
> > 
> No, I missed that pic->last_irr/ioapic->irr  will be migrated as 1. But
> this means that next interrupt after migration from new to old will
> always be lost.  What about clearing pit bit from last_irr/irr before
> migration? Should not affect new->new migration and should fix new->old
> one. The only problem is that we may need to consult irq routing table
> to know how pit is connected to ioapic.

We should not clear both last_irr and irr.  That
cancels the interrupt early if the CPU hasn't started servicing
it yet:  If the guest CPU has interrupts disabled
and hasn't begun to service the interrupt, a new->new migration could
lose the interrupt much earlier in the countdown cycle than it otherwise
might be lost.

Potentially we could clear the last_irr bit only.  This effectively
makes migration behave like the old unfixed code.  But if this is
considered acceptable, I would be more inclined to not fix IRQ0 at all,
rather than make IRQ0 work subtly differently normally vs during
migration.  One of my earlier patch series (qemu v7 dated Nov 25)
attempts to use basically this strategy for the qemu model, at
least in the short term (and then potentially fix it properly
in the longer term), although the details of series might
be tweaked.

Or the minimal-risk strategy is to ONLY fix the cascade IRQ2's
trailing edge [the original i825* problem that spawned this whole
thing], leaving all other IRQs as-is.

> 
> Still do not see how can we gain one interrupt.

In most cases I don't think we get an extra interrupt from
a direct fix.  But some kinds of attempts to work around missing
interrupts during migration can cause cause extra interrupts in other
cases.  In the old qemu model (but perhaps not kvm), the mode 2 leading
edge occurs one clock tick earlier than it ought to.  So if you
try to be tricky with adjusting levels during migration, you
may introduce possible cases where it gets an interrupt in
the old model, then migrates and gets another interrupt one tick
later in the new model...

Also, it occurs to me that maybe there might be an off-by-one issue in
the kvm model of mode 2 that ought to be fixed as well?  Although the  
zero length pulse in kvm suggests that one-off issues in counters do  
not matter.  In the qemu model, the leading edge of OUT in mode 2
moves by one tick...

> 
> > The qemu 8254 model actually models each edge at independent
> > clock ticks instead of combining both into a very brief pulse at one time.
> > I've found it handy to draw out old and new timing diagrams on paper
> > (for each mode), and then carefully think about what happens with respect
> > to levels and edges when you transition back and forth between old and
> > new models at various points in the timing cycle.  (Note I've spent more
> > time examining the qemu models rather than the kvm models.)
> > 
> Yes, drawing it definitely helps :)
> 
> > >> this is likely to be serious is probably losing a single-shot (mode 4)
> > >> interrupt, but if my understanding of how things work is good, then
> > >> that should only be possible if a whole slew of conditions are
> > >> all met:
> > >> 
> > >>  1. The guest is configured to run in a "tickless" mode (like
> > >>     modern Linux).
> > >>  2. The guest is for some reason still using the i8254 rather
> > >>     than something more modern like an HPET.  (The combination
> > >>     of 1 and 2 should be rare.)
> > > This is not so rare. For performance reason it is better to not have
> > > HPET at all.  In fact -no-hpet is how I would advice anyone to run qemu.
> > 
> > In a later email you mention that Linux prefers a timer in the APIC.
> > I don't know much about the APIC (advanced interrupt controller), and
> > wasn't
> > even aware had it's own timer.
> > 
> > The big question is if we can safely just fix the i825* models, or if
> > we need something more subtle to avoid breaking commonly used guests
> > like modern Linux (support both corrected and old models,
> > or only fix IRQ2 instead of all IRQs, or similar subtlety).
> Migration may happen while guest is running firmaware. Who knows what
> those are doing. If the fix is as easy as I described above we should go
> for it.
> 
> > 
> > > 
> > >>  3. The migration is going from a fixed version back to the
> > >>     old version.  (Not sure how common this is, but it should
> > >>     be rarer than migrating from old to new.)
> > >>  4. There are not going to be any "timely" events/interrupts
> > >>     (keyboard, network, process sleeps, etc) that cause the guest
> > >>     to reset the PIT mode 4 one-shot counter "soon enough".
> > >> 
> > >> This combination should be rare enough that more complicated
> > >> solutions are not worth the effort.
> > >> 
> > >> Signed-off-by: Matthew Ogilvie <address@hidden>
> > >> ---
> > >>  arch/x86/kvm/i8254.c | 6 +++++-
> > >>  1 file changed, 5 insertions(+), 1 deletion(-)
> > >> 
> > >> diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
> > >> index c1d30b2..cd4ec60 100644
> > >> --- a/arch/x86/kvm/i8254.c
> > >> +++ b/arch/x86/kvm/i8254.c
> > >> @@ -290,8 +290,12 @@ static void pit_do_work(struct kthread_work *work)
> > >>          }
> > >>          spin_unlock(&ps->inject_lock);
> > >>          if (inject) {
> > >> -                kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > >> +                /* Clear previous interrupt, then create a rising
> > >> +                 * edge to request another interupt, and leave it at
> > >> +                 * level=1 until time to inject another one.
> > >> +                 */
> > >>                  kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0);
> > >> +                kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1);
> > >>  
> > >>                  /*
> > >>                   * Provides NMI watchdog support via Virtual Wire mode.
> > >> -- 
> > >> 1.7.10.2.484.gcd07cc5
> > > 
> > > --
> > >                   Gleb.
> 
> --
>                       Gleb.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]