Re: [Qemu-devel] CentOS 5.x intermittently fails to boot on QEMU 1.7.0


From: Alex Bligh
Subject: Re: [Qemu-devel] CentOS 5.x intermittently fails to boot on QEMU 1.7.0
Date: Fri, 21 Feb 2014 13:27:23 +0000

On 21 Feb 2014, at 04:34, Matt Lupfer wrote:

> A git bisect points to this commit as the culprit:
> 
> b1bbfe7 aio / timers: On timer modification, qemu_notify or aio_notify
> 
> which was part of the Aug 2013 timer rewrite.  Reverting this hunk in
> particular makes the issue go away:
> 
> > @@ -522,9 +531,7 @@ void qemu_mod_timer_ns(QEMUTimer *ts, int64_t expire_time)
> >          }
> >          /* Interrupt execution to force deadline recalculation.  */
> >          qemu_clock_warp(ts->timer_list->clock);
> > -        if (use_icount) {
> > -            timerlist_notify(ts->timer_list);
> > -        }
> > +        timerlist_notify(ts->timer_list);
> >      }
> >  }
> 
> (Note this was later refactored into timerlist_rearm() in 1.7.0, so I
> mean that I modified timerlist_rearm() in 1.7.0 to read as that hunk
> did before the b1bbfe7 commit.)
> 
> This doesn't appear to be a solution, because with the timer rewrite, QEMU
> moves its periodic (1 ms) qemu_notify_event() call to break out of
> the main event loop from a SIGALRM handler to the rearm of a QEMU timer.
> Presumably QEMU is counting on these generic callbacks.

This is somewhat bizarre, as the code you are reverting causes the main loop
to be broken out of *more* often, not less.
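
To make that concrete, here is a toy model of the two variants of the
rearm path. It is only a sketch: the toy_* names and the struct are
made up for illustration and are not QEMU's real types or functions.
It counts how often the main loop would be kicked when a timer is
modified, with and without the hunk reverted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a timer list; NOT QEMU's real QEMUTimerList. */
struct toy_timer_list {
    unsigned notify_count; /* how many times the main loop was kicked */
};

/* Stand-in for timerlist_notify(): just counts the kick. */
void toy_notify(struct toy_timer_list *tl)
{
    assert(tl != NULL);
    tl->notify_count++;
}

/* Behaviour with the hunk reverted: kick only when icount is in use. */
void toy_rearm_reverted(struct toy_timer_list *tl, bool use_icount)
{
    if (use_icount) {
        toy_notify(tl);
    }
}

/* Behaviour after b1bbfe7: kick on every timer modification. */
void toy_rearm_current(struct toy_timer_list *tl)
{
    toy_notify(tl);
}
```

Without icount, five timer modifications produce no kicks in the
reverted variant and five in the current one, which is why fewer
kicks fixing a latency-sensitive check is surprising.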

It also happens only when something calls qemu_mod_timer_ns(). I'm
not sure precisely what the kernel is doing there, but perhaps it
is modifying a timer repeatedly and checking that it fires within a
given time?

> It appears that in QEMU 1.7.0, QEMU/KVM doesn't inject timer interrupts, or
> alternatively the guest doesn't handle them, quickly enough to pass
> the timer check in the guest kernel reliably.

Yes, that would suggest a latency problem. The other thing that may
have happened is that the work done is being reprioritised, so rather
than responding to timer events immediately it's off doing some disk I/O
or similar, though frankly that's hard to understand while the kernel
is booting.

> 
> I've found that if I suppress the first 20ms of calls to timerlist_notify()
> in timerlist_rearm() by timers on the QEMU_CLOCK_VIRTUAL, the system is
> able to boot successfully and remains stable.  Not calling
> qemu_notify_event() on the first 20 ticks of QEMU_CLOCK_VIRTUAL seems to
> alter the timings enough to produce a reliable result.

Right, but this isn't on the clock ticks; it's on a timer modification.
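
As I read it, the workaround amounts to a gate like the one below,
evaluated each time a timer on the virtual clock is rearmed rather
than once per tick. This is a sketch under that assumption: the
helper name, the constant, and the idea of measuring from the moment
the clock was enabled are mine, not taken from your patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 20 ms in nanoseconds, taken from the description above. */
#define TOY_SUPPRESS_NS (20LL * 1000000LL)

/*
 * Hypothetical gate: returns true if a rearm happening at now_ns
 * should notify the main loop, given that the clock was enabled at
 * clock_start_ns. Notifications inside the first 20 ms are dropped.
 */
bool toy_should_notify(int64_t clock_start_ns, int64_t now_ns)
{
    assert(now_ns >= clock_start_ns);
    return (now_ns - clock_start_ns) >= TOY_SUPPRESS_NS;
}
```

How many notifications this actually drops depends on how many timer
modifications happen in that window, not on how many ticks elapse.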

>  I tried this after
> realizing that the guest kernel enables the HPET, which enables the QEMU
> virtual clock, immediately before the guest timer check occurs.  I also
> observed that the kernel boots fine with the "nohpet" parameter, and
> I suspected that this could be a source of resource contention.
> 
> Finally, the QEMU options to disable KVM PIT IRQ reinjection and to disable
> the kvm kernel irqchip altogether result in less frequent panics, but the
> guest still panics within 100 boots.
> 
> Thanks for any assistance you can provide.

-- 
Alex Bligh






