savannah-hackers
Re: [savannah-help-public] How to coordinate a VM reboot?


From: Bob Proulx
Subject: Re: [savannah-help-public] How to coordinate a VM reboot?
Date: Thu, 26 Dec 2013 16:16:54 -0700
User-agent: Mutt/1.5.21 (2010-09-15)

The VMs, notably vcs and download, were rebooted.  All is back up and
online afterward.

Things did not go completely smoothly.  Here is the post mortem of
today's operation.

No one was available who could update the FSF status location
https://pumprock.net/fsfstatus for us.  Hopefully that will get
resolved before next time, as having an out-of-band status is useful.

The boot of vcs took twenty minutes to clear /tmp.  I had previously
captured a du listing of /tmp and discovered 1,377,775,616 bytes in
113,283 files there.  Mostly this resided in /tmp/loggerhead-cache-*
and some more in /tmp/cvs-srv12738 directories.  It appears that the
loggerhead cache is not getting pruned.  This should be looked into in
more detail.  Everything is reset fresh now but it will grow again.
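
For next time, a tally like the one above could be scripted so it is
easy to repeat.  The following is only a minimal sketch, not what was
actually run: the function name summarize_tree is made up for
illustration, and it sums apparent file sizes, so the numbers will not
match a block-based du exactly.

```python
import os
import sys
from collections import defaultdict


def summarize_tree(root):
    """Return {top_level_entry: (total_bytes, file_count)} for files under root.

    Files sitting directly in root are grouped under ".".
    Sizes are apparent sizes (st_size), not allocated disk blocks.
    """
    totals = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            totals[top][0] += st.st_size
            totals[top][1] += 1
    return {key: (val[0], val[1]) for key, val in totals.items()}


# Only runs when a directory is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    summary = summarize_tree(sys.argv[1])
    # Largest consumers first.
    for name, (size, count) in sorted(
        summary.items(), key=lambda kv: kv[1][0], reverse=True
    ):
        print(f"{size:>15,d} bytes {count:>9,d} files  {name}")
```

Pointed at /tmp this would list each top-level entry, such as the
loggerhead-cache and cvs-srv directories, with its byte and file
totals, which makes the growth easy to watch over time.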

To be honest I had forgotten about the /tmp disk space and didn't
figure it into the timeline.  At that point I just needed to wait out
the /tmp clearing which was taking a while since there were so many
files.
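
Since the boot-time clearing scales with the number of files, one
option would be to prune stale files ahead of a planned reboot.  This
is a hedged sketch only.  The function names and the age threshold are
made up for illustration, and it assumes that deleting old cache files
is safe for whatever wrote them, which has not been verified for
loggerhead.  It defaults to a dry run for that reason.

```python
import os
import time


def find_stale_files(root, max_age_days, now=None):
    """Return paths of files under root not modified in max_age_days days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_mtime < cutoff:
                stale.append(path)
    return stale


def prune(paths, dry_run=True):
    """Remove the given files; return how many were actually removed.

    With dry_run=True (the default) nothing is deleted; the paths are
    only printed.
    """
    removed = 0
    for path in paths:
        if dry_run:
            print("would remove", path)
            continue
        try:
            os.unlink(path)
            removed += 1
        except OSError:
            pass  # already gone or not permitted; leave it
    return removed
```

For example, prune(find_stale_files(some_cache_dir, 30)) would list
what a 30 day cutoff would remove without touching anything; passing
dry_run=False does the actual removal.  Here some_cache_dir is a
placeholder for whichever directory is being examined.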

I hit a snag with the Xen captured management interface.  While
booting frontend I was looking at the grub boot screen and examining
the configuration.  Then a python backtrace appeared.  There isn't any
python in grub itself and this backtrace was coming from xen.  At that
point Xen had lost track of the frontend and could not control it.
Xen reported errors that it could not open the config file
frontend.savannah.gnu.org and then gave me an 'xm' usage dump.  I then
had no control over the VM and could not boot it.  Argh!  Note that I
only have access to xen-shell and not shell access to the dom0.

In theory there is no difference between theory and practice.  In
practice there is a difference.  In theory the xen-shell is a captured
interface and should prevent this type of breakage.  In practice I
broke it without trying.  It was a very stressful time!

Many thanks to "nully" our newest FSF admin who was in the hot seat on
site today.  She was able to rescue the configuration file and get xen
back online again.  With that I was able to boot the system again.  This
excursion took us almost an hour to figure out before we got things
going again.  Therefore I decided to stop at that point, declare
victory, and withdraw to safety.  No fscks were done.  However, the
very long-running vcs and download systems were rebooted, which was the
major goal of the day.  That was accomplished successfully.

Savannah is back in normal service.

Bob


