freeipmi-devel

From: Albert Chu
Subject: Re: [Freeipmi-devel] Re: Another FreeIPMI beta w/ BMC watchdog workaround for Sun machines
Date: Wed, 07 Jul 2010 09:43:45 -0700

Hey Frank,

That is really random behavior.  I am usually willing to point the
finger at myself and say "bug in FreeIPMI", but this is way too random.
The "data not available" errors in the log indicate that the packets
returned from the BMC are actually malformed.
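
To see how the failures in a log like yours break down, a small script
along these lines might help.  It is only a sketch: the names
(join_wrapped, tally) are made up for illustration and are not part of
FreeIPMI, and it assumes the "[Mon DD HH:MM:SS]: ..." entry format shown
in your excerpt.  It also tolerates entries that were wrapped onto a
second line.

```python
import re
from collections import Counter

# Timestamp prefix as seen in the quoted log, e.g. "[Jul 05 08:38:08]: ".
ENTRY_START = re.compile(r"^\[\w{3} \d{2} \d{2}:\d{2}:\d{2}\]: ")


def join_wrapped(lines):
    """Rejoin entries that were wrapped onto a continuation line."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            # No timestamp: treat it as the tail of the previous entry.
            entries[-1] += " " + line
    return entries


def tally(lines):
    """Count entries by (operation, error text)."""
    counts = Counter()
    for entry in join_wrapped(lines):
        body = ENTRY_START.sub("", entry)
        parts = [p.strip() for p in body.split(": ")]
        head = parts[0].lower().lstrip("_")
        if head.startswith("get"):
            op = "get"
        elif head.startswith("set"):
            op = "set"
        else:
            op = "other"
        error = parts[-1]
        # Keep the completion code attached, e.g. "cmd error 80h".
        if len(parts) >= 2 and parts[-2] == "cmd error":
            error = "cmd error " + error
        counts[(op, error)] += 1
    return counts
```

Calling tally(open(path)) on the log file and printing
counts.most_common() would show whether the "data not available"
entries dominate or whether the timeouts do.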

On the flip side, I've personally never tested on Solaris.  So who knows
whether the "/dev/bmc" support was really programmed correctly.  I may
have screwed up.

Is this only happening on one motherboard?  I have this feeling that
maybe the board is just busted.
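
If you have other boards to compare against, a quick sanity check on
the "bmc-watchdog -g" output could make the comparison less manual.
The sketch below is hypothetical (parse_countdowns and is_plausible are
names I made up) and only looks at the two countdown lines you quoted.
It assumes the present countdown should never exceed the initial
countdown, so it flags readings like 900 / 24513 but will not catch
every kind of garbage.

```python
import re

# Matches the two fields quoted in this thread, e.g.
# "Initial Countdown:      900 sec".
FIELD = re.compile(r"^(Initial|Present) Countdown:\s+(\d+) sec$")


def parse_countdowns(text):
    """Extract the initial/present countdown values, in seconds."""
    values = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            values[match.group(1).lower()] = int(match.group(2))
    return values


def is_plausible(values):
    """True only if both fields are present and present <= initial."""
    return (
        "initial" in values
        and "present" in values
        and values["present"] <= values["initial"]
    )
```

Feeding it the captured output of each poll and counting how often
is_plausible() returns False per board would show whether only one
machine misbehaves.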

Al

On Tue, 2010-07-06 at 23:52 -0700, Frank Steiner wrote:
> Albert Chu wrote
> 
> > Hey Frank,
> > 
> > This is indeed very strange.  I assume the reboots are because the timer
> > eventually times out, perhaps because the resets are no longer working
> > (let's say the BMC goes out to lunch).
> 
> I don't think so because in the tests I repeat the resets every second
> and I always see if they succeed or not. Many of them are rejected with
> some kind of error messages, but it never happens that all fail for more
> than one minute.
> 
> However, when I loop "bmc-watchdog -g" I get the strangest results with
> all fields showing complete nonsense, like 
> Initial Countdown:      6553 sec
> Present Countdown:      0 sec
> 
> and a second later
> 
> Initial Countdown:      900 sec
> Present Countdown:      24513 sec
> 
> and so on. The action field and others also change their values. If the
> timer just ran down, the host would reset and not power off. So I guess
> that the ILOM is so buggy that it gets confused by polling
> or resetting it :-(
> 
> > Does the bmc-watchdog log say anything interesting?  Normally
> > it's /var/log/freeipmi/bmc-watchdog.log.
> 
> It says a lot, but nothing just before the shutdown that it
> hadn't shown before. E.g.:
> 
> [Jul 05 08:38:08]: _set_watchdog_timer_cmd: fill_cmd_set_watchdog_timer: Invalid argument
> [Jul 05 08:38:18]: Get Cmd: ipmi_kcs_cmd: driver timeout
> [Jul 05 08:38:22]: Get Cmd: cmd error: 2h
> [Jul 05 08:38:38]: _get_watchdog_timer_cmd: fiid_obj_get: 'timeout_action': data not available
> [Jul 05 08:38:38]: _set_watchdog_timer_cmd: fill_cmd_set_watchdog_timer: Invalid argument
> [Jul 05 08:38:44]: Set Cmd: ipmi_kcs_cmd: driver timeout
> [Jul 05 08:38:50]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
> [Jul 05 08:39:01]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
> [Jul 05 08:39:23]: _get_watchdog_timer_cmd: fiid_obj_get: 'timeout_action': data not available
> [Jul 05 08:39:27]: _get_watchdog_timer_cmd: fiid_obj_get: 'initial_countdown_value': data not available
> [Jul 05 08:39:35]: _get_watchdog_timer_cmd: fiid_obj_get: 'initial_countdown_value': data not available
> [Jul 05 08:39:51]: Get Cmd: cmd error: 80h
> [Jul 05 08:39:52]: Set Cmd: ipmi_kcs_cmd: internal IPMI error
> [Jul 05 08:40:21]: Get Cmd: ipmi_kcs_cmd: driver timeout
> 
> 
> Strangely enough, the watchdog reacts a lot quicker and is more stable when
> I poll it through the network interface with "ipmitool ... bmc watchdog reset"
> or "get".
> It immediately responds, always with correct values, and never shuts down.
> 
> Maybe that's because I don't have any special driver loaded on Linux?
> The sun driver is not available for Linux as far as I understood, so
> I'm just using "bmc-watchdog -g" without any drivers.
> 
> cu,
> Frank
> 
-- 
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



