Re: [Freeipmi-users] Decoding ram errors on supermicro
From: Albert Chu
Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
Date: Wed, 05 Dec 2018 10:48:05 -0800
On Wed, 2018-12-05 at 03:38 +0100, Tom Hetmer wrote:
> Alright, added to github.
>
> Here's the output from bmc-info for that particular board.
> Product ID : 2201
> [Mon Dec 3 12:08:13 2018] DMI: Supermicro X10DRH LN4/X10DRH-CLN4, BIOS 2.0 01/30/2016
>
>
> I guess you'll support it based on the product ID?
Yes! Thanks. I'll put these in the ticket too.
Al
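
Since support is keyed on the product ID, the lookup could be sketched roughly as below. This is only an illustration built from the IDs quoted in this thread (2201 from bmc-info plus Tom's list); `smc_board`, `smc_boards` and `smc_board_name` are hypothetical names, not FreeIPMI's actual tables or API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: BMC product ID -> board name. */
struct smc_board {
    uint16_t product_id;
    const char *name;
};

/* Values are the ones reported in this thread, nothing more. */
static const struct smc_board smc_boards[] = {
    { 2049, "X10SLL-F" },
    { 2051, "X10SLH-F/X10SLM+-F" },
    { 2097, "X10DRL-i" },
    { 2148, "X10DRW-E" },
    { 2201, "X10DRH LN4/X10DRH-CLN4" },
    { 2369, "X11SPi-TF" },
    { 2407, "X11DDW-NT" },
};

/* Returns the board name for a product ID, or NULL if it is not listed. */
static const char *smc_board_name(uint16_t product_id)
{
    size_t i;

    for (i = 0; i < sizeof smc_boards / sizeof smc_boards[0]; i++) {
        if (smc_boards[i].product_id == product_id)
            return smc_boards[i].name;
    }
    return NULL;
}
```

An unlisted ID returns NULL, so a caller would fall back to printing the raw OEM Event Data2/Data3 codes rather than guessing a DIMM location.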
> So if there are any other (X10) boards with a different product ID but
> the same SEL output, I'll have to send them again, correct?
>
>
> I have all kinds of numbers on other machines, e.g.
> X10DRW-E => 2148
> X11SPi-TF => 2369
> X10SLL-F => 2049
> X10DRL-i => 2097
> X11DDW-NT => 2407
> X10SLH-F/X10SLM+-F/X10SLH-F/X10SLM+-F => 2051
>
>
> and so on. I think we have at least a quarter of the boards they
> manufacture.
> X9s are under 2000, X11 seems to be 23xx. But maybe that's too much
> reverse engineering for you ;)
> I can try to ping them and ask about details, but I have no official
> contact with Supermicro.
>
>
> Best,
> Tom Hetmer
>
>
> CDN77 Operations
> address@hidden / +44 (0) 20 3514 2399 / www.cdn77.com
>
> ----- Original message -----
> > From: "Albert Chu" <address@hidden>
> > To: "Tom Hetmer" <address@hidden>, address@hidden
> > Date: 12/04/18 19:40
> > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> >
> > On Tue, 2018-12-04 at 11:39 +0100, Tom Hetmer wrote:
> > > Sure. It seems there's a similar ticket
> > > already: https://github.com/chu11/freeipmi-mirror/issues/19
> >
> > Ahh, if you could, update it with info from ipmitool / ipmiutil. I was
> > reluctant to add support based on reverse engineering. But if other
> > tools have "official" interpretations from Supermicro, I'm more
> > confident in the addition.
> >
> > > Yep, that's the code. ipmitool and a few others decode it too.
> > >
> > >
> > > We have a *lot* of Supermicros so I can help with testing if needed,
> > > although we don't get that many CRC errors :)
> >
> > The one thing I'll need is product ID numbers (you can get them from
> > bmc-info) and the name of the product. This goes into the documentation
> > and some of the code.
> >
> > Thanks,
> >
> > Al
> >
> > > So I guess we'd have to wait till one pops up. But I hope the 'ver 2'
> > > method from ipmiutil works fine.
> > > We used ipmitool in our monitoring before and it was accurate but
> > > slow, that's why I rewrote it all to use freeipmi.
> > >
> > >
> > > Thanks!
> > >
> > >
> > > Best,
> > > Tom Hetmer
> > >
> > >
> > > CDN77 Operations
> > > address@hidden / +44 (0) 20 3514 2399 / www.cdn77.com
> > >
> > > ----- Original message -----
> > > > From: "Albert Chu" <address@hidden>
> > > > To: "Tom Hetmer" <address@hidden>, freeipmi-users@gnu.org
> > > > Date: 12/03/18 21:06
> > > > Subject: Re: [Freeipmi-users] Decoding ram errors on supermicro
> > > >
> > > > Hi Tom,
> > > >
> > > > Thanks for the pointer to ipmiutil's code. I assume you found this
> > > > comment:
> > > >
> > > > ---
> > > > /* ver 2 method: 2A 80 = P1_DIMMB1 */
> > > > /* SuperMicro says:
> > > >  * pair: %c (data2 >> 4) + 0x40 + (data3 & 0x3) * 3, (='B')
> > > >  * dimm: %c (data2 & 0xf) + 0x27,
> > > >  * cpu: %x (data3 & 0x03) + 1);
> > > >  */
> > > > ---
> > > >
> > > > I can definitely add it to my todo list.
> > > >
> > > > Would you mind writing up an issue on github here?
> > > >
> > > > https://github.com/chu11/freeipmi-mirror
> > > >
> > > > Al
> > > >
> > > > On Mon, 2018-12-03 at 17:55 +0100, Tom Hetmer wrote:
> > > > > Hi,
> > > > >
> > > > > it'd be good if freeipmi supported decoding the Supermicro ECC
> > > > > errors.
> > > > >
> > > > >
> > > > > Manufacturer: Supermicro
> > > > > Product Name: X10DRH LN4
> > > > > e.g.
> > > > > freeipmi
> > > > > 1,Dec-01-2018,06:37:53,Sensor #0,Memory,Critical,Uncorrectable memory error ; OEM Event Data2 code = 3Ah ; OEM Event Data3 code = 81h
> > > > >
> > > > > web interface
> > > > > 1 | 12/01/2018 | 06:37:53 | Memory | Uncorrectable ECC (@DIMMG1(CPU2)) | Asserted
> > > > >
> > > > >
> > > > > something like this worked for me (stolen from ipmiutil)
> > > > >
> > > > > $cpu = ($data3 & 0x03) + 1;
> > > > >
> > > > > $NPAIRS = 26;
> > > > > $rgpairs = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
> > > > >
> > > > > $bdata = "0x".$data2.$data3;
> > > > > $bdata = hexdec($bdata);
> > > > > $pair = (($bdata & 0xF0) >> 4) - 1;
> > > > >
> > > > > if ($pair < 0) $pair = 0;
> > > > > if ($pair > $NPAIRS) $pair = $NPAIRS - 1;
> > > > >
> > > > > $pair = $rgpairs[$pair - 1];
> > > > >
> > > > > $dimm = $bdata & 0x0F;
> > > > >
> > > > > $dimm may be incorrect, as the original code subtracts 9, but on that
> > > > > board it was wrong, so I changed it to get the right result. We'll
> > > > > see if it keeps getting the right values.
> > > > >
> > > > > Best,
> > > > > Tom Hetmer
> > > > >
> > > > >
> > > > > CDN77 Operations
> > > > > address@hidden / +44 (0) 20 3514 2399 / www.cdn77.com
> > > > >
> > > > > _______________________________________________
> > > > > Freeipmi-users mailing list
> > > > > address@hidden
> > > > > https://lists.gnu.org/mailman/listinfo/freeipmi-users
> > > >
> > > > --
> > > > Albert Chu
> > > > address@hidden
> > > > Computer Scientist
> > > > High Performance Systems Division
> > > > Lawrence Livermore National Laboratory
> > >
> >
> > --
> > Albert Chu
> > address@hidden
> > Computer Scientist
> > High Performance Systems Division
> > Lawrence Livermore National Laboratory
>
--
Albert Chu
address@hidden
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
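
For reference, the "SuperMicro says" formula from the ipmiutil comment quoted above can be written out as a small C helper. This is a minimal sketch of that comment only, not FreeIPMI's or ipmiutil's actual code; `smc_dimm`, `smc_decode_dimm` and `smc_format_dimm` are hypothetical names, and the output format simply mirrors the `P1_DIMMB1` example in the comment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result type for a decoded Supermicro memory event. */
struct smc_dimm {
    unsigned cpu;  /* 1-based CPU number */
    char pair;     /* channel ("pair") letter */
    char dimm;     /* DIMM digit within the channel */
};

/* Applies the three expressions from the quoted ipmiutil comment
 * to OEM Event Data2 and Data3. */
static struct smc_dimm smc_decode_dimm(uint8_t data2, uint8_t data3)
{
    struct smc_dimm d;

    d.cpu = (data3 & 0x03u) + 1u;
    d.pair = (char)((data2 >> 4) + 0x40 + (data3 & 0x03) * 3);
    d.dimm = (char)((data2 & 0x0f) + 0x27);
    return d;
}

/* Formats the result as in the comment's example, e.g. "P1_DIMMB1".
 * Returns the snprintf() result. */
static int smc_format_dimm(uint8_t data2, uint8_t data3, char *buf, size_t len)
{
    struct smc_dimm d = smc_decode_dimm(data2, data3);

    assert(buf != NULL);
    return snprintf(buf, len, "P%u_DIMM%c%c", d.cpu, d.pair, d.dimm);
}
```

With the comment's own example (Data2 = 2Ah, Data3 = 80h) this gives P1_DIMMB1. With the values from Tom's X10DRH event (3Ah, 81h) the same arithmetic gives CPU 2, pair 'F', DIMM '1', whereas the web interface reported DIMMG1(CPU2). So the formula as quoted does not reproduce that board's label, which fits the caution in this thread about keying any decoding on the product ID.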