Failed to check for alert IPMISEL (and IPMISELSpaceLeft) alert mail
Description
Activity
Kai Groner April 20, 2020 at 6:49 PM (edited)
Former user I ran
and reset the BMC via the web UI. During the reset, ipmitool produced a few different errors; sometimes it exited with a non-zero status, but usually it did not. Here is the output from one reset sequence:
Here's a different one:
And another:
I think I saw a reset sequence without a non-zero status, but it's out of my scrollback buffer now.
I'll note that several "check failed" alerts were generated while running these tests, and the kernel error sequences also appeared (though they were considerably shorter).
Vladimir Vinogradenko April 20, 2020 at 5:52 PM
Can you please try running these IPMI commands in an infinite loop to see how they crash? (ipmitool -c sel elist or ipmitool sel info)
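A minimal Python sketch of such a loop follows. It assumes ipmitool is on PATH and Python 3.7+ is available. The names COMMANDS, run_once and main are hypothetical helpers for illustration. Because ipmitool was seen printing errors while still exiting with status 0, the sketch counts any stderr output as a failure, not only a non-zero exit status.

```python
import subprocess
import time
from datetime import datetime

# The two commands suggested above; assumes ipmitool is installed and on PATH.
COMMANDS = [
    ["ipmitool", "-c", "sel", "elist"],
    ["ipmitool", "sel", "info"],
]


def run_once(cmd, timeout=30.0):
    """Run cmd once and return (ok, returncode, stderr_text).

    A run counts as failed if the exit status is non-zero OR anything was
    written to stderr, since errors may be printed with a zero exit status.
    returncode is None when the command timed out or could not be started.
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, None, f"timed out after {timeout}s"
    except OSError as exc:  # e.g. the executable was not found
        return False, None, str(exc)
    stderr = proc.stderr.strip()
    ok = proc.returncode == 0 and not stderr
    return ok, proc.returncode, stderr


def main(delay=1.0):
    """Loop forever (stop with Ctrl-C), printing only the failing runs."""
    while True:
        for cmd in COMMANDS:
            ok, code, err = run_once(cmd)
            if not ok:
                stamp = datetime.now().isoformat(timespec="seconds")
                label = " ".join(cmd)
                print(f"{stamp} {label!r} exit={code} stderr={err!r}", flush=True)
        time.sleep(delay)
```

Call main() to start the loop while resetting the BMC. Only failing runs are printed, each with a timestamp and the captured stderr text, so intermittent errors can be lined up against the reset sequence.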
Kai Groner April 20, 2020 at 5:23 PM
I've updated the IPMI firmware on this machine to see if the KCS channel errors go away.
If those errors are the root cause of my issue, I guess there are two things that FreeNAS could be doing that would make this better:
Report the error text from ipmitool somewhere. I'm guessing there's an I/O error, which would have been helpful to see.
If BMC flakiness is just a fact of life, maybe it should be possible to ignore it outright or require multiple consecutive failures before sending alerts. (I tried setting the IPMISELSpaceLeft alerting frequency to never, but it doesn't seem to affect these "check failed" alerts.)
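The two suggestions above can be sketched together as a small gate that only raises a "check failed" alert after several consecutive failures and includes the tool's error text in the message. This is a hypothetical illustration, assuming each periodic check reports an ok/error result. ConsecutiveFailureGate, record() and threshold are invented names, not part of the FreeNAS alert framework.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConsecutiveFailureGate:
    """Suppress "check failed" alerts until a check fails `threshold` times in a row.

    Hypothetical helper for illustration; not the FreeNAS alert API.
    Any successful run resets the failure streak for that check.
    """

    threshold: int = 3
    _streaks: Dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def record(self, check: str, ok: bool, error: str = "") -> Optional[str]:
        """Record one result; return an alert message, or None if no alert should fire."""
        if ok:
            self._streaks[check] = 0
            return None
        streak = self._streaks.get(check, 0) + 1
        self._streaks[check] = streak
        if streak == self.threshold:
            # Fire exactly once when the threshold is reached, and include the
            # captured error text so the underlying cause is visible.
            return f"{check} failed {streak} times in a row; last error: {error or '(none)'}"
        return None
```

Firing only when the streak first reaches the threshold avoids a flood of repeat alerts while the BMC stays unreachable; a single transient failure during a BMC reset never produces an alert.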
I have been getting these alerts sporadically since upgrading to 11.3.
I'm able to run these commands from a login shell without errors.
I've tried lowering the severity of IPMI alerts as well as reducing the frequency to never, but it doesn't seem to make a difference.