freenas.local had an unscheduled system reboot.
Description
Problem/Justification
Impact
Activity
I've finally looked through the kernel dumps in the provided debug, and in all cases I see the system was completely idle when the NMIs were fired. This may mean one of two things: either the system could not handle some interrupts and was too idle, or this watchdog is broken and does not reset when told to. It is good to hear that your system is OK now, because I am closing this ticket as unable to reproduce.
The system is up and running without an unscheduled reboot.
Uptime is 10 days, 10:23,
but the following components are removed:
Intel X520-DA2
HYPER M.2 X16 CARD with one 500 GB Samsung NVMe for testing
2x 8 TB Seagate Exos 7E8
We are waiting for an Intel Xeon E5-2623 V3 delivery; after that we will put back all the devices and do another test run.
Temporarily replaced the processor with an Intel Xeon 2680 V3, and did a fresh install of FreeNAS on two mirrored SSDs.
Also removed:
Intel X520-DA2
HYPER M.2 X16 CARD with one 500 GB Samsung NVMe for testing
2x 8 TB Seagate Exos 7E8
Settings:
Watchdog timer is OFF in BIOS
WATCHDOG JWD1 jumper set to pins 2-3 to output NMI
watchdogd service is running in FreeNAS
Hi @Alexander Motin,
First i run the Memtest86 for 24 hours and with Errors
Ran Breakin for about 40 minutes with no errors. Processor temperatures were high, but still no errors.
Updated the Supermicro X10DRi BIOS to the latest version:
BIOS Version: 3.2a
BIOS Build Time: 05/14/2020
Now
Watchdog timer is OFF in BIOS
WATCHDOG JWD1 jumper to pin 2-3 to output NMI
watchdogd service is running in FreeNAS
A while ago I got another unscheduled reboot.
This time a crash dump was generated, and I am attaching it. I am not sure what to look for, but I found the following interesting bits.
1:
pid ppid pgrp uid state wmesg wchan cmd
1174 1 1174 0 Ss nanslp 0xffffffff82167f50 watchdogd
2:
------------------------------------------------------------------------------
Tracing command watchdogd pid 1174 tid 101559 td 0xfffff804c7ae2620
sched_switch() at sched_switch+0x88e/frame 0xfffffe202164f6f0
mi_switch() at mi_switch+0x181/frame 0xfffffe202164f720
sleepq_switch() at sleepq_switch+0x115/frame 0xfffffe202164f760
sleepq_catch_signals() at sleepq_catch_signals+0x3b6/frame 0xfffffe202164f7d0
sleepq_timedwait_sig() at sleepq_timedwait_sig+0x14/frame 0xfffffe202164f810
_sleep() at _sleep+0x33d/frame 0xfffffe202164f8c0
kern_clock_nanosleep() at kern_clock_nanosleep+0x1b6/frame 0xfffffe202164f940
sys_nanosleep() at sys_nanosleep+0x5f/frame 0xfffffe202164f980
amd64_syscall() at amd64_syscall+0x792/frame 0xfffffe202164fab0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe202164fab0
--- syscall (240, FreeBSD ELF64, sys_nanosleep), rip = 0x800b1fbca, rsp = 0x7fffffffeb28, rbp = 0x7fffffffeb70 ---
----------------------------------------------------------------------------------
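The trace in item 2 can be summarized mechanically. The following is a minimal sketch, assuming the ddb frame format seen in this dump (`func() at func+0xOFF/frame 0xADDR`); the helper name `frame_functions` is hypothetical and not part of any FreeBSD or FreeNAS tool, and the sample is a subset of the frames quoted above.

```python
import re

# Matches ddb stack-frame lines such as:
#   _sleep() at _sleep+0x33d/frame 0xfffffe202164f8c0
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+$")


def frame_functions(trace_text):
    """Return the function names found in a ddb trace, innermost frame first."""
    names = []
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


# A few frames copied from the watchdogd trace in this ticket.
SAMPLE = """\
sched_switch() at sched_switch+0x88e/frame 0xfffffe202164f6f0
mi_switch() at mi_switch+0x181/frame 0xfffffe202164f720
_sleep() at _sleep+0x33d/frame 0xfffffe202164f8c0
sys_nanosleep() at sys_nanosleep+0x5f/frame 0xfffffe202164f980
amd64_syscall() at amd64_syscall+0x792/frame 0xfffffe202164fab0
"""

names = frame_functions(SAMPLE)
# True when the thread was inside the nanosleep system call.
is_sleeping = "sys_nanosleep" in names
```

For this dump the list contains `sys_nanosleep`, which matches the `nanslp` wait message in item 1: watchdogd was in a timed sleep when the dump was taken.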
3:
root watchdogd 1174 3 dgram -> /var/run/logpriv
4:
Aug 14 12:30:31 freenas pcib0: _OSC returned error 0x10
Aug 14 12:30:31 freenas pcib1: _OSC returned error 0x10
5:
[2020/08/14 12:30:17] (ERROR) middlewared.set_sysctl():407 - Failed to set sysctl '<module 'sysctl' from '/usr/local/lib/python3.7/site-packages/sysctl/__init__.py'>' to '': sysctl: unknown oid 'kern.cam.ctl.ha_peer'
I hope these dumps contain something useful; otherwise I will stop the watchdogd service in FreeNAS and wait for the freeze.
Thanks @Alexander Motin,
I will look into how to use the IPMI tool. Meanwhile, I recorded the whole session and can see that something is shown on the console before the system reboots.
Here is the video link
Timeline:
system message (below): 0:00 to 0:20
FreeNAS boot text: 2:00 to 3:30
manual shutdown: 3:45 to 5:30
and the text:
----------------------------
Enter an option from 1-11: NMI/cpu5 ... going to debugger
[thread pid 11 tid 100000]
Stopped at acpi_cpu_idle_mwait+0x7c: testq %rsi, %rsi
db:0:kdb.enter.default>write cn_mute 1
cn_mute 0 + 0x1
db:0:kdb.enter.default> reset
cpu_reset: Restarting BSP
cpu_reset: Failed to restart BSP
-------------------------------
and then the system rebooted
------------------------------
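The console text above names the CPU that received the NMI and the function it was stopped in. A minimal sketch for pulling those two facts out of such a capture, assuming the line formats shown in this ticket; `parse_nmi_entry` is a hypothetical helper, not an existing tool:

```python
import re

# Patterns based on the console text quoted in this ticket.
NMI_RE = re.compile(r"NMI/cpu(\d+) \.\.\. going to debugger")
STOP_RE = re.compile(r"Stopped at\s+(\w+)\+0x([0-9a-f]+):")


def parse_nmi_entry(console_text):
    """Return the CPU number and stop location of an NMI debugger entry, or None."""
    nmi = NMI_RE.search(console_text)
    stop = STOP_RE.search(console_text)
    if nmi is None or stop is None:
        return None
    return {
        "cpu": int(nmi.group(1)),
        "function": stop.group(1),
        "offset": int(stop.group(2), 16),
    }


SAMPLE = """\
Enter an option from 1-11: NMI/cpu5 ... going to debugger
[thread pid 11 tid 100000]
Stopped at acpi_cpu_idle_mwait+0x7c: testq %rsi, %rsi
"""

entry = parse_nmi_entry(SAMPLE)
# True when the CPU was stopped inside an ACPI idle routine.
in_idle = entry["function"].startswith("acpi_cpu_idle")
```

Here the stop location is `acpi_cpu_idle_mwait`, so the CPU was in an idle routine when the NMI arrived.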
The other issue I have had from earlier is with SHUTDOWN.
Whenever I shut down, the system always pauses after the following messages (video from 3:45 to 5:30):
GEOM_MIRROR: device destroyed
Link down
usbhub down
After that it just stays there unless I power down the server with the power button.
Hi,
I have a FreeNAS (11.3-U3.2) system which I am testing before it goes into production, but it has an unscheduled system reboot at least twice a week.
It has an LSI SAS 3008 HBA, but as far as I remember that issue is fixed in the latest FreeNAS-11.3-U3.2.
I am attaching the logs as well.
Let me know if more information is needed.