zfsd crash on device detach
Description
Problem/Justification
Impact
Activity
I'd recommend always running the latest available firmware version from the Broadcom site, and definitely checking for updates whenever a problem arises. The HBA doesn't need a BIOS/OPROM unless you boot from it, so we recommend disabling the OPROM for slots where it is not needed to speed up booting. TrueNAS includes the firmware flashing tools (sas2flash, sas3flash and storcli) to handle all LSI/Broadcom HBAs.
Thank you for the info.
I will upgrade the HBA firmware, since I also noticed the old copyright (2015) during boot-up. I bought this HBA new last year and expected it to ship with recent firmware.
Is there a good practice regarding BIOS/firmware updates? I shy away from them if everything is working.
I will migrate the server to a bare-metal installation in the future.
The core file gave me this backtrace:
#0 strlen (str=0x0) at /truenas-releng/freenas/_BE/os/lib/libc/string/strlen.c:101
101 /truenas-releng/freenas/_BE/os/lib/libc/string/strlen.c: No such file or directory.
(gdb) bt
#0 strlen (str=0x0) at /truenas-releng/freenas/_BE/os/lib/libc/string/strlen.c:101
#1 0x000000080078c1b5 in std::__1::char_traits<char>::length (__s=0x0)
at /truenas-releng/freenas/_BE/os/contrib/llvm-project/libcxx/include/__string:253
#2 std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::assign (this=0x7fffffffea90, __s=0x0)
at /truenas-releng/freenas/_BE/os/contrib/llvm-project/libcxx/include/string:2401
#3 0x000000080035bd0e in std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >::operator= (
this=0x7fffffffea90, __s=0x0)
at /truenas-releng/freenas/_BE/objs/truenas-releng/freenas/_BE/os/amd64.amd64/tmp/usr/include/c++/v1/string:890
#4 DevdCtl::Event::DevPath (this=<optimized out>, path="") at /truenas-releng/freenas/_BE/os/lib/libdevdctl/event.cc:291
#5 0x0000000000216278 in ?? ()
#6 0x000000080035a2e2 in DevdCtl::Consumer::ProcessEvents (this=0x7fffffffec18)
at /truenas-releng/freenas/_BE/os/lib/libdevdctl/consumer.cc:213
I've traced it to a lack of error checking in the libdevdctl library when a device is destroyed. Commit 9d52b595 adds the missing check and should fix the problem.
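For readers unfamiliar with this class of bug: frames #0 through #4 of the backtrace show a null `const char *` being assigned to a `std::string`, which is undefined behavior and ends up in `strlen(NULL)`. Below is a minimal sketch of the general guard pattern. It is not the code from commit 9d52b595, and `lookup_device_name`, `dev_name_from_path` and `self_check` are hypothetical names invented for illustration, not part of libdevdctl.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a C-style lookup that can fail. It returns
// nullptr when the device no longer exists (modelled here by an empty path).
const char *lookup_device_name(const std::string &path)
{
    if (path.empty())
        return nullptr;
    return path.c_str();
}

// Assigning a null `const char *` to a std::string is undefined behavior
// (in the backtrace it reaches strlen(NULL)), so the pointer is checked
// before the assignment and the failure is reported to the caller.
bool dev_name_from_path(const std::string &path, std::string &name)
{
    const char *raw = lookup_device_name(path);
    if (raw == nullptr) {
        name.clear();   // leave the output in a defined, empty state
        return false;
    }
    name = raw;         // safe: raw is known to be non-null here
    return true;
}

// Small self-check exercising both the failure and the success branch.
void self_check()
{
    std::string name = "stale";
    assert(!dev_name_from_path("", name));
    assert(name.empty());
    assert(dev_name_from_path("da0", name));
    assert(name == "da0");
}
```

With a guard like this, a device that disappears mid-event makes the function return false instead of crashing the daemon; the caller decides whether to skip or log the event.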
But I think the original trigger is somewhere in your hardware or firmware (your HBA firmware is old and should be updated), or in the fact that you are running TrueNAS inside KVM, which we have never recommended.
(Uploaded the .core file as a private attachment. It would be helpful to have a simple tutorial for novice users; I used the integrated shell in the web GUI and SFTP after some searching on the forums.)
Can you SSH in and grab the files within /var/db/system/cores?
I'm following the prompt that tells me to send the debug.
Not sure how to download/upload the core.