The server has run various versions of FreeNAS since Feb 16 with no issues; the system has been stable. I recently upgraded to FN11.2 U2.1 and have since experienced two instances where, after a few hours of reading data via SMB at 6-8 Gbit/s, the system becomes unresponsive.
The system was stable up until the 11.2 U2 upgrade. I have had the exact same issue on another of my servers (almost identical config). Rolling back to 11.1 resolved the issue on that system.
I plan to leave this system on 11.2U2 to help diagnose the issue here - it is non-production, but has data on the array ready to become production. I'm happy to provide any logs etc. to help with this.
By unresponsive I mean:
- Netdata page freezes, no updates. 'chart not found on url' error if I refresh the page.
- SMB connections are dropped on clients.
- SSH connections can be made, but freeze after 'Welcome to freenas'. No prompt appears; I can type in the session but nothing actually happens. Existing SSH connections made before the issue also freeze, apparently when attempting to read from disk ('ls' returned results, i.e. worked; 'smbstatus' froze).
- Connecting to the web admin interface fails.
- System responds to ping.
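To put timestamps on when each of the symptoms above starts, something like the following could be left running on a client. This is only a minimal sketch: the port numbers (22 for SSH, 80 for the web interface, 445 for SMB) are assumed defaults, and the names `port_open`, `probe` and `watch` are made up for illustration. Note that a successful TCP connect only shows the port still accepts connections; as the SSH behaviour above shows, that does not prove the service behind it is usable.

```python
import socket
import time

# Assumed default ports (not taken from this system's config): adjust as needed.
PORTS = {"ssh": 22, "web": 80, "smb": 445}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def probe(host, ports=PORTS):
    """Return a dict mapping each service name to True/False reachability."""
    return {name: port_open(host, port) for name, port in ports.items()}


def watch(host, interval=30):
    """Print a timestamped probe result every `interval` seconds until interrupted."""
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), probe(host), flush=True)
        time.sleep(interval)
```

Calling `watch()` with the server's address and redirecting the output to a file would give a rough timeline of which services stop answering first.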
This has just happened again:
1. I logged into the console and saw some errors - screenshot attached.
I was able to run some commands at the console (ls, ifconfig and dmesg). When I tried 'zfs list', the console became unresponsive.
2. I performed a reset via IPMI. When the system booted up, the Chelsio network adapter came up with its ports in status: no carrier. Doing a netif stop and start lets me ping out, but obviously nothing else works.
When booting after the reset I see the errors:
Error1: Occurs more than 30 times, fills the screen repeatedly.
@Error: attempt to write a readonly database
[: : bad number@
To help describe where this occurs during boot - the next message after this error is: middlewared: loading completed
Error2: Occurs when importing pools, more than 30 times, and fills the screen - not the same error as Error1. Here is a sample:
@condensing: txg 230668, msp  0xfffff802b977dc00, vdev id 0, spa C3-J0-V1@
Screenshot image 'import pools' attached. C3-J0-V1 etc are the pool names.
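If the console output can be captured as text (for example from a serial console log), the repeated messages could be counted per pool rather than read off a screenshot. The following is a rough sketch only: the regular expression is modelled on the single sample line above and may not match other variants of the message, and `count_condensing` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern based solely on the one sample line in this report.
CONDENSING_RE = re.compile(
    r"condensing: txg (?P<txg>\d+), msp\s+(?P<msp>0x[0-9a-fA-F]+), "
    r"vdev id (?P<vdev>\d+), spa (?P<spa>\S+)"
)


def count_condensing(lines):
    """Count 'condensing' messages per (pool name, vdev id) pair."""
    counts = Counter()
    for line in lines:
        match = CONDENSING_RE.search(line)
        if match:
            counts[(match.group("spa"), int(match.group("vdev")))] += 1
    return counts
```

Feeding it the lines of a captured boot log would show whether the messages are spread across all pools or concentrated on one.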
3. I performed another reboot - the network came up and everything was in working order.
Supermicro X10DRW Motherboard
CPU 2x Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz (2400.05-MHz K8-class CPU)
Memory: 256GB ECC registered
Disks: 2x SATA DOMs for boot vol. 90-bay JBOD (Supermicro SC946ED) attached via SAS3 (SAS3008 - LSI9300-8e). Multipathed.
Networking: Chelsio T520-CR. Ports aggregated. VLAN interface configured.
This issue has occurred twice on this system, and twice on another system that has since been downgraded. Downgrading resolved the issue on the other system. I'm happy to wipe this system and install from fresh, but I thought the devs might want to know why an upgrade causes these crashes. I wouldn't have raised this but for the fact that it has occurred on more than one server.