cxgbe(4) crash on hw.cxgbe.largest_rx_cluster set
Description
Activity
Alexander Motin October 2, 2020 at 2:17 PM
https://svnweb.freebsd.org/changeset/base/366354
BTW, setting the tunable to 4000 was probably limiting clusters to 2048 instead of proper 4096.
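A minimal sketch of that rounding, not the driver's actual code: it assumes a 4 KiB page size, the usual FreeBSD mbuf cluster sizes (2048, 4096, 9216, 16384), and that a limit is effectively rounded down to the largest cluster size that fits. The helper name `effective_rx_cluster` is hypothetical.

```python
# Usual FreeBSD mbuf cluster sizes on a 4 KiB-page system:
# MCLBYTES, MJUMPAGESIZE, MJUM9BYTES, MJUM16BYTES.
CLUSTER_SIZES = (2048, 4096, 9216, 16384)


def effective_rx_cluster(limit):
    """Return the largest cluster size not exceeding limit, or None.

    Assumption: a limit that is not itself a valid cluster size is
    rounded down, which is what the comment above suggests happened.
    """
    fitting = [size for size in CLUSTER_SIZES if size <= limit]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for limit in (4000, 4096, 9216):
        print(limit, "->", effective_rx_cluster(limit))
```

Under these assumptions a limit of 4000 yields 2048, while 4096 yields 4096, so a value intended to allow page-sized clusters would need to be at least 4096.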
Stilez October 2, 2020 at 1:20 AM(edited)
Confirming that disabling that tunable solved it for me, as well. RC1 booting, thank you, Alex!
See for why that tunable's there.
The Chelsio driver (FreeBSD or Windows, not sure which, or both) still has some kind of issue when larger frames are in use. A 9k MTU is quite common on LANs, including 10G~40G LANs, for efficiency, and in my case most devices use a large MTU (even occasionally connected ones like laptops), with a mix of Chelsio 10G and Intel 1G NICs on the client side. Because of the Chelsio large-MTU bug, the tunable acts as a ceiling to prevent issues with the Chelsio driver on LANs where some clients may be set, outside FreeBSD's control, to use MTU=9k.
I forget the reason, but the instructed workaround for the Chelsio MTU bug was to set that tunable, as well as/instead of/rather than just setting MTU <= 4000 in the interface config.
Alexander Motin October 2, 2020 at 12:56 AM
I've reported the problem upstream. Meanwhile, let's hope that removing the tunable fixes the issue; at least I can't reproduce it without the tunable, and I haven't seen other reports.
Alexander Motin October 2, 2020 at 12:48 AM
I did reproduce the panic with the tunable set. Please remove it and you should be fine for now. I suspect there is more to the cause of the panic than the tunable, which may only be a trigger, so I'll report the issue to Chelsio.
Alexander Motin October 1, 2020 at 8:22 PM
Could you please remove the tunable and try updating again?
Upgraded from 12 beta 2.1 to RC1, and the system was promptly unable to boot.
It hits a kernel panic at the same point each time, although with slightly different messages. According to the panic dump, it seems to be related to the cxgbe (Chelsio T540-CR) driver.
I tried rebooting several times, it's consistent.
I also tried single-user and verbose modes; it rebooted abruptly midway through boot without any messages or warning.
Finally I had to boot via the beta 2.1 boot environment, and it had no problem at all with that.
Attached files:
4 x reboot serial console captures
1 x debug file taken via beta 2.1, which apparently includes the latest textdumps from today, under RC1