Details
Type: Bug
Status: Done
Priority: Low
Resolution: Complete
Affects Version/s: 11.2-U3
Fix Version/s: 11.2-U4
Component/s: OS
Description
- Chassis: Supermicro AS-1113S-WN10RT
- Mainboard: H11SSW-NT
- Disk drives: 6x Intel SSDPE2KX010T8
With FreeNAS 11.2-U3, as soon as there are more than four of these drives in the system, any moderate write load on them leads to errors like these:
Apr 12 13:42:16 freenas01 nvme6: aborting outstanding i/o
Apr 12 13:42:16 freenas01 nvme6: WRITE sqid:1 cid:117 nsid:1 lba:981825104 len:176
Apr 12 13:42:16 freenas01 nvme6: ABORTED - BY REQUEST (00/07) sqid:1 cid:117 cdw0:0
Apr 12 13:42:49 freenas01 nvme6: resetting controller
Apr 12 13:42:50 freenas01 nvme6: aborting outstanding i/o
Apr 12 13:42:50 freenas01 nvme6: WRITE sqid:1 cid:127 nsid:1 lba:984107936 len:96
Apr 12 13:42:50 freenas01 nvme6: ABORTED - BY REQUEST (00/07) sqid:1 cid:127 cdw0:0
Apr 12 13:43:35 freenas01 nvme6: resetting controller
In a discussion on the freebsd-stable mailing list, we came to suspect that the NVMe driver in FreeBSD 11 misses the completion interrupts the device raises when it finishes a command, and then runs into timeouts.
This eventually makes the system unresponsive.
Tests with plain FreeBSD (without FreeNAS) show that 11-STABLE exhibits the problem while 12-STABLE does not.
All hardware components have the latest BIOS/firmware as provided by the vendor.
There have been substantial changes to the NVMe subsystem in FreeBSD 12 and later. They initially targeted endianness problems on platforms such as Sparc64, but some code to deal specifically with missed interrupts was also added to nvme_timeout() in nvme_qpair.c. With a 12-STABLE kernel, my system logs a "Missing interrupt" message roughly every half hour under synthetic write load but otherwise runs stably. An 11-STABLE system hangs within seconds of starting my "dd" jobs.
More details can be found in the added links.
Kind regards,
Patrick
Issue Links
- is cloned by: NAS-104047 cherry-picking of NVME fixes in FreeBSD (Engineering Closed)
- relates to: NAS-104094 Cherry-picking NVME improvements from upstream (FreeBSD) (Engineering Closed)