Fix kernel panic in case of L2ARC prefetch failure
Activity

Alexander Motin December 1, 2019 at 1:58 AM
freenas/11-stable patch: https://github.com/freenas/os/commit/bb71a5b835560ab008e574d61d78bc2e5cb5fb39
freenas/11.2-stable patch: https://github.com/freenas/os/commit/5b4811a0edcd03358ef83df27df3ec89328b74ba

Alexander Motin November 28, 2019 at 6:50 PM
The FreeBSD head commit: https://svnweb.freebsd.org/changeset/base/355182

Alexander Motin November 28, 2019 at 5:23 PM
I think I found what is going on. I was able to trigger the same panic by corrupting data written to L2ARC and then making ZFS read it with prefetch enabled. It causes a use-after-free: while ZFS redirects the failed prefetch I/O from L2ARC to the main pool, another thread that needs the same data tries to raise the priority of the original L2ARC I/O, which no longer exists. Other platforms do not have this issue because L2ARC prefetch is blocked by default, while the FreeNAS/TrueNAS autotune sets vfs.zfs.l2arc_noprefetch=0. Resetting that value back to 1 should work around the issue; the real fix should not be hard.
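
To make the race concrete, below is a minimal user-space sketch of the pattern described above. This is not ZFS code: struct zio is reduced to a lock and a priority field, and the function names (l2arc_read_done_failed, arc_read_waiter) are invented stand-ins for l2arc_read_done() and the arc_read()/zio_change_priority() path. As noted above, setting vfs.zfs.l2arc_noprefetch back to 1 avoids the racy path entirely until a patched build is installed.

/*
 * Illustration only -- NOT ZFS code.  Thread A plays the role of
 * l2arc_read_done() on a failed L2ARC read: it reissues the I/O to the
 * main pool (elided) and destroys the original zio.  Thread B plays a
 * second reader that still holds a pointer to that zio and calls the
 * equivalent of zio_change_priority(), locking freed memory -- the
 * use-after-free that panics inside _sx_xlock_hard().
 */
#include <pthread.h>
#include <stdlib.h>

struct zio {
        pthread_mutex_t io_lock;        /* stands in for the kernel sx lock */
        int             io_priority;
};

struct arc_callback {
        struct zio *acb_zio_head;       /* waiters find the in-flight zio here */
};

static struct arc_callback acb;

/* Thread A: the L2ARC read failed its checksum; redirect to the main pool. */
static void *
l2arc_read_done_failed(void *arg)
{
        struct zio *zio = acb.acb_zio_head;

        (void)arg;
        /*
         * Reissue against the main pool (elided), then destroy the
         * original L2ARC zio.  Nothing synchronizes with or clears
         * acb_zio_head first -- that is the bug.
         */
        pthread_mutex_destroy(&zio->io_lock);
        free(zio);
        return (NULL);
}

/* Thread B: a second reader wants the same data at a higher priority. */
static void *
arc_read_waiter(void *arg)
{
        struct zio *zio = acb.acb_zio_head;

        (void)arg;
        /* zio_change_priority(): lock the zio and bump its priority.
         * If thread A has already freed it, this locks freed memory. */
        pthread_mutex_lock(&zio->io_lock);
        zio->io_priority = 0;           /* e.g. a sync-read priority */
        pthread_mutex_unlock(&zio->io_lock);
        return (NULL);
}

int
main(void)
{
        pthread_t a, b;

        acb.acb_zio_head = malloc(sizeof(struct zio));
        pthread_mutex_init(&acb.acb_zio_head->io_lock, NULL);
        acb.acb_zio_head->io_priority = 3;

        pthread_create(&a, NULL, l2arc_read_done_failed, NULL);
        pthread_create(&b, NULL, arc_read_waiter, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        return (0);
}

Built with cc -pthread and run under a tool such as Valgrind or ThreadSanitizer, thread B's lock of the freed zio shows up as a use-after-free; in the kernel the same access lands in _sx_xlock_hard(), matching the backtraces below.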

Bill O'Hanlon November 4, 2019 at 7:01 PM
Fresh incident seen:
PHE-919-16008
Tracing pid 0 tid 102512 td 0xfffff804b4a80620
_sx_xlock_hard() at _sx_xlock_hard+0x15d/frame 0xfffffe20337675f0
zio_change_priority() at zio_change_priority+0x130/frame 0xfffffe2033767640
arc_read() at arc_read+0xf3/frame 0xfffffe20337676d0
dbuf_read() at dbuf_read+0x6f4/frame 0xfffffe2033767790
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x203/frame 0xfffffe2033767800
dmu_read_uio_dnode() at dmu_read_uio_dnode+0x37/frame 0xfffffe2033767870
zvol_read() at zvol_read+0x8b/frame 0xfffffe20337678b0
ctl_be_block_dispatch_zvol() at ctl_be_block_dispatch_zvol+0x1c4/frame 0xfffffe2033767940
ctl_be_block_worker() at ctl_be_block_worker+0x777/frame 0xfffffe20337679e0
taskqueue_run_locked() at taskqueue_run_locked+0x154/frame 0xfffffe2033767a40
taskqueue_thread_loop() at taskqueue_thread_loop+0x98/frame 0xfffffe2033767a70
fork_exit() at fork_exit+0x83/frame 0xfffffe2033767ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe2033767ab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db:0:kdb.enter.default> show allpcpu

Bill O'Hanlon October 11, 2019 at 6:57 PM (edited)
Customer reported the same panic: Ticket XOA-862-20343
Debug attached.
db:0:kdb.enter.default> bt
Tracing pid 5355 tid 100641 td 0xfffff802765d3000
_sx_xlock_hard() at _sx_xlock_hard+0x15d/frame 0xfffffe0a95a10ec0
zio_change_priority() at zio_change_priority+0x130/frame 0xfffffe0a95a10f10
arc_read() at arc_read+0xf3/frame 0xfffffe0a95a10fa0
dbuf_read() at dbuf_read+0x6f4/frame 0xfffffe0a95a11060
dmu_buf_hold_array_by_dnode() at dmu_buf_hold_array_by_dnode+0x203/frame 0xfffffe0a95a110d0
dmu_read_uio_dnode() at dmu_read_uio_dnode+0x37/frame 0xfffffe0a95a11140
dmu_read_uio_dbuf() at dmu_read_uio_dbuf+0x3b/frame 0xfffffe0a95a11170
zfs_freebsd_read() at zfs_freebsd_read+0x2d3/frame 0xfffffe0a95a11220
VOP_READ_APV() at VOP_READ_APV+0x7c/frame 0xfffffe0a95a11250
nfsvno_read() at nfsvno_read+0x36d/frame 0xfffffe0a95a11320
nfsrvd_read() at nfsrvd_read+0x5d3/frame 0xfffffe0a95a11570
nfsrvd_dorpc() at nfsrvd_dorpc+0x621/frame 0xfffffe0a95a11750
nfssvc_program() at nfssvc_program+0x580/frame 0xfffffe0a95a11920
svc_run_internal() at svc_run_internal+0xe19/frame 0xfffffe0a95a11a60
svc_thread_start() at svc_thread_start+0xb/frame 0xfffffe0a95a11a70
fork_exit() at fork_exit+0x83/frame 0xfffffe0a95a11ab0
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0a95a11ab0
--- trap 0xc, rip = 0x80087015a, rsp = 0x7fffffffe3f8, rbp = 0x7fffffffe810 ---
Details
Assignee: Alexander Motin
Reporter: Brendon C.
Support Tickets: XOA-862-20343, OSS-410-96482, PHE-919-16008, AOH-876-52386, WFY-120-49676
Priority: High

Description

I have a FreeNAS Mini I'm using for remote backups. This is a fairly low-utilization system that also gets used for Samba shares, AFP shares, and some open-source BitTorrent seeding. Specs:
FreeNAS Mini
Motherboard: ASRock C2750D4I
OS disk: 16GB SATA DOM
Data Disks: 4 x WD Red 8TB
Memory: 32GB
Official 64GB ZIL and 128GB L2ARC units installed.
Pool is encrypted.
FreeNAS 11.2-U3
After upgrading to 11.2 I started seeing what appears to be the pool freezing up completely every few days to once a week. I can still ping the system, I've sometimes been able to ssh in, and I can pull up the console over IPMI. Any filesystem access to the pool, however, hangs indefinitely. The only solution is a reboot.
I've scoured the system for any type of log entry but have found nothing; there is absolutely no indication I can find of what's going on.
Here's my thread in the forums:
https://forums.freenas.org/index.php?threads/freenas-mini-on-11-2-freezes-locks-up.73353/
Here's another thread opened today by someone else who appears to be having the same issue:
https://forums.freenas.org/index.php?threads/11-2u1-not-responding-after-a-week-of-use-repeatedly.74002/
This happened in 11.2-U2 and now U3.
Another thing I noticed: when I have to hard reset the system, it seems to lose its config. The hostname reverts to "freenas" and I think I've even had it lose network info. Another reboot fixes this.
This system is in a cold closet; right now it's usually around 70°F / 21°C all day. Someone brought up heat issues, but I don't think it's that: I ran stress on the system for hours while closely monitoring temps, and they never got close to concerning (nor did the system freeze).
This system has been in production for years now with few issues until upgrading to 11.2.
This is a recent fresh install of 11.2-U2 (I restored the config). I just upgraded to U3 last night and had it freeze today.
I have three other systems in production, one running a fairly high load, all without issue. I'm not sure why this particular system is freezing.
I can generate a debug package but would prefer not to post that publicly. Let me know if I can provide anything else.