snapshot causes kernel panic

Description

I posted this to the forum at https://www.ixsystems.com/community/threads/snapshot-causes-kernel-panic.75700/ and it was suggested that I file a bug report, so here we are.

FreeNAS-11.2-RELEASE-U1 (I know, it's not the latest, but I'm feeling a bit gun shy after the 11.2 data loss escapade.)

zpool get version shows a - for the value, so it's possible I'm not running the 'latest' pool version for the installed release.

I've been able to reproduce this multiple times. System is otherwise stable.

Once a snapshot is attempted, manually from the GUI or via a scheduled task, the system kernel panics and reboots.

This is what I was able to capture via remote console screen recording. The system will just be sitting there until the snapshot is attempted:

panic: solaris assert: zap_add(mos, dsl_dataset_phys(ds)->ds_snapnames_zapobj, snapname, 8, 1, &dsobj, tx) == 0 (0x5 == 0x0), file: /freenas-releng-final/freenas/_BE/os/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dsl_dataset.c, line: 1534
cpuid = 3
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0469cbc3d0
vpanic() at vpanic+0x177/frame 0xfffffe0469cbc430
panic() at panic+0x43/frame 0xfffffe0469cbc490
assfail3() at assfail3+0x2c/frame 0xfffffe0469cbc4b0
dsl_dataset_snapshot_sync_impl() at dsl_dataset_snapshot_sync_impl+0x628/frame 0xfffffe0469cbc560
dsl_dataset_snapshot_sync_impl() at dsl_dataset_snapshot_sync_impl+0xf7/frame 0xfffffe0469cbc6c0
dsl_sync_task_sync() at dsl_sync_task_sync+0xae/frame 0xfffffe0469cbc6f0
dsl_pool_sync() at dsl_pool_sync+0x3b/frame 0xfffffe0469cbc770
spa_sync() at spa_sync+0xad5/frame 0xfffffe0469cbc9a0
txg_sync_thread() at txg_sync_thread+0x208/frame 0xfffffe0469cbcab0
fork_exit() at fork_exit+0x83/frame 0xfffffe0469cbcab0
fork_trampoline() at fork_trampoline+0x83/frame 0xfffffe0469cbcab0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
[ thread pid 15 tid 101395 ]
stopped at kdb_enter+0x3b: movq $0,kdb_why
db:0:kdb.enter.default> write cn_mute 1
cn_mute 0 = 0x1
db:0:kdb.enter.default> reset
cpu_reset: Restarting BSP
cpu_reset_proxy: Stopped CPU 3
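
For anyone reading the panic message: the `(0x5 == 0x0)` part is the value `zap_add` returned compared against the expected 0. A minimal Python sketch for turning that value into an errno name follows. The helper name `decode_assert_errno` and the abbreviated `PANIC_LINE` string are made up for illustration, and the lookup uses the errno table of whatever machine runs the script, so it assumes that table matches the FreeBSD kernel's numbering for this code.

```python
import errno
import os
import re

# Matches the "(0x5 == 0x0)" style comparison printed by a failed assert.
# Whitespace around "==" is optional because transcriptions vary.
_ASSERT_VALUES = re.compile(r"\((0x[0-9a-fA-F]+)\s*==\s*(0x[0-9a-fA-F]+)\)")


def decode_assert_errno(panic_line):
    """Return (code, errno name, description) for the left-hand assert value."""
    match = _ASSERT_VALUES.search(panic_line)
    if match is None:
        raise ValueError("no '(0x.. == 0x..)' comparison found in panic line")
    code = int(match.group(1), 16)
    # errno.errorcode maps numeric codes to names on the local platform.
    name = errno.errorcode.get(code, "unknown")
    return code, name, os.strerror(code)


# Abbreviated from the panic message in this report (arguments elided by hand).
PANIC_LINE = "panic: solaris assert: zap_add(...) == 0 (0x5 ==0x0)"

if __name__ == "__main__":
    code, name, text = decode_assert_errno(PANIC_LINE)
    print(f"zap_add returned {code} ({name}): {text}")
```

This only names the error code; it says nothing about why `zap_add` failed.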

Current system specs:

Supermicro X9SCL+-F

Xeon E3-1230

16 gig ECC

LSI 9211-8i

2x Crossflashed Dell H310

Firmware 20.00.07.00 on all three adapters (covers all available drive bays with an extra port on one of the cards)

Problem/Justification

None

Impact

None

Activity

Alexander Motin 
February 11, 2020 at 9:59 PM

This looks like a metadata corruption, not detected by checksums.  It would be really good to find out how it happened, but once it did, there is nothing more to do.

Justin D'Cynical 
June 29, 2019 at 11:46 AM

Well, it's been about three days with only certain volumes set for recursive snapshots, and it's stable so far.  I attempted a snapshot of the entire pool, non-recursive, and it succeeded, but add in that recursive flag and bad things happen.

 

 

Alexander Motin 
June 26, 2019 at 2:48 PM

Error 5 reported there is probably EIO: either an I/O error or some sort of metadata corruption. I am surprised that it passes scrub, but it may be corruption that happened before checksums were calculated. I suppose the panic happens when you try to snapshot some specific dataset. I doubt it depends on whether the snapshot is recursive, but let us know if I am wrong; that would be interesting.
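
One way to test the "specific dataset" theory is to snapshot each dataset on its own, non-recursively, and record which one was in flight when the box went down. The following is a rough Python sketch of that idea, not a tested procedure. The pool name `tank`, the snapshot name `panic-probe`, and the log path are placeholders, and the function names are invented here. It relies only on the standard `zfs list` and `zfs snapshot` commands. Running it on an affected system is expected to trigger the panic, so the log is fsynced before every attempt and should live on a device outside the affected pool.

```python
import os
import subprocess


def list_datasets_cmd(pool):
    """Command that prints every filesystem and volume under the pool, one per line."""
    return ["zfs", "list", "-H", "-o", "name", "-t", "filesystem,volume", "-r", pool]


def snapshot_cmd(dataset, snap_name):
    """Command for a single non-recursive snapshot of one dataset."""
    return ["zfs", "snapshot", f"{dataset}@{snap_name}"]


def probe(pool, snap_name, log_path):
    """Snapshot datasets one at a time, logging each attempt before it runs."""
    listing = subprocess.run(
        list_datasets_cmd(pool), check=True, capture_output=True, text=True
    )
    datasets = [line for line in listing.stdout.splitlines() if line]
    with open(log_path, "a") as log:
        for dataset in datasets:
            log.write(f"attempting {dataset}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk before risking a panic
            subprocess.run(snapshot_cmd(dataset, snap_name), check=True)
            log.write(f"ok {dataset}\n")
            log.flush()
            os.fsync(log.fileno())


if __name__ == "__main__":
    # Placeholder values; adjust the pool name and log location before use.
    probe("tank", "panic-probe", "/root/snapshot-probe.log")
```

After a reboot, the last "attempting" line in the log without a matching "ok" line would point at the dataset to look at more closely.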

Justin D'Cynical 
June 26, 2019 at 9:24 AM

Just updated to FreeNAS-11.2-U5. Interestingly, making a snapshot of a low-use filesystem works, but a recursive snapshot of the entire pool causes a kernel panic (KP).

I've set up a few individual recursive snapshot tasks instead of one pool-wide recursive task; will see if that is stable.

Justin D'Cynical 
June 23, 2019 at 10:56 PM

So, finally got a chance to look at this again, my apologies for the long delay.

 

Updated to FreeNAS-11.2-U4.1, trying to do a snapshot from the GUI or CLI still causes the KP.  Debug tarball from system -> advanced -> Save Debug has been attached.

Cannot Reproduce

Created April 16, 2019 at 3:21 AM
Updated July 1, 2022 at 4:32 PM
Resolved February 11, 2020 at 9:59 PM