Replicating certain files reliably crashes TrueNAS

Description

I experienced sudden crashes/reboots of TrueNAS 12.0-RELEASE during replication of a dataset to a USB-attached local pool consisting of a single disk vdev.

I ruled out hardware issues by reproducing the crash on a virtualized TrueNAS instance with virtualized disks on separate hardware. No hardware implicated in the original crash was used in the virtualized instance.

I have isolated the files that cause the crash/reboot to two (broken) symlinks in a particular dataset and can provide those files for inspection. Other broken symlinks do not cause this issue.

Please see https://www.truenas.com/community/threads/zfs-replication-to-usb-drive-caused-nas-reboot.89152/ for more details.

Problem/Justification

None

Impact

None

Activity

Winnie Linnie 
December 22, 2020 at 11:56 PM

This is the first time I've heard of a "spill block". Thank you for explaining this further.

 

In the meantime, I cannot send/recv my encrypted dataset, lest I crash my entire system again. (TrueNAS 12.0-U1)

Alexander Motin 
December 21, 2020 at 5:26 PM

A symlink's value is stored not as data, but as a special kind of system attribute. Depending on their size, system attributes are either stored in the dnode itself or spill over into a separately addressed "spill block". Recordsize is not used there.
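
For anyone who wants to test this on a scratch dataset, here is a minimal sketch that creates broken symlinks with targets of increasing length. The function name, the directory, and the chosen lengths are illustrative assumptions; the thread does not state the length at which a symlink target moves from the dnode to a spill block, so the script only produces a range of sizes to try and makes no claim about which one triggers the crash.

```python
import os
import tempfile


def make_broken_symlinks(directory, lengths):
    """Create one dangling symlink per requested target length.

    Returns a dict mapping each length to the path of the created link.
    The names and lengths used here are hypothetical, for illustration only.
    """
    created = {}
    for n in lengths:
        # Build a target of exactly n characters out of short path
        # components, so no single component is unusually long.
        target = ("/missing" * (n // 8 + 1))[:n]
        link = os.path.join(directory, f"link_{n}")
        # The target does not exist, so the resulting link is broken.
        os.symlink(target, link)
        created[n] = link
    return created


if __name__ == "__main__":
    # Uses a temporary directory; point this at a test dataset instead
    # if you intend to snapshot and replicate the result.
    with tempfile.TemporaryDirectory() as scratch:
        links = make_broken_symlinks(scratch, [16, 64, 128, 256, 512, 1000])
        for n, link in links.items():
            print(n, len(os.readlink(link)), os.path.islink(link))
```

Run this only against a disposable test system, since replicating the resulting dataset may reproduce the crash described in this ticket.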

Winnie Linnie 
December 21, 2020 at 4:55 PM

"(I realize these questions are veering away from the purpose of this ticket, so I'll try to keep from going down a rabbit hole.)"

 

In my opinion, these questions are important. Native ZFS encryption is still in its infancy compared to other long-established alternatives. Who would have guessed that the path length would cause kernel panics when sending raw encrypted streams?

 

"The longer path results in a different type of block being used."

 

To understand this correctly, you mean there is a special type of block used when a path exceeds X many characters?

HenchRat 
December 21, 2020 at 4:52 PM

Ah, I see, thanks.  Is the size of the block defined by the ZFS recordsize?  IOW, if I'd had a 1MiB recordsize on that dataset, would I not have triggered the bug because the path would have fit in a single block?

Are the "separate objects" you mention the "ZFS Structure" that is not encrypted, per Tom Caputi's explanation of ZFS encryption?

(I realize these questions are veering away from the purpose of this ticket, so I'll try to keep from going down a rabbit hole.)

Alexander Motin 
December 21, 2020 at 4:09 PM

Encryption hides the data itself, but the data size still affects how it is processed. Replication is still aware of the separate objects within a dataset, even though it has no idea what they are or what is inside them.

Duplicate

Created December 6, 2020 at 7:11 PM
Updated July 1, 2022 at 4:59 PM
Resolved December 8, 2020 at 3:34 PM