File incorrectly zeroed when receiving incremental stream that toggles -L

Description

Following up on https://www.ixsystems.com/community/threads/replicated-pool-zeroed-files-corruption-undetected.87174/post-605351

This is a bug report for a system affected by https://github.com/openzfs/zfs/issues/6224

I do not remember how or when I triggered this bug, since I only noticed the problem when it was already too late. Given the extent of the corruption in my pool, I speculate that it happened when I switched from pushing replication from my source system (running 11.2) to this target (running 11.3), to pulling it the other way around. I was keen on benefiting from resumable transfers (a.k.a. "send -t"), which are natively supported in 11.3.

As stated in the forum, that is the full extent of the information I am able to provide. The source and target debug information is attached, although this bug was probably first triggered many months ago.

Problem/Justification

None

Impact

None

Activity

Alexander Motin 
January 21, 2021 at 9:26 PM

I have too much other work to port this to 11.3 now, given that we plan no new releases there.

Thibaut 
September 11, 2020 at 2:32 PM

Purely FYI: I just noticed that in the anonymous (not logged in) mobile view (Safari on iPhone), the private attachments are visible (full name, file size and age). Clicking one appears to download an empty file, but still, it doesn't look good :-/

Alexander Motin 
September 11, 2020 at 2:18 PM

I've created for the UI changes.

Alexander Motin 
September 9, 2020 at 12:43 PM

Yes, we'll look at the UI side too. Unfortunately, it won't guarantee anything by itself.

Thibaut 
September 9, 2020 at 9:22 AM

Alexander, as detailed in the upstream tracker, the bug happens as soon as the largeblock transfer switch is toggled, whether from off to on (as in my case) or from on to off. Because 11.3 now allows toggling that setting from the GUI, the likelihood of this bug hitting users who have datasets with recordsize>128K (not an uncommon scenario, especially when storing large files) is dramatically increased.
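
To make the trigger condition above concrete, here is a minimal Python sketch. The function name and the constant are hypothetical and are not part of ZFS or FreeNAS. It also assumes that the dataset's recordsize is a good enough proxy for whether large blocks are actually present, which may not hold if the property was changed after the data was written.

```python
# Illustrative only: these names are assumptions, not a ZFS or FreeNAS API.
LARGE_BLOCK_THRESHOLD = 128 * 1024  # 128K, the recordsize threshold mentioned above


def toggle_is_risky(recordsize: int,
                    prev_large_blocks: bool,
                    next_large_blocks: bool) -> bool:
    """Return True when an incremental send matches the condition described above.

    recordsize         -- dataset recordsize in bytes
    prev_large_blocks  -- whether the previous send used the largeblock switch
    next_large_blocks  -- whether the next incremental send would use it
    """
    has_large_records = recordsize > LARGE_BLOCK_THRESHOLD
    flag_toggled = prev_large_blocks != next_large_blocks
    return has_large_records and flag_toggled
```

A replication wrapper could call such a check before each incremental send and refuse to proceed when it returns True, in either toggle direction.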

If this bug cannot be fixed in 11.3, or until it can be, I would respectfully urge you to add a very prominent warning about this situation to both the GUI help in 11.3 and the online documentation (possibly for all versions up to 11.3, since they are all potentially affected via the command line). The bug is extremely pernicious: it is completely silent and lethal to the replicated data.

In my case, the damage is very significant because I use my NAS device mostly for archival purposes. Since this bug destroys data that existed in snapshots created prior to the toggling of largeblock transfer, this essentially means all my older archived data is at risk: assuming all data prior to my upgrade to 11.3 is concerned, that's approximately 90% of the stored data. In fact, from my current assessment of the situation, it looks like the majority of it has already been damaged.
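
For anyone trying to make a similar assessment on a replication target, the following is a rough Python sketch that walks a directory tree and reports non-empty files made up entirely of zero bytes. It rests on the assumption that the corruption shows up as fully zeroed files; partially zeroed files would not be caught, and a legitimately all-zero file would be a false positive. The function names are hypothetical.

```python
import os
from typing import Iterator

CHUNK_SIZE = 1024 * 1024  # read files 1 MiB at a time to keep memory use low


def is_all_zero(path: str) -> bool:
    """Return True if the file is non-empty and contains only zero bytes."""
    seen_data = False
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK_SIZE)
            if not chunk:
                break
            seen_data = True
            # bytes.count(0) counts zero bytes; any other byte ends the scan
            if chunk.count(0) != len(chunk):
                return False
    return seen_data


def find_zeroed_files(root: str) -> Iterator[str]:
    """Yield paths of regular, non-empty files under root that are all zeros."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            if is_all_zero(path):
                yield path
```

Comparing the reported paths against the same files on the source system, where they still exist, would be the more reliable confirmation.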

Complete

Created September 8, 2020 at 3:36 PM
Updated July 1, 2022 at 2:54 PM
Resolved January 21, 2021 at 9:26 PM