replication target crashes reliably with dva_get_dsize_sync(): bad DVA

Description

Customer is replicating from TrueNAS to FreeNAS.

TrueNAS is running TrueNAS-11.1-U7.1

FreeNAS is running FreeNAS-11.2-U5

The FreeNAS system crashes with dva_get_dsize_sync(): bad DVA

The FreeNAS datasets have been restarted and the problem continued.

I think this is an interesting case because both releases are relatively new.

Debugs from both systems attached.

Problem/Justification

None

Impact

None

Activity

Alexander Motin 
May 14, 2020 at 2:46 PM

I guess I could make ZFS not panic on this specific corruption by ignoring/leaking the spill block with the corrupted pointer. Maybe after that, an ACL modification on the specific file could write a fix to the disks. It would not fix existing snapshots, though they could be deleted later. But I am not happy about this path, since it is difficult to guarantee the result of such a dirty hack without spending a lot of time testing it.

Bill O'Hanlon 
May 14, 2020 at 1:35 PM

I was afraid you'd say something like that. Okay. Any way to fix that pool? I'm guessing there isn't.

Alexander Motin 
May 14, 2020 at 1:22 PM
(edited)

A scrub crash means the corruption is already on the pool. So the panics you have seen during replication may be caused not by new corruption, but by old corruption from before the update.

Bill O'Hanlon 
May 12, 2020 at 12:00 PM

11.2-U8.2 is on both the send and receive sides, but it looks like it was not scrubbed after the update. I'll get that done.

Alexander Motin 
May 11, 2020 at 3:00 PM

Is TN 11.2-U8.2 on both sides, and particularly on the receive side? Was that dataset recreated, or at least successfully scrubbed, on the receive side after the update?

Complete


Created July 17, 2019 at 7:45 PM
Updated July 1, 2022 at 4:34 PM
Resolved April 10, 2020 at 12:33 PM