ZFS checksum errors with PMC8003/pms(4)

Description

Translated via Google Translate

We use a Sierra PMC 8003 controller with a NetApp 4243 disk shelf.

Everything worked perfectly. Only after the update to TrueNAS 12.0-U2.1 were the HDDs marked with checksum errors, every single one of them.

After returning to FreeNAS 11.3-U5, all hard drives were likewise flagged with checksum errors. All files written by TrueNAS 12 are corrupt. Resilvering took 18 hours; after that everything was OK again under 11.3-U5.

In any case, there is a driver bug related to the PMC8003 controller here. Was an old driver version perhaps compiled in? The system works fine with 11.3-U5.

Or is there perhaps a different issue, such as a more aggressive memory usage pattern that exposes errors differently than version 11.3-U5 does? Just a theory.

I tested all the memory and everything is OK. 11.3-U5 runs with no corrupt files and no checksum errors.

Please fix this problem. Others who switched to a different controller avoided it. So please check the driver; something is wrong there, because with a different controller the errors do not occur.

TrueNAS 12 looks good, but if it ends in corrupted data it doesn't get us anywhere. Very annoying.

Many Thanks

The problem is also described here: https://forum.level1techs.com/t/new-ds4243-truenas-core-checksum-errors-across-every-disk-solved/167952/28

Problem/Justification

None

Impact

None

Activity

Joe Maloney 
April 30, 2021 at 1:10 PM

Cherry-pick for hotfix is in the truenas/12.0-u3-stable branch https://github.com/truenas/os/tree/truenas/12.0-u3-stable.

Vassili (Bill) Anastasis 
April 23, 2021 at 11:06 PM

Thank you Alexander,

I too have been testing for days and have had no issues so far. If anything changes, I'll report it here.

Thank you for the great work.

Burkhardt 
April 23, 2021 at 9:14 PM

Thank you, Alexander Motin.

It seems to work. I copied 1.5 TB of data in small and large files. I have not seen any checksum errors so far. I will test it more. In the past it took only minutes for errors to appear; now it has run for several hours with no problems. If I find something I will write to you. Thank you.

Greetings

Burkhardt
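
For a copy test like the one described above, one way to spot checksum errors early is to filter the CKSUM column of `zpool status`. The following is only a sketch: the function name `cksum_errors` is made up for illustration, and the filter assumes the usual `NAME STATE READ WRITE CKSUM` column layout of the config section.

```sh
# Print "<device> <count>" for every vdev whose CKSUM column is not zero.
# Reads `zpool status` output on stdin. cksum_errors is a hypothetical helper.
cksum_errors() {
    awk '
        # Config rows look like: NAME STATE READ WRITE CKSUM [notes...]
        # STATE is an upper-case word; the counters are numbers such as 0, 12 or 1.2K.
        NF >= 5 && $2 ~ /^[A-Z]+$/ &&
        $3 ~ /^[0-9.]+[KMGTPE]?$/ &&
        $4 ~ /^[0-9.]+[KMGTPE]?$/ &&
        $5 ~ /^[0-9.]+[KMGTPE]?$/ &&
        $5 != "0" { print $1, $5 }
    '
}
```

Usage would be along the lines of `zpool status | cksum_errors`; empty output means no device currently reports checksum errors.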

Burkhardt 
April 17, 2021 at 6:51 PM

Yes, no problem, I will try it. I also have another problem, possibly also in the driver. I have pool 1 with DA1 to DA12, but DA12 suddenly failed. The same drive works well in another pool.

The drive DA12 in another slot (where it becomes DA24, same pool) produces the same errors. It suddenly gets checksum errors and the system shuts it down. After a reboot it works well. RAM tested, all OK. In another pool, a stripe set of 4 drives, the HDD runs with no errors. It is always the last drive in the pool of 12 drives.

Perhaps this is also a driver error. This happens in FreeNAS 11.3-U5.

Thank you.

Alexander Motin 
April 17, 2021 at 6:11 PM

You can try updating to the latest TrueNAS 12.0-U3 and then unpack the attached archive into the /boot directory, replacing the old kernel, with `cd /boot && tar -xzvf /path/to/kernel.tgz`. Make sure to create a boot environment and back up your data in case something still goes wrong.
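
The unpacking step above could be wrapped roughly as follows. This is a sketch under assumptions, not a tested procedure: the function name `install_test_kernel`, the `kernel.pre-hotfix` backup name, and the assumption that the archive contains a `kernel` directory are all illustrative and not taken from the ticket. The extra copy does not replace the boot environment and data backup recommended above.

```sh
# Keep a copy of the current kernel directory, then unpack the archive over it.
# install_test_kernel and kernel.pre-hotfix are hypothetical names.
install_test_kernel() {
    archive="$1"            # absolute path, e.g. /path/to/kernel.tgz
    boot_dir="${2:-/boot}"  # /boot on the real system

    # Stop early if tar cannot read the archive.
    tar -tzf "$archive" >/dev/null || return 1

    # Assumption: the old kernel lives in $boot_dir/kernel. Keep a copy of it.
    if [ -d "$boot_dir/kernel" ]; then
        cp -Rp "$boot_dir/kernel" "$boot_dir/kernel.pre-hotfix" || return 1
    fi

    # Same extraction as in the comment above, with a variable target directory.
    (cd "$boot_dir" && tar -xzf "$archive")
}
```

It would be called as `install_test_kernel /path/to/kernel.tgz`. The archive path has to be absolute because the function changes into the target directory before extracting.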

Complete

Created March 23, 2021 at 12:28 PM
Updated July 1, 2022 at 5:12 PM
Resolved April 16, 2021 at 8:05 PM