  1. TrueNAS
  2. NAS-109028

Reboot automatically clears ZFS alerts



    • Type: Bug
    • Status: Engineering Closed
    • Priority: Low
    • Resolution: Third Party to Resolve
    • Affects Version/s: 12.0-U1.1
    • Fix Version/s: N/A
    • Component/s: System



      Perhaps I'm missing something and perhaps this is what's supposed to happen, but it does seem very weird to me...

      Over the last couple of months I have been struggling with a huge data corruption issue (I have created another ticket for that issue), and in the meantime I have run 50+ scrubs on my 30TB of data. I noticed some weird scrub behaviour (the most important item is the one in the subject):
      1) There is no log of scrub anywhere. I had to create my own script for this functionality:
      2) Scrub CKSUM errors are cleared on reboot. After a reboot your pool suddenly, magically, shows as healthy again (but in reality, it may not be).
      3) I once also had a SATA cable with a bad connection, which threw 1000+ READ and WRITE errors during a scrub, causing the scrub to flag the disk as broken and the pool as degraded. This too was cleared by a reboot!! Without me taking any action in TrueNAS, a reboot caused TrueNAS to start resilvering the pool, with the same disk still in the pool! Luckily I had re-attached the SATA cable, so that resilver was actually OK for me. But the fact that it started doing this without me confirming that I had fixed the problem does seem very wrong to me...
      4) I'm not sure how scrub works and this could very well be correct behaviour, but I noticed that after a reboot (and after unlocking the pool), the scrub automatically "resumes". But it doesn't resume where it left off... For example, my system once rebooted at around 70% scrub completion and the scrub then resumed at 50%. It does not always jump back by 20%, though, nor always to the same percentage; it seems very random. Together with the lack of logging, I also find the documentation on the whole scrub process very lacking. I had to figure it out by trial and error...
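
      Regarding points 1 to 3: a minimal sketch of what such a logging script could look like, assuming Python is available on the system. This is not the script mentioned in point 1 (that one is not shown here), just an illustration. It calls `zpool status <pool>`, pulls out the `scan:` section and any device rows with non-zero READ/WRITE/CKSUM counters, and appends them with a timestamp to a file, so the counters are still on record after a reboot clears them. The pool name `tank`, the log path and all function names are made up for the example, and the parsing assumes the usual `zpool status` text layout.

```python
import datetime
import re
import subprocess
import sys


def extract_scan_summary(status_text):
    """Return the 'scan:' section of `zpool status` output as one line, or None."""
    lines = status_text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith("scan:"):
            continue
        parts = [stripped[len("scan:"):].strip()]
        # Continuation lines are indented and run until the next "key:" field.
        for cont in lines[i + 1:]:
            cont = cont.strip()
            if not cont or re.match(r"[a-z]+:", cont):
                break
            parts.append(cont)
        return " ".join(parts)
    return None


def extract_device_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with non-zero counters."""
    rows = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True
            continue
        if not in_table:
            continue
        if not fields:
            in_table = False  # a blank line ends the device table
            continue
        if len(fields) < 5:
            continue  # section labels such as "logs" or "cache"
        name, state, read, write, cksum = fields[:5]
        if (read, write, cksum) != ("0", "0", "0"):
            rows.append((name, state, read, write, cksum))
    return rows


def format_log_entry(status_text, now):
    """Build one timestamped log entry from `zpool status` output."""
    entry = [f"{now.isoformat(timespec='seconds')} scan: {extract_scan_summary(status_text)}"]
    for name, state, read, write, cksum in extract_device_errors(status_text):
        entry.append(f"  {name} {state} READ={read} WRITE={write} CKSUM={cksum}")
    return "\n".join(entry) + "\n"


if __name__ == "__main__":
    # Hypothetical defaults: pool "tank", log file in the current directory.
    pool = sys.argv[1] if len(sys.argv) > 1 else "tank"
    log_path = sys.argv[2] if len(sys.argv) > 2 else "scrub.log"
    result = subprocess.run(
        ["zpool", "status", pool], capture_output=True, text=True, check=True
    )
    with open(log_path, "a") as log:
        log.write(format_log_entry(result.stdout, datetime.datetime.now()))
```

      Run periodically (for example from a cron job while a scrub is in progress), this would leave a history of scrub progress and error counters to compare against what the pool reports after a reboot.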






                releng Triage Team
                Mastakilla Mastakilla