Uploaded image for project: 'FreeNAS / TrueNAS'
  1. FreeNAS / TrueNAS
  2. NAS-108819

Update 12.0 to U1 causes Command timeouts and multiple zpools to become unavailable

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Engineering Closed (View Workflow)
    • Priority: Low
    • Resolution: Duplicate
    • Affects Version/s: 12.0-U1
    • Fix Version/s: N/A
    • Component/s: ZFS
    • Labels:
      None
    • Impact:
      Critical

      Description

      System is stable running 12.0 base. Shortly after update to U1, received emails notifying of known disk issues related to FAILED SMART self-checks (in process of replacing shortly) such as

      New alerts:
      * Device: /dev/da13 [SAT], ATA error count increased from 9 to 10.

      Current alerts:
      * Device: /dev/da3 [SAT], 4 Currently unreadable (pending) sectors.
      * Device: /dev/da3 [SAT], 1 Offline uncorrectable sectors.
      * Device: /dev/da8 [SAT], FAILED SMART self-check. BACK UP DATA NOW!.
      * Device: /dev/da8 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct..
      * Device: /dev/da0 [SAT], not capable of SMART self-check.
      * Device: /dev/da3 [SAT], FAILED SMART self-check. BACK UP DATA NOW!.
      * Device: /dev/da3 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct..
      * Device: /dev/da3 [SAT], Self-Test Log error count increased from 17 to 21.
      * Device: /dev/da13 [SAT], ATA error count increased from 9 to 10.

      but sometime shortly after (from minutes to just about an hour) I get an updated email notifying of UNAVAIL pool(s).

      New alerts:
      * Pool Pool-01_2x3TB_Z1 state is UNAVAIL: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
      The following devices are not healthy:
      Disk 13469299449177422213 is REMOVED
      Disk 18225843187637751625 is REMOVED


      Current alerts:
      * Device: /dev/da3 [SAT], 4 Currently unreadable (pending) sectors.
      * Device: /dev/da3 [SAT], 1 Offline uncorrectable sectors.
      * Device: /dev/da8 [SAT], FAILED SMART self-check. BACK UP DATA NOW!.
      * Device: /dev/da8 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct..
      * Device: /dev/da0 [SAT], not capable of SMART self-check.
      * Device: /dev/da3 [SAT], FAILED SMART self-check. BACK UP DATA NOW!.
      * Device: /dev/da3 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct..
      * Device: /dev/da3 [SAT], Self-Test Log error count increased from 17 to 21.
      * Device: /dev/da13 [SAT], ATA error count increased from 9 to 10.
      * Pool Pool-01_2x3TB_Z1 state is UNAVAIL: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
      The following devices are not healthy:
      Disk 13469299449177422213 is REMOVED
      Disk 18225843187637751625 is REMOVED

      Checking /var/log/messages is this:

      Dec 26 10:12:42 freenas-rms     (da7:mps0:0:16:0): READ(16). CDB: 88 00 00 00 00 01 68 4a e9 60 00 00 00 08 00 00 length 4096 SMID 645 Command timeout on target 16(0x0010) 60000 set, 60.202389277 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 16 for SMID 645
      Dec 26 10:12:42 freenas-rms     (da7:mps0:0:16:0): READ(16). CDB: 88 00 00 00 00 01 68 4a e9 60 00 00 00 08 00 00 length 4096 SMID 645 Aborting command 0xfffffe010db2e2b8
      Dec 26 10:12:42 freenas-rms     (da7:mps0:0:16:0): WRITE(10). CDB: 2a 00 f5 7a 41 d8 00 04 d0 00 length 630784 SMID 822 Command timeout on target 16(0x0010) 60000 set, 60.211860396 elapsed
      Dec 26 10:12:42 freenas-rms     (da8:mps0:0:18:0): READ(10). CDB: 28 00 75 95 a2 88 00 00 80 00 length 65536 SMID 816 Command timeout on target 18(0x000a) 60000 set, 60.211873174 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 18 for SMID 816
      Dec 26 10:12:42 freenas-rms     (da8:mps0:0:18:0): READ(10). CDB: 28 00 75 95 a2 88 00 00 80 00 length 65536 SMID 816 Aborting command 0xfffffe010db3c880
      Dec 26 10:12:42 freenas-rms     (da8:mps0:0:18:0): READ(10). CDB: 28 00 75 95 a2 08 00 00 80 00 length 65536 SMID 698 Command timeout on target 18(0x000a) 60000 set, 60.340101584 elapsed
      Dec 26 10:12:42 freenas-rms     (da8:mps0:0:18:0): READ(10). CDB: 28 00 75 95 a1 88 00 00 80 00 length 65536 SMID 489 Command timeout on target 18(0x000a) 60000 set, 60.398693933 elapsed
      Dec 26 10:12:42 freenas-rms     (da4:mps0:0:13:0): READ(10). CDB: 28 00 75 95 a4 88 00 00 80 00 length 65536 SMID 559 Command timeout on target 13(0x000b) 60000 set, 60.414102026 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 13 for SMID 559
      Dec 26 10:12:42 freenas-rms     (da4:mps0:0:13:0): READ(10). CDB: 28 00 75 95 a4 88 00 00 80 00 length 65536 SMID 559 Aborting command 0xfffffe010db26f28
      Dec 26 10:12:42 freenas-rms     (da3:mps0:0:12:0): READ(10). CDB: 28 00 75 95 a4 90 00 00 80 00 length 65536 SMID 602 Command timeout on target 12(0x000c) 60000 set, 60.536237026 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 12 for SMID 602
      Dec 26 10:12:42 freenas-rms     (da3:mps0:0:12:0): READ(10). CDB: 28 00 75 95 a4 90 00 00 80 00 length 65536 SMID 602 Aborting command 0xfffffe010db2a8f0
      Dec 26 10:12:42 freenas-rms     (da4:mps0:0:13:0): READ(10). CDB: 28 00 75 95 a4 08 00 00 80 00 length 65536 SMID 697 Command timeout on target 13(0x000b) 60000 set, 60.658369333 elapsed
      Dec 26 10:12:42 freenas-rms     (da3:mps0:0:12:0): READ(10). CDB: 28 00 75 95 a4 10 00 00 80 00 length 65536 SMID 859 Command timeout on target 12(0x000c) 60000 set, 60.716850790 elapsed
      Dec 26 10:12:42 freenas-rms     (da5:mps0:0:14:0): WRITE(10). CDB: 2a 00 cf 8e 90 a8 00 00 58 00 length 45056 SMID 446 Command timeout on target 14(0x0011) 60000 set, 60.781890314 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 14 for SMID 446
      Dec 26 10:12:42 freenas-rms     (da5:mps0:0:14:0): WRITE(10). CDB: 2a 00 cf 8e 90 a8 00 00 58 00 length 45056 SMID 446 Aborting command 0xfffffe010db1d750
      Dec 26 10:12:42 freenas-rms     (da14:mps0:0:27:0): WRITE(10). CDB: 2a 00 cf 8e 90 48 00 00 58 00 length 45056 SMID 649 Command timeout on target 27(0x0012) 60000 set, 60.904812950 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 27 for SMID 649
      Dec 26 10:12:42 freenas-rms     (da14:mps0:0:27:0): WRITE(10). CDB: 2a 00 cf 8e 90 48 00 00 58 00 length 45056 SMID 649 Aborting command 0xfffffe010db2e818
      Dec 26 10:12:42 freenas-rms     (da6:mps0:0:15:0): WRITE(10). CDB: 2a 00 cf 8e 88 60 00 07 e8 00 length 1036288 SMID 583 Command timeout on target 15(0x0013) 60000 set, 60.1029242264 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 15 for SMID 583
      Dec 26 10:12:42 freenas-rms     (da6:mps0:0:15:0): WRITE(10). CDB: 2a 00 cf 8e 88 60 00 07 e8 00 length 1036288 SMID 583 Aborting command 0xfffffe010db28f68
      Dec 26 10:12:42 freenas-rms     (da4:mps0:0:13:0): READ(10). CDB: 28 00 75 95 a3 88 00 00 80 00 length 65536 SMID 675 Command timeout on target 13(0x000b) 60000 set, 60.1155205855 elapsed
      Dec 26 10:12:42 freenas-rms     (da3:mps0:0:12:0): READ(10). CDB: 28 00 75 95 a3 90 00 00 80 00 length 65536 SMID 775 Command timeout on target 12(0x000c) 60000 set, 60.1214052189 elapsed
      Dec 26 10:12:42 freenas-rms     (da1:mps0:0:10:0): READ(10). CDB: 28 00 fa ec 41 20 00 00 80 00 length 65536 SMID 767 Command timeout on target 10(0x000e) 60000 set, 60.1244367865 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 10 for SMID 767
      Dec 26 10:12:42 freenas-rms     (da1:mps0:0:10:0): READ(10). CDB: 28 00 fa ec 41 20 00 00 80 00 length 65536 SMID 767 Aborting command 0xfffffe010db386a8
      Dec 26 10:12:42 freenas-rms     (da2:mps0:0:11:0): READ(10). CDB: 28 00 fa ec 41 20 00 00 80 00 length 65536 SMID 578 Command timeout on target 11(0x000d) 60000 set, 60.1366931409 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 11 for SMID 578
      Dec 26 10:12:42 freenas-rms     (da2:mps0:0:11:0): READ(10). CDB: 28 00 fa ec 41 20 00 00 80 00 length 65536 SMID 578 Aborting command 0xfffffe010db288b0
      Dec 26 10:12:42 freenas-rms     (da1:mps0:0:10:0): READ(10). CDB: 28 00 fa ec 41 a0 00 00 80 00 length 65536 SMID 801 Command timeout on target 10(0x000e) 60000 set, 60.1496735489 elapsed
      Dec 26 10:12:42 freenas-rms     (da2:mps0:0:11:0): READ(10). CDB: 28 00 fa ec 41 a0 00 00 80 00 length 65536 SMID 892 Command timeout on target 11(0x000d) 60000 set, 60.1555541084 elapsed
      Dec 26 10:12:42 freenas-rms     (da0:mps0:0:9:0): READ(10). CDB: 28 00 fa ec 43 a0 00 00 80 00 length 65536 SMID 810 Command timeout on target 9(0x000f) 60000 set, 60.1599611643 elapsed
      Dec 26 10:12:42 freenas-rms mps0: Sending abort to target 9 for SMID 810
      Dec 26 10:12:42 freenas-rms     (da0:mps0:0:9:0): READ(10). CDB: 28 00 fa ec 43 a0 00 00 80 00 length 65536 SMID 810 Aborting command 0xfffffe010db3c070
      Dec 26 10:12:42 freenas-rms     (da1:mps0:0:10:0): READ(10). CDB: 28 00 fa ec 42 a0 00 00 80 00 length 65536 SMID 752 Command timeout on target 10(0x000e) 60000 set, 60.1720593886 elapsed
      Dec 26 10:12:42 freenas-rms     (da2:mps0:0:11:0): READ(10). CDB: 28 00 fa ec 42 a0 00 00 80 00 length 65536 SMID 808 Command timeout on target 11(0x000d) 60000 set, 60.1779450090 elapsed
      Dec 26 10:12:42 freenas-rms     (da0:mps0:0:9:0): READ(10). CDB: 28 00 fa ec 44 a0 00 00 80 00 length 65536 SMID 426 Command timeout on target 9(0x000f) 60000 set, 60.1826374956 elapsed
      Dec 26 10:12:42 freenas-rms     (da0:mps0:0:9:0): READ(10). CDB: 28 00 fa ec 44 20 00 00 80 00 length 65536 SMID 700 Command timeout on target 9(0x000f) 60000 set, 60.1884557776 elapsed
      Dec 26 10:12:43 freenas-rms     (da0:mps0:0:9:0): WRITE(10). CDB: 2a 00 70 40 27 50 00 00 08 00 length 4096 SMID 916 Command timeout on target 9(0x000f) 60000 set, 60.827126265 elapsed
      Dec 26 10:12:43 freenas-rms     (da2:mps0:0:11:0): WRITE(10). CDB: 2a 00 70 40 27 48 00 00 08 00 length 4096 SMID 854 Command timeout on target 11(0x000d) 60000 set, 60.884782873 elapsed
      Dec 26 10:12:43 freenas-rms     (da1:mps0:0:10:0): WRITE(10). CDB: 2a 00 70 40 27 48 00 00 08 00 length 4096 SMID 623 Command timeout on target 10(0x000e) 60000 set, 60.943332036 elapsed
      Dec 26 10:12:43 freenas-rms     (da8:mps0:0:18:0): WRITE(10). CDB: 2a 00 e5 95 a0 f0 00 00 08 00 length 4096 SMID 528 Command timeout on target 18(0x000a) 60000 set, 60.1001827882 elapsed
      Dec 26 10:12:43 freenas-rms     (da4:mps0:0:13:0): WRITE(10). CDB: 2a 00 e5 95 a0 f0 00 00 08 00 length 4096 SMID 420 Command timeout on target 13(0x000b) 60000 set, 60.1060720636 elapsed
      Dec 26 10:12:43 freenas-rms     (da3:mps0:0:12:0): WRITE(10). CDB: 2a 00 e5 95 a0 f0 00 00 08 00 length 4096 SMID 883 Command timeout on target 12(0x000c) 60000 set, 60.1119711278 elapsed
      Dec 26 10:12:46 freenas-rms     (da6:mps0:0:15:0): WRITE(16). CDB: 8a 00 00 00 00 01 b0 4d 9e d8 00 00 00 28 00 00 length 20480 SMID 929 Command timeout on target 15(0x0013) 60000 set, 60.38550836 elapsed
      Dec 26 10:12:46 freenas-rms     (da5:mps0:0:14:0): WRITE(16). CDB: 8a 00 00 00 00 01 b0 4d 9e d8 00 00 00 30 00 00 length 24576 SMID 386 Command timeout on target 14(0x0011) 60000 set, 60.103505142 elapsed
      Dec 26 10:12:46 freenas-rms     (da7:mps0:0:16:0): WRITE(16). CDB: 8a 00 00 00 00 01 b0 4d 9e d0 00 00 00 30 00 00 length 24576 SMID 692 Command timeout on target 16(0x0010) 60000 set, 60.169085370 elapsed
      Dec 26 10:12:46 freenas-rms     (da14:mps0:0:27:0): WRITE(16). CDB: 8a 00 00 00 00 01 b0 4d 9e d0 00 00 00 30 00 00 length 24576 SMID 786 Command timeout on target 27(0x0012) 60000 set, 60.234615252 elapsed
      Dec 26 10:12:48 freenas-rms     (xpt0:mps0:0:16:0): SMID 1 task mgmt 0xfffffe010daf8158 timed out
      Dec 26 10:12:48 freenas-rms mps0: Reinitializing controller
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 16
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 18
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 13
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 12
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 14
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 27
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 15
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 10
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 11
      Dec 26 10:12:48 freenas-rms mps0: Unfreezing devq for target ID 9
      Dec 26 10:12:48 freenas-rms mps0: Firmware: 20.00.07.00, Driver: 21.02.00.00-fbsd
      Dec 26 10:12:48 freenas-rms mps0: IOCCapabilities: 1285c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,HostDisc>
      Dec 26 10:12:50 freenas-rms mps0: Error reading device 0xa SATA PASSTHRU; iocstatus= 0x47
      Dec 26 10:12:56 freenas-rms mps0[3392]: Last message 'Error reading device' repeated 4 times, suppressed by syslog-ng on freenas-rms.rmscomputersystems.com
      Dec 26 10:12:56 freenas-rms (da8:mps0:0:18:0): Invalidating pack
      Dec 26 10:12:56 freenas-rms da8 at mps0 bus 0 scbus0 target 18 lun 0
      Dec 26 10:12:56 freenas-rms da8: <ATA WDC WD30EZRX-00D 0A80>  s/n WD-WMC1T1274257 detached
      Dec 26 10:12:56 freenas-rms GEOM_MIRROR: Device swap2: provider da8p1 disconnected.
      Dec 26 10:12:56 freenas-rms da3 at mps0 bus 0 scbus0 target 12 lun 0
      Dec 26 10:12:56 freenas-rms da3: <ATA WDC WD30EZRX-00D 0A80>  s/n WD-WMC1T1274257 detached
      Dec 26 10:12:56 freenas-rms ses0: da3,pass4,da8 in 'Slot 01', SAS Slot: 1 phys at slot 0
      Dec 26 10:12:56 freenas-rms ses0:  phy 0: SATA device
      Dec 26 10:12:56 freenas-rms ses0:  phy 0: parent 5003048000b626bf addr 5003048000b6268c
      Dec 26 10:12:57 freenas-rms GEOM_MIRROR: Device swap3: provider da3p1 disconnected.
      Dec 26 10:12:57 freenas-rms (da3:mps0:0:12:0): Periph destroyed
      Dec 26 10:12:57 freenas-rms da3 at mps0 bus 0 scbus0 target 12 lun 0
      Dec 26 10:12:57 freenas-rms da3: <ATA WDC WD30EZRX-00D 0A80> Fixed Direct Access SPC-4 SCSI device
      Dec 26 10:12:57 freenas-rms da3: Serial Number WD-WMC1T1274257
      Dec 26 10:12:57 freenas-rms da3: 600.000MB/s transfers
      Dec 26 10:12:57 freenas-rms da3: Command Queueing enabled
      Dec 26 10:12:57 freenas-rms da3: 2861588MB (5860533168 512 byte sectors)
      Dec 26 10:12:57 freenas-rms da3: quirks=0x8<4K>
      Dec 26 10:12:57 freenas-rms (da8:mps0:0:18:0): Periph destroyed
      Dec 26 10:13:56 freenas-rms 1 2020-12-26T10:13:56.670542-05:00 freenas-rms.rmscomputersystems.com smartd 7937 - - Device: /dev/da3 [SAT], FAILED SMART self-check. BACK UP DATA NOW!
      Dec 26 10:13:56 freenas-rms 1 2020-12-26T10:13:56.993394-05:00 freenas-rms.rmscomputersystems.com smartd 7937 - - Device: /dev/da3 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.
      Dec 26 10:13:57 freenas-rms 1 2020-12-26T10:13:57.370463-05:00 freenas-rms.rmscomputersystems.com smartd 7937 - - Device: /dev/da3 [SAT], FAILED SMART self-check. BACK UP DATA NOW!
      Dec 26 10:13:57 freenas-rms 1 2020-12-26T10:13:57.855543-05:00 freenas-rms.rmscomputersystems.com smartd 7937 - - Device: /dev/da3 [SAT], Failed SMART usage Attribute: 5 Reallocated_Sector_Ct.

      Da3 and da8 are in the same z1 pool (Pool-01_2x3TB_Z1), so I do understand the why the pool is pushed offline, but at issue is that 12.0 base does NOT have this issue, it is the application of 12.0-U1 which has introduced this instability/issue (possibly something within the change to OpenZFS?). I have reverted back to 12.0 base and this issue has disappeared (the da3 and da8 SMART errors continue to be notified, but I do not see any of the command timeouts and pools being pushed offline.)

        Attachments

          Attachments

            JEditor

              Issue Links

                Activity

                  People

                  Assignee:
                  releng Release Council
                  Reporter:
                  Cerberus128 Peter Scordamaglia
                  Votes:
                  0 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                    Dates

                    Created:
                    Updated:
                    Resolved: