FreeNAS / TrueNAS: NAS-101181

Use cam to get temperature for SCSI disks

    Details

    • Type: Bug
    • Status: Done
    • Priority: Low
    • Resolution: Complete
    • Affects Version/s: None
    • Fix Version/s: 11.2-U4
    • Component/s: Middleware
    • Labels: None

      Description

      On an M50 with 150 disks, running TrueNAS-11.2-MASTER-201903200659, CPU idle drops to about 65% while the box is sitting there with no I/O coming in. Upon investigation, collectd is destroying the system by running smartctl -a -n standby /dev/da**. My guess is that it runs this command on all disks in parallel, which causes unnecessary I/O to the disks as well as unnecessary CPU time. Below is the output of commands taken when the box goes into this state; a sketch of the cam-based alternative follows the listings.
      last pid: 57756;  load averages:  6.37,  5.82,  5.62                                                                                                                                                                                                       up 0+18:28:51  08:40:54
      3673 processes:47 running, 3347 sleeping, 279 waiting
      CPU:  0.4% user,  0.0% nice, 28.9% system,  0.0% interrupt, 70.7% idle
      Mem: 687M Active, 1069M Inact, 8259M Wired, 239G Free
      ARC: 2664M Total, 528M MFU, 1995M MRU, 726K Anon, 25M Header, 115M Other
           781M Compressed, 1819M Uncompressed, 2.33:1 Ratio
      Swap: 16G Total, 16G Free
      
        PID USERNAME   PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
      13723 root        35    0   124M 63956K umtxn   0  45:24  86.58% collectd{writer#3}
      13723 root        86    0   124M 63956K CPU6    6   0:07  86.47% collectd{collectd}
      96006 root        52    0   301M   226M uwait  34   8:57  84.27% python3.6{python3.6}
      13723 root        52    0   124M 63956K uwait  22   0:07  76.73% collectd{collectd}
      13723 root        52    0   124M 63956K uwait  29   0:08  75.67% collectd{collectd}
      13723 root        52    0   124M 63956K uwait  38   0:07  58.29% collectd{collectd}
      13723 root        52    0   124M 63956K uwait  19   0:07  57.27% collectd{collectd}
      13723 root        52    0   124M 63956K uwait  36   0:07  54.12% collectd{collectd}
      13723 root        52    0   124M 63956K uwait  33   0:08  53.54% collectd{collectd}
      13723 root        52    0   124M 63956K RUN    21   0:07  53.53% collectd{collectd}
      13723 root        52    0   124M 63956K uwait  33   0:07  53.33% collectd{collectd}
      13723 root        52    0   124M 63956K uwait  23   0:07  52.82% collectd{collectd}
      96006 root        52    0   301M   226M usem   23   0:00  47.07% python3.6{python3.6}
      13723 root        40    0   124M 63956K CPU30  30  45:10  47.02% collectd{writer#1}
      96006 root        52    0   301M   226M uwait   1   0:00  46.97% python3.6{python3.6}
      96006 root        85    0   301M   226M CPU24  24   0:00  46.58% python3.6{python3.6}
      13723 root        31    0   124M 63956K umtxn   0  44:45  46.20% collectd{writer#2}
      13723 root        35    0   124M 63956K umtxn  21  45:16  29.50% collectd{writer#4}
      13723 root        41    0   124M 63956K usem   34  10:59  24.89% collectd{reader#2}
      13723 root        52    0   124M 63956K nanslp  8  24:21  21.59% collectd{collectd}
      96006 root        52    0   301M   226M usem   17   0:00  20.37% python3.6{python3.6}
      13723 root        26    0   124M 63956K vm map 23  45:55  16.86% collectd{writer#0}
         25 root       -16    -     0K    48K psleep  8  11:01  14.64% pagedaemon{dom0}
       5390 root        20    0   426M   362M umtxn   6   4:04   3.40% uwsgi-3.6{uwsgi-3.6}
      13723 root        25    0   124M 63956K uwait  39   9:51   3.34% collectd{reader#3}
      96006 root        52    0   301M   226M usem   22   0:00   1.86% python3.6{python3.6}
      57750 root        20    0 18180K 11564K CPU9    9   0:00   1.25% top
      11525 nobody      20    0  6928K  3476K select 15   1:01   0.22% mdnsd
       5063 root        20    0 40128K 18464K select 25   0:16   0.21% rrdcached{rrdcached}
         31 root        16    -     0K    16K syncer 30   0:05   0.21% syncer
       3956 nobody      20    0  6928K  3496K select 37   1:51   0.12% mdnsd
         12 root       -88    -     0K  4464K WAIT   37   0:19   0.08% intr{irq493: mpr1}
         17 root       -16    -     0K    16K -       6   0:31   0.03% rand_harvestq
         12 root       -60    -     0K  4464K WAIT    0   7:26   0.03% intr{swi4: clock (0)}
      83659 www         20    0 14344K  8588K kqread 34   0:10   0.03% nginx
          4 root       -16    -     0K   128K -       4   0:42   0.03% cam{doneq2}
      root@tn11b:~ # procstat -akk | grep collectd
      13723 101245 collectd            -                   _mtx_lock_spin_cookie+0xc1 smp_targeted_tlb_shootdown+0x3ae smp_masked_invltlb+0x3d pmap_invalidate_all+0x232 pmap_protect+0x6e6 vmspace_fork+0xa55 fork1+0x47a sys_fork+0x39 amd64_syscall+0xa38 fast_syscall_common+0x101 
      13723 101519 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102287 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102512 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102698 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102702 collectd            writer#0            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102704 collectd            writer#1            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102705 collectd            writer#2            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102706 collectd            writer#3            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102717 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102752 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102817 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102862 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102863 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102872 collectd            -                   _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102903 collectd            writer#4            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 102985 collectd            reader#0            mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _sleep+0x20c umtxq_sleep+0x143 do_wait+0x427 __umtx_op_wait_uint_private+0x53 amd64_syscall+0xa38 fast_syscall_common+0x101 
      13723 103478 collectd            reader#1            mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _sleep+0x20c umtxq_sleep+0x143 do_wait+0x427 __umtx_op_wait_uint_private+0x53 amd64_syscall+0xa38 fast_syscall_common+0x101 
      13723 103545 collectd            reader#2            _sx_slock_hard+0x128 vm_map_lookup+0xce vm_fault_hold+0x66 vm_fault+0x75 trap_pfault+0x14c trap+0x353 calltrap+0x8 
      13723 103723 collectd            reader#3            mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _sleep+0x20c umtxq_sleep+0x143 do_wait+0x427 __umtx_op_wait_uint_private+0x53 amd64_syscall+0xa38 fast_syscall_common+0x101 
      13723 105056 collectd            reader#4            mi_switch+0xe6 sleepq_catch_signals+0x40c sleepq_timedwait_sig+0x14 _sleep+0x20c umtxq_sleep+0x143 do_wait+0x427 __umtx_op_wait_uint_private+0x53 amd64_syscall+0xa38 fast_syscall_common+0x101
      root       13723  408.1  0.0 127260  63956  -  Ss   18:25    1029:43.19 |-- /usr/local/sbin/collectd
      root       71741   44.4  0.0   8664   4884  -  RL   08:45       0:01.11 | |-- /usr/local/sbin/smartctl -a -n standby /dev/da33
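
      As the summary suggests, the fix direction is to read SCSI temperatures through cam rather than forking smartctl per disk. Below is a minimal sketch of that idea, not the actual middleware change: it issues a LOG SENSE for the SPC Temperature log page (0x0d) via camcontrol(8) and parses the current-temperature parameter. The function name scsi_temperature is illustrative, and the sketch assumes camcontrol cmd's "-" input format, which writes the returned bytes raw to stdout.

      #!/usr/bin/env python3.6
      # Illustrative sketch only; not the shipped middleware fix.
      # Reads the SCSI Temperature log page (0x0d) via camcontrol(8)
      # instead of forking smartctl for every disk.
      import subprocess
      from typing import Optional

      # LOG SENSE (opcode 0x4d): PC = 01b (current cumulative values),
      # page code 0x0d, allocation length 0x0100 (256) in bytes 7..8.
      LOG_SENSE_TEMP_CDB = "4d 00 4d 00 00 00 00 01 00 00"

      def scsi_temperature(dev):
          # type: (str) -> Optional[int]
          """Return the temperature of a da(4) device in deg C, or None."""
          try:
              proc = subprocess.run(
                  ["camcontrol", "cmd", dev, "-c", LOG_SENSE_TEMP_CDB,
                   "-i", "256", "-"],  # "-" writes the raw page to stdout
                  stdout=subprocess.PIPE, stderr=subprocess.PIPE, timeout=10,
              )
          except (OSError, subprocess.SubprocessError):
              return None
          page = proc.stdout
          if proc.returncode != 0 or len(page) < 4 or (page[0] & 0x3f) != 0x0d:
              return None
          page_len = int.from_bytes(page[2:4], "big")
          off = 4
          # Each log parameter: 4-byte header (code, control, length) plus data.
          # Parameter 0x0000 holds the current temperature; its second data
          # byte is degrees Celsius, with 0xff meaning "not available".
          while off + 4 <= min(4 + page_len, len(page)):
              code = int.from_bytes(page[off:off + 2], "big")
              dlen = page[off + 3]
              if code == 0x0000 and dlen >= 2 and off + 5 < len(page):
                  return None if page[off + 5] == 0xff else page[off + 5]
              off += 4 + dlen
          return None

      if __name__ == "__main__":
          print(scsi_temperature("da33"))

      One such call per disk per poll avoids the fork/exec and full SMART pass that smartctl performs, and the per-disk reads can be serialized so that 150 disks are no longer probed in parallel.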

              People

              Assignee: William Grzybowski (william)
              Reporter: Caleb St. John (caleb)
