  1. TrueNAS
  2. NAS-109320

TrueNAS 12.0-U1.1 freezes (on two systems)

    Details

    • Type: Bug
    • Status: Engineering Closed
    • Priority: Low
    • Resolution: Duplicate
    • Affects Version/s: 12.0-U1.1
    • Fix Version/s: N/A
    • Component/s: System
    • Labels: None
    • Impact: High

      Description

      We updated two of our four TrueNAS systems from 12.0-RELEASE to 12.0-U1 (2021-01-12) and then to 12.0-U1.1 (2021-01-18); the other two are still running 12.0-RELEASE.

      Since then, the two updated systems have been freezing regularly. Only pressing the power button (or a power reset via IPMI) resolves it. This happened every week on nas-03 and twice in that period on nas-04. As I recall, the freezes almost always happened on weekends. Maybe they are triggered by a system cron job? With 12.0-RELEASE there were no stability issues on these systems.

      I am not really sure what is happening here. At first I suspected a hardware issue on nas-03, but since it happened on both U1 systems, I now lean toward a software issue.
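
To sanity-check the weekend pattern, here is a minimal sketch that maps the reboot timestamps from the alert mails in this report to weekdays. The helper name `reboot_weekdays` is only illustrative. Note that these timestamps record when the system came back online after the manual reset, not when it froze, so they only bound the freeze time from above.

```python
from datetime import datetime

# Timestamps copied from the "unscheduled system reboot" alerts in this report.
# They record when the system came back up, not when it froze.
REBOOTS = [
    "Mon Jan 25 08:24:04 2021",  # nas-03
    "Mon Feb 1 07:58:21 2021",   # nas-03
    "Sun Feb 7 09:33:10 2021",   # nas-04
]


def reboot_weekdays(timestamps):
    """Return the weekday name for each reboot timestamp (hypothetical helper)."""
    fmt = "%a %b %d %H:%M:%S %Y"
    return [datetime.strptime(ts, fmt).strftime("%A") for ts in timestamps]


if __name__ == "__main__":
    for ts, day in zip(REBOOTS, reboot_weekdays(REBOOTS)):
        print(f"{ts} -> {day}")
```

All three resets fall on a Sunday or a Monday morning, which fits freezes that occur over the weekend and are only noticed afterwards. It does not show which job, if any, triggers them.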

       

      After the hard reboot, the following mails (occasionally) arrive:

      From nas-03 (FUJITSU D3402-B2)

      TrueNAS @ nas-03...
      
      New alerts:
      * Failed to check for alert Quota:
      concurrent.futures.process._RemoteTraceback:
      """
      Traceback (most recent call last):
      File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
      r = call_item.fn(*call_item.args, **call_item.kwargs)
      File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 91, in main_worker
      res = MIDDLEWARE._run(*call_args)
      File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 45, in _run
      return self._call(name, serviceobj, methodobj, args, job=job)
      File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 33, in _call
      with Client('ws+unix:///var/run/middlewared-internal.sock', py_exceptions=True) as c:
      File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 284, in __init__
      raise ClientException('Failed connection handshake')
      middlewared.client.client.ClientException: Failed connection handshake
      """
      
      The above exception was the direct cause of the following exception:
      
      Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/alert.py", line 706, in __run_source
      alerts = (await alert_source.check()) or []
      File "/usr/local/lib/python3.8/site-packages/middlewared/alert/base.py", line 210, in check
      return await self.middleware.run_in_thread(self.check_sync)
      File "/usr/local/lib/python3.8/site-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread
      return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))
      File "/usr/local/lib/python3.8/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
      result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.8/site-packages/middlewared/alert/source/quota.py", line 38, in check_sync
      datasets = self.middleware.call_sync("zfs.dataset.query_for_quota_alert")
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1258, in call_sync
      return self.run_coroutine(self._call_worker(name, *prepared_call.args))
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1294, in run_coroutine
      return fut.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
      return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
      raise self._exception
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1209, in _call_worker
      return await self.run_in_proc(main_worker, name, args, job)
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1136, in run_in_proc
      return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1110, in run_in_executor
      return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
      middlewared.client.client.ClientException: Failed connection handshake
      
      
      Current alerts:
      * nas-03... had an unscheduled system reboot.
      The operating system successfully came back online at Mon Jan 25 08:24:04 2021.
      
      * nas-03... had an unscheduled system reboot.
      The operating system successfully came back online at Mon Feb 1 07:58:21 2021.
      
      * Device: /dev/nvme0, Critical Warning (0x04): Reliability.
      * Failed to check for alert Quota:
      concurrent.futures.process._RemoteTraceback:
      """
      Traceback (most recent call last):
      File "/usr/local/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
      r = call_item.fn(*call_item.args, **call_item.kwargs)
      File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 91, in main_worker
      res = MIDDLEWARE._run(*call_args)
      File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 45, in _run
      return self._call(name, serviceobj, methodobj, args, job=job)
      File "/usr/local/lib/python3.8/site-packages/middlewared/worker.py", line 33, in _call
      with Client('ws+unix:///var/run/middlewared-internal.sock', py_exceptions=True) as c:
      File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 284, in __init__
      raise ClientException('Failed connection handshake')
      middlewared.client.client.ClientException: Failed connection handshake
      """
      
      The above exception was the direct cause of the following exception:
      
      Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/alert.py", line 706, in __run_source
      alerts = (await alert_source.check()) or []
      File "/usr/local/lib/python3.8/site-packages/middlewared/alert/base.py", line 210, in check
      return await self.middleware.run_in_thread(self.check_sync)
      File "/usr/local/lib/python3.8/site-packages/middlewared/utils/run_in_thread.py", line 10, in run_in_thread
      return await self.loop.run_in_executor(self.run_in_thread_executor, functools.partial(method, *args, **kwargs))
      File "/usr/local/lib/python3.8/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
      result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.8/site-packages/middlewared/alert/source/quota.py", line 38, in check_sync
      datasets = self.middleware.call_sync("zfs.dataset.query_for_quota_alert")
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1258, in call_sync
      return self.run_coroutine(self._call_worker(name, *prepared_call.args))
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1294, in run_coroutine
      return fut.result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 432, in result
      return self.__get_result()
      File "/usr/local/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
      raise self._exception
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1209, in _call_worker
      return await self.run_in_proc(main_worker, name, args, job)
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1136, in run_in_proc
      return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1110, in run_in_executor
      return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
      middlewared.client.client.ClientException: Failed connection handshake

       

      From nas-04 (Supermicro X11SSH-F)

      TrueNAS @ nas-04...
      
      New alerts:
      * Failed to check for alert VolumeStatus:
      Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/alert.py", line 706, in __run_source
      alerts = (await alert_source.check()) or []
      File "/usr/local/lib/python3.8/site-packages/middlewared/alert/source/volume_status.py", line 31, in check
      for vdev in await self.middleware.call("pool.flatten_topology", pool["topology"]):
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1238, in call
      return await self._call(
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1206, in _call
      return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1110, in run_in_executor
      return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
      File "/usr/local/lib/python3.8/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
      result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool.py", line 438, in flatten_topology
      d = deque(sum(topology.values(), []))
      AttributeError: 'NoneType' object has no attribute 'values'
      
      
      Current alerts:
      * nas-04... had an unscheduled system reboot.
      The operating system successfully came back online at Sun Feb 7 09:33:10 2021.
      
      * Failed to check for alert VolumeStatus:
      Traceback (most recent call last):
      File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/alert.py", line 706, in __run_source
      alerts = (await alert_source.check()) or []
      File "/usr/local/lib/python3.8/site-packages/middlewared/alert/source/volume_status.py", line 31, in check
      for vdev in await self.middleware.call("pool.flatten_topology", pool["topology"]):
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1238, in call
      return await self._call(
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1206, in _call
      return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
      File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1110, in run_in_executor
      return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
      File "/usr/local/lib/python3.8/site-packages/middlewared/utils/io_thread_pool_executor.py", line 25, in run
      result = self.fn(*self.args, **self.kwargs)
      File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/pool.py", line 438, in flatten_topology
      d = deque(sum(topology.values(), []))
      AttributeError: 'NoneType' object has no attribute 'values'
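
The nas-04 traceback ends in `flatten_topology` being called with `pool["topology"]` set to `None`. The sketch below reproduces only the single expression shown on the failing line and adds a guard. It is an assumption for illustration, not the actual middlewared code or its fix; both function names are hypothetical, and the real method may do more than this one expression.

```python
from collections import deque


def flatten_topology_unsafe(topology):
    # Same expression as the failing line in the traceback:
    # raises AttributeError when topology is None.
    return deque(sum(topology.values(), []))


def flatten_topology_guarded(topology):
    # Hypothetical guard: treat a missing topology as "no vdevs"
    # instead of letting the alert check crash.
    if topology is None:
        return deque()
    return deque(sum(topology.values(), []))


if __name__ == "__main__":
    try:
        flatten_topology_unsafe(None)
    except AttributeError as exc:
        print("unsafe:", exc)
    print("guarded:", flatten_topology_guarded(None))
```

A `None` topology right after a hard reset may simply mean that the pool information was not available yet when the alert check ran. That would make this alert a side effect of the reboot and not the cause of the freeze.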

      How can I provide the TrueNAS debug info for those two systems privately to the developers?

        Attachments

        1. 2021-02-18 07.53.04.jpg
          303 kB
        2. 20210227_nas-03-bios-controller-visible.png
          73 kB
        3. 20210227_nas-03-out-of-swap-space.png
          176 kB
        4. 20210227_nas-03-reinitializencontroller-mps0.png
          131 kB
        5. 20210304_2200_freenas-debug.log
          5.15 MB
        6. 20210304_2230_freenas-debug.log
          5.23 MB
        7. 20210304-nas-03-debug-cronjobs.png
          26 kB
        8. 20210308_vmstat-m.log
          69 kB
        9. 20210309_vmstat-m.log
          69 kB
        10. 20210310_vmstat-m.log
          52 kB
        11. 20210311_vmstat-m.log
          39 kB
        12. 20210312_vmstat-m.log
          69 kB
        13. 20210313_vmstat-m.log
          69 kB
        14. 20210314_vmstat-m.log
          69 kB
        15. 20210315_2300_freenas-debug.log
          2.78 MB
        16. 20210315_2300_top.log
          0.6 kB
        17. 20210315_2315_freenas-debug.log
          2.77 MB
        18. 20210315_2315_top.log
          0.6 kB
        19. 20210315_2330_freenas-debug.log
          2.77 MB
        20. 20210315_2330_top.log
          0.6 kB
        21. 20210315_2345_freenas-debug.log
          2.74 MB
        22. 20210315_2345_top.log
          0.6 kB
        23. 20210315_vmstat-m.log
          69 kB
        24. 20210316_0000_freenas-debug.log
          2.76 MB
        25. 20210316_0000_top.log
          0.6 kB
        26. 20210316_0730_freenas-debug.log
          1.49 MB
        27. 20210316_0730_top.log
          0.6 kB
        28. 20210316_vmstat-m.log
          3 kB
        29. 20210317_vmstat-m.log
          67 kB
        30. 20210318_vmstat-m.log
          69 kB
        31. 20210319_0900_freenas-debug.log
          2.10 MB
        32. 20210319_vmstat-m.log
          26 kB
        33. 20210322_2145_freenas-debug.log
          3.04 MB
        34. 20210322_vmstat-m.log
          62 kB
        35. 20210323_0045_freenas-debug.log
          1.49 MB
        36. 20210323_vmstat-m.log
          27 kB
        37. debug-nas-03-20210209124537.tgz
          817 kB
        38. debug-nas-03-20210218085417.tgz
          855 kB
        39. debug-nas-03-20210227105555.tgz
          843 kB
        40. debug-nas-04-20210209124643.tgz
          850 kB
        41. image-2021-03-04-19-33-37-708.png
          26 kB
        42. image-2021-03-17-08-27-55-673.png
          4 kB

                  People

                  Assignee: releng Triage Team
                  Reporter: wolfgangd Wolfgang Demeter
                  Votes: 0
                  Watchers: 4
