TrueNAS / NAS-108890

python3.8 core dump in TrueNAS 12.0U1

    Details

    • Impact:
      Critical

      Description

      On my install, python3.8 in the TrueNAS version listed above intermittently dumps core, which in turn stops the web GUI from running.

      The web GUI can be manually restarted by ssh'ing into the box and running "service middlewared restart" as root.

      The crashes are restricted to the python3.8 executable, and the hardware passes multiple MemTest86 runs, which makes me think this is a software bug and not a hardware one.

      It's a little disturbing to see python itself dumping core. Python 3.8.5 has been known to dump core -- see for example https://bugs.python.org/issue37135 .

      I don't know how the developers are rolling python3.8 for TrueNAS, but I might suggest making sure that python3.8 is built from the latest possible sources. 12.0U1 seems to be stuck on 3.8.5.  Additionally, it might be good to have a watchdog process catch when python3.8 has crashed, and restart the middlewared process or at least report the crash. Thoughts on how I can get more debug info to the developers?
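To make the watchdog suggestion concrete, here is a minimal sketch, not a proposal for how TrueNAS should implement it. It assumes the watchdog is fed syslog lines in the format quoted in this report and that "service middlewared restart" is an acceptable reaction. The names CRASH_RE, is_python_crash, restart_middlewared and watch are hypothetical.

```python
import re
import subprocess

# Matches kernel crash lines in the format quoted in this report, e.g.
# "... kernel: pid 506 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)"
CRASH_RE = re.compile(
    r"kernel: pid (?P<pid>\d+) \((?P<prog>[^)]+)\), .*"
    r"exited on signal (?P<sig>\d+)"
)


def is_python_crash(line, prog="python3.8"):
    """Return True if the log line reports `prog` being killed by a signal."""
    m = CRASH_RE.search(line)
    return bool(m) and m.group("prog") == prog


def restart_middlewared():
    """Run the manual workaround from this report; needs root."""
    return subprocess.run(["service", "middlewared", "restart"], check=False)


def watch(lines, on_crash=restart_middlewared):
    """Call on_crash once per crash line seen and return how many were seen.

    Assumption: every python3.8 crash warrants a restart. In practice some
    crashes come from other python3.8 processes (midclt, for example), so a
    real watchdog would want to check whether middlewared is actually down
    and rate-limit restarts.
    """
    count = 0
    for line in lines:
        if is_python_crash(line):
            on_crash()
            count += 1
    return count
```

Passing a different on_crash callable (one that only sends a notification, say) would cover the "at least report the crash" variant without restarting anything.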

      /var/log/messages:

      Dec 30 04:18:47 truenas 1 2020-12-30T04:18:47.118906-08:00 truenas collectd 3353 - - nut plugin: nut_connect: upscli_connect (localhost, 3493) failed: Connection failure: Connection refused
      Dec 30 09:24:21 truenas kernel: pid 506 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
      Dec 30 09:24:21 truenas kernel: pid 17779 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
      Dec 30 09:24:24 truenas kernel: pid 17789 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
      Dec 30 12:14:38 truenas kernel: pid 18068 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)
      Dec 30 15:41:27 truenas 1 2020-12-30T15:41:27.133678-08:00 truenas collectd 3353 - - nut plugin: nut_connect: upscli_connect (localhost, 3493) failed: Connection failure: Connection refused
      Dec 30 16:57:12 truenas 1 2020-12-31T00:57:12.022530+00:00 truenas devd 429 - - notify_clients: send() failed; dropping unresponsive client
      Dec 30 16:57:12 truenas kernel: pid 446 (python3.8), jid 0, uid 0: exited on signal 4 (core dumped)
      Dec 30 17:01:27 truenas 1 2020-12-30T17:01:27.199768-08:00 truenas collectd 3353 - - Traceback (most recent call last):
      File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
      with Client() as c:
      File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
      self._ws.connect()
      File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
      rv = super(WSClient, self).connect()
      File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 216, in connect
      self.sock.connect(self.bind_addr)
      ConnectionRefusedError: [Errno 61] Connection refused
      Dec 30 17:06:27 truenas 1 2020-12-30T17:06:27.199002-08:00 truenas collectd 3353 - - Traceback (most recent call last):
      File "/usr/local/lib/collectd_pyplugins/disktemp.py", line 62, in read
      with Client() as c:
      File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 281, in __init__
      self._ws.connect()
      File "/usr/local/lib/python3.8/site-packages/middlewared/client/client.py", line 124, in connect
      rv = super(WSClient, self).connect()
      File "/usr/local/lib/python3.8/site-packages/ws4py/client/__init__.py", line 216, in connect
      self.sock.connect(self.bind_addr)
      ConnectionRefusedError: [Errno 61] Connection refused
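For anyone triaging a longer log, the kernel lines above can be tallied by signal. This is a rough sketch that assumes the line format shown in this report. The helper names are hypothetical. Signal numbers are resolved on the machine that runs the script; 4 and 11 map to SIGILL and SIGSEGV on both FreeBSD and Linux.

```python
import re
import signal
from collections import Counter

# Captures the signal number from python3.8 crash lines such as
# "kernel: pid 506 (python3.8), jid 0, uid 0: exited on signal 11 (core dumped)"
SIGNAL_RE = re.compile(r"\(python3\.8\), .*exited on signal (\d+)")


def signal_name(number):
    """Translate a signal number to its name, falling back to the number."""
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"signal {number}"


def summarize(lines):
    """Count python3.8 crashes per signal name."""
    counts = Counter()
    for line in lines:
        m = SIGNAL_RE.search(line)
        if m:
            counts[signal_name(int(m.group(1)))] += 1
    return counts
```

Run over the excerpt above, this separates the four signal 11 (SIGSEGV) crashes from the single signal 4 (SIGILL) crash, which is the one the gdb session below examines.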

       

      gdb debugging session, after core dump:

      # gdb /usr/local/bin/python3.8 /python3.8.core
      GNU gdb (GDB) 9.1 [GDB v9.1 for FreeBSD]
      Copyright (C) 2020 Free Software Foundation, Inc.
      License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
      Type "show copying" and "show warranty" for details.
      This GDB was configured as "x86_64-portbld-freebsd12.0".
      Type "show configuration" for configuration details.
      For bug reporting instructions, please see:
      <http://www.gnu.org/software/gdb/bugs/>.
      Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.

      For help, type "help".
      Type "apropos word" to search for commands related to "word"...
      Reading symbols from /usr/local/bin/python3.8...
      (No debugging symbols found in /usr/local/bin/python3.8)
      [New LWP 100866]

      warning: Section `.reg-xstate/100866' in core file too small.
      Core was generated by `/usr/local/bin/python3.8 /usr/local/bin/midclt call --job true --job-print descr'.
      Program terminated with signal SIGILL, Illegal instruction.

      warning: Section `.reg-xstate/100866' in core file too small.
      #0 0x0000000800472524 in _PyEval_EvalFrameDefault () from /usr/local/lib/libpython3.8.so.1.0
      (gdb) bt
      #0 0x0000000800472524 in _PyEval_EvalFrameDefault () from /usr/local/lib/libpython3.8.so.1.0
      #1 0x00000008012a6410 in ?? ()
      #2 0x0000000800aae9f0 in ?? ()
      #3 0x0000000000000004 in ?? ()
      #4 0x00000008003fe788 in ?? () from /usr/local/lib/libpython3.8.so.1.0
      #5 0x00000008004768d0 in _PyEval_EvalCodeWithName () from /usr/local/lib/libpython3.8.so.1.0
      #6 0x00000008003aa26b in _PyFunction_Vectorcall () from /usr/local/lib/libpython3.8.so.1.0
      #7 0x0000000800475c04 in ?? () from /usr/local/lib/libpython3.8.so.1.0
      #8 0x0000000800472e12 in _PyEval_EvalFrameDefault () from /usr/local/lib/libpython3.8.so.1.0
      #9 0x00000008004768d0 in _PyEval_EvalCodeWithName () from /usr/local/lib/libpython3.8.so.1.0
      #10 0x00000008003aa26b in _PyFunction_Vectorcall () from /usr/local/lib/libpython3.8.so.1.0
      [snip]
      #141 0x000000080046c593 in PyEval_EvalCode () from /usr/local/lib/libpython3.8.so.1.0
      #142 0x00000008004b81ae in ?? () from /usr/local/lib/libpython3.8.so.1.0
      #143 0x00000008004b6dbc in PyRun_FileExFlags () from /usr/local/lib/libpython3.8.so.1.0
      #144 0x00000008004b632c in PyRun_SimpleFileExFlags () from /usr/local/lib/libpython3.8.so.1.0
      #145 0x00000008004d4afe in Py_RunMain () from /usr/local/lib/libpython3.8.so.1.0
      #146 0x00000008004d503b in ?? () from /usr/local/lib/libpython3.8.so.1.0
      #147 0x00000008004d50ba in Py_BytesMain () from /usr/local/lib/libpython3.8.so.1.0
      #148 0x00000000002017d0 in _start ()
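Because the interpreter has no debugging symbols, the backtrace above shows only C frames and says nothing about which Python code was running. One possible way to get more information is Python's standard faulthandler module, which writes the Python-level traceback when the process receives SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. The sketch below is an assumption about how it could be wired in, not a description of what middlewared does. The function name and the log path passed to it are hypothetical.

```python
import faulthandler


def enable_crash_traces(path):
    """Write Python tracebacks for fatal signals to `path`.

    The returned file object must stay open for as long as faulthandler
    is enabled, because the handler writes to its file descriptor.
    """
    log = open(path, "a")
    faulthandler.enable(file=log, all_threads=True)
    return log
```

The same handler can be switched on without code changes by starting the interpreter with "-X faulthandler" or with PYTHONFAULTHANDLER=1 in the environment, in which case the traceback goes to stderr.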

       

      Hardware configuration:

      CPU: AMD Ryzen 5 3600 6-Core Processor
      Memory: 64 GB
      Motherboard: ASUS TUF GAMING X570-PLUS (WI-FI)
      Graphics: NVidia GeForce 710
      2 Intel PRO/1000 PCI-Express Network interface compatible cards
      nvd0: WDS100T3X0C-00SJG0 WD Black SN750 NVMe SSD
      nvd1: WDS100T3X0C-00SJG0 WD Black SN750 NVMe SSD
      ada0 boot device: Samsung SSD 860 EVO 500GB RVT04B6Q, Serial Number S598NJ0NA16137K
      ada1: HGST HUS728T8TALE6L4 V8GAW4J0
      ada2: HGST HUS728T8TALE6L4 V8GAW4J0
      ada3: HGST HUS728T8TALE6L4 V8GAW4J0

                People

                Assignee:
                releng Triage Team
                Reporter:
                johnwbyrd John Byrd
                Votes:
                0
                Watchers:
                2
