  1. TrueNAS
  2. NAS-108965

Box crashes on load

    Details

      Description

      This box regularly crashes under higher load over NFSv4.

      Environment:
      TN12U1 acts as an NFSv4-attached datastore for ESXi 6.7U3.
      When I migrate VMs between TN and another datastore (vSAN), the TN box crashes after a while. This happens on both reads and writes.

      Unfortunately, I changed a bunch of things since it was last stable:
      I replaced 12 SAS3 SSDs with 4 NVMe drives (Dell branded), added an MLX5 NIC (100G connected), and updated to TN 12.1,
      so I am not sure which change actually broke things.

      My first suspicion was power issues with the NVMe drives, but I now have a 1 kW PSU running and my power measurement only shows ~200 to 300 W usage. I also distributed the NVMe drives evenly across the power connectors to prevent overloading individual rails.

      The box also seems to reboot randomly at times; I get error messages that an unexpected reboot has occurred and that the boot volume was checked OK.
      I had these before as well, but attributed them to running an NVMe drive as the system drive, which I have since removed; the OS has been reinstalled and the config restored.

      Sometimes (every second or third boot), the system does not boot properly and ends up in a kernel panic. Another reboot usually resolves this.

      The system was stable while running 11.U3 with the SSDs; reboots with kernel panics have only happened since 12. As mentioned, I am not sure whether the reboots under load are hardware related (NVMe) or also TN 12 related.

      I tried capturing core files but couldn't find any.

      Thanks

                People

                Assignee:
                releng Triage Team
                Reporter:
                Rand Thomas Rottig
                Votes:
                0
                Watchers:
                2
