openebs-zfs-controller-0 crashes
Description
Problem/Justification
Impact
Activity
j_r0dd July 10, 2021 at 3:21 PM
Well, I set the OPENEBS_NAMESPACE vars to kube-system instead of openebs in the manifest. I was hopeful last night, as it went about 5 hours without the zfs-operator crashing, but I woke up to 15 crashes today. A couple of other people also tested this and got similar results. It is possible that the zfs plugin is not causing the issue and that something system-related is causing the operator to crash. I have been noticing that the zfs system process spikes to 99% CPU usage every 15 seconds or so.
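For anyone who wants to check their own manifest for the same kind of mismatch (workloads deployed to one namespace while OPENEBS_NAMESPACE points at another), here is a rough sketch. It assumes the manifest has already been parsed into plain Python dicts, for example with a YAML loader. The sample documents, the container names, and the helper name find_namespace_mismatches are hypothetical stand-ins for illustration, not the real contents of zfs-operator.yaml.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

ENV_NAME = "OPENEBS_NAMESPACE"

# (kind, workload name, container name, deployed namespace, env value)
Mismatch = Tuple[Optional[str], Optional[str], Optional[str], Optional[str], str]


def find_namespace_mismatches(docs: Iterable[Dict[str, Any]]) -> List[Mismatch]:
    """Report containers whose OPENEBS_NAMESPACE differs from metadata.namespace."""
    mismatches: List[Mismatch] = []
    for doc in docs:
        if not isinstance(doc, dict):
            continue  # skip empty YAML documents
        meta = doc.get("metadata") or {}
        deployed_ns = meta.get("namespace")
        # Deployments, StatefulSets and DaemonSets keep the pod spec here.
        pod_spec = ((doc.get("spec") or {}).get("template") or {}).get("spec") or {}
        for container in pod_spec.get("containers") or []:
            for env in container.get("env") or []:
                if env.get("name") != ENV_NAME or "value" not in env:
                    continue  # ignore other vars and valueFrom-style entries
                if env["value"] != deployed_ns:
                    mismatches.append(
                        (doc.get("kind"), meta.get("name"), container.get("name"),
                         deployed_ns, env["value"])
                    )
    return mismatches


# Hypothetical sample, shaped like a parsed manifest (not the real file).
SAMPLE_DOCS = [
    {
        "kind": "StatefulSet",
        "metadata": {"name": "openebs-zfs-controller", "namespace": "kube-system"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "example-plugin",
             "env": [{"name": "OPENEBS_NAMESPACE", "value": "openebs"}]},
        ]}}},
    },
    {
        "kind": "DaemonSet",
        "metadata": {"name": "example-node", "namespace": "kube-system"},
        "spec": {"template": {"spec": {"containers": [
            {"name": "example-plugin",
             "env": [{"name": "OPENEBS_NAMESPACE", "value": "kube-system"}]},
        ]}}},
    },
]

if __name__ == "__main__":
    for kind, name, container, deployed, configured in find_namespace_mismatches(SAMPLE_DOCS):
        print(f"{kind}/{name} [{container}]: deployed to {deployed}, "
              f"{ENV_NAME}={configured}")
```

This only flags the inconsistency. Whether the mismatch actually causes the restarts is still an open question, given that the crashes continued after patching.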
j_r0dd July 9, 2021 at 9:24 PM(edited)
I just wanted to report my findings on this since I am also seeing weirdness. I believe it has to do with the namespace. The zfs-operator.yaml has some things in the kube-system namespace as well as some things in the openebs namespace. The OPENEBS_NAMESPACE var is set to openebs even though the deployments and statefulset are deployed to kube-system. I'm going to manually patch this on my system for testing. Even though I do not use the zfs storage class for any of my PVCs, I do experience random hiccups when this container crashes. Here is the snippet that caught my attention from the beginning of the manifest, which led me to look deeper.
Waqar July 7, 2021 at 1:46 PM
Thank you for laying it out. I am not sure what might be at play here, as it could be an upstream openebs issue. I'll look forward to narrowing it down, perhaps with upstream k3s/openebs csi, as that would not contain any of our modifications. I also see it restarting frequently in my test cluster, but functionality is not affected and storage works as desired. Anyway, we should still be raising this appropriately. Thanks!
Waqar July 7, 2021 at 1:46 PM
That works for me, thank you.
Kjeld Schouten-lebbing July 7, 2021 at 8:25 AM
I've added logs from my bare metal machine (ZFS-Provisioner Logs.txt); this one also has a super high restart counter (into the thousands).
But it does not give the errors like the logs from show.
So we basically have two separate issues going on at the same time:
The issue where it isn't starting at all ( )
The issue where the restart counter goes up into the hundreds (mine and another dude on the forums)
csi-resizer, csi-snapshotter, snapshot-controller, and csi-provisioner are crashing even on a fresh TrueNAS installation. All settings are default except that the pool was set for applications.
Hardware:
MOBO: Supermicro X11SCL-F
CPU: Intel(R) Xeon(R) E-2124 CPU @ 3.30GHz
RAM: 32 GB
2x WDC_WD40EFZX - Mirror
WDC WDS250G2B0C-00PXH0 - boot