Kubernetes / Docker orphans unable to be deleted
Description
Activity
Anyway, I have filed the upstream issue at https://github.com/moby/moby/issues/43080 with high hopes.
We're going to drop the zfs driver as soon as we get overlayfs support, which is being tracked at https://jira.ixsystems.com/browse/NAS-109036.
Closing.
Hi @Jordan L , I couldn't reproduce this despite a couple of tries (i.e. initiating pod removal and then killing the VM). We use the openebs ZFS operator. This really seems to be an edge case where the Docker zfs driver couldn't complete its operation and ended up in a malformed state (merely a hypothesis).
This is not an issue on our side, as I have seen it happen for people using docker-compose with the zfs driver: https://help.nextcloud.com/t/solved-cannot-start-service-db-error-creating-zfs-mount-after-zfs-snapshot-rollback/61695, and they also used the same workaround as you to mitigate the problem. At this point, I can only file a third-party ticket describing this problem, but I'm unsure whether it's going to get much traction given that the steps to reproduce aren't clear enough. By the way, are you still able to see this from time to time? If so, a concrete set of steps to reproduce would be a big plus! Please let me know.
Thanks!
I did save a debug file and reviewed it at the time, but there's too much private data in there for me to submit that file publicly. So I opted not to submit it and to live with this bug.
I've recently resolved this issue by following the advice of another user who also had the problem. While this bug might still exist, this workaround seems to have roughly resolved it.
https://www.truenas.com/community/threads/openebs-zfs-driver-removal-in-progress-for-4-weeks.96192/
Essentially, I restarted the k3s service to get openebs running again:
service k3s restart
Then I'd trawl /var/log/syslog for failed deletes of pod datasets:
2021-11-28 21:38:56 Marking for deletion Pod ix-frigate/frigate-ix-chart-84764769d6-btbwq
2021-11-28 21:38:56 Cancelling deletion of Pod ix-frigate/frigate-ix-chart-84764769d6-btbwq
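Entries like these can be pulled out with a simple grep (a sketch; adjust the patterns to whatever your syslog actually shows):
# Find the pods whose dataset deletion keeps failing/being retried.
grep -E 'Marking for deletion Pod|Cancelling deletion of Pod' /var/log/syslog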
Then I'd recreate those missing datasets and allow openebs to delete them so that it can finish its task:
zfs create ix-frigate/frigate-ix-chart-84764769d6-btbwq
Rinse and repeat until all the errors are gone.
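If there are many of them, something along these lines could automate the rinse-and-repeat step. This is only a rough sketch, under the assumption that the pod name at the end of each log line maps one-to-one to the dataset name used in the zfs create above:
# Hypothetical helper: collect the pod/dataset names that openebs keeps
# trying (and failing) to delete, recreate them, and let openebs finish.
for ds in $(grep 'Cancelling deletion of Pod' /var/log/syslog | awk '{print $NF}' | sort -u); do
    zfs create "$ds"
done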
Now these errors no longer show up in my logs and I'm a happy camper.
Also, I've updated to the most recent TrueNAS-SCALE-22.02-RC.1-2 and this bug still persisted until I resolved it manually.
Thank you for the report, @Jordan L .
Can you please attach a debug file to this ticket? To generate a debug file on TrueNAS CORE, log in to the TrueNAS web interface, go to System > Advanced, then click Save Debug and wait for the file to download to your local system. In TrueNAS SCALE, this option is in System Settings > Advanced.
I'm fairly new to TrueNAS SCALE, ZFS, and Kubernetes, but have good knowledge of Linux and medium knowledge of Docker.
I have some orphaned Kubernetes/Docker images which TrueNAS SCALE attempts to delete every minute or so, and I get these messages repeated in /var/log/syslog.
See attached error log
https://jira.ixsystems.com/plugins/servlet/jeditor_file_provider?imgId=ckupload202109184873100875622001987&fileName=TrueNas-error.log
This server is new; originally I installed SCALE 21.06-BETA.1 and upgraded to SCALE 21.08-BETA.1.
How this could have happened comes down to two things (I assume):
1. When I first set up my server, I had replication and scrubbing tasks scheduled to run at midnight, and that would crash my server every night until I staggered them (another bug, but not what we're here for). I think this may also have left replication tasks in a half-finished state.
2. Additionally, I've had multiple power cuts on the server because people were doing electrical work and, well... it happened.
So those are prime edge cases which I assume could have allowed this to happen.
If you could give me some commands to trace where this delete attempt is coming from so I can clean things up, that would be super.
Some other debug info:
root@truenas:~# ls -ltrh /mnt/IronWolf_3TB/ix-applications/docker/8818edc0d76e545f26fe7f644b69188bd484757df0b3387a960da0ca391fec9b
ls: cannot access '/mnt/IronWolf_3TB/ix-applications/docker/8818edc0d76e545f26fe7f644b69188bd484757df0b3387a960da0ca391fec9b': No such file or directory
root@truenas:~# k3s crictl ps -a | grep 15408
154086d23d1f3 k8s.gcr.io/sig-storage/csi-resizer@sha256:7a5ba58a44e0d749e0767e4e37315bcf6a61f33ce3185c1991848af4db0fb70a 2 weeks ago Unknown csi-resizer 37 6c79b37429e6f
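For reference, a couple of follow-up commands that might help narrow this down (a sketch only; the pool name and container ID are taken from the output above):
# List the datasets the docker zfs driver still has under ix-applications
# (assumes they live under IronWolf_3TB/ix-applications/docker).
zfs list -r IronWolf_3TB/ix-applications/docker
# Attempt to remove the orphaned container that crictl reports in Unknown state.
k3s crictl rm 154086d23d1f3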