Schrödinger's Datasets
Description
Activity
Ryan Moeller October 13, 2021 at 4:44 PM
Nothing prevents your datasets from being mounted normally and then one rogue dataset from being mounted over /mnt, hiding all of them. Then, as you observed, ZFS will be convinced they are mounted because they are; you'll just be confused because you can't see them or interact with them.
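For illustration, here is a minimal sketch of the covering in action, using dataset names from the listing further down in this ticket (commands only; the output is not captured from this system):

$ zfs get mountpoint,mounted vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m   # carries mountpoint=/mnt
$ zfs get mounted vol1 ssd_scratch   # still reports yes: they really are mounted, just hidden underneath
$ findmnt /mnt   # the kernel's view: the covering filesystem sits on top of /mnt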
I'm closing this ticket, as it's not something we have any control over when you're manually replicating datasets around; you'll have to get that right yourself, or use the replication features within the TrueNAS UI.
Jacob McDonald October 8, 2021 at 7:40 PM
Yes, that's a good hypothesis, but doesn't ZFS automatically create the mountpoint directory if needed? And that should still have worked, since the other datasets were not mounted read-only.
Order of mounting may have something to do with it (a way to check the actual order is sketched below), but I'm still unsure, because:
- If the /mnt dataset was mounted first, then mounting the vol1 and ssd_scratch datasets on top of it would have had zero impact.
- If the vol1 and ssd_scratch datasets were mounted under /mnt first, then the subsequent mount of the other dataset onto /mnt should have failed?
- But vol1 was accessible and ssd_scratch was not.
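On Linux, mounting over a non-empty directory does not fail: the later mount simply covers whatever is already mounted below that path, which may explain why the mount onto /mnt succeeded either way. A sketch of how to inspect the actual order, assuming standard Linux tooling (not output from this system):

$ grep zfs /proc/self/mounts   # ZFS mounts, listed roughly in the order they were created
$ findmnt -t zfs -o TARGET,SOURCE   # which dataset the kernel currently resolves each path to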
Another related issue: this may be what caused the loss of one of my TrueCharts PVC datasets, since they sometimes seem to be mounted under /mnt/var for some bizarre reason. If the container dataset mount failed on top of /mnt/var, those files would have been written into the other dataset and then disappeared after I fixed the problem. It's still a little strange, since presumably the mount of the dataset would have failed, which should have caused the container creation pipeline to fail; but since some things seem to have mounted and others didn't, it's possible it was reported as mounted while writes were actually going to the other dataset instead of the intended one.
Ryan Moeller October 8, 2021 at 4:13 PM
So I suppose the dataset mounted over /mnt is preventing the other datasets from being unmounted, since their mountpoints are now hidden.
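A possible way out, sketched here on the assumption that nothing is holding files open under /mnt: unmount the received datasets deepest mountpoint first, then the one covering /mnt itself, after which the original mountpoints become reachable again.

$ zfs list -r -o name,mountpoint,mounted vol1/remote_backups/nuc5_ubuntu
$ zfs unmount vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/lib/libvirt/images
# ...repeat for the remaining mounted children, working back up toward /mnt...
$ zfs unmount vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m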
Jacob McDonald October 6, 2021 at 7:07 PM
It wasn't using TrueNAS replication; this is a manual zfs send | ssh zfs recv from an Ubuntu workstation to TrueNAS CORE (and now TrueNAS SCALE, though I have not done another send since migrating to SCALE), in order to push a backup of the workstation to the NAS.
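Roughly, that pipeline looks like the sketch below; the pool name (rpool), snapshot name, and NAS hostname are placeholders rather than values from this system:

$ zfs snapshot -r rpool@backup-2021-10-06
$ zfs send -R rpool@backup-2021-10-06 | ssh truenas zfs recv -duF vol1/remote_backups/nuc5_ubuntu

Receiving with -u avoids mounting anything at receive time, but the received mountpoint=/mnt property would still take effect at the next zfs mount -a or pool import, which is where the canmount change described next comes in.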
I was pursuing setting mountpoint=none, but on the forums a user suggested canmount=noauto, and that's much cleaner. I suspect this would complicate an easy restore if I ever needed one, since I'd have to set canmount=on again afterward, but at least I won't have to maintain any state on the mountpoints.
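A minimal sketch of applying that on the NAS (canmount is not inheritable, so it has to be set on each dataset; the hostname is a placeholder):

$ ssh truenas 'zfs list -H -r -o name vol1/remote_backups/nuc5_ubuntu | xargs -n1 zfs set canmount=noauto'

For a restore, canmount=on would need to be set again afterward, per dataset, which seems to be the only real cost of this workaround.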
This is a mess:
$ sudo zfs list -r vol1/remote_backups/nuc5_ubuntu
NAME USED AVAIL REFER MOUNTPOINT
vol1/remote_backups/nuc5_ubuntu 31.3G 8.57T 153K /mnt/vol1/remote_backups/nuc5_ubuntu
vol1/remote_backups/nuc5_ubuntu/ROOT 23.4G 8.57T 153K /mnt/vol1/remote_backups/nuc5_ubuntu/ROOT
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m 23.4G 8.57T 3.66G /mnt
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/srv 332K 8.57T 153K /mnt/srv
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/usr 920K 8.57T 153K /mnt/usr
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/usr/local 767K 8.57T 204K /mnt/usr/local
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var 18.4G 8.57T 153K /mnt/var
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/games 153K 8.57T 153K /mnt/var/games
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/lib 18.3G 8.57T 1.25G /mnt/var/lib
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/lib/AccountsService 677K 8.57T 179K /mnt/var/lib/AccountsService
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/lib/NetworkManager 1.67M 8.57T 243K /mnt/var/lib/NetworkManager
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/lib/apt 92.1M 8.57T 83.0M /mnt/var/lib/apt
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/lib/dpkg 93.8M 8.57T 50.8M /mnt/var/lib/dpkg
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/lib/libvirt 16.8G 8.57T 249K /mnt/var/lib/libvirt
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/lib/libvirt/images 16.8G 8.57T 11.0G /mnt/var/lib/libvirt/images
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/log 89.9M 8.57T 48.1M /mnt/var/log
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/mail 153K 8.57T 153K /mnt/var/mail
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/snap 1.11M 8.57T 856K /mnt/var/snap
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/spool 447K 8.57T 179K /mnt/var/spool
vol1/remote_backups/nuc5_ubuntu/ROOT/ubuntu_hrkc9m/var/www 153K 8.57T 153K /mnt/var/www
vol1/remote_backups/nuc5_ubuntu/USERDATA 7.82G 8.57T 153K /mnt/vol1/remote_backups/nuc5_ubuntu/USERDATA
vol1/remote_backups/nuc5_ubuntu/USERDATA/root_kopi51 1.17M 8.57T 1.17M /mnt/vol1/remote_backups/nuc5_ubuntu/USERDATA/root_kopi51
vol1/remote_backups/nuc5_ubuntu/USERDATA/yottabit_kopi51 7.81G 8.57T 7.81G /mnt/vol1/remote_backups/nuc5_ubuntu/USERDATA/yottabit_kopi51
I don't know why this didn't interfere with my other zpool mounted on /mnt/vol1, and I don't know why this behavior is different on Linux/SCALE vs. BSD/CORE.
Now that I have a workaround, I'm unsure how I should proceed with backing up my ZFS datasets from the workstation without clobbering the TrueNAS SCALE filesystem mounts.
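One hedged way to keep that from happening again is a quick property audit after each receive (a sketch, run on the NAS):

$ zfs get -r -o name,property,value canmount,mountpoint vol1/remote_backups/nuc5_ubuntu

Anything under the backup tree with canmount other than noauto (or off), or with a mountpoint like /mnt or /var, would be a red flag before the next zfs mount -a or reboot.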
Ryan Moeller October 6, 2021 at 3:32 PM
It is a known issue that lsof doesn't work entirely correctly on ZFS on Linux. From the looks of things, the main issue here seems to be the replication behavior changing somehow from CORE. There is no replication information in the debug unfortunately. Are you doing the send/recv manually?
Hello, attached please find debug info for the problem where 2 of 3 datasets are both mounted and unmounted at the same time.
Please refer to the forum post for all of the details: https://www.truenas.com/community/threads/schr%C3%B6dingers-datasets-mounted-not-at-the-same-time.95257/
I'm happy to provide system access for debugging and/or a video call to present the problem (e.g., via Google Meet).
Thank you!