File Copies Hang
Activity
Alexander Motin June 28, 2021 at 5:19 PM
The fact that disabling sync helps is important here. NFS, unlike other protocols, provides clients a way to control write caching to prevent data loss in case of a NAS crash. For that, the client can send cache flush requests either with every request or from time to time. Both TrueNAS and ZFS take those flush requests very seriously, pushing all unwritten data into the intent log and flushing the write cache. A RAIDZ2 of spinning disks is not a good configuration for that kind of workload. You'd need to either disable sync or add a SLOG device, depending on your requirements. Reconfiguring your pool from one 4-wide RAIDZ2 to two 2-wide mirrors may also improve performance while keeping comparable reliability, but whether that would be enough for sync to be acceptable depends on the specific request patterns sent by your NFS client.
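For reference, a minimal sketch of those two options at the shell, assuming a hypothetical pool `tank`, dataset `tank/media`, and spare fast device `nvd0` (substitute your own names):

```
# Option 1: add a dedicated SLOG device so synchronous (NFS COMMIT) writes
# land on fast storage instead of the RAIDZ2 HDDs
zpool add tank log nvd0

# Option 2: disable sync on the affected dataset; risks losing the last few
# seconds of writes on a power loss or NAS crash
zfs set sync=disabled tank/media

# Revert to the default behaviour later if desired
zfs set sync=standard tank/media
```

The equivalent settings are also exposed in the TrueNAS web UI, so the command line is not strictly required.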
Steven Wormuth June 26, 2021 at 8:33 PM (edited)
In the image above, you can see the disk activity with SYNC ON as the ~75% disk usage. This was from copying a 3.8 GB file, which stalls and fails after a great deal of time. In the space that follows, I copied two files of 3.5 and 4.5 GB with SYNC OFF; the disk usage doesn't peak above 20%, and the copies took a few minutes.
Something in the way SYNC is telling NFS to wait is stalling my network and causing endless loops of retransmitted packets. I stand by my assertion that this is a bug in either TrueNAS or ZFS itself. I'm not saying that it couldn't be specific to my configuration, but I'm not doing anything out of the ordinary here. I've got a mapped share on Linux Mint. I should be able to copy a file to a NAS with the standard settings.
Steven Wormuth June 26, 2021 at 7:42 PM (edited)
So no other ideas here? I should just reinstall everything and try a different ZPOOL setup? That's where we are?
FYI, if I turn off SYNC on the dataset the problem stops.
Steven Wormuth May 23, 2021 at 10:27 PM
I have no spare disk at the moment to add as a SLOG device. Maybe down the road.
The interesting thing here is that if I copy large amounts of data in smaller increments, I have no issue at all.
That hump in disk activity represents me copying over about 15 GB worth of data. I threw three separate folders of 4-5 GB each over, one by one. The files in those folders were 150-400 MB in size. Writing all of that to this pool took 20 minutes in total, and the disks were about 15% busy.
Now I'm copying one 5 GB file over and we're right back to an hour and 80% disk activity.
The NAS has no pool issues. With a large file, it keeps throwing TCP_ZERO_WINDOW errors as though it's super busy, when it should have no issues writing the data, as seen above.
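A quick way to quantify those stalls is to filter the capture for zero-window and retransmission events, for example with tshark (the capture file name `copy.pcap` below is a placeholder):

```
# Count zero-window announcements and retransmissions in the capture
tshark -r copy.pcap -Y "tcp.analysis.zero_window || tcp.analysis.retransmission" | wc -l

# List the zero-window events with timestamps to see whether they
# line up with the moments the copy appears to hang
tshark -r copy.pcap -Y "tcp.analysis.zero_window" -T fields -e frame.time_relative -e tcp.window_size
```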
Alexander Motin May 20, 2021 at 6:01 PM
On one hand, seeing Realtek NICs traditionally makes me shiver. But considering the amount of disk activity I see, I am not sure it is a networking problem. Looking at the provided pcaps, I see plenty of small random writes, interleaved with periodic COMMIT calls. Whatever you say this workload is, it does not look like a simple sequential large-file write to me. I'd propose you look closer at it, maybe including your NFS client settings.
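On the Linux Mint client, the effective NFS mount options (protocol version, rsize/wsize, sync vs. async, and so on) can be inspected with either of the following, for example:

```
# Show active mount options for every NFS mount on the client
nfsstat -m

# Or the same information via findmnt
findmnt -t nfs,nfs4 -o TARGET,SOURCE,OPTIONS
```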
Your pool of a single RAIDZ2 vdev does not have much performance for this kind of high-IOPS workload. Also, the regular NFS COMMITs must be translated by ZFS into synchronous ZFS Intent Log writes, for which a single RAIDZ2 vdev of HDDs is the worst possible storage. So if you care about the data, you should add a fast SLOG device to your pool. Alternatively, you may set sync=disabled for specific datasets, but then in case of a power loss or NAS crash you may experience some data loss. At the least, you could experiment with this option live to see whether it helps with your problem. You can also monitor the output of `gstat -I 1s -po` to see whether there are reads/writes, how many and how big they are, and how many "other" operations there are, which are cache flushes. If you see a lot of cache flushes (more than a few per second, closer to hundreds), it means you may need a fast SLOG device for your workload, or you need to change the workload.
A RAIDZ2 of 4 disks does not make much sense to me. Out of the same 4 disks you could create two mirrors, providing the same (or even slightly better) capacity but dramatically higher performance. Yes, it may not survive any two disks failing at the same time, but where that level of reliability is required, it is usually better achieved with backups.
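For illustration only, a hypothetical layout of the same four disks as two mirrored vdevs (the disk names `da0`-`da3` are placeholders, and creating a new pool destroys the existing data, so this would mean rebuilding from backup):

```
# Two 2-way mirrors striped together: comparable usable capacity to a
# 4-wide RAIDZ2, but dramatically better IOPS for small sync writes
zpool create tank mirror da0 da1 mirror da2 da3
```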
As for the NAS continuing disk activity after the client reboots: ZFS can accumulate several gigabytes of dirty data in RAM to write out in the background. This is most obvious on systems with lots of RAM but very slow storage. Though several minutes is pretty long indeed.
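On TrueNAS CORE (FreeBSD-based), the dirty-data limits can be read via sysctl; the tunables below are the standard OpenZFS ones and are shown only to give an idea of how much write-back could still be queued after the client disconnects:

```
# Maximum dirty data ZFS will buffer in RAM before throttling writers (bytes)
sysctl vfs.zfs.dirty_data_max

# The same limit expressed as a percentage of RAM
sysctl vfs.zfs.dirty_data_max_percent
```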
Description
This problem occurs when I copy a file using NFS to a TrueNAS share. The client is Linux Mint, and I have mounted an NFS share on that machine. It generally works perfectly, but during large file transfers the copy hangs for extremely long periods, although activity doesn't stop.
For example, I recently tried to copy a 9 GB file to the share, and after two days, only 6 GB had completed. The transfer happens from the client in bursts, and it seems as though the client is waiting for the next signal from TrueNAS that it is ready for the next burst of data.
I have attached two screenshots. The first shows that disk activity is continuous even though the transfer seems to stall. If I force a reboot of the client, the disk writes continue for several minutes after the client reboots, and then finally subside, as shown in the second screenshot. So the disk writing at 60-80% continues for up to 10 minutes after the client is no longer attempting to write.
I get no errors, but what on earth could TrueNAS be writing that takes this long? After two full days of continuous disk activity, I only got two-thirds of a 9 GB file over? It seems the disk is just thrashing about in endless writes, but zpool status is healthy.
Where should I start looking for errors? This is just a home NAS with not much stress on the system.