rsync in TrueNAS 12.0 Beta appears to be throttled.
Description
Problem/Justification
Impact
Activity
Rereading this again, I don't see a single NIC model mentioned. I bring it up because we recently found performance issues affecting Intel, VMware, and some other NICs, which do not show up in iperf. So I am proposing to update to the upcoming TrueNAS 12.0-U2, which fixes that, and to retest. Closing the ticket for now.
Source is a homebrew abomination... Basically four RAID5 arrays appended, across 16 drives...
Destination is 16 drives, split into two Z2 vdevs, in one pool...
(I think I've got the terminology correct...)
File sizes range anywhere from a few kilobytes to multiple gigabytes; the aforementioned rsync that was still running was working on files of at least 100MB each.
What about the disks? If you are copying many small files that are not in cache, speed can be limited by HDD head seeks. The 160MB/s you mentioned from a single disk is reachable only with sequential I/O, which is often not the case. And if you have many disks in your pool, multiple rsyncs may indeed scale.
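A quick way to check that on the TrueNAS side would be something like the following (a sketch; gstat ships with FreeBSD-based TrueNAS, and the -p flag limits output to physical disks):

# Watch per-disk activity while the copy runs; seek-bound disks show high %busy
# with low kBps and elevated ms/r / ms/w despite the small overall throughput.
gstat -p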
Reading it again, I wasn't quite as clear as I could've been in the original post.
When I said I had multiple SSH Windows, I was running them in parallel. They were all pulling from the same Server/RAID Array, and writing to the same TrueNAS Pool.
The first instance of rsync capped itself at around 18MB/s, which seemed odd, so I opened up new windows and started additional instances of rsync to see what would happen.
They too were capping out around 18MB/s... At the peak, I had five instances running simultaneously between the same systems, all running at 18MB/s.
While those were running, I SSH'd into an Ubuntu VM on a third server and did a test rsync from the source server, and that one almost immediately saturated the VM's 1GbE link with a 100MB/s transfer.
My desktop, the old server, the TrueNAS server, and the ESXi machine are all directly connected at 10GbE to a 10GbE switch (UBNT XG-16). The TrueNAS server is actually on a 40Gb LAGG.
There's an rsync still running that is pulling from a one-off drive in the source server. Right now it is reporting that the transfer is around 17.54MB/s.
I just copied 6GB of data from that drive (enough to ensure RAM/SLOG doesn't absorb it) using SMB via my desktop into the TrueNAS pool, and the transfer maintained around 160MB/s. That's about what I'd expect from a single drive of spinning rust. Of note: while that was going on, the rsync dropped to about 2MB/s, and once it finished, it bounced back up to ~18MB/s...
I agree that it makes zero sense that it would be throttled, but slicing the problem various ways, it is almost as if rsync was compiled with a hardcoded bwlimit.
Checking the suggested top, ssh/rsync were around 10-16% WCPU each. That appears to be a per-core percentage, and the TrueNAS box reports 32 cores (dual E5-2660s). Watching top for a bit, it "feels" like they spend a lot of time in the "select" state and change CPUs every few seconds... My cursory searches for the actual meaning of that state did not bear fruit.
Lastly, here are the results of iperf between TrueNAS and the old server, in both directions.
[root@nas /var/log]# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 64.0 KByte (default)
------------------------------------------------------------
[ 4] local 10.2.2.203 port 5001 connected with 10.2.2.5 port 39664
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 10.8 GBytes 9.26 Gbits/sec
[root@nas /var/log]# iperf -c 10.2.2.5
------------------------------------------------------------
Client connecting to 10.2.2.5, TCP port 5001
TCP window size: 1.24 MByte (default)
------------------------------------------------------------
[ 3] local 10.2.2.203 port 21682 connected with 10.2.2.5 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.0 GBytes 9.40 Gbits/sec
[root@nas /var/log]#
Ryan, there can be many reasons why rsync is slow. You'd have to check all of them: make sure your network really works fine (run iperf between the hosts in both directions), check for a CPU bottleneck (no thread gets close to 100% in `top -SHIz` on either side), check for disk bottlenecks (no disks on either side show high busy time or latency; note that if you have many disks in the pool, low-queue-depth I/O to random disks may not look like high load), etc. I doubt that what you see is intentional throttling.
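For reference, those checks boil down to something like this (a sketch; the address is the source server from this ticket, and top/gstat are run on the FreeBSD-based TrueNAS side):

# Network: run iperf in both directions (server on one host, client on the other, then swap roles)
iperf -s
iperf -c 10.2.2.5

# CPU: per-thread view; look for any single thread pinned near 100% on either side
top -SHIz

# Disks: per-disk busy time and latency while the transfer runs
gstat -p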
In attempting a one-off migration of data from my existing homebrew NAS to TrueNAS, I popped open an SSH console and started a pull. The command was:
rsync -arvh --info=progress2 -e ssh --no-perms user@10.2.2.5:/public/. /mnt/tank/Storage/
After it ran for a few minutes, I noticed that the speed was sitting right around 18MB/s despite the 10GbE connection.
On a whim, I opened up multiple SSH windows and set rsync running in multiple subdirectories. They each topped out around 18MB/s.
I tried it with --bwlimit=500000, and it changed nothing.
Originally it was using compression as well, but since the spinning rust is the bottleneck, I took that off; no change.
I tried from a VM on an ESXi box, which has only a 1GbE connection to the source server, and it immediately started pulling around 100MB/s, while the TrueNAS rsyncs were each still pulling 18MB/s in the other windows.
TrueNAS and the VM are both running rsync 3.1.3; the source server is running 3.1.2.
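A quick way to separate rsync itself from the ssh transport would be something like the following (a sketch; it assumes the source server's dd accepts the 1M block-size suffix, and that streaming zeroes is representative enough):

# Pull ~4GB of zeroes over bare ssh and discard it locally; the remote dd reports
# throughput when it finishes. If this also caps near 18MB/s, the limit is in
# ssh/networking rather than in rsync.
ssh user@10.2.2.5 'dd if=/dev/zero bs=1M count=4096' > /dev/null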