zfs send errors out if sending > around 1000 snaps?
Description
Activity
Stilez September 4, 2020 at 3:00 PM(edited)
It looks like the fix made it into BETA 2.1, and something's changed. This is what I have rerunning the dtrace from above and zfs send -R (EVERYTHING IN POOL) > /dev/null:
Relevant pool context for this:
tl;dr -
In beta2, >1000 or so snaps failed, and the nvlist size requested for the failed send was 182 K.
In beta2.1 the same command on the full pool and latest snap, with 11.7k snaps in the request and an nvlist request size of 1.04 M, has no visible issue.
I think it might be fixed from here, let's hope so! Thank you!
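As a sanity check on the tl;dr figures (treating the 1.04 M request as 1,040,000 bytes, which is an assumption about the rounding), the per-snapshot nvlist cost comes out similar in both runs:

```shell
# Per-snapshot nvlist cost in each run, using the figures quoted above.
awk 'BEGIN {
  printf "beta2:   %.1f bytes/snap\n", 182084 / 1860    # failing run
  printf "beta2.1: %.1f bytes/snap\n", 1040000 / 11700  # succeeding run
}'
```

Both land near the ~100 bytes/snap ballpark discussed further down the thread, which suggests the request size scaled as expected and only the limit changed.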
Stilez August 14, 2020 at 12:11 AM(edited)
- I love what iX is doing with FreeBSD/ZFS, and of course I'd help. I'd like to help more, but the knowledge needed is considerable. Running debug stuff is absolutely fine. Thank you for taking the time to produce the dtrace one-liners that made it possible!
(I thought the "max size" was a number of entries, not a number of bytes, so I was concerned: the 1M limit was described as very large, yet I was easily hitting tens of millions. That makes more sense now! Thanks very much for explaining so I can learn more and maybe help better over time.)
Ryan Moeller August 13, 2020 at 11:18 PM
The size of an nvlist is mostly the length of the strings themselves (one byte per character in the dataset+snapshot name, including whatever extra characters are added internally to create a unique temporary hold name), plus some padding to round out the pointer offsets, plus pointers to each string and linked-list pointers for each entry, so all in all I think ~100 bytes per item is a reasonable ballpark.
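That accounting can be sketched roughly as follows. The flat ~64 bytes of per-entry overhead is a made-up round number for illustration, not the real internal nvpair layout, and the sample names are placeholders:

```shell
# Estimate nvlist bytes for a list of snapshot names: name length plus an
# assumed ~64 bytes of per-entry pointer/padding overhead.
printf '%s\n' \
  'MY_POOL/CCC/DDD/E1@auto-2020-08-13_00-15' \
  'MY_POOL/CCC/DDD/E2@auto-2020-08-13_00-15' |
awk '{ bytes += length($0) + 64 } END { printf "estimated nvlist bytes: %d\n", bytes }'
```

In real use one would feed `zfs list -H -o name -t snapshot -r POOL` into the awk instead of the hard-coded names.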
~25MiB or even ~40MiB is still considerably below the ~600MiB limit estimated for a system with 8GiB installed memory, so I don't think anyone will ever have to worry about the limit again and I may even decide to lower it considerably more. 128MiB would still leave plenty of headroom. Keep in mind this is a limit per individual ioctl, so whatever operations are going on concurrently would be limited by available memory in total, not this software limit.
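In later OpenZFS releases this limit is exposed as the `zfs_max_nvlist_src_size` module parameter; whether and under what name it surfaces in this BETA is an assumption, but on FreeBSD a persistent override would look something like:

```
# /etc/sysctl.conf -- assumed knob name; verify with: sysctl -a | grep nvlist
vfs.zfs.max_nvlist_src_size=134217728  # 128 MiB, matching the headroom discussed above
```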
Thank you for reporting this issue and for doing the inspection with dtrace to see what kind of sizes you are dealing with. Sizes on the order of tens of mebibytes confirm that you are exceeding the existing limit, and will be accommodated by the new limits once they are in place.
Stilez August 13, 2020 at 10:17 PM(edited)
I ran the same command as caused the issue above, namely "zfs send -R... POOL@NOWSNAP > /dev/null".
"zfs list -t snap -r POOL | wc -l" reports ~ 18k snaps in the pool.
dtrace reports it was trying for a max nvlist size of 1.8M.
My old dataset had around 120k snaps when I cleaned it out (15-minute snapshots that never seemed to get pruned!). So if it scales, and the system is working on 2 pools and a few other ops simultaneously, the max size could realistically top say 15~20x that figure overall on a busy maintenance week, i.e. a max nvlist size I'd need to expect of ~40M. I hope that's useful. It's not an especially large or busy pool.
But at least if a task fails due to this issue, I have a way to assess what size might help it not fail. Thank you!
REALISM TEST
As a realism test I worked on my temp pool, which has just 1.8k snaps across 28 datasets: I took a -r snap, counted the nvlist size when replication failed, took another -r snap, and counted the new nvlist size.
Snaps = 1860, nvlist max size = 182084 (~97.90 nvlist bytes per snap)
Snaps = 1888, nvlist max size = 184548 (~97.75 nvlist bytes per snap)
+28 snaps = +2464 nvlist size (~88.00 nvlist bytes per snap)
So it seems fair to say that replicating both pools after 15-minute snaps have built up, involving say 150k~250k snaps per pool, could require about 15M~25M of nvlist size.
Add a safety margin for other operations, and the 40M max "as a precaution" from before starts to sound huge but plausible, rather than huge and crazy??
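The projection above is straight arithmetic, with the measured ~100 bytes/snap rate as the only assumption:

```shell
# Project nvlist size at larger snapshot counts from the measured ~100 bytes/snap.
for snaps in 150000 250000; do
  echo "$snaps snaps -> ~$((snaps * 100)) bytes of nvlist"
done
```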
Perhaps there's a valid question as to why it adds ~100 bytes of nvlist per snap? Is that an unexpected figure?
TEST DATA:
12-BETA1 newly upgraded to 12-BETA2
First run:
Second run:
Ryan Moeller August 12, 2020 at 10:22 PM
And to see roughly what you can expect the new limit to be set at:
I don't know how best to report this. It's a pure ZFS send issue that wasn't the case on 11.3, so the logs show nothing, and it doesn't cause a crash or exception either, so there are no system dumps.
I have a pool with this exact structure:
MY_POOL
MY_POOL/AAA
MY_POOL/BBB
MY_POOL/CCC
MY_POOL/CCC/DDD
MY_POOL/CCC/DDD/E1
MY_POOL/CCC/DDD/E2
MY_POOL/CCC/DDD/E3
MY_POOL/CCC/DDD/E4
MY_POOL/CCC/DDD/E5
MY_POOL/CCC/DDD/E6
MY_POOL/CCC/DDD/E7
MY_POOL/CCC/DDD/E8
MY_POOL/CCC/DDD/E9
MY_POOL/CCC/DDD/E10
MY_POOL/CCC/DDD/E11
MY_POOL/CCC/E12
MY_POOL/CCC/E13
MY_POOL/CCC/E14
(~ 16000 snaps total in pool)
I've replicated this pool many times to my backup NAS, at times when the snapshot count was around 80000. But it won't zfs send on 12-BETA, and the issue looks like some extremely low limit (~1000?) on the number of snapshots it'll send.
If I try to zfs send anything more than about 1000 snaps, I get an error at the "send" side.
total estimated size is nnnG
cannot hold: operation not supported by zfs kernel module
cannot send 'Main_pool/User_files': operation not supported by zfs kernel module
The issue isn't about the size of the send: it will send -R the first 1000 snaps from 2017 to July 2018, which total 35 TB. But if I try to send the rest of the pool after that, using send -R -I from that snap to the current snap, or to the 2018-12 or even the 2018-10 dated snaps, I get the above error. If I reduce the number of snaps in the stream to around 1000, by only sending -I from the end of the 35TB stream to 2018-09-01 (around 1000 snaps), it sends that happily, and lets me send a follow-up stream, as long as that doesn't have more than about 1000 snaps in it.
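For reference, the chunking workaround described above looks roughly like this dry-run sketch. The snapshot names are placeholders (the real ones were elided), and the commands are echoed rather than executed:

```shell
#!/bin/sh
# Split one large incremental replication into smaller -R -I chunks, each
# covering under ~1000 snapshots. Drop the leading 'echo' to actually send.
POOL=MY_POOL
FROM=snap-2018-07          # end of the 35TB full stream
for TO in snap-2018-09 snap-2018-10 snap-2018-12 snap-now; do
  echo zfs send -R -I "$POOL@$FROM" "$POOL@$TO" '> /dev/null'
  FROM=$TO
done
```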
As the error doesn't result in a log entry, I'm not sure what other info would help. dtrace maybe, if I was told what to run?