There are 6 replication tasks that run every hour, each offset by 10 minutes from the others. The offset works around a separate issue: running more than 2 replication jobs at once causes the system to panic.
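For illustration only, the staggered schedule above can be sketched as follows. This is not zettarepl or TrueNAS code: the function names and the job runtimes used here are hypothetical assumptions. The sketch shows how many jobs would overlap for a given runtime.

```python
# Hypothetical sketch of the staggered hourly schedule described above.
# Nothing here is part of zettarepl; names and runtimes are assumptions.

def staggered_minutes(task_count, period_minutes=60):
    """Spread task_count start times evenly across one period."""
    step = period_minutes // task_count
    return [i * step for i in range(task_count)]


def max_overlap(start_minutes, runtime_minutes, period_minutes=60):
    """Largest number of jobs running in the same minute.

    Assumes every job takes runtime_minutes (no longer than the period)
    and that the schedule repeats every period_minutes.
    """
    busy = [0] * period_minutes
    for start in start_minutes:
        for m in range(runtime_minutes):
            busy[(start + m) % period_minutes] += 1
    return max(busy)


offsets = staggered_minutes(6)
print(offsets)                   # [0, 10, 20, 30, 40, 50]
print(max_overlap(offsets, 20))  # 2
print(max_overlap(offsets, 25))  # 3
```

Under these assumptions, the 10-minute offsets keep concurrency at 2 or fewer only while each job finishes within 20 minutes. A job that hangs would break that.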
At some point a job will encounter the following exception:
[2021/03/05 21:20:16] WARNING [retention] [zettarepl.zettarepl] Remote retention failed on <SSH Transport(email@example.com)>: error listing snapshots: SSHException('Timeout opening channel.')
After that point, ALL replication jobs get stuck in WAITING status, each claiming the job is already running.
The only way to clear this state is to reboot the system.
Again, I've checked the box to attach a debug, but I suspect the bug with running the debug still exists, so it probably won't be attached automatically.