replication does not work after upgrade to 12.0-U2.1
Description
Activity
George Kyriazis July 10, 2021 at 2:31 AM
Still exists in 12.0-U4.
Any updates on when it will be fixed?
Thanks!
George Kyriazis March 9, 2021 at 2:38 PM
Changing to /usr/local/bin/ssh does not change the behavior. ssh still works with no complaints about host key.
David Pesticcio March 9, 2021 at 1:56 PM (edited)
After reading the original conversation of this ticket more closely, I'd like to clarify a few things. (The UI was so slow, you've partially implemented one of my suggestions in the meantime!)
SSH from "Replication Task" will fail when:
The "remote host key" in the "SSH Connections" entry for <hostname> is not the key the task requires
SSH from the command line works when:
Putting the private key from "SSH Keypairs" into ~/.ssh/test_private.key
ssh -i ~/.ssh/test_private.key <hostname>
Accept the host key
~/.ssh/known_hosts file gets updated with the correct key
The ssh connection works just fine
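To see which key type the command-line ssh actually accepted in the steps above, one option is to list the key types recorded for the host in ~/.ssh/known_hosts. The following is only a minimal sketch: the function name key_types_for_host and the sample data are hypothetical, and it assumes unhashed known_hosts entries (hashed |1|... entries and lines with @ markers are skipped).

```python
def key_types_for_host(known_hosts_text, hostname):
    """Return the key types recorded for hostname, in file order.

    Assumes unhashed known_hosts entries of the form:
        host1,host2 key-type base64-key [comment]
    """
    types = []
    for line in known_hosts_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Lines with markers (@cert-authority, @revoked) are ignored in this sketch.
        if line.startswith("@"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        hosts, key_type = fields[0], fields[1]
        # The host field is a comma-separated list of names and addresses.
        if hostname in hosts.split(","):
            types.append(key_type)
    return types


# Example data only; these are not real keys.
sample = """\
# example known_hosts content
backup.example.com,192.0.2.10 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAexample
other.example.com ecdsa-sha2-nistp256 AAAAE2VjZHNhexample
"""

print(key_types_for_host(sample, "backup.example.com"))
```

If the type listed here differs from the type named in the replication error, that would be consistent with the mismatch described in this comment.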
The problem appears to be:
"SSH Connections" does not "discover" the appropriate host key that the "Replication Task" requires.
Hitting discover hundreds of times does NOT yield the correct key in a dependable/reliable/useful way.
Host keys are discovered by "SSH Connections" in one way and used in another throughout TrueNAS
Possible Solution for "SSH Connections":
Discover and display all keys, and let the user choose from the list
Have TrueNAS choose a key based upon a pre-defined ordered list
Fix the TrueNAS usage inconsistencies
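The second suggestion above (choosing a key from a pre-defined ordered list) could look roughly like the sketch below. This is not how TrueNAS does it; the preference order, the function name choose_host_key, and the sample lines are assumptions for illustration. Input is assumed to be ssh-keyscan style lines of the form "host key-type base64-key".

```python
# Hypothetical preference order; the right order for TrueNAS is an open question.
PREFERRED_ORDER = ["ssh-ed25519", "ecdsa-sha2-nistp256", "ssh-rsa"]


def choose_host_key(scan_lines, preferred=PREFERRED_ORDER):
    """Pick one host key from ssh-keyscan style lines by key-type preference.

    Returns "key-type base64-key" for the first preferred type found,
    or None if no discovered key has a preferred type.
    """
    by_type = {}
    for line in scan_lines:
        line = line.strip()
        # Skip blank lines and "# host:22 SSH-2.0-..." banner comments.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        _host, key_type, blob = fields[:3]
        # Keep the first key seen for each type.
        by_type.setdefault(key_type, f"{key_type} {blob}")
    for key_type in preferred:
        if key_type in by_type:
            return by_type[key_type]
    return None


# Example data only; these are not real keys.
scan = [
    "# backup.example.com:22 SSH-2.0-OpenSSH_8.4",
    "backup.example.com ssh-rsa AAAAB3example",
    "backup.example.com ecdsa-sha2-nistp256 AAAAE2example",
]

print(choose_host_key(scan))
```

A fixed order makes the result deterministic, so pressing discover repeatedly would always yield the same key for the same host.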
BTW: An added bonus bug - even though you can add a comment to the "remote host key" text box and save it, you will then be unable to view the logs or edit the Source/Destination in the UI for the corresponding "Replication Task" that uses that SSH key. (I'll check if that's already been reported, and create a ticket if not.)
https://jira.ixsystems.com/browse/NAS-109730
The error:
Upgraded from 11.2-U8 to 12.0-U2.1 (with a stopover at 11.3-U5). Manual upgrade, since there was no option to upgrade to 12.0 directly from 11.2, or 11.3 for that matter.
Replication to a 12.0-U2.1 system worked fine before the upgrade. After the upgrade, and after fixing the SSH connections, replication still does not work.
After turning on DEBUG logging in the Replication Task, I got some weird output in debug.log:
[2021/03/02 19:55:05] INFO [replication_task__task_3] [zettarepl.replication.run] For replication task 'task_3': doing push from 'vol' to 'vol' of snapshot='auto-20210302.1800-2d' incremental_base='auto-20210302.1040-2d' receive_resume_token=None encryption=False
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.transport.base_ssh.root@vis-backup.shell.95.async_exec.5028] Running ['zfs', 'umount', 'vol']
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.paramiko.replication_task__task_3] [chan 60] Max packet in: 32768 bytes
[2021/03/02 19:55:05] DEBUG [Thread-194] [zettarepl.paramiko.replication_task__task_3] [chan 60] Max packet out: 32768 bytes
[2021/03/02 19:55:05] DEBUG [Thread-194] [zettarepl.paramiko.replication_task__task_3] Secsh channel 60 opened.
[2021/03/02 19:55:05] DEBUG [Thread-194] [zettarepl.paramiko.replication_task__task_3] [chan 60] Sesch channel 60 request ok
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.transport.base_ssh.root@vis-backup.shell.95.async_exec.5028] Reading stdout
[2021/03/02 19:55:05] DEBUG [Thread-194] [zettarepl.paramiko.replication_task__task_3] [chan 60] EOF received (60)
[2021/03/02 19:55:05] DEBUG [Thread-194] [zettarepl.paramiko.replication_task__task_3] [chan 60] EOF sent (60)
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.transport.base_ssh.root@vis-backup.shell.95.async_exec.5028] Waiting for exit status
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.transport.base_ssh.root@vis-backup.shell.95.async_exec.5028] Error 1: "cannot unmount 'vol': not currently mounted\n"
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.transport.local.shell.1.async_exec.5029] Running ['zfs', 'send', '-V']
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.transport.local.shell.1.async_exec.5029] Error 2: 'missing snapshot argument\nusag.... list, run: zfs allow|unallow\n'
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.transport.local.shell.1.async_exec.5031] Running ['sh', '-c', 'exec 3>&1; eval $(exec 4>&1 >&....] && exit $pipestatus1; exit 0']
[2021/03/02 19:55:06] DEBUG [replication_task__task_3.async_exec_tee.wait] [zettarepl.transport.local.shell.1.async_exec.5031] Error 141: None
[2021/03/02 19:55:06] DEBUG [replication_task__task_3.process] [zettarepl.transport.local.shell.1.async_exec.5030] Error 141: 'No ECDSA host key is known for....Host key verification failed.\n'
[2021/03/02 19:55:06] DEBUG [replication_task__task_3.monitor] [zettarepl.transport.local.shell.1.async_exec.5031] Stopping
[2021/03/02 19:55:06] WARNING [replication_task__task_3] [zettarepl.replication.run] For task 'task_3' at attempt 1 recoverable replication error RecoverableReplicationError('Broken pipe (No ECDSA host key is known for vis-backup.an.intel.com and you have requested strict checking.\nHost key verification failed.\n)')
[2021/03/02 19:55:06] ERROR [replication_task__task_3] [zettarepl.replication.run] Failed replication task 'task_3' after 1 retries
[2021/03/02 19:55:06] DEBUG [Thread-194] [zettarepl.paramiko.replication_task__task_3] EOF in transport thread
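For anyone triaging a log like the one above, a small script can pull out which host key type ssh reported as missing. This is a rough sketch, not part of zettarepl: the regular expressions are inferred from the line layout shown in this ticket ([timestamp] LEVEL [thread] [logger] message), and the function name missing_host_key_types is hypothetical.

```python
import re

# Line layout inferred from the log in this ticket:
# [timestamp] LEVEL [thread] [logger] message
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] (?P<level>[A-Z]+) "
    r"\[(?P<thread>[^\]]+)\] \[(?P<logger>[^\]]+)\] (?P<message>.*)$"
)
# ssh message fragment seen in the log, e.g. "No ECDSA host key is known for ..."
MISSING_KEY = re.compile(r"No (?P<key_type>\w+) host key is known for")


def missing_host_key_types(log_text):
    """Return (level, key_type) for each log line reporting a missing host key."""
    found = []
    for line in log_text.splitlines():
        parsed = LOG_LINE.match(line)
        if not parsed:
            continue
        hit = MISSING_KEY.search(parsed.group("message"))
        if hit:
            found.append((parsed.group("level"), hit.group("key_type")))
    return found


# Two lines copied from the log above (raw string keeps the literal \n).
sample = r"""
[2021/03/02 19:55:05] DEBUG [replication_task__task_3] [zettarepl.transport.local.shell.1.async_exec.5029] Running ['zfs', 'send', '-V']
[2021/03/02 19:55:06] DEBUG [replication_task__task_3.process] [zettarepl.transport.local.shell.1.async_exec.5030] Error 141: 'No ECDSA host key is known for....Host key verification failed.\n'
"""

print(missing_host_key_types(sample))
```

In this log the reported type is ECDSA, which can then be compared against the key type stored in the "SSH Connections" entry.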