Here are the findings:
In the upgrade logs, I found the following:
+ runuser -G vsplog vspvm -- ./.backup -d /tmp/tmp.TVChbD9920/upgrade-backup/vm-data/ServicesVM/services_vm
+ /opt/avaya/vsp/util/logger.py -t -f /opt/avaya/vsp/backup/scripts/log-vm.conf
ERROR: Failed to SSH to Services-VM at 10.82.8.19
Cdom could not set up an SSH connection to the SVM using public key authentication (passwordless login). The security logs on the SVM did not show any successful login with the vspadmin account.
I then enabled SSH debug output and tested from cdom to the SVM with su - vspvm ; ssh vspadmin@SVM_IP. I could see that cdom sent the DSA authentication data to the SVM, but the SVM did not respond with the expected answer, so cdom fell back to password authentication as the secondary method. During the backup, the session therefore hangs waiting for password input, and once it times out the backup fails.
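The test above can be made non-interactive so that it fails right away instead of hanging at a password prompt. This is a minimal sketch, not the command I actually ran: the function name check_pubkey_login is hypothetical, and the SVM address is passed in as an argument. BatchMode and PreferredAuthentications are standard OpenSSH client options.

```shell
#!/bin/sh
# Sketch only: run on cdom as vspvm (su - vspvm).

# check_pubkey_login HOST: try a key-only login as vspadmin.
# BatchMode=yes stops ssh from asking for a password, and
# PreferredAuthentications=publickey prevents the fallback to password auth.
# The -v debug output goes to stderr; the result line goes to stdout.
check_pubkey_login() {
    host=$1
    if ssh -v -o BatchMode=yes -o PreferredAuthentications=publickey \
           -o ConnectTimeout=10 "vspadmin@$host" true; then
        echo "publickey login OK"
    else
        echo "publickey login FAILED"
        return 1
    fi
}
```

Usage would be something like check_pubkey_login SVM_IP; a FAILED result matches the situation in which the backup hung.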
The DSA keys looked fine on both cdom and the SVM when I checked, so I am not sure why the private key did not work at that time.
As a workaround, I generated a new 1024-bit RSA key pair for vspvm (the cdom account) and appended id_rsa.pub to authorized_keys on the SVM (vspadmin account). After that, the backup completed successfully. The steps:
Log in as vspvm: su - vspvm
Generate the RSA keys with ssh-keygen, then copy the content of ~/.ssh/id_rsa.pub into ~/.ssh/authorized_keys of the vspadmin account on the ServicesVM.
After adding the key, test SSH from cdom to the SVM: su - vspvm; ssh vspadmin@[SVMIP]
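The key steps above could be scripted roughly as below. This is only a sketch under assumptions: the function names generate_key and append_key are hypothetical, the default paths are the usual OpenSSH ones, and how id_rsa.pub gets copied to the SVM (scp, copy and paste) is left open. append_key also guards against two things I did not hit in this case but that are easy to get wrong: adding the same key twice, and an authorized_keys file that does not end with a newline.

```shell
#!/bin/sh
# Sketch only. Function names and argument layout are illustrative.

# generate_key [KEYFILE]: run on cdom as vspvm. Creates a 1024-bit RSA key
# pair without a passphrase so the backup can log in unattended.
# Note: ssh-keygen asks before overwriting an existing key file.
generate_key() {
    keyfile=${1:-$HOME/.ssh/id_rsa}
    mkdir -p "$(dirname "$keyfile")"
    ssh-keygen -q -t rsa -b 1024 -N "" -f "$keyfile"
}

# append_key PUBKEY [AUTHFILE]: run on the SVM as vspadmin after copying
# id_rsa.pub over. Appends the key unless that exact line is already present.
append_key() {
    pubkey=$1
    authfile=${2:-$HOME/.ssh/authorized_keys}
    mkdir -p "$(dirname "$authfile")"
    touch "$authfile"
    chmod 600 "$authfile"
    # If the last line has no trailing newline, add one so the new key
    # does not get glued onto the previous entry.
    if [ -n "$(tail -c 1 "$authfile")" ]; then
        echo >> "$authfile"
    fi
    grep -qxF -- "$(cat "$pubkey")" "$authfile" || cat "$pubkey" >> "$authfile"
}
```

On cdom this would be generate_key, and on the SVM append_key /path/to/id_rsa.pub, followed by the ssh test from cdom.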
This is only a workaround.
Next time, before you upgrade VSP to 6.3.8, please follow the official PSN: upgrade the ServicesVM first, then upgrade VSP to 6.3.8 and apply the sanity patch.
BP also opened another ticket for a failed upgrade on the other server. There I found that, after patching, I could not access the SVM over SSH with any account. After the failover, I checked the logs and found the error below:
pam_open_session() permission denied
That error can have several causes. After checking the system, I found that the /var/log/btmp file was corrupted. I moved the old file to btmp.bak and created a new one with permission 600. After that, the restore procedure could proceed successfully.
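For reference, the btmp fix can be sketched as the small function below. The function name reset_btmp and the optional directory argument are hypothetical (the argument is only there so the function can be tried on a scratch directory first). It needs to run as root on the SVM, and it does not touch file ownership, which I did not change in this case.

```shell
#!/bin/sh
# Sketch only: run as root on the SVM.

# reset_btmp [LOGDIR]: move a corrupted btmp aside as btmp.bak and
# create a new empty btmp with permission 600.
reset_btmp() {
    logdir=${1:-/var/log}
    if [ -e "$logdir/btmp" ]; then
        mv "$logdir/btmp" "$logdir/btmp.bak"
    fi
    : > "$logdir/btmp"
    chmod 600 "$logdir/btmp"
}
```

Calling reset_btmp with no argument works on /var/log/btmp; the old file stays available as btmp.bak in case it is needed for analysis.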