Backup fails when the source Endpoint is set to vCenter
“The ESXi host performing the CBT export refused connection. The host is chosen automatically by vCenter, so please ensure that the Coriolis deployment can dial TCP/902 on all the ESXi hosts of a vSphere, and that DNS name resolution firewalls are setup to facilitate this. Alternatively, try connecting Coriolis directly to the specific ESXi host which is running the VM(s) to be migrated by creating a Coriolis endpoint using the DNS name/IP address of the host itself.”
If the above error message occurs in a setup with multiple ESXi hosts where vCenter is used in the Coriolis endpoint configuration, we recommend creating the Coriolis Endpoint with the IP or hostname of the ESXi host rather than that of the vCenter. This ensures that Coriolis sends its commands directly to the ESXi host.
The error message will point to port 443, which vCenter uses to communicate with the ESXi hosts, but Coriolis will not be able to confirm the connection.
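The reachability requirement from the error message can be checked from the Coriolis appliance with a short script. The following is a minimal sketch, not part of Coriolis: the function names are hypothetical, and it only tests whether a TCP connection can be opened (TCP/902 by default, as named in the quoted error message).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_hosts(hosts, ports=(902,), timeout=3.0):
    """Map each (host, port) pair to whether it accepted a TCP connection."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

For example, `check_hosts(["esxi-01.example.com", "esxi-02.example.com"], ports=(902, 443))` (placeholder host names) returns a dictionary whose `False` entries point at hosts or ports that the appliance cannot reach.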
If the ESXi host is connected to vCenter using its hostname and the Coriolis Endpoint is created using the vCenter details, VixDiskLib will try to resolve that hostname in order to reach the host.
If the above situation applies, either of the two options below will allow Coriolis to correctly communicate with the ESXi host:
- add mappings for the ESXi hosts to the /etc/hosts file on the Coriolis Appliance, then restart the systemd-hostnamed service (a system service that may be used to change the system’s hostname and related machine metadata from user programs).
- add a secondary DNS server that can resolve the hostnames of the ESXi servers to /etc/resolv.conf on the Coriolis Appliance. The Coriolis Appliance network must have access to the IP of the DNS server used.
This can be avoided if the Coriolis Endpoint is created using the IP of the ESXi host.
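To see which ESXi hostnames the appliance cannot resolve, and to prepare matching /etc/hosts entries, something like the sketch below may help. It is an illustration only: the helper names and the example addresses are hypothetical, and the resolver is passed in as a parameter so that it can be swapped out.

```python
import socket


def hosts_file_lines(mapping):
    """Render {hostname: ip} as /etc/hosts lines ("<ip><TAB><hostname>")."""
    return [f"{ip}\t{name}" for name, ip in sorted(mapping.items())]


def unresolvable(hostnames, resolver=socket.gethostbyname):
    """Return the hostnames that the given resolver cannot resolve."""
    failed = []
    for name in hostnames:
        try:
            resolver(name)
        except OSError:
            # socket.gaierror (resolution failure) is a subclass of OSError.
            failed.append(name)
    return failed
```

Calling `unresolvable(["esxi-01.example.com"])` on the appliance lists the names that still fail to resolve, and `hosts_file_lines({"esxi-01.example.com": "10.0.0.11"})` produces the lines to append to /etc/hosts for them.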
Replica/Migration fails due to corrupt CBT data
“Error caused by file /vmfs/volumes/xxxxxx/xxxxx.vmdk”.
If the above message occurs, an internal VMware error has corrupted the CBT data, making it unexportable. This is a VMware software issue, and Coriolis is merely presenting the error it receives from the hypervisor.
The following steps have been observed to solve this error. Further assistance can be found on the CBT Reset documentation page if the problem remains.
- disable and then re-enable CBT on the VM (while the VM is powered off).
- create a Coriolis endpoint using the IP/hostname of the ESXi host the VM is currently running on instead of the IP/hostname of the vCenter server. This will force Coriolis to export from that host instead of a random host within the cluster.
- move the VM to a different ESXi host using vMotion, then repeat the step above so that the Coriolis endpoint points to the new ESXi host.
“Error caused by file </path/to/vmdk/file>” Error
This error occurs when Coriolis calls the QueryChangedDiskAreas method of a VM while performing a CBT-based Migration/Replica.
The error message should include the exact path (on the VMware datastore) of the VMDK causing issues.
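When working from logs, the VMDK path can be pulled out of the message programmatically. The sketch below assumes the message has the shape quoted in this document ("Error caused by file" followed by an absolute path ending in .vmdk); the function name is hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed message shape, based on the error text quoted in this document.
_ERROR_RE = re.compile(r"Error caused by file (?P<path>/.+?\.vmdk)")


def vmdk_path_from_error(message):
    """Return the VMDK path named in the error message, or None."""
    match = _ERROR_RE.search(message)
    return PurePosixPath(match.group("path")) if match else None
```

The returned path's `parent` attribute gives the VM's directory on the datastore, which is where the related files can be inspected.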
Possible causes are speculated to be:
- a power failure or a hard shutdown of the VM
- a power failure during a Storage vMotion operation on the VM, in which case existing CBT data may be cleared completely
- VMs that have been upgraded from hardware version <= 5 may have CBT data reset, leading to the issue
- in vSphere < 5.0, reverting a VM to an earlier snapshot may cause the issue, in which case disabling and re-enabling CBT is recommended
Further details can be found at the following links:
- manifestation of an error on VMs with pre-existing snapshots: https://kb.vmware.com/s/article/1033816
- manifestation of the error where the apparent cause is an I/O error on a disk used as a VMFS datastore: https://communities.vmware.com/thread/588337
- manifestation of the error during a cross-datastore move using vMotion because the block sizes of the two datastores do not match: https://kb.vmware.com/s/article/2009097
- manifestation of the error for a VM which has been reverted to a previous snapshot on VMware < 5.0: https://kb.vmware.com/s/article/1021607
Double-check that the file is indeed a VMDK file correctly located on the datastore.
It is also worth asking the VMware platform admin to check the logs of both the vCenter server and the ESXi host that the VM being replicated is running on.
NOTE: the ESXi host from which the export is performed is not necessarily the same one the VM is on. If exporting from a vSphere, any ESXi host in the cluster could hypothetically do the export.
There is no clear solution, but using vMotion to migrate the VM to another ESXi host and Datastore might bypass the issue.
Suggested workarounds (in order of likelihood):
- perform a CBT reset for the instance. Instructions are available on the CBT Reset documentation page.
- use vMotion to move the VM to a different ESXi host
- use vMotion to move the VM to a different datastore
- manually delete the CTK data from the datastore with the VM powered off (there should be a file named “*-ctk.vmdk” in the same directory as the VMDK file named in the error report)
- as a last resort, export an OVF of the VM and re-import it (making sure to select another host or datastore)
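Before deleting any CTK data by hand, it can be useful to list the candidate files first. The sketch below only lists files and deletes nothing. It assumes it runs on a system where the datastore directory is visible as a regular filesystem path, and the function name is hypothetical.

```python
from pathlib import Path


def ctk_files_next_to(vmdk_path):
    """List the "*-ctk.vmdk" files in the directory containing vmdk_path."""
    return sorted(Path(vmdk_path).parent.glob("*-ctk.vmdk"))
```

Passing the path from the error report returns every CTK file in that VM's directory, so the list can be reviewed with the VM powered off before anything is removed.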
Windows instances updates and incremental replica runs
Due to a bug in VMware 6.x (such as VMware ESXi 6.5 and 6.7), running Replica syncs after a Windows guest OS system update might lead to inconsistent data on the destination platform.
The bug arises in VMware's interaction with the Microsoft Volume Shadow Copy Service (VSS) and the Windows system files that get updated during the installation of Windows Updates. The problem is no longer observed in ESXi 7.0, and given that the VMware 6.x series is End of Life, the recommendation is to move to ESXi 7.0 or newer.
This applies to all supported Windows Server editions and does not affect the running guest OS on the source.
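When auditing a list of hosts, the affected series can be flagged with a trivial check on the version string. This is a sketch that assumes version strings of the form "6.5" or "6.7.0" and treats every 6.x release as affected, as described above; the function name is hypothetical.

```python
def is_affected_esxi(version):
    """Return True for ESXi 6.x version strings such as "6.5" or "6.7.0"."""
    major = version.strip().split(".", 1)[0]
    return major == "6"
```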
Replica / Migration job failing with “instance_name” exception
A generic error with “instance_name” might be presented when starting a migration from VMware.
The root cause is that the wrong os_type is set for the source VM on the VMware side.
The guest OS set in the VM's VMware configuration must match the OS actually running inside the VM; avoid the generic other / otherGuest64 variants. Once that is changed, please run the Replica or Migration job in Coriolis again.
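If the configured guest IDs of the source VMs are available (for example from an inventory export), the generic ones can be flagged before a job is started. The sketch below is illustrative only: the set of identifiers treated as generic is an assumption derived from the variants named above, and the names used are hypothetical.

```python
# Guest IDs treated as generic here; the exact set is an assumption based on
# the "other / otherGuest64" variants named in the text.
GENERIC_GUEST_IDS = {"otherGuest", "otherGuest64"}


def has_generic_guest_id(guest_id):
    """Return True if the VM's configured guest ID is a generic variant."""
    return guest_id in GENERIC_GUEST_IDS
```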