Challenge
The backup of an Exchange server VM fails with:
Unfreeze error:[Backup job failed] Cannot create shadow copy of the volumes containing writer’s data A VSS critical writer has failed. Writer name: [Microsoft Exchange Writer]. Class ID: [{76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}]. Instance ID: [{0db23250-4d1e-42c1-8d14-2be32f448184}]. Writer's state: [VSS_WS_FAILED_AT_FREEZE]. Error code: [0x800423f2].]
If you run the command ‘vssadmin list writers’ on the Exchange server after the job fails, you will typically see that the Microsoft Exchange Writer has failed because of a timeout (State: [9] Failed, Last error: Timed out).
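To spot the failed writer without reading the whole listing, the output of ‘vssadmin list writers’ can be scanned for writers that are not stable. The Python sketch below assumes the usual English-language output layout (`Writer name: '...'`, `State: [n] ...`, `Last error: ...`); localized Windows installations print different labels. The function names are hypothetical, and the sample text is abbreviated and illustrative, not captured from a real server.

```python
from __future__ import annotations

import re
import subprocess
from dataclasses import dataclass


@dataclass
class WriterStatus:
    name: str
    state_code: int  # -1 if no "State:" line was found for this writer
    state: str
    last_error: str


def read_vssadmin_writers() -> str:
    """Run `vssadmin list writers` on the Windows guest (needs an elevated prompt)."""
    result = subprocess.run(
        ["vssadmin", "list", "writers"], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_vssadmin_writers(output: str) -> list[WriterStatus]:
    """Split (English-language) vssadmin output into one record per writer."""
    writers: list[WriterStatus] = []
    current: dict | None = None
    for raw in output.splitlines():
        line = raw.strip()
        if m := re.match(r"Writer name: '(.+)'", line):
            current = {"name": m.group(1)}
        elif current is None:
            continue  # header text before the first writer block
        elif m := re.match(r"State: \[(\d+)\] (.+)", line):
            current["state_code"] = int(m.group(1))
            current["state"] = m.group(2)
        elif m := re.match(r"Last error: (.+)", line):
            # "Last error" is the final field of each writer block.
            writers.append(
                WriterStatus(
                    name=current["name"],
                    state_code=current.get("state_code", -1),
                    state=current.get("state", ""),
                    last_error=m.group(1),
                )
            )
            current = None
    return writers


def failed_writers(output: str) -> list[WriterStatus]:
    """Writers that are not in state [1] Stable or that report an error."""
    return [
        w
        for w in parse_vssadmin_writers(output)
        if w.state_code != 1 or w.last_error != "No error"
    ]


# Abbreviated, illustrative output (not captured from a real server).
SAMPLE = """\
Writer name: 'Microsoft Exchange Writer'
   Writer Id: {76fe1ac4-15f7-4bcd-987e-8e1acb462fb7}
   State: [9] Failed
   Last error: Timed out

Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error
"""

if __name__ == "__main__":
    # On the guest itself, pass read_vssadmin_writers() instead of SAMPLE.
    for w in failed_writers(SAMPLE):
        print(f"{w.name}: [{w.state_code}] {w.state}, last error: {w.last_error}")
```

State 9 corresponds to VSS_WS_FAILED_AT_FREEZE, the same writer state reported in the job error above.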
Cause
Starting in Veeam Backup & Replication v8, to overcome the VSS freeze time limit described below, Veeam Backup & Replication uses Microsoft VSS persistent snapshots for backup of Microsoft Exchange VMs. If Microsoft Exchange cannot be frozen within the allowed time, Veeam Backup & Replication automatically fails over to the persistent snapshot mechanism. To learn more about this feature, see:
https://helpcenter.veeam.com/docs/backup/vsphere/persistent_snapshots.html
"VSSControl: Failed to freeze guest, wait timeout"
This error refers to the limit that Microsoft VSS imposes on the duration of a freeze. This timeout is not configurable. Veeam uses VSS to freeze applications immediately before creating the VMware snapshot and sends the thaw command as soon as snapshot creation is complete. VSS holds a freeze on the writers for at most 60 seconds (20 seconds for Exchange), so all of the following steps must fit within this window:
- Verification of freeze state [1]
- Snapshot creation request via VIM API [2]
- Snapshot creation on the ESXi host
- Return of snapshot information via VIM API [2]
- Thaw request to Microsoft VSS [1]
- Thawing of VSS writers’ I/O
[1] If a network connection to the guest OS is not available, the VIX API is used instead, which introduces additional latency.
[2] These steps are usually near-instantaneous, but if vCenter is heavily loaded or has high latency to the ESXi hosts, the delay may be significant.
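Because every step above must finish inside one freeze window, a quick back-of-the-envelope budget can show how little slack the 20-second Exchange limit leaves. This Python sketch uses the 60-second and 20-second limits stated above; the per-step durations and the `freeze_budget` helper are hypothetical placeholders, to be replaced with values measured in your own environment (for example, snapshot creation times from the vSphere task list).

```python
from __future__ import annotations

# Freeze limits stated in this article, in seconds.
VSS_FREEZE_LIMIT_S = 60.0
EXCHANGE_FREEZE_LIMIT_S = 20.0


def freeze_budget(step_durations_s: dict[str, float], limit_s: float) -> tuple[float, float]:
    """Return (total time the writers stay frozen, headroom left before the limit)."""
    total = sum(step_durations_s.values())
    return total, limit_s - total


# Hypothetical timings; replace them with values measured in your environment.
steps = {
    "verify freeze state": 1.0,
    "snapshot request via VIM API": 0.5,
    "snapshot creation on the ESXi host": 15.0,
    "snapshot info returned via VIM API": 0.5,
    "thaw request to Microsoft VSS": 1.0,
    "thaw of VSS writers' I/O": 0.5,
}

total, headroom = freeze_budget(steps, EXCHANGE_FREEZE_LIMIT_S)
print(f"Exchange: {total:.1f} s frozen, {headroom:.1f} s headroom")
# prints "Exchange: 18.5 s frozen, 1.5 s headroom"

_, general_headroom = freeze_budget(steps, VSS_FREEZE_LIMIT_S)
print(f"Other writers: {general_headroom:.1f} s headroom")
# prints "Other writers: 41.5 s headroom"
```

With these placeholder numbers the same sequence fits easily within the general 60-second VSS limit but leaves only 1.5 seconds for Exchange, so a few seconds of extra latency from the VIX fallback or a heavily loaded vCenter would be enough to trigger the timeout.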
Solution
This is an infrastructure issue that can be difficult to narrow down. Customers have resolved it using the following steps:
- First, make sure that you can create a Windows backup of the VM using VSS (for example, with Windows Server Backup). If this succeeds, the issue is not with VSS by itself but with the combination of VSS and VMware snapshot technology.
- Make sure that no other backup vendor's agents are installed on the server you are backing up, and uninstall any that are. VSS operations on a guest OS should be performed by only one backup product. Note that Veeam uses Microsoft VSS, while other vendors may use their own VSS providers/writers, so successful backups made with those products are not a valid comparison.
- Reboot the Exchange server.
- Make sure the ESX(i) host has enough resources.
- Check whether VMware snapshot creation takes longer than 20 seconds (the hardcoded Exchange VSS writer timeout).
- The Exchange freeze may be too I/O-intensive for the storage; the backup time and/or the Exchange datastore may need to be changed.
- The COM+ Event System service may need to be restarted. The root cause is unknown; in some cases customers have scripted a restart of this service before the backup runs.
- Latency between vCenter and the ESX(i) hosts can cause freeze issues when backing up through vCenter, while backing up through the host directly produces successful VSS backups.
- If Veeam does not have direct network connectivity to the Exchange server, as a test, place Veeam on a network that does and see whether that resolves the issue. Direct network communication is not required; however, if there are underlying issues with VIX, Veeam will try to communicate over IP, and in some cases this does not work properly because of the network architecture.
- If you are using "connectionless mode" for VSS (that is, a firewall blocks direct communication, so Veeam relies on the VIX API), you must meet at least ONE of the following conditions:
  - The account used for application-aware processing MUST be either the built-in local Administrator or the built-in domain Administrator (that is, it must have a well-known SID ending in 500); other local or domain administrator accounts will not work.
  --OR--
  - UAC must be disabled on the guest VM.
- Make sure there are no existing snapshots on the Exchange VM, since they cause unnecessary additional storage I/O.
- The Exchange server may need additional resources if it is heavily loaded during the unfreeze.
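For the connectionless-mode requirement, whether an account is the built-in Administrator can be read from its SID: account SIDs have the form S-1-5-21-...-RID, and the built-in local or domain Administrator has relative identifier (RID) 500. A minimal Python sketch, assuming you already have the SID as a string (for example, from `whoami /user` run as that account on the guest); the function name is hypothetical and the sample SIDs are made up.

```python
import re

# Machine and domain account SIDs look like S-1-5-21-<identifier sub-authorities>-<RID>;
# the built-in Administrator account always has RID 500.
_ACCOUNT_SID = re.compile(r"^S-1-5-21(?:-\d+)+-(\d+)$", re.IGNORECASE)


def is_builtin_administrator_sid(sid: str) -> bool:
    """Return True if the SID belongs to a built-in local or domain Administrator (RID 500)."""
    match = _ACCOUNT_SID.match(sid.strip())
    return match is not None and int(match.group(1)) == 500


if __name__ == "__main__":
    # Made-up SIDs for illustration.
    for sid in (
        "S-1-5-21-1004336348-1177238915-682003330-500",   # built-in Administrator
        "S-1-5-21-1004336348-1177238915-682003330-1104",  # some other account
        "S-1-5-21-1004336348-1177238915-682003330-1500",  # text ends in "500", RID is 1500
        "S-1-5-32-544",                                   # BUILTIN\Administrators group
    ):
        print(sid, is_builtin_administrator_sid(sid))
```

Comparing the parsed RID, rather than checking whether the SID text ends in "500", avoids false positives such as RID 1500. If the check fails, the alternative condition (UAC disabled on the guest) still applies.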