Challenge
This article documents general performance expectations, best practices, and configuration advice for using an EMC Data Domain appliance with deduplication as a repository for Veeam Backup & Replication.
Solution
For further information on how Veeam Backup & Replication works with EMC Data Domain DDBoost, please review: https://helpcenter.veeam.com/docs/backup/vsphere/emc_dd.html
Performance Expectations
EMC Data Domain Deduplication Storage Systems provide both high compression and deduplication ratios so that data can be kept for extended periods. When a Data Domain is configured as a repository for Veeam Backup & Replication, write performance may vary depending upon the particular EMC Data Domain Deduplication System model, protocol, and backup infrastructure architecture.
When data is read from a Data Domain, each block must be rehydrated and decompressed. For this reason, operations that read from the Data Domain will be slower than on non-deduplicated storage, and this is most noticeable with operations that use random I/O. All restores will occur as fast as the environment can accept new information and as fast as the Data Domain can decompress and rehydrate the blocks.
For quick recovery, consider using fast primary storage and keeping several restore points (3-7) there for quick restore operations such as Instant Recovery, SureBackup, and Windows or Other-OS File restores, since these generate the highest amount of random reads. Then use the Data Domain as secondary storage to store files for long-term retention. If an EMC Data Domain Deduplication System will be used as primary storage, it is strongly suggested to leverage alternative restore capabilities within Veeam Backup & Replication, such as Entire VM restore and VM files restore. With EMC Data Domain Deduplication Systems, these may result in faster recovery than Instant Recovery and File Level Restore operations.
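The tiering guidance above can be restated as a small decision helper. This is a minimal Python sketch, not part of Veeam Backup & Replication: the function name, the operation labels, and the returned strings are hypothetical and only summarize the recommendations in this section.

```python
# Hypothetical helper: restates this article's restore guidance.
# It does not call any Veeam API; all names and labels are illustrative.

# Operations the article identifies as generating the most random reads.
RANDOM_READ_HEAVY = {"instant_recovery", "surebackup", "file_level_restore"}

# Restore types the article suggests when Data Domain is the primary backup storage.
SEQUENTIAL_FRIENDLY = {"entire_vm_restore", "vm_files_restore"}


def suggest_restore_source(operation: str, has_fast_primary_copy: bool) -> str:
    """Return a short recommendation for where to run a restore operation from."""
    if operation not in RANDOM_READ_HEAVY | SEQUENTIAL_FRIENDLY:
        raise ValueError(f"unknown operation: {operation}")
    if has_fast_primary_copy:
        # A recent restore point exists on fast, non-deduplicated storage.
        return "fast primary storage"
    if operation in RANDOM_READ_HEAVY:
        # Only the Data Domain copy exists; random reads will be slow there.
        return "data domain (expect slow random reads; consider entire_vm_restore or vm_files_restore)"
    return "data domain"
```

For example, `suggest_restore_source("instant_recovery", True)` points to the fast primary copy, while the same operation without such a copy returns the Data Domain with a warning.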
Instant Recovery
- This type of restore can be adversely affected by the aforementioned limitations of a Data Domain appliance, and also by the type of VM being restored. Highly transactional VMs will require more IOPS from the Data Domain during the Instant Recovery than others. With this in mind, you can expect to be capable of running only a few Instant Recoveries simultaneously. Instantly recovered VMs that are started from a backup file stored on a Data Domain may react or start slowly, as the majority of their read operations will be hindered by the Data Domain.
- For VMware users, it is highly advised when performing an Instant Recovery to have virtual disk updates redirected to a high-performance datastore. This will improve performance by caching written blocks on low-latency storage.
- If the VM is intended to be made permanent, it is advised to migrate it to production storage as soon as possible after the Instant Recovery has begun.
File Level Restores
Slow recovery times may be experienced when using the Veeam Backup & Replication File Level Restore (FLR) capabilities. During FLR, a significant amount of read activity occurs when accessing the Veeam “service data” metadata for each individual file, as the Veeam backup files are not arranged in sequence. This read activity must be performed to determine the location of the data block(s) associated with each file during granular restore sessions. This significant level of random access is not recommended with archive tier storage devices because they are designed for optimal performance with sequential read operations. Veeam recommends implementing EMC Data Domain Deduplication Storage Systems as a secondary target for these use cases, because the more random read operations a restore requires, the slower it will be with EMC Data Domain Deduplication Systems.
- The Backup Browser may take longer than usual to open if an increment is selected, and longer still the further that increment is from the full restore point.
- Navigating between folders within the Backup Browser may take additional time, as each folder’s content must be retrieved from the backup file to display it.
Backup
- Reverse Incremental performance will be very poor due to its highly random I/O.
Note: When the Backup Job is configured with a DDBoost repository, Veeam Backup & Replication will prevent Reversed Incremental from being selected by the user.
- Synthetic Full creation will be very slow to a Data Domain, unless using DDBoost.
- Synthetic Fulls with Transforms are not advised.
Backup Copy
- A retention longer than 30 is not advisable as restore operations will diminish in performance.
- The Health Check option may take a very long time as it is performing a read operation.
Replication
- Using the Data Domain to store Replica metadata is not advisable.
Note: Veeam Backup & Replication will prevent the user from selecting a DDBoost repository.
Veeam Backup & Replication Configuration
Storage optimization (job option):
Setting the storage optimization to Local 16TB+ has been shown to improve the effectiveness of Data Domain’s deduplication. The larger this value is, the shorter the preparation phase will be for a backup task and the less memory will be needed to hold storage metadata.
Inline-deduplication (job option):
Since EMC Data Domain Deduplication Systems have excellent hardware deduplication and compression capabilities, it is highly advised that Veeam built-in deduplication be disabled to decrease load on the backup proxy.
Decompress backup data blocks before storing (repository option):
Veeam strongly recommends enabling this option so that raw data is sent to the EMC Data Domain Deduplication System, leveraging its global deduplication and compression capabilities. Leaving Veeam compression enabled may significantly impact EMC Data Domain deduplication capabilities resulting in high load and slow backup jobs.
Use Per-VM Backup Files (repository option):
Veeam recommends enabling this option to improve performance when writing data to and reading data from the EMC Data Domain Deduplication System. This option is enabled by default when adding the repository as a Deduplicating Storage Appliance.
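The four options above can be kept as a simple checklist and compared against the settings recorded for a job and its repository. The following is a minimal Python sketch under stated assumptions: the dictionary keys are illustrative labels chosen for this example, not Veeam API property names, and the settings would have to be gathered by hand or by your own tooling.

```python
# Hypothetical checklist: the keys are illustrative labels, not Veeam API property names.
RECOMMENDED = {
    "storage_optimization": "Local 16TB+",   # job option
    "inline_deduplication": False,           # job option
    "decompress_before_storing": True,       # repository option
    "per_vm_backup_files": True,             # repository option
}


def find_deviations(settings: dict) -> dict:
    """Return {option: (actual, recommended)} for each option that differs or is missing."""
    return {
        key: (settings.get(key), wanted)
        for key, wanted in RECOMMENDED.items()
        if settings.get(key) != wanted
    }
```

Calling `find_deviations` with a dictionary of the current settings returns only the options that differ from the recommendations in this section; an empty result means all four match.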
Backup Mode
- For CIFS/NFS presented repositories, Forward incremental mode with periodic Active full backups is recommended to avoid the rehydration penalty during synthetic operations.
- For DDBoost enabled repositories, Forward incremental mode with either Active full or Synthetic full backups is recommended. Synthetically produced full backups will generally have the best restore performance and reduce the time a VM runs off a snapshot during the backup job run. However, in some environments an Active Full job may run faster.
- Transforming previous backup chains into rollbacks is not advisable for either repository type.
- For forever forward incremental backup and backup copy on DDBoost enabled repositories, the option “Defragment and compact full backup file” should be enabled if available. In most cases a weekly schedule is appropriate. This helps to avoid excessive growth of pre-compression data size for the full backup file.
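The backup mode recommendations above can be summarized as a lookup by repository type. This is a minimal Python sketch that only restates the list; the function name, the repository labels (`cifs`, `nfs`, `ddboost`), and the result keys are hypothetical and do not correspond to any Veeam interface.

```python
# Hypothetical helper restating the Backup Mode guidance; all labels are illustrative.

def recommended_backup_mode(repository_type: str) -> dict:
    """Map a repository type ('cifs', 'nfs', or 'ddboost') to the recommended settings."""
    repo = repository_type.lower()
    if repo in ("cifs", "nfs"):
        # Synthetic operations would pay the rehydration penalty on these repositories.
        fulls = ["active full"]
    elif repo == "ddboost":
        # With DDBoost, either kind of periodic full backup is acceptable.
        fulls = ["active full", "synthetic full"]
    else:
        raise ValueError(f"unknown repository type: {repository_type}")
    return {
        "mode": "forward incremental",
        "periodic_fulls": fulls,
        # Transforming previous chains into rollbacks is not advised for either type.
        "transform_to_rollbacks": False,
    }
```

For instance, `recommended_backup_mode("nfs")` lists only Active full backups, while `recommended_backup_mode("ddboost")` lists both Active and Synthetic fulls.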
Repository Performance Expectations and Configuration
If DDBoost is not licensed on the Data Domain system, the system must be added as a CIFS type or Linux type repository. It is advised to use a Linux server with the volume mounted via NFS as a relay server to help improve performance. Under some circumstances, CIFS or NFS communication may perform better than DDBoost with Veeam Backup & Replication v8 because of the limitation of a single thread per backup job when using DDBoost. DDBoost has been shown to improve performance when performing Synthetic Fulls.
With Veeam Backup & Replication v9, support for EMC Data Domain Boost is enhanced with the introduction of the following capabilities:
- Support for EMC Data Domain Boost 3.0.
- Reduced impact of storage fragmentation during restore operations, even with parallel processing enabled. This feature allows Veeam to store each VM backup in a dedicated backup chain so that the fragmentation ratio is minimal.
- Reduced impact of the block size, so you may define any block size without impact on the restore process. Veeam will be able to read data granularly, so the amount of redundant data read will be minimal.
With DDBoost
If the Data Domain System is licensed for DDBoost, please proceed to configure it using the following steps.
- Launch the creation of a new Repository. On the Type tab, select Deduplicating storage appliance.
- Select the deduplication storage as EMC Data Domain.
- On the next tab, configure the information for connecting to the Data Domain appliance.
- On the Repository tab click Browse and select the necessary location from the list of available paths.
- The default settings can be taken for the last steps in repository configuration, unless your environment requires you to specify a different vPower NFS Server.
Without DDBoost
CIFS
- Launch the creation of a new Repository. On the Type tab, select CIFS.
- On the next tab, configure the path to which the Repository will write, and set credentials to access that share.
- On the Repository tab, within the advanced section, enable “Decompress backup data blocks before storing”.
- The default settings can be taken for the last steps in repository configuration, unless your environment requires you to specify a different vPower NFS Server.
NFS
The Data Domain will need to be configured for NFS access, and a Linux server will need to be configured to mount the volumes from the Data Domain via NFS. Please refer to the following links for further information regarding connecting Linux to the Data Domain via NFS:
http://forums.veeam.com/veeam-backup-replication-f2/veeam-datadomain-and-linux-nfs-share-t8916.html
http://tsmith.co/2014/veeam-and-datadomain/
- Launch the creation of a new Repository. On the Type tab, select Linux.
- On the next tab, select the Linux server that will be connected to. If it is not present in the list, select “Add New…”
- On the Repository tab, specify the path on the Linux server that leads to where you mounted the Data Domain via NFS. On this tab, in the advanced section, enable “Decompress backup data blocks before storing”.
- The default settings can be taken for the last steps in repository configuration, unless your environment requires you to specify a different vPower NFS Server.
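Before pointing the repository at the Linux path, it can help to confirm that the path really is an NFS mount rather than a local directory. The following is a minimal Python sketch that parses text in the `/proc/mounts` format; the function name, the mount point `/mnt/datadomain`, and the export `dd01:/backup` used in the usage note are hypothetical examples, not values from this article.

```python
from typing import Optional


def find_nfs_mount(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the NFS source (server:/export) mounted at mount_point, or None.

    mounts_text is expected in /proc/mounts format. Mount points containing
    spaces are escaped in that file and are not handled by this sketch.
    """
    # Ignore a trailing slash, but keep "/" itself intact.
    wanted = mount_point.rstrip("/") or "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: source, mount point, filesystem type, options, dump, pass.
        if len(fields) < 3:
            continue
        source, target, fstype = fields[0], fields[1], fields[2]
        if target == wanted and fstype in ("nfs", "nfs4"):
            return source
    return None
```

On the Linux server, the contents of `/proc/mounts` can be read and passed in as `mounts_text`. Given a line such as `dd01:/backup /mnt/datadomain nfs rw 0 0`, calling `find_nfs_mount(text, "/mnt/datadomain")` returns `dd01:/backup`; a result of `None` suggests the path is not backed by an NFS mount.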