How to boost database backup performance
Post date: Feb 02, 2014 10:7:2 PM
1. Using Direct I/O
Direct I/O lets you bypass the file system cache. When Direct I/O is used for a file, data is transferred directly from the disk to the application buffer, without passing through the file buffer cache. This reduces CPU consumption and eliminates the overhead of copying data twice: first from the disk to the file buffer cache, and then from the file buffer cache to the application buffer. Direct I/O is supported only on RHEL, CentOS and SUSE.
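To make the buffered-versus-direct distinction concrete, here is a small sketch that uses GNU dd rather than any Greenplum utility. The scratch file name and the 20 MB size are arbitrary choices for the example; `iflag=direct` asks dd to open the file with O_DIRECT, which some file systems do not support.

```shell
#!/bin/sh
# Illustration only: read the same file with and without the file system cache.
# GNU dd is assumed; the scratch file name and 20 MB size are arbitrary.
src="${TMPDIR:-/tmp}/directio_demo.$$"
dd if=/dev/zero of="$src" bs=1M count=20 2>/dev/null

# Buffered read: data is copied disk -> file buffer cache -> application buffer.
buffered_bytes=$(dd if="$src" bs=1M 2>/dev/null | wc -c)

# Direct read: iflag=direct opens the file with O_DIRECT, so the cache is bypassed.
# Some file systems (tmpfs, for example) reject O_DIRECT; dd then fails and the
# byte count below will not match the buffered one.
direct_bytes=$(dd if="$src" bs=1M iflag=direct 2>/dev/null | wc -c)

echo "buffered read: $buffered_bytes bytes"
if [ "$direct_bytes" -eq "$buffered_bytes" ]; then
    echo "direct read:   $direct_bytes bytes"
else
    echo "direct read not supported on this file system"
fi
rm -f "$src"
```

Both reads return the same data; the difference is only in the path the data takes through memory.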
Turn on Direct I/O
$ gpconfig -c gp_backup_directIO -v on
Decrease the size of the data chunks sent to the dump when the database is busy
$ gpconfig -c gp_backup_directIO_read_chunk_mb -v 10
The above command sets the chunk size to 10 MB; the default chunk size is 20 MB. The default value has been tested to be the optimal setting: decreasing it increases the backup time, and increasing it results in little change to the backup time.
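As a rough illustration of what the chunk size changes, the sketch below counts how many chunk-sized reads are needed to cover one file. It assumes files are read in fixed-size chunks; the helper name `chunks_needed` and the 1024 MB file size are made up for the example, and the numbers say nothing about actual backup timings.

```shell
#!/bin/sh
# Illustration only: count how many chunk-sized reads cover a file.
# Assumes fixed-size chunks; file_mb is an arbitrary example size.
chunks_needed() {
    file_mb=$1
    chunk_mb=$2
    # Integer ceiling division, so a partial final chunk is counted too.
    echo $(( (file_mb + chunk_mb - 1) / chunk_mb ))
}

file_mb=1024
for chunk_mb in 10 20 40; do
    echo "chunk=${chunk_mb}MB reads=$(chunks_needed "$file_mb" "$chunk_mb")"
done
```

For a 1024 MB file this prints 103 reads at 10 MB, 52 at 20 MB and 26 at 40 MB.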
Verify the current data chunk size
$ gpconfig -s gp_backup_directIO_read_chunk_mb
Verify whether Direct I/O is turned on
$ gpconfig -s gp_backup_directIO
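The steps in this section can be collected into one small script. This is only a sketch: `DRY_RUN`, `run_cmd` and `enable_direct_io` are names invented for the example, and by default the script prints the gpconfig commands instead of running them, so it can be reviewed on a machine without Greenplum installed.

```shell
#!/bin/sh
# Sketch: enable Direct I/O for backups, set the chunk size, then show both settings.
# DRY_RUN, run_cmd and enable_direct_io are hypothetical helpers, not part of Greenplum.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"      # print the command only
    else
        "$@"             # run it for real
    fi
}

enable_direct_io() {
    chunk_mb="${1:-20}"  # 20 MB is the default chunk size
    run_cmd gpconfig -c gp_backup_directIO -v on
    run_cmd gpconfig -c gp_backup_directIO_read_chunk_mb -v "$chunk_mb"
    run_cmd gpconfig -s gp_backup_directIO
    run_cmd gpconfig -s gp_backup_directIO_read_chunk_mb
}

enable_direct_io 20
```

Run it with `DRY_RUN=0` on the master host to apply the settings rather than print them.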
2. Using Data Domain Boost
Data Domain Boost is a gpcrondump and gpdbrestore option that provides faster backups after the initial backup operation, and provides deduplication at the source to decrease network traffic. When you restore files from the Data Domain system with Data Domain Boost, some files are copied to the master local disk and are restored from there, and others are restored directly.
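A backup and restore through Data Domain Boost might look like the sketch below. The database name `mydatabase` and the backup timestamp are placeholders, and `DRY_RUN` and `run_cmd` are helpers invented for the example; by default the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: back up and restore using the --ddboost option.
# "mydatabase" and the timestamp are placeholders; DRY_RUN and run_cmd are
# hypothetical helpers for this example.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Back up one database to the Data Domain system.
backup_with_ddboost() {
    run_cmd gpcrondump -x "$1" --ddboost
}

# Restore the backup set identified by its timestamp key.
restore_with_ddboost() {
    run_cmd gpdbrestore -t "$1" --ddboost
}

backup_with_ddboost mydatabase
restore_with_ddboost 20140202220702
```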
With Data Domain Boost managed file replication, you can replicate Greenplum Database backup images that are stored on a Data Domain system for disaster recovery purposes. The gpmfr utility manages the Greenplum Database backup sets that are on the primary and a remote Data Domain system.
Managed file replication requires network configuration when a replication network is being used between two Data Domain systems:
1. The Greenplum Database system requires the Data Domain login credentials to be configured with gpcrondump. These credentials are created for the local and remote Data Domain systems.
2. When the non-management network interface is used for replication on the Data Domain systems, static routes must be configured on the systems to pass the replication data traffic to the correct interfaces.
Do not use Data Domain Boost with gp_dump, pg_dump, or pg_dumpall.
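The credential setup from step 1, followed by a replication run with gpmfr, could be sketched as follows. Host names, the user name and the backup directory are placeholders. The option names (`--ddboost-host`, `--ddboost-user`, `--ddboost-backupdir`, `--ddboost-remote`, `--list`, `--replicate`, `--max-streams`) are given as recalled from the Greenplum utility reference, so check them against `gpcrondump --help` and `gpmfr --help` for your release. `DRY_RUN` and `run_cmd` are helpers invented for the example, and the commands are only printed by default.

```shell
#!/bin/sh
# Sketch: store Data Domain credentials, then replicate the latest backup set.
# All host, user and directory names are placeholders; verify option names
# against your Greenplum release before running anything for real.
DRY_RUN="${DRY_RUN:-1}"

run_cmd() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

configure_ddboost_credentials() {
    local_host=$1
    remote_host=$2
    user=$3
    # Credentials for the local (primary) Data Domain system.
    run_cmd gpcrondump --ddboost-host "$local_host" --ddboost-user "$user" \
        --ddboost-backupdir gp_backups
    # Credentials for the remote Data Domain system used for replication.
    run_cmd gpcrondump --ddboost-host "$remote_host" --ddboost-user "$user" \
        --ddboost-remote
}

replicate_latest() {
    run_cmd gpmfr --list                               # backup sets on the primary system
    run_cmd gpmfr --replicate LATEST --max-streams "$1"
}

configure_ddboost_credentials dd-local dd-remote ddboostuser
replicate_latest 10
```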