gpcrondump and gpdbrestore
Post date: Oct 17, 2016 5:40:17 PM
[gpadmin@mdw ~]$ gpcrondump --help
COMMAND NAME: gpcrondump
Writes out a database to SQL script files. The script files can be used
to restore the database using the gpdbrestore utility. The gpcrondump
utility can be called directly or from a crontab entry.
*****************************************************
SYNOPSIS
*****************************************************
gpcrondump -x database_name
[-s <schema> | -S <schema> | -t <schema>.<table> | -T <schema>.<table>]
[--table-file=<filename> | --exclude-table-file=<filename>]
[--schema-file=<filename> | --exclude-schema-file=<filename>]
[-u backup_directory] [-R post_dump_script] [--incremental]
[ -K <timestamp> [--list-backup-files] ]
[--prefix <prefix_string> [--list-filter-tables] ]
[ -c [ --cleanup-date yyyymmdd | --cleanup-total n ] ]
[-z] [-r] [-f <free_space_percent>] [-b] [-h] [-j | -k]
[-g] [-G] [-C] [-d <master_data_directory>] [-B <parallel_processes>]
[-a] [-q] [-y <reportfile>] [-l <logfile_directory>]
[--email-file <path_to_file> ] [-v]
{ [-E encoding] [--inserts | --column-inserts] [--oids]
[--no-owner | --use-set-session-authorization]
[--no-privileges] [--rsyncable]
{ [--ddboost [--replicate --max-streams <max_IO_streams>
[--ddboost-skip-ping] ] ] } |
{ [--netbackup-service-host <netbackup_server>
--netbackup-policy <netbackup_policy>
--netbackup-schedule <netbackup_schedule>
[--netbackup-block-size <size> ]
[--netbackup-keyword <keyword> ] } }
gpcrondump --ddboost-host <ddboost_hostname>
[--ddboost-host ddboost_hostname ... ]
--ddboost-user <ddboost_user>
--ddboost-backupdir <backup_directory>
[--ddboost-remote] [--ddboost-skip-ping]
gpcrondump --ddboost-config-remove
gpcrondump -o [ --cleanup-date yyyymmdd | --cleanup-total n ]
gpcrondump -?
gpcrondump --version
*****************************************************
DESCRIPTION
*****************************************************
The gpcrondump utility dumps the contents of a database into SQL script
files, which can then be used to restore the database schema and user
data at a later time using gpdbrestore. During a dump operation, users
will still have full access to the database.
By default, dump files are created in their respective master and
segment data directories in a directory named db_dumps/YYYYMMDD. The
data dump files are compressed by default using gzip.
gpcrondump allows you to schedule routine backups of a Greenplum
database using cron (a scheduling utility for UNIX operating systems).
Cron jobs that call gpcrondump should be scheduled on the master host.
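For example, a crontab entry like the following (the schedule, paths, and
database name are illustrative) runs a nightly backup of the sales
database at 2 a.m. on the master host:
0 2 * * * . /usr/local/greenplum-db/greenplum_path.sh && gpcrondump -x sales -a -q >> /home/gpadmin/gpcrondump.log 2>&1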
WARNING: Backing up a database with gpcrondump while simultaneously
running ALTER TABLE might cause gpcrondump to fail.
**********************
Data Domain Boost
**********************
gpcrondump is used to schedule Data Domain Boost backup and restore
operations. gpcrondump is also used to set or remove one-time
credentials for Data Domain Boost.
**********************
NetBackup
**********************
Greenplum Database must be configured to communicate with the Symantec
NetBackup master server that is used to back up the database. See the
"Greenplum Database Administrator Guide" for information on
configuring Greenplum Database and NetBackup and backing up and
restoring with NetBackup.
**********************
Return Codes
**********************
The following is a list of the codes that gpcrondump returns.
0 - Dump completed with no problems
1 - Dump completed, but one or more warnings were generated
2 - Dump failed with a fatal error
**********************
EMAIL NOTIFICATIONS
**********************
To have gpcrondump send email notifications with the completion status
after a back up operation completes, you must place a file named
mail_contacts in the home directory of the Greenplum superuser (gpadmin)
or in the same directory as the gpcrondump utility ($GPHOME/bin). This
file should contain one email address per line. gpcrondump will issue a
warning if it cannot locate a mail_contacts file in either location. If
both locations have a mail_contacts file, then the one in $HOME takes
precedence.
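For example (the addresses are hypothetical), a mail_contacts file
contains only email addresses, one per line:
dba1@example.com
backup-alerts@example.com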
You can customize the email Subject and From lines of the email
notifications that gpcrondump sends after a back up completes for a
database. You specify the option --email-file with the location of a
YAML file that contains email Subject and From lines that gpcrondump
uses. See the section "File Format for Customized Emails" for
information about the format of the YAML file.
NOTE: The UNIX mail utility must be running on the Greenplum Database host
and must be configured to allow the Greenplum superuser (gpadmin) to
send email.
**********************
Limitations
**********************
* NetBackup is not compatible with DDBoost. NetBackup and DDBoost cannot
both be used in a single backup operation.
* An incremental backup set (a full backup and its associated incremental
backups) must be on a single device. For example, a backup set must all
be on a file system. The backup set cannot have some backups on the
local file system and others on a Data Domain system or a NetBackup
system.
*****************************************************
OPTIONS
*****************************************************
-a (do not prompt)
Do not prompt the user for confirmation.
-b (bypass disk space check)
Bypass disk space check. The default is to check for available disk
space, unless --ddboost is specified. When using Data Domain Boost, this
option is always enabled.
Note: Bypassing the disk space check generates a warning message. With a
warning message, the return code for gpcrondump is 1 if the dump is
successful. (If the dump fails, the return code is 2, in all cases.)
-B <parallel_processes>
The number of segments to check in parallel for pre/post-dump
validation. If not specified, the utility will start up to 60 parallel
processes depending on how many segment instances it needs to dump.
-c (clear old dump files first)
Specify this option to delete old backups before performing a back up.
In the db_dumps directory, the dated subdirectory with the oldest date
is deleted. If that directory name is the current date, the directory is
not deleted. The default is to not delete old backup files. The deleted
directory might contain files from one or more backups.
WARNING: Before using this option, ensure that incremental backups
required to perform the restore are not deleted. The gpdbrestore utility
option --list-backup lists the backup sets required to perform a restore.
You can specify the option --cleanup-date or --cleanup-total to specify
backup sets to delete.
If --ddboost is specified, only the old files on Data Domain Boost are
deleted.
This option is not supported with the -u option.
-C (clean catalog before restore)
Clean out the catalog schema prior to restoring database objects.
gpcrondump adds the DROP command to the SQL script files when creating
the backup files. When the script files are used by the gpdbrestore
utility to restore database objects, the DROP commands remove existing
database objects before restoring them.
If --incremental is specified and the files are on NFS storage, the -C
option is not supported. The database objects are not dropped if the -C
option is specified.
--cleanup-date=yyyymmdd
Remove backup sets created on the specified date. The date format is yyyymmdd.
If multiple backup sets were created on the date, all the backup sets
for that date are deleted. If no backup sets are found, gpcrondump
returns a warning message and no backup sets are deleted. If the -c
option is specified, the backup process continues.
Valid only with the -c or -o option.
WARNING: Before using this option, ensure that incremental backups
required to perform the restore are not deleted. The gpdbrestore utility
option --list-backup lists the backup sets required to perform a restore.
--cleanup-total=n
Remove the n oldest backup sets based on the backup timestamp. If there
are fewer than n backup sets, gpcrondump returns a warning message and
no backup sets are deleted. If the -c option is specified, the backup
process continues.
Valid only with the -c or -o option.
WARNING: Before using this option, ensure that incremental backups
required to perform the restore are not deleted. The gpdbrestore utility
option --list-backup lists the backup sets required to perform a restore.
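For example, assuming a database named sales, the following command
deletes the three oldest backup sets before creating a new backup:
gpcrondump -x sales -c --cleanup-total 3 -a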
--column-inserts
Dump data as INSERT commands with column names.
If --incremental is specified, this option is not supported.
-d <master_data_directory>
The master host data directory. If not specified, the value set for
$MASTER_DATA_DIRECTORY will be used.
--ddboost [--replicate --max-streams <max_IO_streams>
[--ddboost-skip-ping] ]
Use Data Domain Boost for this backup. Before using Data Domain Boost,
set up the Data Domain Boost credential, as described in the next option
below.
The following option is recommended if --ddboost is specified.
* -z option (uncompressed)
Backup compression (turned on by default) should be turned off
with the -z option. Data Domain Boost will deduplicate and
compress the backup data before sending it to the Data Domain system.
--replicate --max-streams <max_IO_streams> is optional. If you specify
this option, gpcrondump replicates the backup on the remote Data Domain
server after the backup is complete on the primary Data Domain server.
<max_IO_streams> specifies the maximum number of Data Domain I/O streams
that can be used when replicating the backup set on the remote Data
Domain server from the primary Data Domain server.
You can use gpmfr to replicate a backup if replicating a backup with
gpcrondump takes a long time and prevents other backups from occurring.
Only one instance of gpcrondump can be running at a time. While
gpcrondump is being used to replicate a backup, it cannot be used to
create a backup.
You can run a mixed backup that writes to both a local disk and Data
Domain. If you want to use a backup directory on your local disk other
than the default, use the -u option. Mixed backups are not supported
with incremental backups. For more information about mixed backups and
Data Domain Boost, see "Backing Up and Restoring Databases" in the
"Greenplum Database Administrator Guide."
IMPORTANT: Never use the Greenplum Database default backup options with
Data Domain Boost.
To maximize Data Domain deduplication benefits, retain at least 30 days
of backups.
NOTE: The -b, -c, -f, -G, -g, -R, and -u options change if --ddboost is
specified. See the options for details.
The DDBoost backup options are not supported if the NetBackup options
are specified.
--ddboost-host ddboost_hostname
[--ddboost-host ddboost_hostname ... ]
--ddboost-user <ddboost_user>
--ddboost-backupdir <backup_directory>
[--ddboost-remote] [--ddboost-skip-ping]
Sets the Data Domain Boost credentials. Do not combine these options
with any other gpcrondump options, and do not enter only part of this
option set.
<ddboost_hostname> is the IP address (or hostname associated with the IP)
of the host. There is a 30-character limit. If you use two or more
network connections to connect to the Data Domain system, specify each
connection with the --ddboost-host option.
<ddboost_user> is the Data Domain Boost user name. There is a 30-character
limit.
<backup_directory> is the location for the backup files, configuration
files, and global objects on the Data Domain system. The location on the
system is GPDB/<backup_directory>.
--ddboost-remote is optional. It indicates that the configuration
parameters are for the remote Data Domain system that is used for backup
replication with Data Domain Boost managed file replication.
Example:
gpcrondump --ddboost-host 172.28.8.230 --ddboost-user
ddboostusername --ddboost-backupdir gp_production
After running gpcrondump with these options, the system verifies the
limits on the host and user names and prompts for the Data Domain Boost
password. Enter the password when prompted; the password is not echoed
on the screen. There is a 40-character limit on the password that can
include lowercase letters (a-z), uppercase letters (A-Z), numbers (0-9),
and special characters ($, %, #, +, etc.).
The system verifies the password. After the password is verified, the
system creates encrypted DDBOOST_CONFIG files in the user's home
directory.
In the example, the --ddboost-backupdir option specifies the backup
directory gp_production in the Data Domain Storage Unit GPDB.
NOTE: If there is more than one operating system user using Data Domain
Boost for backup and restore operations, repeat this configuration
process for each of those users.
IMPORTANT: Set up the Data Domain Boost credential before running any
Data Domain Boost backups with the --ddboost option, described above.
--ddboost-config-remove
Removes all Data Domain Boost credentials from the master and all
segments on the system. Do not enter this option with any other
gpcrondump option.
--ddboost-skip-ping
Specify this option to skip the ping of a Data Domain system. When
working with a Data Domain system, ping is used to ensure that the Data
Domain system is reachable. If the Data Domain system is configured to
block ICMP ping probes, specify this option.
--dump-stats
Dump optimizer statistics from pg_statistic. Statistics are dumped in the
master data directory to db_dumps/YYYYMMDD/gp_statistics_1_1_<timestamp>.
If --ddboost is specified, the backup is located on the default storage
unit in the directory specified by --ddboost-backupdir when the Data
Domain Boost credentials were set.
-E <encoding>
Character set encoding of dumped data. Defaults to the encoding of the
database being dumped. See the Greenplum Database Reference Guide for
the list of supported character sets.
--email-file <path_to_file>
Specify the fully-qualified location of the YAML file that contains the
customized Subject and From lines that are used when gpcrondump sends
notification emails about a database back up.
See the section "File Format for Customized Emails" for information
about the format of the YAML file.
-f <free_space_percent>
When checking that there is enough free disk space to create the dump
files, specifies a percentage of free disk space that should remain
after the dump completes. The default is 10 percent.
NOTE: This option is not supported if --ddboost or --incremental is
specified.
-g (copy config files)
Secure a copy of the master and segment configuration files
postgresql.conf, pg_ident.conf, and pg_hba.conf. These configuration
files are dumped in the master or segment data directory to
db_dumps/YYYYMMDD/config_files_<timestamp>.tar.
If --ddboost is specified, the backup is located on the default storage
unit in the directory specified by --ddboost-backupdir when the Data
Domain Boost credentials were set.
-G (dump global objects)
Use pg_dumpall to dump global objects such as roles and tablespaces.
Global objects are dumped in the master data directory to
db_dumps/YYYYMMDD/gp_global_1_1_<timestamp>.
If --ddboost is specified, the backup is located on the default storage
unit in the directory specified by --ddboost-backupdir when the Data
Domain Boost credentials were set.
-h (record dump details)
Record details of the database dump in the table
public.gpcrondump_history in the database supplied with the -x option.
The utility creates the table if it does not already exist.
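If -h was specified, the dump history can be reviewed afterward with a
simple query; this is a sketch (the database name is illustrative, and
the table columns vary by release):
psql -d mydatabase -c 'SELECT * FROM public.gpcrondump_history;'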
--incremental (backup changes to append-optimized tables)
Adds an incremental backup to a backup set. When performing an
incremental backup, the complete backup set created prior to the
incremental backup must be available. The complete backup set includes
the following backup files:
* The last full backup before the current incremental backup
* All incremental backups created between the time of the full backup
and the current incremental backup
An incremental backup is similar to a full backup except for
append-optimized tables, including column-oriented tables. An
append-optimized table is backed up only if at least one of the
following operations was performed on the table after the last backup.
ALTER TABLE
INSERT
UPDATE
DELETE
TRUNCATE
DROP and then re-create the table
For partitioned append-optimized tables, only the changed table
partitions are backed up.
The -u option must be used consistently within a backup set that
includes a full backup and its incremental backups. If you use the -u
option with a
full backup, you must use the -u option when you create incremental
backups that are part of the backup set that includes the full backup.
You can create an incremental backup for a full backup of a set of
database tables. When you create the full backup, specify the --prefix
option to identify the backup. To include a set of tables in the full
backup, use either the -t option or --table-file option. To exclude a
set of tables, use either the -T option or the --exclude-table-file
option. See the description of the option for more information on its
use.
To create an incremental backup based on the full backup of the set of
tables, specify the option --incremental and the --prefix option with
the string specified when creating the full backup. The incremental
backup is limited to only the tables in the full backup.
WARNING: gpcrondump does not check for available disk space prior to
performing an incremental backup.
IMPORTANT: An incremental backup set (a full backup and its associated
incremental backups) must be on a single device. For example, the
backups in a backup set must all be on a file system or must all be on a
Data Domain system.
--inserts
Dump data as INSERT commands, rather than COPY commands.
If --incremental is specified, this option is not supported.
[gpadmin@mdw ~]$ gpdbrestore --help
COMMAND NAME: gpdbrestore
Restores a database from a set of dump files generated by gpcrondump.
*****************************************************
SYNOPSIS
*****************************************************
gpdbrestore { -t <timestamp_key> { [-L]
| [--netbackup-service-host <netbackup_server>
[--netbackup-block-size <size>] ] }
| -b <YYYYMMDD>
| -R <hostname>:<path_to_dumpset>
| -s <database_name> }
[--noplan] [--noanalyze] [-u <backup_directory>] [--list-backup]
[--prefix <prefix_string> ] [--report-status-dir <report_directory> ]
[-T <schema>.<table> [-T ...]] [--table-file <file_name>]
[--truncate] [-e] [-G]
[-B <parallel_processes>]
[-d <master_data_directory>] [-a] [-q] [-l <logfile_directory>]
[-v] [--ddboost ]
[-S <schema_name> [-S ...]]
[--redirect <database_name> ]
[--change-schema=<schema_name> ]
gpdbrestore -?
gpdbrestore --version
*****************************************************
DESCRIPTION
*****************************************************
The gpdbrestore utility recreates the data definitions (schema) and user data in a Greenplum database using the script files created by
gpcrondump operations.
When you restore from an incremental backup, the gpdbrestore utility assumes the complete backup set is available. The complete backup set
includes the following backup files:
* The last full backup before the specified incremental backup
* All incremental backups created between the time of the full backup
and the specified incremental backup
The gpdbrestore utility provides the following functionality:
* Automatically reconfigures for compression.
* Validates that the number of dump files is correct (for primary only,
mirror only, primary and mirror, or a subset consisting of some mirror
and primary segment dump files).
* If a failed segment is detected, restores to active segment instances.
* Except when restoring data from a NetBackup server, you do not need to
know the complete timestamp key (-t) of the backup set to restore.
Additional options are provided to instead give just a date (-b), backup
set directory location (-R), or database name (-s) to restore.
* The -R option allows you to restore from a backup set located on a
host outside of the Greenplum Database array (archive host).
* Ensures that the correct dump file goes to the correct segment instance.
* Identifies the database name automatically from the backup set.
* Allows you to restore particular tables only (-T option) instead of
the entire database. Note that single tables are not automatically
dropped or truncated prior to restore.
* Performs an ANALYZE operation on the tables that are restored. You can
disable the ANALYZE operation by specifying the option --noanalyze.
* Can restore global objects such as roles and tablespaces (-G option).
* Detects if the backup set is primary segments only or primary and
mirror segments and passes the appropriate options to gp_restore.
* Allows you to drop the target database before a restore in a single
operation.
*****************************************************
Restoring a Database from NetBackup
*****************************************************
Greenplum Database must be configured to communicate with the Symantec
NetBackup master server that is used to restore database data. See
"Backing Up and Restoring Databases" in the "Greenplum Database
Administrator Guide" for information about configuring Greenplum
Database and NetBackup.
When restoring from a NetBackup server, you must specify the timestamp of
the backup with the -t option.
NetBackup is not compatible with DDBoost. NetBackup and DDBoost cannot
both be used in a single backup operation.
*****************************************************
Restoring a Database with Named Pipes
*****************************************************
If you used named pipes when you backed up a database with gpcrondump,
named pipes with the backup data must be available when restoring the
database from the backup.
*****************************************************
Error Reporting
*****************************************************
gpdbrestore does not report errors automatically. After the restore is
completed, check the report status files to verify that there are no
errors. The report status files are stored in the db_dumps/<date>/
directory by default.
*****************************************************
OPTIONS
*****************************************************
-a (do not prompt)
Do not prompt the user for confirmation.
-b <YYYYMMDD>
Looks for dump files in the segment data directories on the Greenplum
Database array of hosts in db_dumps/<YYYYMMDD>. If --ddboost is specified,
the system looks for dump files on the Data Domain Boost host.
-B <parallel_processes>
The number of segments to check in parallel for pre/post-restore
validation. If not specified, the utility will start up to 60 parallel
processes depending on how many segment instances it needs to restore.
--change-schema=<schema_name>
Optional. Restores tables from a backup created with gpcrondump to a
different schema. The specified schema must exist in the database.
If the schema does not exist, the utility returns an error. System
catalog schemas are not supported.
You must specify tables to restore with the -T and --table-file options.
If a table that is being restored exists in <schema_name>, the utility
returns a warning and attempts to append the data from the backup to the
table. You can specify the --truncate option to truncate table data
before restoring data to the table from the backup.
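As a sketch (the timestamp, table, and schema names are hypothetical),
the following restores one table into the schema restore_test and
truncates any existing data in it first:
gpdbrestore -t 20121114064330 -T public.orders --change-schema=restore_test --truncate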
-d <master_data_directory>
Optional. The master host data directory. If not specified, the value
set for $MASTER_DATA_DIRECTORY will be used.
--ddboost
Use Data Domain Boost for this restore, if the --ddboost option was
passed when the data was dumped. Before using Data Domain Boost, make
sure the one-time Data Domain Boost credential setup is complete. See
the Greenplum Database System Administrator Guide for details.
If you backed up Greenplum Database configuration files with the
gpcrondump option -g and specified the --ddboost option, you must
manually restore the backup from the Data Domain system. The
configuration files must be restored for the Greenplum Database master
and all the hosts and segments. The backup location on the Data Domain
system is the directory GPDB/<backup_directory>/<date>. The
<backup_directory> is set when you specify the Data Domain credentials
with gpcrondump.
This option is not supported if --netbackup-service-host is specified.
-e (drop target database before restore)
Drops the target database before doing the restore and then recreates
it.
-G [include|only]
Restores global objects such as roles and tablespaces if the global
object dump file db_dumps/<date>/gp_global_1_1_<timestamp> is found in
the master data directory.
Specify either "-G only" to only restore the global objects dump file
or "-G include" to restore global objects along with a normal restore.
Defaults to "include" if neither argument is provided.
-l <logfile_directory>
The directory to write the log file. Defaults to ~/gpAdminLogs.
--list-backup
Lists the set of full and incremental backup sets required to perform a
restore based on the <timestamp_key> specified with the -t option and the
location of the backup set.
This option is supported only if the <timestamp_key> is for an incremental
backup.
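For example, using the incremental timestamp key from the EXAMPLES
section, the following lists the backup sets needed for the restore
without restoring anything:
gpdbrestore -t 20121114064330 --list-backup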
-L (list tablenames in backup set)
When used with the -t option, lists the table names that exist in the
named backup set and exits. Does not perform a restore.
-m (restore metadata only)
Performs a restore of database metadata (schema and table definitions, SET
statements, and so forth) without restoring data. If the --restore-stats or
-G options are provided as well, statistics or globals will also be restored.
The --noplan and --noanalyze options are not supported in conjunction with
this option, as they affect the restoration of data and no data is restored.
--netbackup-block-size <size>
Specify the block size, in bytes, of data being transferred from the
Symantec NetBackup server. The default is 512 bytes.
NetBackup options are not supported if DDBoost backup options are
specified.
--netbackup-service-host <netbackup_server>
The NetBackup master server that Greenplum Database connects to when
backing up to NetBackup. If you specify this option, you must specify
the timestamp of the backup with the -t option.
This option is not supported with any of these options:
-R, -s, -b, -L, or --ddboost.
--noanalyze
The ANALYZE command is not run after a successful restore. The default
is to run the ANALYZE command on restored tables. This option is useful
if running ANALYZE on the tables requires a significant amount of time.
If this option is specified, you should run ANALYZE manually on
restored tables. Failure to run ANALYZE following a restore might result
in poor database performance.
--noplan
Restores only the data backed up during the incremental backup specified
by the timestamp_key. No other data from the complete backup set is
restored. The full backup set containing the incremental backup must be
available.
If the timestamp_key specified with the -t option does not reference an
incremental backup, an error is returned.
--prefix <prefix_string>
If you specified the gpcrondump option --prefix <prefix_string> to create
the backup, you must specify this option with the <prefix_string> when
restoring the backup.
If you created a full backup of a set of tables with gpcrondump and
specified a prefix, you can use gpcrondump with the options
--list-filter-tables and --prefix <prefix_string> to list the tables
that were included or excluded for the backup.
-q (no screen output)
Run in quiet mode. Command output is not displayed on the screen, but is
still written to the log file.
-R <hostname>:<path_to_dumpset>
Allows you to provide a hostname and full path to a set of dump files.
The host does not have to be in the Greenplum Database array of hosts,
but must be accessible from the Greenplum master.
--redirect <database_name>
The name of the database where the data is restored. Specify this option
to restore data to a database that is different than the database
specified during back up. If <database_name> does not exist, it is
created.
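For example (the target database name is illustrative), the following
restores the backup identified by the timestamp key into a database
named sales_copy, creating it if it does not exist:
gpdbrestore -t 20121114064330 --redirect sales_copy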
--report-status-dir <report_directory>
Specifies the absolute path to the directory on each Greenplum
Database host (master and segment hosts) where gpdbrestore writes report
status files for a restore operation. If <report_directory> does not exist
or is not writable, gpdbrestore returns an error and stops.
If this option is not specified and the -u option is specified, report
status files are written to the location specified by the -u option if
the -u location is writable. If the location specified by -u option is
not writable, the report status files are written to segment data
directories.
--restore-stats [include|only]
Restores optimizer statistics if the statistics dump file
db_dumps/<date>/gp_statistics_1_1_<timestamp> is found in the master data
directory. Setting this option automatically skips the final analyze step,
so it is not necessary to also set the --noanalyze flag in conjunction with
this one.
Specify "--restore-stats only" to only restore the statistics dump file or
"--restore-stats include" to restore statistics along with a normal restore.
Defaults to "include" if neither argument is provided.
If "--restore-stats only" is specified along with the -e option, an error
is returned.
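For example (the timestamp is illustrative), the following restores only
the optimizer statistics from a backup:
gpdbrestore -t 20121114064330 --restore-stats only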
-s <database_name>
Looks for latest set of dump files for the given database name in the
segment data directories db_dumps directory on the Greenplum Database
array of hosts.
-t <timestamp_key>
The 14-digit timestamp key that uniquely identifies a backup set of data
to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files
matching this timestamp key in the segment data directories db_dumps
directory on the Greenplum Database array of hosts.
-T <schema>.<table_name>
Table names to restore, specify multiple times for multiple tables. The
named table(s) must exist in the backup set of the database being restored.
Existing tables are not automatically truncated before data is restored
from backup. If your intention is to replace existing data in the table
from backup, truncate the table prior to running gpdbrestore -T.
-S <schema>
Schema names to restore, specify multiple times for multiple schemas.
Existing tables are not automatically truncated before data is restored
from backup. If your intention is to replace existing data in the table
from backup, truncate the table prior to running gpdbrestore -S.
--table-file <file_name>
Specify a file <file_name> that contains a list of table names to restore.
The file contains any number of table names, listed one per line. See
the -T option for information about restoring specific tables.
--truncate
Truncate table data before restoring data to the table from the backup.
This option is supported only when restoring a set of tables with the
option -T or --table-file.
This option is not supported with the -e option.
-u <backup_directory>
Specifies the absolute path to the directory containing the db_dumps
directory on each host. If not specified, defaults to the data directory
of each instance to be backed up. Specify this option if you specified a
backup directory with the gpcrondump option -u when creating a backup
set.
If <backup_directory> is not writable, backup operation report status
files are written to segment data directories. You can specify a
different location where report status files are written with the
--report-status-dir option.
NOTE: This option is not supported if --ddboost is specified.
-v | --verbose
Specifies verbose mode.
--version (show utility version)
Displays the version of this utility.
-? (help)
Displays the online help.
*****************************************************
EXAMPLES
*****************************************************
Restore the sales database from the latest backup files generated by
gpcrondump (assumes backup files are in the segment data directories in
db_dumps):
gpdbrestore -s sales
Restore a database from backup files that reside on an archive host
outside the Greenplum Database array (command issued on the Greenplum
master host):
gpdbrestore -R archivehostname:/data_p1/db_dumps/20080214
Restore global objects only (roles and tablespaces):
gpdbrestore -G
NOTE: The -R option is not supported when restoring a backup set
that includes incremental backups.
If you restore from a backup set that contains an incremental backup, all the files in
the backup set must be available to gpdbrestore. For example, the following
timestamp keys specify a backup set. 20120514054532 is the full backup and the
others are incremental.
20120514054532
20120714095512
20120914081205
20121114064330
20130114051246
The following gpdbrestore command specifies the timestamp key 20121114064330.
The incremental backups with the timestamps 20120714095512 and 20120914081205
and the full backup must be available to perform the restore.
gpdbrestore -t 20121114064330
The following gpdbrestore command uses the --noplan option to restore only the
data that was backed up during the incremental backup with the timestamp key
20121114064330. Data in the previous incremental backups and the data in the full
backup are not restored.
gpdbrestore -t 20121114064330 --noplan
This gpdbrestore command restores Greenplum Database data from the data
managed by NetBackup master server nbu_server1. The option -t
20130530090000 specifies the timestamp generated by gpcrondump when the
backup was created. The -e option specifies that the target database is
dropped before it is restored.
gpdbrestore -t 20130530090000 -e --netbackup-service-host=nbu_server1
*****************************************************
SEE ALSO
*****************************************************
gpcrondump
[gpadmin@mdw ~]$
-j (vacuum before dump)
Run VACUUM before the dump starts.
-K <timestamp> [--list-backup-files]
Specify the <timestamp> that is used when creating a backup. The
<timestamp> is a 14-digit string that specifies a date and time in the
format yyyymmddhhmmss. The date is used for the backup directory name,
and the date and time are used in the backup file names. If -K
<timestamp> is not specified, a timestamp is generated based on the
system time.
When adding a backup to a set of backups, gpcrondump returns an error if
the <timestamp> does not specify a date and time that is more recent
than all other backups in the set.
--list-backup-files is optional. When you specify both this option and
the -K <timestamp> option, gpcrondump does not perform a backup.
gpcrondump creates two text files that contain the names of the files
that will be created when gpcrondump backs up a Greenplum database. The
text files are created in the same location as the backup files.
The file names use the timestamp specified by the -K <timestamp> option
and have the suffix _pipes and _regular_files. For example:
gp_dump_20130514093000_pipes
gp_dump_20130514093000_regular_files
The _pipes file contains a list of file names that can be created as
named pipes. When gpcrondump performs a backup, the backup data is
written to the named pipes. The _regular_files file contains a list
of backup files that must remain regular files. gpcrondump and
gpdbrestore use the information in the regular files during backup and
restore operations. To back up a complete set of Greenplum Database
backup files, the files listed in the _regular_files file must also be
backed up after the completion of the backup job.
To use named pipes for a backup, you need to create the named pipes on
all the Greenplum Database hosts and make them writable before running
gpcrondump (see the sketch after this option description).
If --ddboost is specified, -K <timestamp> [--list-backup-files] is not
supported.
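A minimal sketch for creating the pipes on a single host, assuming the
entries from the _pipes file that belong to that host have been copied
as plain paths into a local file named pipes_on_this_host.txt (the file
name is hypothetical):
while read p; do mkfifo "$p"; done < pipes_on_this_host.txt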
-k (vacuum after dump)
Run VACUUM after the dump has completed successfully.
-l <logfile_directory>
The directory to write the log file. Defaults to ~/gpAdminLogs.
--netbackup-block-size <size>
Specify the block size of data being transferred to the Symantec
NetBackup server. The default is 512 bytes. NetBackup options are not
supported if DDBoost backup options are specified.
--netbackup-keyword <keyword>
Specify a keyword for the backup that is transferred to the Symantec
NetBackup server. NetBackup adds the keyword property and the specified
<keyword> value to the NetBackup .img files that are created for the
backup.
The minimum length is 1 character, and the maximum length is 100
characters.
NetBackup options are not supported if DDBoost backup options are
specified.
--netbackup-service-host netbackup_server
The NetBackup master server that Greenplum Database connects to when
backing up to NetBackup. NetBackup options are not supported if DDBoost
backup options are specified.
--netbackup-policy <netbackup_policy>
The name of the NetBackup policy created for backing up Greenplum
Database. NetBackup options are not supported if DDBoost backup options
are specified.
--netbackup-schedule <netbackup_schedule>
The name of the NetBackup schedule created for backing up Greenplum
Database. NetBackup options are not supported if DDBoost backup options
are specified.
--no-owner
Do not output commands to set object ownership.
--no-privileges
Do not output commands to set object privileges (GRANT/REVOKE commands).
-o (clear old dump files only)
Clear out old dump files only, but do not run a dump. This will remove
the oldest dump directory except the current date's dump directory.
All dump sets within that directory will be removed.
WARNING: Before using this option, ensure that incremental backups
required to perform the restore are not deleted. The gpdbrestore utility
option --list-backup lists the backup sets required to perform a
restore.
You can specify the option --cleanup-date or --cleanup-total to specify
backup sets to delete.
If --ddboost is specified, only the old files on Data Domain Boost are
deleted.
If --incremental is specified, this option is not supported.
--oids
Include object identifiers (oid) in dump data.
If --incremental is specified, this option is not supported.
--prefix <prefix_string> [--list-filter-tables ]
Prepends <prefix_string> followed by an underscore character (_) to the
names of all the backup files created during a backup.
--list-filter-tables is optional. When you specify both options,
gpcrondump does not perform a backup. For the full backup created by
gpcrondump that is identified by the <prefix_string>, the tables that were
included or excluded for the backup are listed. You must also specify
the --incremental option if you specify the --list-filter-tables option.
If --ddboost is specified, --prefix <prefix_string> [--list-filter-tables]
is not supported.
-q (no screen output)
Run in quiet mode. Command output is not displayed on the screen, but is
still written to the log file.
-r (rollback on failure)
Roll back the dump files (delete a partial dump) if a failure is
detected. The default is to not rollback.
Note: This option is not supported if --ddboost is specified.
-R <post_dump_script>
The absolute path of a script to run after a successful dump operation.
For example, you might want a script that moves completed dump files to
a backup host. This script must reside in the same location on the
master and all segment hosts.
--rsyncable
Passes the --rsyncable flag to the gzip utility to synchronize the
output occasionally, based on the input during compression. This
synchronization increases the file size by less than 1% in most cases.
When this flag is passed, the rsync(1) program can synchronize
compressed files much more efficiently. The gunzip utility cannot
differentiate between a compressed file created with this option, and
one created without it.
-s <schema_name>
Dump all the tables that are qualified by the specified schema in the
database. The -s option can be specified multiple times. System catalog
schemas are not supported. If you want to specify multiple schemas, you
can also use the --schema-file=<filename> option in order not to exceed
the maximum token limit.
Only a set of tables or set of schemas can be specified. For example,
the -s option cannot be specified with the -t option.
If --incremental is specified, this option is not supported.
-S <schema_name>
A schema name to exclude from the database dump. The -S option can be
specified multiple times. If you want to specify multiple schemas, you
can also use the --exclude-schema-file=<filename> option in order not to
exceed the maximum token limit.
Only a set of tables or set of schemas can be specified. For example,
this option cannot be specified with the -t option.
If --incremental is specified, this option is not supported.
-t <schema>.<table>
Dump only the named table in this database. The -t option can be
specified multiple times. If you want to specify multiple tables, you
can also use the --table-file=<filename> option in order not to exceed the
maximum token limit.
Only a set of tables or set of schemas can be specified. For example,
this option cannot be specified with the -s option.
If --incremental is specified, this option is not supported.
-T <schema>.<table>
A table name to exclude from the database dump. The -T option can be
specified multiple times. If you want to specify multiple tables, you
can also use the --exclude-table-file=<filename> option in order not to
exceed the maximum token limit.
Only a set of tables or set of schemas can be specified. For example,
this option cannot be specified with the -s option.
If --incremental is specified, this option is not supported.
--exclude-schema-file=<filename>
Excludes all the tables that are qualified by the specified schemas
listed in the <filename> from the database dump. The file <filename>
contains any number of schemas, listed one per line.
Only a set of tables or set of schemas can be specified. For example,
this option cannot be specified with the -t option.
If --incremental is specified, this option is not supported.
--exclude-table-file=<filename>
Excludes all tables listed in the <filename> from the database dump. The
file <filename> contains any number of tables, listed one per line.
Only a set of tables or set of schemas can be specified. For example,
this option cannot be specified with the -s option.
If --incremental is specified, this option is not supported.
--schema-file=<filename>
Dumps only the tables that are qualified by the schemas listed in the
<filename>. The file <filename> contains any number of schemas, listed one
per line.
Only a set of tables or set of schemas can be specified. For example,
this option cannot be specified with the -t option.
If --incremental is specified, this option is not supported.
--table-file=<filename>
Dumps only the tables listed in the <filename>. The file <filename> contains
any number of tables, listed one per line.
If --incremental is specified, this option is not supported.
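For example, assuming a file /home/gpadmin/dump-tables.txt (the path and
table names are hypothetical) that lists one table per line:
public.orders
public.customers
sales.transactions
the following command dumps only those tables:
gpcrondump -x mydatabase --table-file=/home/gpadmin/dump-tables.txt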
-u <backup_directory>
Specifies the absolute path where the backup files will be placed on
each host. If the path does not exist, it will be created, if possible.
If not specified, defaults to the data directory of each instance to be
backed up. Using this option may be desirable if each segment host has
multiple segment instances as it will create the dump files in a
centralized location rather than the segment data directories.
Note: This option is not supported if --ddboost is specified.
--use-set-session-authorization
Use SET SESSION AUTHORIZATION commands instead of ALTER OWNER commands
to set object ownership.
-v | --verbose
Specifies verbose mode.
--version (show utility version)
Displays the version of this utility.
-x <database_name>
Required. The name of the Greenplum database to dump. Specify multiple times for
multiple databases.
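For example (the database names are illustrative), the following dumps
two databases in a single run:
gpcrondump -x sales -x marketing -a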
-y <reportfile>
This option is deprecated and will be removed in a future release. If
specified, a warning message is returned stating that the -y option is
deprecated.
Specifies the full path name where a copy of the backup job log file is
placed on the master host. The job log file is created in the master
data directory or if running remotely, the current working directory.
-z (no compression)
Do not use compression. Default is to compress the dump files using
gzip.
We recommend this option (-z) be used for NFS and Data Domain Boost
backups.
-? (help)
Displays the online help.
*****************************************************
File Format for Customized Emails
*****************************************************
You can configure gpcrondump to send an email notification after a back
up operation completes for a database. To customize the From and Subject
lines of the emails that are sent for a database, you create a YAML file
and specify the location of the file with the option --email-file. In
the YAML file, you can specify a different From and Subject line for
each database that gpcrondump backs up. This is the format of the YAML
file to specify a custom From and Subject line for a database:
EMAIL_DETAILS:
-
DBNAME: <database_name>
FROM: <from_user>
SUBJECT: <subject_text>
When email notification is configured for gpcrondump, the <from_user> and
the <subject_text> are the strings that gpcrondump uses in the email
notification after completing the back up for <database_name>.
This example YAML file specifies different From and Subject lines for
the databases testdb100 and testdb200.
EMAIL_DETAILS:
-
DBNAME: testdb100
FROM: RRP_MPE2_DCA_1
SUBJECT: backup completed for Database 'testdb100'
-
DBNAME: testdb200
FROM: Report_from_DCDDEV_host
SUBJECT: Completed backup for database 'testdb200'
*****************************************************
EXAMPLES
*****************************************************
Call gpcrondump directly and dump mydatabase (and global objects):
gpcrondump -x mydatabase -c -g -G
A crontab entry that runs a backup of the sales database (and global
objects) nightly at one past midnight:
01 0 * * * /home/gpadmin/gpdump.sh >> gpdump.log
The content of dump script gpdump.sh is:
#!/bin/bash
# Set up the Greenplum environment for the non-interactive cron shell.
export GPHOME=/usr/local/greenplum-db
export MASTER_DATA_DIRECTORY=/data/gpdb_p1/gp-1
. $GPHOME/greenplum_path.sh
# Dump the sales database and global objects (-G), copying configuration
# files (-g) and clearing old dump files first (-c), without prompting
# (-a) and without screen output (-q).
gpcrondump -x sales -c -g -G -a -q
This example creates two text files, one with the suffix _pipes and the
other with _regular_files. The _pipes file contains the file names that
can be created as named pipes when you back up the Greenplum database mytestdb.
gpcrondump -x mytestdb -K 20131030140000 --list-backup-files
To use incremental backup with a set of database tables, you must
create a full backup of the set of tables and specify the --prefix
option to identify the backup set. The following example uses the
--table-file option to create a full backup of the set of tables listed
in the file user-tables. The prefix user_backup identifies the backup
set.
gpcrondump -x mydatabase --table-file=user-tables --prefix user_backup
To create an incremental backup for the full backup created in the
previous example, specify the --incremental option and the option
--prefix user_backup to identify the backup set. This example creates an
incremental backup.
gpcrondump -x mydatabase --incremental --prefix user_backup
This command lists the tables that were included or excluded for the
full backup.
gpcrondump -x mydatabase --incremental --prefix user_backup --list-filter-tables
This command backs up the database customer and specifies a NetBackup
policy and schedule that are defined on the NetBackup master server
nbu_server1. A block size of 1024 bytes is used to transfer data to the
NetBackup server.
gpcrondump -x customer --netbackup-service-host=nbu_server1
--netbackup-policy=gpdb_cust --netbackup-schedule=gpdb_backup
--netbackup-block-size=1024
*****************************************************
SEE ALSO
*****************************************************
gpdbrestore
[gpadmin@mdw ~]$