Greenplum Database New Features - Enhancements in Managing and Migrating Greenplum Database Objects and Data

1. Enhancement for Restoring Data from a Greenplum Database Backup (GPDB 4.3.1.0)

2. Managing and Migrating Greenplum Database Objects and Data (GPDB 4.3.2.0)   a) Two new utilities, gptransfer and gpreload, added. b) The --redirect and --report-status-dir options have been added to the Greenplum Database utility gpdbrestore.

3. Managing Greenplum Database Objects and Data : a) The GRANT and REVOKE commands support the TRUNCATE privilege on a table (GPDB 4.3.3.0)  b) Capturing errors that occur when reading data from external data sources does not require an error table (GPDB 4.3.3.0)  c) Delta compression is enabled for compressed columns in append-optimized, column-oriented tables (GPDB 4.3.3.0)

Enhancements in Managing and Migrating Greenplum Database Objects and Data (4.3.1.0, 4.3.2.0, 4.3.3.0)

1. [ Added in GPDB 4.3.1.0] : When you run the gpdbrestore command to restore your database or tables from a backup, the restored tables are automatically analyzed after the restore. Analyzing large tables takes a long time and can significantly increase your total restore time. To shorten the restore time, Greenplum added a new option to the gpdbrestore utility called --noanalyze.

The new option --noanalyze for the Greenplum Database gpdbrestore utility disables ANALYZE of tables during restore.

The default action is to run the ANALYZE command after a restore. This option is useful if running ANALYZE on tables in your database requires a significant amount of time. If you specify this option, you should run ANALYZE manually on restored tables. 

Note: It is important to analyze the tables after a restore. If you skip this step following a restore, poor database performance might result.
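When --noanalyze is used, the manual follow-up step can be scripted. The sketch below is an illustration (the table names are hypothetical examples, not from the source): it generates the ANALYZE statements you would run through psql after the restore completes.

```python
# Hypothetical sketch: after gpdbrestore --noanalyze, generate the ANALYZE
# statements to run manually. Table names here are made-up examples.
restored_tables = ["public.sales", "public.customers"]

def analyze_statements(tables):
    # One ANALYZE per restored table; feed these to psql afterwards.
    return ["ANALYZE {};".format(t) for t in tables]

for stmt in analyze_statements(restored_tables):
    print(stmt)
```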

2. [ Added in GPDB 4.3.2.0] gptransfer utility : The gptransfer utility copies database objects from databases in a source Greenplum Database system to databases in a destination Greenplum Database system. The gptransfer utility is supported only with the IPv4 protocol; it is not supported with IPv6.

*****************************************************

SYNOPSIS

*****************************************************

gptransfer  { --full | [-d <database1> [ -d <database2> ... ]]

  [-t <db.schema.table> [ -t <db1.schema1.table1> ... ]]

  [-f <table_file> ] } 

  [--delimiter <delim> ]

  [--skip-existing | --truncate | --drop] 

  [--analyze] [--validate=<type> ] [-x] [--dry-run] 

  [--schema-only ]

  [--source-host=<source_host> [--source-port=<source_port>] 

  [--source-user=<source_user>] ]

  [--base-port=<base_gpfdist_port>]

  [--dest-host=<dest_host> --source-map-file=<host_map_file> 

  [--dest-port=<dest_port>] [--dest-user=<dest_user>] ]

  [--dest-database=<dest_database_name>] 

  [--batch-size=<batch_size>] [--sub-batch-size=<sub_batch_size>]

  [--timeout <seconds>]

  [--max-line-length=<length>] 

  [--work-base-dir=<work_dir>] [-l <log_dir>] 

  [-v | --verbose] 

  [-q |--quiet] 

  [-a] 

gptransfer --version 

gptransfer -h | -? | --help 

*****************************************************

DESCRIPTION

*****************************************************

The gptransfer utility copies database objects from a source Greenplum Database system to a destination system. You can perform one of the

following types of operations:

* Copy a Greenplum Database system with the --full option.

This option copies all user created databases in a source system to a different destination system. If you specify the --full option, you must

specify both a source and destination system. The destination system cannot contain any user defined databases, only the default databases

postgres, template0, and template1.

* Copy a set of user defined database tables to a destination system. The -f and -t options copy a specified set of user defined tables and

table data, and re-creates the table indexes. The -d option copies all user defined tables and table data, and re-creates the table indexes

from a specified database.

If the destination system is the same as the source system, you must also specify a destination database with the --dest-database option.

When you specify a destination database, the source database tables are copied into the specified destination database.

If an invalid set of gptransfer options are specified, or if a specified source table or database does not exist, gptransfer returns an error and

quits. No data is copied.

To copy database objects between Greenplum Database systems, the gptransfer utility uses:

* The Greenplum Database utility gpfdist on the source database system. The gpfdists protocol is not supported.

* Writeable external tables on the source database system and readable external tables on the destination database system.

* Named pipes that transfer the data between a writeable external table and a readable external table.

When copying data into the destination system, it is redistributed on the Greenplum Database segments of the destination system. This is the

flow of data when gptransfer copies database data:

writable external table > gpfdist > named pipe > gpfdist > readable external table
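The named-pipe hop in that flow can be illustrated with a small self-contained sketch (this is a conceptual stand-in, not gptransfer's implementation): one thread writes rows into a FIFO, simulating the writable-external-table side, and the main thread reads them back, simulating the readable-external-table side.

```python
import os
import tempfile
import threading

# Illustrative sketch of the named-pipe hop in the gptransfer data flow.
fifo = os.path.join(tempfile.mkdtemp(), "xfer.pipe")
os.mkfifo(fifo)

rows = ["1|alice", "2|bob"]

def writer():
    # Stands in for the writable external table / source gpfdist side.
    with open(fifo, "w") as w:
        w.write("\n".join(rows))

t = threading.Thread(target=writer)
t.start()
# Stands in for the readable external table / destination side.
with open(fifo) as r:
    received = r.read().splitlines()
t.join()
print(received)
```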

*****************************************************

NOTES

*****************************************************

When copying database data between different Greenplum Database systems, gptransfer requires a text file that lists all the source segment host 

names and IP addresses. Specify the name and location of the file with the --source-map-file option. If the file is missing or not all segment 

hosts are listed, gptransfer returns an error and quits. See the description of the option for file format information. 

The source and destination Greenplum Database segment hosts need to be able to communicate with each other. To ensure that the segment hosts 

can communicate, you can use a tool such as the Linux netperf utility. 

If a filespace has been created for a source Greenplum Database system, a corresponding filespace must exist on the target system. 

SSH keys must be exchanged between the two systems before using gptransfer. The gptransfer utility connects to the source system with 

SSH to create the named pipes and start the gpfdist instances. You can use the Greenplum Database gpssh-exkeys utility with a list of all the 

source and destination primary hosts to exchange keys between Greenplum Database hosts. 

Source and destination systems must be able to access the gptransfer work directory. The default directory is the user's home directory. You 

can specify a different directory with the --work-base-dir option. 

The gptransfer utility does not move configuration files such as postgres.conf and pg_hba.conf. You must set up the destination system 

configuration separately. 

The gptransfer utility does not move external objects such as Greenplum Database extensions, third-party jar files, and shared object files. You 

must install the external objects separately. 

The gptransfer utility does not move dependent database objects unless you specify the --full option. For example, if a table has a default 

value on a column that is a user-defined function, that function must exist in the destination system database when using the -t, -d, or -f 

options. 

If you move a set of database tables with the -d, -t, and -f options, and the destination table or database does not exist, gptransfer creates 

it. The utility re-creates any indexes on tables before copying data. 

If a table exists on the destination system and one of the options --skip-existing, --truncate, or --drop is not specified, gptransfer returns an error and quits. 

If an error occurs during the process of copying a table, or table validation fails, gptransfer continues copying the other specified tables. After gptransfer finishes, it displays a list of tables where an error occurred, writes the names of the tables that failed into a text file, and then prints the name of the file. You can use this file with the gptransfer -f option to retry copying tables. 

The name of the file that contains the list of tables where errors occurred is failed_migrated_tables_<yyyymmdd_hhmmss>.txt. The <yyyymmdd_hhmmss> is a time stamp of when the gptransfer process was started. The file is created in the directory where gptransfer is executed. 
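The naming convention can be sketched as follows (a hedged illustration of the documented pattern, not gptransfer's own code):

```python
import time

# Sketch: reproduce the failed-table file naming convention described above.
# The timestamp is the time the gptransfer process was started.
def failed_file_name(started):
    return "failed_migrated_tables_%s.txt" % time.strftime(
        "%Y%m%d_%H%M%S", started)

name = failed_file_name(time.localtime())
print(name)
# A retry would then reuse this file:  gptransfer -f <name> ...
```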

The gp_external_max_segs server configuration parameter controls the number of segment instances that can access a single gpfdist instance 

simultaneously. Setting a low value might affect gptransfer performance. For information about the parameter, see the "Greenplum Database 

Reference Guide". 

*****************************************************

LIMITATION FOR THE SOURCE AND DESTINATION SYSTEMS

*****************************************************

If you are copying data from a system with a larger number of segments to a system with fewer segment hosts, the total number of primary segments on the destination system must be greater than or equal to the total number of segment hosts on the source system. 

For example, a quarter rack V1 DCA has a total of 24 primary segments. This means the source side cannot have more than 24 segment hosts (one 

and one-half racks). 

When you copy data from a source Greenplum Database system with a larger number of primary segment instances than on the destination system, the 

data transfer might be slower when compared to a transfer where the source system has fewer segment instances than the destination system. 

The gptransfer utility uses a different configuration of named pipes and gpfdist instances in the two situations. 

*****************************************************

OPTIONS

*****************************************************

-a 

 Quiet mode, do not prompt the user for confirmation. 

--analyze 

 Run the ANALYZE command on non-system tables. The default is to not run  the ANALYZE command. 

--base-port=<base_gpfdist_port> 

 Base port for gpfdist on source segment systems. If not specified, the  default is 8000. 

--batch-size=<batch_size> 

 Sets the maximum number of tables that gptransfer concurrently copies to  the destination database. If not specified, the default is 2. The 

 maximum is 10. 

-d <database_name> 

 A source database to copy. This option can be specified multiple times  to copy multiple databases to the destination system. All the user  defined tables and table data are copied to the destination system.  Wildcard characters are not supported. 

 If the source database does not exist, gptransfer returns an error and  quits. If a destination database does not exist, it is created. 

 Alternatively, specify the -t or -f option to copy a specified set of  tables. 

 Not valid with the --full option. 

--delimiter=<delim> 

 Delimiter to use for external tables. Specify a single ASCII character  that separates columns within each row of data. The default is octal 

 character \001, the SOH character. 
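A quick sketch of what the default delimiter looks like in practice (the row content is a made-up example): each column in a row of external-table data is separated by the single SOH byte, \x01.

```python
# Sketch: the default external-table delimiter is the SOH character
# (octal \001, i.e. byte 0x01). Example row with three columns.
row = "1001\x01Smith\x01NYC"
cols = row.split("\x01")
print(cols)
```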

--dest-database=<dest_database_name> 

 The database in the destination Greenplum Database system. If not  specified, the source tables are copied into a destination system 

 database with the same name as the source system database. 

 This option is required if the source and destination Greenplum Database  systems are the same. 

 If the destination database does not exist, it is created. 

 Not valid with the --full option. 

--dest-host=<dest_host> 

 Destination Greenplum Database hostname or IP address. If not specified,  the default is the host running gptransfer (127.0.0.1). 

--dest-port=<dest_port> 

 Destination Greenplum Database port number. If not specified, the  default is 5432. 

--dest-user=<dest_user> 

 User ID that is used to connect to the destination Greenplum Database  system. If not specified, the default is the user gpadmin. 

--drop 

 Specify this option to drop the table that is in the destination  database if it already exists. Before copying table data, gptransfer 

 drops the table and creates it again. 

 At most, only one of the options can be specified --skip-existing,  --truncate, or --drop. If one of them is not specified and the table 

 exists in the destination system, gptransfer returns an error and quits. 

 Not valid with the --full option. 

--dry-run 

 When you specify this option, gptransfer generates a list of the  migration operations that would have been performed with the specified 

 options. The data is not migrated. 

 The information is displayed at the command line and written to the log  file. 

-f <table_file> 

 The location and name of a file containing a list of fully qualified table  names to copy from the Greenplum Database source system. In the text 

 file, you specify a single fully qualified table per line. Wildcard  characters are not supported. 

  sales.public.clients

  sales.public.merchants

  demo.test.lineitem 

 If the source table does not exist, gptransfer returns an error and  quits. If the destination database or table does not exist, it is 

 created. 

 Only the table and table data are copied and indexes are re-created.  Dependent objects are not copied. 

 You cannot specify views or system catalog tables. The --full option  copies user defined views. 

 If you specify the -d option to copy all the tables from a database, you  do not need to specify individual tables from the database. 

 

 Not valid with the --full option. 
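The table-file format above lends itself to a simple validity check. This is a minimal sketch assuming the documented rules (one fully qualified db.schema.table per line, no wildcards); it is not part of gptransfer itself.

```python
# Sketch: validate entries of a gptransfer -f table file. Each line must
# be a fully qualified db.schema.table name; wildcards are not supported.
def valid_entry(line):
    parts = line.strip().split(".")
    return len(parts) == 3 and all(parts) and "*" not in line

lines = ["sales.public.clients", "sales.public.merchants", "badname"]
print([l for l in lines if valid_entry(l)])
```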

--full 

 Full migration of a Greenplum Database source system to a destination  system. You must specify the options for the destination system, the 

 --source-map-file option, the --dest-host option, and if necessary, the  other destination system options. 

 The --full option cannot be specified with the -t, -d, or -f options. 

 A full migration copies all database objects including tables,  indexes, users, roles, functions, and resource queues for all user  defined databases. The default databases postgres, template0, and  template1 are not moved. 

 If a database exists in the destination system, besides the default  postgres, template0 and template1 databases, gptransfer returns an error 

 and quits. 

-l <log_dir> 

 Specify the gptransfer log file directory. If not specified, the default 

 is ~/gpAdminLogs. 

--max-line-length=<length> 

 Sets the maximum allowed data row length in bytes for the gpfdist  utility. If not specified, the default is 10485760. The valid range is 

 32768 (32K) to 268435456 (256MB). 

 Should be used when user data includes very wide rows (or when line too  long error message occurs). Should not be used otherwise as it increases 

 resource allocation. 

-q | --quiet 

 If specified, suppress status messages. Messages are only sent to the  log file. 

--schema-only 

 Create only the schemas specified by the command. Data is not transferred. 

 If specified with the --full option, gptransfer replicates the complete  database schema, including all tables, indexes, views, user defined 

 types (UDT), and user defined functions (UDF) for the source databases. 

 No data is transferred. 

 If you specify tables with the -t or -f option with --schema-only,  gptransfer creates only the tables and indexes. No data is transferred. 

 Not valid with the --truncate option. 

--skip-existing 

 Specify this option to skip copying a table from the source database if  the table already exists in the destination database. 

 At most, only one of the options can be specified --skip-existing,  --truncate, or --drop. If one of them is not specified and the table 

 exists in the destination system, gptransfer returns an error and quits. 

 Not valid with the --full option. 

--source-host=<source_host> 

 Source Greenplum Database host name or IP address. If not specified, the  default host is the system running gptransfer (127.0.0.1). 

--source-map-file=<host_map_file> 

 File that lists source segment host name and IP addresses. If the file  is missing or not all segment hosts are listed, gptransfer returns an 

 error and quits. 

 Each line of the file contains a source host name and the host IP  address separated by a comma: hostname,IPaddress. This example lists 

 four Greenplum Database hosts and their IP addresses. 

  sdw1,172.168.2.1

  sdw2,172.168.2.2

  sdw3,172.168.2.3

  sdw4,172.168.2.4 

 This option is required if the --full option is specified or if the  source Greenplum Database system is different than the destination  system. This option is not required if source and destination systems  are the same. 
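The hostname,IPaddress format above is simple enough to parse directly. The following is an illustrative sketch (not gptransfer's parser), using the example hosts from the option description:

```python
# Sketch: parse a source-map file of "hostname,IPaddress" lines.
def parse_source_map(text):
    hosts = {}
    for line in text.strip().splitlines():
        hostname, ip = line.split(",", 1)  # one comma separates the fields
        hosts[hostname.strip()] = ip.strip()
    return hosts

sample = "sdw1,172.168.2.1\nsdw2,172.168.2.2"
print(parse_source_map(sample))
```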

--source-port=<source_port> 

 Source Greenplum Database port number. If not specified, the default is  5432. 

--source-user=<source_user> 

 User ID that is used to connect to the source Greenplum Database system.  If not specified, the default is the user gpadmin. 

--sub-batch-size=<sub_batch_size> 

 Specifies the maximum degree of parallelism of the operations performed  when migrating a table such as starting gpfdist instances, creating 

 named pipes for the move operations. If not specified, the default is  25. The maximum is 50. 

 Specify the --batch-size option to control the maximum number of  tables that gptransfer concurrently processes. 

 

-t <db.schema.table> 

 A table from the source database system to copy. The fully qualified  table name must be specified, for example -t sales.public.clients. 

 Wildcard characters are not supported. 

 If the destination table or database does not exist, it is created. This  option can be specified multiple times to include multiple tables. Only 

 the table and table data are copied and indexes are re-created.  Dependent objects are not copied. 

 If the source table does not exist, gptransfer returns an error and  quits. 

 If you specify the -d option to copy all the tables from a database, you  do not need to specify individual tables from the database. 

 Not valid with the --full option. 

--timeout <seconds> 

 Specify the timeout value in seconds that gptransfer passes to the  gpfdist processes that it uses. The value is the time allowed for 

 Greenplum Database to establish a connection to a gpfdist process. You  might need to increase this value when operating on high-traffic 

 networks. 

 The default value is 300 seconds (5 minutes). The minimum value is 2  seconds, the maximum value is 600 seconds. 

--truncate 

 Specify this option to truncate the table that is in the destination  database if it already exists. 

 At most, only one of the options can be specified --skip-existing, 

 --truncate, or --drop. If one of them is not specified and the table 

 exists in the destination system, gptransfer returns an error and quits. 

 Not valid with the --full option. 

--validate=<type> 

 Perform data validation on table data. These are the supported types of  validation. 

  count - Specify this value to compare row counts between source and destination table data. 

  MD5 - Specify this value to compare MD5 values between source and destination table data. 

 If validation for a table fails, gptransfer displays the name of the table  and writes the table name to the text file  failed_migrated_tables_<yyyymmdd_hhmmss>.txt. The yyyymmdd_hhmmss is a  time stamp of when the gptransfer process was started. The file is created  in the directory where gptransfer is executed. 

 NOTE: The file contains the names of tables where validation failed or where other  errors occurred during table migration. 
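The two validation modes can be sketched conceptually. This is a simplified illustration (using hashlib and order-independent row text), not what gptransfer actually computes internally:

```python
import hashlib

# Sketch of the two --validate modes: comparing row counts, and comparing
# an MD5 digest over canonicalized row text (a simplification).
def row_count(rows):
    return len(rows)

def table_md5(rows):
    h = hashlib.md5()
    for r in sorted(rows):  # sort so row order does not affect the digest
        h.update(r.encode())
    return h.hexdigest()

src = ["1|alice", "2|bob"]
dst = ["2|bob", "1|alice"]
print(row_count(src) == row_count(dst), table_md5(src) == table_md5(dst))
```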

-v |--verbose 

 If specified, sets the logging level to verbose. Additional log 

 information is written to the log file and the command line during 

 command execution. 

--work-base-dir=<work_dir> 

 Specify the directory that gptransfer uses to store temporary working  files such as PID files and named pipes. The default directory is the  user's home directory. 

 Source and destination systems must be able to access the gptransfer  work directory. 

-x 

 Acquire an exclusive lock on tables during the migration to prevent  inserts or updates. 

 On the source database, an exclusive lock is acquired when gptransfer  inserts into the external table and is released after validation. 

 On the destination database, an exclusive lock is acquired when  gptransfer selects from external table and released after validation. 

 NOTE: If the -x option is not specified and --validate is specified,  validation failures occur if data is inserted into either the source or 

 destination table during the migration process. The gptransfer utility  displays messages if validation errors occur. 

-h | -? | --help 

 Displays the online help. 

--version 

 Displays the version of this utility. 

*****************************************************

EXAMPLE

*****************************************************

This command copies the table public.t1 from the database db1 and all tables in the database db2 to the system mytest2. 

 gptransfer -t db1.public.t1 -d db2 --dest-host=mytest2 

    --source-map-file=gp-source-hosts --truncate

If the databases db1 and db2 do not exist on the system mytest2, they are created. If any of the source tables exist on the destination 

system, gptransfer truncates the table and copies the data from the source to the destination table. 

*****************************************************

See Also

*****************************************************

gpfdist 

See the "Greenplum Database Administrator Guide" for information about loading and unloading data. 


3. [ Added in GPDB 4.3.2.0] gpreload utility :  The gpreload utility reloads table data with column data sorted. For tables that were created with the table storage option APPENDONLY=TRUE and compression enabled, reloading the data with sorted data can improve table compression. You specify a list of tables to be reloaded and the table columns to be sorted in a text file.

COMMAND NAME: gpreload

Reloads Greenplum Database table data sorting the data based on specified columns.

*****************************************************

SYNOPSIS

*****************************************************

gpreload -d <database> [-p <port>]    {-t |--table-file} <path_to_file>  [-a ]

gpreload -h | --version

*****************************************************

DESCRIPTION

*****************************************************

The gpreload utility reloads table data with column data sorted. For tables that were created with the table storage option APPENDONLY=TRUE

and compression enabled, reloading the data with sorted data can improve table compression. You specify a list of tables to be reloaded and the table columns to be sorted in a text file.

Compression is improved by sorting data when the data in the column has a relatively low number of distinct values when compared to the total

number of rows.

For a table being reloaded, the order of the columns to be sorted might affect compression. The columns with fewest distinct values should be

listed first. For example, listing state then city would generally result in better compression than listing city then state.

 public.cust_table: state, city

 public.cust_table: city, state

For information about the format of the file used with gpreload, see the

--table-file option.
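The compression effect described above can be demonstrated with a quick check. This is an illustration with zlib on a synthetic low-cardinality column, not what gpreload itself does:

```python
import random
import zlib

# Sketch: sorting a low-cardinality column groups repeated values into
# runs, which typically compresses better than a shuffled ordering.
random.seed(0)  # deterministic shuffle for reproducibility
states = ["NY", "CA", "TX"] * 1000
random.shuffle(states)

unsorted_blob = "\n".join(states).encode()
sorted_blob = "\n".join(sorted(states)).encode()

print(len(zlib.compress(unsorted_blob)), len(zlib.compress(sorted_blob)))
```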

*****************************************************

NOTES

****************************************************

To improve reload performance, indexes on tables being reloaded should be removed before reloading the data.

Running the ANALYZE command after reloading table data might improve query performance because of the change in the data distribution of the reloaded data.

*****************************************************

OPTIONS

*****************************************************

-a (do not prompt)

 Optional. If specified, the gpreload utility does not prompt the user  for confirmation. 

-d <database>

 The database that contains the tables to be reloaded. The gpreload  utility connects to the database as the user running the utility. 

-p <port> 

 The Greenplum Database master port. If not specified, the value of the  PGPORT environment variable is used. If the value is not available, an 

 error is returned. 

{-t | --table-file} <path_to_file> 

 The location and name of a file containing a list of schema qualified table  names to reload and the column names to reorder from the Greenplum 

 Database. Only user defined tables are supported. Views or system  catalog tables are not supported. 

 If indexes are defined on a table listed in the file, gpreload prompts to  continue. 

 Each line specifies a table name and the list of columns to sort. This  is the format of each line in the file: 

  <schema>.<table_name>: <column> [desc] [, <column2> [desc] ... ]

 The table name is followed by a colon ( : ) and then at least one column  name. If you specify more than one column, separate the column names 

 with a comma. The columns are sorted in ascending order. Specify the  keyword desc after the column name to sort the column in descending 

 order. 

 Wildcard characters are not supported. 

 If there are errors in the file, gpreload reports the first error and  exits. No data is reloaded. 

 The following example reloads three tables. 

  public.clients: region, state, rep_id desc

  public.merchants: region, state

  test.lineitem: group, assy, whse 

 In the first table public.clients, the data in the rep_id column is  sorted in descending order. The data in the other columns are sorted in 

 ascending order. 
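The line format described above is easy to parse. The following sketch (an illustration under the documented format, not gpreload's own parser) turns a line into a table name plus an ordered list of (column, descending) pairs:

```python
# Sketch: parse one line of a gpreload table file:
#   <schema>.<table_name>: <column> [desc] [, <column2> [desc] ... ]
def parse_line(line):
    table, cols = line.split(":", 1)
    order = []
    for spec in cols.split(","):
        parts = spec.split()
        # The trailing keyword "desc" marks a descending sort column.
        order.append((parts[0], parts[-1].lower() == "desc"))
    return table.strip(), order

print(parse_line("public.clients: region, state, rep_id desc"))
```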

--version (show utility version)

 Displays the version of this utility.

-? (help)

 Displays the online help.

*****************************************************

EXAMPLE

*****************************************************

This example command reloads the tables in the database mytest that are listed in the file data-tables.txt. 

 gpreload -d mytest --table-file data-tables.txt

*****************************************************

SEE ALSO

*****************************************************

The CREATE TABLE command in the "Greenplum Database Reference Guide."       

4. [ Added in GPDB 4.3.2.0] : The --redirect and --report-status-dir options 

The --redirect and --report-status-dir options have been added to the Greenplum Database utility gpdbrestore.

--redirect: When restoring data from a backup created with the gpcrondump utility, you can specify the --redirect option and a database name to restore data to a database that is different from the database specified during backup.

--report-status-dir: If you run a gpcrondump operation with the -u option to specify a backup directory that is read-only, you can specify --report-status-dir so that gpcrondump writes the report files to a directory with write privileges. The --report-status-dir option specifies the absolute path to the directory on each Greenplum Database host (master and segment hosts) where gpdbrestore writes report status files for a restore operation.



COMMAND NAME: gpdbrestore

Restores a database from a set of dump files generated by gpcrondump.

*****************************************************

SYNOPSIS

*****************************************************

gpdbrestore { -t <timestamp_key> { [-L]

| [--netbackup-service-host <netbackup_server>

[--netbackup-block-size <size>] ] }

| -b <YYYYMMDD>

| -R <hostname>:<path_to_dumpset>

| -s <database_name> }

[--noplan] [--noanalyze] [-u <backup_directory>] [--list-backup]

[--prefix <prefix_string> ] [--report-status-dir <report_directory> ]

[-T <schema>.<table> [,...]] [--table-file <file_name>]

[--truncate] [-e] [-G]

[-B <parallel_processes>]

[-d <master_data_directory>] [-a] [-q] [-l <logfile_directory>]

[-v] [--ddboost ]

[--redirect <database_name> ]

gpdbrestore -?

gpdbrestore --version

*****************************************************

DESCRIPTION

*****************************************************

The gpdbrestore utility recreates the data definitions (schema) and user data in a Greenplum database using the script files created by gpcrondump operations.

When you restore from an incremental backup, the gpdbrestore utility assumes the complete backup set is available. The complete backup set includes the following backup files:

* The last full backup before the specified incremental backup

* All incremental backups created between the time of the full backup and the specified incremental backup

The gpdbrestore utility provides the following functionality:

* Automatically reconfigures for compression.

* Validates that the number of dump files is correct (for primary only, mirror only, primary and mirror, or a subset consisting of some mirror and primary segment dump files).

* If a failed segment is detected, restores to active segment instances.

* Except when restoring data from a NetBackup server, you do not need to know the complete timestamp key (-t) of the backup set to restore. Additional options are provided to instead give just a date (-b), backup set directory location (-R), or database name (-s) to restore.

* The -R option allows you to restore from a backup set located on a host outside of the Greenplum Database array (archive host), and ensures that the correct dump file goes to the correct segment instance.

* Identifies the database name automatically from the backup set.

* Allows you to restore particular tables only (-T option) instead of the entire database. Note that single tables are not automatically dropped or truncated prior to restore.

* Performs an ANALYZE operation on the tables that are restored. You can disable the ANALYZE operation by specifying the option --noanalyze.

* Can restore global objects such as roles and tablespaces (-G option).

* Detects if the backup set is primary segments only or primary and mirror segments and passes the appropriate options to gp_restore.

* Allows you to drop the target database before a restore in a single operation.
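The incremental-restore logic described above (last full backup, plus every incremental between it and the target) can be sketched as follows. The timestamps are hypothetical examples; this is an illustration of the rule, not gpdbrestore's implementation:

```python
# Sketch: given backup timestamps, the restore set for an incremental
# backup is the last full backup at or before it, plus every incremental
# between that full backup and the target.
def restore_set(fulls, incrementals, target):
    base = max(f for f in fulls if f <= target)  # last full before target
    chain = [i for i in sorted(incrementals) if base < i <= target]
    return [base] + chain

fulls = ["20140301_000000", "20140308_000000"]
incrs = ["20140309_000000", "20140310_000000", "20140311_000000"]
print(restore_set(fulls, incrs, "20140310_000000"))
```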

*****************************************************

Restoring a Database from NetBackup

*****************************************************

Greenplum Database must be configured to communicate with the Symantec NetBackup master server that is used to restore database data. See "Backing Up and Restoring Databases" in the "Greenplum Database Administrator Guide" for information about configuring Greenplum Database and NetBackup.

When restoring from NetBackup server, you must specify the timestamp of the backup with the -t option.

NetBackup is not compatible with DDBoost. NetBackup and DDBoost cannot both be used in a single backup operation.

*****************************************************

Restoring a Database with Named Pipes

*****************************************************

If you used named pipes when you backed up a database with gpcrondump, named pipes with the backup data must be available when restoring the database from the backup.

*****************************************************

Error Reporting

*****************************************************

gpdbrestore does not report errors automatically. After the restore is completed, check the report status files to verify that there are no errors. The restore status files are stored in the db_dumps/<date>/ directory by default.

*****************************************************

OPTIONS

*****************************************************

-a (do not prompt)

Do not prompt the user for confirmation.

-b <YYYYMMDD>

Looks for dump files in the segment data directories on the Greenplum Database array of hosts in db_dumps/<YYYYMMDD>. If --ddboost is specified, the systems looks for dump files on the Data Domain Boost host.

-B <parallel_processes>

The number of segments to check in parallel for pre/post-restore validation. If not specified, the utility will start up to 60 parallel processes depending on how many segment instances it needs to restore.

-d <master_data_directory>

Optional. The master host data directory. If not specified, the value set for $MASTER_DATA_DIRECTORY will be used.

--ddboost

Use Data Domain Boost for this restore, if the --ddboost option was passed when the data was dumped. Before using Data Domain Boost, make sure the one-time Data Domain Boost credential setup is complete. See the Greenplum Database System Administrator Guide for details.

If you backed up Greenplum Database configuration files with the gpcrondump option -g and specified the --ddboost option, you must manually restore the backup from the Data Domain system. The configuration files must be restored for the Greenplum Database master and all the hosts and segments. The backup location on the Data Domain system is the directory GPDB/<backup_directory>/<date>. The <backup_directory> is set when you specify the Data Domain credentials with gpcrondump.

This option is not supported if --netbackup-service-host is specified.

-e (drop target database before restore)

Drops the target database before doing the restore and then recreates it.

-G (restore global objects)

Restores global objects such as roles and tablespaces if the global object dump file db_dumps/<date>/gp_global_1_1_<timestamp> is found in the master data directory.

-l <logfile_directory>

The directory to write the log file. Defaults to ~/gpAdminLogs.

--list-backup

Lists the set of full and incremental backup sets required to perform a restore based on the <timestamp_key> specified with the -t option and the location of the backup set.

This option is supported only if the <timestamp_key> is for an incremental backup.
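
For example, to see which full and incremental backup sets a restore would need (the timestamp key shown is illustrative):

```shell
# List the full and incremental backup sets required to restore
# the incremental backup identified by this timestamp key.
gpdbrestore -t 20121114064330 --list-backup
```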

-L (list tablenames in backup set)

When used with the -t option, lists the table names that exist in the named backup set and exits. Does not perform a restore.

--netbackup-block-size <size>

Specify the block size, in bytes, of data being transferred from the Symantec NetBackup server. The default is 512 bytes.

NetBackup options are not supported if DDBoost backup options are specified.

--netbackup-service-host <netbackup_server>

The NetBackup master server that Greenplum Database connects to when backing up to NetBackup. If you specify this option, you must specify the timestamp of the backup with the -t option.

This option is not supported with any of these options: -R, -s, -b, -L, or --ddboost.

--noanalyze

The ANALYZE command is not run after a successful restore. The default is to run the ANALYZE command on restored tables. This option is useful if running ANALYZE on the tables requires a significant amount of time.

If this option is specified, you should run ANALYZE manually on restored tables. Failure to run ANALYZE following a restore might result in poor database performance.
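
A typical workflow is to skip ANALYZE during the restore and run it manually afterward, at a convenient time (the database, table, and timestamp names below are illustrative):

```shell
# Restore without running ANALYZE, then analyze the restored table manually.
gpdbrestore -t 20121114064330 --noanalyze -a
psql -d sales -c "ANALYZE public.sales_fact;"
```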

--noplan

Restores only the data backed up during the incremental backup specified by the <timestamp_key>. No other data from the complete backup set is restored. The full backup set containing the incremental backup must be available. If the <timestamp_key> specified with the -t option does not reference an incremental backup, an error is returned.

--prefix <prefix_string>

If you specified the gpcrondump option --prefix <prefix_string> to create the backup, you must specify this option with the <prefix_string> when restoring the backup.

If you created a full backup of a set of tables with gpcrondump and specified a prefix, you can use gpcrondump with the options --list-filter-tables and --prefix <prefix_string> to list the tables that were included or excluded for the backup.
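
For example, a backup created with a prefix must be restored with the same prefix (the prefix and timestamp shown are illustrative):

```shell
# Restore a backup set that was created with gpcrondump --prefix archive.
gpdbrestore -t 20121114064330 --prefix archive
```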

-q (no screen output)

Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.

-R <hostname>:<path_to_dumpset>

Allows you to provide a hostname and full path to a set of dump files. The host does not have to be in the Greenplum Database array of hosts, but must be accessible from the Greenplum master.

--redirect <database_name>

The name of the database where the data is restored. Specify this option to restore data to a database that is different from the database specified during backup. If <database_name> does not exist, it is created.
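
For example, to restore a backup into a different database, such as a QA copy of production data (the names shown are illustrative):

```shell
# Restore into the database sales_qa instead of the database that was
# backed up; sales_qa is created if it does not already exist.
gpdbrestore -t 20121114064330 --redirect sales_qa
```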

--report-status-dir <report_directory>

Specifies the absolute path to the directory on each Greenplum Database host (master and segment hosts) where gpdbrestore writes report status files for a restore operation. If <report_directory> does not exist or is not writable, gpdbrestore returns an error and stops.

If this option is not specified and the -u option is specified, report status files are written to the location specified by the -u option if that location is writable. If the location specified by the -u option is not writable, the report status files are written to the segment data directories.
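
For example, to restore from a custom backup directory while sending report status files to a separate writable directory (paths illustrative):

```shell
# Restore from /backups and write report status files to a directory
# that exists and is writable on every host in the array.
gpdbrestore -t 20121114064330 -u /backups --report-status-dir /home/gpadmin/restore_reports
```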

-s <database_name>

Looks for the latest set of dump files for the given database name in the db_dumps directory of the segment data directories on the Greenplum Database array of hosts.

-t <timestamp_key>

The 14-digit timestamp key that uniquely identifies a backup set of data to restore. It is of the form YYYYMMDDHHMMSS. Looks for dump files matching this timestamp key in the db_dumps directory of the segment data directories on the Greenplum Database array of hosts.
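
Because the key is simply the backup's date and time concatenated, its parts can be split out directly; a minimal, portable sketch:

```shell
# Decode a 14-digit timestamp key of the form YYYYMMDDHHMMSS.
ts=20121114064330
date_part=$(echo "$ts" | cut -c1-8)    # YYYYMMDD
time_part=$(echo "$ts" | cut -c9-14)   # HHMMSS
echo "backup date: ${date_part}, time: ${time_part}"
```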

-T <schema>.<table_name>

A comma-separated list of specific table names to restore. The named table(s) must exist in the backup set of the database being restored. Existing tables are not automatically truncated before data is restored from the backup. To replace the existing data in a table with data from the backup, truncate the table before running gpdbrestore -T.
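
Because existing tables are not truncated automatically, a common pattern is to truncate the table first and then restore it (the database, table, and timestamp names are illustrative):

```shell
# Replace the current contents of public.sales_fact with the backup copy.
psql -d sales -c "TRUNCATE TABLE public.sales_fact;"
gpdbrestore -t 20121114064330 -T public.sales_fact
```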

--table-file <file_name>

Specify a file <file_name> that contains a list of table names to restore. The file contains any number of table names, listed one per line. See the -T option for information about restoring specific tables.
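
For example, the table list can be built on the fly and passed to the restore (file path and table names illustrative):

```shell
# Build a table list file (one table per line) and restore those tables.
printf "public.sales_fact\npublic.customers\n" > /tmp/restore_tables.txt
gpdbrestore -t 20121114064330 --table-file /tmp/restore_tables.txt
```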

--truncate

Truncate table data before restoring data to the table from the backup. This option is supported only when restoring a set of tables with the option -T or --table-file.

This option is not supported with the -e option.
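
With --truncate, the manual TRUNCATE step can be folded into the restore itself (the table and timestamp names are illustrative):

```shell
# Truncate public.sales_fact and restore its data in one operation.
gpdbrestore -t 20121114064330 -T public.sales_fact --truncate
```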

-u <backup_directory>

Specifies the absolute path to the directory containing the db_dumps directory on each host. If not specified, defaults to the data directory of each instance to be backed up. Specify this option if you specified a backup directory with the gpcrondump option -u when creating a backup set.

If <backup_directory> is not writable, backup operation report status files are written to segment data directories. You can specify a different location where report status files are written with the --report-status-dir option.

NOTE: This option is not supported if --ddboost is specified.

-v | --verbose

Specifies verbose mode.

--version (show utility version)

Displays the version of this utility.

-? (help)

Displays the online help.

*****************************************************

EXAMPLES

*****************************************************

Restore the sales database from the latest backup files generated by gpcrondump (assumes backup files are in the segment data directories in db_dumps):

gpdbrestore -s sales

Restore a database from backup files that reside on an archive host outside the Greenplum Database array (command issued on the Greenplum master host):

gpdbrestore -R archivehostname:/data_p1/db_dumps/20080214

Restore global objects only (roles and tablespaces):

gpdbrestore -G

NOTE: The -R option is not supported when restoring a backup set that includes incremental backups.

If you restore from a backup set that contains an incremental backup, all the files in the backup set must be available to gpdbrestore. For example, the following timestamp keys specify a backup set. 20120514054532 is the full backup and the others are incremental.

20120514054532

20120714095512

20120914081205

20121114064330

20130114051246

The following gpdbrestore command specifies the timestamp key 20121114064330. The incremental backups with the timestamps 20120714095512 and 20120914081205 and the full backup must be available to perform the restore.

gpdbrestore -t 20121114064330

The following gpdbrestore command uses the --noplan option to restore only the data that was backed up during the incremental backup with the timestamp key 20121114064330. Data in the previous incremental backups and in the full backup is not restored.

gpdbrestore -t 20121114064330 --noplan

This gpdbrestore command restores Greenplum Database data from the data managed by NetBackup master server nbu_server1. The option -t 20130530090000 specifies the timestamp generated by gpcrondump when the backup was created. The -e option specifies that the target database is dropped before it is restored.

gpdbrestore -t 20130530090000 -e --netbackup-service-host=nbu_server1

*****************************************************

SEE ALSO

*****************************************************

gpcrondump
