Running a Parallel Backup in Greenplum
Post date: Oct 30, 2014 7:58:06 PM
Parallel backups are issued with the gp_dump command. When this command is executed:
Each active segment is dumped in parallel.
A single dump file is created in the data directory for each of these segments. The dump file contains the data for an individual segment instance.
The backup files for a single gp_dump operation are identified by a unique 14-digit timestamp key.
The master dumps the configuration of the database (the system catalog) as well as the DDL statements for the database and its schemas. Dump data can be compressed to save space by passing the --gp-c option to the gp_dump command. Because the dump files are written to the local file system, you must also ensure that there is sufficient disk space for them on both the master and the segments.
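For example, a compressed parallel dump can be started from the master host as follows. This is a minimal sketch: the database name "sales" is only a placeholder, and $MASTER_DATA_DIRECTORY is the standard Greenplum environment variable pointing at the master data directory.

    # Check free space in the master data directory before dumping.
    df -h $MASTER_DATA_DIRECTORY

    # Run a compressed parallel dump of the "sales" database (placeholder name).
    # --gp-c compresses the dump files with gzip as they are written.
    gp_dump --gp-c sales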
Dump Files Created During Parallel Backup
Here is an overview of the files created by gp_dump. By default, the dump files are created in the data directory of the instance that was dumped. On the master, dump files are created for the following:
System catalog data
The CREATE DATABASE statement
DDL to recreate the schema and database objects
The following files are created on the master:
Dump File                               Description
gp_catalog_1_<dbid>_<timestamp>         System catalog tables
gp_cdatabase_1_<dbid>_<timestamp>       CREATE DATABASE statement
gp_dump_1_<dbid>_<timestamp>            Database schemas
gp_dump_status_1_<dbid>_<timestamp>     Log file
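Because every file from one gp_dump run shares the same 14-digit timestamp key, the master's files for a given backup can be listed with a wildcard. The timestamp below is a made-up example; substitute the key reported by your own run.

    # List the master dump files for one backup run.
    # 20141030195806 is an illustrative timestamp key, not a real one.
    ls -lh $MASTER_DATA_DIRECTORY/gp_*_1_*_20141030195806*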
The following dump files are created on the segments:
Dump File                               Description
gp_dump_0_<dbid>_<timestamp>            COPY statements and user data for the segment
gp_dump_status_0_<dbid>_<timestamp>     Log file
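To confirm that each segment instance produced its pair of files, a quick check across the segment hosts can be run with the gpssh utility. The host file path and the segment data directory pattern below are assumptions and will differ per cluster.

    # Check segment dump files on all segment hosts.
    # The host file and /data/primary/gpseg* paths are examples only.
    gpssh -f /home/gpadmin/hostfile_segments 'ls -lh /data/primary/gpseg*/gp_dump_0_*'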