Running a Parallel Backup in Greenplum

posted Oct 30, 2014, 12:58 PM by Sachchida Ojha   [ updated Nov 3, 2014, 7:21 PM ]
Parallel backups are issued with the gp_dump command. When this command is executed: 
  1. Each active segment is dumped in parallel. 
  2. A single dump file is created in the data directory for each of these segments. The dump file contains the data for an individual segment instance. 
  3. The backup files for this single gp_dump process are identified by a unique 14-digit timestamp key. 
  4. The master dumps the configuration of the database, or the system catalog, as well as DDL statements for the database and schemas.
Data can be compressed to save space by passing the --gp-c option to the gp_dump command. The dump files are created on the file system, so you must also ensure that there is sufficient disk space for the dump files on both the master and the segments.
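As a rough illustration of the two points above (compression and disk space), here is a minimal Python sketch that checks free space and then assembles a gp_dump command line. The database name mydb, the data directory path, and the 1 GiB threshold are placeholders chosen for the example, not values from Greenplum. The helper names are hypothetical as well. The space check only looks at the local host, so each segment host would need the same check run on it.

```python
import shutil


def build_gp_dump_command(database, compress=True):
    """Return the argument list for a gp_dump run against `database`."""
    cmd = ["gp_dump"]
    if compress:
        # --gp-c is the compression option described in the text above.
        cmd.append("--gp-c")
    cmd.append(database)
    return cmd


def has_free_space(path, required_bytes):
    """Return True if the file system holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


if __name__ == "__main__":
    data_dir = "."            # placeholder: use the real data directory
    needed = 1 * 1024 ** 3    # placeholder: 1 GiB, estimate your own size
    if has_free_space(data_dir, needed):
        # Print the command instead of running it, so the sketch is safe to try.
        print(" ".join(build_gp_dump_command("mydb")))
    else:
        print("Not enough free space in", data_dir)
```

Printing the command rather than executing it keeps the sketch harmless on a machine without Greenplum installed; on a real cluster the list could be passed to subprocess.run.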
Dump Files Created During Parallel Backup
Here is an overview of the files created by gp_dump. By default, the dump files are created in the data directory of the instance that was dumped.
On the master, the following are dumped:
  1. System catalog data
  2. CREATE DATABASE statement
  3. DDL to recreate schema and database objects
  4. Log
The following dump files are created on the master:
  Dump File: Description
  • gp_catalog_1_<dbid>_<timestamp>: System catalog tables
  • gp_cdatabase_1_<dbid>_<timestamp>: CREATE DATABASE statement
  • gp_dump_1_<dbid>_<timestamp>: Database schemas
  • gp_dump_status_1_<dbid>_<timestamp>: Log file
On the segments, the following are dumped:
  1. COPY statements and user data
  2. Log
The following dump files are created on the segments:
  Dump File: Description
  • gp_dump_0_<dbid>_<timestamp>: Data for the segment
  • gp_dump_status_0_<dbid>_<timestamp>: Log file
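The naming pattern above can be used to sort out which files belong to which backup. The following Python sketch parses a dump file name into its parts and groups names by timestamp key. It assumes that the digit after the file type is 1 on the master and 0 on a segment, as in the file names listed above, and it tolerates a trailing extension in case compression adds one (an assumption, not something stated here). The function names and sample file names are hypothetical.

```python
import re

# dump_status is listed before dump so the longer file type is tried first.
DUMP_FILE_RE = re.compile(
    r"^gp_(?P<kind>catalog|cdatabase|dump_status|dump)"
    r"_(?P<role>[01])_(?P<dbid>\d+)_(?P<timestamp>\d{14})"
    r"(?P<suffix>\..+)?$"
)


def parse_dump_file(name):
    """Split a dump file name into its parts, or return None if it does not match."""
    m = DUMP_FILE_RE.match(name)
    if m is None:
        return None
    return {
        "kind": m.group("kind"),
        # 1 marks a master file and 0 a segment file in the names listed above.
        "instance": "master" if m.group("role") == "1" else "segment",
        "dbid": int(m.group("dbid")),
        "timestamp": m.group("timestamp"),
    }


def group_by_timestamp(names):
    """Map each 14-digit timestamp key to the dump file names that carry it."""
    groups = {}
    for name in names:
        info = parse_dump_file(name)
        if info is not None:
            groups.setdefault(info["timestamp"], []).append(name)
    return groups
```

For example, calling group_by_timestamp on the contents of a data directory gives one entry per gp_dump run, which makes it easier to confirm that every segment produced a file for a given backup.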