greenplum database new features - Incremental Analyze

Incremental Analyze is much awaited features of the Greenplum database. Analyzing large (partitioned) tables are the most time consuming operation during the regular database maintenance activity. Lets explore this feature a little further.
The gadget spec URL could not be found
The gadget spec URL could not be found

Greenplum database 4.3.5.0 introduced a new utility called analyzedb. As of 

[gpadmin@localhost gpsne-1]$ analyzedb --help
Usage: analyzedb [options] 

Analyze a database incrementally. 'Incremental' means if a table or partition has not been modified by DML or DDL commands since the last analyzedb run, it will be automatically skipped since its statistics must be up to date. Some restrictions apply:

1. The incremental semantics only applies to append-only tables or partitions. 

2. All heap tables are regarded as having stale stats every time analyzedb is run. This is because we use AO metadata to check for DML or DDL events, which is not available to heap tables. 
 
3. Catalog tables, views and external tables are automatically skipped.

Options:
  --version             show program's version number and exit
  -d <database name>    Database name. Required.
  -s <schema name>      Specify a schema to analyze. All tables in the schema
                        will be analyzed.
  -t <schema name>.<table name>
                        Analyze a single table. Table name needs to be
                        qualified with schema name.
  -i <column1>,<column2>,...
                        Columns to include to be analyzed, separated by comma.
                        All columns will be analyzed if not specified.
  -x <column1>,<column2>,...
                        Columns to exclude to be analyzed, separated by comma.
                        All columns will be analyzed if not specified.
  -f <config_file>, --file=<config_file>
                        Config file that includes a list of tables to be
                        analyzed. Table names must be qualified with schema
                        name. Optionally a list of columns (separated by
                        comma) can be specified using -i or -x.
  -l, --list            List the tables to be analyzed without actually
                        running analyze (dry run).
  -p <parallel level>   Parallel level, i.e. the number of tables to be
                        analyzed in parallel. Valid numbers are between 1 and
                        10. Default value is 5.
  --skip_root_stats     Skip refreshing root partition stats if any of the
                        leaf partitions is analyzed.
  --full                Analyze without using incremental. All tables
                        requested by the user will be analyzed.
  --clean_last          Clean the state files generated by last analyzedb run.
                        All other options except -d will be ignored.
  --clean_all           Clean all the state files generated by analyzedb. All
                        other options except -d will be ignored.
  -h, -?, --help        Show this help message and exit.
  -v, --verbose         Print debug messages.
  -a                    Quiet mode. Do not prompt for user confirmation.
[gpadmin@localhost gpsne-1]$ 

While performing the ANALYZE operations, analyzedb creates a snapshot of table metadata snapshot and stores it on disk on the master host. If you run analyzedb to ANALYZE an append-optimized table, the utility checks the metadata taken during a previous analyzedb operation to determine if the table has been modified. An ANALYZE operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, analyzedb automatically skips the table or partition because it already contains up-to-date statistics.

For a partitioned, append-optimized table, analyzedb checks the partitioned table root partition and leaf partitions. If needed, the utility updates statistics for partitions that have changed and, if necessary, the root partition.

To perform the ANALYZE operations, analyzedb creates concurrent sessions to analyze tables in parallel. For each session,analyzedb issues an ANALYZE command to the database and specifies different table names.

This example command specifies a file that contains a list of tables. The command collects statistics on the tables listed in the file analyze-tables in the database named mytest.
$ analyzedb -d sachi -f analyze-table_list


The gadget spec URL could not be found
Incremental Analyze (4.3.5.0)

The gadget spec URL could not be found

The gadget spec URL could not be found

Comments