ANALYZE

Post date: Aug 25, 2014 7:20:9 PM

Good statistics is likely the most important factor affecting query performance and optimization. 

Updated statistics are critical for the query planner to generate optimal query plans. When a table is analyzed, table information about the data is stored into the system catalog tables. Always run ANALYZE after loading data, after CREATE INDEX operations, and after INSERT, UPDATE and DELETE operations that significantly changes the underlying data. 

ANALYZE requires only a read lock on the table, so it may be run in parallel with other database activity but do not run ANALYZE when performing loads, INSERT, UPDATE, DELETE and CREATE INDEX operations. 

For large tables, ANALYZE takes a random sample of the table contents, rather than examining every row to allow large tables to be analyzed in a reasonable amount of time. To increase sampling for all table columns adjust the default_statistics_target GUC. The default target value is 25. Keep in mind that default_statistics_target applies to all columns by default. A larger target value will increase the time needed to do the ANALYZE, but may improve the quality of the query planner’s estimates. Especially for columns with irregular data patterns, a larger target value may allow for more accurate query plans.

Important: If you intend to execute queries on partitioned tables with the Pivotal Query Optimizer enabled, you must collect statistics on the root partition of the partitioned table with the ANALYZE ROOTPARTITION command. For information about the Pivotal Query Optimizer, see "Querying Data" in the Greenplum Database Administrator Guide.

The gp_analyze_relative_error GUC affects the sampling rate during statistics collection to determine cardinality in a column. For example, a value of .5 is equivalent to an acceptable error of 50%. The default is .25. Use gp_analyze_relative_error to set the estimated acceptable relative error in the cardinality of a table. If statistics do not produce good estimates of cardinality for a particular table attribute, decreasing the relative error fraction (accepting less errors) tells the system to sample more rows. However, it is not recommended to reduce this below 0.1 as it will increase analyze time substantially.

It is a good idea to run ANALYZE periodically, or just after making major changes in the contents of a table. Accurate statistics helps Greenplum Database choose the most appropriate query plan, and thereby improve the speed of query processing. 

A common strategy is to run VACUUM and ANALYZE once a day during a low-usage time of day.

ANALYZE requires SHARE UPDATE EXCLUSIVE lock on the target table. This lock conflicts with these locks: SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE,ACCESS EXCLUSIVE.

Note: You can also use the Greenplum Database utility analyzedb to update table statistics. The analyzedb utility can update statistics for multiple tables concurrently. The utility can also check table statistics and update statistics only if the statistics are not current or do not exist. For information about the utility, see the Greenplum Database Utility Guide.