Tuning Statistics Collection in Greenplum Database

Post date: Oct 11, 2013 1:11:59 PM

Here are the 2 configuration parameters which control the amount of data sampled for statistics collection:

1.default_statistics_target

2.gp_analyze_relative_error

These parameters control statistics sampling at the system level. It is probably better to only sample increased statistics for the columns used most frequently in query predicates. You can adjust statistics for a particular column using the ALTER TABLE...SET STATISTICS command. For example:

ALTER TABLE sales ALTER COLUMN region SET STATISTICS 50;

This is equivalent to increasing default_statistics_target for a particular column. Subsequent ANALYZE operations will then gather more statistics data for that column, and hopefully produce better query plans as a result.