Collecting Table/Column level Statistics in Greenplum: sachi=# vacuum verbose sachi3 ; INFO: vacuuming "public.sachi3" (seg0 sachi:40000 pid=19169) INFO: vacuuming "public.sachi3" (seg1 sachi:40001 pid=19171) INFO: "sachi3": found 0 removable, 0 nonremovable row versions in 0 pages (seg0 sachi:40000 pid=19169) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "sachi3": found 0 removable, 0 nonremovable row versions in 0 pages (seg1 sachi:40001 pid=19171) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "pg_toast.pg_toast_52386" (seg0 sachi:40000 pid=19169) INFO: vacuuming "pg_toast.pg_toast_52386" (seg1 sachi:40001 pid=19171) INFO: index "pg_toast_52386_index" now contains 0 row versions in 1 pages (seg0 sachi:40000 pid=19169) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "pg_toast_52386_index" now contains 0 row versions in 1 pages (seg1 sachi:40001 pid=19171) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_52386": found 0 removable, 0 nonremovable row versions in 0 pages (seg0 sachi:40000 pid=19169) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_52386": found 0 removable, 0 nonremovable row versions in 0 pages (seg1 sachi:40001 pid=19171) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM sachi=# sachi=# vacuum verbose analyze sachi3 ; INFO: vacuuming "public.sachi3" (seg0 sachi:40000 pid=19240) INFO: vacuuming "public.sachi3" (seg1 sachi:40001 pid=19242) INFO: "sachi3": found 0 removable, 0 nonremovable row versions in 0 pages (seg1 sachi:40001 pid=19242) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "sachi3": found 0 removable, 0 nonremovable row versions in 0 pages (seg0 sachi:40000 pid=19240) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "pg_toast.pg_toast_52386" (seg0 sachi:40000 pid=19240) INFO: vacuuming "pg_toast.pg_toast_52386" (seg1 sachi:40001 pid=19242) INFO: index "pg_toast_52386_index" now contains 0 row versions in 1 pages (seg0 sachi:40000 pid=19240) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_52386": found 0 removable, 0 nonremovable row versions in 0 pages (seg0 sachi:40000 pid=19240) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "pg_toast_52386_index" now contains 0 row versions in 1 pages (seg1 sachi:40001 pid=19242) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_52386": found 0 removable, 0 nonremovable row versions in 0 pages (seg1 sachi:40001 pid=19242) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: Executing SQL: select sum(gp_statistics_estimate_reltuples_relpages_oid(c.oid))::float4[] from gp_dist_random('pg_class') c where c.oid=52386 INFO: ANALYZE estimated reltuples=0.000000, relpages=0.000000 for table sachi3 INFO: ANALYZE skipping computing statistics on table sachi3 because it has no rows. VACUUM sachi=# sachi=# vacuum verbose foo ; INFO: vacuuming "public.foo" (seg0 sachi:40000 pid=19750) INFO: "foo": found 0 removable, 4000 nonremovable row versions in 5 pages (seg0 sachi:40000 pid=19750) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 1 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "public.foo" (seg1 sachi:40001 pid=19752) INFO: "foo": found 0 removable, 4000 nonremovable row versions in 5 pages (seg1 sachi:40001 pid=19752) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 1 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM sachi=# sachi=# explain select * from foo; QUERY PLAN -------------------------------------------------------------------------------- Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..90.00 rows=4000 width=8) -> Seq Scan on foo (cost=0.00..90.00 rows=4000 width=8) (2 rows) sachi=# explain analyze select * from foo; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..90.00 rows=4000 width=8) Rows out: 8000 rows at destination with 1.376 ms to first row, 4.920 ms to end, start offset by 9.103 ms. -> Seq Scan on foo (cost=0.00..90.00 rows=4000 width=8) Rows out: Avg 4000.0 rows x 2 workers. Max 4000 rows (seg0) with 0.110 ms to first row, 0.697 ms to end, start offset by 10 ms. Slice statistics: (slice0) Executor memory: 247K bytes. (slice1) Executor memory: 181K bytes avg x 2 workers, 181K bytes max (seg0). Statement statistics: Memory used: 128000K bytes Total runtime: 14.878 ms (10 rows) sachi=# select relname,relpages,reltuples from pg_class where relname='foo'; relname | relpages | reltuples ---------+----------+----------- foo | 10 | 8000 (1 row) sachi=# select count(*) from foo; count ------- 8000 (1 row) So the estimated cost here is (disk pages read * seq_page_cost ) + (tuples scanned * cpu_tuple_cost ) Where , default seq_page_cost = 1.0 cpu_tuple_cost = 0.01 so the estimated cost = ( 10 * 1.0 ) + ( 8000 * 0.01 ) = 10+80=90 So this helps you understand why statistics are really imported for the planner to make in correct decision. sachi=# vacuum verbose analyze foo ; INFO: vacuuming "public.foo" (seg0 sachi:40000 pid=19349) INFO: vacuuming "public.foo" (seg1 sachi:40001 pid=19351) INFO: "foo": found 0 removable, 4000 nonremovable row versions in 5 pages (seg0 sachi:40000 pid=19349) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 1 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "foo": found 0 removable, 4000 nonremovable row versions in 5 pages (seg1 sachi:40001 pid=19351) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 1 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: Executing SQL: select sum(gp_statistics_estimate_reltuples_relpages_oid(c.oid))::float4[] from gp_dist_random('pg_class') c where c.oid=33435 INFO: ANALYZE estimated reltuples=8000.000000, relpages=10.000000 for table foo INFO: ANALYZE computing statistics on attribute x INFO: Executing SQL: select count(*)::float4 from only public.foo as Ta where Ta.x is null INFO: nullfrac = 0.00000000000000000000000000000000000000 INFO: avgwidth = 8.000000 INFO: Executing SQL: select count(*)::float4 from (select Ta.x from only public.foo as Ta group by Ta.x) as Tb INFO: count(ndistinct()) gives 1.000000 values. INFO: Executing SQL: select count(v)::float4 from (select Ta.x as v, count(Ta.x) as f from only public.foo as Ta group by Ta.x) as foo where f > 1 INFO: ndistinct = 1.000000 INFO: Executing SQL: select Ta.x as v, count(Ta.x)::float4/8000.000000::float4 as f from only public.foo as Ta where Ta.x is not null group by (Ta.x) order by f desc limit 25 INFO: mcv={"2013-08-22 00:00:00"} INFO: freq={1} INFO: Executing SQL: select v from (select Ta.x as v, row_number() over (order by Ta.x) as r from only public.foo as Ta where x is not null union select max(Tb.x) as v, 1 as r from only public.foo as Tb where x is not null) as foo where r % 320 = 1 group by v order by v INFO: hist={"2013-08-22 00:00:00"} VACUUM sachi=# Finding missing statistics of a table To monitor the missing stats of a table, we use gp_toolkit.gp_stats_missing view. Lets explain it in more details. To check the table that do not have statistics and therefore may require an ANALYZE be run on the table. It looks simple, but actually that is not the case. To explain have a look at the below example. sachi=# \d gp_toolkit.gp_stats_missing View "gp_toolkit.gp_stats_missing" Column | Type | Modifiers -----------+---------+----------- smischema | name | smitable | name | smisize | boolean | smicols | bigint | smirecs | bigint | View definition: SELECT aut.autnspname AS smischema, aut.autrelname AS smitable, CASE WHEN aut.autrelpages = 0 OR aut.autreltuples = 0::double precision THEN false ELSE true END AS smisize, attrs.attcnt AS smicols, COALESCE(bar.stacnt, 0::bigint) AS smirecs FROM gp_toolkit.__gp_user_tables aut JOIN ( SELECT pg_attribute.attrelid, count(*) AS attcnt FROM pg_attribute WHERE pg_attribute.attnum > 0 GROUP BY pg_attribute.attrelid) attrs ON aut.autoid = attrs.attrelid LEFT JOIN ( SELECT pg_statistic.starelid, count(*) AS stacnt FROM pg_statistic GROUP BY pg_statistic.starelid) bar ON aut.autoid = bar.starelid WHERE (aut.autrelkind = 'r'::"char" AND (aut.autrelpages = 0 OR aut.autreltuples = 0::double precision)) OR (bar.stacnt IS NOT NULL AND attrs.attcnt > bar.stacnt); sachi=# sachi=# create table sachi (id numeric, name varchar(32)); NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table. HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew. CREATE TABLE sachi=# select * from pg_stat_last_operation where objid = 'sachi' ::regclass order by statime; classid | objid | staactionname | stasysid | stausename | stasubtype | statime ---------+-------+---------------+----------+------------+------------+------------------------------- 1259 | 52411 | CREATE | 10 | gpadmin | TABLE | 2014-09-25 12:29:20.730637-04 (1 row) sachi=# select * from gp_toolkit.gp_stats_missing ; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 public | sachi | f | 2 | 0 (4 rows) sachi=# vacuum verbose sachi; INFO: vacuuming "public.sachi" (seg0 sachi:40000 pid=20978) INFO: vacuuming "public.sachi" (seg1 sachi:40001 pid=20980) INFO: "sachi": found 0 removable, 0 nonremovable row versions in 0 pages (seg1 sachi:40001 pid=20980) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "sachi": found 0 removable, 0 nonremovable row versions in 0 pages (seg0 sachi:40000 pid=20978) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "pg_toast.pg_toast_52411" (seg1 sachi:40001 pid=20980) INFO: vacuuming "pg_toast.pg_toast_52411" (seg0 sachi:40000 pid=20978) INFO: index "pg_toast_52411_index" now contains 0 row versions in 1 pages (seg1 sachi:40001 pid=20980) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_52411": found 0 removable, 0 nonremovable row versions in 0 pages (seg1 sachi:40001 pid=20980) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "pg_toast_52411_index" now contains 0 row versions in 1 pages (seg0 sachi:40000 pid=20978) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_52411": found 0 removable, 0 nonremovable row versions in 0 pages (seg0 sachi:40000 pid=20978) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM sachi=# sachi=# select * from gp_toolkit.gp_stats_missing ; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 public | sachi | f | 2 | 0 (4 rows) sachi=# sachi=# vacuum verbose analyze sachi; INFO: vacuuming "public.sachi" (seg0 sachi:40000 pid=21017) INFO: "sachi": found 0 removable, 0 nonremovable row versions in 0 pages (seg0 sachi:40000 pid=21017) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "pg_toast.pg_toast_52411" (seg0 sachi:40000 pid=21017) INFO: index "pg_toast_52411_index" now contains 0 row versions in 1 pages (seg0 sachi:40000 pid=21017) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_52411": found 0 removable, 0 nonremovable row versions in 0 pages (seg0 sachi:40000 pid=21017) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "public.sachi" (seg1 sachi:40001 pid=21019) INFO: "sachi": found 0 removable, 0 nonremovable row versions in 0 pages (seg1 sachi:40001 pid=21019) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuuming "pg_toast.pg_toast_52411" (seg1 sachi:40001 pid=21019) INFO: index "pg_toast_52411_index" now contains 0 row versions in 1 pages (seg1 sachi:40001 pid=21019) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "pg_toast_52411": found 0 removable, 0 nonremovable row versions in 0 pages (seg1 sachi:40001 pid=21019) DETAIL: 0 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages contain useful free space. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: Executing SQL: select sum(gp_statistics_estimate_reltuples_relpages_oid(c.oid))::float4[] from gp_dist_random('pg_class') c where c.oid=52411 INFO: ANALYZE estimated reltuples=0.000000, relpages=0.000000 for table sachi INFO: ANALYZE skipping computing statistics on table sachi because it has no rows. VACUUM sachi=# sachi=# select * from gp_toolkit.gp_stats_missing ; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 public | sachi | f | 2 | 0 (4 rows) sachi=# select oid,relname,reltuples,relpages from pg_class where relname='sachi'; oid | relname | reltuples | relpages -------+---------+-----------+---------- 52411 | sachi | 0 | 1 (1 row) ============================================================================================ ============================================================================================ sachi=# show gp_autostats_mode; gp_autostats_mode ------------------- ON_NO_STATS (1 row) sachi=# show gp_autostats_on_change_threshold; gp_autostats_on_change_threshold ---------------------------------- 2147483647 (1 row) Automatic Statistics Collection Parameters When automatic statistics collection is enabled, you can run ANALYZE automatically in the same transaction as an INSERT, UPDATE, DELETE, COPY or CREATE TABLE...AS SELECT statement when a certain threshold of rows is affected (on_change), or when a newly generated table has no statistics (on_no_stats). To enable this feature, set the following server configuration parameters in your Greenplum master postgresql.conf file and restart Greenplum Database: • gp_autostats_mode • log_autostats Warning: Depending on the specific nature of your database operations, automatic statistics collection can have a negative performance impact. Carefully evaluate whether the default setting of on_no_stats is appropriate for your system. This specifies when an automatic ANALYZE should take place. The values for this configuration parameter can be:
on_no_stats The default is on_no_stats and the database will automatically perform an ANALYZE for you. If you INSERT data into an empty table (new table or a table you just did a TRUNCATE on), the database will automatically gather statistics with an ANALYZE. Typically in an Analytics Data Warehouse, you will insert data once and then read it many times. So the default of on_no_stats will automatically give you statistics as needed with no additional work from you. none Self explanatory. You have to execute ANALYZE if you want statistics. on_change With this setting, if perform an INSERT, UPDATE, or DELETE that exceeds the gp_autostats_on_change_threshold value, then an automatic ANALYZE will happen. Note for HAWQ, you can only INSERT. In Greenplum and in the rare case where you are doing a DELETE or UPDATE to a table, you will need to execute an ANALYZE statement. Or, you can set the database to use on_change and the ANALYZE will happen automatically. gp_autostats_on_change_threshold This is only relevant for on_change and it is the number of rows that need to change before an automatic ANALYZE will happen. The default is over 2 billion rows so if you really want to use on_change, then you will also need to reduce this configuration parameter to something more realistic. Examples Example 1 – The default gp_autostats_mode = on_no_stats gp_autostats_on_change_threshold = 2147483647 ============================================================================================ ============================================================================================ sachi=# insert into sachi values(1, 'abc'); INSERT 0 1 sachi=# select oid,relname,reltuples,relpages from pg_class where relname='sachi'; oid | relname | reltuples | relpages -------+---------+-----------+---------- 52411 | sachi | 0 | 1 (1 row) sachi=# sachi=# select * from gp_toolkit.gp_stats_missing; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 public | sachi | f | 2 | 0 (4 rows) sachi=# select count(*) from sachi; count ------- 1 (1 row) sachi=# select oid,relname,reltuples,relpages from pg_class where relname='sachi'; oid | relname | reltuples | relpages -------+---------+-----------+---------- 52411 | sachi | 0 | 1 (1 row) So if we now run a analyze after the data insert sachi=# analyze sachi; ANALYZE sachi=# sachi=# select oid,relname,reltuples,relpages from pg_class where relname='sachi'; oid | relname | reltuples | relpages -------+---------+-----------+---------- 52411 | sachi | 1 | 1 (1 row) sachi=# select * from gp_toolkit.gp_stats_missing; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 (3 rows) Note: Now it did remove the data from gp_toolkit.gp_stats_missing Now lets do truncate. sachi=# truncate table sachi; TRUNCATE TABLE sachi=# select * from gp_toolkit.gp_stats_missing; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 public | sachi | f | 2 | 2 (4 rows) sachi=# Note: Now it came back. sachi=# insert into sachi values(1, 'abc'); INSERT 0 1 sachi=# select * from gp_toolkit.gp_stats_missing; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 (3 rows) sachi=# Note: Now it did remove the data from gp_toolkit.gp_stats_missing sachi=# alter table sachi add column address varchar(128); ALTER TABLE sachi=# sachi=# select * from gp_toolkit.gp_stats_missing; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 public | sachi | t | 3 | 2 (4 rows) sachi=# Note: it came back again. sachi=# analyze sachi; ANALYZE sachi=# select * from gp_toolkit.gp_stats_missing; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 (3 rows) Note: Now it is gone. sachi=# sachi=# delete from sachi; DELETE 1 sachi=# select count(*) from sachi; count ------- 0 (1 row) sachi=# select * from gp_toolkit.gp_stats_missing; smischema | smitable | smisize | smicols | smirecs -----------+----------+---------+---------+--------- public | usha | f | 1 | 0 public | usha1 | f | 1 | 0 public | sachi3 | f | 1 | 0 (3 rows) So
Checking the Statistics Queries you can use to see the statics information. SELECT c.oid, c.relpages, c.reltuples
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE n.nspname = 'public'
AND c.relname = 'my_table';
SELECT * FROM pg_statistic WHERE starelid = 298610; |