Checking for Tables that Need Routine Maintenance

Post date: Jan 20, 2014 4:4:44 PM

The following views can help identify tables that need routine table maintenance (VACUUM and/or ANALYZE).

1. gp_bloat_diag

2. gp_stats_missing

The VACUUM or VACUUM FULL command reclaims disk space occupied by deleted or obsolete rows. Because of the MVCC transaction concurrency model used in Greenplum Database, data rows that are deleted or updated still occupy physical space on disk even though they are not visible to any new transactions. Expired rows increase table size on disk and eventually slow down scans of the table.

The ANALYZE command collects column-level statistics needed by the query planner. Greenplum Database uses a cost-based query planner that relies on database statistics. Accurate statistics allow the query planner to better estimate selectivity and the number of rows retrieved by a query operation in order to choose the most efficient query plan.

gp_bloat_diag view

This view shows tables that have bloat (the actual number of pages on disk exceeds the expected number of pages given the table statistics). Tables that are bloated require a VACUUM or a VACUUM FULL in order to reclaim disk space occupied by deleted or obsolete rows. This view is accessible to all users, however non-superusers will only be able to see the tables that they have permission to access.

gp_bloat_diag view has the following column

bdirelid=> Table object id.

bdinspname=> Schema name.

bdirelname=> Table name.

bdirelpages=> Actual number of pages on disk.

bdiexppages=> Expected number of pages given the table data.

bdidiag=> Bloat diagnostic message.

select current_database() as dbname , relid , schemaname , tablename , round(bloat,1) as bloat_ratio , pg_size_pretty(expbytes) as expected_size , pg_size_pretty(relbytes) as relation_size , pg_size_pretty(wastedbytes) as wasted_space , round(1.0 * relbytes / nullif((sum(relbytes) over ())::bigint,0),4) as relative_size , round((bloat-1) * (1.0 * relbytes / nullif((sum(relbytes) over ())::bigint, 0)), 8) as concern from ( SELECT relid , schemaname , tablename , CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END AS bloat , reltuples::bigint , relpages::bigint , otta , (bs*otta)::bigint as expbytes , CASE WHEN relpages < otta THEN 0 ELSE (bs*(sml.relpages-otta))::bigint END AS wastedbytes , CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages , (bs*relpages)::bigint as relbytes FROM ( SELECT schemaname , tablename , cc.oid as relid , cc.reltuples , cc.relpages , bs , CEIL((cc.reltuples*((datahdr+ma-(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta FROM ( SELECT ma , bs , schemaname , tablename , (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr , (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname , tablename , hdr , ma , bs , SUM((1-s.null_frac)*avg_width) AS datawidth , MAX(s.null_frac) AS maxfracsum , hdr+(1+(count(case when s.null_frac<>0 then 1 end))/8) as nullhdr FROM pg_stats s cross join ( SELECT current_setting('block_size')::numeric AS bs , CASE WHEN substring(version(),12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr , CASE WHEN version() ~ 'mingw32' THEN 8 ELSE 4 END AS ma ) AS constants GROUP BY schemaname, tablename, hdr, ma, bs ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename and cc.relkind = 'r' JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname ) AS sml ) wrapper where wastedbytes > 2*1024*1024 and bloat >= 1.4 order by (bloat-1) * (1.0 * relbytes / nullif((sum(relbytes) over ())::bigint, 0)) desc;

gp_stats_missing view

This view shows tables that do not have statistics and therefore may require an ANALYZE be run on the table. gp_stats_missing view has the following column

smischema=> Schema name.

smitable=> Table name.

smisize=> Does this table have statistics? False if the table does not have row count and row sizing statistics recorded in the system catalog, which may indicate that the table needs to be analyzed. This will also be false if the table does not contain any rows. For example, the parent tables of partitioned tables are always empty and will always return a false result. 

smicols=> Number of columns in the table.

smirecs=> Number of rows in the table.