HANDLING BLOAT WITH VACUUM AND VACUUM FULL VACUUM

Post date: Dec 04, 2012 7:54:28 PM

PostgreSQL, Greenplum Database, and Oracle use a database implementation called MVCC (multi-version concurrency control) to avoid locking and improve throughput. MVCC causes the database to make private versions or copies of data at runtime, which become current after a commit. As a result, this leaves parts of database memory blocks marked as "dirty" and unavailable for reuse, forcing the database to allocate new memory blocks. This allocation is called “bloat”.

Vacuuming the database or table regularly prevents bloat, making the dirty blocks available for reuse and preventing allocation of new blocks.

Vacuuming should be part of regular maintenance to reclaim used transaction IDs (XIDs) or 32-bit signed integers. Unless you vacuum regularly, you may hit some built-in limits that prevent XID wraparound. You should avoid this because hitting these limits means the GPDB runtime prevents further activity.

To enable VACUUM, set two memory-related config parameters: max_fsm_relations and max_fsm_pages. These two parameters relate to Free Map Space (fsm). You can use the contents of the system catalog table pg_class to estimate the settings for max_fsm_*.

Note: Before calculating values for these parameters, Greenplum recommends you run ANALYZE. The relpages values in pg_class are estimates updated by VACUUM, ANALYZE, and other system operations.

When ANALYZE completes, calculate an upper bound on max_fsm_relations:

1. Run the following SQL command on all databases and sum the results:

SELECT COUNT(*) FROM pg_class;

2. Estimate max_fsm_pages:

SELECT MAX(relpages) FROM pg_class;

Or

SELECT SUM(relpages) FROM pg_class;

Note: If the values returned for max_fsm_relations or max_fsm_pages are less than the calculated values, VACUUM will leave the space untouched. In such cases, you may see messages in the pg_log files saying that VACUUM failed and that you need to increase the value of max_fsm_relation or max_fsm_paqes.

VACUUM FULL

In addition to simple VACUUM, which marks dirty space in blocks available for reuse, you can use VACUUM FULL to reorganize and compact block storage for a table. VACUUM FULL functions similarly to a file system de-fragmentation, rewriting data to new blocks. However, VACUUM FULL takes much longer and establishes exclusive full table locks for the duration of the process.

A VACUUM FULL will reclaim all expired row space, but is a very expensive operation and may take an unacceptably long time to finish on large, distributed Greenplum Database tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a CREATE TABLE AS statement and drop the old table. A VACUUM FULL is not recommended in Greenplum Database.

Option 1.

ALTER TABLE sales SET WITH (REORGANIZE=TRUE);

This leaves any indexes etc. in tact but rewrites the underlying table and, hence, eliminates all dead data.

Option 2

SET gp_auto_stats_mode = none;

CREATE TABLE xyz AS SELECT * FROM abc <ADD YOUR DISTRIBUTION AND STORAGE OPTIONS HERE>;

DROP TABLE abc;

ALTER TABLE xyz RENAME TO abc;

ANALYZE abc;

ALTER TABLE abc ADD PRIMARY KEY (<YOUR_PK_COLUMN_NAME>);

This only writes the data one time.

Option 3

Create temp table t1 as select * from <Table Name>;

Truncate <Table Name>;

set gp_autostats_mode = none;

Insert into <Table Name> select * from t1;

Drop table t1;

analyze <Table Name>;

This only writes the data 2 times.

SELECT current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/ ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes, iname, /*ituples::bigint, ipages::bigint, iotta,*/ ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes FROM ( SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols FROM ( SELECT ma,bs,schemaname,tablename, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname, tablename, hdr, ma, bs, SUM((1-null_frac)*avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename ) AS nullhdr FROM pg_stats s, ( SELECT (SELECT current_setting('block_size')::numeric) AS bs, CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants GROUP BY 1,2,3,4,5 ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema' LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml ORDER BY wastedbytes DESC

select current_database() as dbname , relid , schemaname , tablename , round(bloat,1) as bloat_ratio , pg_size_pretty(expbytes) as expected_size , pg_size_pretty(relbytes) as relation_size , pg_size_pretty(wastedbytes) as wasted_space , round(1.0 * relbytes / nullif((sum(relbytes) over ())::bigint,0),4) as relative_size , round((bloat-1) * (1.0 * relbytes / nullif((sum(relbytes) over ())::bigint, 0)), 8) as concern from ( SELECT relid , schemaname , tablename , CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END AS bloat , reltuples::bigint , relpages::bigint , otta , (bs*otta)::bigint as expbytes , CASE WHEN relpages < otta THEN 0 ELSE (bs*(sml.relpages-otta))::bigint END AS wastedbytes , CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages , (bs*relpages)::bigint as relbytes FROM ( SELECT schemaname , tablename , cc.oid as relid , cc.reltuples , cc.relpages , bs , CEIL((cc.reltuples*((datahdr+ma-(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta FROM ( SELECT ma , bs , schemaname , tablename , (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr , (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname , tablename , hdr , ma , bs , SUM((1-s.null_frac)*avg_width) AS datawidth , MAX(s.null_frac) AS maxfracsum , hdr+(1+(count(case when s.null_frac<>0 then 1 end))/8) as nullhdr FROM pg_stats s cross join ( SELECT current_setting('block_size')::numeric AS bs , CASE WHEN substring(version(),12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr , CASE WHEN version() ~ 'mingw32' THEN 8 ELSE 4 END AS ma ) AS constants GROUP BY schemaname, tablename, hdr, ma, bs ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename and cc.relkind = 'r' JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname ) AS sml ) wrapper where wastedbytes > 2*1024*1024 and bloat >= 1.4 order by (bloat-1) * (1.0 * relbytes / nullif((sum(relbytes) over ())::bigint, 0)) desc;

Table and Index Bloat view

CREATE OR REPLACE VIEW bloat AS

SELECT

schemaname, tablename, reltuples::bigint, relpages::bigint, otta,

ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,

relpages::bigint - otta AS wastedpages,

bs*(sml.relpages-otta)::bigint AS wastedbytes,

pg_size_pretty((bs*(relpages-otta))::bigint) AS wastedsize,

iname, ituples::bigint, ipages::bigint, iotta,

ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,

CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages,

CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes,

CASE WHEN ipages < iotta THEN pg_size_pretty(0) ELSE pg_size_pretty((bs*(ipages-iotta))::bigint) END AS wastedisize

FROM (

SELECT

schemaname, tablename, cc.reltuples, cc.relpages, bs,

CEIL((cc.reltuples*((datahdr+ma-

(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,

COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,

COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols

FROM (

SELECT

ma,bs,schemaname,tablename,

(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,

(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2

FROM (

SELECT

schemaname, tablename, hdr, ma, bs,

SUM((1-null_frac)*avg_width) AS datawidth,

MAX(null_frac) AS maxfracsum,

hdr+(

SELECT 1+count(*)/8

FROM pg_stats s2

WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename

) AS nullhdr

FROM pg_stats s, (

SELECT

(SELECT current_setting('block_size')::numeric) AS bs,

CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,

CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma

FROM (SELECT version() AS v) AS foo

) AS constants

GROUP BY 1,2,3,4,5

) AS foo

) AS rs

JOIN pg_class cc ON cc.relname = rs.tablename

JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname

LEFT JOIN pg_index i ON indrelid = cc.oid

LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid

) AS sml

WHERE sml.relpages - otta > 0 OR ipages - iotta > 10

ORDER BY wastedbytes DESC, wastedibytes DESC;