HANDLING BLOAT WITH VACUUM AND VACUUM FULL VACUUM

Post date: Dec 04, 2012 7:54:28 PM

PostgreSQL, Greenplum Database, and Oracle use a database implementation called MVCC (multi-version concurrency control) to avoid locking and improve throughput. MVCC causes the database to make private versions or copies of data at runtime, which become current after a commit. As a result, this leaves parts of database memory blocks marked as "dirty" and unavailable for reuse, forcing the database to allocate new memory blocks. This allocation is called “bloat”. 

Vacuuming the database or table regularly prevents bloat, making the dirty blocks available for reuse and preventing allocation of new blocks.

Vacuuming should be part of regular maintenance to reclaim used transaction IDs (XIDs) or 32-bit signed integers. Unless you vacuum regularly, you may hit some built-in limits that prevent XID wraparound. You should avoid this because hitting these limits means the GPDB runtime prevents further activity.

To enable VACUUM, set two memory-related config parameters: max_fsm_relations and max_fsm_pages. These two parameters relate to Free Map Space (fsm). You can use the contents of the system catalog table pg_class to estimate the settings for max_fsm_*.

Note: Before calculating values for these parameters, Greenplum recommends you run ANALYZE. The relpages values in pg_class are estimates updated by VACUUM, ANALYZE, and other system operations.

When ANALYZE completes, calculate an upper bound on max_fsm_relations:

1. Run the following SQL command on all databases and sum the results:

SELECT COUNT(*) FROM pg_class;

2. Estimate max_fsm_pages:

SELECT MAX(relpages) FROM pg_class;

Or

SELECT SUM(relpages) FROM pg_class;

Note: If the values returned for max_fsm_relations or max_fsm_pages are less than the calculated values, VACUUM will leave the space untouched. In such cases, you may see messages in the pg_log files saying that VACUUM failed and that you need to increase the value of max_fsm_relation or max_fsm_paqes.

VACUUM FULL

In addition to simple VACUUM, which marks dirty space in blocks available for reuse, you can use VACUUM FULL to reorganize and compact block storage for a table. VACUUM FULL functions similarly to a file system de-fragmentation, rewriting data to new blocks. However, VACUUM FULL takes much longer and establishes exclusive full table locks for the duration of the process.

A VACUUM FULL will reclaim all expired row space, but is a very expensive operation and may take an unacceptably long time to finish on large, distributed Greenplum Database tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a CREATE TABLE AS statement and drop the old table. A VACUUM FULL is not recommended in Greenplum Database.

Option 1.

ALTER TABLE sales SET WITH (REORGANIZE=TRUE);

This leaves any indexes etc. in tact but rewrites the underlying table and, hence, eliminates all dead data.

 

Option 2

SET gp_auto_stats_mode = none;

CREATE TABLE xyz AS SELECT * FROM abc <ADD YOUR DISTRIBUTION AND STORAGE OPTIONS HERE>;

DROP TABLE abc;

ALTER TABLE xyz RENAME TO abc;

ANALYZE abc;

ALTER TABLE abc ADD PRIMARY KEY (<YOUR_PK_COLUMN_NAME>);

This only writes the data one time.

Option 3

Create temp table t1 as select * from <Table Name>;

Truncate <Table Name>;

set gp_autostats_mode = none;

Insert into <Table Name> select * from t1;

Drop table t1;

analyze <Table Name>;

This only writes the data 2 times.

SELECT   current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/   ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,   CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,   iname, /*ituples::bigint, ipages::bigint, iotta,*/   ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,   CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes FROM (   SELECT     schemaname, tablename, cc.reltuples, cc.relpages, bs,     CEIL((cc.reltuples*((datahdr+ma-       (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,     COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,     COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols   FROM (     SELECT       ma,bs,schemaname,tablename,       (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,       (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2     FROM (       SELECT         schemaname, tablename, hdr, ma, bs,         SUM((1-null_frac)*avg_width) AS datawidth,         MAX(null_frac) AS maxfracsum,         hdr+(           SELECT 1+count(*)/8           FROM pg_stats s2           WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename         ) AS nullhdr       FROM pg_stats s, (         SELECT           (SELECT current_setting('block_size')::numeric) AS bs,           CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,           CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma         FROM (SELECT version() AS v) AS foo       ) AS constants       GROUP BY 1,2,3,4,5     ) AS foo   ) AS rs   JOIN pg_class cc ON cc.relname = rs.tablename   JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'   LEFT JOIN pg_index i ON indrelid = cc.oid   LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml ORDER BY wastedbytes DESC

select       current_database() as dbname     , relid     , schemaname     , tablename     , round(bloat,1) as bloat_ratio     , pg_size_pretty(expbytes) as expected_size     , pg_size_pretty(relbytes) as relation_size     , pg_size_pretty(wastedbytes) as wasted_space     , round(1.0 * relbytes / nullif((sum(relbytes) over ())::bigint,0),4) as relative_size     , round((bloat-1) * (1.0 * relbytes / nullif((sum(relbytes) over ())::bigint, 0)), 8) as concern     from       (       SELECT         relid       , schemaname       , tablename       , CASE WHEN otta=0 THEN            0.0          ELSE              sml.relpages/otta::numeric          END AS bloat       , reltuples::bigint       , relpages::bigint       , otta       , (bs*otta)::bigint as expbytes       , CASE WHEN relpages < otta THEN 0 ELSE              (bs*(sml.relpages-otta))::bigint          END AS wastedbytes       , CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages       , (bs*relpages)::bigint as relbytes       FROM         (         SELECT           schemaname         , tablename , cc.oid as relid         , cc.reltuples         , cc.relpages         , bs         , CEIL((cc.reltuples*((datahdr+ma-(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta          FROM            (           SELECT             ma           , bs           , schemaname           , tablename           , (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr           , (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2           FROM             (             SELECT               schemaname             , tablename             , hdr             , ma             , bs             , SUM((1-s.null_frac)*avg_width) AS datawidth             , MAX(s.null_frac) AS maxfracsum             , hdr+(1+(count(case when s.null_frac<>0 then 1 end))/8) as nullhdr             FROM                pg_stats s       cross join                (               SELECT                 current_setting('block_size')::numeric AS bs               , CASE WHEN substring(version(),12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr               , CASE WHEN version() ~ 'mingw32' THEN 8 ELSE 4 END AS ma             ) AS constants             GROUP BY schemaname, tablename, hdr, ma, bs           ) AS foo         ) AS rs         JOIN pg_class cc            ON             cc.relname = rs.tablename              and cc.relkind = 'r'         JOIN pg_namespace nn            ON             cc.relnamespace = nn.oid              AND nn.nspname = rs.schemaname        ) AS sml ) wrapper where wastedbytes > 2*1024*1024 and bloat >= 1.4 order by (bloat-1) * (1.0 * relbytes / nullif((sum(relbytes) over ())::bigint, 0)) desc;

Table and Index Bloat view

CREATE OR REPLACE VIEW bloat AS

      SELECT

        schemaname, tablename, reltuples::bigint, relpages::bigint, otta,

        ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat,

        relpages::bigint - otta AS wastedpages,

        bs*(sml.relpages-otta)::bigint AS wastedbytes,

        pg_size_pretty((bs*(relpages-otta))::bigint) AS wastedsize,

        iname, ituples::bigint, ipages::bigint, iotta,

        ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat,

        CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages,

        CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes,

        CASE WHEN ipages < iotta THEN pg_size_pretty(0) ELSE pg_size_pretty((bs*(ipages-iotta))::bigint) END AS wastedisize

      FROM (

        SELECT

          schemaname, tablename, cc.reltuples, cc.relpages, bs,

          CEIL((cc.reltuples*((datahdr+ma-

            (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,

          COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,

          COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols

        FROM (

          SELECT

            ma,bs,schemaname,tablename,

            (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,

            (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2

          FROM (

            SELECT

              schemaname, tablename, hdr, ma, bs,

              SUM((1-null_frac)*avg_width) AS datawidth,

              MAX(null_frac) AS maxfracsum,

              hdr+(

                SELECT 1+count(*)/8

                FROM pg_stats s2

                WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename

              ) AS nullhdr

            FROM pg_stats s, (

              SELECT

                (SELECT current_setting('block_size')::numeric) AS bs,

                CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,

                CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma

              FROM (SELECT version() AS v) AS foo

            ) AS constants

            GROUP BY 1,2,3,4,5

          ) AS foo

        ) AS rs

        JOIN pg_class cc ON cc.relname = rs.tablename

        JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname

        LEFT JOIN pg_index i ON indrelid = cc.oid

        LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid

      ) AS sml

      WHERE sml.relpages - otta > 0 OR ipages - iotta > 10

      ORDER BY wastedbytes DESC, wastedibytes DESC;