Vacuuming the Greenplum Database

Vacuuming the Database

Because of the MVCC transaction concurrency model, data rows that are deleted or updated still occupy physical space on disk even though they are not visible to any  new transactions. If you have a database with lots of updates and deletes, you will  generate a lot of expired rows. Periodically running the VACUUM command will remove these expired rows. For example:

VACUUM mytable;

The VACUUM command also collects table-level statistics such as number of rows and pages, so it is necessary to vacuum all tables after loading data, including append-only 

tables. 

Configuring the Free Space Map

Expired rows are held in what is called the free space map. The free space map must be sized large enough to cover the expired rows of all tables in your database. If not  sized large enough, space occupied by expired rows that overflow the free space map  cannot be reclaimed by a regular VACUUM command.

A VACUUM FULL will reclaim all expired row space, but is a very expensive operation and may take an unacceptably long time to finish on large, distributed Greenplum Database tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a CREATE TABLE AS statement and drop the old table. A VACUUM FULL is not recommended in Greenplum  Database.

It is best to size the free space map appropriately. The free space map is configured with the following server configuration parameters:

max_fsm_pages

max_fsm_relations