Greenplum DBA FAQ
Understanding Greenplum Bloat on heap, AO and AO columnar tables by examples and how to remove it.
Step 1: Create heap, ao and ao columnar (column oriented) table drop table IF EXISTS sachi.vac_test_heap; drop table IF EXISTS sachi.vac_test_ao; drop table IF EXISTS sachi.vac_test_aoco; create table sachi.vac_test_heap( a int, b text) distributed by (a); create table sachi.vac_test_ao( a int, b text ) with (appendonly=true) distributed by (a); create table sachi.vac_test_aoco( a int, b text ) with (appendonly=true, orientation=column) distributed by (a);
DROP TABLE DROP TABLE DROP TABLE CREATE TABLE CREATE TABLE CREATE TABLE
select c.oid, c.relname, n.nspname, c.relkind, c.reltuples, c.relstorage from pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid where n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | relname | nspname | relkind | reltuples | relstorage ----------+---------------+---------+---------+-----------+------------ 19807184 | vac_test_heap | sachi | r | 0 | h 19807190 | vac_test_ao | sachi | r | 0 | a 19807200 | vac_test_aoco | sachi | r | 0 | c (3 rows)
SELECT n.nspname AS schema_name, c.relname AS table_name, c.reltuples AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows FROM pg_appendonly a JOIN pg_class c ON c.oid = a.relid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE c.relkind = 'r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); schema_name | table_name | num_rows | ao_num_rows -------------+------------+----------+------------- (0 rows)
insert into vac_test_heap values (generate_series(1,10000000),'this is a test'); insert into vac_test_ao values (generate_series(1,10000000),'this is a test'); insert into vac_test_aoco values (generate_series(1,10000000),'this is a test'); INSERT 0 10000000 INSERT 0 10000000 INSERT 0 10000000
select c.oid,c.relname,n.nspname,c.relkind,c.relpages, c.reltuples,c.relstorage from pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid where n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | relname | nspname | relkind | relpages | reltuples | relstorage ----------+---------------+---------+---------+----------+-----------+------------ 19807184 | vac_test_heap | sachi | r | 0 | 0 | h 19807190 | vac_test_ao | sachi | r | 0 | 0 | a 19807200 | vac_test_aoco | sachi | r | 0 | 0 | c (3 rows) SELECT n.nspname AS schema_name, c.relname AS table_name, c.reltuples AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows FROM pg_appendonly a JOIN pg_class c ON c.oid = a.relid JOIN pg_namespace n ON c.relnamespace = n.oid WHERE c.relkind = 'r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); schema_name | table_name | num_rows | ao_num_rows -------------+---------------+----------+------------- sachi | vac_test_ao | 0 | 10000000 sachi | vac_test_aoco | 0 | 10000000 (2 rows)
SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows, (SELECT max(percent_hidden) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as perhid, (SELECT sum(total_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as tottup, (SELECT max(hidden_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as hidtup FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | schema_name | table_name | num_rows | ao_num_rows | perhid | tottup | hidtup ----------+-------------+---------------+----------+-------------+--------+---------- +-------- 19807190 | sachi | vac_test_ao | 0 | 10000000 | 0.00 | 10000000 | 0 19807200 | sachi | vac_test_aoco | 0 | 10000000 | 0.00 | 10000000 | 0 (2 rows)
-- Now run analyze on each table.
analyze sachi.vac_test_heap; analyze sachi.vac_test_ao; analyze sachi.vac_test_aoco;
ANALYZE ANALYZE ANALYZE
select c.oid,c.relname,n.nspname,c.relkind,c.relpages, c.reltuples::bigint,c.relstorage from pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid where n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | relname | nspname | relkind | relpages | reltuples | relstorage ----------+---------------+---------+---------+----------+-----------+------------ 19807184 | vac_test_heap | sachi | r | 16000 | 10011500 | h 19807190 | vac_test_ao | sachi | r | 10496 | 10000000 | a 19807200 | vac_test_aoco | sachi | r | 5888 | 10000000 | c (3 rows)
select count(*) from sachi.vac_test_heap; count ---------- 10000000 (1 row)
SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows, (SELECT max(percent_hidden) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as perhid, (SELECT sum(total_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as tottup, (SELECT max(hidden_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as hidtup FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | schema_name | table_name | num_rows | ao_num_rows | perhid | tottup | hidtup ----------+-------------+---------------+----------+-------------+--------+---------- +-------- 19807190 | sachi | vac_test_ao | 10000000 | 10000000 | 0.00 | 10000000 | 0 19807200 | sachi | vac_test_aoco | 10000000 | 10000000 | 0.00 | 10000000 | 0 (2 rows) insert into vac_test_heap values (generate_series(1,500000),'this is a new 5 percent rows test'); insert into vac_test_ao values (generate_series(1,500000),'this is a new 5 percent rows test'); insert into vac_test_aoco values (generate_series(1,500000),'this is a new 5 percent rows test'); INSERT 0 500000 INSERT 0 500000 INSERT 0 500000
select c.oid,c.relname,n.nspname,c.relkind,c.relpages, c.reltuples::bigint,c.relstorage from pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid where n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | relname | nspname | relkind | relpages | reltuples | relstorage ----------+---------------+---------+---------+----------+-----------+------------ 19807184 | vac_test_heap | sachi | r | 16000 | 10011500 | h 19807190 | vac_test_ao | sachi | r | 10496 | 10000000 | a 19807200 | vac_test_aoco | sachi | r | 5888 | 10000000 | c (3 rows)
SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows, (SELECT max(percent_hidden) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as perhid, (SELECT sum(total_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as tottup, (SELECT max(hidden_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as hidtup FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | schema_name | table_name | num_rows | ao_num_rows | perhid | tottup | hidtup ----------+-------------+---------------+----------+-------------+--------+---------- +-------- 19807190 | sachi | vac_test_ao | 10000000 | 10500000 | 0.00 | 10500000 | 0 19807200 | sachi | vac_test_aoco | 10000000 | 10500000 | 0.00 | 10500000 | 0 (2 rows)
select oid, schema_name, table_name, num_rows, ao_num_rows, (sum(ao_num_rows)/num_rows) as pct from ( SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' AND c.reltuples > 0 and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco') --order by 4 desc limit 50000 ) as sub GROUP BY 1,2,3,4,5 HAVING sum(ao_num_rows) > num_rows * (1::numeric + (5::numeric/100::numeric)) order by 4 desc; oid | schema_name | table_name | num_rows | ao_num_rows | pct -----+-------------+------------+----------+-------------+----- (0 rows) insert into vac_test_heap values (generate_series(1,5000),'this is a new 1 percent rows test'); insert into vac_test_ao values (generate_series(1,5000),'this is a new 1 percent rows testt'); insert into vac_test_aoco values (generate_series(1,5000),'this is a new 1 percent rows test'); INSERT 0 5000 INSERT 0 5000 INSERT 0 5000 select oid, schema_name, table_name, num_rows, ao_num_rows, (sum(ao_num_rows)/num_rows) as pct from ( SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' AND c.reltuples > 0 and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco') --order by 4 desc limit 50000 ) as sub GROUP BY 1,2,3,4,5 HAVING sum(ao_num_rows) > num_rows * (1::numeric + (5::numeric/100::numeric)) order by 4 desc; oid | schema_name | table_name | num_rows | ao_num_rows | pct ----------+-------------+---------------+----------+-------------+------------------- - 19807190 | sachi | vac_test_ao | 10000000 | 10505000 | 1.0505000000000000 19807200 | sachi | vac_test_aoco | 10000000 | 10505000 | 1.0505000000000000 (2 rows) SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows, (SELECT max(percent_hidden) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as perhid, (SELECT sum(total_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as tottup, (SELECT max(hidden_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as hidtup FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | schema_name | table_name | num_rows | ao_num_rows | perhid | tottup | hidtup ----------+-------------+---------------+----------+-------------+--------+---------- +-------- 19807190 | sachi | vac_test_ao | 10000000 | 10505000 | 0.00 | 10505000 | 0 19807200 | sachi | vac_test_aoco | 10000000 | 10505000 | 0.00 | 10505000 | 0 (2 rows)
insert into vac_test_heap values (generate_series(1,10000000),'this is a test'); insert into vac_test_ao values (generate_series(1,10000000),'this is a test'); insert into vac_test_aoco values (generate_series(1,10000000),'this is a test'); INSERT 0 10000000 INSERT 0 10000000 INSERT 0 10000000
select c.oid,c.relname,n.nspname,c.relkind,c.relpages, c.reltuples::bigint,c.relstorage from pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid where n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | relname | nspname | relkind | relpages | reltuples | relstorage ----------+---------------+---------+---------+----------+-----------+------------ 19807184 | vac_test_heap | sachi | r | 16000 | 10011500 | h 19807190 | vac_test_ao | sachi | r | 10496 | 10000000 | a 19807200 | vac_test_aoco | sachi | r | 5888 | 10000000 | c (3 rows)
analyze sachi.vac_test_heap; analyze sachi.vac_test_ao; analyze sachi.vac_test_aoco; select c.oid,c.relname,n.nspname,c.relkind,c.relpages, c.reltuples::bigint,c.relstorage from pg_class c JOIN pg_namespace n ON c.relnamespace = n.oid where n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco');
delete from sachi.vac_test_heap where a<800000; delete from sachi.vac_test_ao where a<800000; delete from sachi.vac_test_aoco where a<800000; analyze sachi.vac_test_heap; analyze sachi.vac_test_ao; analyze sachi.vac_test_aoco; DELETE 14400000 DELETE 14400000 DELETE 14400000 ANALYZE ANALYZE ANALYZE
----no Bloat
select bdinspname as "Schema Name", bdirelname as "Table/Index Name", bdirelpages as "Pages(current)",bdiexppages as "Pages(Expected)", bdidiag as "Comments" from gp_toolkit.gp_bloat_diag where bdinspname='sachi' and bdirelname in ('vac_test_heap','vac_test_ao','vac_test_aoco') ; Schema Name | Table/Index Name | Pages(current) | Pages(Expected) | Comments -------------+------------------+----------------+-----------------+---------- (0 rows)
analyze sachi.vac_test_heap; analyze sachi.vac_test_ao; analyze sachi.vac_test_aoco; select bdinspname as "Schema Name", bdirelname as "Table/Index Name", bdirelpages as "Pages(current)",bdiexppages as "Pages(Expected)", bdidiag as "Comments" from gp_toolkit.gp_bloat_diag where bdinspname='sachi' and bdirelname in ('vac_test_heap','vac_test_ao','vac_test_aoco') ; Schema Name | Table/Index Name | Pages(current) | Pages(Expected) | Comments -------------+------------------+----------------+-----------------+---------- (0 rows)
delete from sachi.vac_test_heap where a<8000000; delete from sachi.vac_test_ao where a<8000000; delete from sachi.vac_test_aoco where a<8000000;
analyze sachi.vac_test_heap; analyze sachi.vac_test_ao; analyze sachi.vac_test_aoco;
-- Now bloat diag views shows the bloat. AO and AOCO Table does not from this view. select bdinspname as "Schema Name", bdirelname as "Table/Index Name", bdirelpages as "Pages(current)",bdiexppages as "Pages(Expected)", bdidiag as "Comments" from gp_toolkit.gp_bloat_diag where bdinspname='sachi' and bdirelname in ('vac_test_heap','vac_test_ao','vac_test_aoco') ; Schema Name | Table/Index Name | Pages(current) | Pages(Expected) | Comments -------------+------------------+----------------+-----------------+----------------- ------------------- sachi | vac_test_heap | 32896 | 5086 | moderate amount of bloat suspected (1 row) SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows, (SELECT max(percent_hidden) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as perhid, (SELECT sum(total_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as tottup, (SELECT max(hidden_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as hidtup FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | schema_name | table_name | num_rows | ao_num_rows | perhid | tottup | hidtup ----------+-------------+---------------+----------+-------------+--------+---------- +-------- 19807190 | sachi | vac_test_ao | 4000002 | 20505000 | 80.49 | 20505000 | 128947 19807200 | sachi | vac_test_aoco | 4000002 | 20505000 | 80.49 | 20505000 | 128947 (2 rows)
vacuum sachi.vac_test_heap; vacuum sachi.vac_test_ao; vacuum sachi.vac_test_aoco; VACUUM VACUUM VACUUM
-- Still show bloat on heap tables. select bdinspname as "Schema Name", bdirelname as "Table/Index Name", bdirelpages as "Pages(current)",bdiexppages as "Pages(Expected)", bdidiag as "Comments" from gp_toolkit.gp_bloat_diag where bdinspname='sachi' and bdirelname in ('vac_test_heap','vac_test_ao','vac_test_aoco') ; Schema Name | Table/Index Name | Pages(current) | Pages(Expected) | Comments -------------+------------------+----------------+-----------------+----------------- ------------------- sachi | vac_test_heap | 32896 | 5372 | moderate amount of bloat suspected (1 row) -- Hidden tuple is gone but see additional rows. SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows, (SELECT max(percent_hidden) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as perhid, (SELECT sum(total_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as tottup, (SELECT max(hidden_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as hidtup FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | schema_name | table_name | num_rows | ao_num_rows | perhid | tottup | hidtup ----------+-------------+---------------+----------+-------------+--------+--------- +-------- 19807190 | sachi | vac_test_ao | 4000002 | 0 | 0.00 | 4000002 | 0 19807190 | sachi | vac_test_ao | 4000002 | 4000002 | 0.00 | 4000002 | 0 19807200 | sachi | vac_test_aoco | 4000002 | 0 | 0.00 | 4000002 | 0 19807200 | sachi | vac_test_aoco | 4000002 | 4000002 | 0.00 | 4000002 | 0 (4 rows)
vacuum full sachi.vac_test_heap; vacuum full sachi.vac_test_ao; vacuum full sachi.vac_test_aoco; VACUUM VACUUM VACUUM
select bdinspname as "Schema Name", bdirelname as "Table/Index Name", bdirelpages as "Pages(current)",bdiexppages as "Pages(Expected)", bdidiag as "Comments" from gp_toolkit.gp_bloat_diag where bdinspname='sachi' and bdirelname in ('vac_test_heap','vac_test_ao','vac_test_aoco') ; Schema Name | Table/Index Name | Pages(current) | Pages(Expected) | Comments -------------+------------------+----------------+-----------------+---------- (0 rows) SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows, (SELECT max(percent_hidden) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as perhid, (SELECT sum(total_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as tottup, (SELECT max(hidden_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as hidtup FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | schema_name | table_name | num_rows | ao_num_rows | perhid | tottup | hidtup ----------+-------------+---------------+----------+-------------+--------+--------- +-------- 19807190 | sachi | vac_test_ao | 4000002 | 0 | 0.00 | 4000002 | 0 19807190 | sachi | vac_test_ao | 4000002 | 4000002 | 0.00 | 4000002 | 0 19807200 | sachi | vac_test_aoco | 4000002 | 0 | 0.00 | 4000002 | 0 19807200 | sachi | vac_test_aoco | 4000002 | 4000002 | 0.00 | 4000002 | 0 (4 rows)
analyze sachi.vac_test_heap; analyze sachi.vac_test_ao; analyze sachi.vac_test_aoco; ANALYZE ANALYZE ANALYZE
select bdinspname as "Schema Name", bdirelname as "Table/Index Name", bdirelpages as "Pages(current)",bdiexppages as "Pages(Expected)", bdidiag as "Comments" from gp_toolkit.gp_bloat_diag where bdinspname='sachi' and bdirelname in ('vac_test_heap','vac_test_ao','vac_test_aoco') ; Schema Name | Table/Index Name | Pages(current) | Pages(Expected) | Comments -------------+------------------+----------------+-----------------+---------- (0 rows) SELECT c.oid, n.nspname AS schema_name, c.relname AS table_name, c.reltuples::bigint AS num_rows, (gp_toolkit.__gp_aovisimap_hidden_info(c.oid)).total_tupcount AS ao_num_rows, (SELECT max(percent_hidden) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as perhid, (SELECT sum(total_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as tottup, (SELECT max(hidden_tupcount) FROM gp_toolkit.__gp_aovisimap_compaction_info(c.oid)) as hidtup FROM pg_appendonly a JOIN pg_class c ON c.oid=a.relid JOIN pg_namespace n ON c.relnamespace=n.oid WHERE c.relkind='r' and n.nspname='sachi' and c.relname in ('vac_test_heap','vac_test_ao','vac_test_aoco'); oid | schema_name | table_name | num_rows | ao_num_rows | perhid | tottup | hidtup ----------+-------------+---------------+----------+-------------+--------+--------- +-------- 19807190 | sachi | vac_test_ao | 4000002 | 0 | 0.00 | 4000002 | 0 19807190 | sachi | vac_test_ao | 4000002 | 4000002 | 0.00 | 4000002 | 0 19807200 | sachi | vac_test_aoco | 4000002 | 0 | 0.00 | 4000002 | 0 19807200 | sachi | vac_test_aoco | 4000002 | 4000002 | 0.00 | 4000002 | 0 (4 rows)
Get more information
SELECT * FROM gp_toolkit.__gp_aovisimap_compaction_info(19807190); SELECT * FROM gp_toolkit.__gp_aovisimap_compaction_info(19807200); SELECT * FROM gp_toolkit.__gp_aovisimap_hidden_info(19807200); SELECT * FROM gp_toolkit.__gp_aovisimap_hidden_info(19807190); |
MADLib and MADLib pivot function
Apache MADlib is a SQL-based open source library for scalable in-database analytics that supports Greenplum Database. The library offers data scientists numerous distributed implementations of mathematical, statistical and machine learning methods, including many utilities for data transformation. New utilities have been added in the recent MADlib 1.9.1 release, including: Pivot: data summarization tool that can do basic OLAP type operations Sessionization: time-oriented session reconstruction on a data set comprising a sequence of events Prediction metrics: set of metrics to evaluate the quality of predictions of a model For more details https://madlib.incubator.apache.org/docs/latest/group__grp__pivot.html https://blog.pivotal.io/big-data-pivotal/products/new-tools-to-shape-data-in-apache-madlib How to Install MADLib in Greenplum database |
Checking list of security definer functions in GPDB
testdb=# \d pg_proc Table "pg_catalog.pg_proc" Column | Type | Modifiers ----------------+-----------+----------- proname | name | not null pronamespace | oid | not null proowner | oid | not null prolang | oid | not null proisagg | boolean | not null prosecdef | boolean | not null proisstrict | boolean | not null proretset | boolean | not null provolatile | "char" | not null pronargs | smallint | not null prorettype | oid | not null proiswin | boolean | not null proargtypes | oidvector | not null proallargtypes | oid[] | proargmodes | "char"[] | proargnames | text[] | prosrc | text | probin | bytea | proacl | aclitem[] | prodataaccess | "char" | not null Indexes: "pg_proc_oid_index" UNIQUE, btree (oid) "pg_proc_proname_args_nsp_index" UNIQUE, btree (proname, proargtypes, pronamespace) testdb=# psql> select proname from pg_proc where prosecdef is TRUE; |
Connecting as another user without their password
In Oracle we use to run (from privileged user) alter session set current_schema=abc; Lets discuss how we can achieve same thing in Greenplum using SET ROLE and with SET SESSION AUTHORIZATION. SET ROLE command sets the current role identifier of the current session. This command sets the current role identifier of the current SQL-session context to be rolename. The role name may be written as either an identifier or a string literal. After SET ROLE, permissions checking for SQL commands is carried out as though the named role were the one that had logged in originally. The specified rolename must be a role that the current session user is a member of. If the session user is a superuser, any role can be selected. The NONE and RESET forms reset the current role identifier to be the current session role identifier. These forms may be executed by any user. Parameters SESSION - Specifies that the command takes effect for the current session. This is the default.LOCAL -Specifies that the command takes effect for only the current transaction. After COMMIT or ROLLBACK, the session-level setting takes effect again. Note that SET LOCAL will appear to have no effect if it is executed outside of a transaction. rolename -The name of a role to us -e for permissions checking in this session. NONE and RESET Reset the current role identifier to be the current session role identifier (that of the role used to log in). Using this command, it is possible to either add privileges or restrict privileges. If the session user role has the INHERITS attribute, then it automatically has all the privileges of every role that it could SET ROLE to; in this case SET ROLE effectively drops all the privileges assigned directly to the session user and to the other roles it is a member of, leaving only the privileges available to the named role. On the other hand, if the session user role has the NOINHERITS attribute, SET ROLE drops the privileges assigned directly to the session user and instead acquires the privileges available to the named role. In particular, when a superuser chooses to SET ROLE to a non-superuser role, she loses her superuser privileges. SET ROLE has effects comparable to SET SESSION AUTHORIZATION, but the privilege checks involved are quite different. Also, SET SESSION AUTHORIZATION determines which roles are allowable for later SET ROLE commands, whereas changing roles with SET ROLE does not change the set of roles allowed to a later SET ROLE. Examples gpadmin=# SELECT SESSION_USER, CURRENT_USER; session_user | current_user --------------+-------------- gpadmin | gpadmin (1 row) gpadmin=# SET ROLE 'sachi'; SET gpadmin=> SELECT SESSION_USER, CURRENT_USER; session_user | current_user --------------+-------------- gpadmin | sachi (1 row) gpadmin=> Another way of doing this and little extra is with SET SESSION AUTHORIZATION command. SET SESSION AUTHORIZATION command sets the session role identifier and the current role identifier of the current session. Synopsis SET [SESSION | LOCAL] SESSION AUTHORIZATION rolename SET [SESSION | LOCAL] SESSION AUTHORIZATION DEFAULT RESET SESSION AUTHORIZATION Description This command sets the session role identifier and the current role identifier of the current SQL-session context to be rolename. The role name may be written as either an identifier or a string literal. Using this command, it is possible, for example, to temporarily become an unprivileged user and later switch back to being a superuser.The session role identifier is initially set to be the (possibly authenticated) role name provided by the client. The current role identifier is normally equal to the session user identifier, but may change temporarily in the context of setuid functions and similar mechanisms; it can also be changed by SET ROLE. The current user identifier is relevant for permission checking. The session user identifier may be changed only if the initial session user (the authenticated user) had the superuser privilege. Otherwise, the command is accepted only if it specifies the authenticated user name. The DEFAULT and RESET forms reset the session and current user identifiers to be the originally authenticated user name. These forms may be executed by any user. Parameters SESSION Specifies that the command takes effect for the current session. This is the default. LOCAL Specifies that the command takes effect for only the current transaction. After COMMIT or ROLLBACK, the session-level setting takes effect again. Note that SET LOCAL will appear to have no effect if it is executed outside of a transaction. rolename The name of the role to assume. NONE RESET Reset the session and current role identifiers to be that of the role used to log in. Examples gpadmin=# SELECT SESSION_USER, CURRENT_USER; session_user | current_user --------------+-------------- gpadmin | gpadmin (1 row) gpadmin=# SET ROLE 'sachi'; SET gpadmin=> SELECT SESSION_USER, CURRENT_USER; session_user | current_user --------------+-------------- gpadmin | sachi (1 row) gpadmin=> SET ROLE NONE; SET gpadmin=# SELECT SESSION_USER, CURRENT_USER; session_user | current_user --------------+-------------- gpadmin | gpadmin (1 row) gpadmin=# SET SESSION AUTHORIZATION 'sachi'; SET gpadmin=> SELECT SESSION_USER, CURRENT_USER; session_user | current_user --------------+-------------- sachi | sachi (1 row) gpadmin=> |
Enable PQO for a specific user/role
gpadmin=# select * from pg_roles where rolname='sachi'; -[ RECORD 1 ]-----+--------- rolname | sachi rolsuper | f rolinherit | t rolcreaterole | f rolcreatedb | f rolcatupdate | f rolcanlogin | f rolconnlimit | -1 rolpassword | ******** rolvaliduntil | rolconfig | rolresqueue | 6055 oid | 26097 rolcreaterextgpfd | f rolcreaterexthttp | f rolcreatewextgpfd | f rolcreaterexthdfs | f rolcreatewexthdfs | f gpadmin=# alter role sachi set optimizer=on; ALTER ROLE gpadmin=# select * from pg_roles where rolname='sachi'; -[ RECORD 1 ]-----+--------------- rolname | sachi rolsuper | f rolinherit | t rolcreaterole | f rolcreatedb | f rolcatupdate | f rolcanlogin | f rolconnlimit | -1 rolpassword | ******** rolvaliduntil | rolconfig | {optimizer=on} rolresqueue | 6055 oid | 26097 rolcreaterextgpfd | f rolcreaterexthttp | f rolcreatewextgpfd | f rolcreaterexthdfs | f rolcreatewexthdfs | f gpadmin=# optimizer_minidump : The PQO generates minidump files to describe the optimization context for a given query. The minidump files are used by Pivotal support to analyze Greenplum Database issues. The information in the file is not in a format that can be easily used by customers for debugging or troubleshooting. The minidump file is located under the master data directory and uses the following naming format: Minidump_date_time.mdp The minidump file contains this query related information:
Setting this parameter to ALWAYS generates a minidump for all queries. Pivotal recommends that you set this parameter to ONERROR in production environments to minimize total optimization time. Value Range Default Set Classifications ONERROR ONERROR master ALWAYS session reload |
Enable PQO (Pivotal Query Optimizer) at the database level
Step 1: Check the status of the optimizer gpadmin=# show optimizer; optimizer ----------- off (1 row) gpadmin=# show optimizer_control; optimizer_control ------------------- on (1 row) Step 1: Check state of the database (optional) [gpadmin@gpdb-sandbox ~]$ gpstate 20160114:12:38:42:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:-Starting gpstate with args: 20160114:12:38:42:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:-local Greenplum Version: 'postgres (Greenplum Database) 4.3.6.1 build 2' 20160114:12:38:42:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:-master Greenplum Version: 'PostgreSQL 8.2.15 (Greenplum Database 4.3.6.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Oct 1 2015 15:14:22' 20160114:12:38:42:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Segment details from master... 20160114:12:38:42:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:-Gathering data from segments... . 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:-Greenplum instance status summary 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------- 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Master instance = Active 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Master standby = No master standby configured 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total segment instance count from metadata = 2 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------- 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Primary Segment Status 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------- 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total primary segments = 2 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total primary segment valid (at master) = 2 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total primary segment failures (at master) = 0 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total number of postmaster.pid files missing = 0 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total number of postmaster.pid files found = 2 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total number of postmaster.pid PIDs missing = 0 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total number of postmaster.pid PIDs found = 2 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total number of /tmp lock files missing = 0 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total number of /tmp lock files found = 2 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total number postmaster processes missing = 0 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Total number postmaster processes found = 2 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------- 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Mirror Segment Status 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------- 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:- Mirrors not configured on this array 20160114:12:38:43:169214 gpstate:gpdb-sandbox:gpadmin-[INFO]:----------------------------------------------------- Step 3: Turn on optimizer_analyze_root_partition [gpadmin@gpdb-sandbox ~]$ gpconfig -c optimizer_analyze_root_partition -v on --masteronly 20160114:12:39:18:169310 gpconfig:gpdb-sandbox:gpadmin-[INFO]:-completed successfully Step 4: Turn on the optimizer [gpadmin@gpdb-sandbox ~]$ gpconfig -c optimizer -v on --masteronly 20160114:12:39:59:169532 gpconfig:gpdb-sandbox:gpadmin-[INFO]:-completed successfully Step 5: Apply the changes [gpadmin@gpdb-sandbox ~]$ gpstop -u 20160114:12:40:23:169600 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Starting gpstop with args: -u 20160114:12:40:23:169600 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Gathering information and validating the environment... 20160114:12:40:23:169600 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information 20160114:12:40:23:169600 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Obtaining Segment details from master... 20160114:12:40:23:169600 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.6.1 build 2' 20160114:12:40:23:169600 gpstop:gpdb-sandbox:gpadmin-[INFO]:-Signalling all postmaster processes to reload . Step 6: Check the status [gpadmin@gpdb-sandbox ~]$ psql psql (8.2.15) Type "help" for help. gpadmin=# show optimizer; optimizer ----------- on (1 row) gpadmin=# |
New Parameters in GPDB 4.3 and onward
There were 22 new parameters introduced in Greenplum starting GPDB 4.3.0. Mostly these parameters are related to specific features. Click here to learn more about these 22 parameters. |
Spill or workfile configuration parameters
Greenplum introduced 2 server configuration parameter. 1. gp_workfile_limit_per_query – This sets the maximum disk size (in kilobytes) an individual query is allowed to use for creating temporary spill files at each segment. The default value is 0, which means a limit is not enforced. 2. gp_workfile_limit_per_segment – Sets the maximum total disk size (in kilobytes) that all running queries are allowed to use for creating temporary spill files at each segment. The default value is 0, which means a limit is not enforced. |