How data is stored in Greenplum
Post date: Oct 12, 2014 2:52:19 AM
Each directory represents a database (created via create database). The number is the oid of the database. To see the oid and its name, run the following statement:
sachi=# select oid, datname from pg_database;
oid | datname
-------+-----------
16992 | gpadmin
10900 | postgres
1 | template1
10899 | template0
33476 | himanshu
16993 | sachi
33553 | gpperfmon
(7 rows)
sachi=#
Inside each directory each file corresponds to the an entry in pg_class where the oid matches the number of the file in the directory. You can see the oids and to which relation they relate by running the statement:
PSQL>select cl.relfilenode, nsp.nspname as schema_name, cl.relname, cl.relkind from pg_class cl join pg_namespace nsp on cl.relnamespace = nsp.oid;
sachi=# select cl.relfilenode, nsp.nspname as schema_name, cl.relname, cl.relkind from pg_class cl join pg_namespace nsp on cl.relnamespace = nsp.oid and cl.relname like 'gp_%';
relfilenode | schema_name | relname | relkind
-------------+-------------+------------------------------------------------+---------
10873 | pg_catalog | gp_pgdatabase | v
10876 | pg_catalog | gp_distributed_xacts | v
10879 | pg_catalog | gp_transaction_log | v
10882 | pg_catalog | gp_distributed_log | v
6429 | pg_catalog | gp_verification_history | r
5008 | pg_catalog | gp_master_mirroring | r
5043 | pg_catalog | gp_fastsequence | r
5096 | pg_catalog | gp_global_sequence | r
5090 | pg_catalog | gp_persistent_relation_node | r
5094 | pg_catalog | gp_relation_node | r
5091 | pg_catalog | gp_persistent_database_node | r
5092 | pg_catalog | gp_persistent_tablespace_node | r
5093 | pg_catalog | gp_persistent_filespace_node | r
16497 | gp_toolkit | gp_log_system | v
16500 | gp_toolkit | gp_log_database | v
16503 | gp_toolkit | gp_log_master_concise | v
16506 | gp_toolkit | gp_log_command_timings | v
16520 | gp_toolkit | gp_param_setting_t | c
16535 | gp_toolkit | gp_param_settings_seg_value_diffs | v
16538 | gp_toolkit | gp_pgdatabase_invalid | v
16552 | gp_toolkit | gp_skew_details_t | c
16576 | gp_toolkit | gp_skew_analysis_t | c
16590 | gp_toolkit | gp_skew_coefficients | v
16595 | gp_toolkit | gp_skew_idle_fractions | v
16598 | gp_toolkit | gp_stats_missing | v
16612 | gp_toolkit | gp_disk_free | r
16624 | gp_toolkit | gp_bloat_expected_pages | v
16628 | gp_toolkit | gp_bloat_diag | v
16631 | gp_toolkit | gp_resq_activity | v
6111 | pg_catalog | gp_san_config_mountid_index | i
6101 | pg_catalog | gp_configuration_content_definedprimary_index | i
6102 | pg_catalog | gp_configuration_dbid_index | i
6108 | pg_catalog | gp_db_interfaces_dbid_index | i
6109 | pg_catalog | gp_interfaces_interface_index | i
16634 | gp_toolkit | gp_resq_activity_by_queue | v
6106 | pg_catalog | gp_segment_config_content_preferred_role_index | i
6107 | pg_catalog | gp_segment_config_dbid_index | i
16637 | gp_toolkit | gp_resq_priority_backend | v
16640 | gp_toolkit | gp_resq_priority_statement | v
16646 | gp_toolkit | gp_locks_on_resqueue | v
6431 | pg_catalog | gp_verification_history_vertoken_index | i
16643 | gp_toolkit | gp_resq_role | v
16649 | gp_toolkit | gp_locks_on_relation | v
5000 | pg_catalog | gp_configuration | r
5006 | pg_catalog | gp_configuration_history | r
5029 | pg_catalog | gp_db_interfaces | r
5030 | pg_catalog | gp_interfaces | r
5001 | pg_catalog | gp_id | r
5002 | pg_catalog | gp_distribution_policy | r
5003 | pg_catalog | gp_version_at_initdb | r
5036 | pg_catalog | gp_segment_configuration | r
5039 | pg_catalog | gp_fault_strategy | r
5035 | pg_catalog | gp_san_configuration | r
63239 | pg_catalog | gp_fastsequence_objid_objmod_index | i
63249 | pg_catalog | gp_relation_node_index | i
16652 | gp_toolkit | gp_roles_assigned | v
16655 | gp_toolkit | gp_size_of_index | v
16658 | gp_toolkit | gp_size_of_table_disk | v
16661 | gp_toolkit | gp_size_of_table_uncompressed | v
16664 | gp_toolkit | gp_table_indexes | v
16667 | gp_toolkit | gp_size_of_all_table_indexes | v
16670 | gp_toolkit | gp_size_of_table_and_indexes_disk | v
16673 | gp_toolkit | gp_size_of_table_and_indexes_licensing | v
16676 | gp_toolkit | gp_size_of_partition_and_indexes_disk | v
16679 | gp_toolkit | gp_size_of_schema_disk | v
16682 | gp_toolkit | gp_size_of_database | v
16685 | gp_toolkit | gp_resqueue_status | v
63324 | pg_catalog | gp_policy_localoid_index | i
44088 | public | gp_log_master_ext | r
(69 rows)
sachi=#
--More--
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'test3';
sachi=# select * from pg_catalog.pg_filespace_entry;
fsefsoid | fsedbid | fselocation
----------+---------+------------------------
3052 | 1 | /home/gpmaster/gpsne-1
3052 | 2 | /disk1/gpdata1/gpsne0
3052 | 3 | /disk2/gpdata2/gpsne1
(3 rows)
sachi=# select * from pg_catalog.pg_filespace;
fsname | fsowner
-----------+---------
pg_system | 10
(1 row)
sachi=#
To show the space used by TOAST tables, use a query like the following:
SELECT relname, relpages FROM pg_class,
(SELECT reltoastrelid FROM pg_class WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR oid =
(SELECT reltoastidxid FROM pg_class WHERE oid = ss.reltoastrelid)
ORDER BY relname; relname
| relpages ----------------------+---------- pg_toast_16806 | 0 pg_toast_16806_index | 1
You can easily display index sizes, too:
SELECT c2.relname, c2.relpages FROM pg_class c, pg_class c2, pg_index i WHERE c.relname = 'customer' AND c.oid = i.indrelid AND c2.oid = i.indexrelid ORDER BY c2.relname; relname | relpages ----------------------+---------- customer_id_indexdex | 26
It is easy to find your largest tables and indexes using this information:
SELECT relname, relpages FROM pg_class ORDER BY relpages DESC; relname | relpages ----------------------+---------- bigtable | 3290 customer | 3144