Greenplum database new features - Table storage options and distribution policy
If you recall, we use [ WITH ( storage_parameter=value [, ... ] ) clause to specify storage options when creating a table in Greenplum. As you see from the create table syntax below, this is optional.
To simplify using storage clause recursively for many tables. Now you can set this default using gp_default_storage_options. This feature was introduced in GPDB 4.3.4.0. You can set the storage options with this parameter instead of specifying the table storage options in the WITH clause of the CREATE TABLE command. The table storage options that are specified with the CREATE TABLE command override the values specified by this parameter.
Table storage options and distribution policy
(4.3.4)
Default Database Table Distribution Policy
CREATE TABLE
Defines a new table.
CREATE [[GLOBAL | LOCAL] {TEMPORARY | TEMP}] TABLE table_name (
[ { column_name data_type [ DEFAULT default_expr ] [column_constraint [ ... ]
[ ENCODING ( storage_directive [,...] ) ]
]
| table_constraint
| LIKE other_table [{INCLUDING | EXCLUDING}
{DEFAULTS | CONSTRAINTS}] ...}
[, ... ] ]
[column_reference_storage_directive [, …] ]
)
[ INHERITS ( parent_table [, ... ] ) ]
[ WITH ( storage_parameter=value [, ... ] )
[ ON COMMIT {PRESERVE ROWS | DELETE ROWS | DROP} ]
[ TABLESPACE tablespace ]
[ DISTRIBUTED BY (column, [ ... ] ) | DISTRIBUTED RANDOMLY ]
[ PARTITION BY partition_type (column)
[ SUBPARTITION BY partition_type (column) ]
[ SUBPARTITION TEMPLATE ( template_spec ) ]
[...]
( partition_spec )
| [ SUBPARTITION BY partition_type (column) ]
[...]
( partition_spec
[ ( subpartition_spec
[(...)]
) ]
)
where storage_parameter is:
APPENDONLY={TRUE|FALSE}
BLOCKSIZE={8192-2097152}
ORIENTATION={COLUMN|ROW}
COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
COMPRESSLEVEL={0-9}
FILLFACTOR={10-100}
OIDS[=TRUE|FALSE]
where column_constraint is:
[CONSTRAINT constraint_name]
NOT NULL | NULL
| UNIQUE [USING INDEX TABLESPACE tablespace]
[WITH ( FILLFACTOR = value )]
| PRIMARY KEY [USING INDEX TABLESPACE tablespace]
[WITH ( FILLFACTOR = value )]
| CHECK ( expression )
and table_constraint is:
[CONSTRAINT constraint_name]
UNIQUE ( column_name [, ... ] )
[USING INDEX TABLESPACE tablespace]
[WITH ( FILLFACTOR=value )]
| PRIMARY KEY ( column_name [, ... ] )
[USING INDEX TABLESPACE tablespace]
[WITH ( FILLFACTOR=value )]
| CHECK ( expression )
where partition_type is:
LIST
| RANGE
where partition_specification is:
partition_element [, ...]
and partition_element is:
DEFAULT PARTITION name
| [PARTITION name] VALUES (list_value [,...] )
| [PARTITION name]
START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
[ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
[ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
| [PARTITION name]
END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
[ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[column_reference_storage_directive [, …] ]
[ TABLESPACE tablespace ]
where subpartition_spec or template_spec is:
subpartition_element [, ...]
and subpartition_element is:
DEFAULT SUBPARTITION name
| [SUBPARTITION name] VALUES (list_value [,...] )
| [SUBPARTITION name]
START ([datatype] 'start_value') [INCLUSIVE | EXCLUSIVE]
[ END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE] ]
[ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
| [SUBPARTITION name]
END ([datatype] 'end_value') [INCLUSIVE | EXCLUSIVE]
[ EVERY ([datatype] [number | INTERVAL] 'interval_value') ]
[ WITH ( partition_storage_parameter=value [, ... ] ) ]
[column_reference_storage_directive [, …] ]
[ TABLESPACE tablespace ]
where storage_parameter is:
APPENDONLY={TRUE|FALSE}
BLOCKSIZE={8192-2097152}
ORIENTATION={COLUMN|ROW}
COMPRESSTYPE={ZLIB|QUICKLZ|RLE_TYPE|NONE}
COMPRESSLEVEL={0-9}
FILLFACTOR={10-100}
OIDS[=TRUE|FALSE]
where storage_directive is:
COMPRESSTYPE={ZLIB | QUICKLZ | RLE_TYPE | NONE}
| COMPRESSLEVEL={0-9}
| BLOCKSIZE={8192-2097152}
Where column_reference_storage_directive is:
COLUMN column_name ENCODING (storage_directive [, ... ] ), ...
|
DEFAULT COLUMN ENCODING (storage_directive [, ... ] )
gp_default_storage_options : Set the default values for the following table storage options when creating a table with the CREATE TABLE command.
APPENDONLY
BLOCKSIZE
CHECKSUM
COMPRESSTYPE
COMPRESSLEVEL
ORIENTATION
Note: Specify multiple storage option values as a comma separated list.
Not all combinations of storage option values are valid. If the specified storage options are not valid, an error is returned.
The defaults can be set for a database and user. If the server configuration parameter is set at different levels, this the order of precedence, from highest to lowest, of the table storage values when a user logs into a database and creates a table:
The values specified in a CREATE TABLE command with the WITH clause or ENCODING clause
The value of gp_default_storage_options that set for the user with the ALTER ROLE...SET command
The value of gp_default_storage_options that is set for the database with the ALTER DATABASE...SET command
The value of gp_default_storage_options that is set for the Greenplum Database system with the gpconfig utility
The parameter value is not cumulative. For example, if the parameter specifies the APPENDONLY and COMPRESSTYPE options for a database and a user logs in and sets the parameter to specify the value for the ORIENTATION option, the APPENDONLY, and COMPRESSTYPE values set at the database level are ignored.
This example ALTER DATABASE command sets the default ORIENTATION and COMPRESSTYPE table storage options for the database sachi.
ALTER DATABASE sachi SET gp_default_storage_options = 'orientation=column, compresstype=rle_type'
To create an append-optimized table in the sachi database with column-oriented table and RLE compression. The user needs to specify only APPENDONLY=TRUE in the WITH clause.
This example gpconfig utility command sets the default storage option for a Greenplum Database system. If you set the defaults for multiple table storage options, the value must be enclosed in single quotes and then in double quotes.
gpconfig -c 'gp_default_storage_options' -v "'appendonly=true, orientation=column'"
This example gpconfig utility command shows the value of the parameter. The parameter value must be consistent across the Greenplum Database master and all segments.
gpconfig -s 'gp_default_storage_options'
VALUE RANGE
APPENDONLY= TRUE | FALSE
DEFAULT
APPENDONLY=FALSE
SET CLASSIFICATIONS
master
BLOCKSIZE= integer between 8192 and 2097152
CHECKSUM= TRUE | FALSE
COMPRESSTYPE= ZLIB | QUICKLZ |RLE_TYPE | NONE
COMPRESSLEVEL= integer between 0 and 9
ORIENTATION= ROW | COLUMN
BLOCKSIZE=32768
CHECKSUM=TRUE
COMPRESSTYPE=none
COMPRESSLEVEL=0
ORIENTATION=ROW
session
reload
Note: The set classification when the parameter is set at the system level with the gpconfig utility.
Default Database Table Distribution Policy
When you create a table without distributed by clause, Greenplum chose first valid column to distribute (hash) the table data. This is the default behavior. In GPDB 4.3.4 and onward you can change this default behavior to random distribution using parameter gp_create_table_random_default_distribution. By default this parameter is set to OFF.
sachi=# show gp_create_table_random_default_distribution
sachi-# ;
gp_create_table_random_default_distribution
---------------------------------------------
off
(1 row)
sachi=# create table testdefaultdist( id numeric(2), name varchar(12), dob timestamp, address text);
NOTICE: Table doesn't have 'DISTRIBUTED BY' clause -- Using column named 'id' as the Greenplum Database data distribution key for this table.
HINT: The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
CREATE TABLE
sachi=#
Note that the value of the parameter is OFF, (the default), and the table creation command does not contain a DISTRIBUTED BY clause, Greenplum Database chooses the distribution key ( Using column named 'id' as the Greenplum Database data distribution key for this table) based on the table creation command.
If the value of the parameter is ON, the table is created with a random distribution if the command does not specify PRIMARY KEY or UNIQUE columns.
If the value of the parameter is OFF, (the default), and the table creation command does not contain a DISTRIBUTED BY clause, Greenplum Database chooses the table distribution key based on the command. If the LIKE or INHERITS clause is specified in table creation command, the created table uses the same distribution key as the source or parent table.
If the value of the parameter is set to ON, Greenplum Database follows these rules to create a table when the DISTRIBUTED BYclause is not specified:
If PRIMARY KEY or UNIQUE columns are not specified, the distribution of the table is random (DISTRIBUTED RANDOMLY). Table distribution is random even if the table creation command contains the LIKE or INHERITS clause.
If PRIMARY KEY or UNIQUE columns are specified, a DISTRIBUTED BY clause must also be specified. If a DISTRIBUTED BY clause is not specified as part of the table creation command, the command fails.
VALUE RANGE
boolean
DEFAULT
off
SET CLASSIFICATIONS
master
system
reload