Data types to be used and avoid in a distribution in Greenplum

Post date: Dec 04, 2014 11:8:35 PM

The columns with the following data types should be avoided and not be part of any DISTRIBUTION key selected for any given table

TEXT is data type with unlimited length. A very noxious data type.

DECIMAL not to be used at all in the DISTRIBUTION key

DOUBLE-PRECISION (Floating-Point binary64)

REAL or real number

SERIAL mimics a good sequential number but penalizes a Singleton Select JOIN practice

BIGSERIAL

MONEY which stores a monetary types of data (it has been deprecated)

Boolean types (0/1, true/false, etc)

The columns with the following data types can be part of any DISTRIBUTION key selected for any given table

CHAR is allowed but must contain a length that is reasonable with high cardinality to mimic the same distribution effect that of DATE.

VARCHAR is allowed but must contain a length that is reasonable with high cardinality to mimic the same distribution effect that of DATE.

NUMERIC has no limit but it can store numbers with up to 1000 digits of precision.

BYTEA 1 or 4 bytes plus the actual binary string. It is variable-length.

TIMESTAMP behaves just as DATE does

TIME behaves just as DATE and TIMESTAMP does

DATE fields are also favorable for these act as if the data is basically distributed simulating that of a partition. Do not use it alone in the distribution.

The type smallint, integer, and bigint store whole numbers, that is, numbers without fractional components, of various ranges. Attempts to store values outside of the allowed range will result in an error.

The type integer is the common choice, as it offers the best balance between range, storage size, and performance. The smallint type is generally only used if disk space is at a premium. The bigint type should only be used if the integer range is insufficient, because the latter is definitely faster.

On very minimal operating system the bigint type might not function correctly, because it relies on compiler support for eight-byte integers. On such machines, bigint acts the same as integer, but still takes up eight bytes of storage. (We are not aware of any modern platform where this is the case.)

SQL only specifies the integer types integer (or int), smallint, and bigint. The type names int2, int4, and int8 are extensions, which are also used by some other SQL based DBMS.