Choose data types that use the least possible space

posted Aug 25, 2014, 11:02 AM by Sachchida Ojha   [ updated Aug 25, 2014, 11:02 AM ]
The goal of any performance tuning and optimization is to arrive at a point where the workload performance is acceptable. It is a best practice, when this goal is met to gather the execution statistics of the workload. These statistics can then be used as a baseline to help identify problems areas if and when the workload’s performance characteristic changes.

1. There are no performance differences among the character data types but use TEXT or VARCHAR rather than CHAR to decrease storage size. Use the smallest numeric data type that will accommodate your numeric data. Using  BIGINT for data that fits in INT or SMALLINT wastes storage space. 

2. Use the same data types for columns used in cross-table joins for very large tables. When the data types are different, DB must dynamically convert the data type of one the columns so For the optimum performance lay out the columns in the table to achieve data type byte alignment.

3. Lay out columns in heap tables by listing the distribution and partition columns first, followed by fixed number types with variable data types last. Lay out data types from largest to smallest. So BIGINT and TIMESTAMP before INT and DATE and all of these types before TEXT, VARCHAR or NUMERIC(x,y). For example, 8 byte first (BIGINT, TIMESTAMP), 4 byte next (INT, DATE,), 2 byte next (SMALLINT), variable data type last (VARCHAR).

So instead of this -> Int, Bigint, Timestamp, Bigint, Timestamp, Int (distribution key), Date (partition key), Bigint, Smallint. 
Do this -> Int (distribution key), Date (partition key), Bigint, Bigint, Timestamp, Bigint, Timestamp, Int, Smallint.
Comments