Append-Optimized Tables - New Feature in Greenplum V3.0

Post date: Jan 21, 2014 7:54:1 PM

a) What are Append-Optimized Tables in Greenplum 3.0?

Append-optimized tables are similar to append-only tables and also allow UPDATE and DELETE operations on the table data.

Append-optimized tables works best with denormalized fact tables in a data warehouse environment, where the data is static after it is loaded. Denormalized fact tables are typically the largest tables in the system. Fact tables are usually loaded in batches and accessed by read-only queries. Moving large fact tables to an append-only storage model eliminates the storage overhead of the per-row update visibility information, saving about 20 bytes per row.

When migrating Greenplum Database from 4.2.x.x to 4.3, append-only tables are migrated to append-optimized tables.Migration of database with a large number of append-only tables might require a significant amount of time.

b) Restrictions on Append-optimized : Append-optimized tables cannot be used with the following functionality

1. Transactions with serializable isolation levels

2. Updatable cursors

With append-optimized tables, you use the VACUUM command to reclaim the storage capacity from table data that was deleted or updated.

c) Creating append-optimized tables

T0 create an append-optimized table with no compression:

CREATE TABLE aotable (a int, b text)

WITH (appendonly=true)

DISTRIBUTED BY (a);

Note that we used WITH clause of the CREATE TABLE command to declare the table storage options.The default option without WITH clause creates table as a regular row-oriented heap-storage table.

UPDATE and DELETE are not allowed on append-optimized tables in a serializable transaction and will cause the transaction to abort. CLUSTER, DECLARE...FOR UPDATE, and triggers are not supported with append-optimized tables.

system-level schemas pg_aoseg stores append-optimized table objects. This schema is used internally by the Greenplum Database system.

d) Append-Optimized Storage

Append-optimized table storage works best with denormalized fact tables in a data warehouse environment. Denormalized fact tables are typically the largest tables in the system. Fact tables are usually loaded in batches and accessed by read-only queries. Moving large fact tables to an append-optimized storage model eliminates the storage overhead of the per-row update visibility information, saving about 20 bytes per row. This allows for a leaner and easier-to-optimize page structure.

The storage model of append-optimized tables is optimized for bulk data loading. Single row INSERT statements are not recommended.

d) Using Compression in Append-Optimized Tables Only

There are two types of in-database compression available in the Greenplum Database for append-optimized tables:

1. Table-level compression is applied to an entire table.

2. Column-level compression is applied to a specific column. You can apply different column-level compression algorithms to different columns.

The following table summarizes the available compression algorithms.

==============================================================================

Table Orientation Available Compression Types Supported Algorithms

==============================================================================

Row Table ZLIB and QUICKLZ

Column Column and Table RLE_TYPE, ZLIB, and QUICKLZ

==============================================================================

When choosing a compression type and level for append-optimized tables, consider these factors:

1. CPU usage. Your segment systems must have the available CPU power to compress and uncompress the data.

2. Compression ratio/disk size. Minimizing disk size is one factor, but also consider the time and CPU capacity required to compress and scan data. Find the optimal settings for efficiently compressing data without causing excessively long compression times or slow scan rates.

3. Speed of compression. QuickLZ compression generally uses less CPU capacity and compresses data faster at a lower compression ratio than zlib. zlib provides higher compression ratios at lower speeds.

For example, at compression level 1 (compresslevel=1), QuickLZ and zlib have comparable compression ratios, though at different speeds. Using zlib with compresslevel=6 can significantly increase the compression ratio compared to QuickLZ, though with lower compression speed.

4. Speed of decompression/scan rate. Performance with compressed append-optimized tables depends on hardware, query tuning settings, and other factors. Perform comparison testing to determine the actual performance in your environment.

Note: Do not use compressed append-optimized tables on file systems that use compression. If the file system on which your segment data directory resides is a compressed file system, your append-optimized table must not use compression.

Performance with compressed append-optimized tables depends on hardware, query tuning settings, and other factors. Greenplum recommends performing comparison testing to determine the actual performance in your environment.

Note: QuickLZ compression level can only be set to level 1; no other options are available. Compression level with zlib can be set at any value from 1 - 9.

Note: When an ENCODING clause conflicts with a WITH clause, the ENCODING clause has higher precedence than the WITH clause.

To create a compressed table

The WITH clause of the CREATE TABLE command declares the table storage options. Tables that use compression must be append-optimized tables. For example, to create an append-optimized table with zlib compression at a compression level of 5:

CREATE TABLE aotable1 (

a int,

b text

)

WITH (appendonly=true, compresstype=zlib, compresslevel=5);

Checking the Compression and Distribution of an Append-Optimized Table

Greenplum provides built-in functions to check the compression ratio and the distribution of an append-optimized table. The functions take either the object ID or a table name. You can qualify the table name with a schema name.

The compression ratio is returned as a common ratio. For example, a returned value of 3.19, or 3.19:1, means that the uncompressed table is slightly larger than three times the size of the compressed table.

The distribution of the table is returned as a set of rows that indicate how many tuples are stored on each segment. For example, in a system with four primary segments with dbid values ranging from 0 - 3, the function returns four rows similar to the following:

SELECT get_ao_distribution('aotable_comp');