What is Run-length Encoding in Greenplum?

Post date: Nov 02, 2012 11:19:23 PM

The Greenplum Database supports Run-length Encoding (RLE) for column-level compression. RLE is a type of data compression in which a run of repeated data is stored as a single data value and a count. RLE is most useful on data with many consecutive duplicated elements. For example, in a table with two columns, a date and a description, that contains 200,000 entries for date1 and 400,000 entries for date2, RLE compression for the date field is similar to date1 200000 date2 400000. RLE is not useful with data that does not have many runs of repeated values, because storing a count alongside every value could greatly increase the file size.
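The idea can be sketched in a few lines of Python. This is only an illustration of the general run-length technique, not Greenplum's actual on-disk format or implementation; the function names `rle_encode` and `rle_decode` are made up for this example.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse each run of consecutive equal values into a (value, count) pair."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    decoded = []
    for value, count in pairs:
        decoded.extend([value] * count)
    return decoded


# The date column from the example above: two long runs.
date_column = ["date1"] * 200_000 + ["date2"] * 400_000
encoded = rle_encode(date_column)
print(encoded)  # [('date1', 200000), ('date2', 400000)]

# Data without runs: every value becomes its own pair, so nothing is saved.
alternating = ["a", "b"] * 3
print(len(alternating), len(rle_encode(alternating)))  # 6 6
```

In the first case 600,000 values shrink to two pairs. In the second case the encoded form has as many entries as the input, and each entry also carries a count, which is why RLE can make data without runs larger.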

There are four levels of RLE compression available. The levels progressively increase the compression ratio but decrease the compression speed.

A table with column-oriented RLE compression is not compatible with any release of the Greenplum Database prior to version 4.2.1. To back up these types of tables and restore them to a prior version of the Greenplum Database, alter the table to have no compression or a compression type supported in the older version (ZLIB or QUICKLZ) before beginning the backup operation.