Greenplum Database Storage Model

posted Aug 25, 2014, 11:11 AM by Sachchida Ojha
Greenplum DB supports following storage model,
1. Heap storage, 
2. Append only storage, 
3. Row oriented storage and 
4. Column oriented storage. 

1. Heap storage is the default storage type and uses the same storage model as PostgreSQL. Heap storage performs best with small tables that are often updated. 

2. Append only storage is recommended for denormalized tables that are not updated after loading, as UPDATE and DELETE operations are not allowed for append only tables. 

3. Row oriented storage is recommended for OLTP types of workloads with iterative transactions where updates are required and frequent inserts are performed. Use row oriented storage when selects against the table are wide, where many columns of a single row is needed in a query. If the majority of columns in the SELECT list or WHERE clause is selected in queries, use a row oriented model. Row oriented storage is more efficient when many columns are selected at the same time, or when the row size of a table is relatively small. Use row oriented storage for general purpose or mixed workloads, as it offers the best combination of flexibility and performance.

4. Column oriented tables can offer optimal query performance on tables with many columns where only a small subset of columns is accessed by the queries. Column oriented storage is recommended for data warehouse analytic workloads where selects are narrow and with aggregations of data computed over a small number of columns, or for single columns that require regular updates without modifying other column data. Column oriented tables are best suited to queries that aggregate many values of a single column where the WHERE or HAVING predicate is also on the aggregate column. Re-creating a complete row in a wide columnar table is more time consuming than reading the same row from a heap table. However, queries that use fewer columns run much faster on columnar because the system is moving less (sometimes much less) data around. Column oriented storage is optimized for I/O but is not optimized for write operations as column values for a row must be written to different places on disk. 

Storage size compression optimizations are available in column oriented data that are not available in row oriented data. However keep in mind, the greater adjacent compression achieved, the more difficult random access can become, as data must be uncompressed to be read. Minimizing disk size is one factor, but consider the time and CPU required to compress, uncompress and scan the data. Find the optimal settings for efficiently compressing data without causing excessively long compression times or slow scan rates.
Comments