Using Indexes to improve performance

Post date: Aug 25, 2014 7:7:44 PM

In most data warehouse and analytic environments queries operate on large volumes of data. In Greenplum database a sequential scan is an efficient method to read data as each segment contains a portion of the data and all segments read in parallel. 

For queries with high selectivity, indexes may improve query performance. 

If it is determined that indexes are needed consider best practices when creating and managing indexes. It is important to avoid indexes on columns that are frequently updated. Creating an index on a column that is frequently updated increases the number of writes required when the column is updated. Avoid overlapping indexes; indexes that have the same leading column are redundant. 

Always drop indexes before loading data into a table. After the load re-create the indexes for the table. This will run an order of magnitude faster than loading data into a table with indexes. 

Indexes can improve performance on compressed append-only tables for queries that return a targeted set of rows. For compressed data, an index access method results in only the necessary rows being uncompressed. Create selective B-tree indexes; single row lookups are good candidates. Index selectivity is a ratio of the number of distinct values a column has divided by the number of rows in a table. 

For example, if a table has 1000 rows and a column has 800 distinct values, the selectivity of the index is 0.8, which is considered good. Unique indexes always have a selectivity ratio of 1.0, which is the best possible. PDB allows unique indexes only on distribution key columns. 

Bitmap indexes are best suited on columns for querying than updating. Use Bitmap indexes for low cardinality/selectivity columns. Bitmap indexes perform best when the column has a low cardinality from 100 to 100,000 distinct values. Do not use Bitmap indexes for unique columns, very high cardinality data (for example, phone numbers), or for very low cardinality data (for example, gender). Do not use Bitmap indexes for OLTP workloads.

In some scenarios it may be required to force the use of indexes for a query. The following GUCs should be either set at the session or query level to prevent other query performance from being affected. 

1. random_page_cost sets the query planner’s estimate of the cost of a non-sequentially fetched disk page. A lower value increases the opportunity for an index scan to be chosen. The default value is 100.

2. enable_indexscan enables or disables the query planner’s use of index-scan plan types. The default value is ON. 

3. enable_nestloop enables or disables the query planner’s use of nested-loop join plans. This should be enabled for using indexes in nested loop joins. The default value is OFF. 

4. enable_bitmapscan enables or disables the query planner’s use of bitmap-scan plan types. Generally a bitmap scan provides faster access, however it can be disabled specifically if very few rows are selected from the index. The default value is ON. 

5. enable_seqscan disabling sequential scans results in use of index scan. This GUC should be used with great care and only as a last resort. The default value is ON. 

Use an iterative approach to tuning the GUCs to favor index usage. Begin by setting or confirming that enable_indexscan is set to ON and for joins via an index lookup enable_nestloop is set to ON. 

Start by lowering random_page_cost to 20 and if the index is still not used, set it to 10. If this does not result in index usage increase seq_page_cost to 10 and if the index is still not used, set it to 15. If this does not result in index usage finally as a last resort, set enable_seqscan to OFF.