Best practice for creating indexes in greenplum

Post date: Nov 27, 2013 2:47:40 AM

Consider the following points when you create indexes.

Do's

1. Query Workload. Indexes improve performance for workloads where queries return a single record or a very small data set, such as OLTP workloads.

2. Compressed Tables. Indexes can improve performance on compressed append-only tables for queries that return a targeted set of rows. For compressed

data, an index access method means only the necessary rows are uncompressed.

3. Create selective B-tree indexes. Index selectivity is a ratio of the number of distinct values a column has divided by the number of rows in a table. For example, if a table has 1000 rows and a column has 800 distinct values, the selectivity of the index is 0.8, which is considered good. Unique indexes always have a selectivity ratio of 1.0, which is the best possible. Greenplum Database allows unique indexes only on distribution key columns.

4. Use Bitmap indexes for low selectivity columns. Bitmap indexes are best suited to data warehousing applications and decision support systems with large amounts of data, many ad hoc queries, and few data modification (DML) transactions. Bitmap indexes perform best for columns that have between 100 and 100,000 distinct values and when the indexed column is often queried in conjunction with other indexed columns. Columns with fewer than 100 distinct values, such as a gender column with two distinct values (male and female), usually do not benefit much from any type of index. On a column with more than 100,000 distinct values, the performance and space efficiency of a bitmap index decline.

5. Index columns used in joins. An index on a column used for frequent joins (such as a foreign key column) can improve join performance by enabling more join

methods for the query planner to use.

6. Index columns frequently used in predicates. Columns that are frequently referenced in WHERE clauses are good candidates for indexes.

7. Drop indexes for bulk loads. For mass loads of data into a table, consider dropping the indexes and re-creating them after the load completes. This is often

faster than updating the indexes.

8. Consider a clustered index. Clustering an index means that the records are physically ordered on disk according to the index. If the records you need are

distributed randomly on disk, the database has to seek across the disk to fetch the records requested. If the records are stored close together, the fetching operation is more efficient. For example, a clustered index on a date column where the data is ordered sequentially by date. A query against a specific date range results in an

ordered fetch from the disk, which leverages fast sequential access.

Don'ts

1. Avoid overlapping indexes. Indexes that have the same leading column are redundant.

2. Avoid indexes on frequently updated columns. Creating an index on a column that is frequently updated increases the number of writes required when the

column is updated.