How to fix a function-based index that is not being used in Greenplum

Post date: Oct 05, 2012 10:29:39 PM

The Postgres/Greenplum optimizer is cost based: it estimates a cost for each candidate execution plan and picks the cheapest one. In Greenplum, the cost of a single random (non-sequential) page fetch is modeled by the random_page_cost parameter, which defaults to 100, compared to a default of 4 in PostgreSQL. With a random page fetch costed 100 times as much as a sequential one, the planner is deliberately biased toward sequential scans in most cases.
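You can check the value in effect for your session with SHOW; the output below assumes the default has not been overridden:

gpadmin=# show random_page_cost;
 random_page_cost
------------------
 100
(1 row)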

In some special cases, lowering random_page_cost can be used to nudge the planner toward a better plan, for example an index scan on a function-based index.

Example:

gpadmin=# create table t as select generate_series x from generate_series(1,10000);
NOTICE:  Table doesn't have 'DISTRIBUTED BY' clause -- Using column(s) named 'x' as the Greenplum Database data distribution key for this table.
HINT:  The 'DISTRIBUTED BY' clause determines the distribution of data. Make sure column(s) chosen are the optimal data distribution key to minimize skew.
SELECT 10000

gpadmin=# create index f_idx on t(lower(x));
CREATE INDEX

gpadmin=# explain select * from t where lower(x) = '1';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..191.00 rows=2 width=4)
   ->  Seq Scan on t  (cost=0.00..191.00 rows=2 width=4)
         Filter: lower(x::text) = '1'::text
(3 rows)

With the default random_page_cost of 100, the estimated cost of an index scan exceeds that of the sequential scan, so the function-based index f_idx is ignored. Lowering random_page_cost for the session makes the index scan the cheaper plan:

gpadmin=# set random_page_cost to 1;
SET

gpadmin=# explain select * from t where lower(x) = '1';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Gather Motion 8:1  (slice1; segments: 8)  (cost=0.00..9.43 rows=2 width=4)
   ->  Index Scan using f_idx on t  (cost=0.00..9.43 rows=2 width=4)
         Index Cond: lower(x::text) = '1'::text
 Settings:  random_page_cost=1
(4 rows)
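The SET above is session-scoped and disappears when the connection closes. If a whole database or role consistently benefits from the lower setting, it can be made persistent with ALTER DATABASE or ALTER ROLE; the statements below are a sketch, assuming the gpadmin database from the example and a hypothetical role named reporting_user, followed by a RESET that returns the current session to the server default. Keep in mind that a low random_page_cost applies to every query, so it can also push the planner into index scans where a sequential scan would actually be faster.

gpadmin=# alter database gpadmin set random_page_cost to 1;
ALTER DATABASE

gpadmin=# alter role reporting_user set random_page_cost to 1;
ALTER ROLE

gpadmin=# reset random_page_cost;
RESET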