Analytic, Grouping and Stored functions in Greenplum

GPDB supports a number of built-in aggregate functions. GPDB performs better when dealing with operations over a set of records. 1 query plan for the whole set of records versus 1 main query plan + 1 per row

All are extensions to the standard GROUP BY clause

GROUP BY ROLLUP ( col1, col2, col3 )

GROUP BY CUBE ( col1, col2, col3 )

GROUP BY GROUPING SETS( (c1, c2), (c1, c3))

GPDB supports analytical and window functions. These are used to apply an aggregation over partitions of the result set. For example, sum( population ) over ( partition by city )

They can also be used to produce row numbers which can be of use. For example, row_number() over ( order by id )

These are powerful and perform all of the work ‘in database’ providing speed advantages over front-end tools

Aggregate (traditional) query

select a.clientID, a.amount, a.amount/b.sum_amount

from orders a inner join (select clientid, sum(amount) sum_amount from orders group by clientid) b

on a.clientid = b.clientid

Analytical query

select a.clientid, a.amount, a.amount/(sum(a.amount) over (partition by a.clientid)) , a.amount/(sum(a.amount) over ()) from orders a

Note: “roll-up” functionality (by using over() )

Window functions

The WINDOW clause is used to define a window that can be used in the OVER() expression of a window function such as rank or avg. For information on OLAP extensions and Window Functions refer to the GPDB Reference Guide. The WINDOW clause is used to define a window that can be used in the OVER() expression of a window function such as rank or avg. For information on OLAP extensions and Window Functions refer to the GPDB Reference Guide.

select a.clientid, first_value(a.purchase_date) over (partition by a.client_id order by a.purchase_date), sum(a.amount) over (partition by a.client_id order by a.purchase_date rows unbounded preceding) from orders a

ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW - the same as ROWS UNBOUNDED PRECEDING

ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING