Greenplum Database New Features - Query Optimizer and Management of query execution

1. Improved Handling of SQL Queries (GPDB 4.3.2): Pivotal added a new Server configuration parameter to control the handling of SQL Queries that read from External Tables: Parameter name is

readable_external_table_timeout - When an SQL query reads from an external table, the parameter value specifies the amount of time in seconds that Greenplum Database waits before cancelling the query when data stops being returned from the external table.

The default value of 0, specifies no time out. Greenplum Database does not cancel the query.

If queries that use gpfdist run a long time and then return the error “intermittent network connectivity issues”, you can specify a value for readable_external_table_timeout. If no data is returned by gpfdist for the specified length of time, Greenplum Database cancels the query.

2. Management of query execution (GPDB 4.3.4):  

Pivotal introduced a new view session_level_memory_consumption. This view provides information about the current memory utilization for sessions that are running queries on Greenplum Database. The view contains session information and information such as the database that the session is connected to, the query that the session is currently running, and memory consumed by the session processes.

[gpadmin@localhost contrib]$ pwd

/usr/local/greenplum-db/share/postgresql/contrib

[gpadmin@localhost contrib]$ ls

fuzzystrmatch.sql           gp_sfv_test.sql      indexscan.sql                uninstall_gp_distribution_policy.sql  uninstall_orafunc.sql

gp_distribution_policy.sql  gp_svec_test.sql     orafunc.sql                  uninstall_gp_session_state.sql

gp_session_state.sql        gp_workfile_mgr.sql  uninstall_fuzzystrmatch.sql  uninstall_gp_workfile_mgr.sql

[gpadmin@localhost contrib]$ psql -d sachi -f gp_session_state.sql

CREATE SCHEMA

SET

BEGIN

CREATE FUNCTION

GRANT

CREATE VIEW

GRANT

COMMIT

sachi=#  select viewname from pg_views where viewname like '%session_level_memory_consumption%';

             viewname             

----------------------------------

 session_level_memory_consumption

(1 row)

sachi=# \dv session_level_memory_consumption

No matching relations found.

sachi=# select schemaname,viewname from pg_views where viewname like '%session_level_memory_consumption%';

  schemaname   |             viewname             

---------------+----------------------------------

 session_state | session_level_memory_consumption

(1 row)

sachi=# \dv session_state.session_level_memory_consumption

                              List of relations

    Schema     |               Name               | Type |  Owner  | Storage 

---------------+----------------------------------+------+---------+---------

 session_state | session_level_memory_consumption | view | gpadmin | none

(1 row)

sachi=# \d session_state.session_level_memory_consumption

View "session_state.session_level_memory_consumption"

Column | Type | Modifiers

---------------------+---------+-----------

datname | name |

sess_id | integer |

usename | name |

current_query | text |

segid | integer |

vmem_mb | integer |

is_runaway | boolean |

qe_count | integer |

active_qe_count | integer |

dirty_qe_count | integer |

runaway_vmem_mb | integer |

runaway_command_cnt | integer |

View definition: WITH all_entries AS ( SELECT c.segid, c.sessionid, c.vmem_mb, c.runaway_status, c.qe_count, c.active_qe_count, c.dirty_qe_count, c.runaway_vmem_mb, c.runaway_command_cnt

FROM ONLY gp_toolkit.__gp_localid, session_state.session_state_memory_entries_f() c(segid integer, sessionid integer, vmem_mb integer, runaway_status integer, qe_count integer, active_qe_count integer, dirty_qe_count integer, runaway_vmem_mb integer, runaway_command_cnt integer)

UNION ALL

SELECT c.segid, c.sessionid, c.vmem_mb, c.runaway_status, c.qe_count, c.active_qe_count, c.dirty_qe_count, c.runaway_vmem_mb, c.runaway_command_cnt

FROM ONLY gp_toolkit.__gp_masterid, session_state.session_state_memory_entries_f() c(segid integer, sessionid integer, vmem_mb integer, runaway_status integer, qe_count integer, active_qe_count integer, dirty_qe_count integer, runaway_vmem_mb integer, runaway_command_cnt integer)

)

SELECT s.datname, m.sessionid AS sess_id, s.usename, s.current_query, m.segid, m.vmem_mb,

CASE

WHEN m.runaway_status = 0 THEN false

ELSE true

END AS is_runaway, m.qe_count, m.active_qe_count, m.dirty_qe_count, m.runaway_vmem_mb, m.runaway_command_cnt

FROM all_entries m

LEFT JOIN pg_stat_activity s ON m.sessionid = s.sess_id;

sachi=# select * from session_state.session_level_memory_consumption;

datname | sess_id | usename | current_query | segid | vmem_mb | is_runaway | qe_count | active_qe_count | d

irty_qe_count | runaway_vmem_mb | runaway_command_cnt

---------+---------+---------+---------------------------------------------------------------+-------+---------+------------+----------+-----------------+--

--------------+-----------------+---------------------

sachi | 8564 | sachi | select * from session_state.session_level_memory_consumption; | 0 | 1 | f | 1 | 1 |

-1 | 0 | 0

sachi | 8564 | sachi | select * from session_state.session_level_memory_consumption; | 1 | 1 | f | 1 | 1 |

-1 | 0 | 0

sachi | 8564 | sachi | select * from session_state.session_level_memory_consumption; | -1 | 4 | f | 1 | 1 |

-1 | 0 | 0

(3 rows)

The Greenplum Database server configuration parameter runaway_detector_activation_percent controls the percentage of Greenplum Database vmem memory that is utilized that triggers the termination of queries. If the percentage of vmem memory that is utilized for a Greenplum Database segment exceeds the specified value, Greenplum Database terminates queries based on memory usage, starting with the query consuming the largest amount of memory. Queries are terminated until the percentage of utilized vmem is below the specified percentage.

3. Query Optimizer (4.3.5.0):  Pivotal Query Optimizer which was introduced in GPDB 4.3.5.0 extends the planning and optimization capabilities of Greenplum Database.

Note that PQO includes enhancements for specific types of queries and operations:

PQO contains these optimization enhancements:

Note: By default, Greenplum Database uses the legacy query optimizer. 

sachi=# show optimizer;

 optimizer 

-----------

 off

(1 row)

sachi=# 

To enable or disable PQO use following command.

ALTER DATABASE <databasename> SET OPTIMIZER = OFF

sachi=# alter database sachi  SET OPTIMIZER =ON;

ALTER DATABASE

sachi=# show optimizer;

 optimizer 

-----------

 off

(1 row)

sachi=# \q

[sachi@localhost ~]$ su - gpadmin

Password: 

Last login: Sun May 24 10:31:15 EDT 2015 on pts/0

[gpadmin@localhost ~]$ cd /gpmaster/

[gpadmin@localhost gpmaster]$ ls

gpsne-1

[gpadmin@localhost gpmaster]$ cd gpsne-1/

[gpadmin@localhost gpsne-1]$ gpstop -r

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Starting gpstop with args: -r

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Gathering information and validating the environment...

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Obtaining Segment details from master...

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.5.1 build 1'

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:---------------------------------------------

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Master instance parameters

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:---------------------------------------------

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   Master Greenplum instance process active PID   = 16742

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   Database                                       = template1

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   Master port                                    = 5432

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   Master directory                               = /gpmaster/gpsne-1

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   Shutdown mode                                  = smart

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   Timeout                                        = 120

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   Shutdown Master standby host                   = Off

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:---------------------------------------------

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Segment instances that will be shutdown:

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:---------------------------------------------

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   Host                    Datadir           Port    Status

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   localhost.localdomain   /gpdata1/gpsne0   40000   u

20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-   localhost.localdomain   /gpdata2/gpsne1   40001   u

Continue with Greenplum instance shutdown Yy|Nn (default=N):

> y

20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-There are 0 connections to the database

20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'

20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-Master host=localhost.localdomain

20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=smart

20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-Master segment instance directory=/gpmaster/gpsne-1

20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process

20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-Terminating processes for segment /gpmaster/gpsne-1

20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-No standby master host configured

20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...

20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-0.00% of jobs completed

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-100.00% of jobs completed

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-----------------------------------------------------

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-   Segments stopped successfully      = 2

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-   Segments with errors during stop   = 0

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-----------------------------------------------------

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-Successfully shutdown 2 of 2 segment instances 

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-Database successfully shutdown with no errors reported

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-Cleaning up leftover gpmmon process

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-No leftover gpmmon process found

20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-Cleaning up leftover gpsmon processes

20150524:10:58:17:006794 gpstop:localhost:gpadmin-[INFO]:-No leftover gpsmon processes on some hosts. not attempting forceful termination on these hosts

20150524:10:58:17:006794 gpstop:localhost:gpadmin-[INFO]:-Cleaning up leftover shared memory

20150524:10:58:22:006794 gpstop:localhost:gpadmin-[INFO]:-Restarting System...

[gpadmin@localhost gpsne-1]$ psql -d sachi

psql (8.2.15)

Type "help" for help.

sachi=# show optimizer;

 optimizer 

-----------

 on

(1 row)

sachi=# 

When the Pivotal Query Optimizer is enabled, Greenplum Database uses the Pivotal Query Optimizer to generate an execution plan for a query when possible. If the Pivotal Query Optimizer cannot be used, the legacy query optimizer is used.

Notes: These notes apply when the Pivotal Query Optimizer is enabled:

1. If you intend to execute queries on partitioned tables with the Pivotal Query Optimizer enabled, the server configuration parameter optimizer_analyze_root_partition must be set to on and you must collect statistics on the partitioned table root partition.

2. For Greenplum Command Center monitoring performance, Pivotal recommends the default setting for Pivotal Query Optimizer (off) for the gpperfmon database that is used by Greenplum Command Center. Enabling Pivotal Query Optimizer for the gpperfmon database is not supported. To ensure that the Pivotal Query Optimizer is disabled for thegpperfmon database, run this command on the system where the database is installed:

ALTER DATABASE gpperfmon SET OPTIMIZER = OFF