Greenplum Database New Features - Query Optimizer and Management of query execution
1. Improved Handling of SQL Queries (GPDB 4.3.2): Pivotal added a new Server configuration parameter to control the handling of SQL Queries that read from External Tables: Parameter name is
readable_external_table_timeout - When an SQL query reads from an external table, the parameter value specifies the amount of time in seconds that Greenplum Database waits before cancelling the query when data stops being returned from the external table.
The default value of 0, specifies no time out. Greenplum Database does not cancel the query.
If queries that use gpfdist run a long time and then return the error “intermittent network connectivity issues”, you can specify a value for readable_external_table_timeout. If no data is returned by gpfdist for the specified length of time, Greenplum Database cancels the query.
Query Optimizer (4.3.5.0)
Management of query execution(4.3.4)
Improved Handling of SQL Queries That Read From External Tables (4.3.2 )
2. Management of query execution (GPDB 4.3.4):
Pivotal introduced a new view session_level_memory_consumption. This view provides information about the current memory utilization for sessions that are running queries on Greenplum Database. The view contains session information and information such as the database that the session is connected to, the query that the session is currently running, and memory consumed by the session processes.
[gpadmin@localhost contrib]$ pwd
/usr/local/greenplum-db/share/postgresql/contrib
[gpadmin@localhost contrib]$ ls
fuzzystrmatch.sql gp_sfv_test.sql indexscan.sql uninstall_gp_distribution_policy.sql uninstall_orafunc.sql
gp_distribution_policy.sql gp_svec_test.sql orafunc.sql uninstall_gp_session_state.sql
gp_session_state.sql gp_workfile_mgr.sql uninstall_fuzzystrmatch.sql uninstall_gp_workfile_mgr.sql
[gpadmin@localhost contrib]$ psql -d sachi -f gp_session_state.sql
CREATE SCHEMA
SET
BEGIN
CREATE FUNCTION
GRANT
CREATE VIEW
GRANT
COMMIT
sachi=# select viewname from pg_views where viewname like '%session_level_memory_consumption%';
viewname
----------------------------------
session_level_memory_consumption
(1 row)
sachi=# \dv session_level_memory_consumption
No matching relations found.
sachi=# select schemaname,viewname from pg_views where viewname like '%session_level_memory_consumption%';
schemaname | viewname
---------------+----------------------------------
session_state | session_level_memory_consumption
(1 row)
sachi=# \dv session_state.session_level_memory_consumption
List of relations
Schema | Name | Type | Owner | Storage
---------------+----------------------------------+------+---------+---------
session_state | session_level_memory_consumption | view | gpadmin | none
(1 row)
sachi=# \d session_state.session_level_memory_consumption
View "session_state.session_level_memory_consumption"
Column | Type | Modifiers
---------------------+---------+-----------
datname | name |
sess_id | integer |
usename | name |
current_query | text |
segid | integer |
vmem_mb | integer |
is_runaway | boolean |
qe_count | integer |
active_qe_count | integer |
dirty_qe_count | integer |
runaway_vmem_mb | integer |
runaway_command_cnt | integer |
View definition: WITH all_entries AS ( SELECT c.segid, c.sessionid, c.vmem_mb, c.runaway_status, c.qe_count, c.active_qe_count, c.dirty_qe_count, c.runaway_vmem_mb, c.runaway_command_cnt
FROM ONLY gp_toolkit.__gp_localid, session_state.session_state_memory_entries_f() c(segid integer, sessionid integer, vmem_mb integer, runaway_status integer, qe_count integer, active_qe_count integer, dirty_qe_count integer, runaway_vmem_mb integer, runaway_command_cnt integer)
UNION ALL
SELECT c.segid, c.sessionid, c.vmem_mb, c.runaway_status, c.qe_count, c.active_qe_count, c.dirty_qe_count, c.runaway_vmem_mb, c.runaway_command_cnt
FROM ONLY gp_toolkit.__gp_masterid, session_state.session_state_memory_entries_f() c(segid integer, sessionid integer, vmem_mb integer, runaway_status integer, qe_count integer, active_qe_count integer, dirty_qe_count integer, runaway_vmem_mb integer, runaway_command_cnt integer)
)
SELECT s.datname, m.sessionid AS sess_id, s.usename, s.current_query, m.segid, m.vmem_mb,
CASE
WHEN m.runaway_status = 0 THEN false
ELSE true
END AS is_runaway, m.qe_count, m.active_qe_count, m.dirty_qe_count, m.runaway_vmem_mb, m.runaway_command_cnt
FROM all_entries m
LEFT JOIN pg_stat_activity s ON m.sessionid = s.sess_id;
sachi=# select * from session_state.session_level_memory_consumption;
datname | sess_id | usename | current_query | segid | vmem_mb | is_runaway | qe_count | active_qe_count | d
irty_qe_count | runaway_vmem_mb | runaway_command_cnt
---------+---------+---------+---------------------------------------------------------------+-------+---------+------------+----------+-----------------+--
--------------+-----------------+---------------------
sachi | 8564 | sachi | select * from session_state.session_level_memory_consumption; | 0 | 1 | f | 1 | 1 |
-1 | 0 | 0
sachi | 8564 | sachi | select * from session_state.session_level_memory_consumption; | 1 | 1 | f | 1 | 1 |
-1 | 0 | 0
sachi | 8564 | sachi | select * from session_state.session_level_memory_consumption; | -1 | 4 | f | 1 | 1 |
-1 | 0 | 0
(3 rows)
The Greenplum Database server configuration parameter runaway_detector_activation_percent controls the percentage of Greenplum Database vmem memory that is utilized that triggers the termination of queries. If the percentage of vmem memory that is utilized for a Greenplum Database segment exceeds the specified value, Greenplum Database terminates queries based on memory usage, starting with the query consuming the largest amount of memory. Queries are terminated until the percentage of utilized vmem is below the specified percentage.
3. Query Optimizer (4.3.5.0): Pivotal Query Optimizer which was introduced in GPDB 4.3.5.0 extends the planning and optimization capabilities of Greenplum Database.
Note that PQO includes enhancements for specific types of queries and operations:
Queries against partitioned tables
Queries that contain sub-queries
Queries that contain a common table expression (CTE)
DML operations
PQO contains these optimization enhancements:
Improved join ordering
Join-Aggregate reordering
Sort order optimization
Data skew estimates included in query optimization
Note: By default, Greenplum Database uses the legacy query optimizer.
sachi=# show optimizer;
optimizer
-----------
off
(1 row)
sachi=#
To enable or disable PQO use following command.
ALTER DATABASE <databasename> SET OPTIMIZER = OFF
sachi=# alter database sachi SET OPTIMIZER =ON;
ALTER DATABASE
sachi=# show optimizer;
optimizer
-----------
off
(1 row)
sachi=# \q
[sachi@localhost ~]$ su - gpadmin
Password:
Last login: Sun May 24 10:31:15 EDT 2015 on pts/0
[gpadmin@localhost ~]$ cd /gpmaster/
[gpadmin@localhost gpmaster]$ ls
gpsne-1
[gpadmin@localhost gpmaster]$ cd gpsne-1/
[gpadmin@localhost gpsne-1]$ gpstop -r
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Starting gpstop with args: -r
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Gathering information and validating the environment...
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Obtaining Greenplum Master catalog information
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Obtaining Segment details from master...
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.5.1 build 1'
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:---------------------------------------------
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Master instance parameters
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:---------------------------------------------
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- Master Greenplum instance process active PID = 16742
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- Database = template1
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- Master port = 5432
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- Master directory = /gpmaster/gpsne-1
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- Shutdown mode = smart
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- Timeout = 120
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- Shutdown Master standby host = Off
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:---------------------------------------------
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:-Segment instances that will be shutdown:
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:---------------------------------------------
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- Host Datadir Port Status
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- localhost.localdomain /gpdata1/gpsne0 40000 u
20150524:10:58:03:006794 gpstop:localhost:gpadmin-[INFO]:- localhost.localdomain /gpdata2/gpsne1 40001 u
Continue with Greenplum instance shutdown Yy|Nn (default=N):
> y
20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-There are 0 connections to the database
20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-Commencing Master instance shutdown with mode='smart'
20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-Master host=localhost.localdomain
20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-Commencing Master instance shutdown with mode=smart
20150524:10:58:05:006794 gpstop:localhost:gpadmin-[INFO]:-Master segment instance directory=/gpmaster/gpsne-1
20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-Attempting forceful termination of any leftover master process
20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-Terminating processes for segment /gpmaster/gpsne-1
20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-No standby master host configured
20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-Commencing parallel segment instance shutdown, please wait...
20150524:10:58:06:006794 gpstop:localhost:gpadmin-[INFO]:-0.00% of jobs completed
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-100.00% of jobs completed
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-----------------------------------------------------
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:- Segments stopped successfully = 2
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:- Segments with errors during stop = 0
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-----------------------------------------------------
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-Successfully shutdown 2 of 2 segment instances
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-Database successfully shutdown with no errors reported
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-Cleaning up leftover gpmmon process
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-No leftover gpmmon process found
20150524:10:58:16:006794 gpstop:localhost:gpadmin-[INFO]:-Cleaning up leftover gpsmon processes
20150524:10:58:17:006794 gpstop:localhost:gpadmin-[INFO]:-No leftover gpsmon processes on some hosts. not attempting forceful termination on these hosts
20150524:10:58:17:006794 gpstop:localhost:gpadmin-[INFO]:-Cleaning up leftover shared memory
20150524:10:58:22:006794 gpstop:localhost:gpadmin-[INFO]:-Restarting System...
[gpadmin@localhost gpsne-1]$ psql -d sachi
psql (8.2.15)
Type "help" for help.
sachi=# show optimizer;
optimizer
-----------
on
(1 row)
sachi=#
When the Pivotal Query Optimizer is enabled, Greenplum Database uses the Pivotal Query Optimizer to generate an execution plan for a query when possible. If the Pivotal Query Optimizer cannot be used, the legacy query optimizer is used.
Notes: These notes apply when the Pivotal Query Optimizer is enabled:
1. If you intend to execute queries on partitioned tables with the Pivotal Query Optimizer enabled, the server configuration parameter optimizer_analyze_root_partition must be set to on and you must collect statistics on the partitioned table root partition.
2. For Greenplum Command Center monitoring performance, Pivotal recommends the default setting for Pivotal Query Optimizer (off) for the gpperfmon database that is used by Greenplum Command Center. Enabling Pivotal Query Optimizer for the gpperfmon database is not supported. To ensure that the Pivotal Query Optimizer is disabled for thegpperfmon database, run this command on the system where the database is installed:
ALTER DATABASE gpperfmon SET OPTIMIZER = OFF