Greenplum Workload Management

The purpose of workload management is to limit the number of active queries in the system at any given time in order to avoid exhausting system resources such as memory, CPU, and disk I/O. This is accomplished in Greenplum Database with role-based resource queues. A resource queue has attributes that limit the size and/or total number of queries that can be executed by the users (or roles) in that queue. Also, you can assign a priority level that controls the relative share of available CPU used by queries associated with the resource queue. By assigning all database roles to the appropriate resource queue, administrators can control concurrent user queries and prevent the system from being overloaded.

What is Resource Queues and How it works?  

Greenplum recommends that administrators create resource queues for the various types of workloads in their organization. For example, you may have resource queues for power users, web users, and management reports. You would then set limits on the resource queue based your estimate of how resource-intensive the queries associated with that workload are likely to be. Currently, the configurable limits on a queue include:

After resource queues are created, database roles (users) are then assigned to the appropriate resource queue. A resource queue can have multiple roles, but a role can have only one assigned resource queue.

Types of Queries Evaluated for Resource Queues

Not all SQL statements submitted through a resource queue are evaluated against the queue limits. By default only SELECT, SELECT INTO, CREATE TABLE AS SELECT, and DECLARE CURSOR statements are evaluated. If the server configuration parameter resource_select_only is set to off, then INSERT, UPDATE, and DELETE statements will be evaluated as well.

Resource scheduling is enabled by default when you install Greenplum Database. All database roles must be assigned to a resource queue. If an administrator creates a role without explicitly assigning it to a resource queue, the role is assigned to the default resource queue, pg_default.

Resource Queue planning is often neglected part in Greenplum configuration. It may be due to not so clear Greenplum documentation on Resource Queue or not properly understanding the application requirement for Resource Queue.

Resource Queue basically makes your ad-hoc query load on the system more predictable. You must give some amount of time in planning for resource queue and this will surely pay you in the long run.

Resource Queue can be defined in three ways ( Greenplum documentation mentions about only two ways ) :

Greenplum versions prior to 3.3 had limitation on Active Statement. If you make it more than 3 for any queue, then it will give you infrequent errors. Version 3.3 seems to have rectified this bug and it works fine for Active Statement threshold of 4.

If you know the range of cost your most queries have, then you can put the Cost Threshold on the queue. But this can lead to many queries executing at the same time if their total coast is within the Cost Threshold limit. I have found this type of resource Queue is not optimal if you have 4-5 Resource Queues in your system. This can lead to system performance issue if all Resource Queues are being used to their threshold cost value.

Resource Queue setup using Active Statement and Cost Threshold gives the best performance. Suppose Resource Queue has been defined with Active Statement of 3 and cost threshold of 5 million. Following are the scenarios :-

Three statements are executing in the queue with their total cost to 3 million. A fourth query comes which has a cost of 1.1 million. The combined cost of these four queries is 4.1 million which is less than the threshold cost of 5 million, but since three queries are already running and this fourth query is crossing the threshold value of 3 for Active Statement, this query will wait in the queue.

Two queries are executing in the queue with total cost of 3.5 million. A third query comes with cost of 2 million. The combined cost of three query will be 5.5 million which will cross the threshold cost limit of 5 million. So this query will wait in the queue even though only 2 queries are executing but Active Statement Threshold is 3.

When planning Resource Queue, you need to consider for smaller queries that need not wait in Queue. I treat a query a small query if it executes in around one minute. To configure Resource Queue to allow smaller bypass the queue, you need to provide Ignore Cost value while defining of alerting the Resource Queue. Do not make it too small. Determine the value by getting cost of query which executed in one minute. This value can be in the range of 200,000 t0 300,000.

Other important parameter of Resource Queue is Overcommit and Noovercommit. During ad-hoc query window, you should alter the resource queue to make it Noovercommit so that database resources can not be hogged by one bad query.

For a medium sized Datawarehouse, you should have 3-4 resource queues defined and users should be distributed across these resource queues.

Finally, to make Cost Threshold work, you need to Analyze and Vacuum your tables regularly. Fast changing tables should be analyzed and Vacuumed daily.

How Resource Queue Limits Work

At runtime, when the user submits a query for execution, that query is evaluated against the resource queue’s limits. If the query does not cause the queue to exceed its resource limits, then that query will run immediately. If the query causes the queue to exceed its limits (for example, if the maximum number of active statement slots are currently in use), then the query must wait until queue resources are free before it can run. Queries are evaluated on a first in, first out basis. If query prioritization is enabled, the active workload on the system is periodically assessed and processing resources are reallocated according to query priority Roles with the SUPERUSER attribute are always exempt from resource queue limits. Superuser queries are always allowed to run immediately regardless of the limits of their assigned resource queue.

How Memory Limits Work

Setting a memory limit on a resource queue sets the maximum amount of memory that all queries submitted through the queue can consume on a segment host. The amount of memory allotted to a particular query is based on the queue memory limit divided by the active statement limit (Greenplum recommends that memory limits be used in conjunction with statement-based queues rather than cost-based queues). For example, if a queue has a memory limit of 2000MB and an active statement limit of 10, each query submitted through the queue is allotted 200MB of memory by default. The default memory allotment can be overridden on a per-query basis using the statement_mem server configuration parameter (up to the queue memory limit). Once a query has started executing, it holds its allotted memory in the queue until it completes (even if during execution it actually consumes less than its allotted amount of memory).

How Priorities Work

Resource limits on active statement count, memory and query cost are admission limits, which determine whether a query is admitted into the group of actively running statements, or whether it is queued with other waiting statements. After a query becomes active, it must share available CPU resources as determined by the priority settings for its resource queue. When a statement from a high-priority queue enters the group of actively running statements, it may claim a significant share of the available CPU, reducing the share allotted to already-running statements.

The comparative size or complexity of the queries does not affect the allotment of CPU. If a simple, low-cost query is running simultaneously with a large, complex query, and their priority settings are the same, they will be allotted the same share of available CPU resources. When a new query becomes active, the exact percentage shares of CPU will be recalculated, but queries of equal priority will still have equal amounts of CPU allotted.

For example, an administrator creates three resource queues: adhoc for ongoing queries submitted by business analysts, reporting for scheduled reporting jobs, and executive for queries submitted by executive user roles. The administrator wants to ensure that scheduled reporting jobs are not heavily affected by unpredictable resource demands from ad-hoc analyst queries. Also, the administrator wants to make sure that queries submitted by executive roles are allotted a significant share of CPU. Accordingly, the resource queue priorities are set as shown:

•adhoc — Low priority

•reporting — High priority

•executive — Maximum priority

At runtime, the CPU share of active statements is determined by these priority settings. If queries 1 and 2 from the reporting queue are running simultaneously, they have equal shares of CPU. When an ad-hoc query becomes active, it claims a smaller share of CPU. The exact share used by the reporting queries is adjusted, but remains equal due to their equal priority setting.

Steps to Enable Workload Management

Enabling and using workload management in Greenplum Database involves the following high-level tasks:

1.Creating the resource queues and setting limits on them. 

2.Assigning a queue to one or more user roles. 

3.Using the workload management system views to monitor and manage the resource queues. 

Configuring Workload Management

Resource scheduling is enabled by default when you install Greenplum Database, and is required for all roles. The default resource queue, pg_default, has an active statement limit of 20, no cost limit, no memory limit, and a medium priority setting. Greenplum recommends that you create resource queues for the various types of

To configure workload management

1.The following parameters are for the general configuration of resource queues:

2.The following parameters are related to memory utilization:

3.The following parameters are related to query prioritization. Note that the following parameters are all local parameters, meaning they must be set in the postgresql.conf files of the master and all segments:

            Master and standby master 

            gp_resqueue_priority_cpucores_per_segment = 8

       

           Segment hosts

            gp_resqueue_priority_cpucores_per_segment = 2

        Important: If you have fewer than one segment per CPU core on your segment hosts, make sure you adjust this value accordingly. An improperly low value for this parameter can result in under-utilization of CPU resources.

4.If you wish to view or change any of the workload management parameter values, you can use the gpconfig utility.

5.For example, to see the setting of a particular parameter:

$ gpconfig --show gp_vmem_protect_limit

6.For example, to set one value on all segments and a different value on the master:

$ gpconfig -c gp_resqueue_priority_cpucores_per_segment -v 2 -m 8

7.Restart Greenplum Database to make the configuration changes effective:

$ gpstop -r

Creating Resource Queues

Creating a resource queue involves giving it a name and setting either a query cost limit or an active query limit (or both), and optionally a query priority on the resource queue. Use the CREATE RESOURCE QUEUE command to create new resource queues.

Creating Queues with an Active Query Limit

Resource queues with an ACTIVE_STATEMENTS setting limit the number of queries that can be executed by roles assigned to that queue. For example, to create a resource queue named adhoc with an active query limit of three:

=# CREATE RESOURCE QUEUE adhoc WITH (ACTIVE_STATEMENTS=3);

This means that for all roles assigned to the adhoc resource queue, only three active queries can be running on the system at any given time. If this queue has three queries running, and a fourth query is submitted by a role in that queue, that query must wait until a slot is free before it can run.

Creating Queues with Memory Limits

Resource queues with a MEMORY_LIMIT setting control the amount of memory that all queries submitted through the queue that should not exceed the amount of physical memory of a segment host. This limit determines the total amount of memory that all worker processes of a query can consume and should not exceed the amount of physical memory of a segment host during query execution. If workloads are staggered over multiple queues, memory allocations can be oversubscribed. However, queries can be cancelled during execution if the segment host memory limit specified in gp_vmem_protect_limit is exceeded.

When used in conjunction with ACTIVE_STATEMENTS, the default amount of memory allotted per query is: MEMORY_LIMIT / ACTIVE_STATEMENTS. When used in conjunction with MAX_COST, the default amount of memory allotted per query is: MEMORY_LIMIT * (query_cost / MAX_COST). Greenplum recommends that MEMORY_LIMIT be used in conjunction with ACTIVE_STATEMENTS rather than with MAX_COST.

For example, to create a resource queue with an active query limit of 10 and a total memory limit of 2000MB (each query will be allocated 200MB of segment host memory at execution time):

=# CREATE RESOURCE QUEUE myqueue WITH (ACTIVE_STATEMENTS=20, MEMORY_LIMIT='2000MB');

The default memory allotment can be overridden on a per-query basis using the statement_mem server configuration parameter, provided that MEMORY_LIMIT or max_statement_mem is not exceeded. For example, to allocate more memory to a particular query:

=> SET statement_mem='2GB';

=> SELECT * FROM my_big_table WHERE column='value' ORDER BY id;

=> RESET statement_mem;

As a general guideline, MEMORY_LIMIT for all of your resource queues should not exceed the amount of physical memory of a segment host. If workloads are staggered over multiple queues, it may be OK to oversubscribe memory allocations, keeping in mind that queries may be cancelled during execution if the segment host memory limit (gp_vmem_protect_limit) is exceeded.

Creating Queues with a Query Planner Cost Limits

Resource queues with a MAX_COST setting limit the total cost of queries that can be executed by roles assigned to that queue. Cost is specified as a floating point number (for example 100.0) or can also be specified as an exponent (for example 1e+2).

Cost is measured in the estimated total cost for the query as determined by the Greenplum query planner (as shown in the EXPLAIN output for a query). Therefore, an administrator must be familiar with the queries typically executed on the system in order to set an appropriate cost threshold for a queue. Cost is measured in units of disk page fetches; 1.0 equals one sequential disk page read.

For example, to create a resource queue named webuser with a query cost limit of 100000.0 (1e+5):

=# CREATE RESOURCE QUEUE webuser WITH (MAX _COST=100000.0);

or

=# CREATE RESOURCE QUEUE webuser WITH (MAX _COST=1e+5);

This means that for all roles assigned to the webuser resource queue, it will only allow queries into the system until the cost limit of 100000.0 is reached. So for example, if this queue has 200 queries with a 500.0 cost all running at the same time, and query 201 with a 1000.0 cost is submitted by a role in that queue, that query must wait until space is free before it can run.

Allowing Queries to Run on Idle Systems

If a resource queue is limited based on a cost threshold, then the administrator can allow COST_OVERCOMMIT (the default). Resource queues with a cost threshold and overcommit enabled will allow a query that exceeds the cost threshold to run, provided that there are no other queries in the system at the time the query is submitted. The cost threshold will still be enforced if there are concurrent workloads on the system.

If COST_OVERCOMMIT is false, then queries that exceed the cost limit will always be rejected and never allowed to run.

Allowing Small Queries to Bypass Queue Limits

Workloads may have certain small queries that administrators want to allow to run without taking up an active statement slot in the resource queue. For example, simple queries to look up metadata information in the system catalogs do not typically require significant resources or interfere with query processing on the segments. An administrator can set MIN_COST to denote a query planner cost associated with a small query. Any query that falls below the MIN_COST limit will be allowed to run immediately. MIN_COST can be used on resource queues with either an active statement or a maximum query cost limit. For example:

=# CREATE RESOURCE QUEUE adhoc WITH (ACTIVE_STATEMENTS=10, MIN_COST=100.0);

Setting Priority Levels

To control a resource queue’s consumption of available CPU resources, an administrator can assign an appropriate priority level. When high concurrency causes contention for CPU resources, queries and statements associated with a high-priority resource queue will claim a larger share of available CPU than lower priority queries and statements.

Priority settings are created or altered using the WITH parameter of the commands CREATE RESOURCE QUEUE and ALTER RESOURCE QUEUE. For example, to specify priority settings for the adhoc and reporting queues, an administrator would use the following commands:

=# ALTER RESOURCE QUEUE adhoc WITH (PRIORITY=LOW);

=# ALTER RESOURCE QUEUE reporting WITH (PRIORITY=HIGH);

To create the executive queue with maximum priority, an administrator would use the following command:

=# CREATE RESOURCE QUEUE executive WITH (ACTIVE_STATEMENTS=3, PRIORITY=MAX);

When the query prioritization feature is enabled, resource queues are given a MEDIUM priority by default if not explicitly assigned. For more information on how priority settings are evaluated at runtime,

Important: In order for resource queue priority levels to be enforced on the active query workload, you must enable the query prioritization feature by setting the associated server configuration parameters. 

Assigning Roles (Users) to a Resource Queue

Once a resource queue is created, you must assign roles (users) to their appropriate resource queue. If roles are not explicitly assigned to a resource queue, they will go to the default resource queue, pg_default. The default resource queue has an active statement limit of 20, no cost limit, and a medium priority setting.

Use the ALTER ROLE or CREATE ROLE commands to assign a role to a resource queue. For example:

=# ALTER ROLE name RESOURCE QUEUE queue_name;

=# CREATE ROLE name WITH LOGIN RESOURCE QUEUE queue_name;

A role can only be assigned to one resource queue at any given time, so you can use the ALTER ROLE command to initially assign or change a role’s resource queue.

Resource queues must be assigned on a user-by-user basis. If you have a role hierarchy (for example, a group-level role) then assigning a resource queue to the group does not propagate down to the users in that group.

Superusers are always exempt from resource queue limits. Superuser queries will always run regardless of the limits set on their assigned queue.

Removing a Role from a Resource Queue

All users must be assigned to a resource queue. If not explicitly assigned to a particular queue, users will go into the default resource queue, pg_default. If you wish to remove a role from a resource queue and put them in the default queue, change the role’s queue assignment to none. For example:

=# ALTER ROLE role_name RESOURCE QUEUE none;

Modifying Resource Queues

After a resource queue has been created, you can change or reset the queue limits using the ALTER RESOURCE QUEUE command. You can remove a resource queue using the DROP RESOURCE QUEUE command.

Altering a Resource Queue

The ALTER RESOURCE QUEUE command changes the limits of a resource queue. A resource queue must have either an ACTIVE_STATEMENTS or a MAX_COST value (or it can have both). To change the limits of a resource queue, specify the new values you want for the queue. For example:

=# ALTER RESOURCE QUEUE adhoc WITH (ACTIVE_STATEMENTS=5);

=# ALTER RESOURCE QUEUE exec WITH (MAX_COST=100000.0);

To reset active statements or memory limit to no limit, enter a value of -1. To reset the maximum query cost to no limit, enter a value of -1.0. For example:

=# ALTER RESOURCE QUEUE adhoc WITH (MAX_COST=-1.0, MEMORY_LIMIT='2GB');

You can use the ALTER RESOURCE QUEUE command to change the priority of queries associated with a resource queue. For example, to set a queue to the minimum priority level:

ALTER RESOURCE QUEUE webuser WITH (PRIORITY=MIN);

Dropping a Resource Queue

The DROP RESOURCE QUEUE command drops a resource queue. To drop a resource queue, the queue cannot have any roles assigned to it, nor can it have any statements waiting in the queue. See “Removing a Role from a Resource Queue” on page 62 and “Clearing a Waiting Statement From a Resource Queue” on page 65 for instructions on emptying a resource queue. To drop a resource queue:

=# DROP RESOURCE QUEUE name;

Checking Resource Queue Status

Checking resource queue status involves the following tasks:

•Viewing Queued Statements and Resource Queue Status

•Viewing Resource Queue Statistics

•Viewing the Roles Assigned to a Resource Queue

•Viewing the Waiting Queries for a Resource Queue

•Clearing a Waiting Statement From a Resource Queue

•Viewing the Priority of Active Statements

•Resetting the Priority of an Active Statement

Viewing Queued Statements and Resource Queue Status

The gp_toolkit.gp_resqueue_status view allows administrators to see status and activity for a workload management resource queue. It shows how many queries are waiting to run and how many queries are currently active in the system from a particular resource queue. To see the resource queues created in the system, their limit attributes, and their current status:

=# SELECT * FROM gp_toolkit.gp_resqueue_status;

Viewing Resource Queue Statistics

If you want to track statistics and performance of resource queues over time, you can enable statistics collecting for resource queues. This is done by setting the following server configuration parameter in your master postgresql.conf file:

stats_queue_level = on

Once this is enabled, you can use the pg_stat_resqueues system view to see the statistics collected on resource queue usage. Note that enabling this feature does incur slight performance overhead, as each query submitted through a resource queue must be tracked. It may be useful to enable statistics collecting on resource queues for initial diagnostics and administrative planning, and then disable the feature for continued use.

See the section on the Statistics Collector in the PostgreSQL documentation for more information about collecting statistics in Greenplum Database.

Viewing the Roles Assigned to a Resource Queue

To see the roles assigned to a resource queue, perform the following query of the pg_roles and gp_toolkit.gp_resqueue_status system catalog tables:

=# SELECT rolname, rsqname FROM pg_roles, gp_toolkit.gp_resqueue_status

WHERE pg_roles.rolresqueue=gp_toolkit.gp_resqueue_status.queueid;

You may want to create a view of this query to simplify future inquiries. For example:

=# CREATE VIEW role2queue AS

SELECT rolname, rsqname FROM pg_roles, pg_resqueue

WHERE pg_roles.rolresqueue=gp_toolkit.gp_resqueue_status.queueid;

Then you can just query the view:

=# SELECT * FROM role2queue;

Viewing the Waiting Queries for a Resource Queue

When a slot is in use for a resource queue, it is recorded in the pg_locks system catalog table. This is where you can see all of the currently active and waiting queries for all resource queues. To check that statements are being queued (even statements that are not waiting), you can also use the gp_toolkit.gp_locks_on_resqueue view. For example:

=# SELECT * FROM gp_toolkit.gp_locks_on_resqueue WHERE lorwaiting='true';

If this query returns no results, then that means there are currently no statements waiting in a resource queue.

Clearing a Waiting Statement From a Resource Queue

In some cases, you may want to clear a waiting statement from a resource queue. For example, you may want to remove a query that is waiting in the queue but has not been executed yet. You may also want to stop a query that has been started if it is taking too long to execute, or if it is sitting idle in a transaction and taking up resource queue slots that are needed by other users. To do this, you must first identify the statement you want to clear, determine its process id (pid), and then, use pg_cancel_backend with the process id to end that process, as shown below.

For example, to see process information about all statements currently active or waiting in all resource queues, run the following query:

=# SELECT rolname, rsqname, pid, granted,current_query, datname

FROM pg_roles, gp_toolkit.gp_resqueue_status, pg_locks, pg_stat_activity

WHERE pg_roles.rolresqueue=pg_locks.objid

AND pg_locks.objid=gp_toolkit.gp_resqueue_status.queueid

AND pg_stat_activity.procpid=pg_locks.pid;

If this query returns no results, then that means there are currently no statements in a resource queue. A sample of a resource queue with two statements in it looks something like this:

rolname | rsqname | pid | granted | current_query | datname

-----------------------------------------------------------------------

sammy | webuser | 31861 | t | <IDLE> in transaction | namesdb

daria | webuser | 31905 | f | SELECT * FROM topten; | namesdb

Use this output to identify the process id (pid) of the statement you want to clear from the resource queue. To clear the statement, you would then open a terminal window (as the gpadmin database superuser or as root) on the master host and cancel the corresponding process. For example:

=# pg_cancel_backend(31905)

Note: Do not use any operating system KILL command.

Viewing the Priority of Active Statements

The gp_toolkit administrative schema has a view called gp_resq_priority_statement, which lists all statements currently being executed and provides the priority, session ID, and other information.

This view is only available through the gp_toolkit administrative schema. See Appendix I, “The gp_toolkit Administrative Schema” for more information.

Resetting the Priority of an Active Statement

Superusers can adjust the priority of a statement currently being executed using the built-in function gp_adjust_priority(session_id, statement_count, priority). Using this function, superusers can raise or lower the priority of any query. For example:

=# SELECT gp_adjust_priority(752, 24905, 'HIGH')

To obtain the session ID and statement count parameters required by this function, Superusers can use the gp_toolkit administrative schema view, gp_resq_priority_statement. This function affects only the specified statement . Subsequent statements in the same resource queue are executed using the queue’s normally assigned priority.