We have migrated to new Google sites!

Workload configuration parameters in greenplum

Post date: May 02, 2013 2:45:17 PM

To configure workload management

1. The following parameters are for the general configuration of resource queues:

• max_resource_queues - Sets the maximum number of resource queues.

• max_resource_portals_per_transaction - Sets the maximum number of simultaneously open cursors allowed per transaction. Note that an open cursor will hold an active query slot in a resource queue.

• resource_select_only - If set to on, then SELECT, SELECT INTO, CREATE TABLE AS SELECT, and DECLARE CURSOR commands are evaluated. If set to off INSERT, UPDATE, and DELETE commands will be evaluated as well.

• resource_cleanup_gangs_on_wait - Cleans up idle segment worker processes before taking a slot in the resource queue.

• stats_queue_level - Enables statistics collection on resource queue usage, which can then be viewed by querying the pg_stat_resqueues system view.

2. The following parameters are related to memory utilization:

• gp_resqueue_memory_policy - Enables Greenplum memory management features. When set to none, memory management is the same as in Greenplum Database releases prior to 4.1. When set to auto, query memory usage is controlled by statement_mem and resource queue memory limits.

• statement_mem and max_statement_mem - Used to allocate memory to a particular query at runtime (override the default allocation assigned by the resource queue). max_statement_mem is set by database superusers to prevent regular database users from over-allocation.

• gp_vmem_protect_limit - Sets the upper boundary that all query processes can consume that should not exceed the amount of physical memory of a segment host. When a segment host reaches this limit during query execution, the queries that cause the limit to be exceeded will be cancelled.

• gp_vmem_idle_resource_timeout and gp_vmem_protect_segworker_cache_limit - used to free memory on segment hosts held by idle database processes. Administrators may want to adjust these settings on systems with lots of concurrency.

3. The following parameters are related to query prioritization. Note that the following parameters are all local parameters, meaning they must be set in the postgresql.conf files of the master and all segments:

• gp_resqueue_priority - The query prioritization feature is enabled by default.

• gp_resqueue_priority_sweeper_interval - Sets the interval at which CPU usage is recalculated for all active statements. The default value for this parameter should be sufficient for typical database operations.

• gp_resqueue_priority_cpucores_per_segment - Specifies the number of CPU cores per segment. The default is 4 for segments and 24 for the master, the correct values for the EMC Greenplum Data Computing

Appliance. Each host checks its own postgresql.conf file for the value of this parameter.This parameter also affects the master node, where it should be set to a value reflecting the higher ratio of CPU cores. For example, on a cluster that has 8 CPU cores per host and 4 segments per host, you would use the following settings:

Master and standby master

gp_resqueue_priority_cpucores_per_segment = 8

Segment hosts

gp_resqueue_priority_cpucores_per_segment = 2

Important: If you have fewer than one segment per CPU core on your segment hosts, make sure you adjust this value accordingly. An improperly low value for this parameter can result in under-utilization of CPU resources.

4. If you wish to view or change any of the workload management parameter values, you can use the gpconfig utility.

5. For example, to see the setting of a particular parameter:

$ gpconfig --show gp_vmem_protect_limit

6. For example, to set one value on all segments and a different value on the master:

$ gpconfig -c gp_resqueue_priority_cpucores_per_segment -v 2 -m 8

7. Restart Greenplum Database to make the configuration changes effective:

$ gpstop -r

Creating Queues with an Active Query Limit

Resource queues with an ACTIVE_STATEMENTS setting limit the number of queries that can be executed by roles assigned to that queue. For example, to create a resource queue named adhoc with an active query limit of three:

=# CREATE RESOURCE QUEUE adhoc WITH (ACTIVE_STATEMENTS=3);

This means that for all roles assigned to the adhoc resource queue, only three active queries can be running on the system at any given time. If this queue has three queries running, and a fourth query is submitted by a role in that queue, that query must wait until a slot is free before it can run.

Creating Queues with Memory Limits

Resource queues with a MEMORY_LIMIT setting control the amount of memory for all the queries submitted through the queue. The total memory should not exceed the physical memory available per-segment. Greenplum recommends that you set MEMORY_LIMIT to 90% of memory available on a per-segment basis. For example, if a host has 48 GB of physical memory and 6 segments, then the memory available per

segment is 8 GB. You can calculate the recommended MEMORY_LIMIT for a single queue as 0.90*8=7.2 GB. If there are multiple queues created on the system, their total memory limits must also add up to 7.2 GB.

When used in conjunction with ACTIVE_STATEMENTS, the default amount of memory allotted per query is: MEMORY_LIMIT / ACTIVE_STATEMENTS. When used in conjunction with MAX_COST, the default amount of memory allotted per query is:

MEMORY_LIMIT * (query_cost / MAX_COST). Greenplum recommends that MEMORY_LIMIT be used in conjunction with ACTIVE_STATEMENTS rather than with MAX_COST.

For example, to create a resource queue with an active query limit of 10 and a total memory limit of 2000MB (each query will be allocated 200MB of segment host memory at execution time):

=# CREATE RESOURCE QUEUE myqueue WITH (ACTIVE_STATEMENTS=20, MEMORY_LIMIT='2000MB');

The default memory allotment can be overridden on a per-query basis using the statement_mem server configuration parameter, provided that MEMORY_LIMIT or max_statement_mem is not exceeded. For example, to allocate more memory to a particular query:

=> SET statement_mem='2GB';

=> SELECT * FROM my_big_table WHERE column='value' ORDER BY id;

=> RESET statement_mem;

As a general guideline, MEMORY_LIMIT for all of your resource queues should not exceed the amount of physical memory of a segment host. If workloads are staggered over multiple queues, it may be OK to oversubscribe memory allocations, keeping in mind that queries may be cancelled during execution if the segment host memory limit (gp_vmem_protect_limit) is exceeded.

Creating Queues with a Query Planner Cost Limits

Resource queues with a MAX_COST setting limit the total cost of queries that can be executed by roles assigned to that queue. Cost is specified as a floating point number (for example 100.0) or can also be specified as an exponent (for example 1e+2).

Cost is measured in the estimated total cost for the query as determined by the Greenplum query planner (as shown in the EXPLAIN output for a query). Therefore, an administrator must be familiar with the queries typically executed on the system in order to set an appropriate cost threshold for a queue. Cost is measured in units of disk page fetches; 1.0 equals one sequential disk page read.

For example, to create a resource queue named webuser with a query cost limit of 100000.0 (1e+5):

=# CREATE RESOURCE QUEUE webuser WITH (MAX _COST=100000.0);

=# CREATE RESOURCE QUEUE webuser WITH (MAX _COST=1e+5);

This means that for all roles assigned to the webuser resource queue, it will only allow queries into the system until the cost limit of 100000.0 is reached. So for example, if this queue has 200 queries with a 500.0 cost all running at the same time, and query 201 with a 1000.0 cost is submitted by a role in that queue, that query must wait until space is free before it can run.

Allowing Queries to Run on Idle Systems

If a resource queue is limited based on a cost threshold, then the administrator can allow COST_OVERCOMMIT (the default). Resource queues with a cost threshold and overcommit enabled will allow a query that exceeds the cost threshold to run, provided that there are no other queries in the system at the time the query is submitted. The cost threshold will still be enforced if there are concurrent workloads

on the system.

If COST_OVERCOMMIT is false, then queries that exceed the cost limit will always be rejected and never allowed to run.

Allowing Small Queries to Bypass Queue Limits

Workloads may have certain small queries that administrators want to allow to run without taking up an active statement slot in the resource queue. For example, simple queries to look up metadata information in the system catalogs do not typically require significant resources or interfere with query processing on the segments. An administrator can set MIN_COST to denote a query planner cost associated with a small

query. Any query that falls below the MIN_COST limit will be allowed to run immediately. MIN_COST can be used on resource queues with either an active statement or a maximum query cost limit.

For example:

=# CREATE RESOURCE QUEUE adhoc WITH (ACTIVE_STATEMENTS=10, MIN_COST=100.0);

Setting Priority Levels

To control a resource queue’s consumption of available CPU resources, an administrator can assign an appropriate priority level. When high concurrency causes contention for CPU resources, queries and statements associated with a high-priority resource queue will claim a larger share of available CPU than lower priority queries and statements.

Priority settings are created or altered using the WITH parameter of the commands CREATE RESOURCE QUEUE and ALTER RESOURCE QUEUE. For example, to specify priority settings for the adhoc and reporting queues, an administrator would use the following commands:

=# ALTER RESOURCE QUEUE adhoc WITH (PRIORITY=LOW);

=# ALTER RESOURCE QUEUE reporting WITH (PRIORITY=HIGH);

To create the executive queue with maximum priority, an administrator would use the following command:

=# CREATE RESOURCE QUEUE executive WITH (ACTIVE_STATEMENTS=3, PRIORITY=MAX);

When the query prioritization feature is enabled, resource queues are given a MEDIUM priority by default if not explicitly assigned.

Assigning Roles (Users) to a Resource Queue

Once a resource queue is created, you must assign roles (users) to their appropriate resource queue. If roles are not explicitly assigned to a resource queue, they will go to the default resource queue, pg_default. The default resource queue has an active statement limit of 20, no cost limit, and a medium priority setting.

Use the ALTER ROLE or CREATE ROLE commands to assign a role to a resource queue.

For example:

=# ALTER ROLE name RESOURCE QUEUE queue_name;

=# CREATE ROLE name WITH LOGIN RESOURCE QUEUE queue_name;

A role can only be assigned to one resource queue at any given time, so you can use the ALTER ROLE command to initially assign or change a role’s resource queue. Resource queues must be assigned on a user-by-user basis. If you have a role hierarchy (for example, a group-level role) then assigning a resource queue to the group does not propagate down to the users in that group.

Superusers are always exempt from resource queue limits. Superuser queries will always run regardless of the limits set on their assigned queue.

Removing a Role from a Resource Queue

All users must be assigned to a resource queue. If not explicitly assigned to a particular queue, users will go into the default resource queue, pg_default. If you wish to remove a role from a resource queue and put them in the default queue, change the role’s queue assignment to none.

For example:

=# ALTER ROLE role_name RESOURCE QUEUE none;

Report abuse