Avoiding “Out of Memory” Issues

posted Dec 6, 2012, 6:18 AM by Sachi Ojha
DBA's occasionally experience “out of memory” errors that can cause failed queries and degrade system performance. Fortunately, the Greenplum Database provides facilities to avoid this. 

We will discuss two of those facilities: the gp_vmem_protect_limit parameter, and Greenplum resource queues. The parameters and techniques mentioned here are explained in detail in the Greenplum Database Administrator Guide.

The gp_vmem_protect_limit parameter: The “gp_vmem_protect_limit” parameter sets the amount of memory that all processes of an active segment instance can consume. Queries that cause the limit to be exceeded will be cancelled.

Note that this is a local parameter and must be set for each segment in the system. The system must be restarted for parameter changes to take effect.

How to set the gp_vmem_protect_limit

As a general rule-of-thumb, gp_vmem_protect_limit should be set to: 
( X * physical_memory_in_MB ) /#_of_primary_segments

X should be a value between 1.0 and 1.5. A value of X=1.0 offers the best overall system performance; a value of X=1.5 may impact system performance because of swap activity but will result in fewer canceled queries.

For example, to set gp_vmem_protect_limit conservatively (X=1.0) on a segment host with 16GB (16384 MB) of physical memory with 4 primary segment instances, the calculation would be: (1 * 16384) / 4 = 4096.  

The MEMORY_LIMIT parameter for Greenplum Resource Queues:Greenplum resource queues provide a way to manage and prioritize workloads. Resource queues can be created with a MEMORY_LIMIT setting to restrict the total amount of memory that queries can consume in any segment instance. Queries that cause a queue to exceed the MEMORY_LIMIT must wait until queue resources are free before they can execute.

By assigning each user to a queue and limiting the amount of memory queues can consume, administrators can ensure proper resource allocation across the system.

Note that roles with the SUPERUSER attribute are exempt from queue limits.

How to set MEMORY_LIMIT to avoid Out of Memory errors:As a general rule-of-thumb, the sum of all the MEMORY_LIMITs across all the queues should be no more than 90% of the gp_vmem_protect_limit.

Common Out of Memory Errors
The two most common errors are described below. They look similar but have different reasons and solutions. 

Error code 53200
Example:
"ERROR","53200","Out of memory.  Failed on request of size 156 bytes. (context 'CacheMemoryContext') (aset.c:840)"

Description:
The system canceled a query because a segment server’s OS did not have enough memory to satisfy a postmaster process allocation request.

How to Avoid
1. Set gp_vmem_protect_limit according to the formula above.
2. Adding memory can help greatly if lowering gp_vmem_protect_limit results in too many canceled queries.(Gp_vmem_protect_limit can be raised after adding memory.)
3. Adding swap space may help, although increased swap activity will impact system performance.

Error code 53400 
Example
"ERROR","53400","Out of memory  (seg13 slice13 sdw1-1:40001 pid=10183)","VM Protect failed to allocate 8388608 bytes, 6 MB available"

Description
The system canceled a query because a postmaster process tried to request more memory than the gp_vmem_protect_limit parameter allows.

How to Avoid 
1. Make sure the sum of all MEMORY_LIMITs across all active queues is <= 90% of gp_vmem_protect_limit.
2. Increase gp_vmem_protect_limit, if possible, using the formula described above.
3. Ensure the system is not unbalanced (i.e., some segments down). Use gpstate -e to verify.
Comments