DISK SPILL

Post date: Aug 25, 2014 7:39:54 PM

Dynamic Pipelining tm enables parallel data flow between segment hosts where data from upstream components in the dynamic pipeline are transmitted to downstream components enabling queries to run without materializing intermediate contents to disk. However certain operations, for example internal sort operations, that cannot be fully performed in memory will spill to disk impacting overall query performance. 

The work_mem GUC specifies the amount of memory to be used by internal sort operations and hash tables before spilling to file system. If there are several sort or hash operations running in parallel for a given query, each operation will be permitted to use the memory specified before spilling to file system. The EXPLAIN ANALYZE query plan displays the amount of memory (work_mem) used by the operation. If the work_mem was insufficient to perform the operation in memory, the plan shows the amount of data spilled to disk for the lowest-performing segment. The work_mem wanted is based on the amount of data written to work files and is not exact.

If the memory used versus the memory wanted for a plan node operation is significant then increase work_mem for the query using the SQL SET statement to give the query additional memory. Keep in mind arbitrarily setting work_mem to a very large number for a given query may actually decrease query performance due to the overhead of allocating and managing memory. It is also important to consider arbitrarily setting work_mem to a very large number and the impact on other concurrent queries with sort and hash operations. 

The statement_mem GUC replaces work_mem when gp_resqueue_memory_policy is set to AUTO. Statement_mem allocates segment host memory per query and increasing the value may improve query performance allowing more memory to be used by various operators. The default is 125MB. 

When query operations spill to disk, for example hash aggregation or hash join operations use the gp_workfile_compress_algorithm GUC to compress spill files if the system is I/O bound. The default is NONE (no compression is performed on spill files).