Understanding Query Planning and Dispatch in Greenplum
Query Planning and Dispatch
When clients execute a query in a Greenplum database, the master receives, parses, and optimizes the query. The resulting query plan is either parallel or targeted.
a) The master dispatches parallel query plans to all segments.
b) The master dispatches targeted query plans to a single segment.
Each segment is responsible for executing local database operations on its own set of data. Most database operations—such as table scans, joins, aggregations, and sorts—execute across all segments in parallel. Each operation is performed on a segment database independent of the data stored in the other segment databases.
What is a query plan?
A query plan is the set of operations Greenplum Database will perform to produce the answer to a query. Each node or step in the plan represents a database operation such as a table scan, join, aggregation, or sort. Plans are read and executed from bottom to top.
In addition to common database operations such as tables scans, joins, and so on, Greenplum Database has an additional operation type called motion.
What is motion?
A motion operation involves moving tuples between the segments during query processing. Note that not every query requires a motion. For example, a targeted query plan does not require data to move across the interconnect.
To achieve maximum parallelism during query execution, Greenplum divides the work of the query plan into slices.
What is Slice?
A slice is a portion of the plan that segments can work on independently. A query plan is sliced wherever a motion operation occurs in the plan, with one slice on each side of the motion.
What is QD?
Greenplum creates a number of database processes to handle the work of a query. On the master, the query worker process is called the query dispatcher (QD). The QD is responsible for creating and dispatching the query plan. It also accumulates and presents the final results.
What is QE?
On the segments, a query worker process is called a query executor (QE). A QE is responsible for completing its portion of work and communicating its intermediate results to the other worker processes.
There is at least one worker process assigned to each slice of the query plan. A worker process works on its assigned portion of the query plan independently. During query execution, each segment will have a number of processes working on the query in parallel.
What is Gangs?
Related processes that are working on the same slice of the query plan but on different segments are called gangs. As a portion of work is completed, tuples flow up the query plan from one gang of processes to the next.
What is interconnect component of Greenplum Database?
The inter-process communication between the segments is referred to as the interconnect component of Greenplum Database.