Iterator Metric Terminology

Post date: Nov 17, 2014 1:29:39 AM

The following information explains some of the database terms and concepts that appear in iterator metrics in Greenplum Database:

1. Node: Refers to a step in a query plan. A query plan is the set of operations that Greenplum Database performs to produce the answer to a given query. Each node in the plan represents a specific database operation, such as a table scan, join, aggregation, or sort.

2. Iterator: Represents the actual execution of a node in a query plan. The terms node and iterator are sometimes used interchangeably.

3. Tuple: Refers to a row returned as part of a query's result set, and also to a record in a table.

4. Spill: When there is not enough memory to perform a database operation, data must be written (or spilled) to disk. 

5. Passes: Occur when an iterator must scan (or pass) over spilled data to obtain a result. A pass is one complete read through either all input tuples or all data in the batch files generated by a spill, and passes proceed hierarchically. In the first pass, all input tuples are read, and intermediate results are spilled to a specified number of batch files. In the second pass, the data in all batch files is processed. If the results are still too large to fit in memory, the intermediate results are spilled to a second level of spill files, and the process repeats.

6. Batches: Refers to the actual files created when data is spilled to disk. Batches are most often associated with hash operations.

7. Join: A clause in a query that joins two or more tables. A Greenplum Database instance uses three types of join algorithms:

a. Hash Join

b. Merge Join

c. Nested Loop

Each of these operations includes its own join semantics. The Command Center Console displays iterator metrics for each of these semantics.
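The node, iterator, and tuple terms above can be illustrated with a minimal pull-based iterator sketch in Python. It assumes a simplified model in which each plan node is an iterator that pulls tuples from its child and counts how many it emits, as a stand-in for an iterator metric. The class names `SeqScan`, `Filter`, and `Sort` and the `tuples_out` counter are hypothetical; this is not Greenplum's executor code or its metric names.

```python
from typing import Any, Callable, Iterable, Iterator

Row = dict[str, Any]  # a tuple, modelled here as a column -> value mapping


class SeqScan:
    """Hypothetical scan node: emits every row of an in-memory 'table'."""

    def __init__(self, table: list[Row]) -> None:
        self.table = table
        self.tuples_out = 0  # illustrative per-iterator metric

    def __iter__(self) -> Iterator[Row]:
        for row in self.table:
            self.tuples_out += 1
            yield row


class Filter:
    """Hypothetical filter node: pulls rows from its child and keeps matches."""

    def __init__(self, child: Iterable[Row], predicate: Callable[[Row], bool]) -> None:
        self.child = child
        self.predicate = predicate
        self.tuples_out = 0

    def __iter__(self) -> Iterator[Row]:
        for row in self.child:
            if self.predicate(row):
                self.tuples_out += 1
                yield row


class Sort:
    """Hypothetical sort node: must read all of its input before emitting a row."""

    def __init__(self, child: Iterable[Row], key: Callable[[Row], Any]) -> None:
        self.child = child
        self.key = key
        self.tuples_out = 0

    def __iter__(self) -> Iterator[Row]:
        for row in sorted(self.child, key=self.key):  # blocking step
            self.tuples_out += 1
            yield row


# Plan tree: Sort <- Filter <- SeqScan. Iterating the root drives execution.
table = [{"id": 3, "amt": 30}, {"id": 1, "amt": 5}, {"id": 2, "amt": 20}]
scan = SeqScan(table)
filt = Filter(scan, lambda r: r["amt"] > 10)
plan = Sort(filt, key=lambda r: r["id"])
result = list(plan)
print(result)  # [{'id': 2, 'amt': 20}, {'id': 3, 'amt': 30}]
print(scan.tuples_out, filt.tuples_out, plan.tuples_out)  # 3 2 2
```

Each node here is just a description until the root is iterated; the iteration itself plays the role of the iterator. A blocking operator such as `Sort` has to hold all of its input at once, which is why sorts and hashes are typical candidates for spilling when memory runs short.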
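The spill, pass, and batch terms can be sketched with a simplified hash aggregation in Python that counts rows per key under a memory limit. This illustration rests on several assumptions and is not Greenplum's spill algorithm. Memory is modelled as a maximum number of in-memory groups (`mem_groups`), batch files are plain lists, and the hypothetical `batch_of` hash is salted with the spill level so that each level splits the data differently. Keys that do not fit are spilled to batches. Each batch is reprocessed in the next pass, and a batch that is still too large spills again one level deeper.

```python
import zlib
from collections import Counter
from collections.abc import Hashable, Iterable


def batch_of(key: Hashable, level: int, nbatches: int) -> int:
    """Hypothetical partitioning hash, salted by spill level."""
    return zlib.crc32(f"{level}:{key}".encode()) % nbatches


def spilling_count(rows: Iterable[Hashable], mem_groups: int, stats: dict,
                   nbatches: int = 4, level: int = 0) -> Counter:
    """Count rows per key while holding at most `mem_groups` groups in memory.

    stats["passes"] records the deepest pass reached; stats["batches"] counts
    the non-empty batch "files" written across all levels.
    """
    if mem_groups < 1:
        raise ValueError("mem_groups must be at least 1")
    stats["passes"] = max(stats["passes"], level + 1)

    in_memory: Counter = Counter()
    batches: list[list[Hashable]] = [[] for _ in range(nbatches)]  # stand-ins for batch files
    for key in rows:
        if key in in_memory or len(in_memory) < mem_groups:
            in_memory[key] += 1
        else:
            # No memory for a new group: spill the tuple to a batch file.
            batches[batch_of(key, level, nbatches)].append(key)

    # Spilled keys never entered `in_memory`, and a given key always hashes to
    # the same batch, so the per-batch results are disjoint and can be merged.
    for batch in batches:
        if batch:
            stats["batches"] += 1
            # Next pass over this batch; it may spill again one level deeper.
            in_memory.update(spilling_count(batch, mem_groups, stats, nbatches, level + 1))
    return in_memory


rows = [i % 10 for i in range(100)]  # 10 distinct keys, 10 rows each
stats = {"passes": 0, "batches": 0}
counts = spilling_count(rows, mem_groups=3, stats=stats)
print(counts == Counter(rows), stats["passes"] >= 2)  # True True
```

With room for only 3 of the 10 groups, the first pass must spill, so at least a second pass is needed. The exact number of passes and batches depends on how the hash distributes keys. Each level keeps at least one more group in memory than its batches contain, so the recursion always terminates.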
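The three join algorithms listed above can be compared using textbook Python versions of each for an inner equi-join. These are simplified sketches that assume in-memory inputs and inner-join semantics only. They omit the other join semantics (such as outer joins) and the spilling that real implementations handle, and the function names are hypothetical.

```python
from collections import defaultdict
from operator import itemgetter


def nested_loop_join(left, right, lkey, rkey):
    """Compare every outer row with every inner row."""
    out = []
    for lrow in left:
        for rrow in right:  # the inner input is rescanned for each outer row
            if lkey(lrow) == rkey(rrow):
                out.append((lrow, rrow))
    return out


def hash_join(left, right, lkey, rkey):
    """Build a hash table on the inner input, then probe it with the outer."""
    table = defaultdict(list)
    for rrow in right:  # build phase
        table[rkey(rrow)].append(rrow)
    out = []
    for lrow in left:  # probe phase
        for rrow in table.get(lkey(lrow), []):
            out.append((lrow, rrow))
    return out


def merge_join(left, right, lkey, rkey):
    """Sort both inputs on the join key, then walk them in step."""
    ls, rs = sorted(left, key=lkey), sorted(right, key=rkey)
    out, i, j = [], 0, 0
    while i < len(ls) and j < len(rs):
        lk, rk = lkey(ls[i]), rkey(rs[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of inner rows with this key, then pair it with
            # every outer row that has the same key.
            j_end = j
            while j_end < len(rs) and rkey(rs[j_end]) == lk:
                j_end += 1
            while i < len(ls) and lkey(ls[i]) == lk:
                out.extend((ls[i], r) for r in rs[j:j_end])
                i += 1
            j = j_end
    return out


emps = [(1, "ann", 10), (2, "bob", 20), (3, "cy", 10), (4, "di", 30)]  # (id, name, dept_id)
depts = [(10, "eng"), (20, "ops"), (40, "hr")]                         # (dept_id, name)
by_emp_dept, by_dept_id = itemgetter(2), itemgetter(0)

nl = nested_loop_join(emps, depts, by_emp_dept, by_dept_id)
hj = hash_join(emps, depts, by_emp_dept, by_dept_id)
mj = merge_join(emps, depts, by_emp_dept, by_dept_id)
print(sorted(nl) == sorted(hj) == sorted(mj))  # True
print(sorted((e[1], d[1]) for e, d in hj))  # [('ann', 'eng'), ('bob', 'ops'), ('cy', 'eng')]
```

All three return the same matches, although in different orders. Nested loop compares every pair of rows. Hash join builds a hash table on one input and probes it with the other, and that table is what spills to batch files when it outgrows memory. Merge join walks two key-sorted inputs in step.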