Command Center Database (gpperfmon) Reference

gpperfmon consists of three sets of tables: now tables, which hold current data; tail tables, which hold data in transition; and history tables, which hold historical data.

•The now and tail data are stored as text files on the master host file system, and accessed by the Command Center database via external tables.

•Tail tables are for internal use only and should not be queried by users.

•The history tables are regular database tables stored within the Command Center (gpperfmon) database.

The tables fall into the following categories:

•The database_* tables store query workload information for a Greenplum Database instance.

•The emcconnect_history table displays information about ConnectEMC events and alerts. ConnectEMC events are triggered by a hardware failure, a fix to a failed hardware component, or a Greenplum Database startup. Once a ConnectEMC event is triggered, an alert is sent to EMC Support.

•The filerep_* tables store health and status metrics for the file replication process. This process is how high availability (mirroring) is achieved in a Greenplum Database instance. Statistics are maintained for each primary-mirror pair.

•The health_* tables store system health metrics for the EMC Data Computing Appliance.

•The interface_stats_* tables store statistical metrics for each active interface of a Greenplum Database instance. Note: These tables are in place for future use and are not currently populated.

•The iterators_* tables store information about query plan iterators and their metrics. A query iterator refers to a node or operation in a query plan.

•The queries_* tables store high-level query status information.

•The segment_* tables store memory allocation statistics for the Greenplum Database segment instances.

•The socket_stats_* tables store statistical metrics about socket usage for a Greenplum Database instance. Note: These tables are in place for future use and are not currently populated.

•The system_* tables store system utilization metrics.

•The tcp_stats_* tables store statistical metrics about TCP communications for a Greenplum Database instance. Note: These tables are in place for future use and are not currently populated.

•The udp_stats_* tables store statistical metrics about UDP communications for a Greenplum Database instance. Note: These tables are in place for future use and are not currently populated.

The Command Center database also contains the following views:

•The dynamic_memory_info view shows an aggregate of all the segments per host and the amount of dynamic memory used per host.

•The iterators_*_rollup set of views summarizes the query iterator metrics across all segments in the system.

•The memory_info view shows per-host memory information from the system_history and segment_history tables.

database_*

The database_* tables store query workload information for a Greenplum Database instance. There are three database tables, all having the same columns:

•database_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. Current query workload data is stored in database_now during the period between data collection from the Command Center agents and automatic commitment to the database_history table.

•database_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for query workload data that has been cleared from database_now but has not yet been committed to database_history. It typically only contains a few minutes worth of data.

•database_history is a regular table that stores historical database-wide query workload data. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.
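
For example, the following query sketches how an administrator might summarize recent workload from database_history. It is a minimal illustration only, and assumes the standard gpperfmon columns ctime, queries_running, and queries_queued; verify the column names in your release before using it.

-- Hourly average of running and queued queries over the last day
SELECT date_trunc('hour', ctime) AS hour,
       avg(queries_running) AS avg_running,
       avg(queries_queued) AS avg_queued
FROM database_history
WHERE ctime > now() - interval '1 day'
GROUP BY 1
ORDER BY 1;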

emcconnect_history

The emcconnect_history table displays information about ConnectEMC events and alerts. ConnectEMC events are triggered by a hardware failure, a fix to a failed hardware component, or a Greenplum Database instance startup. Once a ConnectEMC event is triggered, an alert is sent to EMC Support.

This table is pre-partitioned into monthly partitions. Partitions are automatically added in one month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

Note: This table only applies to the Greenplum Data Computing Appliance platform.

filerep_*

The filerep_* tables store high-availability file replication process information for a Greenplum Database instance. There are three filerep tables, all having the same columns:

•filerep_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. Current file replication data is stored in filerep_now during the period between data collection from the Command Center agents and automatic commitment to the filerep_history table.

•filerep_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for file replication data that has been cleared from filerep_now but has not yet been committed to filerep_history. It typically only contains a few minutes worth of data.

•filerep_history is a regular table that stores historical database-wide file replication data. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

health_*

The health_* tables store system health metrics for the EMC Data Computing Appliance. There are three health tables, all having the same columns:

Note: These tables only apply to the Greenplum Data Computing Appliance platform.

•health_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. Current system health data is stored in health_now during the period between data collection from the Command Center agents and automatic commitment to the health_history table.

•health_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for system health data that has been cleared from health_now but has not yet been committed to health_history. It typically only contains a few minutes worth of data.

•health_history is a regular table that stores historical system health metrics. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

interface_stats_*

The interface_stats_* tables store statistical metrics about communications over each active interface for a Greenplum Database instance.

Note: These tables are in place for future use and are not currently populated.

There are three system tables, all having the same columns:

•interface_stats_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data.

•interface_stats_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for statistical interface metrics that have been cleared from interface_stats_now but have not yet been committed to interface_stats_history. It typically only contains a few minutes worth of data.

•interface_stats_history is a regular table that stores statistical interface metrics. It is pre-partitioned into monthly partitions. Partitions are automatically added in one month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

iterators_*

The iterators_* tables store information about query plan iterators and their metrics. A query iterator refers to a node or operation in a query plan. For example, a sequential scan operation on a table may be one type of iterator in a particular query plan.

The tmid, ssid and ccnt columns are the composite key that uniquely identifies a particular query. These columns can be used to join with the queries_* data tables.
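
For example, the following query sketches such a join. The key columns come from the text above; the other columns shown (query_text, ntype, rows_out) are assumed from the standard gpperfmon schema and should be verified against your release.

-- Top iterator types by output rows, joined to the query that produced them
SELECT q.query_text,
       i.ntype AS iterator_type,
       sum(i.rows_out) AS total_rows_out
FROM iterators_history i
JOIN queries_history q
  ON i.tmid = q.tmid AND i.ssid = q.ssid AND i.ccnt = q.ccnt
GROUP BY q.query_text, i.ntype
ORDER BY total_rows_out DESC
LIMIT 10;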

There are three iterator tables, all having the same columns:

•iterators_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. Current query plan iterator data is stored in iterators_now during the period between data collection from the Command Center agents and automatic commitment to the iterators_history table.

•iterators_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for query plan iterator data that has been cleared from iterators_now but has not yet been committed to iterators_history. It typically only contains a few minutes worth of data.

•iterators_history is a regular table that stores historical query plan iterator data. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

See also the iterators_*_rollup views for summary metrics of the query plan iterator data.

Iterator Metrics

The tables in this section list all possible iterators in a query on a Greenplum Database instance. The iterator tables include the metric name, the column in the iterators_* table in the gpperfmon database where the metric appears, how the metric is measured (unit), and a description of the metric. The iterators are described in the sections that follow.

Metric Terminology

The following information explains some of the database terms and concepts that appear in iterator metrics in Greenplum Database:

Node: Refers to a step in a query plan. A query plan has sets of operations that Greenplum Database performs to produce the answer to a given query. A node in the plan represents a specific database operation, such as a table scan, join, aggregation, sort, etc.

Iterator: Represents the actual execution of the node in a query plan. Node and iterator are sometimes used interchangeably.

Tuple: Refers to a row returned as part of a result set from a query, as well as a record in a table.

Spill: When there is not enough memory to perform a database operation, data must be written (or spilled) to disk.

Passes: Occur when an iterator must scan (or pass) over spilled data to obtain a result. A pass represents one pass through all input tuples, or all data in batch files generated after spill, which happens hierarchically. In the first pass, all input tuples are read, and intermediate results are spilled to a specified number of batch files. In the second pass, the data in all batch files is processed. If the results are still too large to store in memory, the intermediate results are spilled to the second level of spill files, and the process repeats again.

Batches: Refers to the actual files created when data is spilled to disk. This is most often associated to Hash operations.

Join: This clause in a query joins two or more tables. There are three types of Join algorithms in a Greenplum Database instance:

•Hash Join

•Merge Join

•Nested Loop

Each of these operations includes its own respective Join semantics. The Command Center Console displays iterator metrics for each of these semantics.
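
To see which iterators a particular query will use, run EXPLAIN on the query; each plan node that EXPLAIN reports corresponds to an iterator recorded in the iterators_* tables. The table and column names below are illustrative only.

EXPLAIN
SELECT region, count(*)
FROM customers
GROUP BY region;
-- A typical plan for a query like this includes nodes such as
-- Gather Motion, HashAggregate, Redistribute Motion, and Seq Scan.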

Append

An Append iterator has two or more input sets. Append returns all rows from the first input set, then all rows from the second input set, and so on, until all rows from all input sets are processed. Append is also used when you select from a table involved in an inheritance hierarchy.
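
A UNION ALL of two result sets is a common way to see an Append iterator in a plan; the table names below are illustrative only.

EXPLAIN
SELECT id FROM sales_2011
UNION ALL
SELECT id FROM sales_2012;
-- The plan appends the rows of the second scan to the rows of the first.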

Append-only Scan

This iterator scans append-only tables.

Append-only Columnar Scan

This iterator scans append-only columnar-type tables.

Aggregate

The query planner produces an aggregate iterator whenever the query includes an aggregate function. For example, the following functions are aggregate functions: AVG(), COUNT(), MAX(), MIN(), STDDEV(), SUM(), and VARIANCE(). Aggregate reads all the rows in the input set and computes the aggregate values. If the input set is not grouped, Aggregate produces a single result row.
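
For example, an ungrouped aggregate such as the following reads every input row and returns a single result row (the table name is illustrative).

SELECT count(*) AS order_count,
       avg(amount) AS avg_amount
FROM orders;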

BitmapAnd

This iterator takes the bitmaps generated from multiple BitmapIndexScan iterators, combines them with an AND operation, and generates a new bitmap as its output.

BitmapOr

This iterator takes the bitmaps generated from multiple BitmapIndexScan iterators, combines them with an OR operation, and generates a new bitmap as its output.

Bitmap Append-Only Scan

This iterator retrieves all rows from the bitmap generated by BitmapAnd, BitmapOr, or BitmapIndexScan and accesses the append-only table to retrieve the relevant rows.

Bitmap Heap Scan

This iterator retrieves all rows from the bitmap generated by BitmapAnd, BitmapOr, or BitmapIndexScan and accesses the heap table to retrieve the relevant rows.


Bitmap Index Scan

This iterator scans an index and produces a bitmap of the rows that satisfy the query conditions.

Broadcast Motion

This iterator sends an outbound tuple to all of its destinations.

Note that the Motion metrics for the iterator are useful when investigating potential networking issues in the Greenplum Database system. Typically, the "Ack Time" values should be very small (microseconds or milliseconds). However if the "Ack Time" values are one or more seconds (particularly the "Motion Min Ack Time" metric), then a network performance issue likely exists.

Also, if there are a large number of packets being dropped because of queue overflow, you can increase the value for the gp_interconnect_queue_depth system configuration parameter to improve performance. See the Greenplum Database Administrator Guide for more information about system configuration parameters.
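
The current setting can be checked from any SQL session, as in the minimal sketch below; changing the value cluster-wide is normally done with the gpconfig utility.

-- Show the current interconnect queue depth
SHOW gp_interconnect_queue_depth;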


Explicit Redistribute Motion

The Explicit Redistribute iterator moves tuples to segments explicitly specified in the segment ID column of the tuples. This differs from a Redistribute Motion iterator, where target segments are indirectly specified through hash expressions. The Explicit Redistribute iterator is used when the query portion of a DML planned statement requires moving tuples across distributed tables.

Note that the Motion metrics for the iterator are useful when investigating potential networking issues in the Greenplum Database system. Typically, the "Ack Time" values should be very small (microseconds or milliseconds). However if the "Ack Time" values are one or more seconds (particularly the "Motion Min Ack Time" metric), then a network performance issue likely exists.

Also, if there are a large number of packets being dropped because of queue overflow, you can increase the value for the gp_interconnect_queue_depth system configuration parameter to improve performance. See the Greenplum Database Administrator Guide for more information about system configuration parameters.

External Scan

This iterator scans an external table.

Function Scan

This iterator returns tuples produced by a function.

Gather Motion

This iterator gathers streams of tuples that are sent by "sending" motions. If a merge key is specified, it merges many streams into a single order-preserved stream.

Note that the Motion metrics for the iterator are useful when investigating potential networking issues in the Greenplum Database system. Typically, the "Ack Time" values should be very small (microseconds or milliseconds). However if the "Ack Time" values are one or more seconds (particularly the "Motion Min Ack Time" metric), then a network performance issue likely exists.

Also, if there are a large number of packets being dropped because of queue overflow, you can increase the value for the gp_interconnect_queue_depth system configuration parameter to improve performance. See the Greenplum Database Administrator Guide for more information about system configuration parameters.

GroupAggregate

The GroupAggregate iterator is a way to compute vector aggregates, and it is used to satisfy a GROUP BY clause. A single input set is required by the GroupAggregate iterator, and it must be ordered by the grouping column(s). This iterator returns a single row for each unique value of the grouping columns.

Hash Join

The Hash Join iterator requires two input sets - the outer and inner tables.

The Hash Join iterator starts by creating its inner table using the Hash operator. The Hash operator creates a temporary Hash index that covers the join column in the inner table. When the hash table (that is, the inner table) is created, Hash Join reads each row in the outer table, hashes the join column (from the outer table), and searches the temporary Hash index for a matching value.

In a Greenplum Database instance, a Hash Join algorithm can be used with the following join semantics:

•Left Join

•Left Anti Semi Join

•Full Join

•Right Join

•EXISTS Join

•Reverse In Join

•Unique Outer Join

•Unique Inner Join
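
A join on a non-indexed equality condition commonly produces a Hash Join; the EXPLAIN sketch below uses illustrative table and column names.

EXPLAIN
SELECT c.name, o.amount
FROM orders o
JOIN customers c ON o.cust_id = c.cust_id;
-- The plan typically shows a Hash Join with a Hash operator built
-- over one of the two inputs (the inner table).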

HashAggregate

The HashAggregate iterator is similar to the GroupAggregate iterator. A single input set is required by the HashAggregate iterator and it creates a hash table from the input. However, it does not require its input to be ordered.
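
Both iterators can satisfy a GROUP BY such as the one below; the planner generally chooses GroupAggregate when the input is already ordered on the grouping columns and HashAggregate otherwise. Table and column names are illustrative.

EXPLAIN
SELECT region, sum(amount) AS total_amount
FROM orders
GROUP BY region;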

Index Scan

An Index Scan operator traverses an index structure. If you specify a starting value for an indexed column, the Index Scan will begin at the appropriate value. If you specify an ending value, the Index Scan will complete as soon as it finds an index entry greater than the ending value. A query planner uses an Index Scan operator when it can reduce the size of the result set by traversing a range of indexed values, or when it can avoid a sort because of the implicit ordering offered by an index.
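
For example, with an index on the filter column, a selective range predicate may be satisfied with an Index Scan rather than a full table scan; the table and index below are illustrative, and the planner chooses the scan type based on cost.

CREATE INDEX idx_orders_order_date ON orders (order_date);

EXPLAIN
SELECT * FROM orders
WHERE order_date BETWEEN '2011-01-01' AND '2011-01-31';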

Limit

The Limit operator is used to limit the size of a result set. A Greenplum Database instance uses the Limit operator for both Limit and Offset processing. The Limit operator works by discarding the first x rows from its input set, returning the next y rows, and discarding the remainder. If the query includes an OFFSET clause, x represents the offset amount; otherwise, x is zero. If the query includes a LIMIT clause, y represents the Limit amount; otherwise, y is at least as large as the number of rows in the input set.
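
For example, in the query below x is 20 and y is 10, so rows 1 through 20 of the ordered input are discarded and rows 21 through 30 are returned (the table name is illustrative).

SELECT *
FROM employees
ORDER BY hire_date
LIMIT 10 OFFSET 20;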

Materialize

The materialize iterator is used for some sub-select operations. The query planner can decide that it is less expensive to materialize a sub-select one time than it is to repeat the work for each top-level row. Materialize is also used for some merge/join operations.


Merge Join

The Merge Join iterator joins two tables. Like the Nested Loop iterator, Merge Join requires two input sets: An outer table and an inner table. Each input set must be ordered by the join columns. In a Greenplum Database instance, the Merge Join algorithm can be used with the following join semantics:

•Left Join

•Left Anti Semi Join

•Full Join

•Right Join

•EXISTS Join

•Reverse In Join

•Unique Outer Join

•Unique Inner Join

Nested Loop

The Nested Loop iterator is used to perform a join between two tables, and as a result requires two input sets. It fetches each row from one of the input sets (called the outer table). For each row in the outer table, the other input (called the inner table) is searched for a row that meets the join qualifier. In a Greenplum Database instance, a Nested Loop algorithm can be used with the following join semantics:

•Left Join

•Left Anti Semi Join

•Full Join

•Right Join

•EXISTS Join

•Reverse In Join

•Unique Outer Join

•Unique Inner Join

Redistribute Motion

This iterator sends an outbound tuple to only one destination based on the value of a hash.

Note that the Motion metrics for the iterator are useful when investigating potential networking issues in the Greenplum Database system. Typically, the "Ack Time" values should be very small (microseconds or milliseconds). However if the "Ack Time" values are one or more seconds (particularly the "Motion Min Ack Time" metric), then a network performance issue likely exists.

Also, if there are a large number of packets being dropped because of queue overflow, you can increase the value for the gp_interconnect_queue_depth system configuration parameter to improve performance. See the Greenplum Database Administrator Guide for more information about system configuration parameters.

Result

The Result iterator is used either to execute a query that does not retrieve data from a table, or to evaluate the parts of a WHERE clause that do not depend on data retrieved from a table. It can also be used if the top node in the query plan is an Append iterator.

Repeat

This iterator repeats every input operator a certain number of times. This is typically used for certain grouping operations.

Seq Scan

The Seq Scan iterator scans heap tables, and is the most basic query iterator. Any single-table query can be done by using the Seq Scan iterator. Seq Scan starts at the beginning of a heap table and scans to the end of the heap table. For each row in the heap table, Seq Scan evaluates the query constraints (the WHERE clause). If the constraints are satisfied, the required columns are added to the result set.

SetOp

There are four SetOp iterators:

•Intersect

•Intersect All

•Except

•Except All

These iterators are produced only when the query planner encounters an INTERSECT, INTERSECT ALL, EXCEPT, or EXCEPT ALL clause, respectively.

All SetOp iterators require two input sets. They combine the input sets into a sorted list, and then groups of identical rows are identified. For each group, the SetOp iterator counts the number of rows contributed by each input set, then uses the counts to determine the number of rows to add to the result set.
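
For example, EXCEPT returns the rows of the first input set that do not appear in the second (the tables are illustrative).

SELECT cust_id FROM customers
EXCEPT
SELECT cust_id FROM orders;
-- Returns customers that have no orders.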

Shared Scan

This iterator evaluates the common parts of a query plan. It shares the output of the common sub-plans with all other iterators, so that the sub-plan only needs to execute one time.

Sort

The Sort iterator imposes an ordering on the result set. A Greenplum Database instance uses two different sort strategies: An in-memory sort and an on-disk sort. If the size of the result set exceeds the available memory, the Sort iterator distributes the input set to a collection of sorted work files and then merges the work files back together again. If the result set is less than the available memory, the sort is done in memory.

The Sort iterator is used for many purposes. A Sort can be used to satisfy an ORDER BY clause. Also, some query operators require their input sets to be ordered.

Subquery Scan

A Subquery Scan iterator is a pass-through iterator. It scans through its input set, adding each row to the result set. This iterator is used for internal purposes and has no effect on the overall query plan.

Tid Scan

The Tid Scan (tuple ID scan) iterator is used whenever the query planner encounters a constraint of the form ctid = expression or expression = ctid. This specifies a tuple ID, an identifier that is unique within a table. The tuple ID works like a bookmark, but is valid only within a single transaction. After the transaction completes, the tuple ID is not used again.

Unique

The Unique iterator eliminates duplicate values from the input set. The input set must be ordered by the columns, and the columns must be unique. The Unique operator removes only rows — it does not remove columns and it does not change the ordering of the result set. Unique can return the first row in the result set before it has finished processing the input set. The query planner uses the Unique operator to satisfy a DISTINCT clause. Unique is also used to eliminate duplicates in a UNION.
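
For example, a DISTINCT query such as the following can be satisfied by sorting the input and applying Unique; the planner may also choose HashAggregate instead, and the table name is illustrative.

EXPLAIN
SELECT DISTINCT region
FROM customers;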

Values Scan

The Values Scan iterator is used to iterate over a set of constant tuples.

Window

The Window function performs calculations across sets of rows that are related to the current query row. The Window iterator computes Window functions on the input set of rows.

queries_*

The queries_* tables store high-level query status information.

The tmid, ssid and ccnt columns are the composite key that uniquely identifies a particular query. These columns can be used to join with the iterators_* tables.

There are three queries tables, all having the same columns:

•queries_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. Current query status is stored in queries_now during the period between data collection from the Command Center agents and automatic commitment to the queries_history table.

•queries_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for query status data that has been cleared from queries_now but has not yet been committed to queries_history. It typically only contains a few minutes worth of data.

•queries_history is a regular table that stores historical query status data. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.
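
As a hedged sketch of the partition maintenance described above, the following statements list the monthly partitions of queries_history using Greenplum's pg_partitions view and then drop the oldest partition (RANK(1)); the same pattern applies to the other *_history tables. Verify the partition boundaries before dropping anything.

-- List the partitions and their date ranges
SELECT partitiontablename, partitionrangestart, partitionrangeend
FROM pg_partitions
WHERE tablename = 'queries_history';

-- Drop the oldest partition once its data is no longer needed
ALTER TABLE queries_history DROP PARTITION FOR (RANK(1));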

segment_*

The segment_* tables contain memory allocation statistics for the Greenplum Database segment instances. This tracks the amount of memory consumed by all postgres processes of a particular segment instance, and the remaining amount of memory available to a segment as per the setting of the postgresql.conf configuration parameter: gp_vmem_protect_limit. Query processes that cause a segment to exceed this limit will be cancelled in order to prevent system-level out-of-memory errors. See the Greenplum Database Administrator Guide for more information about this parameter.

There are three segment tables, all having the same columns:

•segment_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. Current memory allocation data is stored in segment_now during the period between data collection from the Command Center agents and automatic commitment to the segment_history table.

•segment_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for memory allocation data that has been cleared from segment_now but has not yet been committed to segment_history. It typically only contains a few minutes worth of data.

•segment_history is a regular table that stores historical memory allocation metrics. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

A particular segment instance is identified by its hostname and dbid (the unique segment identifier as per the gp_segment_configuration system catalog table).
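
The sketch below checks the configured per-segment limit and summarizes recent memory headroom per host; it assumes the standard segment_history columns (ctime, hostname, dynamic_memory_used, dynamic_memory_available), which should be verified against your release.

-- Per-segment memory limit currently in effect
SHOW gp_vmem_protect_limit;

-- Recent dynamic memory usage per host
SELECT hostname,
       max(dynamic_memory_used) AS peak_used,
       min(dynamic_memory_available) AS min_available
FROM segment_history
WHERE ctime > now() - interval '1 hour'
GROUP BY hostname
ORDER BY min_available;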

socket_stats_*

The socket_stats_* tables store statistical metrics about socket usage for a Greenplum Database instance. There are three system tables, all having the same columns:

Note: These tables are in place for future use and are not currently populated.

•socket_stats_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data.

•socket_stats_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for socket statistical metrics that has been cleared from socket_stats_now but has not yet been committed to socket_stats_history. It typically only contains a few minutes worth of data.

•socket_stats_history is a regular table that stores historical socket statistical metrics. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

system_*

The system_* tables store system utilization metrics. There are three system tables, all having the same columns:

•system_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. Current system utilization data is stored in system_now during the period between data collection from the Command Center agents and automatic commitment to the system_history table.

•system_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for system utilization data that has been cleared from system_now but has not yet been committed to system_history. It typically only contains a few minutes worth of data.

•system_history is a regular table that stores historical system utilization metrics. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

tcp_stats_*

The tcp_stats_* tables store statistical metrics about TCP communications for a Greenplum Database instance.

Note: These tables are in place for future use and are not currently populated.

There are three system tables, all having the same columns:

•tcp_stats_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data.

•tcp_stats_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for TCP statistical data that has been cleared from tcp_stats_now but has not yet been committed to tcp_stats_history. It typically only contains a few minutes worth of data.

•tcp_stats_history is a regular table that stores historical TCP statistical data. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

udp_stats_*

The udp_stats_* tables store statistical metrics about UDP communications for a Greenplum Database instance.

Note: These tables are in place for future use and are not currently populated.

There are three system tables, all having the same columns:

•udp_stats_now is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data.

•udp_stats_tail is an external table whose data files are stored in $MASTER_DATA_DIRECTORY/gpperfmon/data. This is a transitional table for UDP statistical data that has been cleared from udp_stats_now but has not yet been committed to udp_stats_history. It typically only contains a few minutes worth of data.

•udp_stats_history is a regular table that stores historical UDP statistical metrics. It is pre-partitioned into monthly partitions. Partitions are automatically added in two month increments as needed. Administrators must drop old partitions for the months that are no longer needed.

iterators_*_rollup

The iterators_*_rollup set of views aggregates the metrics stored in the iterators_* tables. A query iterator refers to a node or operation in a query plan. For example, a sequential scan operation may be one type of iterator in a particular query plan. For each iterator in a query plan, the iterators_* tables store the metrics collected from each segment instance. The iterators_*_rollup views summarize the query iterator metrics across all segments in the system.

The tmid, ssid and ccnt columns are the composite key that uniquely identifies a particular query.

There are three iterators rollup views, all having the same columns:

•The iterators_now_rollup view shows iterator data from the iterators_now table aggregated across all segments in the system.

•The iterators_tail_rollup view shows iterator data from the iterators_tail table aggregated across all segments in the system.

•The iterators_history_rollup view shows iterator data from the iterators_history table aggregated across all segments in the system.

See also the iterators_* tables for more information about the query plan iterator types and the metrics collected for each iterator.

dynamic_memory_info

The dynamic_memory_info view shows a sum of the used and available dynamic memory for all segment instances on a segment host. Dynamic memory refers to the maximum amount of memory that a Greenplum Database instance will allow the query processes of a single segment instance to consume before it starts cancelling processes. This limit is set by the gp_vmem_protect_limit server configuration parameter, and is evaluated on a per-segment basis.
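
For example, the following sketch shows the hosts with the least dynamic memory headroom in the most recent sample; the column names (ctime, hostname, dynamic_memory_used_mb, dynamic_memory_available_mb) are assumed from the standard gpperfmon schema and should be verified against your release.

SELECT hostname, dynamic_memory_used_mb, dynamic_memory_available_mb
FROM dynamic_memory_info
WHERE ctime = (SELECT max(ctime) FROM dynamic_memory_info)
ORDER BY dynamic_memory_available_mb
LIMIT 5;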