DOUBLE FAULTS - HOW TO AVOID THEM?

Post date: Dec 04, 2012 7:07:57 PM

Definition: A double fault occurs when both a primary segment and its corresponding mirror segment fail in a Greenplum Database. While a double fault persists, the Greenplum Database is not available for query processing.

(Reference: Greenplum Database Administration Guide, Chapter 15: Enabling High Availability Features, available on Powerlink.)

Avoidance: Avoid double faults by monitoring single faults in the Greenplum Database as follows:

Use the Greenplum utility gpstate -e from the command line to check whether any segments are in change tracking mode or are resynchronizing after a fault.

Run

select * from gp_segment_configuration where status='d';

from the psql prompt to check for failed segments.

If either of the two checks above reports even one failed segment, run gprecoverseg immediately. (Reference: Greenplum Database Administration Guide, Appendix B: Management Utility Reference, available on Powerlink.)
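
As an illustration, the catalog check above can be automated with a short Python sketch. It is not part of Greenplum: the function names find_failed_segments and contents_at_risk are hypothetical, and it assumes the rows of gp_segment_configuration have already been fetched (for example through a database driver) as dictionaries carrying the dbid, content, and status columns. The sample rows are made up for the example.

```python
def find_failed_segments(rows):
    """Return the rows marked down (status 'd'), ordered by content ID, then dbid."""
    failed = [row for row in rows if row["status"] == "d"]
    return sorted(failed, key=lambda row: (row["content"], row["dbid"]))


def contents_at_risk(rows):
    """Return the content IDs that have at least one failed segment.

    Each of these primary/mirror pairs is running on a single surviving
    copy, so one more failure in the pair would be a double fault.
    Assumption: content -1 denotes the master and standby master, which
    are not a segment pair and are skipped here.
    """
    return sorted({row["content"] for row in find_failed_segments(rows)
                   if row["content"] >= 0})


# Made-up sample rows; a real check would fetch them from the catalog.
sample_rows = [
    {"dbid": 1, "content": -1, "status": "u"},
    {"dbid": 2, "content": 0, "status": "u"},
    {"dbid": 3, "content": 1, "status": "u"},
    {"dbid": 4, "content": 0, "status": "d"},
    {"dbid": 5, "content": 1, "status": "u"},
]

at_risk = contents_at_risk(sample_rows)
if at_risk:
    print("Failed segments in content IDs", at_risk, "- run gprecoverseg")
```

An empty result means the catalog shows no single fault. A non-empty result is the cue to run gprecoverseg before a second failure hits the same pair.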

Monitor: You can monitor for failed segments in the following ways:

Greenplum Database Performance Monitor (gpperfmon): Install and configure the Greenplum Performance Monitor console package to use the UI Dashboard. The dashboard clearly indicates segment failures in the Database Health section. (Reference: Greenplum Performance Monitor Administration Guide, available on Powerlink.)

Enabling System Alerts and Notifications: You can configure the Greenplum Database to trigger SNMP (Simple Network Management Protocol) alerts using Greenplum's gpsnmpd daemon. You can configure alerts for segment failures and for log messages at severity levels such as FATAL and PANIC. (Reference: Greenplum Database Administration Guide, Chapter 18: Monitoring a Greenplum System, available on Powerlink.)

Greenplum Utility gpstate: gpstate has various options that can help you monitor failed segments.

For example, run gpstate with the following options to learn more about your system's health:

-e shows segments with mirror state issues.

-f shows standby master details. If the standby master is out of sync, run gpinitstandby.

-s shows detailed status information for the Greenplum Database system.

Recovery from Double Faults: If a double fault occurs, restart the Greenplum Database and then run gprecoverseg. To confirm that the segments have recovered, run gpstate with the appropriate options (-m for mirrors, -s for detailed status) or run select * from gp_segment_configuration from the psql prompt. (Reference: Greenplum Database Administration Guide, Chapter 15: Enabling High Availability Features, available on Powerlink.)
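
The confirmation step can also be sketched in Python. This is a hedged illustration, not a Greenplum tool: the function name unrecovered_segments is hypothetical, and the rows are assumed to be dictionaries already fetched from gp_segment_configuration. It further assumes that a fully recovered segment shows status 'u' (up) and mode 's' (synchronized), with any other mode value meaning that change tracking or resynchronization is still in progress. Check those values against the documentation for your release.

```python
def unrecovered_segments(rows):
    """Return the dbids of segments that are not yet up and synchronized.

    Assumption: status 'u' means up and mode 's' means synchronized; any
    other combination is treated as still failed or still recovering.
    """
    return sorted(row["dbid"] for row in rows
                  if row["status"] != "u" or row["mode"] != "s")


# Made-up rows for illustration only.
after_recovery = [
    {"dbid": 2, "content": 0, "status": "u", "mode": "s"},
    {"dbid": 4, "content": 0, "status": "u", "mode": "r"},  # assumed: still resynchronizing
]

pending = unrecovered_segments(after_recovery)
print("Recovery complete" if not pending else f"Still recovering: dbids {pending}")
```

Recovery can be treated as finished only when the list comes back empty. Until then, repeat the gpstate or catalog check.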