Exploring the High Availability Features of Greenplum Database

Greenplum Database provides several optional features to ensure maximum uptime and high availability of your system. Let us look at each of these features in detail:

1. Segment Mirroring

2. Master Mirroring

3. Fault Detection and Recovery

1. Segment Mirroring: Mirror segments allow database queries to fail over to a backup segment if the primary segment becomes unavailable. To configure mirroring, your Greenplum Database system must have enough nodes for a primary segment and its mirror to reside on different hosts. 

Only primary segments are active during database operations. 

The system uses a file block replication process to copy changes from a primary segment to its mirror; as long as no failure has occurred, this replication process is the only process that runs on the mirror host.

If a segment fails, the file replication process stops and the mirror segment automatically takes over as the active segment instance. The active mirror’s system state is Change Tracking: it serves all database operations while it logs the changes made by transactions, so the failed primary can be resynchronized later.

When the failed segment is repaired and ready to be brought back online, administrators initiate a recovery process and the system goes into Resynchronization state. The recovery process copies the changes made to the mirror onto the repaired segment. The system state is Synchronized when the recovery process completes.
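
A quick way to observe these states is to query the gp_segment_configuration catalog table. The following is a minimal sketch for Greenplum 4.x/5.x, where the mode column reports s (Synchronized), c (Change Tracking), or r (Resynchronization) and status reports u (up) or d (down); the database name postgres is only an example:

    $ psql -d postgres -c "SELECT content, role, preferred_role, mode, status, hostname
                           FROM gp_segment_configuration ORDER BY content, role;"
    # role p = primary, m = mirror. After a primary failure, its mirror shows
    # role = p with mode = c (Change Tracking) until recovery completes.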

2. Master Mirroring

You can deploy a backup or mirror of the master instance on a separate host machine or on the same host machine. A backup master or standby master serves as a warm standby if the primary master becomes nonoperational. You create a standby master from the primary master while the primary is online.
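
The standby master is created with the gpinitstandby utility while the primary master is running. A minimal sketch, assuming a spare host named smdw (the hostname is illustrative):

    $ gpinitstandby -s smdw    # -s names the host that will run the standby master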

The primary master continues to provide service to users while a transactional snapshot of the primary master instance is taken. While the snapshot is taken and deployed on the standby master, changes to the primary master are also recorded. After the snapshot is deployed on the standby master, these recorded updates are applied to synchronize the standby master with the primary master.

Once the primary master and standby master are synchronized, the standby master is kept up to date by the walsender and walreceiver replication processes. The walreceiver is a standby master process; the walsender is a primary master process. The two processes use WAL-based streaming replication to keep the primary and standby masters synchronized.
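
You can verify that a standby master is configured and check its synchronization state with the gpstate utility; a quick sketch:

    $ gpstate -f    # display details of the standby master, including its sync state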

Since the master does not house user data, only system catalog tables are synchronized between the primary and standby masters. When these tables are updated, changes are automatically copied to the standby master to keep it current with the primary.

If the primary master fails, the replication process stops, and an administrator can activate the standby master. Upon activation of the standby master, the replicated logs reconstruct the state of the primary master at the time of the last successfully committed transaction. The activated standby then functions as the Greenplum Database master, accepting connections on the port specified when the standby master was initialized.
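
Activation is performed on the standby master host with the gpactivatestandby utility. A minimal sketch, assuming the standby’s data directory is /data/master/gpseg-1 (the path is illustrative):

    $ gpactivatestandby -d /data/master/gpseg-1    # promote this standby to be the active master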

3. Fault Detection and Recovery

The Greenplum Database server (postgres) subprocess named ftsprobe handles fault detection. ftsprobe monitors the Greenplum array; it connects to and scans all segments and database processes at intervals that you can configure.
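
The probe behavior is governed by server configuration parameters such as gp_fts_probe_interval and gp_fts_probe_timeout, which can be changed with the gpconfig utility. A sketch, with an illustrative value of 30 seconds:

    $ gpconfig -c gp_fts_probe_interval -v 30    # probe the segments every 30 seconds
    $ gpstop -u                                  # reload the configuration without restarting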

If ftsprobe cannot connect to a segment, it marks the segment as “down” in the Greenplum Database system catalog. The segment remains nonoperational until an administrator initiates the recovery process.
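
To see which segments have been marked down, you can run gpstate -e or query the catalog directly; a brief sketch (the database name postgres is only an example):

    $ gpstate -e    # report segments with error conditions
    $ psql -d postgres -c "SELECT * FROM gp_segment_configuration WHERE status = 'd';"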

With mirroring enabled, Greenplum Database automatically fails over to a mirror copy if a primary copy becomes unavailable. The system remains operational if a segment instance or host fails, provided all data is available on the remaining active segments.

To recover failed segments, a Greenplum administrator runs the gprecoverseg recovery utility. This utility locates the failed segments, verifies they are valid, and compares the transactional state with the currently active segment to determine changes made while the segment was offline. gprecoverseg synchronizes the changed database files with the active segment and brings the segment back online. Administrators perform the recovery while Greenplum Database is up and running.
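
A typical recovery sequence, sketched with the documented gprecoverseg and gpstate options (run from the master host):

    $ gprecoverseg       # incremental recovery: copy only the changes made while the segment was down
    $ gpstate -m         # watch resynchronization progress on the mirrors
    $ gprecoverseg -r    # once synchronized, optionally rebalance segments to their preferred roles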

With mirroring disabled, the system automatically shuts down if a segment instance fails. Administrators manually recover all failed segments before operations resume.