Troubleshooting Greenplum Database System and DCA issues
Standby master on DCA shows "Not Synchronized": what to do?
Problem: Running gpstate -f shows that the standby master is not synchronized.

    gpadmin@master$ gpstate -f
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:-Standby master details
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:-----------------------
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:- Standby address = smdw
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:- Standby data directory = /data/master/gpseg-1
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:- Standby port = 1589
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:- Standby PID = 12498
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:- Standby status = Standby host passive
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:--------------------------------------------------------------
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:--gp_master_mirroring table
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:--------------------------------------------------------------
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:--Summary state: Not Synchronized
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:--Detail state: Unexpected error
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:--Log time: 2012-08-29 31:44:21-03
    20120829:12:31:05:025122 gpstate:abc0122pl000200:gpadmin-[INFO]:--------------------------------------------------------------

Solution: Run the following command on the master host as the gpadmin user to resynchronize the standby:

    gpadmin@master$ gpinitstandby -n -M fast

The reference information for the utility follows.

gpinitstandby

Adds and/or initializes a standby master host for a Greenplum Database system.

Synopsis

    gpinitstandby { -s standby_hostname | -r | -n } [-M smart | -M fast] [-a] [-q] [-D] [-L] [-l logfile_directory]
    gpinitstandby -? | -v

Description

The gpinitstandby utility adds a backup master host to your Greenplum Database system.
If your system has an existing backup master host configured, use the -r option to remove it before adding the new standby master host.

Before running this utility, make sure that the Greenplum Database software is installed on the backup master host and that you have exchanged SSH keys between hosts. Also make sure that the master port is set to the same port number on the master host and the backup master host. See the Greenplum Database Installation Guide for instructions.

This utility should be run on the currently active primary master host. The utility performs the following steps:

• Shuts down your Greenplum Database system
• Updates the Greenplum Database system catalog to remove the existing backup master host information (if the -r option is supplied)
• Updates the Greenplum Database system catalog to add the new backup master host information (use the -n option to skip this step)
• Edits the pg_hba.conf files of the segment instances to allow access from the newly added standby master
• Sets up the backup master instance on the alternate master host
• Starts the synchronization process
• Restarts your Greenplum Database system

A backup master host serves as a ‘warm standby’ in case the primary master host becomes inoperable. The backup master is kept up to date by a transaction log replication process (gpsyncagent), which runs on the backup master host and keeps the data between the primary and backup master hosts synchronized. If the primary master fails, the log replication process is shut down, and the backup master can be activated in its place by using the gpactivatestandby utility. Upon activation of the backup master, the replicated logs are used to reconstruct the state of the master host at the time of the last successfully committed transaction.
The activated standby master effectively becomes the Greenplum Database master, accepting client connections on the master port and performing normal master operations such as SQL command processing and workload management.

Options

-a (do not prompt)
    Do not prompt the user for confirmation.

-D (debug)
    Sets logging level to debug.

-l logfile_directory
    The directory to write the log file. Defaults to ~/gpAdminLogs.

-L (leave database stopped)
    Leave Greenplum Database in a stopped state after removing the warm standby master.

-M fast (fast shutdown - rollback)
    Use fast shutdown when stopping Greenplum Database at the beginning of the standby initialization process. Any transactions in progress are interrupted and rolled back.

-M smart (smart shutdown - warn)
    Use smart shutdown when stopping Greenplum Database at the beginning of the standby initialization process. If there are active connections, this command fails with a warning. This is the default shutdown mode.

-n (resynchronize)
    Use this option if you already have a standby master configured and just want to resynchronize the data between the primary and backup master host. The Greenplum system catalog tables will not be updated.

-q (no screen output)
    Run in quiet mode. Command output is not displayed on the screen, but is still written to the log file.

-r (remove standby master)
    Removes the currently configured standby master host from your Greenplum Database system.

-s standby_hostname
    The host name of the standby master host.

-v (show utility version)
    Displays the version, status, last updated date, and checksum of this utility.

-? (help)
    Displays the online help.
Examples

Add a backup master host to your Greenplum Database system and start the synchronization process:

    gpinitstandby -s host09

Remove the existing backup master from your Greenplum system configuration:

    gpinitstandby -r

Start an existing backup master host and synchronize the data with the primary master host, without adding a new Greenplum backup master host to the system catalog:

    gpinitstandby -n

Note: Do not specify the -n and -s options in the same command.
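To decide whether a resynchronization is needed, the "Summary state" line can be pulled out of the gpstate -f output shown in the problem description. The following is only a minimal sketch: the function names are hypothetical, and it assumes that a healthy standby reports the value "Synchronized" on that line.

```shell
#!/bin/sh
# Hypothetical helpers; not part of the Greenplum tooling.

# Read `gpstate -f` output on stdin and print the value that follows
# "Summary state:" (for example "Not Synchronized").
standby_summary_state() {
    sed -n -e 's/[[:space:]]*$//' -e 's/.*Summary state:[[:space:]]*//p' | head -n 1
}

# Return 0 when the summary state is exactly "Synchronized" (assumed to be
# the healthy value), non-zero otherwise.
standby_is_synchronized() {
    [ "$(standby_summary_state)" = "Synchronized" ]
}

# Possible use on the master host, as gpadmin:
#   gpstate -f | standby_is_synchronized || echo "standby needs gpinitstandby -n"
```

The helpers read from standard input rather than calling gpstate themselves, so they can also be run against a saved copy of the output.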
Can't create or drop a table in schema
Problem: The CREATE TABLE command fails with the following error:

    ERROR: relation "abc" already exists (seg4 sdw1:40004 pid=13649)

The DROP TABLE command on the same table fails with an error saying that the relation does not exist.

Step 1: Run gpcheckcat as gpadmin:
    $GPHOME/bin/lib/gpcheckcat -p 5432 -A>checkcat_A.out 2>&1 &
Step 2: Review checkcat_A.out and the log file /home/gpadmin/gpAdminLogs/gpcheckcat_20140315.log (20140315 is the date on which the utility was run).
Step 3: Shut down the database.
Step 4: Start Greenplum Database in restricted mode, in which only database superusers are allowed to connect:
    $ gpstart -R
Step 5: Run gpcheckcat as gpadmin:
    $GPHOME/bin/lib/gpcheckcat -p 5432 -B>checkcat_B.out 2>&1 &
Step 6: Run the gpcheckcat repair script.
Step 7: Run gpcheckcat as gpadmin:
    $GPHOME/bin/lib/gpcheckcat -p 5432 -C>checkcat_C.out 2>&1 &
Step 8: Fix the catalog issue.
Step 9: Run gpcheckcat as gpadmin:
    $GPHOME/bin/lib/gpcheckcat -p 5432 -D>checkcat_D.out 2>&1 &

Note: When a CREATE TABLE statement runs inside a stored procedure or function in Greenplum and the function fails, the rollback is applied on the master only, while the table definition remains on the segments. This is the cause of the problem described above.
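Step 2 refers to a log file whose name carries the date of the gpcheckcat run. As a small convenience, the path can be built from a date instead of typed by hand. This is only a sketch: the function name is hypothetical, and the directory is assumed to be /home/gpadmin/gpAdminLogs as in the steps above.

```shell
#!/bin/sh
# Assumption: gpcheckcat logs are written to this directory, as in Step 2.
GPCHECKCAT_LOG_DIR=/home/gpadmin/gpAdminLogs

# Hypothetical helper: print the gpcheckcat log path for a run date given
# as YYYYMMDD. With no argument, today's date is used.
gpcheckcat_log_path() {
    run_date="${1:-$(date +%Y%m%d)}"
    printf '%s/gpcheckcat_%s.log\n' "$GPCHECKCAT_LOG_DIR" "$run_date"
}

# Possible use:
#   less "$(gpcheckcat_log_path)"            # log of today's run
#   less "$(gpcheckcat_log_path 20140315)"   # log of the run on 15 March 2014
```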
How to enable and disable healthmon?
To disable healthmond: dca_healthmon_ctl -d
To enable healthmond: dca_healthmon_ctl -e
How to diagnose DCA controller x status: error: degraded
To diagnose controller issues, run the following commands and review the output:

    omreport system esmlog
    omreport system alertlog
    omreport storage battery
    omreport storage pdisk controller=0
    omreport storage vdisk
    cat /etc/gpdb-appliance-version
    omreport storage controller

For example:

    [gpadmin@mdw ~]$ omreport system esmlog
    Embedded System Management (ESM) Log
    Health : Ok

    Severity : Ok
    Date and Time : Wed Jul 13 17:50:45 2011
    Description : Log cleared.

    Severity : Ok
    Date and Time : Wed Jul 13 17:51:08 2011
    Description : This is an OEM record.

    Severity : Ok
    Date and Time : Wed Jul 13 17:51:08 2011
    Description : An OS graceful shut-down occured.

    [gpadmin@mdw ~]$ omreport storage battery
    List of Batteries in the System
    Controller PERC H700 Integrated (Slot Embedded)

    ID : 0
    Status : Ok
    Name : Battery 0
    State : Ready
    Recharge Count : Not Applicable
    Max Recharge Count : Not Applicable
    Predicted Capacity Status : Ready
    Learn State : Idle
    Next Learn Time : 82 days 20 hours
    Maximum Learn Delay : 7 days 0 hours
    Learn Mode : Auto

    [gpadmin@mdw ~]$ omreport storage pdisk controller=0
    List of Physical Disks on Controller PERC H700 Integrated (Embedded)
    Controller PERC H700 Integrated (Embedded)

    ID : 0:0:0
    Status : Ok
    Name : Physical Disk 0:0:0
    State : Online
    Power Status : Spun Up
    Bus Protocol : SAS
    Media : HDD
    Revision : FM08
    Failure Predicted : No
    Certified : Yes
    Encryption Capable : No
    Encrypted : Not Applicable
    Progress : Not Applicable
    Mirror Set ID : Not Applicable
    Capacity : 558.38 GB (599550590976 bytes)
    Used RAID Disk Space : 558.38 GB (599550590976 bytes)
    Available RAID Disk Space : 0.00 GB (0 bytes)
    Hot Spare : No
    Vendor ID : DELL(tm)
    Product ID : ST9600204SS
    Serial No. : 3WN0GSYQ
    Part Number : SG07T0DW125310BP02H1A00
    Negotiated Speed : 6.00 Gbps
    Capable Speed : 6.00 Gbps
    Manufacture Day : 07
    Manufacture Week : 48
    Manufacture Year : 2010
    SAS Address : 5000C50028C725B1

    ID : 0:0:1
    Status : Ok
    Name : Physical Disk 0:0:1
    State : Online
    Power Status : Spun Up
    Bus Protocol : SAS
    Media : HDD
    Revision : FM08
    Failure Predicted : No
    Certified : Yes
    Encryption Capable : No
    Encrypted : Not Applicable
    Progress : Not Applicable
    Mirror Set ID : Not Applicable
    Capacity : 558.38 GB (599550590976 bytes)
    Used RAID Disk Space : 558.38 GB (599550590976 bytes)
    Available RAID Disk Space : 0.00 GB (0 bytes)
    Hot Spare : No
    Vendor ID : DELL(tm)
    Product ID : ST9600204SS
    Serial No. : 3WN0GTQK
    Part Number : SG07T0DW125310BP018SA00
    Negotiated Speed : 6.00 Gbps
    Capable Speed : 6.00 Gbps
    Manufacture Day : 07
    Manufacture Week : 48
    Manufacture Year : 2010
    SAS Address : 5000C50028C70169

    ID : 0:0:2
    Status : Ok
    Name : Physical Disk 0:0:2
    State : Online
    Power Status : Spun Up
    Bus Protocol : SAS
    Media : HDD
    Revision : FM08
    Failure Predicted : No
    Certified : Yes
    Encryption Capable : No
    Encrypted : Not Applicable
    Progress : Not Applicable
    Mirror Set ID : Not Applicable
    Capacity : 558.38 GB (599550590976 bytes)
    Used RAID Disk Space : 558.38 GB (599550590976 bytes)
    Available RAID Disk Space : 0.00 GB (0 bytes)
    Hot Spare : No
    Vendor ID : DELL(tm)
    Product ID : ST9600204SS
    Serial No. : 3WN0GTP4
    Part Number : SG07T0DW125310BP0175A00
    Negotiated Speed : 6.00 Gbps
    Capable Speed : 6.00 Gbps
    Manufacture Day : 07
    Manufacture Week : 48
    Manufacture Year : 2010
    SAS Address : 5000C50028C70069

    ID : 0:0:3
    Status : Ok
    Name : Physical Disk 0:0:3
    State : Online
    Power Status : Spun Up
    Bus Protocol : SAS
    Media : HDD
    Revision : FM08
    Failure Predicted : No
    Certified : Yes
    Encryption Capable : No
    Encrypted : Not Applicable
    Progress : Not Applicable
    Mirror Set ID : Not Applicable
    Capacity : 558.38 GB (599550590976 bytes)
    Used RAID Disk Space : 558.38 GB (599550590976 bytes)
    Available RAID Disk Space : 0.00 GB (0 bytes)
    Hot Spare : No
    Vendor ID : DELL(tm)
    Product ID : ST9600204SS
    Serial No. : 3WN0GSTC
    Part Number : SG07T0DW125310BP02D0A00
    Negotiated Speed : 6.00 Gbps
    Capable Speed : 6.00 Gbps
    Manufacture Day : 07
    Manufacture Week : 48
    Manufacture Year : 2010
    SAS Address : 5000C50028C72389

    ID : 1:0:4
    Status : Ok
    Name : Physical Disk 1:0:4
    State : Online
    Power Status : Spun Up
    Bus Protocol : SAS
    Media : HDD
    Revision : FM08
    Failure Predicted : No
    Certified : Yes
    Encryption Capable : No
    Encrypted : Not Applicable
    Progress : Not Applicable
    Mirror Set ID : Not Applicable
    Capacity : 558.38 GB (599550590976 bytes)
    Used RAID Disk Space : 558.38 GB (599550590976 bytes)
    Available RAID Disk Space : 0.00 GB (0 bytes)
    Hot Spare : No
    Vendor ID : DELL(tm)
    Product ID : ST9600204SS
    Serial No. : 3WN0G0X5
    Part Number : SG07T0DW125310BP01BLA00
    Negotiated Speed : 6.00 Gbps
    Capable Speed : 6.00 Gbps
    Manufacture Day : 07
    Manufacture Week : 48
    Manufacture Year : 2010
    SAS Address : 5000C50028C7068D

    ID : 1:0:5
    Status : Ok
    Name : Physical Disk 1:0:5
    State : Ready
    Power Status : Spun Up
    Bus Protocol : SAS
    Media : HDD
    Revision : FM08
    Failure Predicted : No
    Certified : Yes
    Encryption Capable : No
    Encrypted : Not Applicable
    Progress : Not Applicable
    Mirror Set ID : Not Applicable
    Capacity : 558.38 GB (599550590976 bytes)
    Used RAID Disk Space : 558.38 GB (599550590976 bytes)
    Available RAID Disk Space : 0.00 GB (0 bytes)
    Hot Spare : Global
    Vendor ID : DELL(tm)
    Product ID : ST9600204SS
    Serial No. : 3WN0GV1P
    Part Number : SG07T0DW125310BP01LYA00
    Negotiated Speed : 6.00 Gbps
    Capable Speed : 6.00 Gbps
    Manufacture Day : 07
    Manufacture Week : 48
    Manufacture Year : 2010
    SAS Address : 5000C50028C70D59

    [gpadmin@mdw ~]$ omreport storage vdisk
    List of Virtual Disks in the System
    Controller PERC H700 Integrated (Embedded)

    ID : 0
    Status : Ok
    Name : boot
    State : Ready
    Encrypted : Not Applicable
    Layout : RAID-5
    Size : 48.00 GB (51539607552 bytes)
    Device Name : /dev/sda
    Bus Protocol : SAS
    Media : HDD
    Read Policy : Adaptive Read Ahead
    Write Policy : Write Back
    Cache Policy : Not Applicable
    Stripe Element Size : 128 KB
    Disk Cache Policy : Disabled

    ID : 1
    Status : Ok
    Name : swap
    State : Ready
    Encrypted : Not Applicable
    Layout : RAID-5
    Size : 48.00 GB (51539607552 bytes)
    Device Name : /dev/sdb
    Bus Protocol : SAS
    Media : HDD
    Read Policy : Adaptive Read Ahead
    Write Policy : Write Back
    Cache Policy : Not Applicable
    Stripe Element Size : 128 KB
    Disk Cache Policy : Disabled

    ID : 2
    Status : Ok
    Name : data
    State : Ready
    Encrypted : Not Applicable
    Layout : RAID-5
    Size : 2,137.50 GB (2295123148800 bytes)
    Device Name : /dev/sdc
    Bus Protocol : SAS
    Media : HDD
    Read Policy : Adaptive Read Ahead
    Write Policy : Write Back
    Cache Policy : Not Applicable
    Stripe Element Size : 128 KB
    Disk Cache Policy : Disabled

    [gpadmin@mdw ~]$ cat /etc/gpdb-appliance-version
    1.2.0.0s

    [gpadmin@mdw ~]$ omreport storage controller
    Controller PERC H700 Integrated (Embedded)
    Controllers

    ID : 0
    Status : Ok
    Name : PERC H700 Integrated
    Slot ID : Embedded
    State : Ready
    Firmware Version : 12.10.2-0004
    Minimum Required Firmware Version : Not Applicable
    Driver Version : 00.00.05.38-rh1
    Minimum Required Driver Version : Not Applicable
    Storport Driver Version : Not Applicable
    Minimum Required Storport Driver Version : Not Applicable
    Number of Connectors : 2
    Rebuild Rate : 30%
    BGI Rate : 30%
    Check Consistency Rate : 30%
    Reconstruct Rate : 30%
    Alarm State : Not Applicable
    Cluster Mode : Not Applicable
    SCSI Initiator ID : Not Applicable
    Cache Memory Size : 512 MB
    Patrol Read Mode : Auto
    Patrol Read State : Stopped
    Patrol Read Rate : 30%
    Patrol Read Iterations : 5
    Abort Check Consistency on Error : Disabled
    Allow Revertible Hot Spare and Replace Member : Enabled
    Load Balance : Not Applicable
    Auto Replace Member on Predictive Failure : Disabled
    Redundant Path view : Not Applicable
    CacheCade Capable : Not Applicable
    Persistent Hot Spare : Disabled
    Encryption Capable : Yes
    Encryption Key Present : No
    Encryption Mode : None
    Spin Down Unconfigured Drives : Disabled
    Spin Down Hot Spares : Disabled
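With this much output, the component that is actually degraded is easy to overlook. One possible way to narrow it down is to print only the records whose Status is not "Ok". The sketch below assumes the "ID : value" and "Status : value" line layout seen in the omreport storage output above; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read omreport storage output on stdin and print
# "ID<TAB>Status" for every record whose Status is anything other than "Ok".
# Assumes each record lists its ID line before its Status line.
omreport_not_ok() {
    awk '
        /^ID[ \t]*:/ {
            id = $0
            sub(/^ID[ \t]*:[ \t]*/, "", id)   # keep colons inside IDs such as 0:0:1
            sub(/[ \t\r]+$/, "", id)
        }
        /^Status[ \t]*:/ {
            st = $0
            sub(/^Status[ \t]*:[ \t]*/, "", st)
            sub(/[ \t\r]+$/, "", st)
            if (st != "Ok") print id "\t" st
        }
    '
}

# Possible use:
#   omreport storage pdisk controller=0 | omreport_not_ok
#   omreport storage vdisk | omreport_not_ok
#   omreport storage controller | omreport_not_ok
```

No output means that every record in the report has Status Ok.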
How to diagnose DCA memory device x status: error: critical
Run the following command:

    omreport chassis memory

You will see output like this:

    [gpadmin@mdw ~]$ omreport chassis memory
    Memory Information
    Health : Ok

    Memory Operating Mode
    Fail Over State : Inactive
    Memory Operating Mode Configuration : Optimizer

    Attributes of Memory Array(s)

    Attributes of Memory Array(s)
    Location : System Board or Motherboard
    Use : System Memory
    Installed Capacity : 49152 MB
    Maximum Capacity : 196608 MB
    Slots Available : 12
    Slots Used : 6
    Error Correction : Multibit ECC

    Total of Memory Array(s)
    Total Installed Capacity : 49152 MB
    Total Installed Capacity Available to the OS : 48161 MB
    Total Maximum Capacity : 196608 MB

    Details of Memory Array 1
    Index : 0
    Status : Ok
    Connector Name : DIMM_A1
    Type : DDR3 - Synchronous Registered (Buffered)
    Size : 8192 MB

    Index : 1
    Status : Ok
    Connector Name : DIMM_A2
    Type : DDR3 - Synchronous Registered (Buffered)
    Size : 8192 MB

    Index : 2
    Status : Ok
    Connector Name : DIMM_A3
    Type : DDR3 - Synchronous Registered (Buffered)
    Size : 8192 MB

    Index :
    Status : Unknown
    Connector Name : DIMM_A4
    Type : [Not Occupied]
    Size :

    Index :
    Status : Unknown
    Connector Name : DIMM_A5
    Type : [Not Occupied]
    Size :

    Index :
    Status : Unknown
    Connector Name : DIMM_A6
    Type : [Not Occupied]
    Size :

    Index : 3
    Status : Ok
    Connector Name : DIMM_B1
    Type : DDR3 - Synchronous Registered (Buffered)
    Size : 8192 MB

    Index : 4
    Status : Ok
    Connector Name : DIMM_B2
    Type : DDR3 - Synchronous Registered (Buffered)
    Size : 8192 MB

    Index : 5
    Status : Ok
    Connector Name : DIMM_B3
    Type : DDR3 - Synchronous Registered (Buffered)
    Size : 8192 MB

    Index :
    Status : Unknown
    Connector Name : DIMM_B4
    Type : [Not Occupied]
    Size :

    Index :
    Status : Unknown
    Connector Name : DIMM_B5
    Type : [Not Occupied]
    Size :

    Index :
    Status : Unknown
    Connector Name : DIMM_B6
    Type : [Not Occupied]
    Size :
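In the sample output, empty slots report Status "Unknown", so a plain search for anything other than "Ok" would flag them as well. The following sketch lists only populated DIMMs whose Status is not "Ok". It assumes the field order shown above (Status, then Connector Name, then Type) and that empty slots carry the Type "[Not Occupied]"; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `omreport chassis memory` output on stdin and
# print "Connector<TAB>Status" for every occupied slot whose Status is not "Ok".
dimms_not_ok() {
    awk '
        # Return the text after the first colon, with surrounding blanks removed.
        function value(line) {
            sub(/^[^:]*:[ \t]*/, "", line)
            sub(/[ \t\r]+$/, "", line)
            return line
        }
        /^Status[ \t]*:/         { st = value($0) }
        /^Connector Name[ \t]*:/ { name = value($0) }
        /^Type[ \t]*:/ {
            # Skip empty slots; report populated slots that are not Ok.
            if (value($0) != "[Not Occupied]" && st != "Ok") print name "\t" st
        }
    '
}

# Possible use:
#   omreport chassis memory | dimms_not_ok
```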
How to diagnose Greenplum appliance issue using the 'omreport' tool
Disks:

    omreport storage controller | grep -e "^ID" -e "^Status"
    omreport storage pdisk controller=0 | grep -e "^ID" -e "^State"
    omreport storage vdisk controller=0 | grep -e "^ID" -e "^State"

The Status should be Ok for the controller. The State should be Online for every physical disk in an array (a hot spare shows Ready) and Ready for every virtual disk.

Memory:

    omreport chassis memory | grep ^Health
        Should show Ok.
    omreport chassis memory | grep -C 1 "Total Installed" | head -3
        Should indicate that 49152 MB are installed.

System events:

    omreport system alertlog
        Lists the alerts issued. Review the list for anything unexpected.
    omreport system esmlog
        Lists the hardware events logged. Review the list for anything unexpected.
    omreport system postlog
        Lists the power-on self-test log. Review the list for anything unexpected.

Chassis commands:

    omreport chassis biossetup (must be root to run)
        Reports all the non-default BIOS settings, to validate that Arrow has set the BIOS to spec.
    omreport chassis firmware
        Reports the iDRAC firmware version.
    omreport chassis bios
        Reports the server BIOS version.
    omreport chassis memory
        Lists the installed memory by board slot.
    omreport chassis processors
        Lists the installed processors by processor slot.
    omreport chassis nics
        Lists the NICs, including the slot each NIC occupies (embedded, PCIE3, etc.).
    omreport chassis slots | more
        Lists the expansion slot information. Look for cards that should be in the list but are not.
    omreport chassis pwrsupplies | grep -e "^Power Supply Redundancy" -e ^Index -e ^Status -e "^Online Status"
        Shows the overall status of the power supply redundancy and the status of each individual power supply.
    omreport chassis temps
        Lists the temperatures of the various components.
    omreport chassis batteries
        Lists the status of the CMOS battery.
    omreport chassis volts | grep -e ^Index -e ^Status -e ^Probe
        Lists the voltages of the individual parts and whether each is Ok.
    omconfig chassis leds led=identify flash=on timeout=10
        Flashes the local machine's identity light for 10 seconds. If no timeout is given, sets it to flash permanently.
    omconfig chassis leds led=hdfault action=clear
        Clears an HD fault light.
    omreport chassis fans | grep -e Index -e Status
        Lists the status of the fans.
    omreport chassis processors | grep -e Index -e Status
        Lists the status of the processors.
    ipmitool -H [hostname or IP address] -U root -P calvin sel list
        Shows the server event list of the host that owns the iDRAC named in the -H parameter.
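The memory check above ("should indicate that 49152 MB are installed") can also be made without reading the grep output by eye. This is only a sketch: the function names are hypothetical, and the expected value of 49152 MB is taken from the text above and may differ on other hardware.

```shell
#!/bin/sh
# Expected installed memory in MB, taken from the check described above.
EXPECTED_MB=49152

# Hypothetical helper: read `omreport chassis memory` output on stdin and
# print the number from the "Total Installed Capacity" line. The line
# "Total Installed Capacity Available to the OS" is deliberately not matched.
installed_memory_mb() {
    awk '
        /^Total Installed Capacity[ \t]*:/ {
            v = $0
            sub(/^[^:]*:[ \t]*/, "", v)
            sub(/[ \t]*MB.*$/, "", v)
            print v
            exit
        }
    '
}

# Return 0 when the installed capacity equals EXPECTED_MB, non-zero otherwise.
memory_matches_expected() {
    [ "$(installed_memory_mb)" = "$EXPECTED_MB" ]
}

# Possible use:
#   omreport chassis memory | memory_matches_expected || echo "unexpected memory size"
```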