Greenplum database env - Best practice
INVESTIGATING AND PREVENTING PERFORMANCE ISSUES
Every database statement executes on several levels: 1. Hardware level (hardware performance) 2. OS level (processes, resource utilization, configuration) 3. Database level (statement, objects involved, execution plans, etc.) When you note a performance problem in statement execution, verify performance at each level. Verifying Performance at the Hardware Level: Disk I/O and network performance must be optimal to achieve the best performance. Verify that the hardware components perform according to specifications. This is critical for disk input/output activity and for network activity. Ensure that disk I/O meets the hardware specifications for each segment node. Because Greenplum Database is distributed, network performance is critical to the overall database performance. To help you evaluate your hardware performance, Greenplum provides the gpcheckperf tool. gpcheckperf runs tests on all segments to determine the performance of the disk and network subsystems. gpcheckperf must be run on a quiet system. This ensures that the results are consistent across runs and with the hardware specifications, and avoids other processes 'stealing' resources, causing a false impression of sub-optimal performance. Verifying Performance at the OS Level: The operating system manages processes and platform resources. Check the following OS-level items: 1. CPU load: Are Postgres processes CPU-starved by other processes? 2. Memory: Are database processes swapping? 3. Network: Are NICs performing properly and without errors or dropping packets? OS tools such as vmstat, iostat, top, and netstat -i provide the best picture of OS-level performance. Another good source is the OS log files, including /var/log/ messages and dmesg (the kernel ring buffer). Verifying Performance at the Database Level Database-level performance problems are generally related to the following: 1. The query execution plan prepared by the Greenplum query optimizer 2. Regular maintenance of the database catalog and user tables (Vacuum) 3. Minimizing data skew Verify the Query Execution Plan : Performance at the database level is mostly associated with the execution plan that the Greenplum optimizer prepares and executes for a statement. The first step in analyzing query performance problems is to run Explain Analyze on the slow statement and analyze the output. Explain Analyze provides information such as the plan tree, estimated row counts, actual row counts, and time spent in each plan node. If the statement does not finish, you can collect the Explain output, but the output contains only the plan tree and estimated row counts, no execution statistics. If the estimated row counts and actual row counts in Explain Analyze do not match for table scan nodes, this usually means that the tables are not analyzed. Accurate table statistics are necessary for proper plan generation and proper optimization. If after Explain Analyze the table statistics are not correct, consider increasing the target statistics level for the specific column. See Alter Table in the Greenplum Database Administration Guide for details. Perform Regular Maintenance on the Catalog and User Tables : Make sure that regular maintenance (Vacuum) is performed on the catalog and user tables. Not performing regular Vacuum operations can result in table „bloat? and poor performance. Bloat affects tables that had sizeable numbers of rows deleted or updated, and severely impacts the catalog when many table manipulation operations are performed, such as create, drop, alter, and so on. This can significantly affect performance on user queries; they require data from the catalog tables for execution. Check Data Skew : Data skew means that data is not uniformly distributed on all segments. The result is that one segment spends more time working than others, and the entire query finishes slowly; queries finish only when all segments finish their processing. See the Greenplum Database Administration Guide for information about checking for data skew. Configure and maintain your Greenplum Database cluster properly and it will run fast and keep your users happy! |
Avoiding “Out of Memory” Issues
DBA's occasionally experience “out of memory” errors that can cause failed queries and degrade system performance. Fortunately, the Greenplum Database provides facilities to avoid this. We will discuss two of those facilities: the gp_vmem_protect_limit parameter, and Greenplum resource queues. The parameters and techniques mentioned here are explained in detail in the Greenplum Database Administrator Guide. The gp_vmem_protect_limit parameter: The “gp_vmem_protect_limit” parameter sets the amount of memory that all processes of an active segment instance can consume. Queries that cause the limit to be exceeded will be cancelled. Note that this is a local parameter and must be set for each segment in the system. The system must be restarted for parameter changes to take effect. How to set the gp_vmem_protect_limit As a general rule-of-thumb, gp_vmem_protect_limit should be set to: ( X * physical_memory_in_MB ) /#_of_primary_segments X should be a value between 1.0 and 1.5. A value of X=1.0 offers the best overall system performance; a value of X=1.5 may impact system performance because of swap activity but will result in fewer canceled queries. For example, to set gp_vmem_protect_limit conservatively (X=1.0) on a segment host with 16GB (16384 MB) of physical memory with 4 primary segment instances, the calculation would be: (1 * 16384) / 4 = 4096. The MEMORY_LIMIT parameter for Greenplum Resource Queues:Greenplum resource queues provide a way to manage and prioritize workloads. Resource queues can be created with a MEMORY_LIMIT setting to restrict the total amount of memory that queries can consume in any segment instance. Queries that cause a queue to exceed the MEMORY_LIMIT must wait until queue resources are free before they can execute. By assigning each user to a queue and limiting the amount of memory queues can consume, administrators can ensure proper resource allocation across the system. Note that roles with the SUPERUSER attribute are exempt from queue limits. How to set MEMORY_LIMIT to avoid Out of Memory errors:As a general rule-of-thumb, the sum of all the MEMORY_LIMITs across all the queues should be no more than 90% of the gp_vmem_protect_limit. Common Out of Memory Errors The two most common errors are described below. They look similar but have different reasons and solutions. Error code 53200 Example: "ERROR","53200","Out of memory. Failed on request of size 156 bytes. (context 'CacheMemoryContext') (aset.c:840)" Description: The system canceled a query because a segment server’s OS did not have enough memory to satisfy a postmaster process allocation request. How to Avoid 1. Set gp_vmem_protect_limit according to the formula above. 2. Adding memory can help greatly if lowering gp_vmem_protect_limit results in too many canceled queries.(Gp_vmem_protect_limit can be raised after adding memory.) 3. Adding swap space may help, although increased swap activity will impact system performance. Error code 53400 Example "ERROR","53400","Out of memory (seg13 slice13 sdw1-1:40001 pid=10183)","VM Protect failed to allocate 8388608 bytes, 6 MB available" Description The system canceled a query because a postmaster process tried to request more memory than the gp_vmem_protect_limit parameter allows. How to Avoid 1. Make sure the sum of all MEMORY_LIMITs across all active queues is <= 90% of gp_vmem_protect_limit. 2. Increase gp_vmem_protect_limit, if possible, using the formula described above. 3. Ensure the system is not unbalanced (i.e., some segments down). Use gpstate -e to verify. |
Avoiding Data Skew - Best Practice
Set up even data distribution when you create a table by using a DISTRIBUTED BY clause in the CREATE TABLE statement. Setting the DISTRIBUTED BY based on the unique or primary key, or a key that is relatively unique, is an easy way to avoid skew. You can use the distributed by clause in an ALTER TABLE statement to change distribution in an existing table. 1. You can specify more than one distribution column with a clause like DISTRIBUTED BY (prod_id, name, company_id). If the combined columns are unique or nearly so, even distribution is guaranteed. Otherwise use your knowledge of the data to pick a distribution key that will not have skew. 2. With no DISTRIBUTED clause, the distribution key defaults to the PRIMARY KEY or UNIQUE constraint on the table. If neither constraint is specified the first column of the table is the default distribution key. If that column is not suitable, data skews will occur. In that case you need to explicitly set the DISTRIBUTED BY to a column or set of columns that will distribute evenly. 3. If you cannot figure out a combination of columns that will distribute without skew, using the DISTRIBUTED RANDOMLY clause will distribute the data randomly and evenly. 4. Use the views gp_skew_coefficients and gp_skew_idle_fractions in gp_toolkit to find out how data is distributed on the segments. Skewed Distribution Can 1. Degrade overall performance 2. Overflow a disk 3. Significantly slow down query processing Choosing a Distribution Key 1. Optimizes space usage 2. Improves I/O performance 3. Bases the distribution on your knowledge of the data and the queries that are run against it |
HANDLING BLOAT WITH VACUUM AND VACUUM FULL VACUUM
PostgreSQL, Greenplum Database, and Oracle use a database implementation called MVCC (multi-version concurrency control) to avoid locking and improve throughput. MVCC causes the database to make private versions or copies of data at runtime, which become current after a commit. As a result, this leaves parts of database memory blocks marked as "dirty" and unavailable for reuse, forcing the database to allocate new memory blocks. This allocation is called “bloat”. Vacuuming the database or table regularly prevents bloat, making the dirty blocks available for reuse and preventing allocation of new blocks. Vacuuming should be part of regular maintenance to reclaim used transaction IDs (XIDs) or 32-bit signed integers. Unless you vacuum regularly, you may hit some built-in limits that prevent XID wraparound. You should avoid this because hitting these limits means the GPDB runtime prevents further activity. To enable VACUUM, set two memory-related config parameters: max_fsm_relations and max_fsm_pages. These two parameters relate to Free Map Space (fsm). You can use the contents of the system catalog table pg_class to estimate the settings for max_fsm_*. Note: Before calculating values for these parameters, Greenplum recommends you run ANALYZE. The relpages values in pg_class are estimates updated by VACUUM, ANALYZE, and other system operations. When ANALYZE completes, calculate an upper bound on max_fsm_relations: 1. Run the following SQL command on all databases and sum the results: SELECT COUNT(*) FROM pg_class; 2. Estimate max_fsm_pages: SELECT MAX(relpages) FROM pg_class; Or SELECT SUM(relpages) FROM pg_class; Note: If the values returned for max_fsm_relations or max_fsm_pages are less than the calculated values, VACUUM will leave the space untouched. In such cases, you may see messages in the pg_log files saying that VACUUM failed and that you need to increase the value of max_fsm_relation or max_fsm_paqes. VACUUM FULL In addition to simple VACUUM, which marks dirty space in blocks available for reuse, you can use VACUUM FULL to reorganize and compact block storage for a table. VACUUM FULL functions similarly to a file system de-fragmentation, rewriting data to new blocks. However, VACUUM FULL takes much longer and establishes exclusive full table locks for the duration of the process. A VACUUM FULL will reclaim all expired row space, but is a very expensive operation and may take an unacceptably long time to finish on large, distributed Greenplum Database tables. If you do get into a situation where the free space map has overflowed, it may be more timely to recreate the table with a CREATE TABLE AS statement and drop the old table. A VACUUM FULL is not recommended in Greenplum Database. Option 1. ALTER TABLE sales SET WITH (REORGANIZE=TRUE); This leaves any indexes etc. in tact but rewrites the underlying table and, hence, eliminates all dead data. Option 2 SET gp_auto_stats_mode = none;CREATE TABLE xyz AS SELECT * FROM abc <ADD YOUR DISTRIBUTION AND STORAGE OPTIONS HERE>; DROP TABLE abc; ALTER TABLE xyz RENAME TO abc; ANALYZE abc; ALTER TABLE abc ADD PRIMARY KEY (<YOUR_PK_COLUMN_NAME>); This only writes the data one time. Create temp table t1 as select * from <Table Name>; Truncate <Table Name>; set gp_autostats_mode = none; Insert into <Table Name> select * from t1; Drop table t1; analyze <Table Name>; This only writes the data 2 times.
Table and Index Bloat view
CREATE OR REPLACE VIEW bloat AS SELECT schemaname, tablename, reltuples::bigint, relpages::bigint, otta, ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS tbloat, relpages::bigint - otta AS wastedpages, bs*(sml.relpages-otta)::bigint AS wastedbytes, pg_size_pretty((bs*(relpages-otta))::bigint) AS wastedsize, iname, ituples::bigint, ipages::bigint, iotta, ROUND(CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages/iotta::numeric END,1) AS ibloat, CASE WHEN ipages < iotta THEN 0 ELSE ipages::bigint - iotta END AS wastedipages, CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes, CASE WHEN ipages < iotta THEN pg_size_pretty(0) ELSE pg_size_pretty((bs*(ipages-iotta))::bigint) END AS wastedisize FROM ( SELECT schemaname, tablename, cc.reltuples, cc.relpages, bs, CEIL((cc.reltuples*((datahdr+ma- (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta, COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages, COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols FROM ( SELECT ma,bs,schemaname,tablename, (datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr, (maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2 FROM ( SELECT schemaname, tablename, hdr, ma, bs, SUM((1-null_frac)*avg_width) AS datawidth, MAX(null_frac) AS maxfracsum, hdr+( SELECT 1+count(*)/8 FROM pg_stats s2 WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename ) AS nullhdr FROM pg_stats s, ( SELECT (SELECT current_setting('block_size')::numeric) AS bs, CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr, CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma FROM (SELECT version() AS v) AS foo ) AS constants GROUP BY 1,2,3,4,5 ) AS foo ) AS rs JOIN pg_class cc ON cc.relname = rs.tablename JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname LEFT JOIN pg_index i ON indrelid = cc.oid LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid ) AS sml WHERE sml.relpages - otta > 0 OR ipages - iotta > 10 ORDER BY wastedbytes DESC, wastedibytes DESC; |
DOUBLE FAULTS - HOW TO AVOID IT?
Definition:
A double fault occurs when a Greenplum Database primary and corresponding mirror
segment pair fails. When you have a double fault, the Greenplum Database is not
available for query processing.
(Reference: Greenplum Database Administration Guide, Chapter 15: Enabling High Availability Features,available on Powerlink.) Avoidance:
Avoid double faults by monitoring single faults in the Greenplum Database as
follows:
Use the Greenplum utility
gpstate –e from the command line to monitor if segments are in changetracking or
if they are re-synchronizing from fault.
Run
select * from gp_segment_configuration where status=’d’ from the psql prompt to check for failed segments. If you find even one failed
segment in the output of the above two commands, immediately run gprecoverseg.
(Reference: Greenplum Database Administration Guide, Appendix B: Management
Utility Reference, available on Powerlink.)
Monitor: You can monitor for
failed segments in the following ways: Greenplum Database
Performance Monitor (gpperfmon): Install and configure the Greenplum Performance
Monitor console package to use the UI Dashboard. The dashboard clearly indicates
segment failures in the Database Health section. (Reference: Greenplum
Performance Monitor Administration Guide, available on Powerlink.)
Enabling System
Alerts and Notifications: You can configure the Greenplum Database to
trigger SNMP (Simple Network Management Protocol) alerts using Greenplum?s
daemon gpsnmpd. You can configure alerts for segment failures and messages such
as FATAL, PANIC, etc. (Reference:Greenplum Database Administration Guide,
Chapter 18: Monitoring a Greenplum System, available on Powerlink.)
Greenplum Utility
gpstate: gpstate has various options that can help monitor segments at
fault.
For example, run gpstate with the following parameters to learn more about your system?s health: –e shows segments with
mirror state issues.
–f shows standby master details. If the master standby is out of sync, then run gpinitstandby. –s shows the detailed status of gpstate. Recovery from Double
Faults: In case of a double fault, restart the Greenplum Database and
then run gprecoverseg. To confirm that segments have recovered, run gpstate with
appropriate options (-m for mirrors, –s for detailed status) or select * from
gp_segment_configuration. (Reference: Greenplum Database Administration Guide,
Chapter 15: Enabling High Availability Features, available on
Powerlink.)
|
MONITORING DATABASE LOG FILES
Monitoring database logs is a very important activity and can help in investigating issues as well as preventing database problems. Greenplum Database provides gp_toolkit to allow database administrators to examine log file entries. You can mine log file data using external tables mapped to the log files. See the Greenplum Database Administration Guide for details. As Greenplum Database is a Massively Parallel Processing (MPP) database, it consists of multiple instances. Every segment instance (master, standby, and segment) has a log file associated with it. Greenplum Database rotates log files every day at midnight and generates a new log file on every database start. The timestamp in the log file reflects the time at which the log file was rotated (midnight = 00:00:00) or when the database started. The instance log file is located in the pg_log subdirectory of the data directory: <data_directory>/pg_log. See pg_filespace_entry for data directory locations. The format for database log file names is gpdb-YYYYMM-DD_HHMISS.log, where YYYY is year, MM is month, DD is day, HH is hour, MI is minutes and SS is seconds. The master log file is located in $MASTER_DATA_DIRECTORY/pg_log. The master log file contains log entries about activity on the master segment; statements, errors, and information such as FTS messages. Database log files are CSV (commaseparated values) text files. Entries can be multi-line; for example, a multi-line SQL query or a multi-line stack trace. Following are the log file elements in order: <timestamp>, <database user>, <database name>, <process pid>, <thread id>, <host>, <port>, <session time>, <transaction id>, <session id> (in conXXX format), <command number within session>, <segment>, <plan slice>, <distributed transaction id>, <local transaction id>, <subtransaction id>, <severity level>, <state>, <message text>, <details>, <hint>, <query>, <query position>, <context>, <debugging information>, <cursor position>, <funciton>, <file>, <line>, <process stack> One of the most important values is <severity level>. Possible values are PANIC, FATAL, ERROR, WARNING, LOG, NOTICE, and INFO. You can control the logging actions based on severity, such as FATAL, or topic, such as setting the log_duration parameter to ON to log statement durations. See the log_min_messages parameter in the Greenplum Database Administration Guide for information about logging actions based on severity level. See the log_* parameters in the parameter list in the Greenplum Database Administration Guide for information about logging actions based on topics. Use the <session id> value to correlate log entries with sessions in pg_stat_activity. The value of conXYZ in the log file can be related to pg_stat_activity.sess_id (XYZ). For example, a session with pg_stat_activity.sess_id=12345 will have log entries marked con12345. The <segment> value is most useful in the master log file as it reports from which segment the log entry was received. The master segment is segment=-1. The best way to analyze and review log files for errors is to examine them directly and provide the files and selected log entries to EMC Support for further analysis. Greenplum Database also provides several objects in the gp_toolkit schema to allow database administrators to examine log file entries. You can mine log file data using external tables mapped to the log files. See the Greenplum Database Administration Guide for details. |
Monitoring Greenplum Database Locks
When a database statement
seems to be "hung" or executes very slowly, we often find that SQL statements
are holding conflicting locks. It is best practice to run a few basic checks
before calling for help. The database catalog tables can tell what each SQL
statement is doing, and will reveal if there is a locking problem. The
"pg_stat_activity" catalog table contains information about all the connected
sessions – the process ID of the server process, the session ID, database user,
current query executing, when the current query started, and finally whether the
session is waiting for somebody else or working. If a session is waiting, it
means another statement is holding a lock that is keeping it from executing.
The
statement needs to wait for a while until the other session releases the lock. The "pg_locks" catalog table contains a list of all locks that database sessions hold and the lock description – the type of the lock, which database the lock is in, which database object is locked, the process ID of the session, the lock mode, the session id, and whether the lock is granted or the session is waiting. If the "hung" session has a non-granted lock entry (granted=false), then it is waiting for another session to release a conflicting lock (granted=true). 1. Queries being executed:
Select * from pg_stat_activity where waiting is false;
2. Queries not being executed and waiting in queue:
Select * from pg_stat_activity where waiting is true;
3. How long since a query has been submitted:
Select sess_id,substr(current_query,1,20),now()-query_start from
pg_stat_activity;
4. Table to find more on what is the query waiting on:
Select * from pg_locks where granted is false;
Select * from pg_locks where mppsessionid=<sess_id from pg_Stat_activity>; It is easy to find out which session has the conflicting lock
using the "pg_locks" table. Both tables ("pg_stat_activity" and "pg_locks")
share a couple of common columns, which allows for joins and easy correlation
between sessions queries and locks.
In addition, the "gp_toolkit" database schema contains two views to help with locks: "gp_toolkit.gp_locks_on_relation" and "gp_toolkit.gp_locks_on_resqueue". The first one shows the locks held and wanted on database objects, and the second shows the locks on resource queues (resource queues in Greenplum database are implemented with the same infrastructure as database locks). |
Killing hung process
GPKILL UTILITY It happens–sometimes there‟s an issue with Greenplum Database and it seems that your best option is to kill the Greenplum Database process. Using „kill -9‟ or „kill -11‟ can kill critical Greenplum processes and leave the database in an unstable state, even bringing your clusters down. There is a better way. Release 4.2.1 of Greenplum Database provides the gpkill utility to protect you from accidentally killing key Greenplum Database processes. Depending on your specific Greenplum Database installation process, the kill utility can be aliased to the gpkill utility, which redirects the kill command‟s default path to gpkill. If you attempt to use the „kill‟ command to kill a critical process such as postmaster, gpkill displays an error message and asks you to contact EMC Support. So even if you accidentally run the kill command on a gpkill-enabled host, you can‟t hurt anything. Other ways to kill the session Killing Sessions – Easiest thing to do is kill queries in command center. You can also do the following: Find the session you want to kill: select usename, procpid, waiting, query_start from pg_catalog.pg_stat_activity; Kill the session: select pg_terminate_backend('<procpid>’); Note: pg_cancel_backend and pg_terminate_backend, there was issues
reported in the past(Ref Issue #17995 DDL and Utility Statements
) In rare circumstances, pg_cancel_backend and
pg_terminate_backend did not terminate sessions as expected. Systems could
not kill queries using pg_cancel_backend or pg_terminate_backend while they were waiting on locks for unique indexes. So it may or may not work. This issue was resolved in Greenplum database version 4.2.2.4 but exist in the older version. CREATE OR REPLACE FUNCTION kill_hungprocess(pid integer) RETURNS boolean AS $body$ DECLARE qry boolean; BEGIN qry := (SELECT pg_catalog.pg_cancel_backend(procpid) FROM pg_stat_activity WHERE usename=(select session_user) AND procpid=pid); RETURN qry; END; $body$ LANGUAGE plpgsql SECURITY DEFINER VOLATILE RETURNS NULL ON NULL INPUT; GRANT EXECUTE ON FUNCTION kill_hungprocess(pid integer) TO sachi; The function below gives privilege to Non privileged user to kill any session. This function is created by gpadmin user. CREATE OR REPLACE FUNCTION kill_hungprocess(pid integer) RETURNS boolean AS $body$ DECLARE qry boolean; BEGIN qry := (select pg_catalog.pg_cancel_backend(pid)); RETURN qry; END; $body$ LANGUAGE plpgsql SECURITY DEFINER VOLATILE RETURNS NULL ON NULL INPUT; GRANT EXECUTE ON FUNCTION kill_hungprocess(pid integer) TO sachi; |
1-10 of 10