Select runs but insert failing in Greenplum database

posted Jun 13, 2015, 8:23 AM by Sachchida Ojha   [ updated May 20, 2016, 5:52 PM ]
Few days back I faced a unique problem. In one environment of Greenplum database system, inserts on a table (ITAS) was failing but select was not. When I executed the same ITAS in other Greenplum env. it was successful.

Lets understand the problem and then potential solution.

To learn more about this issue including internal GP database structures used to diagnose this issue Sign our Gold Members Club Now.

PostgreSQL's/Greenplum DB MVCC transaction semantics depend on being able to compare transaction ID (XID) numbers: a row version with an insertion XID greater than the current transaction's XID is ""in the future"" and should not be visible to the current transaction. But since transaction IDs have limited size (32 bits at this writing) a cluster that runs for a long time (more than 4 billion transactions) would suffer transaction ID wraparound: the XID counter wraps around to zero, and all of a sudden transactions that were in the past appear to be in the future which means their outputs become invisible. Refer to http://www.postgresql.org/docs/9.4/static/routine-vacuuming.html

There could be 3 possible reason for insert failure.

1. Missing table privilege / No access to table.
2. No space on disks to accept new rows
3. something else

We have checked the first two conditions but that was not met. So started thinking about something else. Before we go further, we need to understand that GP (MPP database) is built on top of PostgreSQL. Both provides MVCC (Multi version concurrent control). 

Multiversion concurrency control (MCC or MVCC), is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory.

If someone is reading from a database at the same time as someone else is writing to it, it is possible that the reader will see a half-written or inconsistent piece of data. There are several ways of solving this problem, known as concurrency control methods. The simplest way is to make all readers wait until the writer is done, which is known as a lock. This can be very slow, so MVCC takes a different approach: each user connected to the database sees a snapshot of the database at a particular instant in time. Any changes made by a writer will not be seen by other users of the database until the changes have been completed (or, in database terms: until the transaction has been committed.)

When an MVCC database needs to update an item of data, it will not overwrite the old data with new data, but instead mark the old data as obsolete and add the newer version elsewhere. Thus there are multiple versions stored, but only one is the latest. This allows readers to access the data that was there when they began reading, even if it was modified or deleted part way through by someone else. It also allows the database to avoid the overhead of filling in holes in memory or disk structures but requires (generally) the system to periodically sweep through and delete the old, obsolete data objects. For a document-oriented database it also allows the system to optimize documents by writing entire documents onto contiguous sections of disk—when updated, the entire document can be re-written rather than bits and pieces cut out or maintained in a linked, non-contiguous database structure.

MVCC provides point in time consistent views. Read transactions under MVCC typically use a timestamp or transaction ID to determine what state of the DB to read, and read these versions of the data. Read and write transactions are thus isolated from each other without any need for locking. Writes create a newer version, while concurrent reads access the older version.

Before we proceed further, lets understand these 2 parameters associated with transaction.

xid_stop_limit

The number of transaction IDs prior to the ID where transaction ID wraparound occurs. When this limit is reached, Greenplum Database stops creating new transactions to avoid data loss due to transaction ID wraparound.
VALUE RANGEDEFAULTSET CLASSIFICATIONS
integer 10000000 - 20000000001000000000local

system

restart


xid_warn_limit
The number of transaction IDs prior to the limit specified by xid_stop_limit. When Greenplum Database reaches this limit, it issues a warning to perform a VACUUM operation to avoid data loss due to transaction ID wraparound.
VALUE RANGEDEFAULTSET CLASSIFICATIONS
integer 10000000 - 2000000000500000000local

system

restart


Greenplum Database stop accepting connections and client receiving the following errorDatabase is not accepting commands to avoid wraparound data loss in database "dbname"

You may also see the following warning Server: host.localdomain, database: pgdb WARNING: database "pgdb" must be vacuumed within 1461689882 transactions (seg65 slice2 myhost:16727 pid=156528) HINT: To avoid a database shutdown, execute a full-database VACUUM in "pgdb"

There are 2 configuration settings in GPDB that effect this: xid_warn_limit and xid_stop_limit. xid_warn_limit defaults to 500 million and impacts when the warnings are generated and xid_stop_limit defaults to 1 billion and impacts when the database stops accepting connections. These are both hidden configuration parameters are not recommended to be modified in most cases

To find out the remaining tables and databases which need to be vacuumed you can run the following queries.

--GP Version

SELECT -1, datname, age(datfrozenxid) 
FROM pg_database 
UNION ALL 
SELECT gp_segment_id, datname, age(datfrozenxid)
FROM gp_dist_random('pg_database') 
ORDER BY 3 DESC ;


-- PostgreSQL version

 SELECT datname, age(datfrozenxid) FROM pg_database;


--GP Version

SELECT coalesce(n.nspname, '<Missing schema>'), relname, relkind, relstorage, age(relfrozenxid) 
FROM pg_class c  LEFT JOIN pg_namespace n ON c.relnamespace = n.oid 
WHERE relkind = 'r' 
AND relstorage NOT IN ('x') 
ORDER BY 5 DESC ;

-- PostgreSQL version

SELECT c.oid::regclass as table_name, greatest(age(c.relfrozenxid),age(t.relfrozenxid)) as age 
FROM pg_class c 
LEFT JOIN pg_class t ON c.reltoastrelid = t.oid 
WHERE c.relkind IN ('r', 'm'); 

Note: The age column measures the number of transactions from the cutoff XID to the current transaction's XID.

Please remember to vacuum every table in every database on the Greenplum Cluster at least once every two billion transactions. Some table won't be vacuumed using the default vacuum command in psql . Instead, they could be: Tables under schema/namespace pg_temp_xxxx, where xxxx isa number Some external table generated by gpload .

Comments