Redistributing Tables data across GPDB Segments

Redistributing Tables

After successfully creating an expansion schema, you can bring Greenplum Database back online and redistribute tables across the entire array.

You can redistribute tables with gpexpand at specified intervals, targeting low-use hours when the utility’s CPU usage and table locks will have the least impact on database operations. Also, you can rank tables to ensure that the largest or most critical tables are redistributed in your preferred order.

While the redistribution of tables is underway:

1. Any new tables or partitions created will be distributed across all segments exactly as they would be under normal operating conditions.

2. Queries will use all segments, even though the relevant data may not have yet been redistributed to the tables on the new segments.

3. The table or partition currently being redistributed will be locked and unavailable for read or write operations. When its redistribution is completed, normal operations resume.

Ranking Tables for Redistribution

For large systems, it is recommended to control the order in which tables are redistributed by adjusting their rank values in the expansion schema. This allows you to redistribute heavily-used tables first and minimize the performance hit on the system. The amount of free disk space available can affect table ranking;

To rank tables for redistribution by updating rank values in gpexpand.status_detail, connect to Greenplum Database using psql or another supported client. Update gpexpand.status_detail with commands like the following:

=> UPDATE gpexpand.status_detail SET rank= 10;

UPDATE gpexpand.status_detail SET rank=1 WHERE fq_name = ‘public.lineitem’;

UPDATE gpexpand.status_detail SET rank=2 WHERE fq_name = ‘public.orders’;

These commands first lower the priority of all tables to 10, and then assign a rank of 1 to lineitem and then a rank of 2 to orders. When table redistribution begins, lineitem will be redistributed first, followed by orders and then all other tables in gpexpand.status_detail.

Note: For any table that you do not want to redistribute, you must remove the corresponding entry from gpexpand.status_detail.

Redistributing Tables Using gpexpand

To redistribute tables with gpexpand

1.Log in on the master host as the user who will be running your Greenplum Database system (for example, gpadmin).

2.Run the gpexpand utility. Optionally, you can use either the -d or -e option to define the time period for the expansion session. For example, to run the utility for a maximum of 60 consecutive hours:

$ gpexpand -d 60:00:00

The utility redistributes tables until the last table in the schema is successfully marked completed, or until the specified duration or end time is reached. Each time a session is started or finished, the utility updates the status and updated time in gpexpand.status.

Monitoring Table Redistribution

At any time during the process of redistributing tables, you can query the expansion schema. The view gpexpand.expansion_progress provides a summary of the current progress, including calculations of the estimated rate of table redistribution and estimated time to completion. The table gpexpand.status_detail can be queried for per-table status information.

Viewing Expansion Status

Because the estimates in gpexpand.expansion_progress are based on the rates achieved for each table, the view cannot calculate an accurate estimate until the first table has completed. Calculations are restarted each time you re-run gpexpand to start a new table redistribution session.

To monitor progress by querying gpexpand.expansion_progress, connect to Greenplum Database using psql or another supported client. Query gpexpand.expansion_progress with a command like the following:

=# select * from gpexpand.expansion_progress;

name | value

------------------------------+-----------------------

Bytes Left | 5534842880

Bytes Done | 142475264

Estimated Expansion Rate | 680.75667095996092 MB/s

Estimated Time to Completion | 00:01:01.008047

Tables Expanded | 4

Tables Left | 4

(6 rows)

Viewing Table Status

The table gpexpand.status_detail stores status, last updated time, and other useful information about each table in the schema. To monitor the status of a particular table by querying gpexpand.status_detail, connect to Greenplum Database using psql or another supported client. Query gpexpand.status_detail with a command like the following:

=> SELECT status, expansion_started, source_bytes FROM gpexpand.status_detail WHERE fq_name = ‘public.sales’;

status | expansion_started | source_bytes

-----------+----------------------------+--------------

COMPLETED | 2009-02-20 10:54:10.043869 | 4929748992

(1 row)