Partitioning in Greenplum

Table partitioning addresses the problem of supporting very large tables, such as fact tables, by allowing you to divide them into smaller and more manageable pieces. Partitioned tables can improve query performance by allowing the Greenplum Database query planner to scan only the relevant data needed to satisfy a given query rather than scanning the entire contents of a large table. Partitioned tables can also be used to facilitate database maintenance tasks, such as rolling old data out of the data warehouse.

When you partition a table in Greenplum Database, you are actually creating a top-level (or parent) table with one or more levels of sub-tables (or child tables). Internally, Greenplum Database creates an inheritance relationship between the top-level table and its underlying partitions (similar to the functionality of the INHERITS clause of PostgreSQL).

Using the partition criteria defined during table creation, each partition is created with a distinct CHECK constraint, which limits the data that table can contain. The CHECK constraints are also used by the query planner to determine which table partitions to scan in order to satisfy a given query predicate.

Partition hierarchy information is stored in the Greenplum system catalog so that rows inserted into the top-level parent table appropriately propagate to the child table partitions. Any changes to the partition design or table structure must be done through the parent table using the PARTITION clauses of the ALTER TABLE command.

Greenplum Database supports both range partitioning (division of data based on a numerical range, such as date or price) or list partitioning (division of data based on a list of values, such as sales territory or product line), or a combination of both types.

Partitioned tables are also distributed across Greenplum Database segments as is any non-partitioned table. Table distribution in Greenplum Database physically divides a table across the Greenplum segments to enable parallel query processing. Table partitioning is a tool to logically divide big tables to improve query performance and facilitate data warehouse maintenance tasks. Partitioning does not change the physical distribution of the table data across the segments.

Deciding on a Table Partitioning Strategy

Not all tables are good candidates for partitioning. If the answer is yes to all or most of the following questions, then table partitioning is a viable database design strategy for improving query performance. If the answer is no to most of the following questions, then table partitioning is not the right solution for that table:

1.Is the table large enough? Large fact tables are good candidates for table partitioning. If you have millions or billions of records in a table, you will see performance benefits from logically breaking that data up into smaller chunks. For smaller tables with only a few thousand rows or less, the administrative overhead of maintaining the partitions will outweigh any performance benefits you might see.

2.Are you experiencing unsatisfactory performance? As with any performance tuning initiative, a table should be partitioned only if queries against that table are producing slower response times than desired.

3. Do your query predicates have identifiable access patterns? Examine the WHERE clauses of your query workload and look for table columns that are consistently used to access data. For example, if most of your queries tend to look up records by date, then a monthly or weekly date-partitioning design might be beneficial. Or if you tend to access records by region, consider a list-partitioning design to divide the table by region.

4. Does your data warehouse maintain a window of historical data? Another consideration for partition design is your organization’s business requirements for maintaining historical data. For example, your data warehouse may only require you to keep the past twelve months worth of data. If the data is partitioned by month, you can easily drop the oldest monthly partition from the warehouse, and load current data into the most recent monthly partition.

5.Can the data be divided into somewhat equal parts based on some defining criteria? You should choose partitioning criteria that will divide your data as evenly as possible. If the partitions contain a relatively equal number of records, query performance improves based on the number of partitions created. For example, by dividing a large table into 10 partitions, a query will execute 10 times faster than it would against the unpartitioned table (provided that the partitions are designed to support the query’s criteria).

Range Partitions

- Defining Date Range Table Partitions

- Defining Numeric Range Table Partitions

List Partitions

 Defining List Table Partitions

Composite Partitions 

Defining Multi-level Partitions

Partitioning an Existing Table

Examples of Date Range Table Partitions

CREATE TABLE sales (id int, date date, amt decimal(10,2))

DISTRIBUTED BY (id)

PARTITION BY RANGE (date)

( START (date '2008-01-01') INCLUSIVE

END (date '2009-01-01') EXCLUSIVE

EVERY (INTERVAL '1 day') );

You can also declare and name each partition individually. For example:

CREATE TABLE sales (id int, date date, amt decimal(10,2))

DISTRIBUTED BY (id)

PARTITION BY RANGE (date)

( PARTITION Jan08 START (date '2008-01-01') INCLUSIVE ,

PARTITION Feb08 START (date '2008-02-01') INCLUSIVE ,

PARTITION Mar08 START (date '2008-03-01') INCLUSIVE ,

PARTITION Apr08 START (date '2008-04-01') INCLUSIVE ,

PARTITION May08 START (date '2008-05-01') INCLUSIVE ,

PARTITION Jun08 START (date '2008-06-01') INCLUSIVE ,

PARTITION Jul08 START (date '2008-07-01') INCLUSIVE ,

PARTITION Aug08 START (date '2008-08-01') INCLUSIVE ,

PARTITION Sep08 START (date '2008-09-01') INCLUSIVE ,

PARTITION Oct08 START (date '2008-10-01') INCLUSIVE ,

PARTITION Nov08 START (date '2008-11-01') INCLUSIVE ,

PARTITION Dec08 START (date '2008-12-01') INCLUSIVE

END (date '2009-01-01') EXCLUSIVE );

Note that you do not need to declare an END value for each partition, only the last one. In this example, Jan08 would end where Feb08 starts.

Examples of Numeric Range Table Partitions

CREATE TABLE rank (id int, rank int, year int, gender char(1), count int)

DISTRIBUTED BY (id)

PARTITION BY RANGE (year)

( START (2001) END (2008) EVERY (1),

DEFAULT PARTITION extra );

Examples of List  Table Partitions

CREATE TABLE rank (id int, rank int, year int, gender char(1), count int )

DISTRIBUTED BY (id)

PARTITION BY LIST (gender)

( PARTITION girls VALUES ('F'),

PARTITION boys VALUES ('M'),

DEFAULT PARTITION other );

Examples of Muti-Level/Composite Table Partitions

CREATE TABLE sales (trans_id int, date date, amount decimal(9,2), region text)

DISTRIBUTED BY (trans_id)

PARTITION BY RANGE (date)

SUBPARTITION BY LIST (region)

SUBPARTITION TEMPLATE

( SUBPARTITION usa VALUES ('usa'),

SUBPARTITION asia VALUES ('asia'),

SUBPARTITION europe VALUES ('europe'),

DEFAULT SUBPARTITION other_regions)

START (date '2008-01-01') INCLUSIVE

END (date '2009-01-01') EXCLUSIVE

EVERY (INTERVAL '1 month'),

DEFAULT PARTITION outlying_dates );

CREATE TABLE sales (id int, year int, month int, day int, region text)

DISTRIBUTED BY (id)

PARTITION BY RANGE (year)

SUBPARTITION BY RANGE (month)

SUBPARTITION TEMPLATE (

START (1) END (13) EVERY (1),

DEFAULT SUBPARTITION other_months )

SUBPARTITION BY LIST (region)

SUBPARTITION TEMPLATE (

SUBPARTITION usa VALUES ('usa'),

SUBPARTITION europe VALUES ('europe'),

SUBPARTITION asia VALUES ('asia'),

DEFAULT SUBPARTITION other_regions )

( START (2002) END (2010) EVERY (1),

DEFAULT PARTITION outlying_years );

Partitioning an Existing Table

partitioned at CREATE TABLE time. If you have an existing table that you want to partition, you must recreate the table as a partitioned table, reload the data into the newly partitioned table, drop the original table and rename the partitioned table to the original name. You must also regrant any table permissions. For example:

CREATE TABLE sales2 (LIKE sales)

PARTITION BY RANGE (date)

( START (date '2008-01-01') INCLUSIVE

END (date '2009-01-01') EXCLUSIVE

EVERY (INTERVAL '1 month') );

INSERT INTO sales2 SELECT * FROM sales;

DROP TABLE sales;

ALTER TABLE sales2 RENAME TO sales;

GRANT ALL PRIVILEGES ON sales TO admin;

GRANT SELECT ON sales TO guest;

Limitations of Partitioned Tables

A primary key or unique constraint on a partitioned table must contain all the partitioning columns. A unique index can omit the partitioning columns; however, it is only enforced on the parts of the partitioned table, not on the partitioned table as a whole.

Loading Partitioned Tables

Once you have created your partitioned table structure, top-level parent tables are always empty. Data is routed to the bottom-level child table partitions only. In a multi-level partition design, only the subpartitions at the bottom of the hierarchy can contain data.

If a row cannot be mapped to a child table partition, it will be rejected and the load will fail. If you do not want unmapped rows to be rejected at load time, you can define your partition hierarchy with a DEFAULT partition. Any rows that do not match to an existing partition’s CHECK constraints will then load into the DEFAULT partition.

At runtime, the query planner scans the entire table inheritance hierarchy and uses the CHECK table constraints to determine which of the child table partitions to scan in order to satisfy the query’s conditions. The DEFAULT partition (if your hierarchy has one) is always scanned. If the DEFAULT partition contains data, this will slow down the overall scan time.

When you use COPY or INSERT to load data into a parent table, it automatically gets rerouted to the correct partition by default. Therefore, you can load a partitioned table as you would a regular table.

You can also load data into the child table partitions directly if needed. You can also create an intermediate staging table, load it, and then exchange it into your partition design.This type of exchange is preferred over a direct COPY or INSERT.

Verifying Your Partition Strategy

The purpose for partitioning a table is to reduce the number of rows that must be scanned in order to satisfy a given query. If a table is partitioned based on the query predicate, you can verify that the query planner is selectively scanning the relevant data by using EXPLAIN to look at the query plan.

For example, suppose we have a sales table that is date-range partitioned by month and subpartitioned by region,

CREATE TABLE sales (trans_id int, date date, amount decimal(9,2), region text)

DISTRIBUTED BY (trans_id)

PARTITION BY RANGE (date)

SUBPARTITION BY LIST (region)

SUBPARTITION TEMPLATE

( SUBPARTITION usa VALUES ('usa'),

SUBPARTITION asia VALUES ('asia'),

SUBPARTITION europe VALUES ('europe'),

DEFAULT SUBPARTITION other_regions)

START (date '2008-01-01') INCLUSIVE

END (date '2009-01-01') EXCLUSIVE

EVERY (INTERVAL '1 month'),

DEFAULT PARTITION outlying_dates );

For the following query:

EXPLAIN SELECT * FROM sales WHERE date='01-07-08' AND region='usa';

The query plan for this query should show a table scan of the following tables only:

1. the default partition returning 0-1 rows (if your partition design has one)

2 the January 2008 partition (sales_1_prt_1) returning 0-1 rows

3 the USA region subpartition (sales_1_2_prt_usa) returning some number of rows.

Below is an example of the relevant portion of the query plan:

-> Seq Scan on sales_1_prt_1 sales (cost=0.00..0.00 rows=0

width=0)

Filter: "date"=01-07-08::date AND region='USA'::text

-> Seq Scan on sales_1_2_prt_usa sales (cost=0.00..9.87 rows=20

width=40)

Make sure that the query planner is not scanning unnecessary partitions or subpartitions (for example, scans of other months or regions not specified in the query predicate), and that scans of the top-level tables are returning 0-1 rows.

Viewing Your Partition Design

You can look up information about your partition design using the pg_partitions view. For example to see the partition design of the sales table:

SELECT partitionboundary, partitiontablename, partitionname, partitionlevel, partitionrank FROM pg_partitions WHERE tablename='sales';

There are also the following views that show information about partitioned tables:

1. pg_partition_templates - Shows subpartitions that were created using a subpartition template.

2. pg_partition_columns - Shows the partition key columns used in a partition design.

Maintaining Partitioned Tables

You must maintain a partitioned table using the ALTER TABLE command against the top-level parent table. The most common scenario is dropping old partitions and adding new ones in order to maintain a rolling window of data in a range partition design. You may also want to convert (exchange) older partitions to the append-only compressed storage format in order to save space. If you have a default partition in your partition design, the procedure for adding a new partition is to split the default partition.

1. Adding a New Partition

2. Renaming a Partition

3. Adding a Default Partition

4. Dropping a Partition

5. Truncating a Partition

6. Exchanging a Partition

7. Splitting a Partition

8. Modifying a Subpartition Template

Important: When defining and altering partition designs, use the given partition name, not the table object name. Although you can query and load any table (including partitioned tables) directly using SQL commands, you can only modify the structure of a partitioned table using the ALTER TABLE...PARTITION clauses. Since partitions are not required to have names, if there is no partition name, there are the following expressions can also specify a part: PARTITION FOR (value) or PARTITION FOR(RANK(number)).

Adding a New Partition

You can add a new partition to an existing partition design using the ALTER TABLE command. If the original partition design included subpartitions defined by a subpartition template, then the newly added partition will also be subpartitioned according to that template. For example:

ALTER TABLE sales ADD PARTITION

START (date '2009-02-01') INCLUSIVE

END (date '2009-03-01') EXCLUSIVE;

If a subpartition template was not used when you created the table, you would then define subpartitions when adding a new partition:

ALTER TABLE sales ADD PARTITION

START (date '2009-02-01') INCLUSIVE

END (date '2009-03-01') EXCLUSIVE

( SUBPARTITION usa VALUES ('usa'),

SUBPARTITION asia VALUES ('asia'),

SUBPARTITION europe VALUES ('europe') );

If you want to add a new subpartition to an existing partition, you can specify a particular partition to alter. For example:

ALTER TABLE sales ALTER PARTITION FOR (RANK(12))

ADD PARTITION africa VALUES ('africa');

Note: You cannot add a new partition to a partition design that has a default partition. You must split the default partition in order to add a new partition.

Renaming a Partition

Partitioned tables are using the following naming convention. However, partitioned subtable names are subject to uniqueness requirements and length limitations.

<parentname>_<level>_prt_<partition_name>

For example:

sales_1_prt_jan08

Or for auto-generated range partitions (a number is assigned when no name is given):

sales_1_prt_1

It is not possible to rename a partitioned child table directly by altering the table name. However, you can rename the top-level parent table, and the associated <parentname> will change in the table names of all associated child table partitions. For example:

ALTER TABLE sales RENAME TO globalsales;

Would change the associated table names accordingly:

globalsales_1_prt_1

You can also change the partition name of a partition to make it easier to identify. For example:

ALTER TABLE sales RENAME PARTITION FOR ('2008-01-01') TO jan08;

Would change the associated table name accordingly:

sales_1_prt_jan08

When altering partitioned tables with the ALTER TABLE command, they are always referred to by their partition name (jan08) and not their full table name (sales_1_prt_jan08).

Note: The table name cannot be a partition name in an ALTER TABLE statement. For example, ALTER TABLE sales... is correct, ALTER TABLE sales_1_part_jan08... is not allowed.

Adding a Default Partition

You can add a default partition to an existing partition design using the ALTER TABLE command.

ALTER TABLE sales ADD DEFAULT PARTITION other;

If your partition design is multi-leveled, then each level in the hierarchy needs a default partition. For example:

ALTER TABLE sales ALTER PARTITION FOR (RANK(1)) ADD DEFAULT PARTITION other;

ALTER TABLE sales ALTER PARTITION FOR (RANK(2)) ADD DEFAULT PARTITION other;

ALTER TABLE sales ALTER PARTITION FOR (RANK(3)) ADD DEFAULT PARTITION other;

Partition designs that do not have a default partition will reject incoming rows that do not match to an existing partition’s CHECK constraint. If a partitioned table has a default partition, incoming data that does not match to an existing partition is instead inserted into the default partition.

Dropping a Partition

You can drop a partition from your partition design using the ALTER TABLE command. When you drop a partition that has subpartitions, the subpartitions (and all data in them) are automatically dropped as well. For range partitions, it is common to drop the older partitions from the range as old data is rolled out of the data warehouse. For example:

ALTER TABLE sales DROP PARTITION FOR (RANK(1));

Truncating a Partition

You can truncate a partition using the ALTER TABLE command. When you truncate a partition that has subpartitions, the subpartitions are automatically truncated as well.

ALTER TABLE sales TRUNCATE PARTITION FOR (RANK(1));

Exchanging a Partition

Exchanging a partition involves swapping in another table in place of an existing partition. You can exchange a partition using the ALTER TABLE command. You can only exchange partitions at the lowest level of your partition hierarchy (only partitions that contain data can be exchanged).

This can be useful for data loading. For example, you could load a staging table and then swap the loaded table into your partition design. You can also use exchange to change the storage type of older partitions to append-only tables. For example:

CREATE TABLE jan08 (LIKE sales) WITH (appendonly=true);

INSERT INTO jan08 SELECT * FROM sales_1_prt_1 ;

ALTER TABLE sales EXCHANGE PARTITION FOR (DATE '2008-01-01') WITH TABLE jan08;

Note: This example refers to the single-level definition of the table sales, before partitions were added and altered in the previous examples.

Splitting a Partition

Splitting a partition involves dividing an existing partition into two. You can split a partition using the ALTER TABLE command. You can only split partitions at the lowest level of your partition hierarchy (only partitions that contain data can be split). The split value you specify will go into the latter partition.

For example, to split a monthly partition into two with the first partition containing dates January 1-15 and the second partition containing dates January 16-31:

ALTER TABLE sales SPLIT PARTITION FOR ('2008-01-01')

AT ('2008-01-16')

INTO (PARTITION jan081to15, PARTITION jan0816to31);

If your partition design has a default partition, you must split the default partition in order to add a new partition. You can only split default partitions at the lowest level of your partition hierarchy (only default partitions that contain data can be split).

When using the INTO clause, the second partition name specified should always be that of the existing default partition. For example, to split a default range partition to add a new monthly partition for January 2009:

ALTER TABLE sales SPLIT DEFAULT PARTITION

START ('2009-01-01') INCLUSIVE

END ('2009-02-01') EXCLUSIVE

INTO (PARTITION jan09, default partition);

Modifying a Subpartition Template

Use ALTER TABLE SET SUBPARTITION TEMPLATE to modify the subpartition template for an existing partition. After you set a new subpartition template, partitions that you add subsequently will have the new subpartition design. Existing partitions are not modified.

For example, to modify the subpartition 

ALTER TABLE sales SET SUBPARTITION TEMPLATE

( SUBPARTITION usa VALUES ('usa'),

SUBPARTITION asia VALUES ('asia'),

SUBPARTITION europe VALUES ('europe'),

SUBPARTITION africa VALUES ('africa')

DEFAULT SUBPARTITION other );

With this example template, when you next add a date-range partition of the table sales, it will include the new regional list subpartition for Africa. For example, the following command would create the subpartitions usa, asia, europe, africa, and a default partition named other:

ALTER TABLE sales ADD PARTITION sales_prt_3

START ('2009-03-01') INCLUSIVE

END ('2009-04-01') EXCLUSIVE );

If you need to remove a subpartition template, use SET SUBPARTITION TEMPLATE with empty parentheses. For example, to completely clear the subpartition template used in the above examples:

ALTER TABLE sales SET SUBPARTITION TEMPLATE ()

Exploring partitions in Greenplum - Nov 01, 2014 5:15:0 PM