How to load data into greenplum database using external tables?

Post date: Sep 21, 2012 4:14:38 PM

Follow the following  4 steps to load the data from external tables.

1. Start the Greenplum files server(s) if using the gpfdist protocol

2. Define the external tables

3. Place the data files 

4. Load the data

1. Start the Greenplum files server(s) if using the gpfdist protocol

Before creating external tables with the gpfdist protocol, the gpfdist file server program must be running. The following code starts the gpfdist file server program in the background on port 8081 serving files from directory /var/data/staging:

gpfdist -p 8081 -d /var/data/staging -l /home/gpadmin/log &

2. Define the external tables

The CREATE EXTERNAL TABLE command defines an external table and specifies the location and format of the external table data files with the LOCATION and FORMAT parameters. This command does not load data into the table.

CREATE EXTERNAL TABLE ext_expenses 

name              text,

date               date, 

amount           float4, 

category         text, 

desc1             text 

)

LOCATION (

'gpfdist://etlhost-1:8081/*',

'gpfdist://etlhost-2:8081/*')

FORMAT 'TEXT' (DELIMITER '|');

Note: Above example is showing Single Greenplum file server (gpfdist) instance on multiple NIC machine

There are three protocols (gpfdists, file, gphdfs) that you can use to access external table data sources, but you cannot mix the protocols in a CREATE EXTERNAL TABLE statement. In this example we will use gpfdists protocol.

3. Place the data files 

Place the data files under the directory /var/data/staging.

4. Load the data 

Once you have defined the external table, and have placed the necessary data files in the correct location (and have started the Greenplum files server(s) if using the gpfdist protocol), you can then select from the external table as you would an ordinary database table. For example, if you wanted to load a portion of the external table data into a database table, you could do something like: 

INSERT INTO expenses_travel SELECT * from ext_expenses where category='travel';

Or if you wanted to quickly load all data into a new database table:

CREATE TABLE expenses AS SELECT * from ext_expenses;