Creating EXTERNAL Tables - Examples
Creating EXTERNAL Tables - Examples
Example 1—Single Greenplum file server (gpfdist) instance on multiple NIC machine
Creates a readable external table named ext_expenses using the gpfdist protocol. The files are formatted with a pipe ( | ) as the column delimiter.
CREATE EXTERNAL TABLE ext_expenses (
name text,
date date,
amount float4,
category text,
desc1 text )
LOCATION (
'gpfdist://etlhost-1:8081/*',
'gpfdist://etlhost-2:8081/*'
)
FORMAT 'TEXT' (DELIMITER '|');
Example 2—Multiple Greenplum file server (gpfdist) instances
Creates a readable external table named ext_expenses using the gpfdist protocol from all files with the txt extension. The files are formatted with a pipe ( | ) as the column delimiter and an empty space as null.
CREATE EXTERNAL TABLE ext_expenses (
name text,
date date,
amount float4,
category text,
desc1 text )
LOCATION (
'gpfdist://etlhost-1:8081/*.txt',
'gpfdist://etlhost-2:8082/*.txt'
)
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
Example 3—Multiple secure Greenplum file server (gpfdists) instances
Creates a readable external table named ext_expenses using the gpfdists protocol from all files with the txt extension. The files are formatted with a pipe ( | ) as the column delimiter and an empty space as null.
First, run gpfdist with the --ssl option. Then, execute the following command.
CREATE EXTERNAL TABLE ext_expenses (
name text,
date date,
amount float4,
category text,
desc1 text )
LOCATION (
'gpfdists://etlhost-1:8081/*.txt',
'gpfdists://etlhost-2:8082/*.txt'
)
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
Example 4—Single Greenplum file server (gpfdist) instance with error logging
Creates a readable external table named ext_expenses using the gpfdist protocol from all files with the txt extension. The files are formatted with a pipe ( | ) as the column delimiter and an empty space as null.
The external table is accessed in single row error isolation mode. An error table (err_customer) is specified. Any data formatting errors that are found in the input data will be discarded to err_customer, along with a description of the error. err_customer can later be queried in order to see the nature of errors and reload the rejected data after fixing the issues. If the count of badly formatted data rows on any specific segment is greater than five (specified as the SEGMENT REJECT LIMIT value), the entire external table operation will be aborted and no rows will be processed.
CREATE EXTERNAL TABLE ext_expenses (
name text,
date date,
amount float4,
category text,
desc1 text )
LOCATION (
'gpfdist://etlhost-1:8081/*.txt',
'gpfdist://etlhost-2:8082/*.txt'
)
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
LOG ERRORS INTO err_customer SEGMENT REJECT LIMIT 5;
Create the same readable external table definition as above, but with CSV formatted files:
CREATE EXTERNAL TABLE ext_expenses (
name text,
date date,
amount float4,
category text,
desc1 text )
LOCATION (
'gpfdist://etlhost-1:8081/*.txt',
'gpfdist://etlhost-2:8082/*.txt'
)
FORMAT 'CSV' ( DELIMITER ',' )
LOG ERRORS INTO err_customer SEGMENT REJECT LIMIT 5;
Example 5—TEXT Format on a Hadoop Distributed File Server (HDFS)
Creates a readable external table named ext_expenses using the gphdfs protocol. The files are formatted with a pipe ( | ) as the column delimiter.
CREATE EXTERNAL TABLE ext_expenses (
name text,
date date,
amount float4,
category text,
desc1 text )
LOCATION (
'gphdfs://hdfshost-1:8081/data/filename.txt'
)
FORMAT 'TEXT' (DELIMITER '|');
Note: Only one data path is permitted with gphdfs. For examples of reading and writing custom formatted data on a Hadoop Distributed File System.
Example 6—Multiple file protocols in CSV format with header rows
Creates a readable external table named ext_expenses using the file protocol.The wildcard specifications are not the same for all the files. The files are formatted in CSV format and have a header row.
CREATE EXTERNAL TABLE ext_expenses (
name text,
date date,
amount float4,
category text,
desc1 text )
LOCATION (
'file://filehost:5432/data/international/*',
'file://filehost:5432/data/regional/*'
'file://filehost:5432/data/supplement/*.csv'
)
FORMAT 'CSV' (HEADER);
Example 7—Readable Web External Table with Script
Create a readable web external table that executes a script once per segment host:
CREATE EXTERNAL WEB TABLE log_output (linenum int, message text)
EXECUTE '/var/load_scripts/get_log_data.sh' ON HOST
FORMAT 'TEXT' (DELIMITER '|');
Example 8—Writable External Table that Writes to a File
Create a writable external table named sales_out that uses gpfdist to write output data to a file named sales.out. The files are formatted with a pipe (|) as the column delimiter and an empty space as null.
CREATE WRITABLE EXTERNAL TABLE sales_out (LIKE sales)
LOCATION ('gpfdist://etl1:8081/sales.out')
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
DISTRIBUTED BY (txn_id);
Example 9—Writable External Web Table with Script
Create a writable external web table that pipes output data received by the segments to an executable script named to_adreport_etl.sh:
CREATE WRITABLE EXTERNAL WEB TABLE campaign_out
(LIKE campaign)
EXECUTE '/var/unload_scripts/to_adreport_etl.sh'
FORMAT 'TEXT' (DELIMITER '|');
Use the writable external table defined above to unload selected data:
INSERT INTO campaign_out SELECT * FROM campaign WHERE customer_id=123;
Example 10—Readable and Writable External Tables with XML Transformations
Greenplum Database now can read and write XML data to and from external tables with gpfdist. For information on setting up an XML transform, see “Transforming XML Data” on page 178. The following code reads XML data into a table.
CREATE READABLE EXTERNAL TABLE prices_readable (LIKE prices)
LOCATION ('gpfdist://127.0.0.1:8080/data/prices.xml#transform=prices_input')
FORMAT 'text' (delimiter '|')
LOG ERRORS INTO prices_errortable SEGMENT REJECT LIMIT 10;
The following code creates a writable external table that transforms the data in the Greenplum Database to XML.
CREATE WRITABLE EXTERNAL TABLE prices_readable (LIKE prices)
LOCATION ('gpfdist://127.0.0.1:8080/data/prices.xml#transform=prices_input')
FORMAT 'text' (delimiter '|');