Greenplum Database New Feature - External table support for Hadoop distributions and MapR

Greenplum has announced several more features that support Hadoop distributions and the MapR package. Here we explore a few of the features and enhancements added in GPDB 4.3.1.0, 4.3.2.0, 4.3.3.0, 4.3.4.0, 4.3.4.1, and 4.3.5.1.

External table support for Hadoop distributions and MapR

(4.3.1, 4.3.2, 4.3.3, 4.3.4, 4.3.4.1, 4.3.5.1)

Starting with Greenplum Database 4.3.1, installing a separate gNet package is not required to use the gphdfs protocol. The jar files for the gphdfs extensions, the libraries, and the documentation for the gphdfs extensions are bundled with Greenplum Database. The files are installed in $GPHOME/lib/hadoop. The gphdfs protocol is used with external tables to access data from Hadoop file systems.

Starting with Greenplum Database 4.3.1, to upgrade to a different version of gphdfs, you must install the version of Greenplum Database that contains the version of gphdfs that you wish to use.

Starting with Greenplum Database 4.3.2, Greenplum supports the CSV format for HDFS external tables. Greenplum Database external tables let you access external files as if they were regular database tables. For external files that contain data in comma-separated values (CSV) format on a Hadoop Distributed File System (HDFS), Greenplum Database supports reading and writing the files with the gphdfs protocol.
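As a minimal sketch of the CSV support described above, the statements below define one readable and one writable external table over CSV data in HDFS. The host name namenode-host, the port 8020, the HDFS paths, and all table and column names are hypothetical placeholders. The GRANT statements assume a role named gpadmin that needs access to the gphdfs protocol.

```sql
-- Assumption: the role gpadmin needs read and write access to the gphdfs protocol.
GRANT SELECT ON PROTOCOL gphdfs TO gpadmin;
GRANT INSERT ON PROTOCOL gphdfs TO gpadmin;

-- Readable external table over a CSV file in HDFS
-- (host, port, and path are hypothetical).
CREATE EXTERNAL TABLE ext_sales_csv (
    sale_id   integer,
    sale_date date,
    amount    numeric(10,2),
    region    text
)
LOCATION ('gphdfs://namenode-host:8020/data/sales/sales.csv')
FORMAT 'CSV';

-- Query the external file as if it were a regular table.
SELECT region, sum(amount) AS total_amount
FROM ext_sales_csv
GROUP BY region;

-- Writable external table that exports rows to HDFS in CSV format
-- (the target path is hypothetical).
CREATE WRITABLE EXTERNAL TABLE ext_sales_csv_out (
    sale_id   integer,
    sale_date date,
    amount    numeric(10,2),
    region    text
)
LOCATION ('gphdfs://namenode-host:8020/data/sales_out')
FORMAT 'CSV'
DISTRIBUTED BY (sale_id);

-- Write the rows read from HDFS back out through the writable table.
INSERT INTO ext_sales_csv_out
SELECT * FROM ext_sales_csv;
```

Before trying this, replace the placeholder host, port, and paths with values that match your own Hadoop cluster.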

Starting with Greenplum Database 4.3.3, Greenplum external tables support additional Hadoop distributions. With Greenplum Database external tables created with the CREATE EXTERNAL TABLE command, you can specify the gphdfs protocol to access external files on a Hadoop file system (HDFS) as if they were regular database tables. For Greenplum Database 4.3.3, the gphdfs protocol was enhanced to support additional Hadoop distributions.

Starting with Greenplum Database 4.3.4, GPDB supports MapR with Greenplum Database external tables that use the gphdfs protocol to access HDFS data.

Starting with Greenplum Database 4.3.4.1, Greenplum external tables support more Hadoop distributions. With Greenplum Database external tables created with the CREATE EXTERNAL TABLE command, you can specify the gphdfs protocol to access external files on a Hadoop file system (HDFS) as if they were regular database tables. This capability was first reported in GPDB 4.3.3. For Greenplum Database 4.3.4.1, the gphdfs protocol has been enhanced to support Cloudera CDH 5.2 and 5.3.

Starting with Greenplum Database 4.3.5.1, GPDB external tables support further Hadoop distributions. With Greenplum Database external tables created with the CREATE EXTERNAL TABLE command, you can specify the gphdfs protocol to access external files on a Hadoop file system (HDFS) as if they were regular database tables. This feature was reported earlier in 4.3.3.0 and 4.3.4.1. For Greenplum Database 4.3.5.1, the gphdfs protocol supports Pivotal HD 2.1 and Pivotal HD 3.0.

gp_hadoop_target_version

Specifies the installed version of the Greenplum Hadoop target. For Greenplum Database 4.3.4.1, the parameter supports the value cdh5 for the Cloudera CDH 5.2 and 5.3 distributions of HDFS.

VALUE RANGE: gphd-1.0, gphd-1.1, gphd-1.2, gphd-2.0, gpmr-1.0, gpmr-1.2, hdp2, cdh5, cdh4.1, cdh3u2

DEFAULT: gphd-1.1

SET CLASSIFICATIONS: local, session, reload
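The parameter reference above can be illustrated with a short sketch. Because gp_hadoop_target_version is classified as session-settable, we assume here that a plain SET changes it for the current session only. The choice of cdh5 is just an example taken from the value range. A cluster-wide change would normally be made with the gpconfig utility followed by a configuration reload.

```sql
-- Show the Hadoop target currently in effect for this session.
SHOW gp_hadoop_target_version;

-- Assumption: the "session" classification means the value can be changed
-- for the current session with SET. cdh5 targets Cloudera CDH 5.2 and 5.3.
SET gp_hadoop_target_version = 'cdh5';

-- Confirm the new value.
SHOW gp_hadoop_target_version;
```

Quote the value in every case, since entries such as gphd-1.1 and cdh4.1 contain characters that are not valid in an unquoted identifier.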