Greenplum DIA

To meet the challenges of fast data loading, the EMC Data Integration Accelerator (DIA) is purpose-built for batch and micro-batch loading, and it works with a growing number of data integration applications such as Informatica, Talend, and Pentaho. More software titles are being qualified for use with the DIA.

The Data Integration Accelerator (DIA) is specially built to facilitate fast data loading to the DCA. It integrates the Greenplum data loading utility called gpfdist with the server, storage and networking gear into a single system. It leverages the high-speed internal 10 Gb/sec communication network to deliver the data quickly to the DCA.

The DIA servers are preloaded with Red Hat Enterprise Linux, currently at version 5.5, and with gpfdist, the Greenplum parallel file server utility. gpfdist facilitates fast data loading by making use of the massively parallel processing (MPP) architecture of the DCA database.
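As a rough illustration of how gpfdist is typically used, the sketch below builds a gpfdist start command for one DIA server and a Greenplum CREATE EXTERNAL TABLE statement that reads from several gpfdist instances at once. The host names (dia1 to dia4), port 8081, directory, table, and column names are all hypothetical placeholders, and the helper functions are illustrative only; they are not part of any Greenplum tooling.

```python
def gpfdist_command(data_dir, port, log_file=None):
    """Return the argv list that starts gpfdist serving files from data_dir."""
    argv = ["gpfdist", "-d", data_dir, "-p", str(port)]
    if log_file:
        argv += ["-l", log_file]
    return argv


def external_table_sql(table, columns, hosts, port, file_glob, delimiter="|"):
    """Return a CREATE EXTERNAL TABLE statement with one gpfdist URL per host."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    locations = ",\n    ".join(
        f"'gpfdist://{host}:{port}/{file_glob}'" for host in hosts
    )
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"LOCATION (\n    {locations}\n)\n"
        f"FORMAT 'TEXT' (DELIMITER '{delimiter}');"
    )


if __name__ == "__main__":
    # Hypothetical host names for one DIA module (four servers).
    hosts = ["dia1", "dia2", "dia3", "dia4"]
    print(" ".join(gpfdist_command("/data/load", 8081, "/tmp/gpfdist.log")))
    print(external_table_sql(
        "ext_sales",
        [("id", "int"), ("amount", "numeric")],
        hosts, 8081, "sales*.txt",
    ))
```

Listing every DIA server in the LOCATION clause is what lets the database segments pull from all file servers in parallel rather than from a single source.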

Since the DIA servers are Red Hat Linux hosts, they can also be configured as hosts for data integration software, such as Informatica, Talend, and Pentaho.

DATA INTEGRATION ACCELERATOR CONFIGURATIONS

Available in three configurations:

The DIA comes in blocks of 4 servers; each block is referred to as a module. You can order up to 4 modules of DIA for a DCA installation. Currently each DIA server is a commodity server with 12 CPU cores, 48 GB of memory, and twelve 2 TB SATA disks, with a total usable capacity of about 70 TB. The exact server model may change over time, but the architecture should remain the same. The Greenplum gpfdist utility is preloaded on each server by default.
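The per-server figures above scale linearly with the number of modules. The following sketch simply multiplies them out; the function name and dictionary keys are illustrative. It computes raw disk capacity only and does not model usable capacity after RAID and file-system overhead.

```python
SERVERS_PER_MODULE = 4
MAX_MODULES = 4
CORES_PER_SERVER = 12
MEMORY_GB_PER_SERVER = 48
DISKS_PER_SERVER = 12
DISK_TB = 2


def dia_totals(modules):
    """Return aggregate resources for a DIA with the given number of modules."""
    if not 1 <= modules <= MAX_MODULES:
        raise ValueError(f"modules must be between 1 and {MAX_MODULES}")
    servers = modules * SERVERS_PER_MODULE
    return {
        "servers": servers,
        "cpu_cores": servers * CORES_PER_SERVER,
        "memory_gb": servers * MEMORY_GB_PER_SERVER,
        "raw_disk_tb": servers * DISKS_PER_SERVER * DISK_TB,  # before RAID
    }


if __name__ == "__main__":
    for modules in range(1, MAX_MODULES + 1):
        print(modules, dia_totals(modules))
```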

DIA Features

RAPID DEPLOYMENT AND PREDICTABLE PERFORMANCE: The Greenplum Data Integration Accelerator is a purpose-built, open systems data accelerator that architecturally integrates Greenplum data loading software (gpfdist), server, storage and networking into a single, easy-to-implement system. The packaging and pre-tuning ensure predictable performance while dramatically simplifying your data loading activities, resulting in reduced administration overhead.

TIGHTLY INTEGRATED WITH THE DATA COMPUTING APPLIANCE FAMILY: The DIA was designed for tight integration with the DCA family of data warehousing and analytic appliances. By removing the need for custom solutions and unsupported hardware, the DIA enables an end-to-end solution with a single support and management infrastructure. Because it shares a common 10 Gb/s Ethernet network with the DCA, the DIA enables fast data loading directly into the DCA segment servers.

ENGINEERED FOR PARALLEL EXECUTION OF DATA LOADING: The DIA, combined with the Greenplum DCA, manages the flow of data into all nodes of the appliance using EMC Greenplum’s MPP Scatter/Gather Streaming™ (SG Streaming) technology. The system uses a “parallel-everywhere” approach to loading, in which data flows from all the nodes on the DIA to every segment server of the database without any sequential choke points. The combined solution achieves loading speeds of more than 10 terabytes per hour, two to five times faster than other appliance solutions.
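To put the 10 terabytes per hour figure in network terms, the sketch below converts an aggregate load rate into the average rate each DIA server would need to sustain if the load were spread evenly. It assumes decimal units (1 TB = 10^12 bytes) and an even split across servers; the source does not state how many DIA servers the quoted figure was measured with, so the server counts used here are illustrative.

```python
def per_server_gbps(tb_per_hour, servers):
    """Average per-server rate in gigabits per second for an evenly split load."""
    bytes_per_second = tb_per_hour * 1e12 / 3600  # decimal terabytes
    gigabits_per_second = bytes_per_second * 8 / 1e9
    return gigabits_per_second / servers


if __name__ == "__main__":
    for servers in (4, 8, 12, 16):  # one to four modules
        rate = per_server_gbps(10, servers)
        print(f"{servers} servers: {rate:.2f} Gb/s per server")
```

Under these assumptions, 10 TB per hour is roughly 22 Gb/s in aggregate, so even a single four-server module would average under 6 Gb/s per server, within the capacity of a 10 Gb/s link.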

ENTERPRISE HIGH AVAILABILITY: The Greenplum DIA meets the reliability requirements of the most mission-critical enterprises, with data availability provided by RAID protection at the disk level. As a result, no data is lost when a disk fails within any server.

GREENPLUM PERFORMANCE MONITOR: The DIA is managed via the Greenplum Performance Monitor application, which provides a single view of the Data Computing Appliance and the Data Integration Accelerator from one management console. The system includes Secure Remote Support (call home) and provides email and SNMP notification in the case of any event needing attention.

PROACTIVE EMC ONE SUPPORT STRUCTURE: EMC Customer Support Services provides resources and services to quickly and proactively resolve solution-related issues and questions to ensure business continuity and a highly available data environment. EMC’s global maintenance and support is available around-the-clock via comprehensive online support tools including Live Chat and online service request management, Secure Remote Support (call home), live telephone support, and onsite support through the industry’s leading global field service organization.