Greenplum Database DBA Handbook - Preview

===============================================================================================

ALL eBOOKS are FREE!!!!!!! for our registered users.  Click here to know about our membership program and become a member. 

===============================================================================================

GREENPLUM DATABASE DBA 

HANDBOOK

Sachchida N. Ojha

First Edition

-

Covers

DCA V1 and V2

Greenplum Database 4.2.x

Data Domain and Greenplum Chorus

CHAPTER 1 - INTRODUCTION

About the author

The author of this book has 23 years of experience in database architecture, administration and troubleshooting.

Greenplum Database Fundamental Concepts

Greenplum is a solution built to support the next generation of data warehousing.  It is built for large scale analytics processing.

Recent years Greenplum database becomes a preferred choice for deep analytics. It is designed for large-scale analytics processing and business intelligence.

What is Big Data?

Big Data refers to the tools, processes, and procedures used to create, manipulate, and manage very large data sets.

What is Data Warehouse?

A data warehouse is a culmination of information gathered about the enterprise.  The information can consist of variety of data including inventory data, HR data, sales data, operational data used to support decision making.  

In a data warehouse, data must be

Basic Data flow of the Data Warehouse

The basic elements of the data warehouse includes,

Data Warehouse Methodologies

Two commonly used methods to model a data warehouse are,

Dimensional model divides transactional data into facts and dimensions.  This is the most common data warehouse model. Normalized model stores data in tables.  The level of normalization for this approach is not as high as typical OLTP data models.

Dimensional Design

Information systems fall into two major categories,

Operational System

Operational system supports the execution of business process by capturing details about events and transactions. For example:  Capturing information about orders, shipments, and returns.

The activities recorded by these systems are called transactions. The system themselves are sometimes called online transaction processing system (OLTP).

Because of the focused on process execution, the operational system updates the data as things change, purge or archive data once its operational usefulness has ended.

The optimal schema design for an operational system which is widely accepted is in 3rd normal form.

Analytic System

Analytic system supports the evaluation of the process.  For example: How the sales orders are trending this month versus last month? Who are our best customers? Sales goal versus actual sales?  Is particular promotion working or not?

The interaction with an analytic system takes place exclusively through queries that retrieve data about business processes.  Here information is neither created nor modified.

Difference between operational systems and analytical systems are highlighted below.

                                   Operational Systems                   Analytical Systems

Table 1: Operational System vs. Analytic System

Star Schema

Star schema consists of fact and dimension tables. The fact table holds the main data. The dimension tables describe the facts presented in the fact table. Most queries against a star schema follow a consistent pattern. One or more facts are requested, along with the dimensional attributes that provide the desired context. The focus is on summarized data.

The star schema gets its name from its appearance; when drawn with the fact table in the center, it looks like a star or asterisk.

Fact Table

At the core of the star schema is the fact table. In addition to presenting the facts, the fact table includes surrogate keys that refer to each of the associated dimension tables. Each row in the fact table stores facts at a specific level of details. This level of detail is known as the fact table’s grain.

Dimension Tables

In a star schema, a dimension table contains columns representing dimensions. These columns provide context for facts. A well-developed set of dimension tables provides powerful and diverse analytic capabilities. Dimension tables with a large number of attributes maximize analytic value.

Snowflake Schema

In the snowflake schema, multiple centralized fact tables connect to multiple dimension tables. A single dimension table may have multiple parent tables.

ETL

ETL means extract, transform and load.  You extract the data, perform the transformation on the ETL machine, and then load the data. ETL is tied to the hardware. There are many ETL tools available in the market.

ELT

ELT means extract, load, and then transform.  This strategy is strongly suggested for large data. While this is powerful, it is difficult to integrate with current tools.

===============================================================================================

ALL eBOOKS are FREE!!!!!!! for our registered users.  Click here to know about our membership program and become a member. 

===============================================================================================