Installing Greenplum SNE on Ubuntu Linux

Post date: Apr 23, 2013 7:42:13 PM

Before installing Greenplum 4 on Ubuntu, you have to apply a few additional tricks.

1# convince the Greenplum installer that it is running on RedHat/CentOS

echo "Trick for install Greenplum 4" > /etc/redhat-release

2# install the libnuma library (if not present)

apt-get install libnuma1

3# uncomment the line containing "session required pam_limits.so" in /etc/pam.d/su
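
Step 3 can also be scripted. The following is a minimal sketch, assuming the line is commented out with a leading `#` as on a stock Ubuntu install; `enable_pam_limits` is a made-up helper name, not part of Greenplum or Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: uncomment the pam_limits.so line in a PAM file.
# The file is a parameter so the edit can be tried on a copy first.
enable_pam_limits() {
    pam_file="${1:-/etc/pam.d/su}"
    # Drop the leading "#" only on the "session required pam_limits.so" line;
    # sed keeps the untouched original as "$pam_file.bak".
    sed -i.bak \
        's/^[[:space:]]*#[[:space:]]*\(session[[:space:]][[:space:]]*required[[:space:]][[:space:]]*pam_limits\.so.*\)$/\1/' \
        "$pam_file"
}
```

Run it as root against the real file, for example `enable_pam_limits /etc/pam.d/su`. Other commented lines are left alone because the pattern only matches the `pam_limits.so` entry.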

4# use the following enhanced version of the fix-libs.sh script

#!/bin/bash

if [ -z "$GPHOME" ] || [ ! -d "$GPHOME/lib" ]; then

echo "Missing or wrong GPHOME environment variable";

exit 255

fi

cd $GPHOME/lib

# libraries shipped with Greenplum SNE

gplibs="$(find -maxdepth 1 -type f | cut -f 2 -d /)"

# libraries with same abi installed via dpkg

deblibs="$(dpkg -S $gplibs 2> /dev/null | cut -f 2 -d ' ')"

# we remove the greenplum one to avoid "no version information available" errors

for lib in $deblibs; do

ver=$(basename $lib)

rm -fv $ver

while [ "$ver" = "${ver%.so}" ] && [ "$ver" != "${ver%.so*}" ]; do

ver=${ver%.*}

rm -fv $ver

done

done
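
To see what the suffix-stripping idea in the loop does before letting it delete anything, here is a small dry-run sketch. `so_names` is a hypothetical helper written for this illustration: it only prints names, and it stops at the bare `.so` name.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the chain of file names derived from one
# library name, from the full version down to the bare ".so" name.
so_names() {
    name=$1
    echo "$name"
    # Keep stripping the last ".component" while something follows ".so".
    while [ "$name" = "${name%.so}" ] && [ "$name" != "${name%.so*}" ]; do
        name=${name%.*}
        echo "$name"
    done
}

so_names libz.so.1.2.3
```

For `libz.so.1.2.3` this prints `libz.so.1.2.3`, `libz.so.1.2`, `libz.so.1` and `libz.so`, one per line. A name without `.so` in it is printed once and left alone.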

After the installation you can remove the file /etc/redhat-release if you want. I have tested this procedure on a freshly installed 64-bit Ubuntu 10.10 server.

Officially, Greenplum Database Single Node Edition (SNE) can only be installed on Red Hat Enterprise Linux (RHEL) and SUSE Linux Enterprise Server (SLES), but while surfing the web I have seen many requests on how to install it on Debian/Ubuntu. Here I’m trying to give you some advice.

Before installing Greenplum Database CE, you need to adjust the following OS configuration parameters:

Set the following parameters in the `/etc/sysctl.conf` file:

kernel.shmmax = 500000000

kernel.shmmni = 4096

kernel.shmall = 4000000000

kernel.sem = 250 64000 100 512

net.ipv4.tcp_tw_recycle=1

net.ipv4.tcp_max_syn_backlog=4096

net.core.netdev_max_backlog=10000

vm.overcommit_memory=2

To activate these parameters you can either run `sudo sysctl -p` or reboot the system.
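
After reloading, it is worth checking that the kernel really uses the values from the file. The sketch below is one possible way to do that; `wanted_sysctl` and `check_sysctl` are hypothetical helper names, and the comparison relies on `sysctl -n` printing multi-value keys such as `kernel.sem` separated by tabs.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a sysctl.conf-style file with the
# values the running kernel reports.

# Print each setting as "key=value" with surrounding blanks removed.
wanted_sysctl() {
    awk -F= '
        /^[ \t]*[#;]/ || NF < 2 { next }   # skip comments and lines without a value
        {
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            gsub(/[ \t]+/, " ", val)
            print key "=" val
        }' "${1:-/etc/sysctl.conf}"
}

# Report every setting whose running value differs from the file.
check_sysctl() {
    wanted_sysctl "$1" | while IFS='=' read -r key want; do
        have=$(sysctl -n "$key" 2>/dev/null | tr -s '\t' ' ')
        [ "$have" = "$want" ] || echo "$key: file says '$want', kernel says '${have:-unset}'"
    done
}
```

Calling `check_sysctl /etc/sysctl.conf` prints nothing when all settings match. A key that the running kernel does not know is reported as `unset`.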

Set the following parameters in the `/etc/security/limits.conf` file:

* soft nofile 65536

* hard nofile 65536

* soft nproc 131072

* hard nproc 131072

In the file /etc/hosts comment out the line beginning with `::1`, as it could confuse the database when it resolves the hostname for localhost. Also make sure that both localhost and your hostname are resolvable to a local address.
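
The `/etc/hosts` change can be sketched as a small function. `disable_ipv6_loopback` is a hypothetical name; the function takes the file as a parameter so it can be tried on a copy, and running it twice does not add a second `#`.

```shell
#!/bin/sh
# Hypothetical helper: comment out the IPv6 loopback entry in a hosts file.
disable_ipv6_loopback() {
    hosts_file="${1:-/etc/hosts}"
    # Prefix "#" to lines whose first field is ::1; a backup is kept as *.bak.
    sed -i.bak 's/^\([[:space:]]*::1[[:space:]].*\)$/#\1/' "$hosts_file"
}
```

Run it as root with `disable_ipv6_loopback /etc/hosts`. Afterwards, `getent hosts localhost "$(hostname)"` should list a local address for both names.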

You have now finished preparing the environment for your Greenplum Database SNE. The next step is to create the user account that will administer your installation; this user is usually called gpadmin.

sudo adduser --gecos "Greenplum Administrator" gpadmin

At this point you have to download or copy the installer file to the system. You should choose the RHEL installer for your architecture. I have an x86_64 system, so from now on I will use it as the example.

To start the installation run the following commands (you need the unzip program installed):

unzip greenplum<versionx>.zip

sudo bash greenplum<version>.bin

Follow the on screen instructions. Accept the license and choose the installation path. The default one is fine. The installer will create a `greenplum-db` symbolic link one directory level above your chosen installation directory. The symbolic link is used to facilitate patch maintenance and upgrades between versions. From now on the install location will be referred to as `$GPHOME`.

Change the ownership of the installation so that it is owned by the gpadmin user and group.

sudo chown -R gpadmin:gpadmin $GPHOME

Now is the time to choose the data directory location. Nothing explains how to choose it better than the official quick-start guide:

Every Greenplum Database CE instance has a designated storage area on disk that is called the data directory location. This is the file system location where the database data is stored. In the Greenplum Database CE, you initialize a Greenplum Database CE master instance and two or more segment instances on the same system, each requiring a data directory location. These directories should have sufficient disk space for your data and be owned by the gpadmin user.

Remember that the data directories of the segment instances are where the user data resides, so they must have enough disk space to accommodate your planned data capacity. For the master instance, only the system catalog tables and system metadata are stored in the master data directory.

For this guide we will use the default layout, with the master (`/gpmaster`) and two segments (`/gpdata1` and `/gpdata2`). 

sudo mkdir /gpmaster /gpdata1 /gpdata2

sudo chown gpadmin:gpadmin /gpmaster /gpdata1 /gpdata2
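
Since the quoted guide requires the directories to be owned by gpadmin, a quick pre-flight check can save a failed initialization later. This is only a sketch; `check_data_dirs` is a hypothetical helper that reads the owner from the third column of `ls -ld`.

```shell
#!/bin/sh
# Hypothetical pre-flight check: each directory must exist and belong to the
# given user. Returns non-zero if any directory fails the check.
check_data_dirs() {
    owner=$1
    shift
    status=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "$dir: missing"
            status=1
        elif [ "$(ls -ld "$dir" | awk '{print $3}')" != "$owner" ]; then
            echo "$dir: not owned by $owner"
            status=1
        fi
    done
    return $status
}
```

With the layout used here the call would be `check_data_dirs gpadmin /gpmaster /gpdata1 /gpdata2`, which prints nothing and returns 0 when everything is in place.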

A `greenplum_path.sh` file is provided in your `$GPHOME` directory with environment variable settings for Greenplum Database SNE. You should source this in the gpadmin user’s startup shell profile (such as `.bashrc`) adding a line like the following:

source /usr/local/greenplum-db/greenplum_path.sh
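
If you script this step, it helps to add the line only when it is not already present, so that repeated runs do not fill `.bashrc` with duplicates. A minimal sketch, where `append_once` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if it is not there yet.
append_once() {
    line=$1
    file=$2
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

As the gpadmin user you could then run `append_once 'source /usr/local/greenplum-db/greenplum_path.sh' ~/.bashrc`. The same helper works for any other line you want in the profile exactly once.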

Before continuing, we need a little magic to avoid failures when running Ubuntu programs against the libraries shipped with Greenplum CE.

#!/bin/sh

cd $GPHOME/lib

# libraries shipped with Greenplum CE

gplibs="$(find -maxdepth 1 -type f | cut -f 2 -d /)"

# libraries with same abi installed via dpkg

deblibs="$(dpkg -S $gplibs 2> /dev/null | cut -f 2 -d ' ')"

# we remove the greenplum one to avoid "no version information available" errors

for lib in $deblibs; do

rm -f $(basename $lib)

done

It’s now time to initialize the database system. All the following steps are to be executed as the gpadmin user.

su - gpadmin

cp $GPHOME/docs/cli_help/single_hostlist_example ./single_hostlist

cp $GPHOME/docs/cli_help/gp_init_singlenode_example ./gp_init_singlenode

If you do not want to use the default configuration, data directory locations, ports, or other configuration options, edit the `gp_init_singlenode` file and enter your configuration settings.

Run the gpssh-exkeys utility to exchange ssh keys for the local host:

gpssh-exkeys -h 127.0.0.1 -h localhost

Run the following command to initialize the database:

gpinitsystem -c gp_init_singlenode

The utility verifies your setup information and makes sure that the data directories specified in the `gp_init_singlenode` configuration file are accessible. If all of the verification checks are successful, the utility prompts you to confirm the configuration before creating the system.

At the end of a successful setup, the utility starts your system. You should see:

Greenplum Database instance successfully created.

The management utilities require that you set the `MASTER_DATA_DIRECTORY` environment variable. This should specify the directory created by the gpinitsystem utility in the master data directory location.

echo "export MASTER_DATA_DIRECTORY=/gpmaster/gpsne-1" >> ~/.bashrc

source ~/.bashrc

Now you can connect to the master database using the psql client program:

psql postgres

Please note that a system installed by following this guide should be considered an **evaluation platform only** and is not intended for production installations of Greenplum Database.