gpfdists (a secured version of gpfdist protocol)
gpfdists
The gpfdists protocol is a secure version of gpfdist, which enables encrypted communication and secure identification of the file server and the Greenplum Database to protect against attacks such as eavesdropping and man-in-the-middle attacks.
The protocol implements SSL security in a client/server scheme, with the following notable features:
Client certificates are required.
Multi-lingual certificates are not supported.
A Certificate Revocation List (CRL) is not supported.
The TLSv1 protocol is used with the TLS_RSA_WITH_AES_128_CBC_SHA encryption algorithm. These SSL parameters cannot be changed.
SSL renegotiation is supported.
The SSL ignore host mismatch parameter is set to false.
Private keys containing a passphrase are not supported for the gpfdist file server (server.key) and for the Greenplum Database (client.key).
Issuing certificates that are appropriate for the operating system in use is the users responsibility. Generally, converting certificates as shown in https://www.sslshopper.com/ssl-converter.html is supported.
- There may be a little performance impact seen on the gpfdists process, but i would suggest to test it on your cluster for a similar load while using gpfdist and gpfdists. Apologies, but we do not have a benchmark numbers, and the variation percentage may vary based on the volume of data, cluster size etc.
- Below are the steps which are required to implement gpfdists.
Step 1: Create a folder under the segment data directory & master data directory with a name called gpfdists and move the below files already created:
- The client certificate file, client.crt
- The client private key file, client.key
- The trusted certificate authorities, root.crt
Note: You can identify the segment data directory loaction using the below sql:
select fselocation,hostname from pg_filespace_entry pf, gp_segment_configuration gp where pf.fsedbid=gp.dbid;
Step 2 : Create an external table with gpfdists protocol. Example:
CREATE EXTERNAL TABLE ext_expenses ( name text,
date date, amount float4, category text, desc1 text )
LOCATION ('gpfdists://etlhost-1:8081/*.txt',
'gpfdists://etlhost-2:8082/*.txt')
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ') ;
Step 3: Put data under the path specified. Example:
Create a file call a1.txt under /var/load_files with the required delimiters
Step 4: Execute gpfdist service:
gpfdist -d /var/load_files -p 8081 --ssl $MASTER_DATA_DIRECTORY/gpfdists
Step 5: Fetch data from external table.
Note: You can also use gpload with ssl option true, More details on YAML structure on the administration guide.
- security is a feature / option provided by gpfdist service. We will not be able to block the usage of execution of gpfdist. Use having the priviliges to execute gpfdist can run the service without ssl options. Security must be implemented at user level as well.
Note: You may further use iptables to strengthen the access to IP / port on which data is served via gpfdist.
Testing gpfdists - Nov 01, 2014 4:12:2 PM