Writable external table initialization and file permission

1. By default the writable external file is created with permission set “600”, which means that nobody but gpadmin can actually read this file.

2. It is not advisable to change any configuration parameter to change the default behavior due the data security reasons. I am pretty sure that there is no such parameter in the configuration file. I will double check it .

3. Admin can always change the file permission to 666 as required.

Please see a sample test case below.

[gpadmin@sachi ~]$ gpfdist -d /home/gpadmin/loadfiles/ -p 8081 &

[1] 553

[gpadmin@sachi ~]$ Serving HTTP on port 8081, directory /home/gpadmin/loadfiles
[gpadmin@sachi ~]$ ps -ef|grep gpfdist
gpadmin 553 32674 0 09:26 pts/3 00:00:00 gpfdist -d /home/gpadmin/loadfiles/ -p 8081
gpadmin 555 32674 0 09:26 pts/3 00:00:00 grep gpfdist
[gpadmin@sachi ~]$ psql -d sachi

psql (8.2.15)
Type "help" for help.


sachi=# create WRITABLE EXTERNAL TABLE sachitest (
sachi(# name varchar(20), age int)
sachi-# LOCATION ('gpfdist://sachi:8081/ext_sachi.out')
sachi-# FORMAT 'CUSTOM' (formatter=fixedwidth_out, name=20, age=4, line_delim=E'\r\n');

CREATE EXTERNAL TABLE
sachi=# 

[gpadmin@sachi ~]$ ps -ef|grep gpfdist
gpadmin 553 32674 0 09:26 pts/3 00:00:00 gpfdist -d /home/gpadmin/loadfiles/ -p 8081
gpadmin 897 32674 0 09:37 pts/3 00:00:00 grep gpfdist
[gpadmin@sachi ~]$ ls -l /home/gpadmin/loadfiles/
total 4

-rw-------. 1 gpadmin gpadmin 52 Jan 31 09:35 ext_sachi.out

[gpadmin@sachi ~]$ cd /home/gpadmin/loadfiles/
[gpadmin@sachi loadfiles]$ ll

total 4
-rw-------. 1 gpadmin gpadmin 52 Jan 31 09:35 ext_sachi.out

[gpadmin@sachi loadfiles]$ chmod 666 ext_sachi.out
[gpadmin@sachi loadfiles]$ ll

total 4
-rw-rw-rw-. 1 gpadmin gpadmin 52 Jan 31 09:35 ext_sachi.out

[gpadmin@sachi loadfiles]$

Here Requiring admin action for every extract will in effect render impossible serving flat file extracts to the clients. Also, it would make it impossible to run ETL that would extract data from GP and then load that flat file to Oracle as a second step after extracting data. In short, we need a solution to a problem of automatically creating flat file extract readable by Informatica user via NFS. It might be something as simple as monitoring a directory on DIA and setting up correct permissions for newly created files. 

An alternative approach: If the file referenced by the writable external table already exists (and gpadmin already has write permissions), then those the data will be written without modifying those permissions. Assuming the umask of the gpadmin user on the DIA is something reasonable (for example, 0002), then the target file could be created prior to the extraction.

In our example above, the gpadmin user on the DIA could create the extract file at the Linux command line (and leave it empty) prior to starting up the gpfdist service:

touch /home/gpadmin/loadfiles/ext_sachi.out

As a one-time verification, at this point we can check the permissions of the empty file to make sure gpadmin is creating files that the world can see. If so, the rest of the example can be executed and the need to kick around file permissions at the end of the process would be precluded. Since the file already exists at the time of the unload operation, the existing permissions will be respected.

One thing to be aware of. Neither gpfdist nor the insert into statement will truncate existing files so they will continue to "grow" which may not be your desired outcome. A simple workaround for this is to issue this statement in your control framework prior to starting a "new" unload:

echo > /home/gpadmin/loadfiles/ext_sachi.out

This will empty the file out thus ensuring this "unload" only contains this run's data.
Comments