Gpfdist sessions are dying while running multiple gpload sessions from Informatica

Post date: Oct 10, 2014 5:00:31 PM

Debugging a little more and looking at the log, we found that Informatica is launching 3 gpload processes in parallel, and each gpload process is launching 10 gpfdist processes. The gpload processes and their associated gpfdist processes all run on the same server that happens to be running the PowerCenter job. Informatica writes to a named pipe and gpfdist reads from the named pipe. We suspect the 30 gpfdist processes are stressing the kernel or network buffers/devices, which is causing the connection fault. Typically, when loading data with this many processes, we would spread the gpfdist processes across multiple servers. We may only need to tweak the kernel OS parameters, but we need the logs to know for sure.

gpload has a "-V" option for very verbose output, and I believe there is a drop-down in Informatica that allows you to enable this functionality.
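To compare against what Informatica produces, the same very verbose output can be requested when gpload is driven from a script. This is only a sketch: the control file and log file names below are hypothetical, and only the -f, -l and -V options come from the gpload utility itself.

    import subprocess

    # Minimal sketch, assuming gpload is on the PATH of the ETL server.
    result = subprocess.run(
        ["gpload", "-f", "etl_load.yml",      # gpload control file (hypothetical name)
         "-l", "gpload_very_verbose.log",     # log file to hand over for analysis
         "-V"],                               # very verbose output
        capture_output=True, text=True,
    )
    print(result.stdout)
    print(result.stderr)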

Collect the following information before and after job execution:

 

Collect this information before and after:

Collect this information just after the job:

Collect this information while the job is running:
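As an illustration of the kind of OS and network state worth capturing on the ETL server at each of these points, a small helper like the one below could snapshot a few counters into timestamped files. The specific commands are assumptions about what is typically useful here, not a required checklist.

    # Rough sketch (assumed commands): snapshot OS/network state on the ETL
    # server so the before/during/after runs can be compared side by side.
    import subprocess, time, pathlib

    SNAPSHOT_CMDS = [               # assumed typical diagnostics; adjust as needed
        "netstat -s",               # protocol counters (drops, resets, overflows)
        "ss -s",                    # socket summary
        "ps -ef | grep [g]pfdist",  # how many gpfdist processes are really running
        "ulimit -a",                # per-process limits on the ETL server
        "sysctl net.core.somaxconn net.ipv4.ip_local_port_range",
    ]

    def snapshot(tag):
        out = pathlib.Path(f"diag_{tag}_{int(time.time())}.txt")
        with out.open("w") as f:
            for cmd in SNAPSHOT_CMDS:
                f.write(f"===== {cmd} =====\n")
                proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
                f.write(proc.stdout + proc.stderr + "\n")
        return out

    # e.g. snapshot("before"), snapshot("during"), snapshot("after")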

If we have 3 concurrent gpload jobs, each with 10 gpfdist processes, and, say, the default gp_external_max_segs value of 64, then we are looking at 64 * 10 * 3 = 1920 concurrent sessions with only one ETL server.
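The arithmetic behind that estimate is simple enough to check directly; the sketch below just restates the numbers from the paragraph above (and the reduced figure discussed next).

    # Concurrent-session estimate from the numbers above.
    gpload_jobs = 3            # concurrent gpload jobs launched by Informatica
    gpfdist_per_job = 10       # gpfdist processes per gpload job
    gp_external_max_segs = 64  # default: segments allowed per gpfdist instance

    print(gp_external_max_segs * gpfdist_per_job * gpload_jobs)  # 1920 sessions on one ETL server
    print(32 * gpfdist_per_job * gpload_jobs)                    # 960 if the GUC is lowered to 32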

You could try setting the gp_external_max_segs GUC to 32, which would reduce the number of concurrent sessions to 960, but it could impact large-table load times. The server configuration parameter "gp_external_max_segs" can be set in the postgresql.conf file of your master instance. It will require a master system restart.

Gpfdist is a file server program that uses the HTTP protocol to serve files in parallel, and it provides the best performance when loading or unloading data in a Greenplum database. Primary segments access external files in parallel when using gpfdist, up to the value of the gp_external_max_segs GUC.

In general, when optimizing gpfdist performance, maximize the parallelism as the number of segments increases. Spread the data evenly across as many nodes as possible. Split very large data files into equal parts and spread the data across as many file systems as possible. Run two gpfdist instances per file system. Gpfdist tends to be CPU bound on the segment nodes when loading, but if, for example, there are 8 racks of segment nodes, there is a lot of available CPU on the segment side, so you can drive more gpfdist processes. Run gpfdist on as many interfaces as possible (be aware of bonded NICs and be sure to start enough gpfdist instances to keep them busy). It is important to keep the work even across all of these resources. In an MPP shared-nothing environment, the load is only as fast as the slowest node. Skew in the load file layout will cause the overall load to bottleneck on that resource.
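For the "split very large data files into equal parts" advice, standard tooling works fine; the sketch below is just one hypothetical way to split a load file into N roughly equal, line-aligned parts so each gpfdist instance serves a similar amount of data. The file path is made up for illustration.

    # Hypothetical helper: split a large load file into n line-aligned parts
    # so each gpfdist instance serves roughly the same volume of data.
    def split_load_file(path, n_parts):
        outputs = [open(f"{path}.part{i}", "w") for i in range(n_parts)]
        try:
            with open(path) as src:
                for i, line in enumerate(src):
                    outputs[i % n_parts].write(line)  # round-robin lines across parts
        finally:
            for f in outputs:
                f.close()

    # e.g. split_load_file("/data1/load/big_file.dat", 4) -> big_file.dat.part0..part3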

The gp_external_max_segs server configuration parameter controls the number of segment instances that can access a single gpfdist instance simultaneously. Setting a low value might affect gpload performance. This parameter sets the number of segments that will scan external table data during an external table operation, the purpose being to avoid overloading the system with data scanning and taking resources away from other concurrent operations. It only applies to external tables that use the gpfdist:// protocol to access external table data.

Controlling Segment Parallelism

You can use this server configuration parameter to control how many segment instances access a single gpfdist program at a time; the default is 64. This allows you to control the number of segments processing external table files while reserving some segments for other database processing. The parameter can be set in the postgresql.conf file of your master instance.

gp_external_max_segs controls the number of segments each gpfdist instance serves. The default is 64. Always keep gp_external_max_segs and the number of gpfdist processes at an even factor (gp_external_max_segs divided by the number of gpfdist processes should have a remainder of 0). The way this works, for example, if there are 12 segments and 4 gpfdist instances, the planner round-robins the assignment as follows (a small sketch of this assignment logic appears after the list):

Seg 1 - gpfdist 1
Seg 2 - gpfdist 2
Seg 3 - gpfdist 3
Seg 4 - gpfdist 4
Seg 5 - gpfdist 1
Seg 6 - gpfdist 2
Seg 7 - gpfdist 3
Seg 8 - gpfdist 4
Seg 9 - gpfdist 1
Seg 10 - gpfdist 2
Seg 11 - gpfdist 3
Seg 12 - gpfdist 4
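The same round-robin rule can be written out for any segment and gpfdist count, which is an easy way to see why an even factor avoids leaving some gpfdist instances with more segments than others. This is only a sketch of the assignment pattern described above, not how the planner is actually implemented.

    # Sketch of the round-robin assignment described above (12 segments, 4 gpfdists).
    def assign_segments(n_segments, n_gpfdists):
        # segment i is served by gpfdist ((i - 1) mod n_gpfdists) + 1, 1-based for readability
        return {seg: (seg - 1) % n_gpfdists + 1 for seg in range(1, n_segments + 1)}

    for seg, gpf in assign_segments(12, 4).items():
        print(f"Seg {seg} - gpfdist {gpf}")
    # With 12 % 4 == 0 every gpfdist serves exactly 3 segments; an uneven
    # factor (e.g. 12 segments, 5 gpfdists) would leave the load skewed.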