gpfdist error

Resolved gpfdist error in GPDB
  1. The Greenplum Database gpfdist utility failed with a SIGSEGV error when the utility received an empty request containing two consecutive newline characters “\n\n”.
  2. In some cases when network load was heavy, the Greenplum Database utility gpfdist intermittently failed with this error: gpfdist closed connection to server
Known bugs in GPDB
  1. When a query joins an external table that uses the gpfdist protocol with a heap table, the planner might choose an incorrect plan that returns no results. Workaround: This can be avoided by running ANALYZE on the query before running the query.
  2. When gpfdist loads data in text format, the bytenum field of the error log (the byte offset in the load file where the error occurred) is not populated, making it difficult to find the location of an error in the source file.
  3. Running more than one gpfilespace command concurrently to move either temporary files (--movetempfilespace) or transaction files (--movetransfilespace) to a new filespace can in some circumstances cause OID inconsistencies. Workaround: Do not run more than one gpfilespace command at the same time. If an OID inconsistency is introduced, gpfilespace --movetempfilespace or gpfilespace --movetransfilespace can be used to revert to the default filespace.
  4. gpfdist shows the error “Address already in use” after successfully binding to an IPv6 socket. Greenplum supports both IPv4 and IPv6. However, gpfdist fails to bind to the IPv4 socket and shows the message “Address already in use”, while binding successfully to the IPv6 socket.

gpfdist error code = 104 (Connection reset by peer)

posted Nov 19, 2014, 6:12 AM by Sachchida Ojha   [ updated Aug 17, 2016, 5:47 AM by Sachi Ojha ]

Connection reset by peer errors can occur when there is high network packet loss. They may also occur because the ETL hosts are exhausting their TCP listening queue.
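
One way to check whether the listening queue on an ETL host is actually overflowing is to read the ListenOverflows and ListenDrops counters that Linux exposes in /proc/net/netstat. The following is a minimal sketch under that assumption; the function name listen_queue_counters is hypothetical and not part of gpfdist or Greenplum.

```shell
#!/bin/sh
# listen_queue_counters [FILE]
# Print the ListenOverflows and ListenDrops counters from the TcpExt section
# of a netstat-style file. FILE defaults to /proc/net/netstat (Linux only).
# The function name is illustrative, not a Greenplum or gpfdist tool.
listen_queue_counters() {
    file="${1:-/proc/net/netstat}"
    awk '
        # First TcpExt line holds the column names: remember each position.
        $1 == "TcpExt:" && !seen {
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        # Second TcpExt line holds the values in the same column order.
        $1 == "TcpExt:" && seen {
            printf "ListenOverflows=%s ListenDrops=%s\n", \
                $(col["ListenOverflows"]), $(col["ListenDrops"])
            exit
        }
    ' "$file"
}
```

Running listen_queue_counters before and after a load and comparing the two readings gives a rough indication: counters that grow during the load suggest the accept queue is filling up, while flat counters point toward packet loss or another cause.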

Setting somaxconn to 1024 is a low-risk operation and is typically recommended for web server applications, which gpfdist essentially is. When the kernel's TCP listening queue is exhausted, the kernel rejects new incoming TCP sessions. If the backlog argument passed to listen() is greater than the value in /proc/sys/net/core/somaxconn, it is silently truncated to that value.
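
The truncation rule can be sketched as a small shell helper that prints the backlog the kernel would end up using for a given request. This is only an illustration of the rule described here; the function name effective_backlog is hypothetical, and it assumes a Linux host when the second argument is omitted.

```shell
#!/bin/sh
# effective_backlog REQUESTED [SOMAXCONN]
# Print the backlog that would take effect: the requested value, capped at
# somaxconn. A negative request (as in listen(fd, -1)) is treated as "use
# the maximum". SOMAXCONN defaults to the live kernel value on Linux.
# The function name is illustrative only.
effective_backlog() {
    requested="$1"
    limit="${2:-$(cat /proc/sys/net/core/somaxconn)}"
    if [ "$requested" -lt 0 ] || [ "$requested" -gt "$limit" ]; then
        echo "$limit"
    else
        echo "$requested"
    fi
}
```

For example, effective_backlog 4096 128 prints 128, which is why raising the backlog in an application alone has no effect until somaxconn is raised as well.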

Setting net.core.somaxconn higher than the default is only needed on very heavily loaded servers, where the connection rate is so high or bursty that having 128 concurrent pending connections (even more on BSDs: 128 backlog + 64 half-open) is not considered abnormal, or when you need to delegate the definition of what is normal to the people writing the application or its config.
Some administrators use a high net.core.somaxconn to hide problems with their services, so that from the user's point of view a process stall looks like a latency spike instead of an interrupted or timed-out connection (controlled by net.ipv4.tcp_abort_on_overflow in Linux).

The real cause is either slow processing of some requests (e.g. a single-threaded blocking server) or an insufficient number of worker threads/processes in the software (e.g. multi-process or multi-threaded blocking software like Apache).
PS. Also, as the listen(2) manual says, net.core.somaxconn acts only as an upper bound for an application, which is free to choose something smaller (usually set in the app's config), though some apps just use listen(fd, -1), which sets the backlog to the maximum.

PPS. Sometimes it's preferable to fail fast and let the load balancer do its job rather than make the user wait. For that purpose we set net.core.somaxconn to a high value like 4096, but limit the application backlog to something small like 10 and set net.ipv4.tcp_abort_on_overflow to 1.


gp_external_max_segs = 64

/etc/sysctl.conf set on ETL nodes only:
net.core.somaxconn = 1024

Changing the somaxconn without rebooting:
echo 1024 > /proc/sys/net/core/somaxconn
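
The echo above changes only the running kernel, so the value is lost on reboot unless it is also written to /etc/sysctl.conf. As a hedged sketch, the helper below replaces or appends a single key in a sysctl.conf-style file; the function name set_sysctl_entry is hypothetical, and it needs write access to the target file (root for /etc/sysctl.conf).

```shell
#!/bin/sh
# set_sysctl_entry KEY VALUE [FILE]
# Replace any existing "KEY = ..." line in FILE with "KEY = VALUE", or
# append the line if the key is absent. FILE defaults to /etc/sysctl.conf.
# The function name is illustrative only.
set_sysctl_entry() {
    key="$1"
    value="$2"
    file="${3:-/etc/sysctl.conf}"
    touch "$file" || return 1
    tmp="$(mktemp)" || return 1
    # Copy every line except an existing assignment to this exact key.
    awk -v k="$key" '{
        line = $0
        sub(/^[ \t]+/, "", line)          # ignore leading whitespace
        split(line, part, /[ \t]*=/)      # part[1] is the key name
        if (part[1] != k) print $0
    }' "$file" > "$tmp"
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Calling set_sysctl_entry net.core.somaxconn 1024 as root and then running sysctl -p reloads the file, so the persistent setting and the running value agree.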