IMPALA-6792: Fail status reporting if coordinator refuses connections
author Sailesh Mukil <sailesh@cloudera.com>
Tue, 3 Apr 2018 21:24:21 +0000 (14:24 -0700)
committer Impala Public Jenkins <impala-public-jenkins@gerrit.cloudera.org>
Wed, 11 Apr 2018 22:56:00 +0000 (22:56 +0000)
commit da3437a31b28c7fe598baf0f81e780e7f1dc82d5
tree d1587c926904d496a1b41118ebbffd83e642e2a3
parent bd63208bfcfcfa893b979e76358cd40f71114979

The ReportExecStatusAux() function is run on a dedicated thread per
fragment instance. This thread will run until the fragment instance
completes executing.

Each time the thread sends a report to the coordinator, it tries the
RPC up to 3 times. If all 3 attempts fail, the fragment instance
cancels itself.

However, there is one case where a failure to send the RPC is not
counted as a failed RPC. If obtaining a connection creates a new one
(via ClientCache::CreateClient()) rather than returning a previously
cached one, and that new connection fails to even Open(), the failure
is not counted as an RPC failure.

This patch counts such an error as a failed RPC too.

This patch also clarifies some of the error log messages and introduces
a flag to control the sleep interval between failed ReportExecStatus RPC
retries.
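
The retry behavior described above can be sketched roughly as follows.
This is a minimal illustration, not the code in query-state.cc: the
names ReportResult, SendReportWithRetries, open_connection, send_rpc and
retry_sleep are hypothetical stand-ins, and the sleep interval is passed
as a plain parameter rather than read from the new flag.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical stand-ins: none of these names come from query-state.cc.
struct ReportResult {
  bool delivered;       // true if some attempt succeeded
  int failed_attempts;  // attempts that failed, for any reason
};

// Tries to deliver one status report, making at most max_attempts attempts.
// open_connection models obtaining a client (cached or newly created);
// send_rpc models the RPC itself. Both return true on success.
inline ReportResult SendReportWithRetries(
    const std::function<bool()>& open_connection,
    const std::function<bool()>& send_rpc,
    int max_attempts,
    std::chrono::milliseconds retry_sleep) {
  assert(max_attempts > 0);
  ReportResult result{false, 0};
  for (int attempt = 0; attempt < max_attempts; ++attempt) {
    // A connection that cannot be opened counts as a failed attempt,
    // exactly like an RPC that fails after the connection is up.
    if (open_connection() && send_rpc()) {
      result.delivered = true;
      return result;
    }
    ++result.failed_attempts;
    // Sleep between retries, but not after the final attempt.
    if (attempt + 1 < max_attempts) std::this_thread::sleep_for(retry_sleep);
  }
  return result;  // All attempts failed; the caller decides what to do next.
}
```

In this sketch, a caller that gets back delivered == false would cancel
the fragment instance, which is what lets a coordinator that refuses
connections eventually fail the instance instead of leaving the
reporting thread retrying indefinitely.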

Change-Id: If668838f99f78b5ffa713488178b2eb5799ba220
Reviewed-on: http://gerrit.cloudera.org:8080/9916
Reviewed-by: Sailesh Mukil <sailesh@cloudera.com>
Tested-by: Impala Public Jenkins
be/src/runtime/query-state.cc