17/05/03 03:20:06 INFO yarn.ApplicationMaster: Waiting for spark context initialization 17/05/03 03:20:06 INFO yarn.ApplicationMaster: Waiting for spark context initialization ... 17/05/03 03:20:06 INFO client.RemoteDriver: Connecting to:We can see from above log that the timeout happened 2 seconds after the attempted connection, which does not make too much sense that the timeout value is such short period. After digging further into code, it turned out that this timeout is controlled by hive’s setting called hive.spark.client.connect.timeout. The default value for this setting is 1000ms, which is only 1 second, which explained the cause. This issue only happens when cluster is on high load and HiveServer2 is not able to respond back to ApplicationMaster within 1 second and then connection will timeout. To by pass this issue, we can simply increase this timeout value to, say, 5 seconds::35915 17/05/03 03:20:08 ERROR yarn.ApplicationMaster: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915 java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915 at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:37) at org.apache.hive.spark.client.RemoteDriver. (RemoteDriver.java:156) at org.apache.hive.spark.client.RemoteDriver.main(RemoteDriver.java:556) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542) Caused by: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915 at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe$1.run(AbstractNioChannel.java:220) at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:120) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) 17/05/03 03:20:08 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915) 17/05/03 03:20:16 ERROR yarn.ApplicationMaster: SparkContext did not initialize after waiting for 100000 ms. Please check earlier log output for errors. Failing the application. 17/05/03 03:20:16 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: java.util.concurrent.ExecutionException: io.netty.channel.ConnectTimeoutException: connection timed out: /172.19.22.11:35915) 17/05/03 03:20:16 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1492040605432_11445 17/05/03 03:20:16 INFO util.ShutdownHookManager: Shutdown hook called
SET hive.spark.client.connect.timeout=5000; # Your query hereI have reported such issue upstream in JIRA: HIVE-16794, and I will submit a patch to increase this timeout setting soon.