Spark jobs failed with delegation token renewal error


An Oozie Spark job failed with the following error:
Job aborted due to stage failure: Task 103 in stage 194576.0 failed 4 times, most recent failure: Lost task 103.3 in stage 194576.0 
(TID 119674041, ): org.apache.hadoop.ipc.RemoteException($InvalidToken): 
token (token for sparkpse: HDFS_DELEGATION_TOKEN owner=@HADOOP.CHARTER.COM, renewer=yarn, realUser=, issueDate=1482494610879, 
maxDate=1483099410879, sequenceNumber=274718, masterKeyId=166) can't be found in cache 
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke( 
at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
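The token dump in the error already says a lot about the token's lifetime. Hadoop prints issueDate and maxDate as epoch milliseconds. The minimal sketch below decodes the two values copied from the log above. The helper names are hypothetical, written only for this illustration. The gap between the two values is exactly 7 days. That matches the HDFS default for dfs.namenode.delegation.token.max-lifetime (604800000 ms), although the log does not show this cluster's actual setting.

```python
import re
from datetime import datetime, timedelta, timezone

# Token fields copied verbatim from the error message above.
LOG_TOKEN = (
    "issueDate=1482494610879, maxDate=1483099410879, "
    "sequenceNumber=274718, masterKeyId=166"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_token_fields(text: str) -> dict[str, int]:
    """Hypothetical helper: extract key=value integer fields from a token dump."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", text)}


def to_utc(epoch_ms: int) -> datetime:
    """Convert epoch milliseconds to a UTC datetime using exact integer math."""
    return EPOCH + timedelta(milliseconds=epoch_ms)


fields = parse_token_fields(LOG_TOKEN)
issued = to_utc(fields["issueDate"])
max_date = to_utc(fields["maxDate"])
lifetime = max_date - issued

print("issued:  ", issued.isoformat())
print("maxDate: ", max_date.isoformat())
print("lifetime:", lifetime / timedelta(days=1), "days")  # → 7.0 days
```

A job that runs longer than this window, or one whose token is not renewed within the renewal interval, will eventually send the NameNode a token it no longer recognizes. The result is the "can't be found in cache" error shown above.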
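This happens with long-running Spark jobs in a Kerberized environment: checkpointing to HDFS fails because the HDFS delegation token is not renewed properly. The workaround is to add "--conf spark.hadoop.fs.hdfs.impl.disable.cache=true" to the Spark job's command-line parameters. This disables Hadoop's FileSystem cache for hdfs:// paths on the Spark side, so Spark does not keep reusing a cached HDFS client that may still hold the stale token.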
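Placement of the flag matters. spark-submit takes the form `spark-submit [options] <app jar> [app arguments]`, so anything after the application jar goes to the application, not to Spark. The `spark.hadoop.` prefix tells Spark to copy the rest of the key (`fs.hdfs.impl.disable.cache`) into the job's Hadoop configuration. Below is a minimal sketch that builds such a command from a Python wrapper. The jar name, main class, master/deploy-mode values, and application arguments are hypothetical placeholders.

```python
import shlex

# The workaround from this post: disable Hadoop's FileSystem cache for hdfs://.
WORKAROUND_CONF = "spark.hadoop.fs.hdfs.impl.disable.cache=true"


def build_spark_submit(app_jar: str, main_class: str, app_args: list[str]) -> list[str]:
    """Return a spark-submit argv with the workaround applied.

    --conf must appear before app_jar; spark-submit passes everything after
    the jar to the application itself.
    """
    return [
        "spark-submit",
        "--master", "yarn",            # hypothetical cluster settings
        "--deploy-mode", "cluster",
        "--class", main_class,
        "--conf", WORKAROUND_CONF,
        app_jar,
        *app_args,
    ]


# Hypothetical job; pass cmd to subprocess.run(cmd, check=True) to launch it.
cmd = build_spark_submit("my-job.jar", "com.example.MyJob", ["--date", "2016-12-23"])
print(shlex.join(cmd))
```

For an Oozie Spark action like the one in this post, the same `--conf spark.hadoop.fs.hdfs.impl.disable.cache=true` string goes into the action's `<spark-opts>` element rather than onto a shell command line.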


