Spark jobs failed with delegation token renewal error – Hadoop Troubleshooting Guide

An Oozie Spark job failed with the following error:

Job aborted due to stage failure: Task 103 in stage 194576.0 failed 4 times, most recent failure: Lost task 103.3 in stage 194576.0 
(TID 119674041, ): org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): 
token (token for sparkpse: HDFS_DELEGATION_TOKEN owner=@HADOOP.CHARTER.COM, renewer=yarn, realUser=, issueDate=1482494610879, 
maxDate=1483099410879, sequenceNumber=274718, masterKeyId=166) can't be found in cache 
at org.apache.hadoop.ipc.Client.call(Client.java:1471) 
at org.apache.hadoop.ipc.Client.call(Client.java:1408) 
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) 
at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)

This is caused by long running Spark job in a kerberized environment the checkpointing fails as Token is not renewed properly. The workaround is to add “–conf spark.hadoop.fs.hdfs.impl.disable.cache=true” to Spark job command line parameters to disable the token cache from spark side.

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Other posts that you might also be interested:

Leave a Reply Cancel reply