Yarn Job Failed with Error: “Split metadata size exceeded 10000000”

When you run a very big job in Hive, it may fail with the following error:
2016-06-28 18:55:36,830 INFO [Thread-58] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Setting job diagnostics to Job init failed : org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_1465344841306_1317
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1568)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1432)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.transition(JobImpl.java:1390)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:996)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.handle(JobImpl.java:138)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$JobEventDispatcher.handle(MRAppMaster.java:1289)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1057)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1500)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: java.io.IOException: Split metadata size exceeded 10000000. Aborting job job_1465344841306_1317
at org.apache.hadoop.mapreduce.split.SplitMetaInfoReader.readSplitMetaInfo(SplitMetaInfoReader.java:53)
at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl$InitTransition.createSplits(JobImpl.java:1563)
... 17 more
This indicates that the value of mapreduce.job.split.metainfo.maxsize is too small for your job (the default value is 10000000). There are two options to fix this:

1. Set mapreduce.job.split.metainfo.maxsize to “-1” (unlimited) for this job only, just before running it:
SET mapreduce.job.split.metainfo.maxsize=-1;
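For example, in a Hive session the property only needs to be set once and it applies to every query that follows; the table and query below are hypothetical, just to show where the SET statement goes:

-- Lift the split metadata limit for this session only,
-- then run the large query that previously failed (hypothetical query).
SET mapreduce.job.split.metainfo.maxsize=-1;
SELECT event_date, COUNT(*) FROM my_big_table GROUP BY event_date;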
This removes the limit. Be warned, however, that it effectively lets YARN create unlimited split metadata, and if there are not enough resources on your cluster, it has the potential to bring down hosts.

2. The safer way is to increase the value, for example to double the default of 10000000:
SET mapreduce.job.split.metainfo.maxsize=20000000;
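To confirm which value is currently in effect, run SET with just the property name and Hive prints the current value back:

SET mapreduce.job.split.metainfo.maxsize;
-- should print: mapreduce.job.split.metainfo.maxsize=20000000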
You could gradually increase the value and monitor your cluster to make sure that it does not bring down your machines.

I have seen other posts on Google suggesting that the value of mapreduce.job.split.metainfo.maxsize be set in the mapred-site.xml configuration file (a sketch of what that entry looks like is at the end of this post). In my opinion, this only affects a small number of queries running against very BIG data sets, so it is better to set the value at the job level, so that no cluster restart is required.

Please note that if you are using MapReduce V1, the setting is mapreduce.jobtracker.split.metainfo.maxsize instead, which does the same thing.
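For reference, if you do decide to go the cluster-wide route, a Hadoop property in mapred-site.xml takes the following form; the value shown is just an example, not a recommendation, and the configuration has to be redeployed for it to take effect:

<!-- Example entry in mapred-site.xml; value is illustrative only. -->
<property>
  <name>mapreduce.job.split.metainfo.maxsize</name>
  <value>20000000</value>
</property>

Hope this helps.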

