Oozie Spark Actions Fail with Error “Spark config without '=': --conf”

Currently, Oozie provides a simple interface for Spark1 jobs through the Spark1 action, so users do not have to embed spark-submit in a shell action. However, I recently discovered a bug in how Oozie parses Spark configurations, which causes it to generate an incorrect spark-submit command when submitting Spark jobs. The Oozie launcher's stderr.log showed the following error:
Error: Spark config without '=': --conf
Run with --help for usage help or --verbose for debug output
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], exit code [1]
The stdout.log also shows the incorrect arguments that Oozie generated for Spark:
  --conf
  spark.yarn.security.tokens.hive.enabled=false
  --conf
  --conf
  spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*:$PWD/*
  --conf
  spark.driver.extraClassPath=$PWD/*
Notice that Oozie generated two consecutive “--conf” flags in the Spark command. spark-submit reads the second “--conf” as the value of the first, and because that value contains no '=', it fails with the error “Spark config without '=': --conf” seen earlier. This is a known issue reported upstream as OOZIE-2923: Oozie incorrectly parses the following configs:
--conf spark.executor.extraClassPath=...
--conf spark.driver.extraClassPath=...
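To make the failure concrete, here is a minimal Python sketch that scans an argument list for a “--conf” flag whose next token has no '='. It is only a simplified illustration of the check that produces the error, not spark-submit's actual parser, and the function name find_bad_conf is hypothetical. The sample list is the argument sequence from the stdout.log above.

```python
def find_bad_conf(args):
    """Return the indices of --conf flags whose following token lacks '='.

    Simplified illustration only; this is not spark-submit's real parser.
    """
    bad = []
    for i, tok in enumerate(args):
        if tok == "--conf":
            # The token after --conf is expected to look like key=value.
            value = args[i + 1] if i + 1 < len(args) else ""
            if "=" not in value:
                bad.append(i)
    return bad


# Argument sequence copied from the launcher's stdout.log.
generated = [
    "--conf", "spark.yarn.security.tokens.hive.enabled=false",
    "--conf",
    "--conf", "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*:$PWD/*",
    "--conf", "spark.driver.extraClassPath=$PWD/*",
]

print(find_bad_conf(generated))  # [2]: the stray --conf at index 2
```

The flag at index 2 is followed by another “--conf” rather than a key=value pair, which matches the message in stderr.log.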
The workaround is to remove the “--conf” in front of the first instance of spark.executor.extraClassPath, so that Oozie adds it for you. For example, if you have the following:

--files /etc/hive/conf/hive-site.xml 
--driver-memory 4G 
--executor-memory 2G 
... 
--conf spark.yarn.security.tokens.hive.enabled=false 
--conf spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*

Simply remove the “--conf” before spark.executor.extraClassPath, so it becomes:

--files /etc/hive/conf/hive-site.xml 
--driver-memory 4G 
--executor-memory 2G 
... 
--conf spark.yarn.security.tokens.hive.enabled=false  
spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*

This avoids the issue. The downside is that if you later upgrade to a version of CDH that contains the fix, you will need to add the “--conf” back. OOZIE-2923 affects CDH 5.10.x, CDH 5.11.0 and CDH 5.11.1; CDH 5.11.2 and CDH 5.12.x and above contain the fix.
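If you maintain many workflows, a small helper can tell you whether the workaround applies and put the flag back after an upgrade. The Python sketch below is only an illustration under stated assumptions. The names is_affected and restore_conf are hypothetical. Releases older than 5.10 are treated as unaffected only because they are not listed above, not because that was verified. Any bare token that starts with "spark." and contains '=' is assumed to be a config that lost its “--conf”.

```python
def is_affected(cdh_version):
    """True for the CDH releases listed as affected by OOZIE-2923:
    5.10.x, 5.11.0 and 5.11.1. Everything else returns False
    (for releases before 5.10 that is an assumption, not a known fact)."""
    parts = [int(p) for p in cdh_version.split(".")]
    major, minor = parts[0], parts[1]
    patch = parts[2] if len(parts) > 2 else 0
    if (major, minor) == (5, 10):
        return True
    if (major, minor) == (5, 11):
        return patch < 2
    return False


def restore_conf(tokens):
    """Re-add '--conf' before bare spark.* key=value tokens.

    Assumes a token starting with 'spark.' and containing '=' that is
    not already preceded by '--conf' had its flag removed as a workaround.
    """
    fixed = []
    for i, tok in enumerate(tokens):
        bare = tok.startswith("spark.") and "=" in tok
        if bare and (i == 0 or tokens[i - 1] != "--conf"):
            fixed.append("--conf")
        fixed.append(tok)
    return fixed


# Options with the workaround applied, as in the example above.
workaround = [
    "--driver-memory", "4G",
    "--conf", "spark.yarn.security.tokens.hive.enabled=false",
    "spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hive/lib/*",
]

if not is_affected("5.12.0"):
    print(" ".join(restore_conf(workaround)))
```

Running restore_conf on options that already carry every “--conf” leaves them unchanged, so it should be safe to apply more than once.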
