Recently I was working on an issue where Oozie was not picking up Spark’s configuration, which caused the job to fail. I knew it was not loading Spark’s configuration because Spark had “spark.authenticate=true” set in its configuration file, /etc/spark/conf/spark-defaults.conf:
$ head /etc/spark/conf/spark-defaults.conf
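Rather than reading the output of head by eye, a single key can also be checked from a script. The bash sketch below does that. get_spark_conf is a hypothetical helper name, and it assumes the simple line format of spark-defaults.conf ("key value" or "key=value", "#" comments, later lines overriding earlier ones); it does not cover every edge case of Java properties files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of KEY from a spark-defaults.conf-style
# file (default: /etc/spark/conf/spark-defaults.conf). Accepts "key value",
# "key=value" and "key = value" lines, skips blank lines and '#' comments,
# and lets a later line override an earlier one. Returns 1 if KEY is not set.
get_spark_conf() {
  local key="$1"
  local file="${2:-/etc/spark/conf/spark-defaults.conf}"
  local line name rest value="" found=1

  while IFS= read -r line || [[ -n "$line" ]]; do
    # Drop leading whitespace, then skip blank lines and comments.
    line="${line#"${line%%[![:space:]]*}"}"
    [[ -z "$line" || "$line" == \#* ]] && continue

    # The key is everything before the first '=' or whitespace character.
    name="${line%%[=[:space:]]*}"
    [[ "$name" == "$key" ]] || continue

    # Strip the separator (spaces, an optional '=', spaces) and any
    # trailing whitespace, leaving just the value.
    rest="${line#"$name"}"
    rest="${rest#"${rest%%[![:space:]]*}"}"
    rest="${rest#=}"
    rest="${rest#"${rest%%[![:space:]]*}"}"
    value="${rest%"${rest##*[![:space:]]}"}"
    found=0
  done < "$file"

  (( found == 0 )) && printf '%s\n' "$value"
  return "$found"
}
```

For example, get_spark_conf spark.authenticate should print true on a host configured like the one above.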
I also confirmed that the Oozie job failure could be resolved by adding “--conf spark.authenticate=true” to the Spark Action in the workflow.xml file. In theory, if Spark already has this setting, Oozie should simply pick it up.
When I checked Oozie’s configuration file, oozie-site.xml, I noticed that the property Oozie needs in order to load Spark’s configuration was missing: oozie.service.SparkConfigurationService.spark.configurations. Without this property, Oozie cannot load Spark’s default settings and apply them to jobs run through the Spark Action.
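To check whether a given oozie-site.xml defines this property, a quick line-oriented sketch like the one below can help. get_xml_property is a hypothetical helper, not part of Oozie, and it is not a real XML parser: it assumes the usual Hadoop-style layout in which each property's value tag sits on one line, on or after its name line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> of the Hadoop-style <property> whose
# <name> is exactly NAME in FILE, and return 1 if no such value is found.
# Line-oriented sketch: assumes <value>...</value> is on a single line,
# on or after the matching <name> line.
get_xml_property() {
  local file="$1" name="$2"
  awk -v want="<name>${name}</name>" '
    index($0, want) { hit = 1 }                 # entered the wanted property
    hit && match($0, /<value>.*<\/value>/) {
      # Strip the 7-char "<value>" and 8-char "</value>" tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      found = 1
      exit
    }
    /<\/property>/ { hit = 0 }                  # left the property without a value
    END { exit !found }
  ' "$file"
}
```

Running get_xml_property against an oozie-site.xml that lacks the property prints nothing and returns 1, which matches the situation described above.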
If you are using Cloudera Manager, the fix is easy. Go to:
Cloudera Manager > Oozie > Configuration > search for “Spark on Yarn Service”
Then select “Spark” instead of “none” and restart Oozie.
After the restart, open the oozie-site.xml file used by the running Oozie process and confirm that the oozie.service.SparkConfigurationService.spark.configurations property is now present.
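On a host managed by Cloudera Manager, the agent generates each role's configuration into a numbered process directory, so the file to inspect is the one in the newest Oozie server directory. The sketch below assumes the agent's default root /var/run/cloudera-scm-agent/process, directory names of the form "<id>-oozie-OOZIE_SERVER", and that a higher id means a more recent start; all three are assumptions to verify on your own hosts, and latest_oozie_site is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of oozie-site.xml in the newest Oozie
# server process directory under ROOT. Assumes Cloudera Manager-style names
# such as "<numeric-id>-oozie-OOZIE_SERVER", where a higher id is a newer start.
latest_oozie_site() {
  local root="${1:-/var/run/cloudera-scm-agent/process}"
  local d id best="" best_id=-1

  for d in "$root"/[0-9]*-oozie-OOZIE_SERVER; do
    # Skip an unmatched glob and directories without a generated oozie-site.xml.
    [[ -f "$d/oozie-site.xml" ]] || continue
    id="${d##*/}"          # directory name, e.g. "12-oozie-OOZIE_SERVER"
    id="${id%%-*}"         # numeric prefix, e.g. "12"
    [[ "$id" =~ ^[0-9]+$ ]] || continue
    # Compare numerically (base 10) so that 12 is newer than 9.
    if (( 10#$id > best_id )); then
      best_id=$((10#$id))
      best="$d"
    fi
  done

  [[ -n "$best" ]] || return 1
  printf '%s\n' "$best/oozie-site.xml"
}

# Usage (reading the agent's process directories usually requires root):
#   site=$(latest_oozie_site) &&
#     grep -A 1 '<name>oozie.service.SparkConfigurationService.spark.configurations</name>' "$site"
```

The numeric comparison matters because a plain lexical sort would rank a directory such as 9-oozie-OOZIE_SERVER above 12-oozie-OOZIE_SERVER.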
After this change, Oozie should pick up Spark’s default configuration automatically, without the need to specify it manually for every Spark Action.