This article explains how to troubleshoot the issue where the Hive Metastore dies with no apparent error message in its log.
Cloudera Manager (CM) has a mechanism to kill processes that exceed their memory allocations. So when this scenario happens, it usually means that the Hive Metastore ran into memory issues and the process was killed by CM.
The reason no errors appear in the server logs is that the server process was killed by the CM agent process and never had a chance to log the error.
The actual errors captured by CM are located in /var/run/cloudera-scm-agent/process/XXX-hive-HIVEMETASTORE/logs/*, where XXX is a number identifying the process directory allocated to the server daemon each time it is restarted.
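For example, a quick way to search all of the Metastore process directories at once is a plain grep over that path (a minimal sketch, assuming the default agent path above; the wildcard pattern is illustrative, not CM-specific tooling):

grep -l 'OutOfMemoryError' /var/run/cloudera-scm-agent/process/*-hive-HIVEMETASTORE/logs/*

This lists every log file under those directories that mentions an OutOfMemoryError, so you can jump straight to the most recent one.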
If you find an error like the following in those files:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/hive_hive-HIVEMETASTORE-592629fdad78ff2feb384c5dfbac4c5b_pid$$.hprof ...
Heap dump file created [102740174 bytes in 0.777 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
# Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...
you can confirm that the Hive Metastore is hitting an OutOfMemoryError (OOM).
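If you want to dig deeper, the heap dump the JVM wrote to /tmp (the .hprof file named in the log above) can be loaded into a heap analysis tool to see which objects were consuming the memory. A minimal sketch, assuming a JDK 8 or earlier installation where the jhat tool is still shipped (the file name below is a placeholder for your actual dump):

jhat -port 7000 /tmp/hive_hive-HIVEMETASTORE-<hash>_pid<pid>.hprof

jhat then serves a browsable heap report on http://localhost:7000/.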
To fix it, log into CM and increase the Hive Metastore heap size to 2-4 GB, depending on your cluster size, and then restart the service.
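As a quick sanity check after the restart, you can confirm that the new heap limit took effect by looking for the -Xmx flag on the running JVM. A minimal sketch, assuming the Metastore runs under the standard HiveMetaStore main class (the [H] trick simply stops grep from matching itself):

ps -ef | grep '[H]iveMetaStore' | grep -o -- '-Xmx[^ ]*'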
The same troubleshooting technique also applies to HiveServer2 and other server processes managed by CM.
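For example, assuming HiveServer2 process directories follow the same naming convention (XXX-hive-HIVESERVER2 under the same agent path), the earlier search becomes:

grep -l 'OutOfMemoryError' /var/run/cloudera-scm-agent/process/*-hive-HIVESERVER2/logs/*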