Recently I discovered that the performance logs were missing from both the HiveServer2 and HiveMetaStore server logs. This makes troubleshooting performance-related issues very hard. The log messages that I am expecting look something like the following:
2020-01-02 08:30:26,450 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-898872]:
2020-01-02 08:30:26,507 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-898872]:
The log above tells me that the getSplit operation took 57 milliseconds to complete.
2020-01-10 09:59:37,151 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]:
2020-01-10 09:59:37,157 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]:
The log above tells me that HMS spent 6 milliseconds getting the list of tables for a certain database.
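The durations quoted above can also be cross-checked from the timestamps at the start of each pair of log lines. The following is only a minimal sketch: the helper names `parse_timestamp` and `elapsed_ms` are made up for illustration, and it assumes every line begins with a timestamp in the same layout as the samples above.

```python
from datetime import datetime, timedelta

# Timestamp layout at the start of the sample log lines,
# e.g. "2020-01-02 08:30:26,450" (a comma precedes the milliseconds).
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = len("2020-01-02 08:30:26,450")


def parse_timestamp(log_line: str) -> datetime:
    """Parse the leading timestamp of a log line (hypothetical helper)."""
    return datetime.strptime(log_line[:TS_LENGTH], TS_FORMAT)


def elapsed_ms(start_line: str, end_line: str) -> int:
    """Return the whole milliseconds between two log lines' timestamps."""
    delta = parse_timestamp(end_line) - parse_timestamp(start_line)
    # Integer division by a one-millisecond timedelta avoids float rounding.
    return delta // timedelta(milliseconds=1)


# The HiveServer2 pair from the sample above.
hs2_start = "2020-01-02 08:30:26,450 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-898872]:"
hs2_end = "2020-01-02 08:30:26,507 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-898872]:"
print(elapsed_ms(hs2_start, hs2_end))  # 57

# The HMS pair from the sample above.
hms_start = "2020-01-10 09:59:37,151 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]:"
hms_end = "2020-01-10 09:59:37,157 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]:"
print(elapsed_ms(hms_start, hms_end))  # 6
```

This only measures the gap between two log timestamps, so it is a sanity check rather than a replacement for the duration that PerfLogger itself reports.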
If the “duration” values are high, we know exactly which stage the slowness comes from while troubleshooting. However, this information is missing from the HiveServer2 and HiveMetaStore logs in CDH6.
To remedy this, follow the steps below:
- Go to the Cloudera Manager home page
- Click through to the Hive service and then its Configuration page
- Search for the two configurations below:
  - Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)
  - HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)
- Enter the contents below into the text area of both settings:
- Save, then restart the Hive service
- Check both the HS2 and HMS logs to confirm that performance logging is in place
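The final check can be scripted by counting the lines emitted by the PerfLogger class. This is a rough sketch, not an official tool: the function name `count_perf_entries` is hypothetical, and the only thing it relies on is the logger name `org.apache.hadoop.hive.ql.log.PerfLogger` seen in the sample lines earlier.

```python
from typing import Iterable

# Logger name as it appears in the sample log lines.
PERF_LOGGER = "org.apache.hadoop.hive.ql.log.PerfLogger"


def count_perf_entries(lines: Iterable[str]) -> int:
    """Count log lines that were written by PerfLogger (hypothetical helper)."""
    return sum(1 for line in lines if PERF_LOGGER in line)


# Small in-memory example; in practice, pass an open log file instead.
sample = [
    "2020-01-10 09:59:37,151 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]:",
    "2020-01-10 09:59:37,152 INFO some.other.Logger: [pool-5-thread-28]: unrelated message",
    "2020-01-10 09:59:37,157 INFO org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]:",
]
print(count_perf_entries(sample))  # 2
```

To run it against a real log, open the HS2 or HMS log file and pass the file object to `count_perf_entries`. The log location depends on the deployment, so it is left to the caller. A count of zero after the restart suggests the setting has not taken effect.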