Recently I discovered that performance logging was missing from both the HiveServer2 and HiveMetaStore server logs. This makes troubleshooting performance-related issues very hard. The log messages I was expecting look something like this:

HiveServer2 Log:

2020-01-02 08:30:26,450 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-898872]: 
2020-01-02 08:30:26,507 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-898872]: 

The above log tells me that the getSplit operation took 57 milliseconds to complete.

HiveMetaStore Log:

2020-01-10 09:59:37,151 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]: 
2020-01-10 09:59:37,157 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]: 

The above log tells me that HMS spent 6 milliseconds getting the list of tables for a certain database.
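The two durations quoted above can be cross-checked from the log timestamps alone. Below is a minimal sketch, assuming each line starts with a 23-character timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` layout shown in the samples; the helper names `parse_log_timestamp` and `elapsed_ms` are made up for illustration and are not part of Hive.

```python
from datetime import datetime, timedelta

# Timestamp layout seen at the start of each sample log line.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
LOG_TS_LENGTH = 23  # length of "2020-01-02 08:30:26,450"


def parse_log_timestamp(line: str) -> datetime:
    """Parse the leading timestamp of a log line (hypothetical helper)."""
    return datetime.strptime(line[:LOG_TS_LENGTH], LOG_TS_FORMAT)


def elapsed_ms(begin_line: str, end_line: str) -> int:
    """Return whole milliseconds between two log lines (hypothetical helper)."""
    delta = parse_log_timestamp(end_line) - parse_log_timestamp(begin_line)
    return delta // timedelta(milliseconds=1)


# The HiveServer2 pair from the sample log above.
hs2_begin = "2020-01-02 08:30:26,450 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-898872]: "
hs2_end = "2020-01-02 08:30:26,507 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-898872]: "

# The HiveMetaStore pair from the sample log above.
hms_begin = "2020-01-10 09:59:37,151 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]: "
hms_end = "2020-01-10 09:59:37,157 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]: "

print(elapsed_ms(hs2_begin, hs2_end))  # 57
print(elapsed_ms(hms_begin, hms_end))  # 6
```

This only measures the gap between two lines that you have already paired up by thread name; it does not try to match begin and end entries automatically.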

If the number reported against “duration” is high, we know exactly which stage the slowness comes from while troubleshooting. However, this information is missing from the HiveServer2 and HiveMetaStore logs in CDH6.

To remedy this, follow the steps below:

  1. Go to the Cloudera Manager home page
  2. Click through to the Hive service and then its Configuration page
  3. Search for the two configurations below:
    1. Hive Metastore Server Logging Advanced Configuration Snippet (Safety Valve)
    2. HiveServer2 Logging Advanced Configuration Snippet (Safety Valve)
  4. Enter the below contents into the text area of the above-mentioned settings:
  5. Save, then restart the Hive services
  6. Check both the HS2 and HMS logs to confirm that performance logging is in place
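For the final check, one quick way to confirm that performance logging is back is to count the lines emitted by the `org.apache.hadoop.hive.ql.log.PerfLogger` logger after the restart. Below is a minimal sketch; the function name `count_perf_log_lines` and the log path in the commented usage are hypothetical, so adjust them for your cluster.

```python
from typing import Iterable

# Logger name as it appears in the sample HS2 and HMS log lines.
PERF_LOGGER = "org.apache.hadoop.hive.ql.log.PerfLogger"


def count_perf_log_lines(lines: Iterable[str]) -> int:
    """Count the log lines written by PerfLogger (hypothetical helper)."""
    return sum(1 for line in lines if PERF_LOGGER in line)


# Two PerfLogger lines from the sample above, plus one made-up
# unrelated line that exists purely for illustration.
sample = [
    "2020-01-10 09:59:37,151 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]: ",
    "2020-01-10 09:59:37,157 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [pool-5-thread-28]: ",
    "an unrelated log line, for illustration only",
]

print(count_perf_log_lines(sample))  # 2

# Usage against a real file (the path is an assumption, not a known default):
# with open("/path/to/hiveserver2.log") as log_file:
#     print(count_perf_log_lines(log_file))
```

A count of zero after the restart suggests the logging change has not taken effect on that role.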
