How to determine why a simple COUNT(*) query runs slowly
When a simple count query in Hive, such as a COUNT(*) over 2GB of data, takes almost 30 minutes to finish on a reasonably sized cluster of around 10 nodes, how do you determine the cause of the slowness? There are many possible causes of this issue; however, I think the following three …