UNION ALL returns zero record with aggregate functions in SELECT statement

UNION ALL returns zero record with aggregate functions in SELECT statement

This article explains how to solve the issue that UNION ALL returns zero record with aggregate functions in SELECT statement The following is the test case:
> CREATE TABLE test (a int);

> INSERT OVERWRITE TABLE test VALUES (1);

> SET hive.optimize.union.remove=true;
No rows affected (0.01 seconds)

> SET hive.mapred.supports.subdirectories=true;
No rows affected (0.007 seconds)

> SELECT COUNT(1) FROM test UNION ALL SELECT COUNT(1) FROM test;
+----------+--+
| _u1._c0  |
+----------+--+
+----------+--+
UNION ALL without COUNT function will work as expected:
> select * from test UNION ALL SELECT * FROM test;
+--------+--+
| _u1.a  |
+--------+--+
| 1      |
| 1      |
+--------+--+
Run the same query without setting hive.mapred.supports.subdirectories and hive.optimize.union.remove to true will give correct result:
> set hive.optimize.union.remove;
+-----------------------------------+--+
|                set                |
+-----------------------------------+--+
| hive.optimize.union.remove=false  |
+-----------------------------------+--+

> SELECT COUNT(1) FROM test UNION ALL SELECT COUNT(1) FROM test;
+----------+--+
| _u1._c0  |
+----------+--+
| 1        |
| 1        |
+----------+--+
This issue is now tracked by JIRA: HIVE-12788 hive.optimize.union.remove is OFF by default at the time of writing, and it requires that hive.mapred.supports.subdirectories to be ON to work properly because It creates sub-directories in the final output. If hive.optimize.union.remove is ON, most likely that someone has set it ON manually by either updating the hive-site.xml or saved in the safety valve in Cloudera Manager’s Hive Configuration page:

  hive.optimize.union.remove
  true
  

Simply remove it from the hive-site.xml or Cloudera Manager’s safety valve for Hive Client and HiveServer2 should resolve the issue. Check the following in Cloudera Manager to make sure all of them do not have references to hive.optimize.union.remove: Cloudera Manager > Hive > Configuration > Hive Service Advanced Configuration Snippet (Safety Valve) for hive-site.xml Cloudera Manager > Hive > Configuration > HiveServer2 Advanced Configuration Snippet (Safety Valve) for hive-site.xml Cloudera Manager > Hive > Configuration > Hive Client Advanced Configuration Snippet (Safety Valve) for hive-site.xml If you don’t use Cloudera Manager, you can simply remove the setting from hive-site.xml.

Leave a Reply

Your email address will not be published.

My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!