Impala Query Failed with ERROR  “AnalysisException: ORDER BY expression not produced by aggregation output”

Impala Query Failed with ERROR “AnalysisException: ORDER BY expression not produced by aggregation output”

Recently, I discovered a bug in Impala that when you are using Expression in the ORDER BY clause, the query will fail with below error message:
ERROR: AnalysisException: ORDER BY expression not produced by aggregation output (missing from GROUP BY clause?): (CASE WHEN TRUE THEN 1 ELSE a END)
Customer used a very complicated query, and I managed to simplify it to look something like below:
DROP TABLE IF EXISTS test;
CREATE TABLE test (a int);

SELECT   ( 
    CASE 
       WHEN (1 =1) 
       THEN 1
       ELSE a
    end) AS b
FROM  test 
GROUP BY 1 
ORDER BY ( 
    CASE 
       WHEN (1 =1) 
       THEN 1
       ELSE a
    end);
This can be re-produced from CDH5.13.x onward. Since I can also re-produce in latest CDH 5.15.x at the time of writing, I went ahead to create a bug report in upstream JIRA: IMPALA-7083. As you can see that mentioned in the JIRA, the workaround is to disable ENABLE_EXPR_REWRITES via:
SET ENABLE_EXPR_REWRITES=false;
This is on by default in latest release. Another workaround, which is a better approach in my opinion, is to replace the expression using number:
DROP TABLE IF EXISTS test;
CREATE TABLE test (a int);

SELECT   ( 
    CASE 
       WHEN (1 =1) 
       THEN 1
       ELSE a
    end) AS b
FROM  test 
GROUP BY 1 
ORDER BY 1;
This will also makes query simpler and easier to read. However, lots of users use third party software like SAS etc to run queries and they have no control over the query generation, then setting ENABLE_EXPR_REWRITES to false is the way to go. Hope above helps.

Loading

Leave a Reply

Your email address will not be published. Required fields are marked *

My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!