Impala query memory estimates are wrong for a SELECT query with LIMIT clause

Impala query memory estimates are wrong for a SELECT query with LIMIT clause

This article explains the workarounds to by pass the Impala memory estimation issue when doing a simple “SELECT * FROM table LIMIT 10” query. A request memory estimate error is being thrown if the Impala query has ‘LIMIT’ clause.
SELECT * FROM rating LIMIT 1;
ERROR: Rejected query from pool root.Impala_pool1 : request memory estimate 24.00 GB is greater than pool limit 19.53 GB.
The same query is working properly if the “LIMIT” clause is removed. This happens when impala query has LIMIT clause with value less than 100, and Impala will try to run the query on a single node (the coordinator itself), and there is currently a bug in Impala that does wrong memory estimation in this scenario, see: IMPALA-2841. And it affects the latest CDH5.5.x release (at the time of writing). There are two way to by pass the error: 1) change to LIMIT >= 100. For example: SELECT * FROM table_name LIMIT 100; This will force Impala to run query as normal, rather than on a single node, so the memory estimation will be correct. 2) set MEM_LIMIT to a reasonable value. For example: SET MEM_LIMIT=1g; Once MEM_LIMIT is set, it overrides the automatic memory estimate from Impala, so the estimation will be skipped and the bug will be by passed.

Leave a Reply

Your email address will not be published.

My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!