Query Type: DML Query State: EXCEPTION Query Status: Memory limit exceeded Sql Statement: insert OVERWRITE TABLESELECT id, MIN(_id) AS uuid FROM WHERE id IS NOT NULL AND _id IS NOT NULL GROUP BY id This query was big, and used more than 60GB of memory, in theory with GROUP BY, Impala should trigger disk spilling when it was about to reach the memory limit. However, this was not the case:
BlockMgr - AsyncTotalTime: 0 - BlockWritesOutstanding: 0 - BlocksCreated: 0 - BlocksRecycled: 0 - BufferedPins: 0 - BytesWritten: 0 - InactiveTotalTime: 0 - MaxBlockSize: 8388608 - MemoryLimit: 65283502080 - PeakMemoryUsage: 0 - TotalBufferWaitTime: 0 - TotalEncryptionTime: 0 - TotalIntegrityCheckTime: 0 - TotalReadBlockTime: 0 - TotalTime: 0You can see that BytesWritten is “0”, which indicated that no disk spilling happened. By checking the cluster setting, I found that that disk spilling was disabled in the safety valve in Cloudera Manager:--enable_partitioned_aggregation=false --enable_partitioned_hash_join=false“enable_partitioned_aggregation” controls whether to enable disk spilling when doing aggregation, while “enable_partitioned_hash_join” controls disk spilling when doing a hash join. “enable_partitioned_aggregation” was disabled which caused the mentioned query to fail with Out of Memory exception. To re-enable the disk spilling for impala, following the instructions below if you are using Cloudera Manager: Set the following in the safety valve:--enable_partitioned_aggregation=true --enable_partitioned_hash_join=trueOr even just remove the following:--enable_partitioned_aggregation=false --enable_partitioned_hash_join=falsewill re-enable the disk spilling, as it is by default. We confirmed that the disk spilling happened and query succeeded after the change and restarted Impala:BlockMgr - AsyncTotalTime: 0 - BlockWritesOutstanding: 0 - BlocksCreated: 3613 - BlocksRecycled: 3379 - BufferedPins: 48 - BytesWritten: 2904251616 - InactiveTotalTime: 0 - MaxBlockSize: 8388608 - MemoryLimit: 65283502080 - PeakMemoryUsage: 29098027690 - TotalBufferWaitTime: 0 - TotalEncryptionTime: 0 - TotalIntegrityCheckTime: 0 - TotalReadBlockTime: 6839466872 - TotalTime: 0
✕My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!
This article explains one of the possible reasons that caused Impala query to fail with Out of Memory Exception.
The following is the profile from failed query: