Impala query failed with error: “Incompatible Parquet Schema”

Yesterday, I was dealing with an issue where a very simple Impala SELECT query failed with an “Incompatible Parquet schema” error. I confirmed the following workflow triggers the error:
  1. A Parquet file is created by an external library
  2. The Parquet file is loaded into a Hive/Impala table
  3. Querying the table through Impala fails with the error message below:
    incompatible Parquet schema for column 'db_name.tbl_name.col_name'.
    Column type: DECIMAL(19, 0), Parquet schema:
    optional byte_array col_name [i:2 d:1 r:0]
  4. The same query works well in Hive
This is because Impala does not currently support all of the Decimal representations that Parquet allows. Parquet supports the following physical types for Decimal:
  • int32: for 1 <= precision <= 9
  • int64: for 1 <= precision <= 18; precision < 10 will produce a warning
  • fixed_len_byte_array: precision is limited by the array size. Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits
  • binary: precision is not limited, but is required. The minimum number of bytes that can store the unscaled value should be used.
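The capacity rule for fixed_len_byte_array above can be sketched in plain Python (the function names here are mine, for illustration only, and do not come from any Parquet library):

```python
import math

def flba_max_precision(n: int) -> int:
    """Max number of base-10 digits a fixed_len_byte_array of n bytes
    can hold: floor(log10(2^(8*n - 1) - 1))."""
    return math.floor(math.log10(2 ** (8 * n - 1) - 1))

def smallest_physical_type(precision: int) -> str:
    """Pick the smallest Parquet physical type able to hold a DECIMAL
    of the given precision, following the rules listed above."""
    if 1 <= precision <= 9:
        return "int32"
    if precision <= 18:
        return "int64"
    # Otherwise, find the minimal byte length whose capacity covers it
    n = 1
    while flba_max_precision(n) < precision:
        n += 1
    return f"fixed_len_byte_array({n})"
```

For example, the DECIMAL(19, 0) column from the error above would need a fixed_len_byte_array of at least 9 bytes (8 bytes only covers 18 digits), and the common DECIMAL(38, x) maps to 16 bytes.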
Please refer to the Parquet Logical Type Definitions page for details. However, Impala only supports fixed_len_byte_array, not the others. This has been reported in the upstream JIRA: IMPALA-2494. The only workaround for now is to create a Parquet file that uses the supported representation for the Decimal column, or simply to create the Parquet file through either Hive or Impala.
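For a feel of what the supported representation actually stores: fixed_len_byte_array keeps the unscaled integer value as a big-endian two’s-complement byte string of a fixed length. A minimal stdlib-only sketch (the helper name is hypothetical, not an API from pyarrow or any Parquet writer):

```python
from decimal import Decimal

def encode_decimal_flba(value: Decimal, scale: int, length: int) -> bytes:
    """Encode a Decimal as Parquet's fixed_len_byte_array representation:
    the unscaled integer (value * 10^scale), big-endian two's complement,
    padded/fixed to exactly `length` bytes."""
    unscaled = int(value.scaleb(scale))
    return unscaled.to_bytes(length, byteorder="big", signed=True)

# e.g. a DECIMAL(19, 0) value stored in 9 bytes
encode_decimal_flba(Decimal("12345"), 0, 9)
```

If the external library writing your Parquet files lets you choose the Decimal encoding, forcing it to this fixed-length form (instead of plain binary) should make the files readable by Impala.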



  1. Louis

    Hi Eric,

    I have exactly this issue while I am writing a partition of a parquet table via Spark. My schema is a combination of strings and decimal(38,17) datatypes.
    Can you elaborate on the supported specs for the Decimal column?
    Which of those is not supported by Impala?
    Thank you very much for your help.



  2. krish

    Hello Eric,

    We recently upgraded Cloudera from 5.6.0 to 5.8.3, and since then we are facing two issues. Since we are out of support with Cloudera, we are hoping for some resolution steps from you.

    1. While running an insert overwrite query, it throws an error like “Spilling has been disabled due to no usable scratch space.”
    2. “.parq’ has an incompatible Parquet schema for column ”. Column type: STRING, Parquet schema:

    1. Eric Lin

      Hi Krish,

      Thanks for visiting my blog and posting questions.

      For 1, this means you have not set up Impala’s scratch directory correctly, so the disk-spilling feature won’t work.

      If you are using Cloudera Manager, please go to CM > Impala > Configuration > “Impala Daemon Scratch Directories” and confirm whether any values have been set.

      For 2, this looks like you have a mismatched column type between Impala/Hive and the Parquet file. Your comment seems to be cut off, as I don’t see anything after “Parquet schema:”. Can you check the data type of that column in Parquet and then update the table in Hive/Impala to match it?


  3. krish

    Eric, thanks for the reply.

    For 1, this means you have not setup Impala’s scratch directory correctly, so disk spilling feature won’t work.
    Response: the scratch directories have been set and were working fine with CDH 5.6; after upgrading to 5.8.3, some of the insert overwrite queries are throwing the spilling issue.

    we have already set like this

    /datanode1/impala/impalad until /datanode10/

  4. krish

    Hello Eric,

    I found something: after the upgrade of Cloudera Manager from 5.6 to 5.8.3, all the components were upgraded to 5.8.3, but supervisord still has the 5.6 version, just like this:

    CDH 5.8.3, Parcels — After upgrade
    Supervisord 3.0-cm5.6.0 —After upgrade

    Basically, to start and stop the processes, we rely on an open source system called supervisord. It takes care of redirecting log files, notifying us of process failures, and setuid’ing to the right user.

    So, I’m assuming that when we run insert overwrite, supervisord tries to set the permissions and ownership on the scratch directory as per CDH 5.8.3, but since supervisord is still on 5.6.0, it breaks against 5.8.3 while listening to the Cloudera agent 5.6.0. Is this the reason for the “Spilling has been disabled due to no usable scratch space” issue?

    Please let me know if my understanding is correct. If yes, can we do a hard restart of the Cloudera agent to bring supervisord from 5.6.0 up to 5.8.3?

    1. Eric Lin

      Hi Krish,

      I highly doubt that the supervisord version issue would cause Impala to behave like this, but regardless you need to get supervisord to the correct version. Yes, try a hard restart to see if it helps.

      Do you have an Impala daemon log with the error message that you can share with me? For example, by putting it temporarily in a Dropbox shared folder and removing it once I am done?

      I just would like to check exactly what Impala says.


