- A Parquet file is created by an external library
- Load the Parquet file into a Hive/Impala table
- Querying the table through Impala fails with the error message below
incompatible Parquet schema for column 'db_name.tbl_name.col_name'. Column type: DECIMAL(19, 0), Parquet schema:\noptional byte_array col_name [i:2 d:1 r:0]
- The same query works well in Hive

The Parquet specification allows DECIMAL to annotate the following physical types:
- int32: for 1 <= precision <= 9
- int64: for 1 <= precision <= 18; precision < 10 will produce a warning
- fixed_len_byte_array: precision is limited by the array size. Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits
- binary: precision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.
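
To make the fixed_len_byte_array bound concrete, here is a small Python sketch of the formula floor(log_10(2^(8*n - 1) - 1)). The function names are hypothetical and purely illustrative; they are not part of Parquet, Hive, or Impala.

```python
def max_decimal_precision(n_bytes: int) -> int:
    """Max base-10 digits a signed two's-complement value of n_bytes can hold.

    Implements floor(log_10(2^(8*n - 1) - 1)) with integer arithmetic:
    for x >= 1, floor(log10(x)) equals (number of decimal digits of x) - 1.
    """
    if n_bytes < 1:
        raise ValueError("n_bytes must be at least 1")
    max_unscaled = 2 ** (8 * n_bytes - 1) - 1
    return len(str(max_unscaled)) - 1


def min_bytes_for_precision(precision: int) -> int:
    """Smallest fixed_len_byte_array length that can hold `precision` digits."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    n_bytes = 1
    while max_decimal_precision(n_bytes) < precision:
        n_bytes += 1
    return n_bytes


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, "bytes ->", max_decimal_precision(n), "digits")
    for p in (19, 38):
        print("precision", p, "->", min_bytes_for_precision(p), "bytes")
```

For the DECIMAL(19, 0) column in the error above, 8 bytes top out at 18 digits, so a fixed-length encoding would need at least 9 bytes.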
Hi Eric,
I have exactly this issue while I am writing a partition of a parquet table via Spark. My schema is a combination of strings and decimal(38,17) datatypes.
Can you elaborate on the supported specs for Decimal columns?
Which one of those is not supported by Impala?
Thank you very much for your help.
Best,
Louis
Hi Louis,
Thanks for visiting my blog and apologies for the delayed response. Regarding the Decimal specs for Impala, you can refer to the online doc below:
https://www.cloudera.com/documentation/enterprise/latest/topics/impala_decimal.html
If it still does not answer your question, please let me know.
Cheers
Hello Eric,
We recently upgraded Cloudera from 5.6.0 to 5.8.3, and since then we have been facing two issues. Since we are out of support from Cloudera, we are hoping for some resolution steps from you.
1. While running an INSERT OVERWRITE query, it throws an error like “Spilling has been disabled due to no usable scratch space.”
2. parq’ has an incompatible Parquet schema for column ”. Column type: STRING, Parquet schema:
Hi Krish,
Thanks for visiting my blog and posting questions.
For 1, this means you have not set up Impala’s scratch directory correctly, so the disk spilling feature won’t work.
If you are using Cloudera Manager, please go to CM > Impala > Configuration > “Impala Daemon Scratch Directories” and confirm whether any values have been set.
For 2, it looks like you have a mismatched column type between Impala/Hive and the Parquet file. Your comment seems to have been cut off, as I don’t see anything after “Parquet schema:”. Can you check the data type of that column in Parquet and then update the table in Hive/Impala to match it?
Cheers
Eric
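
As a rough way to sanity-check the scratch directory advice above, the following Python sketch tests whether each directory exists and is writable. It is only an illustration: the function name and the example paths are hypothetical, it does not read Impala's actual configuration, and it only tells you something useful if it is run as the same OS user that runs the Impala daemon.

```python
import os
import tempfile


def check_scratch_dirs(paths):
    """Return {path: None if usable, else a short problem description}."""
    results = {}
    for path in paths:
        if not os.path.isdir(path):
            results[path] = "missing or not a directory"
            continue
        try:
            # Create and immediately delete a real file, which is a stricter
            # test than os.access() on some filesystems.
            with tempfile.NamedTemporaryFile(dir=path):
                pass
            results[path] = None
        except OSError as exc:
            results[path] = f"not writable: {exc.strerror}"
    return results


if __name__ == "__main__":
    # Hypothetical paths; replace with the directories configured in your cluster.
    candidates = ["/tmp/example_scratch_1", "/tmp/example_scratch_2"]
    for path, problem in check_scratch_dirs(candidates).items():
        print(path, "OK" if problem is None else problem)
```

A directory that passes this check can still be rejected by Impala for other reasons, so the daemon log remains the authoritative source.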
Eric, thanks for the reply.
For 1, this means you have not set up Impala’s scratch directory correctly, so the disk spilling feature won’t work.
Response: the scratch directories have been set, and they were working fine with CDH 5.6. After the upgrade to 5.8.3, some of the INSERT OVERWRITE queries are throwing the spilling error.
We have already set them like this:
/datanode1/impala/impalad until /datanode10/
Hello Eric,
I found something: after the upgrade of Cloudera Manager from 5.6 to 5.8.3, all the components have been upgraded to 5.8.3, but supervisord is still at version 5.6, like this:
CDH 5.8.3, Parcels — After upgrade
Supervisord 3.0-cm5.6.0 —After upgrade
Basically to start and stop the process, we rely on an open source system called supervisord. It takes care of redirecting log files, notifying us of process failure, setuid’ing to the right user.
So, I’m assuming that when we run INSERT OVERWRITE, supervisord tries to set the permissions and ownership on the scratch directory as required by CDH 5.8.3, but since supervisord is still at 5.6.0, it breaks with 5.8.3 and is still listening to Cloudera agent 5.6.0. Is this the reason for the issue “Spilling has been disabled due to no usable scratch space”?
Please let me know if my understanding is correct. If yes, can we do a hard restart of the Cloudera agent so that supervisord is updated from 5.6.0 to 5.8.3?
Hi Krish,
I highly doubt that the supervisord version issue would cause Impala to behave like this, but regardless, you need to get supervisord to the correct version. Yes, try a hard restart to see if it helps.
Do you have the Impala daemon log with the error message that you can share with me? For example, you could put it temporarily in a shared Dropbox folder and remove it once I am done.
I would just like to check exactly what Impala says.
Cheers
Eric
Do you want me to share all the Impala daemon logs? If yes, let me know your email ID and I can share them. I don’t see any Dropbox shared folder here.