Tag: Parquet

Impala query failed with error: “Incompatible Parquet Schema”

Yesterday, I was dealing with an issue where a very simple Impala SELECT query failed with an “Incompatible Parquet schema” error. I have confirmed that the following workflow triggers the error: (1) a Parquet file is created by an external library; (2) the Parquet file is loaded into a Hive/Impala table; (3) the table is queried …

Impala Reported Corrupt Parquet File After Failing With OutOfMemory Error

Recently I was dealing with an issue where Impala reported a corrupt Parquet file after a query failed with an OutOfMemory error; when the query did not fail, no corruption was reported. See the error message below, as reported in the Impala Daemon logs: This is reported in the upstream JIRA IMPALA-5197. This can happen …

How to redirect Parquet’s log messages to STDERR rather than STDOUT

This article explains the steps needed to redirect Parquet’s log messages from STDOUT to STDERR, so that Hive’s output is not polluted when the user wants to capture the query result on the command line. In Parquet’s code base, logging information is written directly to STDOUT, …

Unable to query Hive parquet table after altering column type

Currently, Hive does not support changing column types for Parquet tables, due to performance issues. I have developed the following test case to demonstrate the bug: The following is the output: After some research, I found the following JIRAs for this issue: HIVE-6784 and HIVE-12080. The original issue HIVE-6784 …

My new Snowflake Blog is now live. I will no longer be updating this blog, but I will continue to publish new content about the Snowflake world!