Month: May 2015

Timestamp stored in Parquet file format in Impala Showing GMT Value

This article explains why Impala and Hive return different timestamp values for the same table when the table was created and populated from Hive. It also outlines the steps to force Impala to apply local time zone conversion when reading timestamp fields stored in the Parquet file format. When Hive stores a timestamp …

How to control the number of mappers required for a Hive query

This article explains how to increase or decrease the number of mappers used for a particular Hive query. In most cases, setting both “mapreduce.input.fileinputformat.split.maxsize” and “mapreduce.input.fileinputformat.split.minsize” to the same value controls the number of mappers (either increasing or decreasing it) used when Hive is running a particular …
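
As a minimal sketch of the two settings named in this excerpt, they can be applied per session from the Hive prompt before running the query. The 128 MB value is only an illustrative assumption, not a figure from the original post; both properties take a size in bytes.

```sql
-- Illustrative value only: 134217728 bytes = 128 MB (assumed, not from the post).
-- Setting both properties to the same value fixes the target split size.
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
SET mapreduce.input.fileinputformat.split.minsize=134217728;
```

In general, a smaller split size tends to produce more mappers and a larger one fewer, although the actual count also depends on the input files and the input format in use.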

Hive Shows NULL Value to New Column Added to a Partitioned Table With Existing Data

Today I discovered a bug where Hive cannot recognise the existing data for a column newly added to a partitioned external table. In this post, I explain the steps to reproduce the issue as well as a workaround. First, I prepared the data in text format in a file called test.txt, …
