```
$ test=`hive -e "SELECT * FROM default.test"`
$ echo $test
2 5 4 3 2 1 5 4 3 2
```

However, if you do the same thing for a Parquet table, the result is different:
```
$ test_parquet=`hive -e "SELECT * FROM default.test_parquet"`
$ echo $test_parquet
2 5 4 3 2 1 5 4 3 2 16/08/2016 5:55:32 PM WARNING: parquet.hadoop.ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl 16/08/2016 5:55:32 PM INFO: parquet.hadoop.InternalParquetRecordReader: RecordReader initialized will read a total of 10 records. 16/08/2016 5:55:32 PM INFO: parquet.hadoop.InternalParquetRecordReader: at row 0. reading next block 16/08/2016 5:55:32 PM INFO: parquet.hadoop.InternalParquetRecordReader: block read in memory in 15 ms. row count = 10
```

So if an application tries to use the variable $test_parquet, those WARNING and INFO messages mixed into the value will cause issues. This problem has been reported in the upstream JIRA HIVE-13954; however, at the time of writing (CDH 5.8.1), the fix has not yet been backported into CDH. To work around the problem, follow the steps below:
- Save the following content to a file:
```
#===============
parquet.handlers= java.util.logging.ConsoleHandler
.level=INFO
java.util.logging.ConsoleHandler.level=INFO
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.SimpleFormatter.format=[%1$tc] %4$s: %2$s - %5$s %6$s%n
#===============
```
and put it anywhere you like on the client machine where you will run the Hive CLI. In my test I put it under /tmp/parquet/parquet-logging2.properties
- Run the following command in your shell before you run the Hive CLI:
```
export HADOOP_CLIENT_OPTS="-Djava.util.logging.config.file=/tmp/parquet/parquet-logging2.properties"
```
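For convenience, the two steps above can be scripted in one go. This is just a sketch; the path /tmp/parquet/parquet-logging2.properties is simply the location used in this example, not anything special:

```shell
# Write the java.util.logging config file, then point the Hive CLI's JVM at it.
mkdir -p /tmp/parquet
cat > /tmp/parquet/parquet-logging2.properties <<'EOF'
parquet.handlers= java.util.logging.ConsoleHandler
.level=INFO
java.util.logging.ConsoleHandler.level=INFO
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
java.util.logging.SimpleFormatter.format=[%1$tc] %4$s: %2$s - %5$s %6$s%n
EOF
export HADOOP_CLIENT_OPTS="-Djava.util.logging.config.file=/tmp/parquet/parquet-logging2.properties"
```

Note that the export only affects the current shell session, so it needs to be re-run (or added to a profile script) for each new session.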
(change the path to the properties file accordingly)
- Run your Hive CLI command:
```
test_parquet=`hive -e "SELECT * FROM default.test_parquet"`
```
The output will be saved in "$test_parquet" as expected, free of the logging messages.
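Why this works: the properties file routes the Parquet messages through java.util.logging's ConsoleHandler, which writes to stderr, and command substitution captures stdout only. A quick stand-alone illustration, with echo standing in for hive:

```shell
# Command substitution (backticks or $()) captures stdout only;
# anything written to stderr is shown on the terminal but not stored.
out=`sh -c 'echo "2 5 4 3 2 1 5 4 3 2"; echo "INFO: reading next block" 1>&2'`
echo "$out"
# -> 2 5 4 3 2 1 5 4 3 2
```

Note also that echoing the variable unquoted (echo $out) word-splits the value, which is why the rows in the transcripts above appear on a single line; quote the expansion if you need the original line structure preserved.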