Alternative Timestamp Support in Hive (ISO-8601)

Alternative Timestamp Support in Hive (ISO-8601)

Hive does not support for ISO-8601 timestamp format, like this “2017-02-16T11:24:29.000Z” by default. Check the following test case: 1. Create a file with the following content:
2017-02-16T11:24:29.000Z
2017-02-16 11:24:29
2. Put the file in HDFS:
hadoop fs -put test.txt /tmp/test/data
3. Create an external table links to it:
CREATE EXTERNAL TABLE ts_test (a timestamp) ROW FORMAT DELIMITED FIELDS TERMINATED by ',' LOCATION '/tmp/test/data';
4. When you select the table, first record will be NULL:
+------------------------+--+
|       ts_test.a        |
+------------------------+--+
| NULL                   |
| 2017-02-16 11:24:29.0  |
+------------------------+--+
This is due to Hive not able to recognise timestamp format of “2017-02-16T11:24:29.000Z”. As of CDH5.7.x or Hive 1.2, Hive supports reading alternative timestamp formats, see HIVE-9298 To make it work, run the following Hive query:
ALTER TABLE ts_test SET SERDEPROPERTIES ("timestamp.formats"="yyyy-MM-dd'T'HH:mm:ss.SSSZ");
Then data can be read correctly by Hive:
+------------------------+--+
|       ts_test.a        |
+------------------------+--+
| 2017-02-16 03:24:29.0  |
| 2017-02-16 11:24:29.0  |
+------------------------+--+
The different values is due to timezone conversion (Z is for UTC). Hive treats “2017-02-16T11:24:29.000Z” as UTC and then converts it to server’s local time, in the case of second value of “2017-02-16 11:24:29”, no conversion is done so original value is returned. Hope this helps.

Loading

6 Comments

    1. Eric Lin

      Hi Infinity,

      Thanks for visiting my blog and posting question.

      I am not sure why Hive adds extra 0 in the end, but you can workaround it by formatting the output like below:
      SELECT date_format(a, ‘YYYY-MM-dd HH:mm:ss’) FROM ts_test;

      Hope it helps.

      Cheers

    1. Eric Lin

      Hi Teekoji,

      Sorry about the delay. That’s a good question and I can’t find a good answer for it. Maybe it won’t work, or you will need to massage the data to be in ‘yyyy-MM-dd hh:mm:ss’ format first.

      Cheers

Leave a Reply to Eric Lin Cancel reply

Your email address will not be published. Required fields are marked *

My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!