Sqoop Teradata import truncates timestamp nano seconds information – Hadoop Troubleshooting Guide

In the last few weeks, I have been dealing the Teradata Support for a Sqoop issue that the value with Timestamp(6) data type in Teradata will lost last 3 digits of nano seconds after importing into HDFS using Sqoop command. The following test case validates the issue:

Create a table in Teradata:

CREATE TABLE vmtest.test (a integer, b timestamp(6) 
FORMAT 'yyyy-mm-ddbhh:mi:ss.s(6)') PRIMARY INDEX (a);

INSERT INTO vmtest.test VALUES (1, '2016-04-05 11:27:24.699022');

And sqoop import command:

sqoop import --connect jdbc:teradata:///database=vmtest \
    --username dbc --password dbc --target-dir /tmp/test --delete-target-dir \
    --as-textfile --fields-terminated-by "," --table test

data stored in HDFS as below:

[cloudera@quickstart ~]$ hadoop fs -cat /tmp/test/part*
1,2016-04-05 11:27:24.699

Notice the nano seconds part truncated from 699022 to 699 This is caused by a bug in TDCH (TeraData Connector for Hadoop) from Teradata, which is used by Cloudera Connector Powered by Teradata. The workaround is to make sure that the timestamp value is in String format before passing it to Sqoop, so that no conversion will happen. Below Sqoop command is an example:

sqoop import --connect jdbc:teradata:///database=vmtest \
    --username dbc --password dbc --target-dir /tmp/test \
    --delete-target-dir --as-textfile --fields-terminated-by "," \
    --query "SELECT a, cast(cast(b as format 'YYYY-MM-DD HH:MI:SS.s(6)') as char(40)) from test WHERE \$CONDITIONS" \
    --split-by a

After import, data is stored in HDFS correctly:

[cloudera@quickstart ~]$ hadoop fs -cat /tmp/test/part*
1,2016-04-0511:27:24.699022

M	T	W	T	F	S	S
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

Other posts that you might also be interested:

Leave a Reply Cancel reply