Oracle Number(1,0) field maps to Boolean in Spark – Hadoop Troubleshooting Guide

Recently I worked on an issue where, when importing data from Oracle into a Hive table using Spark, columns of type Number(1,0) in Oracle were implicitly converted to the Boolean data type. On CDH5.5.x the import worked correctly; the issue appeared after upgrading to CDH5.10.x. See the Hive table output after import below.

Before upgrade:

SELECT column1 FROM test_table limit 2;
0
1

After upgrade:

SELECT column1 FROM test_table limit 2;
False
True

After digging further, I discovered that this change was introduced by SPARK-16625, which adjusted how Spark maps Oracle data types so that the two work together correctly. Since the change was intentional, the following are the suggested workarounds:

Cast the Boolean to a type of your choosing in the Spark code before writing it to the Hive table.
Make sure that the mapped column in Hive also has a compatible data type, for example TinyInt rather than String, so that True and False are mapped to 1 and 0 respectively rather than to the string values “True” and “False” (the column showed “False” and “True” above because it was of the String data type).
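
The first workaround could look roughly like the PySpark sketch below. It assumes the Oracle data has already been loaded into a DataFrame; the helper names boolean_columns and cast_booleans are made up for illustration, and tinyint is just one possible target type.

```python
def boolean_columns(dtypes):
    """Return the names of columns whose Spark type is boolean.

    `dtypes` is a list of (name, type) pairs, as returned by DataFrame.dtypes.
    """
    return [name for name, dtype in dtypes if dtype == "boolean"]


def cast_booleans(df, target_type="tinyint"):
    """Cast every boolean column of a Spark DataFrame to `target_type`.

    Hypothetical helper: `df` is assumed to be a pyspark.sql.DataFrame.
    """
    for name in boolean_columns(df.dtypes):
        # Replace the column in place with its cast version (true -> 1, false -> 0).
        df = df.withColumn(name, df[name].cast(target_type))
    return df
```

The returned DataFrame can then be written to the Hive table as before. Note that this only changes how True and False are stored; it cannot bring back an original value other than 0 or 1.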

Hope the above helps.

One comment

Hua Lin

September 24, 2017 at 4:26 AM

NUMBER(1,0) can hold any single-digit integer (for example 0 to 9), not just 0 and 1. The suggested solution collapses these to 0 and 1, with every non-zero value becoming 1. A better solution is to cast in the select clause:
SELECT cast(column1 as INTEGER) column1 from test_table;
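
One way to apply this cast from Spark is to push it down to Oracle by passing a subquery as the JDBC dbtable option. The following is a minimal sketch under that assumption, not a tested recipe: cast_subquery and read_with_cast are hypothetical helpers, spark is assumed to be a SparkSession or SQLContext, and url is assumed to be a valid Oracle JDBC URL.

```python
def cast_subquery(table, columns, target_type="INTEGER", alias="t"):
    """Build a subquery that casts the given columns on the Oracle side."""
    select_list = ", ".join(
        "CAST({0} AS {1}) AS {0}".format(col, target_type) for col in columns
    )
    return "({0}) {1}".format(
        "SELECT {0} FROM {1}".format(select_list, table), alias
    )


def read_with_cast(spark, url, table, columns, **options):
    """Read `table` over JDBC with the casts pushed down to Oracle.

    Hypothetical helper: extra keyword arguments (user, password, driver, ...)
    are forwarded as JDBC options.
    """
    reader = spark.read.format("jdbc").option("url", url)
    reader = reader.option("dbtable", cast_subquery(table, columns))
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

As written, the subquery selects only the listed columns, so any other columns needed from the table would have to be added to the select list. The Spark type that results from the cast depends on the Spark version and the Oracle driver, so it is worth checking the schema of the loaded DataFrame.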
