Oracle Number(1,0) field maps to Boolean in Spark

Oracle Number(1,0) field maps to Boolean in Spark

Recently I was working on a issue that when importing data from Oracle into Hive table using Spark, the data of type Number(1,0) in Oracle was implicitly converted into Boolean data type. Before was on CDH5.5.x, it worked correctly, however, after upgrading to CDH5.10.x, the issue happened. See below Hive table output after import: Before upgrade:
SELECT column1 FROM test_table limit 2;
0
1
After upgrade:
SELECT column1 FROM test_table limit 2;
False
True
After digging further, I discovered that this change was introduced by SPARK-16625, due to the integration required for Spark to work correctly with Oracle. Since the change was intended, the following is the suggested workarounds:
  1. Cast the Boolean to a type of your choosing in the Spark code, before writing it to the Hive table
  2. Make sure that the mapped column in Hive is also of compatible data type, for example, TinyInt, rather than String, so that the value of True or False will be mapped to 1 or 0 respectively, rather than string value of “True” or “False” (the reason that the column got “False” and “True” values were because the column was of String data type)
Hope above helps.

Loading

One comment

  1. Hua Lin

    NUMBER(1,0) has 10 values: 0 to 9. The suggested solution will map the values to 0 and 1 for all non-zero values. Better solution is to cast in the select clause.
    SELECT cast(column1 as INTEGER) column1 from test_table;

Leave a Reply

Your email address will not be published. Required fields are marked *

My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!