sqoop import -Dsqoop.hbase.add.row.key=true \ --connect jdbc:oracle:thin:@//Result as below::1521/orcl \ --username USERNAME --password password --table TEST_TABLE \ --hbase-create-table --hbase-table test_case_no_bulkload \ --column-family cf --split-by ID \ --hbase-row-key ID
ROW COLUMN+CELL 1 column=cf:DATA_S, timestamp=1475219854006, value=Test 1 column=cf:ID, timestamp=1475219854006, value=1 2 column=cf:DATA_S, timestamp=1475219894990, value=Test MOre 2 column=cf:ID, timestamp=1475219894990, value=2 2 row(s) in 0.7070 secondsYou can see that DATA_B column was missing in the destination HBase table. The fix here is to use the bulkload for HBase, see the command below:
sqoop import -Dsqoop.hbase.add.row.key=true \ --connect jdbc:oracle:thin:@//See the new result::1521/orcl \ --username USERNAME --password password --table TEST_TABLE \ --hbase-create-table --hbase-table test_case_bulkload \ --column-family cf --split-by ID \ --hbase-row-key ID \ --hbase-bulkload
hbase(main):002:0> scan 'test_case_bulkload' ROW COLUMN+CELL 1 column=cf:DATA_B, timestamp=1475220177891, value=2f 72 6f 6f 74 2f 31 30 39 33 33 31 2f 6c 65 69 73 61 5f 63 68 72 69 73 74 6d 61 73 5f 66 61 6c 73 65 5f 63 6f 6c 6f 72 2e 70 6e 67 1 column=cf:DATA_S, timestamp=1475220177891, value=Test 2 column=cf:DATA_B, timestamp=1475220177891, value=41 6e 6f 74 68 65 72 20 74 65 73 74 2 column=cf:DATA_S, timestamp=1475220177891, value=Test MOre 2 row(s) in 0.0620 secondsYou can see that column DATA_B has been created in HBase. However, it comes to another problem. Do you notice the ID column in HBase is missing? We expect the ID column should be created as part of the column family because we specified “-Dsqoop.hbase.add.row.key=true” in the import command. This is caused by a known issue that is reported by my colleague, see SQOOP-2952 for details. Basically the issue is when using –hbase-bulkload, -Dsqoop.hbase.add.row.key=true will be ignored and the key will not be created as part of column family for the new HBase table. If you do not care about the row key to be in the column family, then –hbase-bulkload is the solution for you, otherwise you will have to wait for SQOOP-2952 to be resolved. Hope this helps.
Thanks for posting Eric, this is very helpful. It saves my time to do research on why the BLOB data was not imported into HBase using Sqoop.