ROW1~|`900~|`3103~|`2015~|`AA~|`1002003829 ROW2~|`900~|`3103~|`2015~|`BB~|`1002005103 ROW3~|`900~|`3103~|`2015~|`CC~|`1002003829 ROW4~|`900~|`3103~|`2015~|`DD~|`1002003829 ROW5~|`900~|`3103~|`2015~|`FF~|`1002003829How to load the data into a Hive table with delimiter “~|`”? Well, it is pretty straightforward, just use the “MultiDelimitSerDe” which is available since CDH5.1.4, example as folllows:
CREATE TABLE test_multi (a string, b string, c string, d string, e string, f string) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' WITH SERDEPROPERTIES ( "field.delim"="~|`", "collection.delim"=":", "mapkey.delim"="@" );After loading the data, the table will look something like this:
hive> LOAD DATA LOCAL INPATH '/tmp/data-multi.txt' OVERWRITE INTO TABLE test_multi; hive> SELECT * FROM test_multi; +----------------+---------------+---------------+---------------+---------------+---------------+--+ | test_multi.a | test_multi.b | test_multi.c | test_multi.d | test_multi.e | test_multi.f | +----------------+---------------+---------------+---------------+---------------+---------------+--+ | ROW1 | 900 | 3103 | 2015 | AA | 1002003829 | | ROW2 | 900 | 3103 | 2015 | BB | 1002005103 | | ROW3 | 900 | 3103 | 2015 | CC | 1002003829 | | ROW4 | 900 | 3103 | 2015 | DD | 1002003829 | | ROW5 | 900 | 3103 | 2015 | FF | 1002003829 | +----------------+---------------+---------------+---------------+---------------+---------------+--+This is the preferred way of loading multi-character delimited data into Hive over the use of “org.apache.hadoop.hive.serde2.RegexSerDe”, as it is simpler and faster.
For some reason I am not able to add hive-contrib jars to my hive jars .
Will there be any chance to define a simple multi delimiter for the RegexSerde ?
i have some issue
have some data in csv file
that data like
column1 , column2
1,2,3,4 , 5,6,7,8
how can i load data into hive pleace help me
thank you
Hi Suresh,
In this case you will need to quote the strings, so that they are in the proper CSV file format, like below:
column1,column2
“1,2,3,4”,”5,6,7,8″
And then you can use OpenCSVSerde for your table like below:
CREATE EXTERNAL TABLE test (a string, b string, c string)
ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.OpenCSVSerde’
LOCATION ‘/user/hive/warehouse/test.db/test’;
The latest Hive supports OpenCSVSerde natively, so you don’t need to add any custom jars.
Hope this helps.
Hi Sujoy,
Sorry for the very late reply. My email server did not send me emails before, so I missed your update. This has now been fixed, so I should get all update emails from my blog.
Before I answer your question, I would like to check what the issue you are facing? And what version of Hive are you using?
Try this :https://github.com/sanjivsingh/FixedLengthAndDelimitedSerde
– Supported both fixed length and delimited data serialization/deserialization
– It support both case in single definition.