How to create a Hive multi-character delimitered table

How to create a Hive multi-character delimitered table

We have the following data:
ROW1~|`900~|`3103~|`2015~|`AA~|`1002003829
ROW2~|`900~|`3103~|`2015~|`BB~|`1002005103
ROW3~|`900~|`3103~|`2015~|`CC~|`1002003829
ROW4~|`900~|`3103~|`2015~|`DD~|`1002003829
ROW5~|`900~|`3103~|`2015~|`FF~|`1002003829
How to load the data into a Hive table with delimiter “~|`”? Well, it is pretty straightforward, just use the “MultiDelimitSerDe” which is available since CDH5.1.4, example as folllows:
CREATE TABLE test_multi 
(a string, b string, c string, d string, e string, f string) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES (
    "field.delim"="~|`",
    "collection.delim"=":",
    "mapkey.delim"="@"
);
After loading the data, the table will look something like this:
hive> LOAD DATA LOCAL INPATH '/tmp/data-multi.txt' OVERWRITE INTO TABLE test_multi;

hive> SELECT * FROM test_multi;
+----------------+---------------+---------------+---------------+---------------+---------------+--+
| test_multi.a   | test_multi.b  | test_multi.c  | test_multi.d  | test_multi.e  | test_multi.f  |
+----------------+---------------+---------------+---------------+---------------+---------------+--+
| ROW1           | 900           | 3103          | 2015          | AA            | 1002003829    |
| ROW2           | 900           | 3103          | 2015          | BB            | 1002005103    |
| ROW3           | 900           | 3103          | 2015          | CC            | 1002003829    |
| ROW4           | 900           | 3103          | 2015          | DD            | 1002003829    |
| ROW5           | 900           | 3103          | 2015          | FF            | 1002003829    |
+----------------+---------------+---------------+---------------+---------------+---------------+--+
This is the preferred way of loading multi-character delimited data into Hive over the use of “org.apache.hadoop.hive.serde2.RegexSerDe”, as it is simpler and faster.

Loading

5 Comments

      1. Eric Lin

        Hi Suresh,

        In this case you will need to quote the strings, so that they are in the proper CSV file format, like below:

        column1,column2
        “1,2,3,4”,”5,6,7,8″

        And then you can use OpenCSVSerde for your table like below:

        CREATE EXTERNAL TABLE test (a string, b string, c string)
        ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.OpenCSVSerde’
        LOCATION ‘/user/hive/warehouse/test.db/test’;

        The latest Hive supports OpenCSVSerde natively, so you don’t need to add any custom jars.

        Hope this helps.

    1. Eric Lin

      Hi Sujoy,

      Sorry for the very late reply. My email server did not send me emails before, so I missed your update. This has now been fixed, so I should get all update emails from my blog.

      Before I answer your question, I would like to check what the issue you are facing? And what version of Hive are you using?

Leave a Reply

Your email address will not be published. Required fields are marked *

My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!