Hive supports MAP data type, to defined it in a table, use the following code:

CREATE TABLE IF NOT EXISTS test_table (
map_column MAP<int, int>
);

This works fine if you only need to insert data into this table using hive UDFs to convert data into MAP, but it will be tricky if you want to load the data from a plain text file. In order to do it, we need to set it up in a special way:

CREATE TABLE IF NOT EXISTS test_table (
map_column MAP<int, int>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ','
MAP KEYS TERMINATED BY ‘:’
LINES TERMINATED BY ‘\n’;

This tells Hive that we want it to separate each collections by ‘,’ and separate key and value pairs by ‘:’. So the following data: [generic] 1:2,3:4,5:6,7:8 [/generic] Will appear in the following format after loaded using command:

LOAD DATA LOCAL INPATH ‘/home/test_table’ INTO TABLE test_table;

[generic] {1:2,3:4,5:6,7:8} [/generic] And you can do the query like this:

SELECT * FROM test_table WHERE map_column[1] = 2;

Map data type is very useful in some situations and it can greatly reduce the amount of rows that are required to store the data, and yet still allow us to query the data quickly and efficiently

Leave a Reply

Your email address will not be published. Required fields are marked *