How to query a multiple delimited table in Hive

This article explains how to query a multi-delimited Hive table in CDH 5.3.x. The use case is as follows: given the following table definition:
CREATE TABLE test_multi 
(a string, b string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe' 
WITH SERDEPROPERTIES (
    "field.delim"="#|",
    "collection.delim"=":",
    "mapkey.delim"="@"
);
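As an illustration (the file path and data below are my assumptions, not from the original post), a data file for this table separates the two columns with the two-character "#|" sequence configured in field.delim. The split can be sanity-checked outside Hive with awk:

```shell
# Hypothetical sample row for test_multi: column a, then the "#|"
# field delimiter from field.delim, then column b.
printf 'eric#|test more\n' > /tmp/test_multi.txt

# awk treats a multi-character field separator as a regex, so the
# "|" must be escaped to match the literal delimiter.
awk -F '#\\|' '{ print "a=" $1; print "b=" $2 }' /tmp/test_multi.txt
```

Such a file could then be loaded with something like LOAD DATA LOCAL INPATH '/tmp/test_multi.txt' INTO TABLE test_multi; (path is illustrative).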
Querying all columns works fine:
select * from test_multi;
+---------------+---------------+
| test_multi.a  | test_multi.b  |
+---------------+---------------+
| eric          | test more     |
+---------------+---------------+
1 row selected (1.58 seconds)
However, querying a single column (which, unlike the simple select *, launches a MapReduce job) fails with the following error in HiveServer2 (HS2):
select a from test_multi;
Error: Error while processing statement: FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassNotFoundException: 
Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:334)
        at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:352)
        at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:126)
        ... 22 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
        at org.apache.hadoop.hive.ql.exec.MapOperator.getConvertedOI(MapOperator.java:304)
        ... 24 more
This happens in CDH 5.3.3, which ships Hive 0.13; I am not sure whether it also applies to CDH 5.4.x and CDH 5.5.x. It is caused by the class org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe not being on HiveServer2's classpath when it starts up, so the MapReduce task cannot load it. We need to make the JAR file that contains this class available through Hive's AUX directory. Steps as follows:

1) Locate the AUX directory for HiveServer2. If you don't have one, create one and update Hive's configuration through Cloudera Manager. If you don't use Cloudera Manager, simply create a directory on the HiveServer2 host; in my case it is /hive/jars.

2) From within /hive/jars, create a symlink to /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar (or, if not using Cloudera Manager, /usr/lib/hive/lib/hive-contrib.jar):
ln -s /opt/cloudera/parcels/CDH/lib/hive/lib/hive-contrib.jar /hive/jars/hive-contrib.jar
3) If you don't use Cloudera Manager, add the following property to hive-site.xml for HiveServer2 (if you use Cloudera Manager, skip to step 4):

<property>
  <name>hive.aux.jars.path</name>
  <value>/hive/jars/hive-contrib.jar</value>
</property>

4) Restart HiveServer2. This should remove the error we saw earlier in the post and get the Hive query working. Hope this helps.
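After the restart, the single-column query should launch its MapReduce job successfully. Based on the sample row shown earlier, the session should look roughly like this (a sketch of the expected result, not captured output):

```sql
select a from test_multi;
+-------+
|   a   |
+-------+
| eric  |
+-------+
1 row selected
```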


5 Comments

    1. Eric Lin

      Hi Abdul,

      Thanks for visiting my site. You just need to set the config change for HiveServer2, and you don’t need to update all hosts in the cluster.

      Hope that helps.

      Thanks
      Eric

    2. Eric Lin

      Hi Abdul,

      Other hosts do have hive-site.xml file under /etc/hive/conf directory, but they would be just the gateways for Hive service, meaning that they are just Hive client. The AUX JARs are only applicable to HiveServer2 service, not the clients. So you only need to update it on the HiveServer2 host(s).

      Hope that makes sense.

      Cheers

