Hive Export/Import Command – Transferring Data Between Hive Instances

When working with Hive, there are many scenarios in which we need to move data from one cluster to another. For instance, we sometimes need to copy production data into an alpha/dev environment for testing purposes. Luckily, Hive provides two easy commands to do this. Since version 0.8, Hive has supported EXPORT and IMPORT statements, which let you export a table's metadata as well as its data to a directory in HDFS, and then import them into another database or Hive instance. The command looks like this:
EXPORT TABLE <table_name> TO 'path/to/hdfs';
Please note that you don’t need the leading “/”; otherwise Hive will complain about an invalid path or the target not being an empty directory. The data will be written to the /user/<user_name>/path/to/hdfs directory in HDFS (which, of course, must be writable by the current user). The next step is to copy the data across to the other Hive instance, which you can do with Hadoop’s “distcp” command:
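For example, assuming a hypothetical table called sales exported by the user hive, the target directory ends up containing both the table metadata and the data files:

```sql
-- Export the table (the path is relative, so it lands under /user/hive/)
EXPORT TABLE sales TO 'export/sales';

-- The export directory will contain a _metadata file plus a data/
-- subdirectory holding the table's files, e.g.:
--   /user/hive/export/sales/_metadata
--   /user/hive/export/sales/data/
```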
hadoop distcp hdfs://<source_namenode>:8020/path/to/hdfs hdfs://<target_namenode>/path/to/hdfs
Once the copy is finished, you can then use the IMPORT command on the new cluster to load the data into a new table:
IMPORT TABLE <table_name> FROM 'path/to/another/hdfs';
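IMPORT also lets you import under a different table name, or as an external table at a location of your choice. A short sketch, again using the hypothetical sales table:

```sql
-- Import the exported data under a different table name
IMPORT TABLE sales_copy FROM 'export/sales';

-- Or import it as an external table at a specific location
IMPORT EXTERNAL TABLE sales_ext FROM 'export/sales'
  LOCATION '/user/hive/warehouse/sales_ext';
```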
Now you will have exactly the same data in the new cluster as in the old one. The transfer should be smooth; the only caveat is that if the data is big, “distcp” might take a while to copy it across. For more information, have a look at the Apache Hive Confluence page.
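If the table is partitioned, both commands also accept a partition spec, which keeps the copy small when only part of a large table needs to move (the table and partition names below are hypothetical):

```sql
-- Export just one partition of a partitioned table
EXPORT TABLE sales PARTITION (country='US') TO 'export/sales_us';

-- On the target cluster, import that partition after copying the directory
IMPORT TABLE sales PARTITION (country='US') FROM 'export/sales_us';
```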

3 Comments

    1. Eric Lin

      Hi Hari,

      Sorry about the late response, I missed your update in my email.

A view is just a Hive query stored under a name; you just need to re-run the CREATE VIEW statement in the other cluster. Hope that makes sense.

Thanks again for visiting my site.

      Cheers
