Copy Hadoop Data From One HDFS to Another

If you have two HDFS clusters running in different environments (production vs. alpha, for example), you may sometimes want to copy data from one cluster to the other. This is easy to do with Hadoop’s built-in “distcp” command:
hadoop distcp hdfs://hadoop-namenode/data/2013/01 hdfs:///data/2013/
We have the following directory structure in the source:
/data/2013/01/01
/data/2013/01/02
/data/2013/01/03
/data/2013/01/04
...
The command above copies the directory “01” and its sub-directories to the destination HDFS, so that we end up with the same directory structure there (this assumes /data/2013/ already exists on the destination). distcp is a handy tool for copying data between HDFS clusters.
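
As a rough sketch, the command above can be wrapped in a small shell function so that the same copy can be repeated for other months. The function name distcp_cmd and its three arguments are hypothetical and not part of Hadoop; only "hadoop distcp" and the paths from the example above come from this post. The function only prints the command so it can be reviewed before it is run. Note that "hdfs:///" has no host, so it resolves to the default file system of the cluster where the command is executed, which means the command is meant to be run on the destination side.

```shell
#!/bin/sh
# Sketch only: distcp_cmd is a hypothetical helper, not a Hadoop command.
# It prints the distcp command line for one month of data.
#   $1 = source namenode host, $2 = year, $3 = month
distcp_cmd() {
    src_nn="$1"
    year="$2"
    month="$3"
    # Source: the month directory on the remote cluster.
    # Destination: the year directory on the local (default) HDFS.
    printf 'hadoop distcp hdfs://%s/data/%s/%s hdfs:///data/%s/\n' \
        "$src_nn" "$year" "$month" "$year"
}

# Print the command from the example above for review.
distcp_cmd hadoop-namenode 2013 01
```

To actually run the copy on a machine with the Hadoop client installed, the printed line can be piped to a shell, for example: distcp_cmd hadoop-namenode 2013 01 | sh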

