If you have two HDFS clusters running in different environments (production and alpha, for example), you will sometimes want to copy data from one cluster to the other. This is easy to do with Hadoop’s built-in “distcp” command:
hadoop distcp hdfs://hadoop-namenode/data/2013/01 hdfs:///data/2013/
We have the following directory structure in the source:
/data/2013/01/01
/data/2013/01/02
/data/2013/01/03
/data/2013/01/04
...
...
...
The command above copies the directory “01” and its sub-directories to the destination HDFS, so we end up with the same directory structure there (provided /data/2013 already exists on the destination). distcp is a handy tool for copying data between HDFS clusters.
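One way to sanity-check the copy is to compare directory and file counts on both sides. The snippet below is only a sketch: it reuses the source and destination from the command above, assumes the copied directory ended up at /data/2013/01 on the destination, and assumes both clusters are reachable from the machine where it runs. The variable names are made up for illustration.

```shell
# Source and destination parent, taken from the distcp command above.
SRC="hdfs://hadoop-namenode/data/2013/01"
DST_PARENT="hdfs:///data/2013"

# Assumed landing path: destination parent plus the last component
# of the source path ("01").
DST="$DST_PARENT/$(basename "$SRC")"

if command -v hadoop >/dev/null 2>&1; then
  # "hadoop fs -count" prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
  hadoop fs -count "$SRC"
  hadoop fs -count "$DST"
else
  # No Hadoop client on this machine: just show what would be compared.
  echo "hadoop not found; would compare $SRC with $DST"
fi
```

If the copy went through as expected, the first three columns of the two output lines should match.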
