Enable Snappy Compression For Flume

Snappy is a compression/decompression library developed by Google. It aims for very high speed and reasonable compression: the output may be larger than that of other standard compression algorithms, but compression and decompression are faster. Snappy is shipped with Hadoop, unlike LZO compression, which is excluded due to licensing issues. To enable Snappy in your Flume installation, follow the steps below.

Install on Red Hat systems:
$ sudo yum install hadoop-0.20-native
Install on Ubuntu systems:
$ sudo apt-get install hadoop-0.20-native
This should create a directory under /usr/lib/hadoop/lib/native/ which contains the native Hadoop libraries. Then create the environment configuration file for Flume:
$ cp /usr/lib/flume/bin/flume-env.sh.template /usr/lib/flume/bin/flume-env.sh
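Before editing the file, it can be worth confirming that the native package really did install a Snappy library. The sketch below assumes the library file name starts with libsnappy and sits somewhere under the native directory mentioned above; the helper name has_snappy_lib is made up for this example, and the exact file name may differ between Hadoop builds.

```shell
#!/bin/sh
# has_snappy_lib DIR: succeed if DIR or one of its subdirectories contains
# a file whose name starts with "libsnappy". (Hypothetical helper; the
# file name pattern is an assumption, not something the package guarantees.)
has_snappy_lib() {
    [ -d "$1" ] && [ -n "$(find "$1" -name 'libsnappy*' 2>/dev/null | head -n 1)" ]
}

if has_snappy_lib /usr/lib/hadoop/lib/native; then
    echo "Snappy native library found"
else
    echo "Snappy native library not found under /usr/lib/hadoop/lib/native"
fi
```

If the check reports that nothing was found, re-running the package install step is probably the first thing to try.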
Then update the last line in the file as follows. For a 32-bit platform:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-i386-32
For a 64-bit platform:
export JAVA_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64
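If you are not sure which of the two paths applies, the choice can be sketched as a small shell function that maps the architecture reported by uname -m to the matching directory. The function name native_dir_for is hypothetical, and the sketch assumes the JVM has the same bitness as the operating system, which is not always the case.

```shell
#!/bin/sh
# native_dir_for ARCH: print the Hadoop native library directory from this
# post that matches the given architecture string. (Hypothetical helper.)
native_dir_for() {
    case "$1" in
        x86_64|amd64)
            echo "/usr/lib/hadoop/lib/native/Linux-amd64-64" ;;
        i386|i486|i586|i686)
            echo "/usr/lib/hadoop/lib/native/Linux-i386-32" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Print the export line for the current machine, assuming the JVM bitness
# matches what uname -m reports.
if dir=$(native_dir_for "$(uname -m)"); then
    echo "export JAVA_LIBRARY_PATH=$dir"
fi
```

The printed line is what goes at the end of flume-env.sh.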
Next, update Flume's configuration file, /etc/flume/conf/flume-site.xml, on the collector node to include the following property:
  <property>
    <name>flume.collector.dfs.compress.codec</name>
    <value>SnappyCodec</value>
    <description>Writes formatted data compressed in specified codec to
    dfs. Value is None, GzipCodec, DefaultCodec (deflate), BZip2Codec, SnappyCodec
    or any other Codec Hadoop is aware of</description>
  </property>
Finally, restart the flume-node:
$ /etc/init.d/flume-node restart
Your next file update in HDFS will look something like the following:
-rw-r--r--   3 flume supergroup          0 2011-10-21 14:01 /data/traffic/Y2011_M9_W37_D254/R0_P0/C1_20111021-140124175+1100.955183363700204.00000244.snappy.tmp
-rw-r--r--   3 flume supergroup   35156526 2011-10-20 16:51 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-164928958+1100.780424004236302.00000018.snappy
-rw-r--r--   3 flume supergroup     830565 2011-10-20 17:15 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-171423368+1100.781918413572302.00000018.snappy
-rw-r--r--   3 flume supergroup          0 2011-10-20 17:19 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-171853599+1100.782188644505302.00000042.snappy.tmp
-rw-r--r--   3 flume supergroup    1261171 2011-10-20 17:37 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-173728225+1100.783303271088302.00000018.snappy
-rw-r--r--   3 flume supergroup    2128701 2011-10-20 17:40 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-174024045+1100.783479090669302.00000046.snappy
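In the listing above, the zero-byte entries ending in .snappy.tmp appear to be files the collector is still writing, while the entries ending in .snappy are finished. As a rough sketch, the filter below keeps only the finished files and prints their size and path. The function name completed_snappy is made up for this example, and it assumes the eight-column layout shown in the listing.

```shell
#!/bin/sh
# completed_snappy: read "hadoop fs -ls" style lines on stdin and print
# "size path" for entries whose path ends in .snappy, skipping .snappy.tmp.
# (Hypothetical helper; assumes size is column 5 and the path is the last column.)
completed_snappy() {
    awk '$NF ~ /\.snappy$/ { print $5, $NF }'
}

# Demo using two lines copied from the listing in this post.
# Against a live cluster the input would instead come from something like:
#   hadoop fs -ls /data/traffic/Y2011_M9_W37_D254/R0_P0 | completed_snappy
completed_snappy <<'EOF'
-rw-r--r--   3 flume supergroup          0 2011-10-21 14:01 /data/traffic/Y2011_M9_W37_D254/R0_P0/C1_20111021-140124175+1100.955183363700204.00000244.snappy.tmp
-rw-r--r--   3 flume supergroup   35156526 2011-10-20 16:51 /data/traffic/Y2011_M9_W37_D254/R0_P0/C2_20111020-164928958+1100.780424004236302.00000018.snappy
EOF
```

The demo prints only the second entry, with its size of 35156526 bytes followed by its path.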
Happy Fluming..
