Tag: HDFS


Kafka Consumer Command Failed With Error: disconnected (org.apache.kafka.clients.NetworkClient)

I am on LDR (Learning & Development Rotation) week again in my role at Cloudera, where each support engineer gets 1-2 weeks off the queue to learn whatever we want that can help with our daily job. This week, I chose Ranger, which is an Authorisation and Auditing framework …


Big Compressed File Will Affect Query Performance for Impala

As we know, Hadoop, HDFS, MapReduce and Impala are designed to store and process large amounts of data, on the order of TBs or PBs. We also know that having too many small files hurts query performance, because the NameNode needs to store millions of metadata entries to hold the information about files being stored …


Access to WebHCat with error “User: HTTP/full-domain@REALM is not allowed to impersonate username”

Last week I was dealing with an issue where, when connecting to WebHCat using the following command: the user got the following error: After doing some research, it turned out to be caused by the auth_to_local rules the user had defined in the cluster; see the config below in the core-site.xml for HDFS: In …


How to use “filters” to exclude files in DistCp

This article explains how to use the new feature supported in Apache Hadoop 2.6.0 to filter out files that don’t need to be copied by DistCp. Hadoop 2.8.0 added support for filtering out files that match certain regular expressions, so that they won’t be copied to the destination when the DistCp command …

