I am on LDR (Learning & Development Rotation) week again for my role at Cloudera, where each support engineer gets 1-2 weeks off the queue to learn whatever we want that can help with our daily job. This week, I chose Ranger, which is an Authorisation and Auditing framework for Hadoop, as Ranger will replace Cloudera’s legacy Sentry in the new CDP release.
This post is not about Ranger, though; I just want to document one of the Kafka issues I faced while trying to understand how a change from Atlas is captured by Ranger and then propagated to the Ranger plugins for HDFS, Hive, etc.
What happens behind the scenes is that after a change is made in Atlas, an event is produced to Kafka under the topic ATLAS_ENTITIES, which is then picked up by a consumer, which happens to be the Ranger Admin service. To capture this event myself, I used the command below to see what was sent:
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server {kafka-host}:6667 --topic ATLAS_ENTITIES
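As a side note, it is easy to confirm first that the topic actually exists. A quick sketch, assuming the older Kafka CLI shipped with HDP (which still takes the --zookeeper flag) and with {zk-host} as a placeholder for a ZooKeeper host:

/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --zookeeper {zk-host}:2181 --list | grep ATLAS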
Keep in mind that this is an HDP cluster, not CDH, as I also need to learn a bit of the Ambari side of things so that I can help my legacy HWX colleagues with their customers. The above command produced the following WARN message continuously:
[2019-12-04 05:01:02,481] WARN [Consumer clientId=consumer-1, groupId=console-consumer-42194] Bootstrap broker c2393-node4.squadron.support.hortonworks.com:6667 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2019-12-04 05:01:02,536] WARN [Consumer clientId=consumer-1, groupId=console-consumer-42194] Bootstrap broker c2393-node4.squadron.support.hortonworks.com:6667 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
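The warning itself is not very descriptive. A common cause of this on a secured cluster is a security-protocol mismatch: the broker listener on port 6667 expects SASL, while the console consumer defaults to PLAINTEXT, so the broker simply drops the connection. One way to check which protocol the broker listens on is to look at its listeners setting (assuming the usual HDP config location; the exact path may differ on your cluster):

grep '^listeners' /usr/hdp/current/kafka-broker/config/server.properties

On a Kerberized cluster this typically shows a SASL_PLAINTEXT (or SASL_SSL) listener rather than PLAINTEXT.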
I am quite new to Kafka, so instead of diving into the root cause myself, my colleague helped me by adding the extra parameter “--consumer-property security.protocol=SASL_PLAINTEXT”, which resolved the issue:
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server c2393-node4.squadron.support.hortonworks.com:6667 --topic ATLAS_ENTITIES --consumer-property security.protocol=SASL_PLAINTEXT
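An equivalent way to pass the same setting is through a client properties file and the --consumer.config option, which also makes it easier to add further security properties if needed. A minimal sketch, where the file name and the sasl.kerberos.service.name=kafka value are assumptions about this cluster:

cat > /tmp/atlas-consumer.properties <<EOF
security.protocol=SASL_PLAINTEXT
sasl.kerberos.service.name=kafka
EOF

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server c2393-node4.squadron.support.hortonworks.com:6667 --topic ATLAS_ENTITIES --consumer.config /tmp/atlas-consumer.properties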
With the security protocol set, the consumer then captured the message below after I updated one of the entities in Atlas:
{"version":{"version":"1.0.0","versionParts":[1]},"msgCompressionKind":"NONE","msgSplitIdx":1,"msgSplitCount":1,"msgSourceIP":"xxx.xx.xx.x","msgCreatedBy":"","msgCreationTime":1575435106685,"message":{"type":"ENTITY_NOTIFICATION_V2","entity":{"typeName":"hdfs_path","attributes":{"path":"hdfs://:8020/data","createTime":1575378000000,"qualifiedName":"hdfs:// :8020/data@c2393","name":"/data"},"guid":"62b52a50-34cb-4540-9328-7e939d60b020","status":"ACTIVE","displayText":"/data","classificationNames":["PII"],"classifications":[{"typeName":"PII","entityGuid":"62b52a50-34cb-4540-9328-7e939d60b020","entityStatus":"ACTIVE","propagate":true,"validityPeriods":[],"removePropagationsOnEntityDelete":false}]},"operationType":"CLASSIFICATION_ADD","eventTime":1575435104464}}
Now I can clearly see what data was sent from Atlas to Kafka for Ranger to pick up and update in Ranger’s database.
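Since the message arrives as one long line of JSON, piping the consumer output through jq (if it is installed on the node) makes the structure easier to inspect:

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --bootstrap-server c2393-node4.squadron.support.hortonworks.com:6667 --topic ATLAS_ENTITIES --consumer-property security.protocol=SASL_PLAINTEXT | jq .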
This blog post is just to document this Kafka error, and hopefully it can help others as well.