Introduction to Apache Ranger – Part I – Ranger vs Sentry – Hadoop Troubleshooting Guide

As I mentioned in earlier posts, Cloudera allows its employees, at least those of us in the Support Organisation, to take a couple of self-learning weeks each calendar year. We can choose whatever topics we like, as long as what we learn helps with our day-to-day work. Last time I chose Spark and gave an internal presentation to our wider team on how to develop Spark applications in a JetBrains IDE on a laptop, without needing a working Hadoop cluster. This time I chose Ranger, which was selected as the replacement for Sentry after the merger of Cloudera and Hortonworks.

Before I started, I tried to gather some learning resources, but found that there were not many. We have an O’Reilly subscription, yet I could not find a single book or video course about Ranger. A few posts turn up on Google, but they cover only very high-level information and stop after one post. So I decided to write a series of posts about Ranger once I had finished my own training, to share what I learnt with the rest of you.

So, let’s get started. First, I would like to do a high-level comparison between Ranger and Sentry, to understand why Sentry is now deprecated and will be replaced by Ranger. I will assume that you have basic knowledge of Hadoop and the CDH or HDP ecosystem.

Let’s have a look at what Sentry provides. According to Sentry’s official documentation, Apache Sentry is a granular, role-based authorisation module for Hadoop. It provides the ability to control and enforce precise levels of privileges on data for authenticated users and applications that run on a Hadoop cluster, particularly CDH. Currently Sentry is well integrated with Apache Hive, Apache Solr, Apache Kafka, Apache Impala and HDFS (limited to data that is linked to Hive tables via Sentry HDFS sync).

Sentry is role-based: you create Roles in Sentry and grant privileges to them, then map those Roles to Groups, defined either at the OS level or in AD, and the Groups in turn contain the end users who need to access Hadoop. You can use Sentry to limit a user’s access to a DB, TABLE, COLUMN or URI. This is done via Sentry commands run from the Impala or Hive interface; more details about those commands can be found in Cloudera’s Sentry documentation.
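To make the user, group, role and privilege chain a little more concrete, here is a toy Python model of it. This is only a sketch of the idea, not how Sentry is implemented, and every name in it (`USER_GROUPS`, `GROUP_ROLES`, `ROLE_PRIVILEGES`, `is_allowed`, the sample users and roles) is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Privilege:
    action: str       # e.g. "SELECT", "INSERT" or "ALL"
    database: str
    table: str = "*"  # "*" stands for every table in the database


# Hypothetical sample data: users belong to groups (OS or AD level),
# groups are mapped to roles, and privileges are granted to roles.
USER_GROUPS = {"alice": {"analysts"}, "bob": {"etl"}}
GROUP_ROLES = {"analysts": {"read_sales"}, "etl": {"load_sales"}}
ROLE_PRIVILEGES = {
    "read_sales": {Privilege("SELECT", "sales")},
    "load_sales": {Privilege("INSERT", "sales", "orders")},
}


def is_allowed(user: str, action: str, database: str, table: str) -> bool:
    """Walk user -> groups -> roles -> privileges and look for a match."""
    for group in USER_GROUPS.get(user, set()):
        for role in GROUP_ROLES.get(group, set()):
            for priv in ROLE_PRIVILEGES.get(role, set()):
                if (
                    priv.action in (action, "ALL")
                    and priv.database == database
                    and priv.table in (table, "*")
                ):
                    return True
    # No matching privilege was found, so access is refused by default.
    return False
```

In this sketch a user never receives a privilege directly; access always flows through a group and a role, which is the essence of the role-based approach described above.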

Now, let’s have a look at what Ranger has to offer. Again, according to official Apache Ranger documentation, Apache Ranger is a framework to enable, monitor and manage comprehensive data security across the Hadoop Platform. Apache Ranger has the following goals:

Centralised security administration to manage all security related tasks in a central UI or using REST APIs.
Fine grained authorisation to do a specific action and/or operation with Hadoop component/tool and managed through a central administration tool.
Standardise authorisation method across all Hadoop components.
Enhanced support for different authorisation methods – Role based access control, attribute based access control etc.
Centralise auditing of user access and administrative actions (security related) within all the components of Hadoop.
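As a rough illustration of the REST API goal, the Python sketch below builds a Hive access policy as JSON and shows how it could be posted to Ranger Admin. The field names and the `/service/public/v2/api/policy` path reflect my understanding of Ranger's public v2 API, so please check them against the REST API documentation for your Ranger version. The helper names, the service name, the host and the credentials are all assumptions made up for this example.

```python
import base64
import json
import urllib.request


def build_hive_policy(service, name, database, table, groups, accesses):
    """Build a policy document granting `accesses` on one table to `groups`."""
    return {
        "service": service,  # name of the Hive service defined in Ranger (assumed)
        "name": name,
        "isEnabled": True,
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": ["*"]},
        },
        "policyItems": [
            {
                "groups": list(groups),
                "accesses": [{"type": a, "isAllowed": True} for a in accesses],
            }
        ],
    }


def create_policy(base_url, user, password, policy):
    """POST the policy to Ranger Admin using HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(
        f"{base_url}/service/public/v2/api/policy",
        data=json.dumps(policy).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Building the payload needs no running Ranger instance.
policy = build_hive_policy(
    "hive_service", "sales_read", "sales", "orders", ["analysts"], ["select"]
)
# Hypothetical call, left commented out because host and credentials are made up:
# create_policy("http://ranger-host:6080", "admin", "password", policy)
```

The same policy could equally be created by hand in the web UI; the point is simply that everything is managed from one central place.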

As you can see, on top of authorisation, Apache Ranger also offers a user-friendly web UI, REST APIs and auditing, which are missing from Sentry. To summarise, the comparison table in this post outlines the main differences between the two Apache projects and helps to explain why Ranger is the choice for CDP, the successor to CDH.

As you can see, Apache Ranger supports more features and integrates with more Hadoop components. Even though Ranger does not currently support Impala, work is in progress and support will be available in a future release of Cloudera’s CDP product.

That’s all for the first episode. I will discuss Ranger’s features in more detail in future episodes. If you have any comments, please feel free to add them below.

Please note that the comparison table was based on information from before Cloudera’s CDP release, so things will likely have changed by the time you read this post. Ranger support for Impala and HDFS sync is being worked on and will be available in a future release of CDP 7 or later.

Happy Hadooping.

6 Comments

Ansha

June 18, 2020 at 3:57 AM

Hi Sir
Impala has added some things about Apache ranger in Impala 3.4 but you have stated in your article that Impala is not supported (in the comparison table). Can you check if it doesn’t bother?
1. Eric Lin
  
  June 18, 2020 at 2:00 PM
  
  Thanks Ansha for your comment. Yes indeed, Ranger now supports Impala in CDP. However, my blog post was written before the release of CDP.
  
  I have updated the post to include a small note that things might have changed already. The comparison table illustrates the features each component had before CDP, to show why Ranger was chosen as the replacement for Sentry.
  
  Thanks again!
Karim Farhane

March 6, 2021 at 5:51 AM

Hello Eric,
Nice article.
FYI starting from CDP 7.1.5 Ranger offers HDFS Sync using RMS plugin.
1. Eric Lin
  
  March 9, 2021 at 9:13 PM
  
  Hi Karim,
  
  Thanks for visiting and sharing that CDP 7.1.5 has HDFS SYNC from Ranger. Good to know that new features are added to latest Ranger!
  
  Cheers
eb

April 2, 2021 at 12:19 PM

Strangely, Ranger roles are acting weird in CDP 7.1.5. Database table access policy given to a role works for impala but not for hive sql query.
1. Eric Lin
  
  April 12, 2021 at 9:06 AM
  
  Hi EB,
  
  Apologies, I am not at Cloudera anymore, so I do not have access to CDP and Ranger and am unable to test and confirm the issue. You might want to reach out to the Cloudera folks.
  
  Cheers
  Eric

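| Feature | Apache Sentry | Apache Ranger | Comment |
|---|---|---|---|
| Authorisation | | | |
| Denial support | | | |
| Web UI | | | |
| Command Line | | | |
| HDFS Sync | | | |
| REST API | | | |
| Audit | | | |
| Impala | | | For now, WIP for Ranger + Impala |
| Hive | | | |
| HDFS | | | Sentry supports via Sync |
| Solr | | | |
| Kafka | | | |
| HBase | | | |
| Knox | | | |
| YARN | | | |
| Storm | | | |
| Tag-based support | | | More details to come |
| Row Level Filtering | | | More details to come |
| Column Masking | | | More details to come |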
