Introduction to Apache Ranger – Part II – Architecture Overview – Hadoop Troubleshooting Guide

In the previous episode, I briefly introduced the main features that Ranger provides, the main differences between Ranger and Sentry from an end user's point of view, and the main reasons Cloudera chose Ranger as the replacement for Sentry in its latest product, CDP. If you missed it, please read Introduction to Apache Ranger – Part I – Ranger vs Sentry.

In this second episode, I would like to introduce the basic architecture of Ranger and the components that together make up the full Ranger product.

To start with, let’s list all the components inside Ranger:

Ranger Admin Server/Portal
Ranger Policy Server
Ranger Plugins
Ranger User/Group Sync
Ranger Tag Sync
Ranger Audit Server

And below is a nice architecture diagram that shows the relationships between the components:

Image source: https://kymr.github.io/files/hadoop-summit/security/ranger_architecture.png

Now, let’s look in more detail at what each component does.

Ranger Admin Server/Portal

Central interface for security administration
Admin users can
- Define repositories
- Create and update policies
- Manage Ranger users/groups
- Define audit policies
- View audit activities
Runs on an embedded Tomcat server
Provides the Ranger REST API
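
As a rough illustration of the API side, here is a minimal sketch that builds (but does not send) a request to list the policies of one service. It assumes the public v2 REST endpoint `/service/public/v2/api/policy` with a `serviceName` filter and HTTP basic authentication; the host name, port, service name and credentials are placeholders, so please check the path against the API documentation for your Ranger version.

```python
import base64
import urllib.parse
import urllib.request


def build_policy_request(base_url, service_name, user, password):
    """Build a GET request that lists the policies of one Ranger service.

    The endpoint path and the serviceName filter are assumptions based on
    Ranger's public v2 REST API; verify them for your own version.
    """
    query = urllib.parse.urlencode({"serviceName": service_name})
    url = f"{base_url}/service/public/v2/api/policy?{query}"
    request = urllib.request.Request(url)
    # HTTP basic authentication header built by hand
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    return request


# Placeholder host, service name and credentials, for illustration only
req = build_policy_request("http://ranger.example.com:6080", "cm_hive", "admin", "secret")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request to a real Ranger Admin Server.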

Ranger Policy Server

Allows admin users to define and update policy details
Allows admin users to nominate delegate admins, who are then allowed to modify policies
Policies can be divided into different security zones
- A resource can only be assigned to one security zone
- If a resource matches a zone, only the policies in that zone will be checked
- If a resource matches no zone, the policies in the default (unnamed) zone will be used
Supports both allow and deny policies
- Denials are checked before allowances
Policies can apply at user or group level
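
To make the zone and deny-before-allow rules more concrete, here is a minimal sketch of how such an evaluation could look. It is not Ranger's actual policy engine: the `Policy` class, the prefix-based resource matching and the function names are all made up for illustration, and a request that matches no policy is simply denied here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Policy:
    resource: str                       # path prefix covered by the policy (illustrative matching)
    users: set = field(default_factory=set)
    groups: set = field(default_factory=set)
    allow: bool = True                  # False makes this a deny policy
    zone: Optional[str] = None          # None stands for the default (unnamed) zone


def find_zone(resource, zone_resources):
    """Return the zone that owns the resource, or None for the default zone."""
    for zone, prefixes in zone_resources.items():
        if any(resource.startswith(prefix) for prefix in prefixes):
            return zone
    return None


def is_allowed(user, groups, resource, policies, zone_resources):
    zone = find_zone(resource, zone_resources)
    # Only policies in the matched zone (or the default zone) are considered
    matching = [
        p for p in policies
        if p.zone == zone
        and resource.startswith(p.resource)
        and (user in p.users or groups & p.groups)
    ]
    # Denials are checked before allowances
    if any(not p.allow for p in matching):
        return False
    return any(p.allow for p in matching)
```

For example, with a "finance" zone that owns `/data/finance`, a user who is allowed through a group policy on `/data/finance` but denied by a user policy on `/data/finance/payroll` can read the former but not the latter, and an allow policy in the default zone is never consulted for either path.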

Ranger User/Group Sync

Synchronisation utility that pulls users and groups from the following sources:
- Unix
- LDAP
- AD
User/group information is stored in the Ranger Admin policy DB and used for policy definition

Ranger Plugins

Lightweight Java programs that are installed in Hadoop components, such as HDFS or Hive
Pull policies regularly from the Admin Server and cache them locally
Act as the authorisation module and evaluate user requests against security policies
- If no policy is found, an HDFS request falls back to HDFS ACLs; for all other components access is denied
Trigger requests to store audit data (to both HDFS and Solr)
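
The plugin behaviour described above can be sketched roughly as follows. This is an illustration only: real plugins are Java, and `PluginSketch`, the `fetch_policies` callable and the `native_acl` callback are hypothetical stand-ins. Keeping the old cache when a pull fails is also an assumption of this sketch.

```python
class PluginSketch:
    """Toy model of a plugin: cached policies, fallback and audit events."""

    def __init__(self, component, fetch_policies):
        self.component = component
        self._fetch = fetch_policies    # hypothetical stand-in for the pull from the Admin Server
        self._cache = {}                # resource -> {user: allowed}
        self.audit_log = []             # stand-in for the audit store (HDFS/Solr)

    def refresh(self):
        """Pull the latest policies; keep the old cache if the pull fails (assumption)."""
        try:
            self._cache = self._fetch()
        except OSError:
            pass

    def authorize(self, user, resource, native_acl=None):
        decision = self._cache.get(resource, {}).get(user)
        if decision is None:
            # No policy found: HDFS falls back to its own ACLs, everything else is denied
            if self.component == "hdfs" and native_acl is not None:
                decision = native_acl(user, resource)
            else:
                decision = False
        self.audit_log.append((user, resource, decision))
        return decision
```

In a real deployment `refresh` would run on a timer, so that authorisation decisions are made from the local cache and do not need a round trip to the Admin Server for every request.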

Ranger Audit Server

Audits are configured via policies (each policy specifies whether audit is enabled when that policy applies)
Audits are stored in both HDFS and Solr by default
- Data in Solr is used to display audit data in the Ranger Admin UI
- Data in HDFS serves as a backup and is not otherwise used (as far as my understanding goes)
- Storing audits in a DB is no longer supported since 0.5
Supports Audit Log Summarisation
- Since Apache Ranger 0.5
- Similar logs within a defined period that differ only by timestamp are aggregated into a single audit entry, to avoid a large number of audit logs
- The period defaults to 5 seconds
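
The summarisation idea can be sketched as below. This is only my illustration of the concept, not Ranger's implementation: the event tuple layout, the `summarise` function and the way the window is anchored at the first event of each group are all assumptions.

```python
def summarise(events, window_seconds=5):
    """Collapse events that differ only by timestamp within a time window.

    events: iterable of (timestamp, user, resource, action, result),
    assumed to be sorted by timestamp.
    """
    summaries = []
    open_entries = {}   # event key -> index of its current summary entry
    for ts, *rest in events:
        key = tuple(rest)
        idx = open_entries.get(key)
        if idx is not None and ts - summaries[idx]["first_seen"] < window_seconds:
            # Same event inside the window: just bump the counter
            summaries[idx]["count"] += 1
        else:
            # New event, or the window has expired: start a new entry
            open_entries[key] = len(summaries)
            summaries.append({"key": key, "first_seen": ts, "count": 1})
    return summaries


events = [
    (0.0, "alice", "/a", "read", "allowed"),
    (1.0, "alice", "/a", "read", "allowed"),
    (2.0, "bob", "/a", "read", "allowed"),
    (4.9, "alice", "/a", "read", "allowed"),
    (6.0, "alice", "/a", "read", "allowed"),
]
for entry in summarise(events):
    print(entry["key"][0], entry["first_seen"], entry["count"])
```

Here the five raw events become three audit entries: the first three reads by alice fall inside one 5 second window, bob gets his own entry, and the read at 6.0 starts a new window.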

Ranger Tag Sync

Since Apache Ranger 0.6
Separates resource classification from access authorisation
One tag policy can apply to multiple components, so long as the resources have the same tag attached
- Helps to reduce the number of policies needed in Ranger
Requires Apache Atlas to manage metadata (Hive DBs/tables, HDFS paths, Kafka topics, tags/classifications, etc.)
Event based
- Any change in Hive etc. sends an event to a Kafka topic (ATLAS_HOOK), and Atlas then picks up the change
- Any change in Atlas sends an event to a Kafka topic (ATLAS_ENTITIES), and Ranger Tag Sync then picks up the change
Tag policies are evaluated before resource-based policies
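
To show how a single tag policy can cover several components, and how tag policies come before resource-based ones, here is a simplified sketch. The dictionaries, the `PII` tag and the "first decision wins" rule are assumptions made for illustration; the real evaluation in Ranger has more cases than this.

```python
def authorize(user, resource, resource_tags, tag_policies, resource_policies):
    """Return (decision, source) for a request, checking tag policies first."""
    # 1. Tag policies: look at every tag attached to the resource
    for tag in resource_tags.get(resource, ()):
        decision = tag_policies.get(tag, {}).get(user)
        if decision is not None:
            return decision, f"tag:{tag}"
    # 2. Resource-based policies are only reached if no tag policy decided
    decision = resource_policies.get(resource, {}).get(user)
    if decision is not None:
        return decision, "resource"
    return False, "no-policy"


# The same tag is attached to a Hive table and an HDFS path (illustrative names)
resource_tags = {
    "hive:sales.customers": ["PII"],
    "hdfs:/data/customers": ["PII"],
}
tag_policies = {"PII": {"alice": True, "bob": False}}
resource_policies = {
    "hdfs:/data/customers": {"bob": True},
    "hdfs:/tmp": {"bob": True},
}

print(authorize("alice", "hive:sales.customers", resource_tags, tag_policies, resource_policies))
print(authorize("bob", "hdfs:/data/customers", resource_tags, tag_policies, resource_policies))
```

One `PII` policy decides access to both the Hive table and the HDFS path, so no separate policy per component is needed for those two resources.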

As you can see, there is a lot happening inside Ranger, and I think the above overview should give you a fair idea of how Ranger functions as a whole. If you have any comments, please post them below.

Stay tuned for Part III of the series in the coming days.

2 Comments

antonio

March 9, 2020 at 5:27 PM 4 years ago


I’m looking for a concise explanation of what Ranger can do in terms of attribute-based access control so I can share it with… people who need to know. This series is helpful, certainly. But what would be really, really useful is an explanation of how tag-based policies work, in the sense of “what are tags, what can be tagged, how do tags combine to define policies, etc.”
1. Eric Lin
  
  March 12, 2020 at 2:02 PM 4 years ago
  
  
  Hi Antonio,
  
  Thanks for checking in and adding a comment to my post. Yes, I plan to write another post about tag-based policies, and I will keep you posted.
  
  Cheers
  Eric
