Ranger Audit Data Retention Policy in Infra-Solr

In this post, I will explain how the retention policy works in Infra-Solr for Ranger’s Audit data, and how to confirm that it works.

The retention policy for Ranger’s audit data in Infra-Solr is controlled by the “Max Retention Days” setting, found in Ambari > Ranger > CONFIGS > ADVANCED > “Advanced ranger-solr-configuration”.

This setting makes Ranger set the “_ttl_” (Time To Live) attribute when creating new Solr documents for its audit data. However, it does not affect existing Solr documents that were created before the change. The default value is 90 days, which is long enough for checking audit history in most cases.
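As a rough illustration of how a TTL translates into an expiry date, the sketch below adds a retention period to a document's creation date. It assumes GNU `date` (as found on Linux), and the function name `expiry_date` is made up for this example. Solr purges expired documents on a periodic sweep, so the actual removal may happen somewhat later than the computed date.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the date on which a document created on
# <created> (YYYY-MM-DD, UTC) passes its TTL of <days> days.
# Assumes GNU date; BSD/macOS date uses different flags.
expiry_date() {
  local created="$1" days="$2"
  date -u -d "${created} +${days} days" +%Y-%m-%d
}

# A document created on 30 April 2020 under the default 90-day retention:
expiry_date 2020-04-30 90   # prints 2020-07-29
# A document created on 1 May 2020 under a 4-day retention:
expiry_date 2020-05-01 4    # prints 2020-05-05
```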

If you want to shorten this period, for example because of limited disk space, update this setting to a value that suits your case and then restart both Ranger and Infra-Solr via the Ambari interface. In my case, I changed it to 4 days.

To confirm that the setting has been applied and that new audit data is created with the new “_ttl_” value, go to Solr’s web UI and perform the checks below.

First, open Solr’s web UI. If you cannot access it because of Kerberos authentication, check Cloudera’s official documentation on How to Configure Browsers for Kerberos Authentication. I recommend Firefox, as it is much simpler to set up. Then go to Cloud > Tree > / > configs > ranger_audits > solrconfig.xml, search for “_ttl_”, and confirm that the value has been updated.

Once confirmed, also make sure that newly created documents carry the 4-day value. Click “Collection Selector” > “ranger_audits” > Query, enter “evtTime desc” under “sort” and “_ttl_” under “fl”, then click “Execute Query”. This sorts the documents by event creation time and returns only the _ttl_ field, so the newest documents should show the new value.

You can also check when the last documents with the 90-day retention were created by adding _ttl_:"+90DAYS" to “fq” and “evtTime” to “fl”. In my case, I changed the setting on 30 April 2020, so that is when the last documents using the old setting were created, and they won’t expire any time soon.
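The same two checks can also be run from the command line against Solr's standard `/select` endpoint. The following is a minimal sketch rather than a tested procedure: `SOLR_URL` is a placeholder for your Infra-Solr host (port 8886 is taken from the delete command later in this post), the function names are invented for illustration, and `--negotiate -u :` is only needed on a Kerberized cluster.

```shell
#!/usr/bin/env bash
# Placeholder base URL; replace solr-host with your Infra-Solr host.
SOLR_URL="${SOLR_URL:-http://solr-host:8886/solr}"

# Hypothetical helper: build a /select URL on ranger_audits that returns the
# newest documents first. $1 = fields to return (fl), $2 = optional filter
# query (fq), which must already be URL-encoded.
ranger_select_url() {
  local fl="$1" fq="${2:-}"
  local url="${SOLR_URL}/ranger_audits/select?q=*:*&sort=evtTime%20desc&rows=5&fl=${fl}"
  if [ -n "$fq" ]; then
    url="${url}&fq=${fq}"
  fi
  printf '%s\n' "$url"
}

# Hypothetical wrapper: run the query with Kerberos (SPNEGO) authentication.
ranger_select() {
  curl -s --negotiate -u : "$(ranger_select_url "$@")"
}

# Check 1: the _ttl_ of the newest documents.
ranger_select_url "_ttl_"
# Check 2: the newest documents that still carry the old 90-day TTL.
# %22 is a double quote and %2B is "+", i.e. fq=_ttl_:"+90DAYS"
ranger_select_url "_ttl_,evtTime" "_ttl_:%22%2B90DAYS%22"
```

As written, the script only prints the two URLs so they can be inspected first. Swapping `ranger_select_url` for `ranger_select` in the last two calls would send the requests (after `kinit` on a Kerberized cluster).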

New documents are deleted automatically once their _ttl_ period has passed (the expiry time is stored in the _expire_at_ field); see Solr TTL – Auto-Purging Solr Documents & Ranger Audits for details. However, older documents created before the change keep their original TTL and are not purged any sooner. To delete them manually, run the command below (on a Kerberized cluster, make sure you have run kinit with the correct credentials):

curl -v --negotiate -u : "http://{solr-url}:8886/solr/ranger_audits/update?commit=true" \
-H "Content-Type: text/xml" \
--data-binary "<delete><query>evtTime:[* TO 2020-05-01T00:00:00.000Z]</query></delete>"

In my example above, all data prior to 1 May 2020 will be deleted. To confirm, run the same query in the Solr UI; it should return no data if the deletion was successful.
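To avoid hard-coding the cutoff date, the deletion can be parameterized. This is a sketch under assumptions: GNU `date` is available, `SOLR_URL` is a placeholder for your Infra-Solr host, all helper names are hypothetical, and the cutoff is simply midnight UTC a given number of days ago. The `rows=0` query returns only the match count (`numFound`), which makes it easy to compare before and after the delete.

```shell
#!/usr/bin/env bash
# Placeholder base URL; replace solr-host with your Infra-Solr host.
SOLR_URL="${SOLR_URL:-http://solr-host:8886/solr}"

# Hypothetical helper: midnight UTC N days ago in Solr's timestamp format.
# Assumes GNU date.
cutoff_days_ago() {
  date -u -d "$1 days ago" +%Y-%m-%dT00:00:00.000Z
}

# Hypothetical helper: XML delete-by-query body for everything up to a cutoff.
delete_payload() {
  printf '<delete><query>evtTime:[* TO %s]</query></delete>\n' "$1"
}

# Hypothetical wrapper: count documents up to the cutoff without fetching them.
count_older_than() {
  curl -s --negotiate -u : -G "${SOLR_URL}/ranger_audits/select" \
    --data-urlencode "q=evtTime:[* TO $1]" --data-urlencode "rows=0"
}

# Hypothetical wrapper: send the delete and commit it.
delete_older_than() {
  curl -s --negotiate -u : "${SOLR_URL}/ranger_audits/update?commit=true" \
    -H "Content-Type: text/xml" --data-binary "$(delete_payload "$1")"
}

# Print the payload for the cutoff used in this post, without sending it.
delete_payload "2020-05-01T00:00:00.000Z"
```

A possible sequence would be `cutoff=$(cutoff_days_ago 4)`, then `count_older_than "$cutoff"`, `delete_older_than "$cutoff"`, and `count_older_than "$cutoff"` again; the second count should report a `numFound` of 0 if the deletion went through.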

