Dynamic Variables for Hive’s VIEW

In many use cases, Hive users need to pass a variable into a Hive VIEW so that their queries can run with dynamic conditions. However, Hive does not currently support this, because a VIEW’s query string is evaluated at creation time, not at run time. The example below demonstrates the problem:

First, I prepared a partitioned table with data in two partitions:

CREATE TABLE test (col int) partitioned by (part int);
INSERT INTO test PARTITION (part = 1) VALUES (1);
INSERT INTO test PARTITION (part = 2) VALUES (2);

SELECT * FROM test;
+-----------+------------+--+
| test.col  | test.part  |
+-----------+------------+--+
| 1         | 1          |
| 2         | 2          |
+-----------+------------+--+

It seems natural to create a view like the one below:

SET hiveconf:part = 1;
CREATE VIEW test_view AS SELECT * FROM test WHERE part = '${hiveconf:part}';

The VIEW creation will succeed; however, if you check the definition of the VIEW, you can see that the variable has already been substituted:

SHOW CREATE TABLE test_view;
+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE VIEW `test_view` AS SELECT `test`.`col`, `test`.`part` FROM `default`.`test` WHERE `test`.`part` = '1' |
+----------------------------------------------------+--+

Creating the VIEW without setting the variable first does not work either:

CREATE VIEW test_view AS SELECT * FROM test WHERE part = '${hiveconf:part}';
SHOW CREATE TABLE test_view;
+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE VIEW `test_view` AS SELECT `test`.`col`, `test`.`part` FROM `default`.`test` WHERE `test`.`part` = '${hiveconf:part}' |
+----------------------------------------------------+--+

Setting the variable at run time has no effect on the VIEW:

SET hiveconf:part = 1;
SELECT * FROM test_view;
+----------------+-----------------+--+
| test_view.col  | test_view.part  |
+----------------+-----------------+--+
+----------------+-----------------+--+

SET hiveconf:part = 2;
SELECT * FROM test_view;
+----------------+-----------------+--+
| test_view.col  | test_view.part  |
+----------------+-----------------+--+
+----------------+-----------------+--+

The example above confirms that using variables in Hive’s VIEWs does not work. After some research, it turned out that we can make use of a MACRO to achieve the same goal. If you are not familiar with Macros, let me briefly introduce them here.

A Macro is essentially a temporary function that stores an expression. It exists only for the duration of a session, so it is always temporary, never permanent. If you want something permanent, you are better off creating a UDF instead. Also, since a Macro is temporary, it cannot be shared between different sessions. Apache’s official documentation has some simple examples of how Macros can be used:
Create/Drop Macro
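For a quick flavour, a Macro simply binds a name to an expression, optionally over typed arguments. A couple of small examples in the spirit of that documentation (the names here are only illustrative):

```sql
-- A macro over two typed arguments; calling it evaluates the expression.
CREATE TEMPORARY MACRO simple_add(x INT, y INT) x + y;
SELECT simple_add(1, 2);   -- returns 3

-- A constant macro with no arguments, like the one used later in this post.
CREATE TEMPORARY MACRO fixed_number() 42;
DROP TEMPORARY MACRO fixed_number;
```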

Now, how can we use a Macro together with a VIEW to achieve what we want? An example is worth a thousand words, so let me walk through the steps below using the same “test” table created earlier.

First, we need to create a Macro:

CREATE TEMPORARY MACRO part_value() 1;

This creates a static Macro that always returns the value 1.

Second, create a VIEW that uses the Macro:

CREATE VIEW test_view AS SELECT * FROM test WHERE part = part_value();

Confirm the VIEW is created without issues:

SHOW CREATE TABLE test_view;
+----------------------------------------------------+--+
|                   createtab_stmt                   |
+----------------------------------------------------+--+
| CREATE VIEW `test_view` AS SELECT `test`.`col`, `test`.`part` FROM `default`.`test` WHERE `test`.`part` = `part_value`() |
+----------------------------------------------------+--+

Now simply run a query against the VIEW. Since the Macro part_value() returns 1, we should get the partition with part=1:

SELECT * FROM test_view;
+----------------+-----------------+--+
| test_view.col  | test_view.part  |
+----------------+-----------------+--+
| 1              | 1               |
+----------------+-----------------+--+

So how can we change the value? We have to DROP and RECREATE the Macro:

DROP TEMPORARY MACRO part_value;
CREATE TEMPORARY MACRO part_value() 2;

Run the query against the VIEW again:

SELECT * FROM test_view;
+----------------+-----------------+--+
| test_view.col  | test_view.part  |
+----------------+-----------------+--+
| 2              | 2               |
+----------------+-----------------+--+

You can see that the output is updated based on the Macro’s value.

Bear in mind that, as mentioned before, a Macro is only available in the current session. So if you exit your current session and run the query against test_view again, it will fail:

SELECT * FROM test_view;
Error: Error while compiling statement: FAILED: SemanticException Line 1:79 Invalid function 'part_value' in definition of VIEW test_view [
SELECT `test`.`col`, `test`.`part` FROM `default`.`test` WHERE `test`.`part` = `part_value`()
] used as test_view at Line 1:14 (state=42000,code=40000)

Simply recreate the Macro and the query works again:

CREATE TEMPORARY MACRO part_value() 1;
SELECT * FROM test_view;
+----------------+-----------------+--+
| test_view.col  | test_view.part  |
+----------------+-----------------+--+
| 1              | 1               |
+----------------+-----------------+--+

So, there are two limitations to this solution:

  • The MACRO must be redefined in each new session, as it is temporary.
  • The MACRO must be redefined whenever you want to change its value, even within the same session.
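One practical way to live with both limitations is to prepend the macro definition to the query every time, so the definition and the query always run in the same session. A minimal sketch using Beeline (the JDBC URL, script name, and value are placeholders for your environment):

```shell
#!/bin/sh
# Compose the macro definition and the query into one statement string,
# so they execute together in a single session.
PART=2
SQL="CREATE TEMPORARY MACRO part_value() ${PART}; SELECT * FROM test_view;"
echo "$SQL"
# To run it against a live HiveServer2, something like:
#   beeline -u jdbc:hive2://localhost:10000 -e "$SQL"
```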

As long as you are aware of these limitations, I think it is still a very useful workaround.

I hope this post is useful to anyone looking for a workaround to this Hive VIEW limitation.

