col1	col2	col3
row21	row22	row23

Put the file under HDFS as the hdfs user:
su - hdfs
hdfs dfs -mkdir test
hdfs dfs -mkdir test/p=p1
hdfs dfs -put test.txt test/p=p1

Then I ran the following Hive commands (if you did the first step, you can simply copy all the commands and paste them into the Hive CLI in one go):
DROP TABLE IF EXISTS test_external;

CREATE EXTERNAL TABLE test_external (col1 STRING)
PARTITIONED BY (p string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/hdfs/test';

ALTER TABLE test_external ADD PARTITION (p='p1') LOCATION '/user/hdfs/test/p=p1';

ALTER TABLE test_external ADD COLUMNS (col2 STRING);

SELECT * FROM test_external;

And I got the following output:
col1	NULL	p1
row21	NULL	p1

You can see that the second column, "col2", is NULL for every row. This confirms the bug. The bug was reported on the 13th of January 2014, but it has still not been fixed. If you are interested, you can have a look at "New columns after table alter result in null values despite data". There are two workarounds:

1. We can DROP the partition and then re-ADD it to trick Hive into reading it properly (this is safe because it is an EXTERNAL table, so dropping the partition does not remove the data):
ALTER TABLE test_external DROP PARTITION (p='p1');
ALTER TABLE test_external ADD PARTITION (p='p1') LOCATION '/user/hdfs/test/p=p1';
SELECT * FROM test_external;

Now we can see that Hive gives us the correct output:
col1	col2	p1
row21	row22	p1

2. Hive 1.1, which was shipped with CDH 5.4, comes with a new feature to apply a new column to an individual partition as well as to ALL partitions (a sketch of the per-partition form is included after the full run below). This feature indirectly fixes the issue mentioned in this post. For more information, see:

Support partial partition spec for certain ALTER PARTITION statements
Support "alter table .. add/replace columns cascade"

So, in the 5th query we ran in the original test:
ALTER TABLE test_external ADD COLUMNS (col2 STRING);

we can simply add the word "CASCADE" to the end:
ALTER TABLE test_external ADD COLUMNS (col2 STRING) CASCADE;

Now run all the commands again with the modified query (actually, we just added one word):
DROP TABLE IF EXISTS test_external;

CREATE EXTERNAL TABLE test_external (col1 STRING)
PARTITIONED BY (p string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/hdfs/test';

ALTER TABLE test_external ADD PARTITION (p='p1') LOCATION '/user/hdfs/test/p=p1';

ALTER TABLE test_external ADD COLUMNS (col2 STRING) CASCADE;

SELECT * FROM test_external;

We will get:
col1	col2	p1
row21	row22	p1

This is the same result as choice 1. Of course, this has to be run in Hive 1.1 or above, otherwise the second-to-last command (the ALTER ... CASCADE) will fail.
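For completeness, the same Hive 1.1 feature also lets you target a single partition instead of all of them. The snippet below is only a minimal sketch of that per-partition form, reusing the test_external table from this post; exact behaviour may vary between Hive releases, so treat it as illustrative rather than definitive:

-- Add the new column to the table metadata only (default RESTRICT behaviour),
-- then update just one partition's schema explicitly.
ALTER TABLE test_external ADD COLUMNS (col2 STRING);
ALTER TABLE test_external PARTITION (p='p1') ADD COLUMNS (col2 STRING);

-- Verify that the partition now exposes the new column.
DESCRIBE test_external PARTITION (p='p1');
SELECT * FROM test_external WHERE p='p1';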
Nice blog, Eric! I am a newbie to Hive and I have kind of the same issue. I added a column and it shows NULL values. I want to put the DATE part from a timestamp column into the newly created column. I tried with the query below:
ALTER TABLE table_2 ADD COLUMNS(DATE_COL string);
insert into table_2 (DATE_COL) select substring(TIMESTAMP_COL, -19, 10) from table_1;
This is working, but it still shows NULL values in the newly created DATE_COL.
table_1 has 13 columns, table_2 has 14 columns (13 + DATE_COL).
TIMESTAMP_COL: STRING
DATE_COL: STRING
Please tell me how to solve this problem.
Use Stack Overflow.
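For readers who hit the same question: INSERT INTO with a column list appends new rows (any unlisted columns become NULL); it does not fill the new column on rows that already exist. Below is only a rough sketch of one way to backfill the column instead, using hypothetical names c1 ... c12 for table_1's other twelve columns (the real column names would go there):

-- Hypothetical columns c1..c12 stand in for table_1's other twelve columns;
-- TIMESTAMP_COL is the existing string timestamp column.
-- Rewriting the whole table populates DATE_COL for every row,
-- instead of appending new rows that are NULL everywhere else.
INSERT OVERWRITE TABLE table_2
SELECT
  c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12,
  TIMESTAMP_COL,
  substring(TIMESTAMP_COL, -19, 10) AS DATE_COL
FROM table_1;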