message example {
  required binary file_name (UTF8);
  required binary date_time (UTF8);
  required binary tail (UTF8);
  required binary event (UTF8);
  required binary value (UTF8);
  optional int32 record_number;
  required binary src (UTF8);
}

message schema {
  optional binary file_name;
  optional binary date_time;
  optional binary tail;
  optional binary event;
  optional binary value;
  optional int32 record_number;
  optional binary src;
}

If you run it in Pig, the following error is returned:

[main] ERROR org.apache.pig.PigServer - exception during parsing: Error during parsing. repetition constraint is more restrictive: can not merge type required binary file_name (UTF8) into optional binary file_name
Failed to parse: repetition constraint is more restrictive: can not merge type required binary file_name (UTF8) into optional binary file_name
  at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:198)
  at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1676)
  at org.apache.pig.PigServer$Graph.access$000(PigServer.java:1409)
  at org.apache.pig.PigServer.parseAndBuild(PigServer.java:342)
  at org.apache.pig.PigServer.executeBatch(PigServer.java:367)
  at org.apache.pig.PigServer.executeBatch(PigServer.java:353)
  at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:140)
  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:202)
  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:173)
  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
  at org.apache.pig.Main.run(Main.java:478)
  at org.apache.pig.PigRunner.run(PigRunner.java:49)
  at org.apache.oozie.action.hadoop.PigMain.runPigJob(PigMain.java:286)
  at org.apache.oozie.action.hadoop.PigMain.run(PigMain.java:226)
  at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
  at org.apache.oozie.action.hadoop.PigMain.main(PigMain.java:74)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:227)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

At the time of writing, this issue affects both CDH5.3.x and CDH5.4.x. It is reported in PARQUET-138 but is still not fixed. However, I found that the fix for another issue, PARQUET-139, which is included from CDH5.4.0 onwards, gives us a workaround for the problem we have here. To apply it, we need to upgrade CDH to 5.4.x and then change the Pig script from:

data = LOAD '$path_to_source' USING parquet.pig.ParquetLoader as(
  file_name:bytearray,
  date_time:bytearray,
  tail:bytearray,
  event:bytearray,
  value:bytearray,
  record_number:int,
  src:bytearray
);

to:

data = LOAD '$path_to_source' USING parquet.pig.ParquetLoader(
  'file_name:bytearray,date_time:bytearray,tail:bytearray,event:bytearray,value:bytearray,record_number:int,src:bytearray'
);

So instead of listing each column definition separately in the AS clause, we pass all columns as a single string to ParquetLoader’s constructor. After this change, the problem should be fixed. Hope this helps.
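
If you have many columns, typing the single schema string by hand is error-prone. Below is a minimal Python sketch that generates the rewritten LOAD statement from a list of column names and Pig types. The helper name `build_parquet_load` and its parameters are my own and purely illustrative; they are not part of Pig or Parquet, and the output is just text to paste into your script.

```python
def build_parquet_load(alias, path, columns):
    """Return a Pig LOAD statement that hands the schema to
    ParquetLoader's constructor as one comma-separated string.

    `columns` is a list of (name, pig_type) tuples. This is a
    hypothetical helper for illustration, not a Pig or Parquet API.
    """
    # Join "name:type" pairs with commas, with no spaces, matching the
    # single-string form used in the workaround above.
    schema = ",".join(f"{name}:{pig_type}" for name, pig_type in columns)
    return f"{alias} = LOAD '{path}' USING parquet.pig.ParquetLoader('{schema}');"


# Columns taken from the example schema in this post.
columns = [
    ("file_name", "bytearray"),
    ("date_time", "bytearray"),
    ("tail", "bytearray"),
    ("event", "bytearray"),
    ("value", "bytearray"),
    ("record_number", "int"),
    ("src", "bytearray"),
]

statement = build_parquet_load("data", "$path_to_source", columns)
print(statement)
```

The printed statement is the same as the rewritten LOAD above, only on a single line.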