Sqoop Action with –query fails on oozie using  tag

Sqoop Action with –query fails on oozie using tag

Yesterday I have discovered an Oozie bug that it does not handle the –query parameter for sqoop action. See my example sqoop action XML below:

    
        ${jobTracker}
        ${nameNode}
        
              
        
        import  --connect jdbc:mysql://node6.lab.cloudera.com/test --username root --password cloudera 
--target-dir hdfs://node5.lab.cloudera.com:8020/user/eric/sqoop-import --query "SELECT * FROM test WHERE \$CONDITIONS" -m 1
        /user/eric/mysql-connector-java-5.1.37-bin.jar#mysql-connector-java-5.1.37-bin.jar
    
    
    

By looking at the error log from Oozie Launcher, I found the following:
Sqoop command arguments :
             import
             --connect
             jdbc:mysql://node6.lab.cloudera.com/test
             --username
             root
             --password
             cloudera
             --target-dir
             hdfs://node5.lab.cloudera.com:8020/user/eric/sqoop-import
             --query
             "SELECT
             *
             FROM
             test
             WHERE
             \$CONDITIONS"
             -m
             1

WARN  org.apache.sqoop.tool.SqoopTool  - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
INFO  org.apache.sqoop.Sqoop  - Running Sqoop version: 1.4.5-cdh5.4.7
WARN  org.apache.sqoop.tool.BaseSqoopTool  - Setting your password on the command-line is insecure. Consider using -P instead.
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Error parsing arguments for import:
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: *
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: FROM
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: test
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: WHERE
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: \$CONDITIONS"
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: -m
ERROR org.apache.sqoop.tool.BaseSqoopTool  - Unrecognized argument: 1
Intercepting System.exit(1)
From the error message, it looks like that sqoop was trying to split the whole “SELECT” statement into tokens, just like the rest of parameters, even though we have double quotes around it. Looking at the Oozie code, I found the following:
String[] args;
if (actionXml.getChild("command", ns) != null) {
    String command = actionXml.getChild("command", ns).getTextTrim();
    StringTokenizer st = new StringTokenizer(command, " ");
    List l = new ArrayList();
    while (st.hasMoreTokens()) {
        l.add(st.nextToken());
    }
    args = l.toArray(new String[l.size()]);
}
else {
    List eArgs = (List) actionXml.getChildren("arg", ns);
    args = new String[eArgs.size()];
    for (int i = 0; i < eArgs.size(); i++) {
        args[i] = eArgs.get(i).getTextTrim();
    }
}
Apparently this is a bug. Before the bug is fixed, I suggest the following workaround:

    
        ${jobTracker}
        ${nameNode}
        
              
        
          import
          --connect
          jdbc:mysql://node6.lab.cloudera.com/test
          --username
          root
          --password
          cloudera
          --target-dir
          hdfs://node5.lab.cloudera.com:8020/user/eric/sqoop-import
          --query
          SELECT * FROM test WHERE $CONDITIONS
          -m
          1
    
    
    

Basically replace the <command> with <arg> tag so that the whole query string is considered as one argument, rather than multiple arguments. Hope this helps.

Leave a Reply

Your email address will not be published.

My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!