Securely Managing Passwords In Sqoop

Securely Managing Passwords In Sqoop

Apache Sqoop became the Top-Level Project in Apache in March 2012. Since then, Sqoop has developed a lot and become very popular amongst Hadoop ecosystem. In this post, I will cover the ways to specify database passwords to Sqoop in a secure way. The following ways are common to pass database passwords to Sqoop:
sqoop import --connect jdbc:mysql://myexample.com/test \
             --username myuser -P \
             --table mytable
sqoop import --connect jdbc:mysql://myexample.com/test 
             --username myuser \
             --password mypassword \
             --table mytable
The first one is secure as other people can’t see the password, however, it is only practical to use in the command line. And we all agree that the second one is insecure as everyone can see what the password is to access the database. The more secure way of passing the password is through the use of so called password file. The command as follows:
echo -n "password" > /home/ericlin/.mysql.password
chmod 400 /home/ericlin/.mysql.password
sqoop import --connect jdbc:mysql://myexample.com/test \
             --username myuser \
             --password-file /home/ericlin/.mysql.password \
             --table mytable
Please note that we need “-n” option for the “echo” command so that no newline will be added to the end of the password. And, please do not use “vim” to create the file as “vim” will add newline automatically to the end of the file, which will cause Sqoop to fail as the password contains a newline character. However, storing password in a text file is still considered not secure even though we have set the permissions. As of Sqoop 1.4.5, Sqoop supports the use of JAVA Key Store to store passwords, so that you do not need to store passwords in clear text in a file. To generate the key:
[ericlin@localhost ~] $ hadoop credential create mydb.password.alias -provider jceks://hdfs/user/ericlin/mysql.password.jceks
Enter password: 
Enter password again: 
mysql.password has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.
On prompt, enter the password that will be used to access the database. The “mydb.password.alias” is the alias that we can use to pass to Sqoop when running the command, so that no password is needed. Then you can run the following Sqoop command:
sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/ericlin/mysql.password.jceks \
             -–connect ‘jdbc:mysql://myexample.com/test’ \
             -–table mytable \
             -–username myuser \
             -–password-alias mydb.password.alias
This way password is hidden inside jceks://hdfs/user/ericlin/mysql.password.jceks and no one is able to see it. Hope this helps.

    Loading

    20 Comments

    1. disqus_gU0MTIvS0B

      Thank you for this, Eric! Very helpful. With the hdfs method of storing password, it would only work if you have a persisted hdfs, correct? Do you have any suggestions for EMR as well? Thanks.

    2. I followed same steps, but i got this error ” INFO [main] sqoop.Sqoop: Running Sqoop version: 1.4.6
      CredentialProvider facility not available in the hadoop environment used

      I run his command
      sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/hadoop/mysql.password.jceks –connect ‘jdbc:mysql://localhost/practice’ –table authors –username root –password-alias mydb.password.alias
      Whats wrong? Please reply.

    3. shravan

      Hi,
      I run the same command , but it is giving unrecognised argument at –password-alias like below

      14/04/05 13:57:53 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
      16/07/13 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: -Dhadoop.security.credential.provider.path=jceks://hdfs/user/vijay/myPassword.jceks
      16/07/13 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: –password-alias
      16/07/13 13:57:53 ERROR tool.BaseSqoopTool: Unrecognized argument: db2-dev-password

      plz help me….

    4. Rui

      Hi Eric,

      Is there any protection to the credential provider? My point is if everyone can access the provider and retrieve the password, that’ll be essentially the same as everyone being able to see the password in plain text. Right?

      1. Eric Lin

        Hi Rui,

        Sorry about the late reply. I am not sure any good ways, but I would think setting the file to be 600 will prevent others from accessing it. The password is already encrypted and it is very hard to steal password already.

        1. Rui

          Thanks Eric for the reply. So we need to rely on file permission to protect the key store, which is the same as the (plain text) password file method. But the difference is, unlike password file, the password is encrypted in key store file. Is this correct?

    5. Oleg V

      Hey Eric,

      Nice and clean explanation, thanks a lot for sharing mate, this info was quite useful for me.

      Keep up the good work and update your blog more often 🙂

      Cheers!

    Leave a Reply to Boris Tyukin Cancel reply

    Your email address will not be published. Required fields are marked *

    My new Snowflake Blog is now live. I will not be updating this blog anymore but will continue with new contents in the Snowflake world!