Introduction – The problem

The problem is the following: You run some service where you need to have regular backups. The data to be backed up can be represented as something you can put in a tarred archive. The common way to do this is to pipe the tar output to OpenSSL and encrypt with AES-256, which is fine, except that it requires inputting the password constantly or putting a static password on the server. If you’ve guessed that we’re going to use public key cryptography, you’re right, but there is a problem there.

The typical way of doing this with passwords

If you research this topic online, you’ll find the internet flooded with solutions that look like this:

$ tar -czpf - /path/to/directory-to-backup | openssl aes-256-cbc -salt -pbkdf2 -iter 100000 -pass | dd of=output.aes

There are a few problems with this… let me state them and let’s see whether we can do better:

You have to input the password manually for this to work, so this cannot go into a cron/periodic job unless you expose the password
Since you have to put the password manually, it has to be some text, which lacks entropy, given that pbkdf2 (as far as I know) uses SHA256; where if you know anything about bitcoin mining, you’ll know that SHA256 calculations have become very cheap… meaning that your password that you write with your hand, most likely, lacks entropy for security
It uses AES-256-CBC, which lacks authentication… but let’s not get into that since OpenSSL has support issues for the better alternative, GCM, but I’ll provide a simple solution to this at the end

Let’s see if we can do better.

How can public key cryptography solve these problems?

Introduction to public key cryptography

Feel free to skip this little section if you know what public key crypto is.

We call symmetric key cryptography, the kind of cryptography where the encryption and decryption, or signing and verifying, uses the same key. In public key cryptography, or asymmetric key cryptography, we use the public key to encrypt data, and private key to decrypt data. Or we use the private key to sign data, and public key to verify signatures.

But public key encryption doesn’t really help as a replacement

It may be a surprise, but there’s no way to encrypt large data with public keys. The reason for this is that encryption algorithms usually are either done in blocks (like in AES), or in streams (like Chacha20). Public key encryption that we have is either considered a block cipher (like RSA encryption), or is used to generate symmetric keys using some agreement protocol, like Diffie-Hellman, which then uses symmetric encryption. And given that RSA encryption is very expensive, because it’s an exponentiation of a large number, it’s not really practical to use it for large data, so no standard algorithms are found in software like OpenSSL to encrypt your backups. You’ll never see RSA-4096-CBC or RSA-4096-GCM for that reason.

How then do we encrypt without a password?

The trick is simple. We generate a truly random password, encrypt the password with RSA and store it in a file, use the password for encryption, then we forget the password and move on. The password then can only be recovered with a private key (which will be encrypted). Let’s see how we do this.

First, in your local machine where you’re OK with unpacking the data, generate your RSA keys. Generate your decryption private key (encrypted with a passphrase):

$ openssl genrsa -aes256 -out priv_key.pem 4096

and from that, generate your public key

$ openssl rsa -in priv_key.pem -outform PEM -pubout -out pub_key.pem

Now this password that you use to encrypt this key will be the only password you’ll ever type. Your backups don’t need it.

Now copy the pub_key.pem file to the remote machine where backups have to happen.

If you have a backup bash script, you can do the following now:

#!/bin/bash

# generate a random password with 256-bit entropy, so PBKDF2 won't be a problem
password=$(openssl rand -hex 64)

# tar and encrypt your data with the password
tar -czpf - /path/to/directory-to-backup | openssl aes-256-cbc -salt -pbkdf2 -iter 100000 -pass pass:${password} | dd of=output.aes

# encrypt the password using your RSA key, and write the output to a file
echo ${password} | openssl rsautl -encrypt -inkey pub_key.pem -pubin -in - -out output.password.enc

To view the password in that file, you can view it in stdout using this command:

$ openssl rsautl -decrypt -inkey priv_key.pem -in output.password.enc -out -

This will print the password to your terminal, which you can copy and use to decrypt your encrypted archive.

Finally, to decrypt your data, run the command:

$ dd if=output.aes | openssl aes-256-cbc -d -salt -pbkdf2 -iter 100000 | tar -xpzf -

and when asked, input the password you copied from the private key decryption of the command before

WARNING: It’s WRONG to use the same password and encrypt it again and again and again with RSA (with this scheme, at least). Be aware that encrypting the same password with the same public key will yield the same output. This means that someone with access to the encrypted files can recognize that you’re using the same password everywhere.

Conclusion

No need, anymore, to choose between encrypting your backups and not being there. You can have both with this simple trick. This is also more secure because the 256-bit password generated randomly can never be brute-forced through the cheap pbkdf2, which basically means that brute-forcing the encryption itself is cheaper than brute-forcing the password that derived the key, which will guarantee the 256-bit encryption strength.

Now about using CBC instead of GCM, you can create your own HMAC to authenticate your archives. Or, even easier, you can call sha256sum on your archive and encrypt the output with your public key or even sign it. That’s good enough and can be streamlined into anything you automate for your backups.

I just wanna mention also that this isn’t an innovation of mine by any means… this is called in OpenSSL the “envelope encryption”, but for some reason it’s only supported when using the C library and not when using the OpenSSL executable. This is also a common way for encryption in many applications because asymmetric encryption for data is impractical with the available tooling. I wrote this article because it seems that no one talks about this and everyone is busy typing possibly insecure passwords in their encryption archives. Hopefully this will become more common after this article.

Samer Afach

How to backup and end-to-end encrypt your remote server data without manually inputting a password