Here's my shot at implementing encryption.
It's probably not very well coded, and I'm not completely satisfied with the way it's implemented (see the todo below), but it works. If somebody has the time (or is brave enough) to give it a try, I'd be glad to hear any ideas for improving it.
I have changed:
- s3try.rb (to catch decryption errors),
- HTTPStreaming.rb (to add a CryptedStream class on the same model as the ProgressStream class),
- s3Sync.rb (renamed here s3syncC.rb).
All other files are unchanged. It also requires the openssl and digest/sha2 Ruby libraries, but they're usually bundled with the Ruby package.
- Encryption is only applied to file contents; file names, directories and symlinks are not encrypted.
- Encryption is used when uploading files to S3 if an $ENCRYPTION_ALGO constant is set in the config.yml file (naming the desired openssl encryption algorithm, for example "aes-256-cbc"). Additionally, you may set an $ENCRYPTION_KEY constant for your password, though you may also type in the password at runtime if you're not comfortable storing your password in clear text in the config file.
- Decryption is used when downloading from S3 if the "encrypted" flag is set in the metadata (this flag is set automatically on upload). No other metadata is created, not even the MD5 of the unencrypted file. For this to work, s3sync.rb calculates both the encrypted and the unencrypted MD5 before deciding whether a given file needs a refresh (not good, see todo). If the password is incorrect, an error is raised and the local file (if there is one) is not overwritten.
Note that unencrypted files are still handled correctly, but once you start using encryption you cannot go back to the official s3sync.rb: it will not recognise that a file on S3 is encrypted and will overwrite local files with encrypted data. You have been warned!
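For anyone who wants to see the general idea without reading the patch, here is a minimal sketch of password-based encryption and decryption with the openssl and digest/sha2 libraries. It is not the code from the patch: the helper names (key_material, encrypt_data, decrypt_data) are made up for illustration, and deriving both the key and the IV from a SHA-512 hash of the password is an assumption. The IV is fixed on purpose so that the same file and password always give the same ciphertext, which the MD5 comparison described above seems to need; a random IV would be stronger but would change the encrypted MD5 on every run.

```ruby
require 'openssl'
require 'digest/sha2'

# Hypothetical helper (not from the patch): derive key and IV bytes from the
# password. Assumes the cipher needs at most 32 key bytes and 32 IV bytes.
def key_material(password, cipher)
  material = Digest::SHA512.digest(password)          # 64 bytes
  [material[0, cipher.key_len], material[32, cipher.iv_len]]
end

# Encrypt a string with the given openssl algorithm name, e.g. "aes-256-cbc".
def encrypt_data(algo, password, plaintext)
  cipher = OpenSSL::Cipher.new(algo)
  cipher.encrypt                                      # must come before key/iv
  cipher.key, cipher.iv = key_material(password, cipher)
  cipher.update(plaintext) + cipher.final
end

# Decrypt a string produced by encrypt_data. With a wrong password,
# cipher.final will usually raise OpenSSL::Cipher::CipherError.
def decrypt_data(algo, password, data)
  cipher = OpenSSL::Cipher.new(algo)
  cipher.decrypt
  cipher.key, cipher.iv = key_material(password, cipher)
  cipher.update(data) + cipher.final
end
```

A caller would rescue OpenSSL::Cipher::CipherError around decrypt_data and leave the local file alone in that case, which is roughly the behaviour described for an incorrect password.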
Todo:
- Either store the MD5 of the unencrypted file as metadata, or optimize the comparison process: with this version each file is encrypted once for the MD5 comparison, and possibly a second time if the file needs to be uploaded (the encrypted file could be cached). That said, syncing speed is usually bandwidth-bound, so encrypting twice does not slow things down much.
- Have a command line option to force encryption or no encryption.
- Other?
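The caching idea from the first todo item could look roughly like the sketch below: encrypt the file once, in chunks, into a temporary file while computing the MD5 of the encrypted bytes on the fly, then reuse that temporary file for the upload if the MD5 comparison says a refresh is needed. This is only an illustration, not part of the patch: encrypt_to_cache is a made-up name, and the key and IV derivation (both taken from a SHA-512 hash of the password, so the output is deterministic) is an assumption.

```ruby
require 'openssl'
require 'digest/md5'
require 'digest/sha2'
require 'tempfile'

# Hypothetical helper (not from the patch): encrypt the file at `path` once
# and return [tempfile_with_ciphertext, md5_hex_of_ciphertext].
def encrypt_to_cache(path, algo, password, chunk_size = 64 * 1024)
  cipher = OpenSSL::Cipher.new(algo)
  cipher.encrypt
  # Assumed derivation: deterministic key and IV from the password hash.
  material = Digest::SHA512.digest(password)
  cipher.key = material[0, cipher.key_len]
  cipher.iv  = material[32, cipher.iv_len]

  md5 = Digest::MD5.new
  cache = Tempfile.new('s3sync-crypt')
  cache.binmode
  File.open(path, 'rb') do |input|
    # read returns nil at end of file, so update never sees an empty chunk
    while (chunk = input.read(chunk_size))
      out = cipher.update(chunk)
      md5 << out
      cache.write(out)
    end
  end
  out = cipher.final                 # flush the last (padded) block
  md5 << out
  cache.write(out)
  cache.rewind                       # ready to be streamed to S3
  [cache, md5.hexdigest]
end
```

The returned MD5 can be compared with the one stored on S3; if they differ, the already-encrypted temp file is uploaded as is, so the file is never encrypted twice. The cost is temporary disk space equal to the file size.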