Title: Option to encrypt stored data using gpg
Post by: ferrix on February 20, 2007, 06:29:38 AM

This also implies some more work to cache the etag value of encrypted contents locally, or else using only the modify date in the comparator.
Could also store the unencrypted md5 sum to S3 as metadata, though that doesn't come down in the bucket list, so it's not clear it would be very useful. I think the "check date first, then issue HEAD to check md5" approach would probably be fast enough.

Edit 6/23/2008: The "--no-md5" option has since been added, which uses only the modify date and file size, as mentioned above. For encryption it may be sufficient to force this option rather than doing some kind of wacky md5/etag cache.

Title: Re: Option to encrypt stored data using gpg
Post by: lowflyinghawk on February 20, 2007, 10:22:19 AM

my 2 cents: I don't think encrypting/decrypting the stored data is the business of an rsync lookalike. you would be asking for trouble from users who forgot their keys, changed or buggy encryption code in a library, etc. you could provide a utility that does the encryption in a separate process (still asking for trouble) or just point users to things like GPG and its many wrappers. any time your code transforms the user's data, especially when the transformation is not reversible, you're asking for a big helping of blame, hold the mustard.
of course encrypting the stored data is a separate question from transferring the files over SSL, which I *do* think s3sync should (and already does) support.

Title: Re: Option to encrypt stored data using gpg
Post by: xpg on April 11, 2007, 02:53:34 AM

I see your point, lowflyinghawk, but I still think that having encryption of the data stored to S3 is a rather good idea. It is true that using encryption requires a bit more thought with regard to key and algorithm handling. My guess is that there are quite a lot of paranoid people out there who would enjoy such a feature (I for one would like the added security). But you might have a very relevant point on whether this is something an rsync-like program should do; it would certainly make s3sync.rb more of a dedicated backup program.
With regard to the storage of the unencrypted md5 sum, ferrix, I have noticed that s3sync.rb creates nodes for each directory. This might be an appropriate place for storing the unencrypted md5sum, rather than having to issue separate HEAD requests for each file. But then again, I might have missed the point entirely :-)

Title: Re: Option to encrypt stored data using gpg
Post by: fredo on December 07, 2007, 04:21:22 PM

Here's my shot at implementing encryption.
It's probably not very well coded and I'm not completely satisfied with the way it's implemented (see todo below), but it works. If somebody has time (or is brave enough) to give it a try, I'd be glad to hear their ideas for improving it. I have changed:
All other files are unchanged. It also requires the openssl and digest/sha2 ruby libraries, but they're usually bundled in the ruby package.
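The actual patch isn't attached here, but the kind of stream encryption the openssl and digest/sha2 requirements suggest can be sketched as below. This is a hypothetical illustration, not fredo's actual scheme: the cipher choice, the SHA-256 key derivation from a passphrase, and the prepended IV are all assumptions for the sketch.

```ruby
require 'openssl'
require 'digest/sha2'

# Illustrative only: derive a 256-bit key from a passphrase and encrypt a
# buffer with AES-256-CBC, prepending the random IV so decryption can
# recover it. fredo's patch may use a different cipher or key handling.
def encrypt_chunk(data, passphrase)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = Digest::SHA256.digest(passphrase)
  iv = cipher.random_iv # generates, sets, and returns a fresh IV
  iv + cipher.update(data) + cipher.final
end

def decrypt_chunk(blob, passphrase)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.decrypt
  cipher.key = Digest::SHA256.digest(passphrase)
  cipher.iv = blob[0, 16]              # recover the prepended IV
  cipher.update(blob[16..-1]) + cipher.final
end
```

Note that with a random IV the ciphertext (and hence its MD5/etag) differs on every upload of the same file, which is exactly why the comparison problem discussed in this thread arises.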
Note that unencrypted files will be handled fine too, but once you start to use encryption you cannot revert to the official s3sync.rb, as it will not recognise that a file on S3 is encrypted and will update local files with encrypted data. You have been warned!

Todo:
- either store the unencrypted file md5 as metadata, or optimize the comparison process (with this version each file is encrypted once for md5 comparison, and possibly a second time if the file needs to be uploaded - it could cache the encrypted file). Though the syncing process is usually bandwidth-bound, so encrypting twice does not slow things down too much.
- have a command line option to force encryption or no encryption
- other?

Title: Re: Option to encrypt stored data using gpg
Post by: fredo on December 13, 2007, 05:54:43 AM

My second try, this time storing the unencrypted MD5 as metadata.
Files changed from the official version: s3try.rb (same as try #1 above), HTTPStreaming.rb (some changes to the CryptedStream class), and s3sync.rb.

On the plus side:
1) less load on the CPU, as local files do not have to be encrypted to compare MD5s with S3; this makes a difference over the prior approach when comparing identical files between S3 and local, where the process could be CPU-bound instead of bandwidth-bound;
2) far fewer changes to the official version (about 10 lines of code added to s3sync.rb).

Downside: a get-headers request is necessary for each file, which slows the process down noticeably (x2 when files are present both on S3 and locally; no change otherwise).

Configuration variables are unchanged.

Title: Re: Option to encrypt stored data using gpg
Post by: edalquist on December 13, 2007, 05:32:48 PM

That is a great patch, fredo, and exactly what I was looking for. I want to copy a backup to S3, and having the backup tool understand how to do the encryption makes the process much more efficient. It would be great if this feature could get into the core version.
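fredo's metadata approach in the post above can be sketched roughly as follows. The header name and helper methods here are hypothetical, not identifiers from the actual patch; the idea is simply to stamp the plaintext MD5 into S3 user metadata at upload time, then compare against it via a HEAD request instead of re-encrypting the local file.

```ruby
require 'digest/md5'

# Hypothetical header name; S3 user metadata must use the x-amz-meta- prefix.
PLAINTEXT_MD5_HEADER = 'x-amz-meta-plain-md5'

# Headers to send with the PUT: record the MD5 of the *unencrypted* file.
def upload_headers(local_path)
  { PLAINTEXT_MD5_HEADER => Digest::MD5.file(local_path).hexdigest }
end

# Given the headers from a HEAD request on the S3 object, decide whether
# the local file differs. No local encryption pass is needed - this is the
# CPU saving described above, at the cost of one HEAD per file.
def needs_upload?(local_path, remote_headers)
  remote_md5 = remote_headers[PLAINTEXT_MD5_HEADER]
  return true if remote_md5.nil? # no metadata: assume it needs uploading
  Digest::MD5.file(local_path).hexdigest != remote_md5
end
```

The x2 slowdown fredo measures comes from that extra round-trip: one HEAD per object on top of the listing, whether or not the file has changed.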
Title: Re: Option to encrypt stored data using gpg
Post by: ferrix on December 14, 2007, 07:02:25 PM

Patience :)
Title: Re: Option to encrypt stored data using gpg
Post by: jh on January 28, 2008, 09:08:24 AM

I think the right way to do this is to allow an arbitrary filter before S3 writes and after S3 reads.
Instead of the current:

Read File --> Put File on S3; and Get File from S3 --> Write File

we'd have:

Read File --> Arbitrary Filter --> Put File; and Get File --> Arbitrary Filter --> Write File

Reasonable choices for the filters are "gzip -f" and "gunzip" (to cut down on bandwidth and other costs), and some variation of gpg that doesn't require user input. (Is there such a thing?)
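jh's filter hook could be sketched as a single helper that pipes the file body through an external command's stdin/stdout. This is a sketch, not an existing s3sync option; the command strings are examples. As for gpg without user input: yes, its batch symmetric mode (--batch with --passphrase-file) runs without any prompt.

```ruby
require 'open3'

# Run `data` through an external filter command, returning its stdout.
# The command is given as an argv array to avoid shell quoting issues.
def apply_filter(data, command)
  out, status = Open3.capture2(*command, stdin_data: data)
  raise "filter #{command.join(' ')} failed" unless status.success?
  out
end

# Upload path:   body = apply_filter(File.binread(path), ['gzip', '-f'])
# Download path: body = apply_filter(s3_body, ['gunzip'])
#
# Non-interactive gpg filters (symmetric mode, no prompt):
#   apply_filter(body, ['gpg', '--batch', '--symmetric',
#                       '--passphrase-file', '/etc/s3sync.pass', '-o', '-'])
#   apply_filter(body, ['gpg', '--batch', '--decrypt',
#                       '--passphrase-file', '/etc/s3sync.pass', '-o', '-'])
```

One attraction of this design over a built-in cipher is the point lowflyinghawk raised earlier: the sync tool never transforms the user's data itself, it just hands bytes to a tool the user already trusts.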