General Category => General Discussion => Topic started by: babelian on April 24, 2008, 03:32:08 PM

Title: s3sync speed vs rsync
Post by: babelian on April 24, 2008, 03:32:08 PM
When syncing an up-to-date mirror, s3sync is taking about 10 minutes to do what psync/rsync did in < 5 minutes, and it uses a lot more CPU over that period (somewhat subjective, but it seems to sit at 30% the whole time, whereas rsync just burst for a short period at the beginning while it built the file list).

I am assuming this is because s3sync can't pull a list of files at the beginning, so it spends a lot more time traversing the directories (in XML rather than SSH, no less). Is this the case? Has anyone else noticed comparably poor results when dealing with a lot of files?

I used to have my whole user directory (~120k files... crazy, but there we go) syncing every few hours to Strongspace with rsync with no issue, but if it's going to take half an hour each time and use 30% CPU while it works, then it's not really viable (not to mention that my cost may be in the GET/PUT/LIST requests; will see once it balances out).

Amendment: rsync does seem to be taking a comparable amount of time, just not using any CPU, and it's only about 23k files (not sure where I got the 120k; maybe that was the whole drive).
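
For a rough sense of how many LIST requests a sync might need, here is a minimal sketch. The only hard fact it relies on is that S3 returns at most 1000 keys per list response. Whether s3sync lists the bucket flat or directory by directory is an assumption here, and the directory layout in the usage note is made up for illustration.

```python
import math

PAGE_SIZE = 1000  # S3 returns at most 1000 keys per list response


def list_requests_flat(num_keys, page_size=PAGE_SIZE):
    """LIST requests needed to page through every key under one prefix."""
    return max(1, math.ceil(num_keys / page_size))


def list_requests_per_directory(files_per_dir, page_size=PAGE_SIZE):
    """LIST requests needed if each directory is listed separately.

    files_per_dir is a list with one file count per directory. Listing
    per directory is an assumption about the tool, not confirmed here.
    """
    return sum(max(1, math.ceil(n / page_size)) for n in files_per_dir)
```

With 23k files, a flat listing is `list_requests_flat(23000)`, i.e. 23 requests. If the same files were spread over a hypothetical 2300 directories of 10 files each and listed one directory at a time, `list_requests_per_directory([10] * 2300)` gives 2300 requests, each one a separate round trip.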

Title: Re: s3sync speed vs rsync
Post by: ferrix on April 24, 2008, 10:39:11 PM
I'd guess the CPU is due to s3sync md5'ing every local file to check if it needs to be re-sync'd. But that's just a best guess. Otherwise, it's not really meaningful to compare the two: they don't share any common code or architecture.
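
To illustrate why that kind of check costs CPU, here is a minimal sketch of an MD5 comparison. It is not s3sync's actual code. It assumes the remote checksum is available as an S3 ETag and that the object was stored with a plain single PUT, in which case the ETag is the hex MD5 of the content. The function names `file_md5` and `needs_upload` are hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Hex MD5 of a local file, read in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_upload(path, remote_etag):
    """True if the local file's MD5 differs from the remote ETag.

    Assumption: remote_etag is the hex MD5 of the stored object,
    possibly wrapped in double quotes as S3 returns it.
    """
    return file_md5(path) != remote_etag.strip('"').lower()
```

Every byte of every local file is read and hashed on each run, even when nothing has changed, which would fit the steady CPU use described above.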

Title: Re: s3sync speed vs rsync
Post by: danm2 on May 29, 2008, 06:10:49 AM

First, there is a known issue in the Ruby library. Take a look at this post: http://s3sync.net/forum/index.php?topic=191.0

Second, and more important: S3 is all or nothing, i.e. if you want to change a file you have to PUT the whole thing again.
With s3sync, every time a local file changes, the whole new file is uploaded, not just what has changed. That wastes bandwidth and takes more time.

If you want to benefit from rsync's bandwidth-efficient algorithm, you need to use your own Amazon EC2 machine or a
3rd-party gateway like: http://www.s3rsync.com/
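
To put a number on the difference between a whole-file PUT and a delta transfer, here is a simplified sketch. It compares fixed-position blocks by MD5, which is not rsync's real algorithm (rsync uses a rolling checksum so it can also match data that has shifted position). The block size and function names are assumptions for illustration.

```python
import hashlib


def block_digests(data, block_size):
    """MD5 of each fixed-size block of data."""
    return [
        hashlib.md5(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old, new, block_size):
    """Indexes of blocks in new that differ from the same block in old."""
    old_d = block_digests(old, block_size)
    new_d = block_digests(new, block_size)
    return [
        idx for idx, digest in enumerate(new_d)
        if idx >= len(old_d) or old_d[idx] != digest
    ]


def bytes_to_send(old, new, block_size):
    """Bytes a block-level delta would transfer (ignoring protocol overhead)."""
    return sum(
        len(new[idx * block_size:(idx + 1) * block_size])
        for idx in changed_blocks(old, new, block_size)
    )


# A 400 KiB file with a single byte changed.
old = bytes(100 * 4096)
new = bytearray(old)
new[5000] = 1
new = bytes(new)

whole_file = len(new)                     # what a full PUT uploads
delta = bytes_to_send(old, new, 4096)     # what a block delta uploads
```

Here `whole_file` is 409600 bytes while `delta` is 4096 bytes, since only one 4 KiB block differs. A delta scheme needs something on the remote side that can read the old file and compute its block checksums, which S3 alone does not offer; that is why an EC2 machine or gateway comes into it.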