I just started using s3sync.rb to back up the files on my web server to S3. I really like the program; its features are exactly what I was looking for.
I wrote a small wrapper script that runs it to back up selected directories from my web server. When I ran it, I hit an interesting problem: either I'm doing something wrong or it's a bug. I'm not sure which.
The problem occurs when s3sync.rb runs in delete mode (--delete) against a target directory in S3 that has sibling directories whose names begin with the same characters as the target. In that case s3sync.rb removes those sibling directories, and I don't think it should. (I'm new at this, though, so I'd like to hear what you think.) For example, if a directory on S3 contains the subdirectories test and test2, syncing test removes test2.
It looks as though s3sync.rb matches on the leading characters of the directory name without checking that the match ends at a directory boundary (/).
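To make the suspected cause concrete, here is a minimal Ruby sketch contrasting a raw string-prefix match with a boundary-aware one. It is only an illustration under that assumption, not the actual s3sync.rb source; the method names naive_matches and boundary_matches are hypothetical, and the keys are taken from the bucket listing further down.

```ruby
# Hypothetical illustration only: NOT the actual s3sync.rb source.
# A few keys from the bucket listing in this post.
REMOTE_KEYS = [
  "backup_a/test/dir_a",
  "backup_a/test/test_file",
  "backup_a/test2/dir",
  "backup_a/test2/dir/file_c",
].freeze

# Naive filter: any key that merely starts with the target string,
# so "backup_a/test" also matches "backup_a/test2/...".
def naive_matches(keys, prefix)
  keys.select { |k| k.start_with?(prefix) }
end

# Boundary-aware filter: the key must equal the target or continue with "/".
def boundary_matches(keys, prefix)
  base = prefix.chomp("/")
  keys.select { |k| k == base || k.start_with?(base + "/") }
end

p naive_matches(REMOTE_KEYS, "backup_a/test")    # includes the test2 keys
p boundary_matches(REMOTE_KEYS, "backup_a/test") # only the test keys
```

With the boundary check, a target of backup_a/test (with or without a trailing slash) no longer picks up anything under backup_a/test2.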
This is kind of hard to explain, so let me give you an example that shows what is happening pretty clearly.
I have a local directory with the following files:
$ ls -al
total 10
drwxrwxr-x 5 vpcweb vpcweb 2048 Mar 31 17:46 .
drwxr-x--- 20 vpcweb nobody 2048 Mar 31 17:45 ..
drwxrwxr-x 4 vpcweb vpcweb 2048 Mar 31 16:31 test
drwxrwxr-x 4 vpcweb vpcweb 2048 Mar 19 18:19 test2
drwxrwxr-x 2 vpcweb vpcweb 2048 Mar 31 17:46 test3
$
The whole directory tree looks like the following:
$ find .
.
./test
./test/test_file2
./test/test_file
./test/dir_abc
./test/dir_abc/test_file_ab
./test/dir_abc/test_file_a
./test/dir_a
./test/dir_a/file2
./test/dir_a/dir2
./test/dir_a/dir2/file1
./test/dir_a/dir2/dir3
./test/dir_a/dir2/dir3/file
./test3
./test2
./test2/dir
./test2/dir/file_c
./test2/test_dir
./test2/test_dir/test_file2
$
My S3 bucket is currently empty:
$ s3cmd.rb list vpc_test
--------------------
$
I want to sync the test and test2 directories to S3 with s3sync.rb, but not test3. To do this I run s3sync.rb twice: once to back up test and once to back up test2. (This example is a bit contrived, but it should get the point across. In a case this simple you could probably use the exclude option to do the same thing in one run and avoid the problem.)
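A wrapper along these lines might look like the following Ruby sketch. It is hypothetical (not the wrapper I actually use); the constant names and the wrapper's own --run switch are made up for illustration, while the s3sync.rb flags are the ones used in the runs below. Note that running test before test2 with --delete is exactly the sequence that triggers the problem on the second pass.

```ruby
# Hypothetical wrapper sketch: builds one s3sync.rb invocation per directory,
# using the same flags as the runs in this post.
BUCKET = "vpc_test"
DEST   = "backup_a"
DIRS   = %w[test test2].freeze # test3 is deliberately left out

# Returns one argv array per directory: ./dir/ is synced to bucket:dest/dir.
def sync_commands(dirs, bucket, dest)
  dirs.map do |dir|
    ["s3sync.rb", "-s", "-r", "-v", "--delete", "./#{dir}/", "#{bucket}:#{dest}/#{dir}"]
  end
end

if __FILE__ == $PROGRAM_NAME
  sync_commands(DIRS, BUCKET, DEST).each do |argv|
    if ARGV.include?("--run") # hypothetical switch of this wrapper, not of s3sync.rb
      # Passing an argv list to system avoids shell quoting issues.
      system(*argv) || warn("s3sync.rb failed: #{argv.join(' ')}")
    else
      puts argv.join(" ") # dry run: only print what would be executed
    end
  end
end
```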
The first time I do this, things work fine:
$ s3sync.rb -s -r -v --delete ./test/ vpc_test:backup_a/test
Create node dir_a
Create node dir_a/dir2
Create node dir_a/dir2/dir3
Create node dir_a/dir2/dir3/file
Create node dir_a/dir2/file1
Create node dir_a/file2
Create node dir_abc
Create node dir_abc/test_file_a
Create node dir_abc/test_file_ab
Create node test_file
Create node test_file2
$
$ s3sync.rb -s -r -v --delete ./test2/ vpc_test:backup_a/test2
Create node dir
Create node dir/file_c
Create node test_dir
Create node test_dir/test_file2
$
And all my files are on S3:
$ s3cmd.rb list vpc_test
--------------------
backup_a/test/dir_a
backup_a/test/dir_a/dir2
backup_a/test/dir_a/dir2/dir3
backup_a/test/dir_a/dir2/dir3/file
backup_a/test/dir_a/dir2/file1
backup_a/test/dir_a/file2
backup_a/test/dir_abc
backup_a/test/dir_abc/test_file_a
backup_a/test/dir_abc/test_file_ab
backup_a/test/test_file
backup_a/test/test_file2
backup_a/test2/dir
backup_a/test2/dir/file_c
backup_a/test2/test_dir
backup_a/test2/test_dir/test_file2
$
However, when I run this again (with no changes to the local source directories) the first command deletes the directory that the second command put on S3:
$ s3sync.rb -s -r -v --delete ./test/ vpc_test:backup_a/test
Remove node 2/dir/file_c
Remove node 2/test_dir/test_file2
Remove node 2/test_dir
Remove node 2/dir
$
And we see that the files have been removed from S3:
$ s3cmd.rb list vpc_test
--------------------
backup_a/test/dir_a
backup_a/test/dir_a/dir2
backup_a/test/dir_a/dir2/dir3
backup_a/test/dir_a/dir2/dir3/file
backup_a/test/dir_a/dir2/file1
backup_a/test/dir_a/file2
backup_a/test/dir_abc
backup_a/test/dir_abc/test_file_a
backup_a/test/dir_abc/test_file_ab
backup_a/test/test_file
backup_a/test/test_file2
$
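The "Remove node 2/..." names in that run are consistent with the prefix theory: stripping "backup_a/test" from "backup_a/test2/dir/file_c" leaves "2/dir/file_c", which has no local counterpart under ./test/, so delete mode removes it. Here is a rough Ruby sketch of such a delete pass, assuming that naive prefix handling; it is not the real s3sync.rb code, the method names are hypothetical, and the key and file lists are abbreviated from the listings above.

```ruby
require "set"

# Hypothetical sketch of a delete-mode pass: NOT the actual s3sync.rb code.
PREFIX = "backup_a/test"

# A few of the keys from the bucket listing above.
REMOTE_KEYS = %w[
  backup_a/test/dir_a
  backup_a/test/test_file
  backup_a/test2/dir
  backup_a/test2/dir/file_c
  backup_a/test2/test_dir
  backup_a/test2/test_dir/test_file2
].freeze

# The matching local names under ./test/ (abbreviated the same way).
LOCAL_NAMES = Set["dir_a", "test_file"].freeze

# Naive step: select keys by raw string prefix, then strip the prefix and
# any leading "/" to get a name relative to the target (needs Ruby 2.5+).
def remote_names(keys, prefix)
  keys.select { |k| k.start_with?(prefix) }
      .map { |k| k.delete_prefix(prefix).delete_prefix("/") }
end

# Delete mode: every remote name without a local counterpart is removed.
def names_to_remove(keys, prefix, local)
  remote_names(keys, prefix).reject { |name| local.include?(name) }
end

puts names_to_remove(REMOTE_KEYS, PREFIX, LOCAL_NAMES)
```

Under these assumptions the sketch selects the same four names the real run removed (2/dir, 2/dir/file_c, 2/test_dir, 2/test_dir/test_file2), though not necessarily in the same order.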
The second command then restores the files in its directory:
$ s3sync.rb -s -r -v --delete ./test2/ vpc_test:backup_a/test2
Create node dir
Create node dir/file_c
Create node test_dir
Create node test_dir/test_file2
$
And we see all the files back in S3:
$ s3cmd.rb list vpc_test
--------------------
backup_a/test/dir_a
backup_a/test/dir_a/dir2
backup_a/test/dir_a/dir2/dir3
backup_a/test/dir_a/dir2/dir3/file
backup_a/test/dir_a/dir2/file1
backup_a/test/dir_a/file2
backup_a/test/dir_abc
backup_a/test/dir_abc/test_file_a
backup_a/test/dir_abc/test_file_ab
backup_a/test/test_file
backup_a/test/test_file2
backup_a/test2/dir
backup_a/test2/dir/file_c
backup_a/test2/test_dir
backup_a/test2/test_dir/test_file2
$
Note that exactly the same behavior occurs if I use this form of the s3sync.rb commands:
s3sync.rb -s -r -v --delete ./test vpc_test:backup_a
s3sync.rb -s -r -v --delete ./test2 vpc_test:backup_a
To me, this isn't the way that it should work. Am I missing something or is this a subtle bug?
Thanks for any insight.
Rob