It looks like this can be fixed by checking each item returned by the list_bucket command (both the entries and the common_prefix_entries) to see whether the part of the name beyond the prefix starts with a slash (/). If it does, we're on a directory boundary and things work fine. If it doesn't, this is one of the cases that was causing the problem, so we should just skip that item.
There is already code that extracts the part of the name beyond the prefix to check for excludes, so I just used that excludePath variable to check for the leading slash. An empty excludePath (the name matches the prefix exactly) is still let through. Then I wrapped the code that calls S3Node.new or recurses into s3TreeRecurse (depending on the item type) in an if, so these calls are skipped when the prefix doesn't align with a directory boundary. There's probably a more elegant way to do what I did, but in a quick test this seems to work.
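To make the check easier to reason about on its own, here is a minimal standalone sketch of the same test. The helper name on_directory_boundary? and the sample keys are made up for illustration and are not part of s3sync.rb. It assumes the prefix is given without a trailing slash.

```ruby
# Hypothetical helper (not in s3sync.rb): true when `key` sits on a
# directory boundary relative to `original_prefix`.
# Assumption: `original_prefix` has no trailing slash.
def on_directory_boundary?(key, original_prefix)
  # A prefixed bucket listing should only return matching keys, but guard anyway.
  return false unless key.start_with?(original_prefix)

  # Same slice the patch uses to build excludePath.
  remainder = key.slice(original_prefix.length...key.length)

  # Empty remainder: the key is the prefix itself, so keep it.
  # Otherwise the remainder must begin with '/'.
  remainder.empty? || remainder[0, 1] == '/'
end

# Made-up keys, as a listing with prefix "backup" might return them.
keys = ['backup/a.txt', 'backup/sub/', 'backup2/b.txt', 'backup']
kept = keys.select { |k| on_directory_boundary?(k, 'backup') }
puts kept.inspect
```

With the prefix "backup", the key "backup2/b.txt" is dropped because its remainder starts with "2" rather than "/". If the prefix were passed with a trailing slash, every remainder would start with a name character and everything would be skipped, so that case would need separate handling.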
The following patch shows my implementation:
--- s3sync.rb 2008-01-06 10:25:55.000000000 -0500
+++ s3sync.rb.mod 2008-04-03 17:08:51.000000000 -0400
@@ -314,22 +314,30 @@
if not (item.kind_of? String)
# this is an item
excludePath = item.name.slice($S3SyncOriginalS3Prefix.length...item.name.length)
- if $S3SyncExclude and $S3SyncExclude.match(excludePath)
- debug("skipping S3 item #{excludePath} due to --exclude")
+ if !excludePath.empty? && excludePath[0,1] != '/'
+ debug("file not on directory boundary. skipped")
else
- debug("S3 item #{item.name}")
- g.yield(S3Node.new(bucket, prefix, item))
+ if $S3SyncExclude and $S3SyncExclude.match(excludePath)
+ debug("skipping S3 item #{excludePath} due to --exclude")
+ else
+ debug("S3 item #{item.name}")
+ g.yield(S3Node.new(bucket, prefix, item))
+ end
end
else
# it's a prefix (i.e. there are sub keys)
partialPath = item.slice(prefix.length..item.length) # will have trailing slash
excludePath = item.slice($S3SyncOriginalS3Prefix.length...item.length)
- # recurse
- if $S3SyncExclude and $S3SyncExclude.match(excludePath)
- debug("skipping prefix #{excludePath} due to --exclude")
+ if !excludePath.empty? && excludePath[0,1] != '/'
+ debug("file not on directory boundary. skipped")
else
- debug("prefix found: #{partialPath}")
- s3TreeRecurse(g, bucket, prefix, partialPath) if $S3syncOptions['--recursive']
+ # recurse
+ if $S3SyncExclude and $S3SyncExclude.match(excludePath)
+ debug("skipping prefix #{excludePath} due to --exclude")
+ else
+ debug("prefix found: #{partialPath}")
+ s3TreeRecurse(g, bucket, prefix, partialPath) if $S3syncOptions['--recursive']
+ end
end
end
end
This appears to fix the problem I'm seeing. Someone who knows the code should check it carefully to make sure it doesn't break anything else.
Thoughts?
Rob