S3Sync.net
Topic: Option to use size/modified time in comparisons
ferrix
(I am greg13070 on AWS forum)


« on: February 20, 2007, 05:47:45 AM »

This is non-trivial in some ways to do correctly, but in certain situations it could be much faster than reading every byte of every local file to calculate its etag.
lowflyinghawk
« Reply #1 on: February 20, 2007, 10:13:18 AM »

S3 isn't like rsync, where just a few bytes can be sent or retrieved to sync the files, so if the size is different you are stuck getting the whole thing, right?  a size check seems like a cheap way to avoid calculating the etag before a retrieve.  of course for puts there is no way around it, and eventually you will do the etag check after a retrieve if you want to be maximally safe (but that's on the retrieved bits, not on the ones you already have).

timestamps might be something to restore, but if the bits are the same, fixing the timestamp is cheap: it needs nothing but the metadata.
ferrix
« Reply #2 on: February 20, 2007, 05:14:09 PM »

I already check the size before the etag and skip etag if the size is different.  But the problem is that 99% of the time the size is the same, so the etag check has to go ahead.  Of course no one would claim that "same size" is good enough to know that the files are the same.

The point of my comment on modified time is not to restore timestamps... but rather to use them as a method of determining whether to sync *in lieu* of etag checking.  This way we would always be able to use local metadata for the check, rather than md5'ing the file.

I wouldn't make this the default, or take the old behavior away.  It's just some low-hanging fruit to speed things up, especially on my slow-ass Windows XP machine.
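
A minimal Ruby sketch of the opt-in comparison described in this post; it is not s3sync's actual code. `RemoteInfo`, `needs_upload?`, and the `quick:` option are hypothetical names. It assumes the remote modified time was stored alongside the object at upload time, and that the etag is the hex MD5 of the object's bytes (true for simple single-part PUTs).

```ruby
require 'digest/md5'

# Hypothetical record of what is known about the remote object.
# :mtime is assumed to have been saved as metadata when the file was uploaded.
RemoteInfo = Struct.new(:size, :etag, :mtime)

# Returns true when the local file should be uploaded again.
# quick: when true, trust size + modified time instead of hashing (opt-in).
def needs_upload?(path, remote, quick: false)
  stat = File.stat(path)

  # Different size: the files differ, so there is no need to hash at all.
  return true if stat.size != remote.size

  # Same size, quick mode: decide from local metadata only.
  if quick && remote.mtime
    return stat.mtime.to_i != remote.mtime.to_i
  end

  # Default behavior: read every byte and compare MD5 against the etag.
  Digest::MD5.file(path).hexdigest != remote.etag
end
```

With `quick: false` (the default) the old behavior is unchanged; the timestamp shortcut only applies when the caller asks for it.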
lowflyinghawk
« Reply #3 on: February 20, 2007, 06:47:28 PM »

timestamps would only be adequate under special circumstances, i.e. you'd have to have special knowledge about the files in question.  rsync takes the opposite tack though, believing the files are the same if the timestamp and size match (of course it has options to control the behavior).

here is another thing I didn't know, from the rsync man page:

              When comparing two timestamps, rsync treats the timestamps as
              being equal if they differ by no more than the modify-window
              value. This is normally 0 (for an exact match), but you may
              find it useful to set this to a larger value in some situations.
              In particular, when transferring to or from an MS Windows FAT
              filesystem (which represents times with a 2-second resolution),
              --modify-window=1 is useful (allowing times to differ by up to 1
              second).
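
The rule quoted above could be sketched in Ruby roughly like this. `same_mtime?` and its `window` parameter are hypothetical names for illustration, and the comparison is done on whole seconds as an assumption:

```ruby
# Treat two timestamps as equal when they differ by no more than
# `window` seconds (0 means an exact match on whole seconds).
def same_mtime?(a, b, window = 0)
  (a.to_i - b.to_i).abs <= window
end

local  = Time.at(1_171_965_601)  # odd second, e.g. from a non-FAT filesystem
on_fat = Time.at(1_171_965_600)  # rounded to FAT's 2-second resolution

same_mtime?(local, on_fat)       # exact match required: not equal
same_mtime?(local, on_fat, 1)    # 1-second window: treated as equal
```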

ps:  unless your bandwidth is extreme and your drives awfully slow, I don't know if it is worth worrying about the time to compute the checksum.  I tested by filling a 1G file with bytes from /dev/urandom and then calculating the md5 on the file.  interestingly, doing it in ruby is just as fast as doing it with md5sum (a compiled utility on linux), about 15 seconds total.  if you had 1MB/s of upload bandwidth it would still take you about 17 minutes to upload the bytes, so I think the md5 calculation time is in the noise.
« Last Edit: February 20, 2007, 07:10:40 PM by lowflyinghawk »
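
For anyone wanting to repeat the comparison in the postscript, here is a rough Ruby sketch. `upload_seconds` and `md5_seconds` are hypothetical helper names, and 1G and 1MB/s are taken as binary units (an assumption); the hashing time will depend entirely on your own disk and CPU.

```ruby
require 'digest/md5'
require 'benchmark'

# Seconds needed to upload `bytes` at `bytes_per_sec`.
def upload_seconds(bytes, bytes_per_sec)
  bytes.to_f / bytes_per_sec
end

# Wall-clock seconds spent computing the MD5 of the file at `path`.
def md5_seconds(path)
  Benchmark.realtime { Digest::MD5.file(path).hexdigest }
end

one_gib = 1024**3
one_mib = 1024**2
minutes = upload_seconds(one_gib, one_mib) / 60
puts minutes.round(1)   # prints 17.1
```

Calling `md5_seconds` on a large local file gives the number to set against that upload time.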