S3Sync.net
Show Posts
46  General Category / Feature Requests / Re: Option to encrypt stored data using gpg on: February 20, 2007, 10:22:19 AM
my 2 cents: I don't think encrypting/decrypting the stored data is the business of an rsync lookalike.  you would be asking for trouble from users who forget their keys, from changed or buggy encryption code in a library, etc etc etc.  you could provide a utility that does the encryption in a separate process (still asking for trouble) or just point users to things like GPG and its many wrappers.  any time your code transforms the user's data, especially when the transformation is not reversible, you are asking for a big helping of blame, hold the mustard.

of course encrypting the stored data is a separate question from transferring the files using SSL, which I *do* think s3sync should (and already does) support.
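fwiw the "separate process" approach is easy enough to sketch.  something like this (all paths and names hypothetical, and assuming gpg 1.x behavior for --passphrase-file with --batch) encrypts a tree into a staging directory that you then point s3sync at:

--- cut ---
#!/usr/bin/ruby -w

require 'find'
require 'fileutils'

# usage: encrypt_tree.rb SRC STAGE   (script name and layout made up)
src, stage = ARGV[0], ARGV[1]
abort "usage: encrypt_tree.rb SRC STAGE" unless src && stage

Find.find(src) do |f|
  next unless File.file?(f)
  out = File.join(stage, f.sub(src, "")) + ".gpg"
  FileUtils.mkdir_p(File.dirname(out))
  # -c is symmetric encryption; --batch plus a passphrase file avoids prompts
  system("gpg", "--batch", "--yes", "--passphrase-file",
         File.expand_path("~/.backup-pass"), "-c", "-o", out, f)
end
--- cut ---

the point being that if that script eats someone's data, the blame lands on gpg and the wrapper, not on the sync tool.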
47  General Category / Feature Requests / Re: Option to use size/modified time in comparisons on: February 20, 2007, 10:13:18 AM
s3 isn't like rsync, where just a few bytes can be sent or retrieved to sync a file; if the size is different you are stuck transferring the whole thing, right?  so a size check seems like a cheap way to avoid calculating the etag before a retrieve.  of course for puts there is no way around the etag, and eventually you will do an etag check after a retrieve anyway if you want to be maximally safe (but that check is on the retrieved bits, not on the ones you already have).

timestamps might be something worth restoring, but if the bits are the same, fixing the timestamp is cheap since it needs nothing but the metadata.
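to put numbers on "cheap": the size comes straight from the inode, while the etag means reading every byte of the file.  a minimal sketch of the local side of that comparison (remote_size and remote_etag are hypothetical, standing in for whatever the bucket listing returned):

--- cut ---
#!/usr/bin/ruby -w

require 'digest/md5'

# hypothetical values, as they would come from an s3 bucket listing
remote_size = 1048576
remote_etag = "9e107d9d372bb6826bd81d3542a419d6"

f = ARGV[0] or abort "usage: cheapcheck.rb FILE"

if File.size(f) != remote_size
  # stat only: no file I/O beyond the inode, so this check is effectively free
  puts "size differs, transfer the whole thing; no etag needed"
else
  # same size: now pay for the etag, i.e. an md5 over every byte of the file
  local_etag = Digest::MD5.file(f).hexdigest
  puts(local_etag == remote_etag ? "identical, skip" : "same size, different bits")
end
--- cut ---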
48  General Category / General Discussion / Re: hard links on: February 20, 2007, 08:15:30 AM
yes, I agree with you about the obstacles...I've been thinking about it for a while, but no finished idea has bloomed.  it's possible a filesystem layer could make it all simple, but I'm not holding my breath...s3 definitely has some issues when you want it to pretend to be a hard drive ;-).

the "find the hardlinks" code might be useful to warn users in a log message, e.g. "files 'x' 'y.blah'" are hardlinks, data will be duplicated".

memory usage:  yes, if you had a full filesystem backed up using hardlinks this would be a problem, but there is no way to find all the links without looking at all the files.  I suppose one could iterate over the files one inode at a time, spit out the result, then go again, but frankly I'm too lazy to go far with that...I have around 50,000 files in my HOME directory, and even if all of them were paired hardlinks, at say 100 bytes per stored path that is only about 5 MB, not enough memory to get excited about.  if you had a million files it would be an issue, but if you have filesystems like that you have bigger issues to think about.

me:  I've been fooling around with s3 for two reasons: 1) I need to back up my pics, and 2) I wanted to learn ruby (or python, but ruby won out).  I started out with a little app, no classes, no rubyisms, etc, and then it grew like Topsy.  eventually I had a big app with one big class that did most of the work, a bunch of little helper classes, and 4,000 commandline switches.  I finally didn't like that much, so I refactored the whole thing into a bunch of obvious classes (bucket, service, ...) and smaller focused apps, e.g. s3mkbucket, s3rm, etc.  now the code uses 'yield' and blocks idiomatically (see the sketch below), and as a bonus it doesn't fill up huge arrays with interim results, and the utilities are much more typical unix-like (small, focused scripts).  I learned quite a bit about ruby vs c++ in the process, and I ended up with some useful gadgets.  fooling around with it also made me relearn some css and html so I could use s3 as a webserver for pics and whatnot.
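for the curious, the yield refactoring looks roughly like this (list_keys and the paged data are made-up stand-ins, not anybody's real code).  instead of building and returning a giant array, the method hands each key to the caller's block as it arrives:

--- cut ---
#!/usr/bin/ruby -w

# a stand-in for paged bucket listings: three "pages" of key names
PAGES = [%w[a b], %w[c d], %w[e]]

# array style: every key is accumulated before the caller sees any of them
def list_keys_array
  results = []
  PAGES.each { |page| results.concat(page) }
  results
end

# block style: each key is yielded as soon as its page arrives, nothing accumulates
def list_keys
  PAGES.each { |page| page.each { |k| yield k } }
end

list_keys { |k| puts k }   # the caller works per-key; memory use stays flat
--- cut ---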

I like participating in the s3sync discussions because I've learned a lot by doing so, even if I did occasionally broadcast my ignorance (e.g. SSL X.509 certs).

why not s3sync?  as I said, one goal was to learn ruby, and you can't really do that just by looking at code.  what I ended up with is not really rsync-like, although it performs many of the same functions; for example, none of my gadgets generate their own list of keys to archive, retrieve, etc.  on the other hand my s3archive *does* look before leaping, i.e. it doesn't blindly copy bits without checking what is there first, and my s3get is the same way: it looks first and only retrieves if necessary.

my stuff does do some things I doubt s3sync does.  for example, I can look at ACLs either as xml or in summary format, and I can use canned ACLs to set permissions or selectively modify the xml (via REXML::Document) to change permissions for a single grantee on one or more keys or buckets.  in other words, I wrapped the ACL-related code in S3.rb and turned it into some utilities (rough sketch below).
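a rough idea of the REXML part, assuming the usual shape of an s3 ACL document (AccessControlPolicy/AccessControlList/Grant); the xml string here is a simplified stand-in for what GET ?acl returns, with namespaces and the Owner element omitted:

--- cut ---
#!/usr/bin/ruby -w

require 'rexml/document'

# simplified stand-in for the body of a GET ?acl response
xml = <<ACL
<AccessControlPolicy>
  <AccessControlList>
    <Grant>
      <Grantee><DisplayName>me</DisplayName></Grantee>
      <Permission>FULL_CONTROL</Permission>
    </Grant>
  </AccessControlList>
</AccessControlPolicy>
ACL

doc = REXML::Document.new(xml)

# summary format: one line per grant
doc.elements.each("//Grant") do |grant|
  who  = grant.elements["Grantee/DisplayName"].text
  perm = grant.elements["Permission"].text
  puts "#{who}: #{perm}"
end

# selective modify: downgrade one grantee, then PUT doc back to ?acl
doc.elements.each("//Grant") do |grant|
  if grant.elements["Grantee/DisplayName"].text == "me"
    grant.elements["Permission"].text = "READ"
  end
end
puts doc
--- cut ---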
49  General Category / General Discussion / hard links on: February 19, 2007, 09:39:28 PM
as I recall, s3sync doesn't attempt to deal with hardlinks, i.e. it just copies the data multiple times.  note that hardlinks exist in linux, mac, and (yes) ntfs filesystems.  the code below will detect them on linux; I don't know about mac, and it's not likely to work on windows.  note that, just like rsync, it doesn't try to find links outside the directory tree being examined, because it would have to look at the entire filesystem to be sure.

once detected, I imagine some bookkeeping could be invented to save multiple copies by keeping notes in metadata (one possible scheme sketched below).  if hardlinks worked, it would be a lot easier to build a snapshot-style backup utility on s3sync the same way you would with rsync.
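to make "notes in metadata" concrete, here is one possible scheme (purely hypothetical, not anything any tool actually does): upload the data once under the first path of a hardlink group, and store every other path as a zero-byte object whose user metadata points back at the real key:

--- cut ---
#!/usr/bin/ruby -w

# 'upload' is a made-up stand-in for an s3 PUT; it just shows what it would do
def upload(key, data, meta = {})
  puts "PUT #{key} (#{data.length} bytes) #{meta.inspect}"
end

# one hardlink group, i.e. one value from the find_hard_links hash below
paths = ["pics/a.jpg", "pics/b.jpg"]

first, *rest = paths
upload(first, "...the file bytes...")   # the data goes up once, under the first path
rest.each do |alias_path|
  # zero-byte marker; x-amz-meta-* is s3's user-metadata header convention
  upload(alias_path, "", "x-amz-meta-hardlink-target" => first)
end
# on restore: when the marker header is present, fetch the target key once
# and hard-link the local file into place instead of downloading it again
--- cut ---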

note this is *not* a feature request...I'm not using s3sync myself ;-).

--- cut ---
#!/usr/bin/ruby -w

require 'find'
require 'yaml'

# walk 'path' and group regular files by inode number; any file with
# nlink > 1 is part of a hardlink set.  an inode may show up here with
# only one path if its other links live outside the tree being examined.
def find_hard_links(path)
  links = Hash.new { |hash, key| hash[key] = [] }
  Find.find(path) do |f|
    if File.file?(f)
      s = File.stat(f)
      links[s.ino].push(f) if s.nlink > 1
    end
  end
  links
end

# default to the current directory when no argument is given
path = ARGV[0] || "."

# links is a hash with key = inode number and value = array of paths
links = find_hard_links(path)
YAML.dump(links, $stdout)
--- cut ---
50  General Category / General Discussion / s3sync thread on aws forum on: February 19, 2007, 08:09:37 PM
greg,

don't forget to post to the old thread every so often; otherwise it will sink out of sight and eventually be lost.
51  General Category / General Discussion / Re: Is this thing on??? on: February 19, 2007, 08:05:46 PM
the most annoying thing about s3, to me, is not the occasional outage but the fact that aws never says jack about what is going on: no warnings, no informative "we are about to use you as guinea pigs" posts, and very little explanation after some bad thing is corrected.  further, they ignore any and all questions about schedules, priorities (you just get the generic "oh yeah, we sure are considering *that*!" bit), test plans, etc.
52  General Category / Feature Requests / Re: s3sync.rb to run without current directory defined to where s3try.rb, S3.rb on: February 18, 2007, 07:31:04 AM
why not set RUBYLIB to point at the library directory?  it is a colon-separated list on linux and semicolon-separated on windows (where you can also set it permanently somewhere under My Computer's properties, I think).  for example, in bash:

export RUBYLIB=/usr/local/whatever/lib:/home/me/lib
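and the windows equivalent from a cmd prompt (path hypothetical; note the semicolon separator):

set RUBYLIB=C:\s3sync\lib;C:\home\me\lib

alternatively, skip the environment variable and use ruby's -I switch per invocation, e.g. ruby -I /usr/local/whatever/lib s3sync.rb ...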