S3Sync.net
  Show Posts
31  General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads on: March 02, 2007, 07:23:50 PM
not quite: from *my* tests it is clear the problem is *not* in ruby, since ruby and wget take the same time.

however, brad didn't say what platform he's running on.  you (greg) are on windows, right?  why don't you take a minute, click the link I posted earlier, and find out how long your browser takes to get the file (I assume you don't have wget, although you could get it from http://www.christopherlewis.com/WGet/WGetFiles.htm) vs how long it takes to get it with s3sync?

brad: platform?
32  General Category / General Discussion / Re: Amazon object limit of 5 GB causing problem for rsync (s3sync) on: March 02, 2007, 04:40:13 PM
greg:  you're going to have to answer this one.  I don't know a thing about the firefox s3 extension, and I assume the usual "other tools may not be compatible" applies to it, but what about s3sync and s3cmd?
33  General Category / Feature Requests / Re: Show Upload Time in s3cmd's list command on: March 02, 2007, 06:45:08 AM
you are mistaken: the size of your folders is irrelevant, only the size of the individual files matters.  see your other thread.

the HEAD comment was intended for the developer, not for you.
34  General Category / General Discussion / Re: Amazon object limit of 5 GB causing problem for rsync (s3sync) on: March 02, 2007, 06:41:52 AM
yes, that's what I'm saying.  the limit applies only to the data stored under one key, so you can store 100 2G files all under your directory foo/bar.  these keys are *not* directories, i.e. /foo/bar and /foo/baz are completely separate from one another.  rather than take my word for it, here it is from the s3sync developer:

"The size of the folder is irrelevant, only the size of each node.  s3sync maps one file per node.  So if you have a file > 5G then you can't use s3sync.  Otherwise it should be OK."

always with the caveat that amazon hasn't fixed the >2G issue AFAIK.
35  General Category / General Discussion / Re: Amazon object limit of 5 GB causing problem for rsync (s3sync) on: March 01, 2007, 07:12:38 PM
no, it's one key per file, so a folder containing 10 2G files maps to 10 separate keys, each of which is PUT separately.  the only limit is on the individual keys, i.e. foo:bar/baz can't be over 2G, but the total of foo:/* doesn't matter.  remember, S3 is not a file system on a disk, it is a name/value database, so only the individual keys matter.  /foo/bar and /foo/bar/baz are not contents of the folder "/foo" in the way you may be used to thinking of it; each one is just a key.
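to make it concrete, here is a rough sketch of the "10 files, 10 keys" case.  it is not real s3sync code, and the exact put signature is my assumption, based on the @conn.get call that appears elsewhere on this page:

    # each file goes under its own key; only the per-key size limit applies
    # (@conn is an authenticated connection, as used elsewhere on this page)
    Dir.glob('foo/bar/*.bin').each do |file|
      File.open(file, 'rb') do |f|
        @conn.put(bucket, file, f.read)  # the key "foo/bar/whatever.bin" is just a string
      end
    end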
36  General Category / Feature Requests / Re: Show Upload Time in s3cmd's list command on: March 01, 2007, 07:05:39 PM
s3sync already puts the files in a folder one by one, right?  so if you can s3cmd put multiple individual files, then can't you use s3sync on the folders?

if you do a HEAD before the PUT, it is possible to check a file for changes and skip the PUT entirely if it's the same, i.e. approximately the way rsync behaves.  if s3sync does this already (and I assume it does), then cm_gui could just put the whole folder and only the changed files would actually be transferred.  this is the whole idea, isn't it?
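a minimal sketch of the HEAD-before-PUT idea, assuming a public bucket (a private one needs request signing, which I'm leaving out); bucket/key/path are placeholders:

    require 'net/http'
    require 'digest/md5'

    # skip the upload when the remote etag already matches the local md5
    def needs_upload?(bucket, key, path)
      local_md5 = Digest::MD5.file(path).hexdigest
      res = Net::HTTP.start("#{bucket}.s3.amazonaws.com") { |http| http.head("/#{key}") }
      return true unless res.is_a?(Net::HTTPSuccess)  # nothing there yet, must PUT
      res['etag'].to_s.delete('"') != local_md5       # etag is the md5 for a simple PUT
    end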
37  General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads on: March 01, 2007, 11:47:47 AM
bdixon:  yes, I'm blind ;-).

greg: what happens when you do this?  I tried wget with the file above, and my ruby and wget times differ by less than 1 second (~29.5 sec), well within the noise.  I could understand a bit of overhead, e.g. I do a HEAD call that wget doesn't, plus an extra etag check, but 6x-7x is way out of line.  platform may matter; I'm on fedora core 6, x86.
38  General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads on: March 01, 2007, 08:38:55 AM
oh boy, I just love it when my login times out while writing a post...

here is a test we haven't tried: cut ruby out of the loop by downloading the same file with your browser.  there is a bit of fuzz because I have to click "save to disk" in firefox, but my time to download the link below is almost exactly the same as with ruby (30 seconds), which suggests neither my code nor the ruby libraries are doing anything stupid.

http://s3.amazonaws.com/1001/zero.bin

the file contains exactly 20,000,000 bytes, and you should be able to download it for your own testing.
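if you'd rather not install wget, here is a quick ruby timing sketch against that same public file:

    require 'net/http'
    require 'uri'

    # time the same GET the browser/wget tests use
    uri = URI.parse('http://s3.amazonaws.com/1001/zero.bin')
    bytes = 0
    start = Time.now
    Net::HTTP.start(uri.host, uri.port) do |http|
      http.request_get(uri.path) do |res|
        res.read_body { |chunk| bytes += chunk.size }  # stream instead of buffering 20MB
      end
    end
    printf("%d bytes in %.1f seconds\n", bytes, Time.now - start)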
39  General Category / General Discussion / linux/windows filesystem ruby portability on: February 28, 2007, 07:20:25 AM
I store keys from linux with names that look like filesystem paths, e.g. "data/blah/whatever.txt".  when I retrieve them the code looks like this (leaving out error handling, etc.):

        require 'fileutils'

        path = File.dirname(key)      # e.g. "data/blah"
        FileUtils.mkdir_p(path)       # create the local directory tree
        File.open(key, "wb") do |f|   # binary mode matters on windows
           get_response = @conn.get(bucket, key, {}, f)  # stream straight into the file
        end

what happens if I use this same code on windows to retrieve the same sample key as above?  the docs imply that ruby will do the right thing for the local filesystem, but I haven't found a definitive answer in the ruby docs.

of course it's easy enough to try out...if you have a windows box ;-).
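for anyone with a windows box handy, here is roughly what I'd run to check.  my understanding is that ruby accepts "/" separators in File and Dir calls on windows (File::ALT_SEPARATOR is "\\" there), so both platforms should print the same dirname:

    key = "data/blah/whatever.txt"
    puts File.dirname(key)             # expect "data/blah" on both platforms
    puts File::ALT_SEPARATOR.inspect   # "\\" on windows, nil on linux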
40  General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads on: February 27, 2007, 08:03:55 PM
for what it's worth, here is the profiler output from a GET of a 22M file (I removed any calls below 1%).  remember, during a normal run without the profiler the cpu never gets above 1% except when calculating the etag; my guess is that the ruby lib just calls down to the socket library, which uses hardly any cpu.  you can see the call that generates the etag down near the bottom.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 17.33    14.13     14.13    21662     0.65     0.93  Thread#kill
 15.43    26.71     12.58    21664     0.58     1.76  Timeout.timeout
 14.19    38.28     11.57        1 11570.00 76470.00  Net::BufferedIO#read
  7.28    44.22      5.94    21661     0.27     0.39  Net::ReadAdapter#call_block
  4.84    48.17      3.95    21664     0.18     1.95  Object#timeout
  4.52    51.86      3.69    21687     0.17     0.26  Net::BufferedIO#rbuf_consume
  4.14    55.24      3.38    21662     0.16     0.16  Thread#start
  3.00    57.69      2.45        5   490.00 16106.00  IO#open
  2.53    59.75      2.06    21661     0.10     0.49  Net::ReadAdapter#<<
  2.43    61.73      1.98    21662     0.09     2.04  Net::BufferedIO#rbuf_fill
  2.32    63.62      1.89    21687     0.09     0.09  String#slice!
  2.29    65.49      1.87    21682     0.09     0.09  String#<<
  2.16    67.25      1.76    21661     0.08     0.08  Fixnum#<
  2.10    68.96      1.71    21661     0.08     0.12  IO#<<
  1.99    70.58      1.62    65010     0.02     0.02  String#size
  1.85    72.09      1.51    21676     0.07     0.10  Fixnum#==
  1.55    73.35      1.26    43469     0.03     0.03  Fixnum#+
  1.25    74.37      1.02    21662     0.05     0.05  IO#sysread
  1.16    75.32      0.95    21660     0.04     0.04  Digest::Base#<<
  1.13    76.24      0.92    21662     0.04     0.04  Kernel.sleep
  1.04    77.09      0.85    21665     0.04     0.04  IO#write
41  General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads on: February 27, 2007, 07:25:30 PM
when I GET with the ruby lib set up to stream, I get up to about 700KB/s, the same maximum my ISP's speed test shows.  I watched the process while downloading a 22M test file (this is on linux).  the only time the ruby process even registered 1% of the cpu was the last second or so, when calculating the etag check (which I do after the download finishes).  altogether this makes me think the ruby libraries, certainly on linux, are unlikely to be the problem area.

you should be able to run the ruby profiler by
    ruby -r profile s3sync ...

notice I didn't mention PUTs?  that's because my DSL connection limits them to around 40KB/s.  for what it's worth, the cpu consumption is never even visible except when calculating the etag, but this speed is so low as to hardly be a challenge.

during the GETs the segments I get back are usually 1024 or 436 bytes, but they can be other lengths.  don't ask me why; I don't control it.
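for the curious, a minimal sketch of that kind of streaming GET with an incremental etag check (not the actual s3sync code; it reuses the public test file from the other thread):

    require 'net/http'
    require 'uri'
    require 'digest/md5'

    # feed the digest chunk by chunk so there's no second pass over the file
    uri = URI.parse('http://s3.amazonaws.com/1001/zero.bin')
    md5 = Digest::MD5.new
    File.open('zero.bin', 'wb') do |f|
      Net::HTTP.start(uri.host, uri.port) do |http|
        http.request_get(uri.path) do |res|
          res.read_body do |chunk|  # chunk sizes are whatever the socket hands back
            f.write(chunk)
            md5 << chunk
          end
        end
      end
    end
    puts "etag check: #{md5.hexdigest}"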
42  General Category / Feature Requests / Re: Option to use size/modified time in comparisons on: February 20, 2007, 06:47:28 PM
timestamps alone would only be adequate under special circumstances, i.e. you'd have to have special knowledge about the files in question.  rsync takes the opposite tack though, believing the files are the same if the timestamp and size match (of course it has options to control the behavior).

here is another thing I didn't know, from the rsync man page:

              When comparing two timestamps, rsync treats the timestamps as
              being equal if they differ by no more than the modify-window
              value.  This is normally 0 (for an exact match), but you may
              find it useful to set this to a larger value in some situations.
              In particular, when transferring to or from an MS Windows FAT
              filesystem (which represents times with a 2-second resolution),
              --modify-window=1 is useful (allowing times to differ by up to
              1 second).

ps:  unless your bandwidth is extreme and your drives awfully slow, I don't know if it is worth worrying about the time to compute the checksum.  I tested by filling a 1G file with bytes from /dev/urandom and then calculating the md5 on the file.  interestingly, doing it in ruby is just as fast as doing it with md5sum (a compiled utility on linux), about 15 seconds total.  if you had 1MB/s of up bandwidth it would still take you about 17 minutes to upload the bytes, so I think the md5 calculation time is in the noise.
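roughly the test I ran, in ruby ("big.bin" standing in for the /dev/urandom file); it streams in 1M chunks so memory stays flat:

    require 'digest/md5'

    # md5 a big file the streaming way and report the elapsed time
    start = Time.now
    md5 = Digest::MD5.new
    File.open('big.bin', 'rb') do |f|
      while (chunk = f.read(1024 * 1024))
        md5 << chunk
      end
    end
    printf("%s  %.1f seconds\n", md5.hexdigest, Time.now - start)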
43  General Category / General Discussion / Re: Amazon object limit of 5 GB causing problem for rsync (s3sync) on: February 20, 2007, 06:37:03 PM
AWS has never announced a fix for this, nor a schedule; "we're working on it" is all you get.  lots of people have complained about it though, and you should post another complaint to keep the topic warm.
44  General Category / Feature Requests / Re: command to move objects? on: February 20, 2007, 03:49:24 PM
fyi, moving an object means retrieving it and then PUTting it to the new key; s3 provides no other means of doing this.  it is fine for small objects, but very painful for big ones.  it would be possible to build something on top of s3 that had the necessary bookkeeping, in other words a layer of indirection, but it wouldn't be easy to make it reliable.
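to spell out what a "move" has to do under the covers (the conn method signatures are my assumptions, based on the library used elsewhere on this page):

    # GET the whole object, PUT it under the new key, then DELETE the original
    def move_key(conn, bucket, old_key, new_key)
      data = conn.get(bucket, old_key).object.data  # whole object in memory: fine small, painful big
      conn.put(bucket, new_key, data)
      conn.delete(bucket, old_key)                  # only delete after the PUT succeeds
    end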
45  General Category / General Discussion / Re: hard links on: February 20, 2007, 03:46:23 PM
AFAIK the metadata can't be modified.  this seems like an artificial limitation given that ACLs, which are really just metadata, can be.  something else that would be useful for the commercial guys is being able to set permissions on prefixes, i.e. make them more like directories; then you could effectively have any number of buckets.

yes, yield, blocks, et al are fun.  ruby is an interesting jolt to your mindset if you are used to something like c++.  mixins, include, extension of existing classes, etc, are quite different.  the libraries are pretty good.  I'm sure I still don't understand the scope rules though ;-).

on the other hand, compared to c++, it is weird getting used to how many mistakes you can make without seeing a complaint (until your boss is looking and that odd branch in the code runs for the first time).  the compiler catches a lot of errors for you in c++, and it takes getting used to when you don't have it.  one exception is c++ templates, where member functions can have all sorts of errors that don't show up until they are instantiated, i.e. the compiler really doesn't do much beyond tokenization unless the function is actually called.

I programmed a lot in perl at one time, then came back to it after 5 years or so of heavy c++, and it just doesn't seem to scale (other than CPAN, which is incredible)...that grafted-on OO biz just doesn't make it for me.  all in all, ruby is a nice language.  funnily enough, what made me start looking at it is something I don't even use, Rails...now there is some power on display!