Show Posts
|
34
|
General Category / General Discussion / Re: Amazon object limit of 5 GB causing problem for rsync (s3sync)
|
on: March 02, 2007, 06:41:52 AM
|
yes, that's what I'm saying. the limit applies only to the data stored under one key, so you can store 100 2G files all under your directory foo/bar. these keys are *not* directories, i.e. /foo/bar and /foo/baz are completely separate from one another. better than me saying it, here's the s3sync developer:
"The size of the folder is irrelevant, only the size of each node. s3sync maps one file per node. So if you have a file > 5G then you can't use s3sync. Otherwise it should be OK."
always with the caveat that amazon hasn't fixed the >2G issue AFAIK.
|
|
|
35
|
General Category / General Discussion / Re: Amazon object limit of 5 GB causing problem for rsync (s3sync)
|
on: March 01, 2007, 07:12:38 PM
|
no, it's one key per file, so a folder containing 10 2G files maps to 10 separate keys each of which is PUT separately. the only limit is on the individual keys, i.e. foo:bar/baz can't be over 2G, but the total of foo:/* doesn't matter. remember, S3 is not a file system on a disk, it is a name/value database, so only the individual keys matter. /foo/bar and /foo/bar/baz are not contents of the folder "/foo" in the way you may be used to thinking of it, each one is just a key.
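a minimal sketch of that point, with a plain Hash standing in for a bucket (this is not the S3 API, just the flat name/value model it implements):

```ruby
# Not the real S3 API: a Hash standing in for a bucket, to illustrate
# that "folders" are only a naming convention on flat keys.
store = {}

files = {
  "foo/bar/a.bin" => 2_000_000_000,  # 2G, fine
  "foo/bar/b.bin" => 2_000_000_000,  # 2G, also fine: a separate key
}

files.each do |key, size|
  raise "per-key limit" if size > 5 * 1024**3  # only each key's size matters
  store[key] = size                            # one PUT per key
end

# the "folder" foo/bar is not itself an object, just a shared prefix
raise unless store.key?("foo/bar/a.bin")
raise if store.key?("foo/bar")
```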
|
|
|
36
|
General Category / Feature Requests / Re: Show Upload Time in s3cmd's list command
|
on: March 01, 2007, 07:05:39 PM
|
s3sync puts the files in a folder one by one already, right? so if you can s3cmd put multiple individual files, then can't you use s3sync on the folders?
if you do a HEAD before PUT it is possible to check a file for changes and skip the PUT entirely if it's the same, i.e. approximately the way rsync behaves. if s3sync does this already (and I assume it does), then cm_gui could just put the whole folder and only the changed files will actually be transferred. this is the whole idea, isn't it?
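one way to sketch the HEAD-before-PUT check (the method name here is made up; for objects uploaded with a simple PUT, S3's ETag is the MD5 of the body):

```ruby
require 'digest/md5'

# Hypothetical helper: compare the ETag returned by a HEAD request
# against the local file's MD5, and skip the PUT when they match.
# S3 quotes the ETag header, hence the delete('"').
def needs_upload?(remote_etag, local_path)
  local_md5 = Digest::MD5.file(local_path).hexdigest
  remote_etag.to_s.delete('"') != local_md5
end
```

a real sync loop would issue the HEAD first and only PUT when this returns true; the connection call itself depends on the client library in use.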
|
|
|
37
|
General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads
|
on: March 01, 2007, 11:47:47 AM
|
bdixon: yes, I'm blind ;-).
greg: what happens when you do this? I tried wget with the file above, and my times in ruby and wget differ by less than 1 second (~29.5sec), well within the noise. I could understand a bit of overhead, e.g. I do a HEAD call that wget doesn't, plus an extra etag check, but 6x-7x is way out of line. platform may matter; I'm on fedora core 6, x86.
|
|
|
38
|
General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads
|
on: March 01, 2007, 08:38:55 AM
|
oh boy, I just love it when my login times out while writing a post... here is a test we haven't tried: cut ruby out of the loop by downloading the same file using your browser. there is a bit of fuzz because I have to click "save to disk" with firefox, but my time to download the link below is pretty much exactly the same as using ruby (30 seconds), which says neither my code nor the ruby libraries are likely to be doing anything stupid.
http://s3.amazonaws.com/1001/zero.bin
the file contains exactly 20,000,000 bytes and you should be able to download it for your own testing.
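a simple wall-clock timer puts the ruby, wget, and browser runs on the same footing; the commented-out part shows how the download from the post could be wrapped (URL as given there, assuming it is still live):

```ruby
require 'net/http'
require 'uri'

# wall-clock timer for comparing transfer paths
def timed
  start = Time.now
  yield
  Time.now - start
end

# network example, using the test URL from the post:
#   secs = timed do
#     File.open("zero.bin", "wb") do |f|
#       Net::HTTP.get_response(URI("http://s3.amazonaws.com/1001/zero.bin")) do |resp|
#         resp.read_body { |chunk| f << chunk }
#       end
#     end
#   end

secs = timed { sleep 0.05 }
raise unless secs >= 0.04
```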
|
|
|
39
|
General Category / General Discussion / linux/windows filesystem ruby portability
|
on: February 28, 2007, 07:20:25 AM
|
I store keys from linux with names that sound like filesystem paths, e.g. "data/blah/whatever.txt". when I retrieve them the code looks like this (leaving out error handling, etc):
path = File.dirname(key)
FileUtils.mkdir_p(path)
File.open(key, "wb") do |f|
  get_response = @conn.get(bucket, key, {}, f)
end
what happens if I use this same code on windows to retrieve the same sample key as above? the docs imply that ruby will do the right thing for the local filesystem, but I haven't found a definitive answer to that in the ruby docs.
of course it's easy enough to try out...if you have a windows box ;-).
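the question above as a runnable sketch: ruby's File and FileUtils accept forward slashes on windows as well as linux, so the dirname/mkdir_p pattern on a slash-separated key should behave the same on both (still worth verifying on an actual windows box):

```ruby
require 'fileutils'

# a slash-separated key, exactly as stored on S3
key = "data/blah/whatever.txt"

FileUtils.mkdir_p(File.dirname(key))         # creates data/blah
File.open(key, "wb") { |f| f << "payload" }  # writes the retrieved body

raise unless File.file?("data/blah/whatever.txt")
```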
|
|
|
40
|
General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads
|
on: February 27, 2007, 08:03:55 PM
|
for what it's worth, here is the profiler output from a GET of a 22M file (I removed any calls below 1%). remember, during a normal run without the profiler, the cpu never gets above 1% except when calculating the etag; my guess is that the ruby lib just calls down to the socket library, which uses hardly any cpu. you can see the call that generates the etag down near the bottom.
 %    cumulative  self              self     total
time    seconds   seconds   calls  ms/call  ms/call  name
17.33     14.13     14.13   21662     0.65     0.93  Thread#kill
15.43     26.71     12.58   21664     0.58     1.76  Timeout.timeout
14.19     38.28     11.57       1 11570.00 76470.00  Net::BufferedIO#read
 7.28     44.22      5.94   21661     0.27     0.39  Net::ReadAdapter#call_block
 4.84     48.17      3.95   21664     0.18     1.95  Object#timeout
 4.52     51.86      3.69   21687     0.17     0.26  Net::BufferedIO#rbuf_consume
 4.14     55.24      3.38   21662     0.16     0.16  Thread#start
 3.00     57.69      2.45       5   490.00 16106.00  IO#open
 2.53     59.75      2.06   21661     0.10     0.49  Net::ReadAdapter#<<
 2.43     61.73      1.98   21662     0.09     2.04  Net::BufferedIO#rbuf_fill
 2.32     63.62      1.89   21687     0.09     0.09  String#slice!
 2.29     65.49      1.87   21682     0.09     0.09  String#<<
 2.16     67.25      1.76   21661     0.08     0.08  Fixnum#<
 2.10     68.96      1.71   21661     0.08     0.12  IO#<<
 1.99     70.58      1.62   65010     0.02     0.02  String#size
 1.85     72.09      1.51   21676     0.07     0.10  Fixnum#==
 1.55     73.35      1.26   43469     0.03     0.03  Fixnum#+
 1.25     74.37      1.02   21662     0.05     0.05  IO#sysread
 1.16     75.32      0.95   21660     0.04     0.04  Digest::Base#<<
 1.13     76.24      0.92   21662     0.04     0.04  Kernel.sleep
 1.04     77.09      0.85   21665     0.04     0.04  IO#write
|
|
|
41
|
General Category / Closed Bugs / Re: s3sync performance testing: Slow Downloads
|
on: February 27, 2007, 07:25:30 PM
|
when I GET with the ruby lib stuff set up to stream I get up to about 700KB/s, the same maximum that my ISP's speed test shows. I watched the process while downloading a 22M test file (this is on linux). the only time the ruby process even registered 1% of the cpu was the last second or so when calculating the etag check (which I do after the download finishes). altogether this makes me think the ruby libraries, certainly on linux, are not likely to be the problem area.
you should be able to run the ruby profiler by ruby -r profile s3sync ...
notice I didn't mention PUTs? that's because my DSL connection limits them to around 40KB/s. for what it's worth, the cpu consumption is never even visible except when calculating the etag, but this speed is so low it's hardly a challenge.
during the GETs the segments I'm getting back are usually 1024 or 436 bytes, but they can be other lengths. don't ask me why, I don't control it.
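the post-download etag check can be done incrementally, feeding each streamed segment into the digest as it arrives; segment sizes don't matter to the result (fake segments here, mimicking the odd chunk sizes):

```ruby
require 'digest/md5'

# incremental MD5 over streamed segments, as done after a GET finishes
md5 = Digest::MD5.new
segments = ["a" * 1024, "b" * 436, "c" * 512]  # arbitrary chunk sizes
segments.each { |seg| md5 << seg }

# chunked and one-shot digests agree, so the ETag comparison is valid
# no matter how the socket happens to split the body
raise unless md5.hexdigest == Digest::MD5.hexdigest(segments.join)
```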
|
|
|
42
|
General Category / Feature Requests / Re: Option to use size/modified time in comparisons
|
on: February 20, 2007, 06:47:28 PM
|
timestamps would only be adequate under special circumstances, i.e. you'd have to have special knowledge about the files in question. rsync takes the opposite tack though, believing the files are the same if the timestamp and size match (of course it has options to control the behavior).
here is another thing I didn't know, from the rsync man page:
When comparing two timestamps, rsync treats the timestamps as being equal if they differ by no more than the modify-window value. This is normally 0 (for an exact match), but you may find it useful to set this to a larger value in some situations. In particular, when transferring to or from an MS Windows FAT filesystem (which represents times with a 2-second resolution), --modify-window=1 is useful (allowing times to differ by up to 1 second).
ps: unless your bandwidth is extreme and your drives awfully slow I don't know if it is worth worrying about the time to compute the checksum. I tested by filling a 1G file with bytes from /dev/urandom and then calculating the md5 on the file. interestingly, doing it in ruby is just as fast as doing it with md5sum (a compiled utility on linux), about 15 seconds total. if you had 1MB/s up bandwidth it would still take you 17 minutes to upload the bytes, so I think the md5 calculation time is in the noise.
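a scaled-down version of that test (20M instead of 1G, so it runs quickly; assumes a reasonably modern ruby for Random#bytes):

```ruby
require 'digest/md5'
require 'benchmark'

# fill a file with pseudo-random bytes, the small-scale analogue of
# dd-ing from /dev/urandom as in the original test
File.open("random.bin", "wb") do |f|
  rng = Random.new
  20.times { f << rng.bytes(1_000_000) }
end

# time the md5 over the whole file
sum = nil
secs = Benchmark.realtime { sum = Digest::MD5.file("random.bin").hexdigest }
puts "md5 of 20M in #{secs} seconds"
```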
|
|
|
44
|
General Category / Feature Requests / Re: command to move objects?
|
on: February 20, 2007, 03:49:24 PM
|
fyi, moving an object means retrieving it and then PUTting it to the new key. s3 provides no other means of doing this. it is fine for small objects, but very painful for big ones. it would be possible to build something on top of s3 that had the necessary bookkeeping, in other words a layer of indirection, but it wouldn't be easy to make it reliable.
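the bookkeeping itself is simple; here it is with a Hash standing in for the bucket (not the real API). the hard part is that a real move is GET, PUT, then DELETE, three separate requests with failure windows between each step:

```ruby
# client-side "move": re-key the value, since s3 has no rename.
# on s3 proper this single assignment becomes GET + PUT + DELETE.
def move_key(store, from, to)
  raise KeyError, "no such key: #{from}" unless store.key?(from)
  store[to] = store.delete(from)  # atomic here, three requests on s3
end

bucket = { "old/big.bin" => "...data..." }
move_key(bucket, "old/big.bin", "new/big.bin")

raise unless bucket == { "new/big.bin" => "...data..." }
```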
|
|
|
45
|
General Category / General Discussion / Re: hard links
|
on: February 20, 2007, 03:46:23 PM
|
AFAIK the metadata can't be modified. this seems like an artificial limitation given that ACLs, which are really just metadata, can be. something else that would be useful for the commercial guys is being able to set permissions on prefixes, i.e. make them more like directories; then you could effectively have any number of buckets.
yes, yield, blocks, et al are fun. ruby is an interesting jar to your mindset if you are used to something like c++. mixins, include, extension of existing classes, etc, are quite different. the libraries are pretty good. I'm sure I still don't understand the scope rules though ;-). on the other hand, compared to c++, it is weird getting used to how many mistakes you can make without seeing a complaint (until your boss is looking and that odd branch in the code runs for the first time). the compiler catches a lot of errors for you in c++, and it takes getting used to when you don't have it. one exception is c++ templates where member functions can have all sorts of errors that don't show up until they are instantiated, i.e. the compiler really doesn't do much beyond tokenization unless the function is actually called.
I programmed a lot in perl at one time, then came back to it after 5 years or so of heavy c++, and it just doesn't seem to scale (other than CPAN, which is incredible)...that grafted on OO biz just doesn't make it for me. in all, ruby is a nice language. funnily enough, what made me start looking at it is something I don't even use, Rails...now there is some power on display!
|
|
|
|