Title: Broken Pipe: solved? Post by: dodo on December 12, 2007, 01:57:32 PM

Hi folks,
I ran into the "Broken pipe" problem. Trying to upload 20 GB of ~1 MB pictures constantly failed after a while with "broken pipe", and the retries simply counted down to 0 with no effect.

A bit of debugging (with no Ruby skills) showed that s3try.rb seems to be the place where things go wrong. The method "S3try" handles the connection and the actual transfer of data (I think...). Looking at the code I noticed that the retries are handled in a while loop. However, the code which seems to create a connection to S3 is outside this loop, at the beginning of S3try. If a connection error (e.g. broken pipe) happens, the connection does not get reset/rebuilt, and all subsequent retries of course fail too. It doesn't matter how high you set the retry counter...

So I thought it might be a good idea to move the connection code inside the loop and let it run only when needed (e.g. after a broken pipe). All I did was move the line `while $S3syncRetriesLeft > 0 do` up, just below the S3try declaration, and add a line which sets forceRetry to false. The if statement is modified to also create a new connection if something bad happened before (e.g. a broken pipe resulted in forceRetry=true).

So the S3try method now starts like this:

Code:
def S3sync.S3try(command, bucket, *args)
  forceRetry = false
  while $S3syncRetriesLeft > 0 do
    if (not $S3syncHttp or (bucket != $S3syncLastBucket) or forceRetry == true)
      $stderr.puts "Creating new connection" if $S3syncOptions['--debug']
      $S3syncLastBucket = bucket
      $S3syncHttp = $S3syncConnection.make_http(bucket)
    end

I've been uploading for some time now; 3 broken pipes so far, but the connection was re-created and the upload continued. So at least for me this problem is no longer - I hope....

Title: Re: Broken Pipe: solved? Post by: ferrix on December 12, 2007, 04:03:39 PM

In my experience broken pipe was as often as not on the local side... and having the same connection object ought not to prevent it from REconnecting.
But I'll see if I can make it start with more of a "clean slate" in error conditions.
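For readers who don't want to dig through s3try.rb, the shape of dodo's fix — build the connection inside the retry loop and force a rebuild after a failure — can be sketched as standalone Ruby. Everything here (`FlakyConnection`, `MAX_RETRIES`, `try_with_reconnect`) is illustrative, not s3sync's actual code.

```ruby
MAX_RETRIES = 3

# A stand-in for an HTTP connection that raises "broken pipe" a fixed
# number of times before working, so the retry logic can be exercised.
class FlakyConnection
  def initialize(failures)
    @failures = failures
  end

  def request
    if @failures > 0
      @failures -= 1
      raise Errno::EPIPE
    end
    :ok
  end
end

# The pattern from the patched S3try: the connection is (re)built at the
# top of the retry loop whenever there is none yet, or the previous
# attempt ended in a connection error (force_retry).
def try_with_reconnect(failures_per_connection)
  retries_left = MAX_RETRIES
  conn = nil
  force_retry = false
  while retries_left > 0
    if conn.nil? || force_retry
      conn = FlakyConnection.new(failures_per_connection)
      failures_per_connection = 0 # a fresh connection succeeds in this sketch
      force_retry = false
    end
    begin
      return conn.request
    rescue Errno::EPIPE
      force_retry = true
      retries_left -= 1
    end
  end
  :gave_up
end
```

With the rebuild inside the loop, one broken pipe costs one retry instead of all of them.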
Title: Re: Broken Pipe: solved? Post by: frankholdem on December 26, 2007, 04:40:34 AM

Hey dodo,
Are you still finding that your solution has helped with the broken pipe issue? I'm also seeing many failures due to broken pipe, so I'm thinking of giving your solution a try. - cheers, Frank

Title: Re: Broken Pipe: solved? Post by: dodo on December 27, 2007, 03:56:40 PM

There is only one instance where s3sync fails for me, and that is when a DNS query fails; it seems s3try doesn't catch that kind of exception. But that happens almost never, so I don't bother.
But broken pipes are no longer a problem for me... my hack works, and I consider this solved.

Title: Re: Broken Pipe: solved? Post by: frankholdem on December 28, 2007, 08:51:04 PM

Quote from: dodo
There is only one instance where s3sync fails for me, and that is when a DNS query fails; it seems s3try doesn't catch that kind of exception. But that happens almost never, so I don't bother. But broken pipes are no longer a problem for me... my hack works, and I consider this solved.

Thx Dodo, I'm going to give your hack a try, as I've been having my fair share of these broken pipes. I'll report back later on whether my results concur with yours.

Title: Re: Broken Pipe: solved? Post by: faris on December 29, 2007, 11:21:48 AM

I'm struggling with this problem too.
I've discovered something interesting, though. I'm not sure how useful it might be, but I thought I'd post it anyway.

Yesterday I was able to s3sync just over 10 GB of files with no problem into an EU bucket. Overnight some additional data was added to this fileset, so I ran s3sync to get the differences onto S3 and, lo and behold, I was hit by the EOF/broken pipe issue. If I use a different prefix I don't get the errors. E.g. this is what I used yesterday:

Code:
s3sync.rb -r --progress /home/faris/totalbackup/ s3eu:totalbackup

The same command today results in the EOF/broken pipe issue. But if I do this instead, with tb as the prefix instead of totalbackup:

Code:
s3sync.rb -n -r --progress /home/faris/totalbackup/ s3eu:tb

it works fine. But I'm only mentioning this as an aside. What I really think is interesting comes later in my post...

All of the above is without the code modification mentioned earlier in the thread. With the code modification the problem is resolved, but not quite in the way I expected. Forgive me if I'm giving too much detail, but I'm hoping it might help find the actual cause of the issue.

Essentially I have a 7-day backup cycle, with a full backup on day 1 and incremental backups on subsequent days. I'm backing up a directory structure similar to this:

Code:
/totalbackup/bak1
  file1
  file2
  file3
  (... and a few more files)
/totalbackup/bak2
  file1
  file2
/totalbackup/bak3
  (same as bak2)

No file is larger than 1 GB in size, but most of them are 1 GB exactly. Now to explain why I'm wasting your time on the file structure...

Basically, if I use

Code:
s3sync.rb -d -r --progress /home/faris/totalbackup/ s3eu:totalbackup

(s3eu:totalbackup already contains yesterday's sync,
s3try modified as mentioned in this thread), then I see that s3sync examines all the files in /bak1 with no issues and only spits out the EOF error when it starts looking at bak2. With the code modification mentioned here, instead of also giving a broken pipe error and then going round in circles going nowhere, it continues correctly:

Code:
(.....)
local node object init. Name:bak2/file1 Path:/totalbackup/bak2/file1 Size:206469120 Tag:[redacted]
prefix found: /bak2/
s3TreeRecurse s3eu totalbackup /bak2/
Trying command list_bucket s3eu max-keys 200 prefix totalbackup/bak2/ delimiter / with 100 retries left
EOF error: end of file reached
No result available
99 retries left
Creating new connection
Trying command list_bucket s3eu max-keys 200 prefix totalbackup/bak2/ delimiter / with 99 retries left
Response code: 200
S3 item totalbackup/bak2/file1
(.....)

So whatever is going wrong seems to be happening when the second list_bucket command is sent to S3? Or am I misinterpreting what -d is telling me?

Title: Re: Broken Pipe: solved? Post by: ferrix on December 29, 2007, 11:08:10 PM

I wonder what occurred right *before* that. It is as if the connection is being closed but we don't catch it.
Title: Re: Broken Pipe: solved? Post by: BUMan on January 03, 2008, 02:46:22 PM

Hi Faris,
I have run into the same error, and I think I have fixed it. Open the file s3try.rb and locate:

Code:
rescue EOFError => e

Add these 3 lines:

Code:
$stderr.puts "Creating new connection" if $S3syncOptions['--debug']
$S3syncLastBucket = bucket
$S3syncHttp = $S3syncConnection.make_http(bucket)

and save. Try it out and report the results. It worked for me.

Title: Re: Broken Pipe: solved? Post by: faris on January 04, 2008, 06:43:41 AM

Thanks! I'll do it today and report back asap.
Faris.

Title: Re: Broken Pipe: solved? Post by: faris on January 05, 2008, 06:10:48 PM

I'm afraid something isn't quite right for me:
*I left the original modification in place as well as adding the new one.* Could this be the problem?

This is the point where, with no modifications, you'd get the complete failure, or, with the first modification, you'd get a timeout followed by 99 tries left. Unfortunately I still get the timeout, as you can see. No change, basically.

Code:
.....
prefix found: /bak2/
s3TreeRecurse mybucket totalbackup /bak2/
Trying command list_bucket mybucket max-keys 200 prefix totalbackup/bak2/ delimiter / with 100 retries left
EOF error: end of file reached
Creating new connection
No result available
99 retries left
Creating new connection
Trying command list_bucket mybucket max-keys 200 prefix totalbackup/bak2/ delimiter / with 99 retries left
Response code: 200
S3 item totalbackup/bak2/file1
s3 node object init. Name:bak2/file1 Path:totalbackup/bak2/file1 Size:223252480 Tag:[redacted]

Yes, it is possible I added the code in the wrong place, but it looks right to me:

Code:
[........]
  forceRetry = true
  $stderr.puts "Connection timed out: #{e}"
rescue EOFError => e
  # i THINK this is happening like a connection reset
  forceRetry = true
  $stderr.puts "EOF error: #{e}"
  $stderr.puts "Creating new connection" if $S3syncOptions['--debug']
  $S3syncLastBucket = bucket
  $S3syncHttp = $S3syncConnection.make_http(bucket)
rescue OpenSSL::SSL::SSLError => e
  forceRetry = true
[....]

Title: Re: Broken Pipe: solved? Post by: ferrix on January 06, 2008, 10:43:32 AM

See http://s3sync.net/forum/index.php?topic=133.msg589#msg589
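Both fixes in this thread boil down to the same move: catch the connection error, rebuild the handle, and try again. In plain Ruby, outside s3sync, the rescue-and-retry shape looks like this (`make_http` and the failure simulation are hypothetical stand-ins, not s3sync's real code):

```ruby
$connections_made = 0

# Hypothetical connection factory: the first handle it returns dies with
# EOFError on use; later ones work.
def make_http(bucket)
  $connections_made += 1
  first = ($connections_made == 1)
  lambda do
    raise EOFError, "end of file reached" if first
    :response
  end
end

def list_bucket(bucket)
  attempts = 0
  http = make_http(bucket)
  begin
    attempts += 1
    http.call
  rescue EOFError => e
    $stderr.puts "EOF error: #{e}"
    raise if attempts >= 3     # give up eventually
    http = make_http(bucket)   # rebuild the handle before retrying
    retry                      # re-runs the begin block
  end
end
```

The key detail, and what the unpatched s3try was missing, is the rebuild line before `retry`: without it, every attempt reuses the dead handle and fails the same way.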
Title: Re: Broken Pipe: solved? Post by: faris on January 06, 2008, 11:44:51 AM

Thank you! I'll test it as soon as it is out.
Title: Re: Broken Pipe: solved? Post by: ferrix on January 06, 2008, 11:55:50 AM

Out now.
Title: Re: Broken Pipe: solved? Post by: frankholdem on January 07, 2008, 01:26:06 AM

I've been running for several days without any problems since implementing the 'dodo' hack. I'm going to give the new version a try now and see if I also get problem-free operation. Thanks Ferrix for providing this new update.
Title: Re: Broken Pipe: solved? Post by: faris on January 07, 2008, 01:51:43 PM

Well, 1.2.4 still gives a broken pipe error but recovers from it gracefully-ish :-)
Code:
(....checks 10 GB worth of mostly 1 GB files in bak1, which were backed up previously....)
(....gets to last file -- only a few bytes in size -- in the bak1 directory, then tries to go to the next directory, containing some more 1 GB files....)
S3 item totalbackup/bak1/lastfile
s3 node object init. Name:bak1/lastfile Path:totalbackup/bak1/lastfile Size:49 Tag:[redacted]
source: bak1/lastfile
dest: bak1/lastfile
Node bak1/lastfile unchanged
local item /home/me/totalbackup/bak2
local node object init. Name:bak2 Path:/home/me/totalbackup/bak2 Size:38 Tag:[redacted]
source: bak2
s3 node object init. Name:bak2 Path:totalbackup/bak2 Size: Tag:
Create node bak2
totalbackup/bak2
File extension: totalbackup/bak2
Trying command put mybucket totalbackup/bak2 #<S3::S3Object:0x28a2c0c4> Content-Length 38 with 100 retries left
Broken pipe: Broken pipe
No result available
99 retries left
Trying command put mybucket totalbackup/bak2 #<S3::S3Object:0x28a2c0c4> Content-Length 38 with 99 retries left
Progress: 38b 1b/s 100%
Response code: 200
bak2 is a dir node
localTreeRecurse /home/me/totalbackup bak2
Test /home/me/totalbackup/bak2/file1
(.....etc.....)
(....correctly syncs everything....)

One small wish I'd make for 1.2.5 would be to include the location of s3sync.rb among the places s3sync looks for the config file. I'm having to specifically export the path as S3CONF with 1.2.4, where I didn't have to in 1.2.3. This is no big deal, but it turns out the shell script I've been using to launch s3sync for my tests is not the same script my cronjob launches, so although I had updated my test script to export S3CONF, my cronjob script had not been updated and it failed to sync last night :-)

Faris.

Title: Re: Broken Pipe: solved? Post by: ferrix on January 07, 2008, 02:29:21 PM

Quote from: faris
One small wish I'd make for 1.2.5 would be to include the location of s3sync.rb among the places s3sync looks for the config file.
Why not just ln -s it into one of the search paths, then? I won't make the config search too promiscuous, because I can see some kind of vuln where the script is fooled into looking at the wrong config and sends your secret data to someone else. This wasn't an issue prior to 1.2.4, since the script only ran correctly when you were standing in the s3sync dir. Now that it can run "from anywhere", there are extra considerations!

Title: Re: Broken Pipe: solved? Post by: faris on January 08, 2008, 06:10:18 AM

I'm only thinking of backward compatibility. You know... people who upgrade the script and don't test *everything* properly - like me :-) :-)
But yes, you are of course correct.

Faris.
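Ferrix's worry about a "promiscuous" config search -- the script being fooled into reading an attacker-controlled config -- argues for a short, fixed list of trusted locations. A sketch of such a search in Ruby; the directory names follow s3sync's documented S3CONF / $HOME/.s3conf / /etc/s3conf convention, but treat the exact list and order as an assumption rather than the shipped code:

```ruby
# Return the first trusted directory that actually contains the config
# file, or nil. Keeping the candidate list short and explicit avoids the
# "wrong config" vulnerability ferrix describes.
def find_config_dir(candidates = [ENV['S3CONF'],
                                  ENV['HOME'] && File.join(ENV['HOME'], '.s3conf'),
                                  '/etc/s3conf'])
  candidates.compact.find do |dir|
    File.file?(File.join(dir, 's3config.yml'))
  end
end
```

A symlink into one of these directories, as ferrix suggests, keeps the search list unchanged while still letting the config live next to the script.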