S3Sync.net
Author Topic: Broken Pipe: solved?  (Read 28569 times)
dodo
Newbie
Posts: 2
« on: December 12, 2007, 01:57:32 PM »

Hi folks,

I ran into the "Broken pipe" problem. Trying to upload 20 GB of ~1 MB pictures consistently failed after a while with "broken pipe", and the retries simply counted down to 0 with no effect.

A bit of debugging (with no Ruby skills) turned up that s3try.rb seems to be the place where things go wrong.

The method/function "S3try" handles the connection and the actual transfer of data (I think...).

Looking at the code, I noticed that the retries are handled in a while loop.
However, the code which seems to create the connection to S3 sits outside this loop, at the beginning of S3try.

If a connection error (e.g. a broken pipe) happens, the connection never gets reset/rebuilt, so all subsequent retries of course fail too. It doesn't matter how high you set the retry counter...
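For reference, the relevant part of the stock s3try.rb is structured roughly like this (paraphrased from memory, not verbatim; details may differ by version) - the connection is built once, before the retry loop ever starts:

Code:
def S3sync.S3try(command, bucket, *args)
   # connection setup happens once, up front, outside the retry loop
   if (not $S3syncHttp) or (bucket != $S3syncLastBucket)
      $stderr.puts "Creating new connection" if $S3syncOptions['--debug']
      $S3syncLastBucket = bucket
      $S3syncHttp = $S3syncConnection.make_http(bucket)
   end

   while $S3syncRetriesLeft > 0 do
      # ... the command is attempted and errors are rescued in here, but
      # nothing inside this loop ever calls make_http again after a failure
   end
end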

So I thought it might be a good idea to move the connection code inside the loop and let it run only when needed (e.g. after a broken pipe).

All I did was move the line

 while $S3syncRetriesLeft > 0 do

up to just below the S3try declaration and add a line which initialises forceRetry to false.
The if statement is also modified to create a new connection if something bad happened on the previous attempt (e.g. a broken pipe set forceRetry to true).

So the S3try method now starts like this:

Code:
def S3sync.S3try(command, bucket, *args)

   forceRetry = false
   while $S3syncRetriesLeft > 0 do
      # (re)create the connection if there is none yet, if the bucket changed,
      # or if the previous attempt failed (a rescue clause later in the method
      # sets forceRetry to true)
      if (not $S3syncHttp) or (bucket != $S3syncLastBucket) or forceRetry
         $stderr.puts "Creating new connection" if $S3syncOptions['--debug']
         $S3syncLastBucket = bucket
         $S3syncHttp = $S3syncConnection.make_http(bucket)
      end


I've been uploading for some time now; 3 broken pipes so far, but the connection was re-created each time and the upload continues.
So at least for me this problem is gone - I hope....

ferrix
Sr. Member
Posts: 363
(I am greg13070 on AWS forum)
« Reply #1 on: December 12, 2007, 04:03:39 PM »

In my experience broken pipe was as often as not on the local side... and having the same connection object ought not to prevent it from REconnecting. But I'll see if I can make it start with more of a "clean slate" in error conditions.
frankholdem
Newbie
Posts: 9
« Reply #2 on: December 26, 2007, 04:40:34 AM »

Hey dodo,
are you still finding that your solution has helped with the broken pipe issue? I'm also seeing many failures due to broken pipe, so I'm thinking of giving your solution a try.

- cheers
Frank
dodo
Newbie
Posts: 2
« Reply #3 on: December 27, 2007, 03:56:40 PM »

There is only one instance where s3sync still fails for me, and that is when a DNS query fails; it seems s3try doesn't catch that kind of exception. But that happens almost never, so I don't bother.

But broken pipes are no longer a problem for me... my hack works and I consider this solved.
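(For anyone who wants to plug that remaining hole: a failed DNS lookup from Ruby's Net::HTTP shows up as a SocketError, so an extra clause along the lines of the sketch below, added to the existing rescue chain in s3try.rb, might cover it. This is untested and just my assumption about how the error surfaces - not something from the s3sync author.)

Code:
# hypothetical extra clause for the rescue chain in s3try.rb (untested sketch);
# treat a DNS/socket failure like the other transient errors so that the
# next retry forces a fresh connection
rescue SocketError => e
        forceRetry = true
        $stderr.puts "Socket/DNS error: #{e}"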
frankholdem
Newbie
Posts: 9
« Reply #4 on: December 28, 2007, 08:51:04 PM »

Quote from: dodo on December 27, 2007, 03:56:40 PM
There is only one instance where s3sync fails for me and that is when a DNS query fails, seems like s3try doesn't catch that kind of exception. But that happens almost never so I don't bother.

But broken pipes are no longer a problem for me... for me my hack works and I consider this solved.

Thx Dodo, I'm going to give your hack a try as I've been having my fair share of these broken pipes. I'll report back later on whether my results concur with yours.
faris
Newbie
Posts: 6
« Reply #5 on: December 29, 2007, 11:21:48 AM »

I'm struggling with this problem too.

I've discovered something interesting though. I'm not sure how useful it might be but I thought I'd post it anyway.

Yesterday I was able to s3sync just over 10GB of files with no problem into an EU bucket.

Overnight some additional data was added to this fileset, so I ran s3sync to get the differences onto S3 and, lo and behold, I was hit by the EOF/broken pipe issue.

If I use a different prefix I don't get the errors.

e.g. this is what I used yesterday:

s3sync.rb -r --progress /home/faris/totalbackup/  s3eu:totalbackup

The same command today results in the EOF/broken pipe issue

But if I do this instead, with tb as the prefix instead of totalbackup:

s3sync.rb -n -r --progress /home/faris/totalbackup/  s3eu:tb

It works fine. But I'm only mentioning this as an aside. What I really think is interesting comes later in my post...

All of the above is without the code modification mentioned earlier in the thread.

With the code modification the problem is resolved but not quite in the way I expected.

Forgive me if I'm giving too much detail, but I'm hoping that it might help find the actual cause of the issue.

Essentially I have a 7 day backup cycle, with a full backup on day 1, and incremental backups on subsequent days.

I'm backing up a directory structure similar to this:
/totalbackup/bak1
    file1
    file2
    file3
    (... and a few more files)

/totalbackup/bak2
    file1
    file2

/totalbackup/bak3
    (same as bak2)

No file is larger than 1GB in size, but most of them are 1GB exactly.

Now to explain why I'm wasting your time explaining the file structure...

Basically, if I use

s3sync.rb -d -r --progress /home/faris/totalbackup/  s3eu:totalbackup

(s3eu:totalbackup already contains yesterday's sync; s3try is modified as mentioned earlier in this thread)

then I see that s3sync examines all the files in /bak1 with no issues and only spits out the EOF error when it starts looking at bak2.

With the code modification mentioned here, instead of also giving a broken pipe error and then going round in circles getting nowhere, it continues correctly:

Code:
(.....)
local node object init. Name:bak2/file1 Path:/totalbackup/bak2/file1 Size:206469120 Tag:[redacted]
prefix found: /bak2/
s3TreeRecurse s3eu totalbackup /bak2/
Trying command list_bucket s3eu max-keys 200 prefix totalbackup/bak2/ delimiter / with 100 retries left
EOF error: end of file reached
No result available
99 retries left
Creating new connection
Trying command list_bucket s3eu max-keys 200 prefix totalbackup/bak2/ delimiter / with 99 retries left
Response code: 200
S3 item totalbackup/bak2/file1
(.....)

So whatever is going wrong seems to be happening when the second list_bucket command is sent to S3?
Or am I misinterpreting what -d is telling me?

« Last Edit: December 29, 2007, 11:38:23 AM by faris »
ferrix
Sr. Member
Posts: 363
(I am greg13070 on AWS forum)
« Reply #6 on: December 29, 2007, 11:08:10 PM »

I wonder what occurred right *before* that.  It is as if the connection is being closed but we don't catch it.
BUMan
Newbie
Posts: 1
« Reply #7 on: January 03, 2008, 02:46:22 PM »

Hi Faris,

I have run into the same error, and I think I have fixed it. Open the file s3try.rb and locate:
                        rescue EOFError => e
                                # i THINK this is happening like a connection reset
                                forceRetry = true
                                $stderr.puts "EOF error: #{e}"


Add these 3 lines:
                                $stderr.puts "Creating new connection" if $S3syncOptions['--debug']
                                $S3syncLastBucket = bucket
                                $S3syncHttp = $S3syncConnection.make_http(bucket)

And save.

Try it out and report the results. It worked for me.
faris
Newbie
Posts: 6
« Reply #8 on: January 04, 2008, 06:43:41 AM »

Thanks! I'll do it today and report back asap.

Faris.
faris
Newbie
Posts: 6
« Reply #9 on: January 05, 2008, 06:10:48 PM »

I'm afraid something isn't quite right for me:

*I left the original modification in place as well as adding the new one* - could this be the problem?

This is the point where, with no modifications, you'd get the complete failure, or with the first modification you'd get a timeout followed by 99 tries left.

Unfortunately I still get the timeout as you can see. No change basically.


Code:
.....
prefix found: /bak2/
s3TreeRecurse mybucket totalbackup /bak2/
Trying command list_bucket mybucket max-keys 200 prefix totalbackup/bak2/ delimiter / with 100 retries left
EOF error: end of file reached
Creating new connection
No result available
99 retries left
Creating new connection
Trying command list_bucket mybucket max-keys 200 prefix totalbackup/bak2/ delimiter / with 99 retries left
Response code: 200
S3 item totalbackup/bak2/file1
s3 node object init. Name:bak2/file1 Path:totalbackup/bak2/file1 Size:223252480 Tag:[redacted]


Yes, it is possible I added the code to the wrong place, but it looks right to me:

Code:
[........]
                                forceRetry = true
                                $stderr.puts "Connection timed out: #{e}"
                        rescue EOFError => e
                                # i THINK this is happening like a connection reset
                                forceRetry = true
                                $stderr.puts "EOF error: #{e}"
                                $stderr.puts "Creating new connection" if $S3syncOptions['--debug']
                                $S3syncLastBucket = bucket
                                $S3syncHttp = $S3syncConnection.make_http(bucket)
                        rescue OpenSSL::SSL::SSLError => e
                                forceRetry = true
[....]
ferrix
Sr. Member
Posts: 363
(I am greg13070 on AWS forum)
« Reply #10 on: January 06, 2008, 10:43:32 AM »

See http://s3sync.net/forum/index.php?topic=133.msg589#msg589
faris
Newbie
Posts: 6
« Reply #11 on: January 06, 2008, 11:44:51 AM »

Thank you! I'll test it as soon as it is out.
ferrix
Sr. Member
Posts: 363
(I am greg13070 on AWS forum)
« Reply #12 on: January 06, 2008, 11:55:50 AM »

Out now.
frankholdem
Newbie
Posts: 9
« Reply #13 on: January 07, 2008, 01:26:06 AM »

I've been running for several days without any problems since implementing the 'dodo' hack. I'm going to give the new version a try now and see if I also get problem-free operation. Thanks ferrix for providing this new update.



faris
Newbie
Posts: 6
« Reply #14 on: January 07, 2008, 01:51:43 PM »

Well, 1.2.4 still gives a broken pipe error but recovers from it gracefully-ish :-)


Code:

(....checks 10Gb worth of mostly 1Gb files in bak1 which were backed up previously....)

(....gets to last file  -- only a few bytes in size -- in bak1 directory then tries to go to next directory, containing some more 1Gb files)

S3 item totalbackup/bak1/lastfile
s3 node object init. Name:bak1/lastfile Path:totalbackup/bak1/lastfile Size:49 Tag:[redacted]
source: bak1/lastfile
dest: bak1/lastfile
Node bak1/lastfile unchanged
local item /home/me/totalbackup/bak2
local node object init. Name:bak2 Path:/home/me/totalbackup/bak2 Size:38 Tag:[redacted]
source: bak2
s3 node object init. Name:bak2 Path:totalbackup/bak2 Size: Tag:
Create node bak2
totalbackup/bak2
File extension: totalbackup/bak2
Trying command put mybucket totalbackup/bak2 #<S3::S3Object:0x28a2c0c4> Content-Length 38 with 100 retries left
Broken pipe: Broken pipe
No result available
99 retries left
Trying command put mybucket totalbackup/bak2 #<S3::S3Object:0x28a2c0c4> Content-Length 38 with 99 retries left
Progress: 38b  1b/s  100%       Response code: 200

bak2 is a dir node
localTreeRecurse /home/me/totalbackup bak2
Test /home/me/totalbackup/bak2/file1

(.....etc.....)

(....correctly syncs everything....)


One small wish I'd make for 1.2.5 would be to include the location of s3sync.rb among the places s3sync looks for the config file. With 1.2.4 I'm having to explicitly export the path as S3CONF, which I didn't have to do in 1.2.3. This is no big deal, but it turns out the shell script I've been using to launch s3sync for my tests is not the same script that my cronjob launches, so although I had updated my test script to export S3CONF, my cronjob script had not been updated and it failed to sync last night :-)
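(In case it saves someone else the same late-night surprise: the workaround is simply to export the variable in every script that launches s3sync, e.g. a line like the one below near the top of the cron wrapper. S3CONF points at the directory holding your s3config.yml; the path here is only an example.)

export S3CONF=/home/faris/s3conf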

Faris.