Hi Ferrix,
Thanks for the suggestion. I think you're right. It turns out that I can't reproduce the problem on a static copy of the data, but I can if the file is being modified while it's being uploaded, e.g. a live error_log on a site or, in my tests, simply appending data to the file while syncing (a rough repro is below).
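For reference, this is roughly how I keep the file changing during my tests (just a sketch; the path is only an example). I run it in one terminal and kick off the s3sync upload of the same file in another:

# growing_file.rb -- keep appending to a file while s3sync uploads it.
# The path is only an example; point it at whatever file you're syncing.
path = 'testdir/growing.log'

File.open(path, 'a') do |f|
  1000.times do |i|
    f.write("line #{i}: some log-like data\n")
    f.flush               # make sure the size on disk actually changes
    sleep 0.05            # keep the file growing for the duration of the upload
  end
end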
The file upload appears to complete, but further attempts to access S3 all fail.
I saw another comment about this in the forums here:
http://s3sync.net/forum/index.php?topic=53.0. I think that extra handling for this case would be useful, as it's not at all clear to users what's going on.
My guess is that you check the file size more than once, and if it's changing quickly you get different values, which confuses s3sync and breaks the HTTP protocol by sending more data than the Content-Length header declared.
Could it be that the extra data ends up stuck in the HTTPStreaming output buffer, waiting to be sent ahead of the next request, so every subsequent request fails?
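To illustrate the mismatch I mean (a standalone sketch, not s3sync or net/http code): the length is sampled once, the body is streamed from the file afterwards, and a file that grows in between puts more bytes on the wire than the header promised.

require 'tempfile'

f = Tempfile.new('growing')
f.write('x' * 4096)
f.flush

content_length = File.size(f.path)   # what would go into the Content-Length header

f.write('y' * 1024)                  # the file grows before/while the body is streamed
f.flush

bytes_sent = 0
File.open(f.path, 'rb') do |io|
  while s = io.read(1024)
    bytes_sent += s.length            # net/http would write all of this to the socket
  end
end

puts "Content-Length: #{content_length}, bytes streamed: #{bytes_sent}"
# => Content-Length: 4096, bytes streamed: 5120 -- the extra 1024 bytes sit on the
#    connection and get misread as the start of the next request.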
Actually, I think (at least one) culprit may be /usr/lib/ruby/1.8/net/http.rb, around line 1523:
write_header sock, ver, path
if chunked?
  while s = f.read(1024)
    sock.write(sprintf("%x\r\n", s.length) << s << "\r\n")
  end
  sock.write "0\r\n\r\n"
else
  while s = f.read(1024)
    sock.write s
  end
end
This makes no attempt to verify that content_length() still matches the current length of the file. Amazon doesn't support chunked encoding either (I tried it), so the only option seems to be to fix this method.
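For what it's worth, the kind of fix I have in mind for the non-chunked branch is to stream at most content_length() bytes. An untested sketch (the helper name is mine, not something in net/http):

# Write at most `length` bytes of `f` to `sock`, so a file that grew after
# the Content-Length header went out can't overrun it.
def write_body_capped(sock, f, length)
  remaining = length
  while remaining > 0 and (s = f.read([1024, remaining].min))
    sock.write s
    remaining -= s.length
  end
  # If the file shrank instead, the body is short and the request is still
  # broken, so it seems better to fail loudly than to hang the connection.
  raise "file changed size during upload" if remaining > 0
end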
Cheers, Chris.