sboehler
Newbie
Posts: 5
« on: August 28, 2007, 02:37:50 PM »
Hello
When I s3sync a file whose name contains any international characters (e.g. German umlauts ö ä ü, French & Spanish accents é à è...), the uploaded files listed by 's3cmd.rb list' have those characters messed up. Is this a Windows, a Ruby or an s3sync problem? It works fine under Linux.
This is quite a showstopper bug for me, as I wouldn't be able to sync the same directory alternately under Windows and Linux (and renaming my files is not an option).
Otherwise kudos to the developer, it's a great little tool!
Cheers Martin
sboehler
Newbie
Posts: 5
« Reply #1 on: August 28, 2007, 02:39:20 PM »
PS: The problem occurs when the FILENAME contains special characters... file contents are not affected... sorry
ferrix
« Reply #2 on: August 28, 2007, 11:43:31 PM »
Send me a zip file containing a directory structure and files that cause this problem (contact info is in the README) and I'll find some time to take a look at it. Character encoding issues are supposed to be "solved" already.
sboehler
Newbie
Posts: 5
« Reply #3 on: August 30, 2007, 03:42:48 AM »
Hello Ferrix
I just ran another test case and found that the problem isn't as bad as I thought. If I verify the uploaded files using the cockpit application from jetS3t, I find that the filenames have been uploaded correctly by s3sync.
What confused me was that the output of s3sync.rb and s3cmd.rb on the Windows command line is garbled. For example, take two files, föö.txt and bär.txt, with umlauts in the filename:
C:\Silvio\Tools\s3sync>dir myfolder
 Datenträger in Laufwerk C: ist Local 80G
 Datenträgernummer: A4AD-1847

 Verzeichnis von C:\Silvio\Tools\s3sync\myfolder

30.08.2007  10:11    <DIR>          .
30.08.2007  10:11    <DIR>          ..
30.08.2007  10:11                18 bär.txt
30.08.2007  10:10                20 föö.txt
               2 Datei(en)             38 Bytes
               2 Verzeichnis(se), 13'156'741'120 Bytes frei
Now, using s3sync, in the output they are garbled:
C:\Silvio\Tools\s3sync>ruby s3sync.rb -r --progress myfolder silvio:test
Create node
Create node bõr.txt
Create node f÷÷.txt
Same if I use s3cmd to list the bucket:
C:\Silvio\Tools\s3sync>ruby s3cmd.rb list silvio:test
--------------------
test/myfolder
test/myfolder/bõr.txt
test/myfolder/f÷÷.txt
However, using jetS3t cockpit, I can verify that indeed the filenames are correct. And trying to sync again, s3sync will correctly refuse to upload the files again, as they already exist. And if I sync them from my S3 bucket to my local directory, the filenames are correct again as well.
So the problem is really just a cosmetic one and doesn't affect the functionality. But I still wonder what the issue is; maybe Ruby's encoding settings are somehow confused? I'll send you my test files just in case.
Cheers & thanks for your efforts!
Martin
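The garbled names match what a console using code page 850 would display for Windows-1252 bytes: ä is byte 0xE4 in Windows-1252, which code page 850 renders as õ, and ö is byte 0xF6, which it renders as ÷. A minimal Ruby sketch of that effect follows. It assumes Ruby 1.9 or later (for String#encode) and that the console's OEM code page is 850, which the thread does not state; shown_on_console and for_console are hypothetical helper names, not part of s3sync.

```ruby
# encoding: utf-8

# Simulate a console that interprets bytes as code page 850 (an assumption)
# while the program writes file names encoded as Windows-1252.
def shown_on_console(name, written_as: 'Windows-1252', console: 'IBM850')
  bytes = name.encode(written_as)   # bytes the program emits
  bytes.force_encoding(console)     # reinterpret them the way the console does
       .encode('UTF-8')             # convert back to UTF-8 for display here
end

# Possible remedy: transcode to the console's code page before printing.
def for_console(name, console: 'IBM850')
  name.encode(console)
end

puts shown_on_console('bär.txt')  # bõr.txt
puts shown_on_console('föö.txt')  # f÷÷.txt
```

On Windows, running chcp in the console prints the active code page, which is the value to pass as console.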
ferrix
« Reply #4 on: August 30, 2007, 06:26:01 AM »
Just to get this out of the way... are you setting S3SYNC_NATIVE_CHARSET correctly? On Windows the default value isn't correct and should probably be set to "Windows-1252".
sboehler
Newbie
Posts: 5
« Reply #5 on: August 30, 2007, 11:18:58 AM »
I just tried it and set the variable:
set S3SYNC_NATIVE_CHARSET=Windows-1252
But I still get garbled output, e.g. when listing a bucket containing files with special characters.
ferrix
« Reply #6 on: August 30, 2007, 02:34:55 PM »
Did you start over and re-send the files to S3? If you used the wrong character set before, the files on S3 will remain wrong!
sboehler
Newbie
Posts: 5
« Reply #7 on: August 31, 2007, 09:38:00 AM »
I did a few more tests. I used the cockpit tool from jetS3t as a reference to determine whether the files have been uploaded correctly.
On Linux my charset is UTF-8, and it seems the default charset for s3sync is something else. If I don't set S3SYNC_NATIVE_CHARSET to utf-8, the characters in the remote file names become garbled in cockpit. s3cmd, on the other hand, automatically reverses the 'garbling' and shows the correct names, which is why I assumed that it works automatically on Linux. But it doesn't: the charset needs to be set to utf-8 explicitly if the names should be correct for applications other than s3sync/s3cmd.
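The 'garbling' that s3cmd takes back is consistent with a double conversion: if the tool assumes a single-byte charset while the names are really UTF-8, each UTF-8 byte is converted as if it were a character of its own, and converting back with the same wrong assumption undoes the damage. A small Ruby sketch of this, assuming the default is ISO-8859-1 (the thread only says it is "something else") and Ruby 1.9 or later; to_utf8_assuming and back_to_native are hypothetical names, not s3sync functions.

```ruby
# encoding: utf-8

# Convert a local name to UTF-8, believing it is in assumed_charset.
def to_utf8_assuming(bytes, assumed_charset)
  bytes.dup.force_encoding(assumed_charset).encode('UTF-8')
end

# Reverse step: convert a stored UTF-8 name back to the assumed charset,
# then treat the resulting bytes as what they really are (UTF-8).
def back_to_native(utf8_name, assumed_charset)
  utf8_name.encode(assumed_charset).force_encoding('UTF-8')
end

local  = 'föö.txt'                               # real UTF-8 name on disk
stored = to_utf8_assuming(local, 'ISO-8859-1')   # wrong assumption (hypothetical default)

puts stored                                          # fÃ¶Ã¶.txt, as other tools would see it
puts back_to_native(stored, 'ISO-8859-1') == local   # true, the round trip hides the problem
puts to_utf8_assuming(local, 'UTF-8') == local       # true, correct charset leaves the name intact
```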
On Windows, funnily enough, the special characters are handled correctly even without setting S3SYNC_NATIVE_CHARSET. A file called öäüàéè on my local drive shows up ungarbled in cockpit. So internally everything is fine, but there is a problem with the output: although my cmd console shows local file names correctly, the output from s3sync/s3cmd is garbled, and this does not seem to be affected by S3SYNC_NATIVE_CHARSET. So the bug is indeed just cosmetic: s3sync/s3cmd is using the wrong charset for console output.
Is that something determined by the local Ruby installation, or can it be corrected from within s3sync? I don't know Ruby, so any help is appreciated.
Many thanks, Martin
PS: In all my tests I assume that jetS3t cockpit gets it right and can serve as a benchmark... I hope this is justified
ferrix
« Reply #8 on: August 31, 2007, 12:30:19 PM »
I don't know of a way to guess the correct native character set, or else I would have put that in instead of making it a setting. Anyone else is welcome to help if they know better.
(I'm not a ruby expert either!)
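One possible direction, assuming Ruby 1.9 or later: Encoding.locale_charmap reports the character set of the current locale, which a tool could use as a default while still honouring an explicit S3SYNC_NATIVE_CHARSET. The sketch below is only that; native_charset is a hypothetical helper, not s3sync code, and whether the reported charset matches the file system or the console on a given Windows setup needs to be verified rather than assumed.

```ruby
# Hypothetical helper: choose the charset used for local file names.
# An explicit S3SYNC_NATIVE_CHARSET wins; otherwise fall back to what
# Ruby reports for the current locale.
def native_charset(env = ENV)
  explicit = env['S3SYNC_NATIVE_CHARSET'].to_s.strip
  return explicit unless explicit.empty?

  Encoding.locale_charmap
end

puts native_charset({ 'S3SYNC_NATIVE_CHARSET' => 'Windows-1252' })  # Windows-1252
puts native_charset({})  # depends on the machine's locale
```

Keeping the environment variable as an override means a wrong guess can still be corrected by hand.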