S3Sync.net
Topic: Special characters (e.g. éàèüäö) not handled correctly under Windows
sboehler
Newbie
Posts: 5
« on: August 28, 2007, 02:37:50 PM »

Hello

When I s3sync a file containing any international characters (e.g. German umlauts ö ä ü, French & Spanish accents é à è...), the uploaded files listed by 's3cmd.rb list' have those characters messed up. Is this a Windows, a Ruby or an s3sync problem? It works fine under Linux.

This is quite a showstopper bug for me, as I wouldn't be able to sync the same directory alternately under Windows and Linux (and renaming my files is not an option).

Otherwise kudos to the developer, it's a great little tool!

Cheers
Martin

sboehler
« Reply #1 on: August 28, 2007, 02:39:20 PM »

ps: The problem is when the FILENAME contains special characters...file contents are not affected...sorry

ferrix
Sr. Member
Posts: 363
(I am greg13070 on AWS forum)
« Reply #2 on: August 28, 2007, 11:43:31 PM »

Send me a zip file containing a directory structure and files that cause this problem (contact info is in the README) and I'll find some time to take a look at it.  Character encoding issues are supposed to be "solved" already.

sboehler
« Reply #3 on: August 30, 2007, 03:42:48 AM »

Hello ferrix

I just ran another test case and found that the problem isn't as bad as I thought. If I verify the uploaded files using the cockpit application from jetS3t, I find that the filenames have been uploaded correctly by s3sync.

What confused me was that the output of s3sync.rb and s3cmd.rb on the Windows command line is garbled. For example, take these two files, föö.txt and bär.txt, which contain umlauts in the filename:

C:\Silvio\Tools\s3sync>dir myfolder
 Datenträger in Laufwerk C: ist Local 80G
 Datenträgernummer: A4AD-1847

 Verzeichnis von C:\Silvio\Tools\s3sync\myfolder

30.08.2007  10:11       <DIR>          .
30.08.2007  10:11       <DIR>          ..
30.08.2007  10:11                   18 bär.txt
30.08.2007  10:10                   20 föö.txt
               2 Datei(en)             38 Bytes
               2 Verzeichnis(se),  13'156'741'120 Bytes frei


Now, using s3sync, in the output they are garbled:

C:\Silvio\Tools\s3sync>ruby s3sync.rb -r --progress myfolder silvio:test
Create node
Create node bõr.txt
Create node f÷÷.txt


Same if I use s3cmd to list the bucket:

C:\Silvio\Tools\s3sync>ruby s3cmd.rb list silvio:test
--------------------
test/myfolder
test/myfolder/bõr.txt
test/myfolder/f÷÷.txt


However, using jetS3t cockpit, I can verify that indeed the filenames are correct. And trying to sync again, s3sync will correctly refuse to upload the files again, as they already exist. And if I sync them from my S3 bucket to my local directory, the filenames are correct again as well.

So the problem is really just a cosmetic one and doesn't affect the functionality. But I still wonder what the issue is; maybe Ruby's encoding settings are somehow confused? I'll send you my test files just in case.
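
The garbling shown above matches a code page mismatch: the bytes that Windows-1252 uses for ä and ö are rendered as õ and ÷ when a console interprets them as code page 850. The following Python sketch reproduces the effect (Python is used only for illustration; s3sync itself is Ruby). It assumes that the cmd console on this German Windows system uses OEM code page 850, which the thread does not confirm, and the function name is hypothetical.

```python
def show_in_console(name, written_as="cp1252", console="cp850"):
    """Return what a console displays when it decodes bytes with a different code page."""
    raw = name.encode(written_as)   # bytes the program writes to stdout (assumed Windows-1252)
    return raw.decode(console)      # how a code page 850 console renders those bytes

for name in ("bär.txt", "föö.txt"):
    print(name, "->", show_in_console(name))
# bär.txt -> bõr.txt
# föö.txt -> f÷÷.txt
```

If that assumption holds, the names stored on S3 are fine and only the console's interpretation of the output bytes differs, which would fit the observation that jetS3t cockpit shows the names correctly.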

Cheers & thanks for your efforts!

Martin

ferrix
« Reply #4 on: August 30, 2007, 06:26:01 AM »

Just to get this out of the way... are you setting S3SYNC_NATIVE_CHARSET correctly?  On Windows the default value isn't correct and should probably be set to "Windows-1252".

sboehler
« Reply #5 on: August 30, 2007, 11:18:58 AM »

I just tried it and set the variable:

set S3SYNC_NATIVE_CHARSET=Windows-1252

But I still get garbled output, e.g. when listing a bucket containing files with special characters.

ferrix
« Reply #6 on: August 30, 2007, 02:34:55 PM »

Did you start over and re-send the files to S3?  If you used the wrong character set before, the files on S3 will remain wrong!

sboehler
« Reply #7 on: August 31, 2007, 09:38:00 AM »

I did a few more tests. I used the cockpit tool from jetS3t as a reference to determine whether the files have been uploaded correctly.

On Linux my charset is UTF-8, and it seems the default charset for s3sync is something else. If I don't set S3SYNC_NATIVE_CHARSET to utf-8, the characters in the remote file names become garbled in cockpit. s3cmd, on the other hand, automatically reverses the 'garbling' and shows the correct file names, which is why I assumed that it works automatically on Linux. But it doesn't: the charset needs to be set to utf-8 explicitly if the file names should be correct for applications other than s3sync/s3cmd.
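
The behavior described above is what double encoding looks like. Here is a minimal Python sketch (for illustration only; s3sync is Ruby). It assumes the default native charset is ISO-8859-1, which the thread does not state, and the variable names are hypothetical.

```python
local_bytes = "föö.txt".encode("utf-8")      # file name as stored on a UTF-8 Linux system

# Assumed default: the bytes are treated as ISO-8859-1 and converted to UTF-8 for S3.
misread = local_bytes.decode("iso-8859-1")   # each UTF-8 byte becomes one character
s3_key = misread.encode("utf-8")             # key that other tools such as cockpit see

# Listing with the same wrong setting reverses the conversion and restores the original bytes.
listed_bytes = s3_key.decode("utf-8").encode("iso-8859-1")

print(misread)                               # garbled name seen by other applications
print(listed_bytes.decode("utf-8"))          # name as shown on a UTF-8 terminal
```

Under that assumption the round trip through s3sync/s3cmd cancels itself out, so the error only becomes visible in a tool that reads the key as plain UTF-8.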

Funnily enough, on Windows the special characters are handled correctly, even without setting S3SYNC_NATIVE_CHARSET. So a file called öäüàéè on my local drive shows up ungarbled in cockpit. Internally everything is fine, but there is a problem with the output: although my cmd console shows local file names correctly, output from s3sync/s3cmd is garbled, and this does not seem to be affected by S3SYNC_NATIVE_CHARSET. So indeed, the bug is just cosmetic, and s3sync/s3cmd is using the wrong charset for console output.

Is that something determined by the local Ruby installation, or can it be corrected from within s3sync? I don't know Ruby, so any help is appreciated.

Many thanks, Martin

ps: In all my tests I assume that jetS3t cockpit gets it right and can serve as a benchmark... I hope this is justified.

ferrix
« Reply #8 on: August 31, 2007, 12:30:19 PM »

I don't know of a way to guess the correct native character set, or else I would have put that in instead of making it a setting.  Anyone else is welcome to help if they know better.

(I'm not a ruby expert either!)
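
On the question of guessing the native character set: most runtimes can ask the operating system for it. A hedged Python sketch follows (not Ruby, and not part of s3sync; the function name is hypothetical). It reports the locale's preferred encoding and, on Windows, the console output code page, which can differ from it. Such a difference would be one possible explanation for output that is garbled while file names are stored correctly.

```python
import locale
import sys

def guess_charsets():
    """Return (native, console) charset names as reported by the operating system."""
    native = locale.getpreferredencoding(False)   # e.g. UTF-8 on Linux, cp1252 on Western Windows
    console = native
    if sys.platform == "win32":
        import ctypes
        # GetConsoleOutputCP returns 0 when no console is attached.
        code_page = ctypes.windll.kernel32.GetConsoleOutputCP()
        if code_page:
            console = "cp%d" % code_page
    return native, console

native, console = guess_charsets()
print("native charset:", native)
print("console charset:", console)
```

Whether Ruby of that era exposed equivalent calls is not something this thread establishes, so this only illustrates the idea of asking the OS instead of relying on a fixed default.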