S3Sync.net
Author Topic: translation of node names on Mac causes sync to mis-recognize identical files  (Read 3343 times)
bangpound
« on: December 19, 2007, 12:15:18 PM »

On the Mac's HFS+ filesystem (the default), the character set is UTF-8 but characters are decomposed. I don't know much about it yet, but Unicode calls this "Normalization Form D." The HTTP standards require "Normalization Form C" (composed). In my situation, the UTF-8 character é (latin small letter e acute) in my local filename is displayed in s3sync.rb's verbose and debug output as é, but it is saved in S3 as eÌ followed by an unprintable character (that's three characters: latin small letter e, latin capital letter I with grave, and unicode character 129 [0x81]).
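To make the two forms concrete, here is a minimal Ruby sketch (not taken from s3sync.rb) that prints the UTF-8 bytes of the composed and decomposed spellings of é. The helper name hex_bytes is made up for illustration. The three characters described above are consistent with the decomposed bytes being read one byte at a time as ISO-8859-1, which the last lines reproduce; whether that is what actually happens inside s3sync.rb is an assumption.

```ruby
# NFC (composed): a single code point, U+00E9
composed   = "\u00E9"
# NFD (decomposed): "e" followed by U+0301 COMBINING ACUTE ACCENT
decomposed = "e\u0301"

# Hypothetical helper: render a string's bytes as space-separated hex
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

puts hex_bytes(composed)    # C3 A9
puts hex_bytes(decomposed)  # 65 CC 81

# Reinterpret the NFD bytes as ISO-8859-1, then convert back to UTF-8
misread = decomposed.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts misread.codepoints.map { |c| format("U+%04X", c) }.join(" ")
# U+0065 U+00CC U+0081  (e, I with grave, control character 0x81)
```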

This causes the comparisons to always fail, and the files are always re-synced unnecessarily (and the existing mis-named copies are deleted if s3sync.rb's --delete option is on).
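As a rough illustration of why the comparison fails and one way it could be made to succeed, the sketch below normalizes both names to Form C before comparing. It assumes Ruby 2.2 or later, where String#unicode_normalize is available; same_name? is a hypothetical helper and not part of s3sync.rb.

```ruby
# Hypothetical helper: compare two names after normalizing both to NFC
def same_name?(local_name, remote_key)
  local_name.unicode_normalize(:nfc) == remote_key.unicode_normalize(:nfc)
end

local  = "cafe\u0301.txt"  # decomposed (NFD), as HFS+ returns it
remote = "caf\u00E9.txt"   # composed (NFC)

puts local == remote            # false: the code point sequences differ
puts same_name?(local, remote)  # true: identical once both are NFC
```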

Apple has patched libiconv to allow for a character set called UTF-8-MAC that is supposed to handle this situation. However, when I set S3SYNC_NATIVE_CHARSET=UTF-8-MAC, I still don't get the right results.
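For comparison, Ruby 1.9 and later ship their own encoding named UTF8-MAC, independent of Apple's libiconv. The sketch below assumes such a Ruby and shows a conversion from the decomposed Mac form to ordinary composed UTF-8. It says nothing about how S3SYNC_NATIVE_CHARSET is handled inside s3sync.rb.

```ruby
# Decomposed name as the Mac filesystem returns it: "e" + U+0301
decomposed = "re\u0301sume\u0301.txt"

# Treat the source bytes as UTF8-MAC and transcode to plain UTF-8
composed = decomposed.encode("UTF-8", "UTF8-MAC")

puts decomposed.codepoints.length
puts composed.codepoints.length
puts composed.codepoints.map { |c| format("U+%04X", c) }.join(" ")
```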

I'll try to do my best to work up a patch, but I am not a Ruby programmer! Let me know if you have any ideas or need more information.
ferrix
(I am greg13070 on AWS forum)


« Reply #1 on: December 21, 2007, 01:53:21 PM »

I don't have any idea what to do about it. The char code stuff is all a black-box Ruby lib that I call into.