Title: translation of node names on Mac causes sync to mis-recognize identical files
Post by: bangpound on December 19, 2007, 12:15:18 PM

On the Mac's HFS+ filesystem (the default), the character set is UTF-8, but characters are decomposed. I don't know much about it yet, but Unicode calls this "Normalization Form D." The HTTP standards require "Normalization Form C" (composed). In my situation, the UTF-8 character é (latin small letter e acute) in my local filename is displayed in s3sync.rb's verbose and debug output as é, but it is saved in S3 as eÌ followed by an unprintable character (that's three characters: latin small letter e, latin capital letter I with grave, and Unicode character 129 [0x81]).
This causes the comparisons to always fail, and the files are always re-synced unnecessarily (and the existing mis-named copies are deleted if s3sync.rb's --delete option is on).

Apple has patched libiconv to add a character set called UTF-8-MAC that is supposed to handle this situation. However, when I set S3SYNC_NATIVE_CHARSET=UTF-8-MAC, I still don't get the right results.

I'll do my best to work up a patch, but I am not a Ruby programmer! Let me know if you have any ideas or need more information.

Title: Re: translation of node names on Mac causes sync to mis-recognize identical file
Post by: ferrix on December 21, 2007, 01:53:21 PM

I don't have any idea what to do about it. The char code stuff is all a black-box Ruby lib that I call into.
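
For anyone who wants to reproduce the mismatch described in the first post, here is a minimal sketch. It is not s3sync.rb code. It assumes a modern Ruby (2.2 or later) where String#unicode_normalize is available. The file name "café" and the helper names hex_bytes, latin1_misread and same_name? are made up for illustration. The Latin-1 misreading is only a guess at how the three stored characters could arise; it happens to match the e, Ì, 0x81 sequence reported above.

```ruby
# Illustration only: hypothetical names, not part of s3sync.rb.
# Requires Ruby 2.2+ for String#unicode_normalize.

NFC_NAME = "caf\u00E9"   # composed: é is the single code point U+00E9
NFD_NAME = "cafe\u0301"  # decomposed: e + combining acute U+0301 (HFS+ style)

# Hex dump of a string's bytes, to see what is actually stored.
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

# ASSUMPTION: reinterpret the UTF-8 bytes as ISO-8859-1, one possible way
# to end up with "e", "Ì" (U+00CC) and U+0081 in the stored key.
def latin1_misread(str)
  str.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Compare two names after bringing both to Normalization Form C.
def same_name?(a, b)
  a.unicode_normalize(:nfc) == b.unicode_normalize(:nfc)
end

puts NFC_NAME == NFD_NAME            # byte-wise comparison of the raw strings
puts hex_bytes(NFD_NAME)             # decomposed é is the bytes 65 CC 81
puts latin1_misread(NFD_NAME).codepoints.inspect
puts same_name?(NFC_NAME, NFD_NAME)  # comparison after normalizing both sides
```

The point of the sketch is that the raw strings differ byte for byte even though they look the same on screen, so a comparison has to normalize both the local name and the remote name to the same form first.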