Re: [Bug-tar] star <-> GNU tar interchange issue
From: Nathan Stratton Treadway
Subject: Re: [Bug-tar] star <-> GNU tar interchange issue
Date: Mon, 25 Mar 2013 01:00:36 -0400
User-agent: Mutt/1.5.20 (2009-06-14)

On Sat, Mar 23, 2013 at 19:16:57 -0000, Mark wrote:
> I looked at a hex dump of the test_star.tar archive. For all files except
> the ...78.bin file, the o-umlaut character is represented by two bytes:
> 0xC3 0xB6. For the ...78.bin file the o-umlauts are represented by C3 83
> C2 B6 (see offsets 0x0C69 and 0x0C9D in the file).

This doesn't explain why it's happening in the first place, but I did
notice that the four-byte string appears to be the result of a double
latin1 -> UTF-8 conversion.

That is, the o-umlaut character in latin1 is the single byte F6; when
represented in UTF-8 that expands to the two bytes C3 B6.

Those two bytes, if then treated as latin1 characters instead of UTF-8
for some reason, would display as "ö" (C3 is "Ã" and B6 is "¶" in
latin1). After another round of latin1 -> UTF-8 conversion, "Ã" becomes
C3 83 and "¶" becomes C2 B6, so the result is C3 83 C2 B6.

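
The byte sequence described above can be reproduced with a short Python
sketch. This only illustrates the double conversion itself; the function
names are made up for the example, and nothing here is taken from the star
or GNU tar sources or says where in either program the second conversion
might occur.

```python
def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misread the result as latin1, then encode again."""
    once = text.encode("utf-8")          # o-umlaut -> C3 B6
    misread = once.decode("latin-1")     # C3 B6 read as two latin1 characters
    return misread.encode("utf-8")       # each character encoded separately


def undo_double_encode(data: bytes) -> str:
    """Reverse the damage, assuming exactly one extra latin1 -> UTF-8 round."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


o_umlaut = "\u00f6"  # the o-umlaut character, F6 in latin1

print(o_umlaut.encode("latin-1").hex(" ").upper())   # single latin1 byte
print(o_umlaut.encode("utf-8").hex(" ").upper())     # correct UTF-8 form
print(double_encode(o_umlaut).hex(" ").upper())      # double-converted form
```

The three lines printed correspond to the latin1 byte, the normal UTF-8
encoding seen for most files in the archive, and the four-byte form seen
for the ...78.bin file. The undo helper only works if the name went
through exactly one extra conversion.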
Nathan
----------------------------------------------------------------------------
Nathan Stratton Treadway - address@hidden - Mid-Atlantic region
Ray Ontko & Co. - Software consulting services - http://www.ontko.com/
GPG Key: http://www.ontko.com/~nathanst/gpg_key.txt ID: 1023D/ECFB6239
Key fingerprint = 6AD8 485E 20B9 5C71 231C 0C32 15F3 ADCD ECFB 6239