[Help-smalltalk] Re: [bug] UnicodeString encoding weirdness

From:	Paolo Bonzini
Subject:	[Help-smalltalk] Re: [bug] UnicodeString encoding weirdness
Date:	Mon, 22 Oct 2007 02:01:23 -0700

Issue status update for http://smalltalk.gnu.org/project/issue/113
Post a follow up: http://smalltalk.gnu.org/project/comments/add/113


Project:      GNU Smalltalk
Version:      <none>
Component:    Base classes
Category:     bug reports
Priority:     normal
Assigned to:  Unassigned
Reported by:  elmex
Updated by:   bonzinip
Status:       active
Attachment:   http://smalltalk.gnu.org/files/issues/gst-encoding-lazy.patch 
(594 bytes)

EF-BF-BE is the UTF-8 encoding of U+FFFE, which is the Unicode "byte
order mark" (BOM, U+FEFF) read with its two bytes swapped.  The BOM
was born as a way to distinguish big- and little-endian UTF-16.  Since
it's not really a character, Iconv tries to strip it when converting to
a UnicodeString, but it is failing to do so in this case.

Now, under Mac OS X I get the expected result; under Linux I get yours.
The reason is that my Mac is big-endian, so Iconv produces big-endian
UTF-16, while under Linux it produces little-endian UTF-16.  Since
UTF-16 is taken to be big-endian by default, the Mac happens to get the
right thing, while Linux messes up the encoding.  So later on the "pipe
peekFor: $<16rFEFF>" statement that should strip the BOM does not match.
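
The mismatch described above can be reproduced outside GNU Smalltalk.
The following is a minimal Python sketch, not the gst code path: it
assumes the writer emits little-endian UTF-16 and the reader decodes
as big-endian without inspecting the BOM.  The variable names are
illustrative only.

```python
text = "\ufeffhi"  # a BOM followed by two ordinary characters

# A little-endian machine writes UTF-16 in its native byte order ...
little_endian_bytes = text.encode("utf-16-le")

# ... but the reader assumes big-endian, so every code unit is byte-swapped.
misread = little_endian_bytes.decode("utf-16-be")

first = misread[0]
bom_stripped = first == "\ufeff"      # the kind of test peekFor: performs
swapped_utf8 = first.encode("utf-8")  # what ends up in the UTF-8 output

print(hex(ord(first)))        # 0xfffe
print(bom_stripped)           # False
print(swapped_utf8.hex("-"))  # ef-bf-be
```

The first code unit comes out as U+FFFE rather than U+FEFF, so a check
for 16rFEFF does not match and the swapped value survives into the
UTF-8 output as EF-BF-BE.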

The attached patch fixes this by making EncodedString look for a BOM
when retrieving the encoding, rather than when setting it.
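
The idea of checking for the BOM when the encoding is retrieved can be
sketched as follows.  This is a hypothetical Python illustration, not
the contents of the attached patch: the class LazyEncodedBytes and its
methods are invented for the example, and the big-endian fallback for
BOM-less data is an assumption taken from the explanation above.

```python
import codecs


class LazyEncodedBytes:
    """Hypothetical stand-in for an encoded string; not gst's EncodedString."""

    def __init__(self, data, encoding="UTF-16"):
        self.data = data
        self._encoding = encoding  # stored as given; no BOM check here

    @property
    def encoding(self):
        # Resolve a generic "UTF-16" only when the encoding is asked for.
        if self._encoding.upper() == "UTF-16":
            if self.data.startswith(codecs.BOM_UTF16_LE):
                return "utf-16-le"
            # Big-endian BOM, or no BOM at all: use big-endian.
            return "utf-16-be"
        return self._encoding

    def decoded(self):
        text = self.data.decode(self.encoding)
        # With the right byte order the BOM decodes to U+FEFF and can be dropped.
        return text[1:] if text.startswith("\ufeff") else text


sample = LazyEncodedBytes("\ufeffhi".encode("utf-16-le"))
print(sample.encoding)   # utf-16-le
print(sample.decoded())  # hi
```

Because the byte order is decided from the data at the moment it is
needed, the same bytes decode correctly whichever machine produced
them.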
