RE: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes
From: Jeroen Frijters
Subject: RE: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes
Date: Wed, 17 Nov 2004 16:05:50 +0100
Archie Cobbs wrote:
> Jeroen Frijters wrote:
> > I committed the attached patch to remove the throwing of
> > CharConversionException from the character encoders/decoders.
> >
> > For encoders, unsupported characters are now always replaced with a '?'
> > byte and for the UTF8 decoder, invalid UTF-8 bytes are replaced by a
> > Unicode REPLACEMENT CHARACTER (\uFFFD) in the output stream.
>
> Just curious.. does this implementation have the same problem as
> described in
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4628881 ?
> I.e., is it a lossy encoding for "invalid" characters?
At the moment the UTF-8 encoder/decoder is fully symmetrical for all
"characters" (really UTF-16 code units), but IMO this is actually a bug:
unpaired surrogates shouldn't be decoded (as the bug parade comment
says, the test case is bogus).
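For illustration, here is a minimal sketch of the replacement behaviour discussed above, written against the standard java.nio.charset API of a current JDK rather than the Classpath-internal converters the patch touches. The class name ReplacementDemo and its helper methods are made up for this example, and StandardCharsets postdates this thread, so treat it as an assumption-laden demo of the same ideas, not as the patched code.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of GNU Classpath.
public class ReplacementDemo {

    // String.getBytes(Charset) never throws: characters the charset
    // cannot represent are replaced by its replacement byte, '?'.
    static byte[] encodeAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // new String(byte[], Charset) never throws either: malformed
    // UTF-8 input is replaced by U+FFFD (REPLACEMENT CHARACTER).
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // A strict encoder (REPORT instead of REPLACE) refuses input that is
    // not well-formed UTF-16, such as an unpaired surrogate.
    static boolean strictUtf8Rejects(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
            return false;
        } catch (CharacterCodingException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The euro sign has no US-ASCII mapping, so it becomes '?'.
        byte[] ascii = encodeAscii("a\u20ACb");
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));

        // 0xFF can never appear in well-formed UTF-8.
        String decoded = decodeUtf8(new byte[] {'a', (byte) 0xFF, 'b'});
        System.out.println(decoded.charAt(1) == '\uFFFD');

        // A lone high surrogate is not a valid character on its own.
        String loneSurrogate = String.valueOf((char) 0xD800);
        System.out.println(strictUtf8Rejects(loneSurrogate));
    }
}
```

The replacing paths are lossy by design: once a '?' or U+FFFD has been substituted, the original input cannot be recovered, which is the property the question about bug 4628881 is getting at.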
Regards,
Jeroen