RE: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes

From:	Jeroen Frijters
Subject:	RE: [cp-patches] FYI: Patch: character encoder/decoder cleanup/fixes
Date:	Thu, 18 Nov 2004 17:37:10 +0100

Archie Cobbs wrote:
> I'm simply complaining that the following doesn't work:
> 
>       String s = "\ud8aa";
>       byte[] b = s.getBytes("UTF-8");
>       String t = new String(b, "UTF-8");
>       System.out.println(s.equals(t));        // prints false!
> 
> If you run this under the JDK, it prints "false".

The string isn't valid Unicode (it consists of a single unpaired
surrogate), so the UTF-8 encoder is within its rights to encode the
surrogate as an invalid character.
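
To make the behaviour concrete, here is a minimal sketch of what I'd expect on a current JDK. The class and method names are made up for illustration, and StandardCharsets postdates this thread (Java 7 and later), so treat it as an assumption about the environment rather than anything Classpath-specific. With the default String.getBytes path the encoder silently substitutes its replacement bytes; with an encoder configured to REPORT errors, the lone surrogate is rejected outright.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Classpath or the JDK.
public class LoneSurrogateUtf8 {

    // Encode with String.getBytes, decode again, and compare with the input.
    public static boolean roundTrips(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        String t = new String(b, StandardCharsets.UTF_8);
        return s.equals(t);
    }

    // Returns true if an encoder that reports errors refuses the input.
    public static boolean strictEncoderRejects(String s) {
        try {
            StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
            return false;
        } catch (CharacterCodingException e) {
            // An unpaired surrogate is malformed UTF-16 input.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("\ud8aa"));            // false: unpaired surrogate
        System.out.println(strictEncoderRejects("\ud8aa"));  // true
        System.out.println(roundTrips("\ud83d\ude00"));      // true: valid surrogate pair
    }
}
```

A properly paired surrogate round-trips fine, so the failure is specific to strings that are not well-formed UTF-16.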

> In other words, there are certain String objects that Sun's 
> UTF-8 encoding is not capable of encoding, because it doesn't
> handle all possible character values in the range
> 0x0000 - 0xffff.

I understand what you mean, but you have to face the fact that the range
0xD800-0xDFFF doesn't contain valid Unicode characters, and as such
those values will not be encoded by UTF-8.

> Yes, which is how I came across this bug. There are classes 
> in Classpath that store arbitrary binary data within String
> objects.

Class files don't use UTF-8 to encode strings, they use the format used
by DataOutputStream.writeUTF (what Sun calls "modified UTF-8").
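
As a rough illustration of how the two formats differ, the sketch below compares byte counts for a NUL character and for a supplementary character (U+1F600). The class and helper names are hypothetical; the only library calls are String.getBytes and DataOutputStream.writeUTF. The writeUTF output starts with a 2-byte length prefix, which the helper strips so that only the encoded characters are compared.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Classpath or the JDK.
public class ModifiedUtfBytes {

    // Bytes written by writeUTF, without the leading 2-byte length prefix.
    public static byte[] modifiedUtf(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Bytes produced by the standard UTF-8 charset.
    public static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nul = "\0";
        String supplementary = "\ud83d\ude00"; // U+1F600 as a surrogate pair

        // NUL: one byte (00) in UTF-8, two bytes (C0 80) in the writeUTF format.
        System.out.println(standardUtf8(nul).length + " vs " + modifiedUtf(nul).length);  // 1 vs 2

        // Supplementary character: four bytes in UTF-8, but each surrogate is
        // written separately as three bytes by writeUTF.
        System.out.println(standardUtf8(supplementary).length + " vs "
            + modifiedUtf(supplementary).length);  // 4 vs 6
    }
}
```

Because writeUTF encodes each 16-bit char on its own, it never has to decide whether a surrogate is paired.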

So maybe all we need to do is make sure that
DataOutputStream.writeUTF/DataInputStream.readUTF can roundtrip *any*
string (even if it has invalid Unicode characters).
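
A small check along these lines could serve as a test for that property. This is only a sketch with made-up class and method names, using in-memory streams; on Sun's JDK I would expect it to print true for the unpaired surrogate from the example above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class, not part of Classpath or the JDK.
public class ModifiedUtfRoundTrip {

    // Write the string with writeUTF and read it back with readUTF.
    public static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeUTF(s);
            }
            try (DataInputStream in =
                     new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String s = "\ud8aa"; // unpaired surrogate, not valid Unicode
        System.out.println(s.equals(roundTrip(s)));
    }
}
```

One caveat on "any" string: writeUTF stores the encoded length in two bytes, so it throws UTFDataFormatException for strings whose encoded form exceeds 65535 bytes.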

Regards,
Jeroen
