[cp-patches] RFC: gnu.java.nio.charset.UnicodeLittle
From: Ito Kazumitsu
Subject: [cp-patches] RFC: gnu.java.nio.charset.UnicodeLittle
Date: Sat, 05 Nov 2005 01:49:37 +0900 (JST)
Hi,
I have been admitted to the GNU Classpath family and this
is the first patch I would like to check in.
This patch should fix bug #23008 (The charset decoder for
UnicodeLittle gives wrong results).
As reported in Comment #1 of bug #23008, and judging from the behavior
of Sun's JDK, UnicodeLittle input with or without a byte order mark
should be treated as follows:

  UnicodeLittle with correct byte order mark:
    Ignore the byte order mark and continue assuming the byte order
    to be little endian.
  UnicodeLittle with incorrect byte order mark:
    The byte sequence is malformed.
  UnicodeLittle without byte order mark:
    Continue assuming the byte order to be little endian.
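
The three cases above can be sketched as a small standalone Java program.
This is only an illustration under stated assumptions, not the patch itself:
the class UnicodeLittleSketch and its methods decode and decodeToString are
hypothetical names, and the sketch neither validates surrogate pairs nor
keeps decoder state between calls as the real UTF_16Decoder would.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CoderResult;

// Hypothetical illustration class; it is not part of GNU Classpath.
class UnicodeLittleSketch {
    private static final char BYTE_ORDER_MARK = 0xFEFF;
    private static final char REVERSED_BYTE_ORDER_MARK = 0xFFFE;

    // Decodes 'in' into 'out' following the three UnicodeLittle rules.
    static CoderResult decode(ByteBuffer in, CharBuffer out) {
        if (in.remaining() >= 2) {
            int start = in.position();
            // Peek at the first two bytes as a big-endian 16-bit value.
            char c = (char) (((in.get(start) & 0xFF) << 8)
                             | (in.get(start + 1) & 0xFF));
            if (c == REVERSED_BYTE_ORDER_MARK) {
                // Bytes FF FE: the correct (little endian) mark. Skip it.
                in.position(start + 2);
            } else if (c == BYTE_ORDER_MARK) {
                // Bytes FE FF: the big endian mark. Malformed input.
                return CoderResult.malformedForLength(2);
            }
            // Otherwise there is no mark and nothing is consumed.
        }
        // Decode the rest as little endian; a trailing odd byte is left
        // unconsumed in 'in'.
        while (in.remaining() >= 2) {
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW;
            }
            int lo = in.get() & 0xFF;
            int hi = in.get() & 0xFF;
            out.put((char) ((hi << 8) | lo));
        }
        return CoderResult.UNDERFLOW;
    }

    // Convenience wrapper for the examples: returns null if malformed.
    static String decodeToString(byte[] bytes) {
        CharBuffer out = CharBuffer.allocate(bytes.length / 2 + 1);
        CoderResult result = decode(ByteBuffer.wrap(bytes), out);
        if (result.isMalformed()) {
            return null;
        }
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // Correct byte order mark: the mark is ignored.
        System.out.println(decodeToString(
            new byte[] {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00})); // A
        // Incorrect byte order mark: malformed.
        System.out.println(decodeToString(
            new byte[] {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41})); // null
        // No byte order mark: little endian is assumed.
        System.out.println(decodeToString(
            new byte[] {0x41, 0x00, 0x42, 0x00})); // AB
    }
}
```

Returning a CoderResult rather than throwing mirrors the way decodeLoop
reports malformed input in the patch below.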
This patch was applied to Kaffe several months ago and has worked
fine there since.
2005-11-04 Ito Kazumitsu <address@hidden>
* gnu/java/nio/charset/UTF_16Decoder.java
(MAYBE_BIG_ENDIAN, MAYBE_LITTLE_ENDIAN): New constants. Similar to
UNKNOWN_ENDIAN, but default to big/little endian respectively when
there is no byte order mark.
(decodeLoop): Handle MAYBE_BIG_ENDIAN and MAYBE_LITTLE_ENDIAN.
* gnu/java/nio/charset/UnicodeLittle.java
(newDecoder): Set the endianness to MAYBE_LITTLE_ENDIAN.
--- gnu/java/nio/charset/UnicodeLittle.java.orig Sun Jul 3 05:32:13 2005
+++ gnu/java/nio/charset/UnicodeLittle.java Sat Nov 5 01:07:47 2005
@@ -64,7 +64,7 @@
public CharsetDecoder newDecoder ()
{
- return new UTF_16Decoder (this, UTF_16Decoder.UNKNOWN_ENDIAN);
+ return new UTF_16Decoder (this, UTF_16Decoder.MAYBE_LITTLE_ENDIAN);
}
public CharsetEncoder newEncoder ()
--- gnu/java/nio/charset/UTF_16Decoder.java.orig Fri Aug 12 08:51:30 2005
+++ gnu/java/nio/charset/UTF_16Decoder.java Sat Nov 5 01:08:10 2005
@@ -54,6 +54,8 @@
static final int BIG_ENDIAN = 0;
static final int LITTLE_ENDIAN = 1;
static final int UNKNOWN_ENDIAN = 2;
+ static final int MAYBE_BIG_ENDIAN = 3;
+ static final int MAYBE_LITTLE_ENDIAN = 4;
private static final char BYTE_ORDER_MARK = 0xFEFF;
private static final char REVERSED_BYTE_ORDER_MARK = 0xFFFE;
@@ -81,26 +83,37 @@
byte b2 = in.get ();
// handle byte order mark
- if (byteOrder == UNKNOWN_ENDIAN)
+ if (byteOrder == UNKNOWN_ENDIAN ||
+ byteOrder == MAYBE_BIG_ENDIAN ||
+ byteOrder == MAYBE_LITTLE_ENDIAN)
{
char c = (char) (((b1 & 0xFF) << 8) | (b2 & 0xFF));
if (c == BYTE_ORDER_MARK)
{
+ if (byteOrder == MAYBE_LITTLE_ENDIAN)
+ {
+ return CoderResult.malformedForLength (2);
+ }
byteOrder = BIG_ENDIAN;
inPos += 2;
continue;
}
else if (c == REVERSED_BYTE_ORDER_MARK)
{
+ if (byteOrder == MAYBE_BIG_ENDIAN)
+ {
+ return CoderResult.malformedForLength (2);
+ }
byteOrder = LITTLE_ENDIAN;
inPos += 2;
continue;
}
else
{
- // assume big endian, do not consume bytes,
+ // assume big or little endian, do not consume bytes,
// continue with normal processing
- byteOrder = BIG_ENDIAN;
+ byteOrder = (byteOrder == MAYBE_LITTLE_ENDIAN ?
+ LITTLE_ENDIAN : BIG_ENDIAN);
}
}