classpath-patches

[cp-patches] RFC: gnu.java.nio.charset.UnicodeLittle


From: Ito Kazumitsu
Subject: [cp-patches] RFC: gnu.java.nio.charset.UnicodeLittle
Date: Sat, 05 Nov 2005 01:49:37 +0900 (JST)

Hi,

I have been admitted to the GNU Classpath family, and this
is the first patch I would like to check in.

This patch should fix bug #23008 ("The charset decoder for
UnicodeLittle gives wrong results").

As reported in comment #1 of bug #23008:

Judging from the behavior of Sun's JDK, UnicodeLittle input with or without
a byte order mark should be treated as follows:

  UnicodeLittle with a correct (little-endian, FF FE) byte order mark:
    Skip the byte order mark and continue assuming the byte order
    to be little endian.

  UnicodeLittle with an incorrect (big-endian, FE FF) byte order mark:
    The byte sequence is malformed.

  UnicodeLittle without a byte order mark:
    Continue assuming the byte order to be little endian.
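The three rules above can be sketched as a small java.nio CharsetDecoder. This is a minimal, hypothetical illustration, not Classpath's UTF_16Decoder and not Sun's implementation: the class names UnicodeLittleSketch and Decoder and the decode helper are made up, StandardCharsets.UTF_16LE is borrowed only as the owning Charset object, and Java 7 or later is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the UnicodeLittle byte-order-mark rules; not Classpath code.
final class UnicodeLittleSketch {

    static final class Decoder extends CharsetDecoder {
        // True until the first two bytes have been checked for a byte order mark.
        private boolean atStart = true;

        Decoder() {
            // UTF_16LE is borrowed only as the owning Charset object (assumption).
            super(StandardCharsets.UTF_16LE, 0.5f, 1.0f);
        }

        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            while (in.remaining() >= 2) {
                int pos = in.position();
                int b1 = in.get(pos) & 0xFF;
                int b2 = in.get(pos + 1) & 0xFF;
                if (atStart) {
                    int c = (b1 << 8) | b2; // first unit read big endian, as in the patch
                    if (c == 0xFFFE) {
                        // Bytes FF FE: the correct (little-endian) mark, so skip it.
                        atStart = false;
                        in.position(pos + 2);
                        continue;
                    }
                    if (c == 0xFEFF) {
                        // Bytes FE FF: the wrong mark; leave the position unchanged
                        // and report both bytes as malformed.
                        return CoderResult.malformedForLength(2);
                    }
                    // No mark: keep the bytes and decode them as little endian.
                    atStart = false;
                }
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW;
                }
                out.put((char) ((b2 << 8) | b1)); // little endian: low byte first
                in.position(pos + 2);
            }
            return CoderResult.UNDERFLOW; // a trailing odd byte is left for the caller
        }

        @Override
        protected void implReset() {
            atStart = true;
        }
    }

    // Decodes the given byte values; the default REPORT action turns a
    // malformed result into a MalformedInputException.
    static String decode(int... values) throws CharacterCodingException {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new Decoder().decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(decode(0xFF, 0xFE, 0x41, 0x00)); // correct mark skipped: A
        System.out.println(decode(0x41, 0x00, 0x42, 0x00)); // no mark, little endian: AB
        try {
            decode(0xFE, 0xFF, 0x00, 0x41);
        } catch (MalformedInputException e) {
            System.out.println("malformed, length " + e.getInputLength());
        }
    }
}
```

Returning CoderResult.malformedForLength(2) without advancing the buffer is what lets the default REPORT action surface a MalformedInputException to callers of decode(ByteBuffer).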

This patch has already been in use in Kaffe for several months
and has worked fine.

2005-11-04  Ito Kazumitsu  <address@hidden>

        * gnu/java/nio/charset/UTF_16Decoder.java
        (MAYBE_BIG_ENDIAN, MAYBE_LITTLE_ENDIAN): New constants for byte
        orders that behave like UNKNOWN_ENDIAN, except that they default
        to big or little endian, respectively, when no byte order mark is
        present, and report a mark of the opposite endianness as malformed.

        (decodeLoop): Handle MAYBE_BIG_ENDIAN and MAYBE_LITTLE_ENDIAN.

        * gnu/java/nio/charset/UnicodeLittle.java
        (newDecoder): Set the endianness to MAYBE_LITTLE_ENDIAN.

--- gnu/java/nio/charset/UnicodeLittle.java.orig        Sun Jul  3 05:32:13 2005
+++ gnu/java/nio/charset/UnicodeLittle.java     Sat Nov  5 01:07:47 2005
@@ -64,7 +64,7 @@
 
   public CharsetDecoder newDecoder ()
   {
-    return new UTF_16Decoder (this, UTF_16Decoder.UNKNOWN_ENDIAN);
+    return new UTF_16Decoder (this, UTF_16Decoder.MAYBE_LITTLE_ENDIAN);
   }
 
   public CharsetEncoder newEncoder ()
--- gnu/java/nio/charset/UTF_16Decoder.java.orig        Fri Aug 12 08:51:30 2005
+++ gnu/java/nio/charset/UTF_16Decoder.java     Sat Nov  5 01:08:10 2005
@@ -54,6 +54,8 @@
   static final int BIG_ENDIAN = 0;
   static final int LITTLE_ENDIAN = 1;
   static final int UNKNOWN_ENDIAN = 2;
+  static final int MAYBE_BIG_ENDIAN = 3;
+  static final int MAYBE_LITTLE_ENDIAN = 4;
 
   private static final char BYTE_ORDER_MARK = 0xFEFF;
   private static final char REVERSED_BYTE_ORDER_MARK = 0xFFFE;
@@ -81,26 +83,37 @@
             byte b2 = in.get ();
 
             // handle byte order mark
-            if (byteOrder == UNKNOWN_ENDIAN)
+            if (byteOrder == UNKNOWN_ENDIAN ||
+                byteOrder == MAYBE_BIG_ENDIAN ||
+                byteOrder == MAYBE_LITTLE_ENDIAN)
               {
                 char c = (char) (((b1 & 0xFF) << 8) | (b2 & 0xFF));
                 if (c == BYTE_ORDER_MARK)
                   {
+                    if (byteOrder == MAYBE_LITTLE_ENDIAN)
+                      {
+                        return CoderResult.malformedForLength (2);
+                      }
                     byteOrder = BIG_ENDIAN;
                     inPos += 2;
                     continue;
                   }
                 else if (c == REVERSED_BYTE_ORDER_MARK)
                   {
+                    if (byteOrder == MAYBE_BIG_ENDIAN)
+                      {
+                        return CoderResult.malformedForLength (2);
+                      }
                     byteOrder = LITTLE_ENDIAN;
                     inPos += 2;
                     continue;
                   }
                 else
                   {
-                    // assume big endian, do not consume bytes,
+                    // assume big or little endian, do not consume bytes,
                     // continue with normal processing
-                    byteOrder = BIG_ENDIAN;
+                    byteOrder = (byteOrder == MAYBE_LITTLE_ENDIAN ?
+                                 LITTLE_ENDIAN : BIG_ENDIAN);
                   }
               }
 

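The byte-order-mark branch of decodeLoop in the patch can be read as a transition on the byteOrder state. The sketch below restates it as a pure function. The constant values are copied from the patch; the class name BomTransition, the resolve and name helpers, and the MALFORMED sentinel (standing in for returning CoderResult.malformedForLength (2)) are hypothetical, and the buffer position bookkeeping for skipping a recognized mark is left out.

```java
// Hypothetical restatement of the patch's byte-order-mark branch; not Classpath code.
final class BomTransition {
    // Values copied from the patched UTF_16Decoder.
    static final int BIG_ENDIAN = 0;
    static final int LITTLE_ENDIAN = 1;
    static final int UNKNOWN_ENDIAN = 2;
    static final int MAYBE_BIG_ENDIAN = 3;
    static final int MAYBE_LITTLE_ENDIAN = 4;

    // Hypothetical sentinel standing in for CoderResult.malformedForLength (2).
    static final int MALFORMED = -1;

    private static final char BYTE_ORDER_MARK = 0xFEFF;
    private static final char REVERSED_BYTE_ORDER_MARK = 0xFFFE;

    // Returns the byte order chosen after looking at the first two bytes.
    static int resolve(int byteOrder, byte b1, byte b2) {
        if (byteOrder != UNKNOWN_ENDIAN && byteOrder != MAYBE_BIG_ENDIAN
                && byteOrder != MAYBE_LITTLE_ENDIAN) {
            return byteOrder; // BIG_ENDIAN or LITTLE_ENDIAN: no mark handling
        }
        char c = (char) (((b1 & 0xFF) << 8) | (b2 & 0xFF));
        if (c == BYTE_ORDER_MARK) { // bytes FE FF
            return byteOrder == MAYBE_LITTLE_ENDIAN ? MALFORMED : BIG_ENDIAN;
        }
        if (c == REVERSED_BYTE_ORDER_MARK) { // bytes FF FE
            return byteOrder == MAYBE_BIG_ENDIAN ? MALFORMED : LITTLE_ENDIAN;
        }
        // No mark: the bytes are not consumed and the default order applies.
        return byteOrder == MAYBE_LITTLE_ENDIAN ? LITTLE_ENDIAN : BIG_ENDIAN;
    }

    static String name(int order) {
        switch (order) {
            case BIG_ENDIAN: return "BIG_ENDIAN";
            case LITTLE_ENDIAN: return "LITTLE_ENDIAN";
            case UNKNOWN_ENDIAN: return "UNKNOWN_ENDIAN";
            case MAYBE_BIG_ENDIAN: return "MAYBE_BIG_ENDIAN";
            case MAYBE_LITTLE_ENDIAN: return "MAYBE_LITTLE_ENDIAN";
            default: return "MALFORMED";
        }
    }

    public static void main(String[] args) {
        byte[][] inputs = {
            {(byte) 0xFE, (byte) 0xFF}, // big-endian mark
            {(byte) 0xFF, (byte) 0xFE}, // little-endian mark
            {(byte) 0x41, (byte) 0x00}  // no mark ("A" in little endian)
        };
        int[] states = {UNKNOWN_ENDIAN, MAYBE_BIG_ENDIAN, MAYBE_LITTLE_ENDIAN};
        for (int state : states) {
            for (byte[] in : inputs) {
                System.out.printf("%-19s %02X %02X -> %s%n",
                        name(state), in[0] & 0xFF, in[1] & 0xFF,
                        name(resolve(state, in[0], in[1])));
            }
        }
    }
}
```

Running main prints the nine combinations. The MAYBE_LITTLE_ENDIAN rows match the three UnicodeLittle rules listed in the message, while UNKNOWN_ENDIAN keeps the previous behavior of accepting either mark and defaulting to big endian.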