Re: [cp-patches] gnu/xml/transform/StreamSerializer.java: compatibilityM


From: Ito Kazumitsu
Subject: Re: [cp-patches] gnu/xml/transform/StreamSerializer.java: compatibilityMode setting
Date: 12 Mar 2005 22:56:27 -0000
User-agent: SEMI/1.13.7 (Awazu) FLIM/1.13.2 (Kasanui) Emacs/21.3.50 (i386-unknown-freebsd5.3) MULE/5.0 (SAKAKI)

In message "Re: [cp-patches]  gnu/xml/transform/StreamSerializer.java: 
compatibilityMode setting"
    on 05/02/13, Chris Burdess <address@hidden> writes:

:> Unfortunately your patch is almost guaranteed to produce 
:> non-well-formed XML.

OK, I do not insist on my patch, and I do not use the patched program
myself now: I use UTF-8 and "iconv -f UTF-8 -t EUC-JP".

From my own practical viewpoint, whether the produced XML is well-formed
is less important than whether it is compact and human-readable.
When I am handling Japanese text, I can assume that only Japanese and
ASCII characters appear in it.

I understand that a widely used system like GNU Classpath cannot take
this practical viewpoint and must make the safest choice.


:> I agree that compatibilityMode is a hack. What's really needed is a way 
:> to detect whether a character is a valid member of a given encoding, 

As for CJK characters, I cannot imagine a way of testing whether a
character is valid without having a table of all valid characters.
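
For illustration only: on runtimes that provide java.nio.charset, the
encoder for a charset already carries such a table, and
CharsetEncoder.canEncode can be asked whether a given text is
representable. The class name EncodingCheck below is hypothetical, and
nothing in this thread says that GNU Classpath's serializer works this
way; it is a minimal sketch of the kind of test being discussed.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingCheck {
    /**
     * Returns true if every character of the text can be encoded in the
     * named charset. The lookup relies on the charset's own mapping table.
     */
    public static boolean canEncode(String charsetName, CharSequence text) {
        // A fresh encoder per call: CharsetEncoder instances are not thread-safe.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // ISO-8859-1 is used here because every Java runtime must support it.
        System.out.println(canEncode("ISO-8859-1", "caf\u00e9"));         // true
        System.out.println(canEncode("ISO-8859-1", "\u65e5\u672c\u8a9e")); // false
    }
}
```

The same call could be made with "EUC-JP" as the charset name, provided
the runtime ships that charset.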

I used to use Saxon as an XSLT processor, and this is what Saxon does:

Saxon itself does not support character encodings other than standard
ones such as UTF-8 or ISO-8859-1, and relies on the java.nio.charset
package to handle character encodings in general. In addition, Saxon
provides an API with which a user can write his own character set
handler, which tells whether a character is a valid member of a given
encoding.

To satisfy my needs, I wrote my own Japanese character set handler,
which falsely reports that all Unicode characters are Japanese
characters, just as I set compatibilityMode to true in
gnu/xml/transform/StreamSerializer.
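
The idea can be sketched roughly as follows. The interface
CharacterSetHandler, its inRange method, and the escape helper are
hypothetical names made up for this sketch; they are not Saxon's actual
API and not the handler I wrote. A serializer asks the handler about
each character and writes a numeric character reference when the
handler says the character is not in the encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class HandlerSketch {
    /** Hypothetical pluggable handler; a stand-in, not Saxon's real interface. */
    public interface CharacterSetHandler {
        boolean inRange(int codePoint);
    }

    /** Strict handler: asks the runtime's encoder for the named charset. */
    public static CharacterSetHandler strict(String charsetName) {
        // The encoder is captured by the lambda, so the handler is not thread-safe.
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return cp -> encoder.canEncode(new String(Character.toChars(cp)));
    }

    /** Permissive handler: claims that every character is in the encoding. */
    public static CharacterSetHandler permissive() {
        return cp -> true;
    }

    /** Writes characters as-is when in range, else as numeric character references. */
    public static String escape(String text, CharacterSetHandler handler) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (handler.inRange(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "A\u65e5";
        System.out.println(escape(text, strict("US-ASCII"))); // A&#26085;
        System.out.println(escape(text, permissive()).equals(text)); // true
    }
}
```

With the permissive handler the output stays compact, and it becomes the
user's responsibility that every character really exists in the target
encoding.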

I think this is a good design: Saxon itself stays free from the risk of
producing non-well-formed XML documents, while users who accept the
responsibility can do anything they like.
