Re: Add "java_quoting_style" to quotearg
From: Bruno Haible
Subject: Re: Add "java_quoting_style" to quotearg
Date: Sun, 6 Nov 2011 11:49:22 +0100
User-agent: KMail/1.13.6 (Linux/2.6.37.6-0.5-desktop; KDE/4.6.0; x86_64; ; )

Tim Landscheidt wrote:
> I don't know what the result
> of `quotearg ("äöü")' should look like and what it should
> depend on.
It depends on the output destination of the quoted string.
If it is for output on stderr, like in bison/src/parse-gram.y:193
    %printer { fputs (quotearg_style (c_quoting_style, $$), stderr); }
then you can most likely emit "äöü" with the same multibyte characters.
If it is for inclusion in comments in a Java program, you also don't
need any special processing of multibyte characters.
If it is for use as a string literal in a Java program, then the
interpretation of the source code depends on the -encoding option passed
to the Java compiler (see [1]). If you emit "äöü" directly
into the source code, the developer needs to add a -encoding option;
this is normally not welcome. To avoid this, the notation \unnnn
can be used in strings for UTF-16 code units, except for LF and CR
(\u000A and \u000D are invalid inside strings, because the compiler
translates Unicode escapes before it tokenizes the source). So, the
algorithm is:
  - Determine the encoding of the string's origin (if it's from a
    file name or a tty, you can assume locale_charset() is the right
    guess; if it's from a file, use a command-line argument to specify
    its encoding).
  - Convert the multibyte string to UTF-16 (either through module
    'striconv' or through hand-written code in the same style as
    lib/unicodeio.c [just in the reverse direction]).
  - Replace LF with \n, CR with \r, and all other UTF-16 code units
    outside the range U+0020..U+007E with \unnnn.
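
For illustration, the last two steps could be sketched roughly as
follows. This is only a sketch under simplifying assumptions, not a
proposed patch: the function name java_escape_utf8 is made up, the input
is assumed to be valid UTF-8 already (so the locale_charset()/striconv
conversion is skipped), and the escaping of '"' and '\' is an addition
beyond the three steps above that quotearg would normally handle itself.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of quotearg.  Returns a freshly
   allocated ASCII-only string usable as the body of a Java string
   literal, or NULL if allocation fails.
   Assumption: S is valid UTF-8.  */
char *
java_escape_utf8 (const char *s)
{
  assert (s != NULL);
  const unsigned char *p = (const unsigned char *) s;
  /* Worst case: every input byte turns into a 6-byte \unnnn escape.  */
  char *result = malloc (6 * strlen (s) + 1);
  char *out = result;

  if (result == NULL)
    return NULL;

  while (*p != '\0')
    {
      unsigned long cp;
      int extra;                /* number of continuation bytes */

      /* Decode one UTF-8 sequence into a Unicode code point.  */
      if (*p < 0x80)      { cp = *p;        extra = 0; }
      else if (*p < 0xE0) { cp = *p & 0x1F; extra = 1; }
      else if (*p < 0xF0) { cp = *p & 0x0F; extra = 2; }
      else                { cp = *p & 0x07; extra = 3; }
      p++;
      while (extra-- > 0 && *p != '\0')
        cp = (cp << 6) | (*p++ & 0x3F);

      if (cp == '\n')
        out += sprintf (out, "\\n");
      else if (cp == '\r')
        out += sprintf (out, "\\r");
      else if (cp == '"' || cp == '\\')
        /* Not one of the three steps; added so the result is usable
           between double quotes as it stands.  */
        out += sprintf (out, "\\%c", (int) cp);
      else if (cp >= 0x20 && cp <= 0x7E)
        *out++ = (char) cp;
      else if (cp < 0x10000)
        out += sprintf (out, "\\u%04lX", cp);
      else
        {
          /* Outside the BMP: emit the two UTF-16 surrogate code units.  */
          cp -= 0x10000;
          out += sprintf (out, "\\u%04lX\\u%04lX",
                          0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        }
    }
  *out = '\0';
  return result;
}
```

With this sketch, java_escape_utf8 ("äöü") on UTF-8 input yields
\u00E4\u00F6\u00FC, which the Java compiler reads the same way
regardless of the -encoding option. The caller has to free() the
result.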
Bruno
[1] http://download.oracle.com/javase/1,5.0/docs/tooldocs/solaris/javac.html
--
In memoriam Louis Philippe d'Orléans
<http://en.wikipedia.org/wiki/Louis_Philippe_II,_Duke_of_Orléans>