bug-gnu-emacs
bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversi


From: Laimonas Vėbra
Subject: bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
Date: Fri, 23 Jul 2010 15:57:46 +0300
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Eli Zaretskii wrote:
Date: Thu, 22 Jul 2010 23:59:09 +0300
From: Laimonas Vėbra<laimonas.vebra@gmail.com>
CC: 6705@debbugs.gnu.org

Eli Zaretskii wrote:

Sorry, I cannot understand your comments.  You talk about corrupted
conversion, but never add any detailed explanations, just examples.
Could you please elaborate?

That was supposed to be detailed explanations through the detailed
examples. It is the way it happens. I did check/investigate;

I don't doubt that you checked, I just don't understand the
description of the problem.

Once again, if all you want to say is that you want to invoke external
programs with command-line arguments encoded in anything other than
the current locale's encoding, then this will not currently work in
the native Windows build.  But if you are trying to say anything else,
please elaborate.

Well, it will work. Passing UTF-8 arguments to native (MinGW) applications is not the problem. It won't work with Cygwin, and even that „won't work“ is not certain (it can work under some circumstances, with what I'd call an improper setup). So I think I should elaborate.


None of it.  Perhaps instead of going by example, just describe what
encoding you used, in what Emacs command, and what was corrupted as
result.

OK, from the beginning.
I'd like to grep for some UTF-8 encoded string. Choose whatever non-ASCII value you like, let's say 'ĔĿİ' (hex: 0x[C494, C4BF, C4B0]).

echo -e "-ĔĿİ-\n_ĔĿİ_\nELI\nĔĿİ" > file.txt

grep --version
GNU grep 2.6.3 (cygwin)

wscript.echo (GetLocale())
1063
http://www.cryer.co.uk/brian/windows/info_windows_locale_table.htm

LANG="" (that means not set, cygwin default value "C.UTF-8")

M-x grep
grep -nH -e "ĔĿİ" file.txt

Grep finished with no matches found at Fri Jul 23 13:56:22

Why?

Because:
grep.c gets args "Ä”ÄæÄ°" (utf-8 string, hex: 0x[C384, E2809D, C384, C3A6, C384, C2B0]).

Why?
Because the original string value 0x[C494, C4BF, C4B0] is interpreted as being in the encoding of the current locale's codepage (cp1257):
http://msdn.microsoft.com/en-us/goglobal/cc305150.aspx

So the Cygwin/OS API interprets it as six characters, 0x(C4, 94, C4, BF, C4, B0), i.e. 'Ä”ÄæÄ°', which are converted to UTF-16 and then to UTF-8.
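
The conversion chain described above can be reproduced with a minimal sketch. It assumes Python 3 and its built-in cp1257 codec; the variable names are illustrative, and the code only models the byte-level effect, not the actual Cygwin or cmdproxy.c code path.

```python
# Model of the mis-conversion: UTF-8 bytes are mistaken for cp1257 text,
# widened to UTF-16, and then re-encoded as UTF-8 for the Cygwin program.
original = "ĔĿİ"

utf8_bytes = original.encode("utf-8")        # what Emacs puts on the command line
misread = utf8_bytes.decode("cp1257")        # each byte taken as one cp1257 character
utf16_bytes = misread.encode("utf-16-le")    # intermediate wide-character form
received = misread.encode("utf-8")           # what grep ends up searching for

print("sent:    ", utf8_bytes.hex(" ").upper())
print("misread: ", misread, "(%d characters)" % len(misread))
print("received:", received.hex(" ").upper())
print("pattern still matches the file contents:", received == utf8_bytes)
```

Under these assumptions the received bytes differ from the bytes stored in file.txt, which is consistent with grep reporting no matches.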



I didn't try to imply that Cygwin was the problem.  I was suggesting
to use the Cygwin build of Emacs.  Why do you insist on using the
native w32 build, when it is obvious that the compatibility between
what it does and what Cygwin expects is marginal at best?

I was trying to imply that the Cygwin tools are mature and consistent enough for the native w32 build to work with. From that point of view there is no advantage in using the Cygwin build of Emacs instead of the native one (the Cygwin build is slower and potentially buggier).

Yes.  But it doesn't make sense to do this kind of surgery in Emacs
without benefiting from the *W APIs all over, does it?

Why? We benefit at least in the sense that both kinds of programs (native and Cygwin) will work correctly on w32, or as correctly as is possible with the current code. In other words, why do you think it is not worth tuning Emacs to work with the Cygwin system? It offers plenty of useful tools, especially if we want to work with Emacs efficiently and in the same way on different systems (*nix, w32).




