From: Laimonas Vėbra
Subject: bug#6705: w32 cmdproxy.c pass args to cygwin; erroneous charset conversion (problem description, solution/suggestion)
Date: Fri, 23 Jul 2010 15:57:46 +0300
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6

Eli Zaretskii wrote:
Date: Thu, 22 Jul 2010 23:59:09 +0300
From: Laimonas Vėbra <laimonas.vebra@gmail.com>
CC: 6705@debbugs.gnu.org

Eli Zaretskii wrote:
Sorry, I cannot understand your comments. You talk about corrupted conversion, but never add any detailed explanations, just examples. Could you please elaborate?

That was supposed to be detailed explanations through the detailed examples. It is the way it happens. I did check/investigate;

I don't doubt that you checked, I just don't understand the description of the problem. Once again, if all you want to say is that you want to invoke external programs with command-line arguments encoded in anything other than the current locale's encoding, then this will not currently work in the native Windows build. But if you are trying to say anything else, please elaborate.

Well, it will work. Passing UTF-8 arguments to native (MinGW) applications is not a problem. It won't work with Cygwin, and even that „won't work“ is not certain (it can work under some circumstances, I'd say with an improper setup). So I think I should elaborate.
None of it. Perhaps instead of going by example, just describe what encoding you used, in what Emacs command, and what was corrupted as a result.
OK, from the beginning. I'd like to grep for some UTF-8 encoded string. Choose whatever (non-ASCII) value you like, let's say 'ĔĿİ' (hex: 0x[C494, C4BF, C4B0]).
echo -e "-ĔĿİ-\n_ĔĿİ_\nELI\nĔĿİ" > file.txt

grep --version
GNU grep 2.6.3 (cygwin)

wscript.echo (GetLocale())
1063
http://www.cryer.co.uk/brian/windows/info_windows_locale_table.htm

LANG="" (that means not set, cygwin default value "C.UTF-8")

M-x grep
grep -nH -e "ĔĿİ" file.txt
Grep finished with no matches found at Fri Jul 23 13:56:22

Why? Because:
grep.c gets args "Ä”ÄæÄ°" (utf-8 string, hex: 0x[C384, E2809D, C384, C3A6, C384, C2B0]).
Why? Because the original string value 0x[C494, C4BF, C4B0] is interpreted as being in the current locale codepage (cp1257) encoding/charset:
http://msdn.microsoft.com/en-us/goglobal/cc305150.aspx
So it is read (by the Cygwin/OS API) as six single-byte characters, 0x(C4, 94, C4, BF, C4, B0), i.e. 'Ä”ÄæÄ°', which are converted to UTF-16 and then to UTF-8.
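
The byte values quoted above can be reproduced outside Emacs. The following is only a minimal Python sketch of the reinterpretation, assuming Python's built-in cp1257 codec matches the Windows code page table linked above; the function name misread_as_codepage is hypothetical, and the sketch mimics the byte-level effect rather than the actual calls made by cmdproxy.c or Cygwin.

```python
def misread_as_codepage(text, codepage="cp1257"):
    """Mimic UTF-8 bytes being treated as single-byte codepage text.

    Returns (original UTF-8 bytes, misread string, re-encoded UTF-8 bytes).
    """
    utf8_bytes = text.encode("utf-8")       # what Emacs puts on the command line
    misread = utf8_bytes.decode(codepage)   # each byte taken as one cp1257 character
    reencoded = misread.encode("utf-8")     # what the Cygwin program finally receives
    return utf8_bytes, misread, reencoded


utf8_bytes, misread, reencoded = misread_as_codepage("ĔĿİ")

if __name__ == "__main__":
    # Print hex only, so the output does not depend on the console encoding.
    print(utf8_bytes.hex())   # c494c4bfc4b0
    print(reencoded.hex())    # c384e2809dc384c3a6c384c2b0
```

The second hex string no longer matches the bytes stored in file.txt, which is consistent with grep reporting no matches.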
I didn't try to imply that Cygwin was the problem. I was suggesting to use the Cygwin build of Emacs. Why do you insist on using the native w32 build, when it is obvious that the compatibility between what it does and what Cygwin expects is marginal at best?
I tried to imply that the Cygwin tools are mature/consistent enough to work with on w32. From that point of view there is no advantage in using the Cygwin Emacs build instead of the native one (the Cygwin build is slower and potentially more buggy).
Yes. But it doesn't make sense to do this kind of surgery in Emacs without benefiting from the *W APIs all over, does it?
Why? We benefit at least in the sense that both kinds of programs (native and Cygwin) will work correctly on w32, or as correctly as is possible with the current code. In other words -- why do you think it's not worth tuning Emacs to work with the Cygwin system (plenty of useful tools, especially if we think about working efficiently and in the same way with Emacs on different systems: *nix, w32)?