Re: non-ascii chars in octal in sub-shell windows
From: Joseph Brenner
Date: Fri, 15 Jan 2010 14:08:11 -0800
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.1.90 (gnu/linux)
Peter Dyballa <Peter_Dyballa@Web.DE> writes:
> Joseph Brenner:
>> When running a program that outputs utf-8 characters such as u-umlaut,
>> in a terminal window I'll see the actual character, but in an emacs
>> sub-shell I'm seeing the octal form (which looks like: \374).
>
> No, you're not running such a programme! The LATIN SMALL LETTER U
> WITH DIAERESIS, ü, is encoded in UTF-8 as C3BC. In UTF-16 it is
> 00FC – exactly two bytes! Obviously your programme just outputs
> some ISO Latin dialect or such...
Correct.
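To make Peter's byte counts concrete, here's a small sketch using the core Encode module that prints the bytes of ü (U+00FC) in Latin-1, UTF-8 and UTF-16. Big-endian UTF-16 is assumed just to pin down the byte order, and the helper name hex_bytes is made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: the bytes of U+00FC (LATIN SMALL LETTER U WITH DIAERESIS)
# in a few encodings, using the core Encode module.
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as hex pairs, e.g. "C3 BC".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $u_umlaut = "\x{FC}";    # the character itself, not any encoding of it

for my $enc ('ISO-8859-1', 'UTF-8', 'UTF-16BE') {
    printf "%-10s %s\n", $enc, hex_bytes(encode($enc, $u_umlaut));
}

# The single Latin-1 byte is what shows up as the octal escape \374.
printf "0xFC in octal: \\%o\n", 0xFC;
```

It should print FC, C3 BC and 00 FC for the three encodings, and \374 for the octal form: one byte, so the program was emitting Latin-1, not UTF-8.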
If anyone's interested in the details of the screw-up, here's some
off-topic chatter about Perl programming:
A typical Perl test script is built on the Test::More module,
which provides functions for checks such as:
  is_deeply( $some_structure, $expected_structure,
             "Testing whether structure is as expected." );
This routine outputs different messages to STDOUT and/or STDERR
depending on whether the check passes or fails.
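For anyone who hasn't written one, here's a minimal sketch of such a *.t file; the data structures are invented for illustration. When the check passes, Test::More prints an "ok 1 - ..." line to STDOUT; when it fails, it prints "not ok 1 - ..." to STDOUT and a diagnostic dump of the differences to STDERR.

```perl
#!/usr/bin/perl
# Hypothetical example.t: a single is_deeply() check on made-up data.
use strict;
use warnings;
use Test::More;

my $some_structure     = { id => 42, tags => [ 'perl', 'emacs' ] };
my $expected_structure = { id => 42, tags => [ 'perl', 'emacs' ] };

# Passes here, so only "ok 1 - ..." is printed (to STDOUT).
is_deeply( $some_structure, $expected_structure,
           "Testing whether structure is as expected." );

done_testing();
```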
I was seeing octal junk in those output messages, even after adding
lines like these to the *.t script:
  binmode STDOUT, ':encoding(utf8)';
  binmode STDERR, ':encoding(utf8)';
Normally that would be all it takes to convince perl to output UTF-8,
but it doesn't work for Test::More's routines: Test::Builder, which
powers Test::More, duplicates STDOUT and STDERR and prints through its
own copies, so changing the layers on the originals has no effect on them.
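One way to see this for yourself, sketched below: binmode STDOUT after loading Test::More, then compare STDOUT with the handle returned by the builder's output method. The exact file descriptor numbers and layer names will vary with your perl and Test::More versions; what should hold is that they're different handles and the encoding layer only appears on STDOUT.

```perl
#!/usr/bin/perl
# Sketch: Test::More's builder prints through its own duplicated
# handles, so binmode() on STDOUT after loading Test::More doesn't
# reach the handle that the test results are written to.
use strict;
use warnings;
use Test::More;

binmode STDOUT, ':encoding(utf8)';    # changes STDOUT itself only

my $builder        = Test::More->builder;
my @stdout_layers  = PerlIO::get_layers(*STDOUT);
my @builder_layers = PerlIO::get_layers($builder->output);

# diag() output goes to the builder's failure_output (a copy of STDERR).
diag "STDOUT fd:         ", fileno(STDOUT);
diag "builder output fd: ", fileno($builder->output);
diag "STDOUT layers:     @stdout_layers";
diag "builder layers:    @builder_layers";

pass('placeholder so this is a valid test file');
done_testing();
```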
Unbeknownst to me, the Test::More documentation recommends something
more like this:
  my $builder = Test::More->builder;
  binmode $builder->output,         ":encoding(utf8)";
  binmode $builder->failure_output, ":encoding(utf8)";
  binmode $builder->todo_output,    ":encoding(utf8)";
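Putting that together, a complete test script might look like the sketch below. It also sets todo_output, which the Test::More documentation lists alongside the other two handles. The test data is made up, and ü is written as the escape \x{FC} so the script file itself stays plain ASCII.

```perl
#!/usr/bin/perl
# Sketch of a test script that can print non-ASCII text: the encoding
# layer goes on the builder's own handles, not on STDOUT/STDERR.
use strict;
use warnings;
use Test::More;

my $builder = Test::More->builder;
binmode $builder->output,         ':encoding(utf8)';   # "ok"/"not ok" lines
binmode $builder->failure_output, ':encoding(utf8)';   # failure diagnostics
binmode $builder->todo_output,    ':encoding(utf8)';   # output of TODO tests

# Made-up data containing U+00FC; the test name includes it too,
# so the "ok 1 - ..." line is printed with a UTF-8 encoded ü.
my $name = "M\x{FC}ller";
is_deeply( { name => $name }, { name => "M\x{FC}ller" },
           "Structure with $name is as expected." );

done_testing();
```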
Note that merely doing this sort of thing has no effect on the output
encoding:
  use utf8;
  use locale;
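A quick sketch of why: use utf8 only tells perl that the script's own source text is UTF-8, and use locale only affects locale-sensitive operations such as sorting; neither pushes an encoding layer onto a filehandle. Writing U+00FC to an in-memory handle with and without an :encoding(UTF-8) layer shows what actually changes the bytes. The helper names below are hypothetical.

```perl
#!/usr/bin/perl
# Sketch: the pragmas below leave filehandles alone; only an explicit
# encoding layer changes what bytes a character is written as.
use strict;
use warnings;
use utf8;     # affects how this source file is parsed
use locale;   # affects locale-sensitive operations

# Hypothetical helper: true if a handle carries an encoding/utf8 layer.
sub has_utf8_layer {
    my ($fh) = @_;
    return scalar grep { /^(?:encoding\(|utf8$)/ } PerlIO::get_layers($fh);
}

# Hypothetical helper: write U+00FC through the given layer to an
# in-memory handle and return the resulting bytes as dotted hex.
sub bytes_written {
    my ($layer) = @_;
    open my $fh, ">$layer", \my $buf or die "open: $!";
    print {$fh} "\x{FC}";
    close $fh;
    return sprintf '%vX', $buf;
}

print "STDOUT has a UTF-8 layer: ", (has_utf8_layer(*STDOUT) ? 'yes' : 'no'), "\n";
print "no layer:    ", bytes_written(''), "\n";                   # FC
print "UTF-8 layer: ", bytes_written(':encoding(UTF-8)'), "\n";   # C3.BC
```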