Re: GNU sed version 4.2.1: on OS X, C locale gets aliased to UTF-8
From: Max Horn
Subject: Re: GNU sed version 4.2.1: on OS X, C locale gets aliased to UTF-8
Date: Mon, 18 Jun 2012 16:56:27 +0200
Hi again,
it would be really, really nice to get this issue resolved, one way or another
:-). As mentioned, GNU sed (via gnulib) currently does not work correctly on
Mac OS X when, e.g., LANG=C is set. This leads to real-world errors for users,
e.g. when running git while GNU sed is installed.
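A quick way to observe the two values this thread is about is to select the C locale and print MB_CUR_MAX next to the codeset name the C library reports. The following is a minimal sketch: the function names are hypothetical, and it calls nl_langinfo (CODESET) directly rather than gnulib's locale_charset(), so it shows what the C library reports, not the charset name gnulib derives from it.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: select the "C" locale explicitly (the locale a
   program gets from setlocale (LC_ALL, "") when LC_ALL=C is set) and
   return the resulting MB_CUR_MAX, i.e. the maximum number of bytes in
   one multibyte character.  Returns 0 if the locale switch fails.  */
size_t c_locale_mb_cur_max (void)
{
  if (setlocale (LC_ALL, "C") == NULL)
    return 0;
  return MB_CUR_MAX;
}

/* Hypothetical helper: print the two values that must agree with each
   other, the bytes-per-character limit and the reported codeset name.  */
void report_c_locale (void)
{
  size_t max = c_locale_mb_cur_max ();
  printf ("MB_CUR_MAX = %zu\n", max);
  printf ("nl_langinfo (CODESET) = \"%s\"\n", nl_langinfo (CODESET));
}
```

Calling report_c_locale() from main() prints both values. Per the discussion quoted below, the trouble on OS X arises when the charset name in use says UTF-8 while MB_CUR_MAX is 1.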
On the other hand, I have so far seen no replies to my attempts to refute the
counterarguments against these patches. So should I just submit git patches
for one (or both) of them for inclusion? Or does anybody still have
reservations about this?
Cheers,
Max
On 11.06.2012 at 00:31, Max Horn wrote:
> Hi again,
>
>
> On 07.06.2012 at 14:07, Bruno Haible wrote:
>
> [...]
>
>>
>>> But this is dangerous, because now UTF-8 is set but MB_CUR_MAX is 1
>>> and various parts of sed interpret "Rémi Leblond" as an invalid
>>> character sequence for a UTF-8 character set.
>>
>> Indeed, I can see how this inconsistency leads to bugs like the ones
>> described.
>>
>> The fix could be to have two different locale_charset() functions,
>> one that returns "US-ASCII" and another that returns "UTF-8".
>> The first would be used wherever MB_CUR_MAX and mbrtowc() are used
>> as well; the second would be used by gettext(). But where to draw the
>> line between the two cases is not yet clear to me. Any insights?
>
> Hmm, that sounds quite complicated -- could you explain what this would gain
> over simply mapping "US-ASCII" to "ASCII", or over the patch Paul
> suggested:
>
>> --- a/lib/localcharset.c
>> +++ b/lib/localcharset.c
>> @@ -542,5 +542,12 @@ locale_charset (void)
>>   if (codeset[0] == '\0')
>>     codeset = "ASCII";
>>
>> +#ifdef DARWIN7
>> +  /* MacOS X sets MB_CUR_MAX to 1 when LC_ALL=C, and "UTF-8"
>> +     (the default codeset) does not work when MB_CUR_MAX is 1. */
>> +  if (strcmp (codeset, "UTF-8") == 0 && MB_CUR_MAX <= 1)
>> +    codeset = "ASCII";
>> +#endif
>> +
>>   return codeset;
>>  }
>
>
> Cheers,
> Max
>
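The effect of Paul's patch can be sketched as a pure function and checked without an OS X machine. This is a minimal sketch under stated assumptions: adjust_codeset and its mb_cur_max_value parameter are hypothetical names, MB_CUR_MAX is passed in instead of being read from the current locale so the logic can be tested anywhere, and the #ifdef DARWIN7 guard from the patch is dropped.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-alone version of the tail of locale_charset() with
   the proposed adjustment applied.  CODESET is the charset name found so
   far; MB_CUR_MAX_VALUE stands in for the current MB_CUR_MAX.  */
const char *
adjust_codeset (const char *codeset, size_t mb_cur_max_value)
{
  /* Existing behavior (context lines of the patch): never return an
     empty string.  */
  if (codeset[0] == '\0')
    codeset = "ASCII";

  /* Proposed change: "UTF-8" is inconsistent with single-byte
     characters, so report "ASCII" when MB_CUR_MAX is 1.  */
  if (strcmp (codeset, "UTF-8") == 0 && mb_cur_max_value <= 1)
    codeset = "ASCII";

  return codeset;
}
```

Under this logic, a C locale with MB_CUR_MAX == 1 would report "ASCII", while a genuine UTF-8 locale (MB_CUR_MAX > 1) would still report "UTF-8", and other charset names pass through unchanged.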