[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Fix libunistring in MS-Windows locales
From: |
Eli Zaretskii |
Subject: |
Re: Fix libunistring in MS-Windows locales |
Date: |
Tue, 15 Jul 2014 18:53:39 +0300 |
Ping! Almost 2 weeks went by with no response.
TIA
> Date: Thu, 03 Jul 2014 18:23:32 +0300
> From: Eli Zaretskii <address@hidden>
>
> I submitted these changes to the libunistring mailing list, but was
> advised to send them here, as libunistring appears to be a collection
> of Gnulib modules. So here I am, repeating what I sent to
> libunistring list. (The patch below is relative to the latest Gnulib
> master.)
>
> Libunistring on MS-Windows is currently less useful than it could be.
> In a nutshell, it supports only the default system locale; setting the
> locale to anything else using 'setlocale' will cause many libunistring
> functions to fail.
>
> One reason is that libunistring needs to know the codeset of the
> locale, to use it when it converts strings to and from Unicode using
> iconv. However, on Windows the function locale_charset always returns
> the default ANSI codepage, using the GetACP API. That API always
> returns the same codepage regardless of locale changes. The result is
> that when libunistring is passed a string in any encoding incompatible
> with the default system codepage, any calls to libiconv will fail with
> EILSEQ. And since most libunistring functions call libiconv, they all
> fail.
>
> The second problem that gets in the way is the one in gl_locale_name.
> On MS-Windows, this function returns only the default locale (by
> calling GetThreadLocale, which is again oblivious to any locale
> changes made by 'setlocale'). So again, only the default locale can
> be supported correctly. E.g., functions that need to apply
> language-specific rules, like for Turkish, will work incorrectly
> because the language they get from gl_locale_name is not Turkish, even
> though 'setlocale' was previously called to set the Turkish locale.
>
> To fix these problems, I propose the following changes. They have
> been extensively tested using Guile's test suite, in particular the
> i18n.test there.
>
> I hope these changes will be accepted, because as I said, without them
> libunistring is much less useful on MS-Windows than it could be.
>
> Thanks.
>
> 2014-07-02 Eli Zaretskii <address@hidden>
>
> Improve support for non-default locales on MS-Windows.
> * lib/localcharset.c (locale_charset) [WINDOWS_NATIVE]: Before
> falling back on the default system codepage, try extracting
> the codepage from what 'setlocale' returns. This allows to
> take into account changes of the codeset due to non-default
> locale set by a previous call to 'setlocale'.
>
> * lib/localename.c (LOCALE_NAME_MAX_LENGTH) [WINDOWS_NATIVE]:
> Define if not already defined.
> (enum_locales_fn, get_lcid) [WINDOWS_NATIVE]: New functions.
> (gl_locale_name_thread) [WINDOWS_NATIVE]: Produce the
> current locale by calling 'setlocale', then converting the
> locale name into LCID by calling 'get_lcid'. This allows to
> take into account changes in the current locale from the
> default one, in contrast to GetThreadLocale.
>
> --- lib/localcharset.c~0 2014-02-09 09:41:26 +0200
> +++ lib/localcharset.c 2014-07-03 18:17:44 +0300
> @@ -34,6 +34,7 @@
>
> #if defined _WIN32 || defined __WIN32__
> # define WINDOWS_NATIVE
> +# include <locale.h>
> #endif
>
> #if defined __EMX__
> @@ -461,14 +462,34 @@ locale_charset (void)
>
> static char buf[2 + 10 + 1];
>
> - /* The Windows API has a function returning the locale's codepage as a
> - number: GetACP().
> - When the output goes to a console window, it needs to be provided in
> - GetOEMCP() encoding if the console is using a raster font, or in
> - GetConsoleOutputCP() encoding if it is using a TrueType font.
> - But in GUI programs and for output sent to files and pipes, GetACP()
> - encoding is the best bet. */
> - sprintf (buf, "CP%u", GetACP ());
> + /* The Windows API has a function returning the locale's codepage as
> + a number, but the value doesn't change according to what the
> + 'setlocale' call specified. So we use it as a last resort, in
> + case the string returned by 'setlocale' doesn't specify the
> + codepage. */
> + char *current_locale = setlocale (LC_ALL, NULL);
> + char *pdot;
> +
> + /* If they set different locales for different categories,
> + 'setlocale' will return a semi-colon separated list of locale
> + values. To make sure we use the correct one, we choose LC_CTYPE. */
> + if (strchr (current_locale, ';'))
> + current_locale = setlocale (LC_CTYPE, NULL);
> +
> + pdot = strrchr (current_locale, '.');
> + if (pdot)
> + sprintf (buf, "CP%s", pdot + 1);
> + else
> + {
> + /* The Windows API has a function returning the locale's codepage as a
> + number: GetACP().
> + When the output goes to a console window, it needs to be provided in
> + GetOEMCP() encoding if the console is using a raster font, or in
> + GetConsoleOutputCP() encoding if it is using a TrueType font.
> + But in GUI programs and for output sent to files and pipes, GetACP()
> + encoding is the best bet. */
> + sprintf (buf, "CP%u", GetACP ());
> + }
> codeset = buf;
>
> #elif defined OS2
>
>
> --- lib/localename.c~0 2014-02-09 09:41:26 +0200
> +++ lib/localename.c 2014-07-03 18:11:45 +0300
> @@ -60,6 +60,7 @@
> #if defined WINDOWS_NATIVE || defined __CYGWIN__ /* Native Windows or Cygwin
> */
> # define WIN32_LEAN_AND_MEAN
> # include <windows.h>
> +# include <winnls.h>
> /* List of language codes, sorted by value:
> 0x01 LANG_ARABIC
> 0x02 LANG_BULGARIAN
> @@ -1124,6 +1125,9 @@
> # ifndef LOCALE_SNAME
> # define LOCALE_SNAME 0x5c
> # endif
> +# ifndef LOCALE_NAME_MAX_LENGTH
> +# define LOCALE_NAME_MAX_LENGTH 85
> +# endif
> #endif
>
>
> @@ -2502,7 +2506,69 @@ gl_locale_name_from_win32_LCID (LCID lci
> return gl_locale_name_from_win32_LANGID (langid);
> }
>
> -#endif
> +#ifdef WINDOWS_NATIVE
> +
> +/* Two variables to interface between get_lcid and the EnumLocales
> + callback function below. */
> +static LCID found_lcid;
> +static char lname[LC_MAX * (LOCALE_NAME_MAX_LENGTH + 1) + 1];
> +
> +/* Callback function for EnumLocales. */
> +static BOOL CALLBACK
> +enum_locales_fn (LPTSTR locale_num_str)
> +{
> + char *endp;
> + char locval[2 * LOCALE_NAME_MAX_LENGTH + 1 + 1];
> + LCID try_lcid = strtoul (locale_num_str, &endp, 16);
> +
> + if (GetLocaleInfo (try_lcid, LOCALE_SENGLANGUAGE,
> + locval, LOCALE_NAME_MAX_LENGTH))
> + {
> + strcat (locval, "_");
> + if (GetLocaleInfo (try_lcid, LOCALE_SENGCOUNTRY,
> + locval + strlen (locval), LOCALE_NAME_MAX_LENGTH))
> + {
> + size_t locval_len = strlen (locval);
> +
> + if (strncmp (locval, lname, locval_len) == 0
> + && (lname[locval_len] == '.'
> + || lname[locval_len] == '\0'))
> + {
> + found_lcid = try_lcid;
> + return FALSE;
> + }
> + }
> + }
> + return TRUE;
> +}
> +
> +/* Return the Locale ID (LCID) number given the locale's name, a
> + string, in LOCALE_NAME. This works by enumerating all the locales
> + supported by the system, until we find one whose name matches
> + LOCALE_NAME. */
> +static LCID
> +get_lcid (const char *locale_name)
> +{
> + /* A simple cache. */
> + static LCID last_lcid;
> + static char last_locale[1000];
> +
> + if (last_lcid > 0 && strcmp (locale_name, last_locale) == 0)
> + return last_lcid;
> + strncpy (lname, locale_name, sizeof (lname) - 1);
> + lname[sizeof (lname) - 1] = '\0';
> + found_lcid = 0;
> + EnumSystemLocales (enum_locales_fn, LCID_SUPPORTED);
> + if (found_lcid > 0)
> + {
> + last_lcid = found_lcid;
> + strcpy (last_locale, locale_name);
> + }
> + return found_lcid;
> +}
> +
> +#endif /* WINDOWS_NATIVE */
> +#endif /* WINDOWS_NATIVE || __CYGWIN__ */
>
>
> #if HAVE_USELOCALE /* glibc or Mac OS X */
> @@ -2660,6 +2726,26 @@ gl_locale_name_thread (int category, con
> const char *name = gl_locale_name_thread_unsafe (category, categoryname);
> if (name != NULL)
> return struniq (name);
> +#elif defined WINDOWS_NATIVE
> + if (LC_MIN <= category && category <= LC_MAX)
> + {
> + char *locname = setlocale (category, NULL);
> +
> + /* If CATEGORY is LC_ALL, the result might be a semi-colon
> + separated list of locales. We need only one, so we take the
> + one corresponding to LC_CTYPE, as the most important for
> + character translations. */
> + if (strchr (locname, ';'))
> + locname = setlocale (LC_CTYPE, NULL);
> +
> + /* Convert locale name to LCID. We don't want to use
> + LocaleNameToLCID because (a) it is only available since Vista,
> + and (b) it doesn't accept locale names returned by 'setlocale'. */
> + LCID lcid = get_lcid (locname);
> +
> + if (lcid > 0)
> + return gl_locale_name_from_win32_LCID (lcid);
> + }
> #endif
> return NULL;
> }
>
>
Re: Fix libunistring in MS-Windows locales,
Eli Zaretskii <=
- Re: Fix libunistring in MS-Windows locales, Paul Eggert, 2014/07/15
- Re: Fix libunistring in MS-Windows locales, Eli Zaretskii, 2014/07/15
- Re: Fix libunistring in MS-Windows locales, Daiki Ueno, 2014/07/16
- Re: Fix libunistring in MS-Windows locales, Eli Zaretskii, 2014/07/16
- Re: Fix libunistring in MS-Windows locales, Daiki Ueno, 2014/07/17
- Re: Fix libunistring in MS-Windows locales, Eli Zaretskii, 2014/07/17
- Re: Fix libunistring in MS-Windows locales, Paul Eggert, 2014/07/17
- Re: Fix libunistring in MS-Windows locales, Daiki Ueno, 2014/07/17
- Re: Fix libunistring in MS-Windows locales, Daiki Ueno, 2014/07/18
- Re: Fix libunistring in MS-Windows locales, Eli Zaretskii, 2014/07/18