qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2] vl: set LC_CTYPE early in main() for all cod


From: Markus Armbruster
Subject: Re: [Qemu-devel] [PATCH v2] vl: set LC_CTYPE early in main() for all code
Date: Tue, 16 Apr 2019 18:01:46 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)

Daniel P. Berrangé <address@hidden> writes:

> On Tue, Apr 16, 2019 at 09:49:09AM +0200, Markus Armbruster wrote:
>> Daniel P. Berrangé <address@hidden> writes:
>> 
>> > Localization is not a feature whose impact is limited to the UI
>> > frontends. Other parts of QEMU rely in localization.

Typo: rely on

>> Typically by accident :)
>> 
>> >                                                      In particular the
>> > USB MTP driver needs to be able to convert filenames from the locale
>> > specified character set into UTF-16 / UCS-2 encoded wide characters.
>> 
>> Oookay.  I guess that makes some sense.
>> 
>> > setlocale() is only set from two of the UI frontends though, and worse,
>> > there is inconsistent behaviour with GTK setting LC_CTYPE to C.UTF-8,
>> > while ncurses honours whatever is set in the user's environment.
>> >
>> > This causes the USP MTP driver to behave differently depending on which
>> > UI frontend is activated.
>> >
>> > Furthermore, the curses settings are dangerous because LC_CTYPE will affect
>> > the is{upper,lower,alnum} functions which much QEMU code assumes to have
>> > C locale sorting behaviour. This also breaks QMP if the env requests a
>> > non-UTF-8 locale, since QMP is defined to use UTF-8 encoding for JSON.
>> 
>> Can you elaborate on how QMP breaks with a non-UTF-8 locale?  Ah, you do
>> in the comment you add to vl.c.
>> 
>> > This problematic curses code was introduced in:
>> >
>> >   commit 2f8b7cd587558944532f587abb5203ce54badba9
>> >   Author: Samuel Thibault <address@hidden>
>> >   Date:   Mon Mar 11 14:51:27 2019 +0100
>> >
>> >     curses: add option to specify VGA font encoding
>> >
>> > This patch moves the GTK frontend setlocale() handling into the main()
>> > method. This ensures QMP and other QEMU code has a predictable C.UTF-8.
>> >
>> > Eventually QEMU should set LC_ALL, honouring the full user environment,
>> > but this needs various cleanups in QEMU code first.
>> 
>> Eventually, we'll all be dead.
>> 
>> >                                                     Hardcoding LC_CTYPE
>> > to C.UTF-8 is a partial regression vs the above curses commit, since it
>> > will break the curses wide character handling for non-UTF-8 locales but
>> > this is unavoidable until QEMU is cleaned up to cope with non-UTF-8
>> > locales fully.
>> >
>> > Setting of LC_MESSAGES is left in the GTK code since only the GTK
>> > frontend is using translation of strings. This lets us avoid the
>> > platform portability problem where LC_MESSAGES is not provided by
>> > locale.h on MinGW. GTK pulls it in indirectly from libintl.h via
>> > gi18n.h header, but we don't want to pull that into the global
>> > QEMU namespace.
>> >
>> > Signed-off-by: Daniel P. Berrangé <address@hidden>
>> > ---
>> >
>> > Changed in v2:
>> >
>> >  - Leave LC_MESSAGES setting in gtk code to avoid platform
>> >    portability problems.
>> >
>> >  ui/curses.c |  2 --
>> >  ui/gtk.c    | 32 +++++++++++---------------------
>> >  vl.c        | 40 ++++++++++++++++++++++++++++++++++++++++
>> >  3 files changed, 51 insertions(+), 23 deletions(-)
>> >
>> > diff --git a/ui/curses.c b/ui/curses.c
>> > index cc6d6da684..403cd57913 100644
>> > --- a/ui/curses.c
>> > +++ b/ui/curses.c
>> > @@ -27,7 +27,6 @@
>> >  #include <sys/ioctl.h>
>> >  #include <termios.h>
>> >  #endif
>> > -#include <locale.h>
>> >  #include <wchar.h>
>> >  #include <langinfo.h>
>> >  #include <iconv.h>
>> > @@ -716,7 +715,6 @@ static void curses_display_init(DisplayState *ds, 
>> > DisplayOptions *opts)
>> >      }
>> >  #endif
>> >  
>> > -    setlocale(LC_CTYPE, "");
>> >      if (opts->u.curses.charset) {
>> >          font_charset = opts->u.curses.charset;
>> >      }
>> > diff --git a/ui/gtk.c b/ui/gtk.c
>> > index e96e15435a..e6a41c79ab 100644
>> > --- a/ui/gtk.c
>> > +++ b/ui/gtk.c
>> > @@ -2208,12 +2208,12 @@ static void gtk_display_init(DisplayState *ds, 
>> > DisplayOptions *opts)
>> >  
>> >      s->free_scale = FALSE;
>> >  
>> > -    /* Mostly LC_MESSAGES only. See early_gtk_display_init() for details. 
>> > For
>> > -     * LC_CTYPE, we need to make sure that non-ASCII characters are 
>> > considered
>> > -     * printable, but without changing any of the character classes to 
>> > make
>> > -     * sure that we don't accidentally break implicit assumptions.  */
>> > +    /*
>> > +     * See comment in main() for why it has only setup LC_CTYPE,
>> > +     * as opposed to LC_ALL. Given that we need to enable
>> > +     * LC_MESSAGES too for menu translations.
>> > +     */
>> >      setlocale(LC_MESSAGES, "");
>> > -    setlocale(LC_CTYPE, "C.UTF-8");
>> >      bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
>> >      textdomain("qemu");
>> >  
>> > @@ -2262,22 +2262,12 @@ static void gtk_display_init(DisplayState *ds, 
>> > DisplayOptions *opts)
>> >  
>> >  static void early_gtk_display_init(DisplayOptions *opts)
>> >  {
>> > -    /* The QEMU code relies on the assumption that it's always run in
>> > -     * the C locale. Therefore it is not prepared to deal with
>> > -     * operations that produce different results depending on the
>> > -     * locale, such as printf's formatting of decimal numbers, and
>> > -     * possibly others.
>> > -     *
>> > -     * Since GTK+ calls setlocale() by default -importing the locale
>> > -     * settings from the environment- we must prevent it from doing so
>> > -     * using gtk_disable_setlocale().
>> > -     *
>> > -     * QEMU's GTK+ UI, however, _does_ have translations for some of
>> > -     * the menu items. As a trade-off between a functionally correct
>> > -     * QEMU and a fully internationalized UI we support importing
>> > -     * LC_MESSAGES from the environment (see the setlocale() call
>> > -     * earlier in this file). This allows us to display translated
>> > -     * messages leaving everything else untouched.
>> > +    /*
>> > +     * GTK calls setlocale() by default, importing the locale
>> > +     * settings from the environment. QEMU's main() method will
>> > +     * have set LC_MESSAGES and LC_CTYPE to allow GTK to display
>> > +     * translated messages, including use of wide characters. We
>> > +     * must not allow GTK to change other aspects of the locale.
>> >       */
>> >      gtk_disable_setlocale();
>> >      gtkinit = gtk_init_check(NULL, NULL);
>> > diff --git a/vl.c b/vl.c
>> > index c696ad2a13..87a55cddb3 100644
>> > --- a/vl.c
>> > +++ b/vl.c
>> > @@ -49,6 +49,7 @@ int main(int argc, char **argv)
>> >  #define main qemu_main
>> >  #endif /* CONFIG_COCOA */
>> >  
>> > +#include <locale.h>
>> >  
>> >  #include "qemu/error-report.h"
>> >  #include "qemu/sockets.h"
>> > @@ -3022,6 +3023,45 @@ int main(int argc, char **argv, char **envp)
>> >      char *dir, **dirs;
>> >      BlockdevOptionsQueue bdo_queue = QSIMPLEQ_HEAD_INITIALIZER(bdo_queue);
>> >  
>> > +    /*
>> > +     * Ideally we would set LC_ALL, but QEMU currently isn't able to cope
>> > +     * with arbitrary localization settings. In particular there are two
>> > +     * known problems
>> > +     *
>> > +     *   - The QMP monitor needs to use the C locale rules for numeric
>> > +     *     formatting. This would need a double/int -> string formatter
>> > +     *     that is locale independant.
>> 
>> Aha.
>> 
>> Reminds me of the story of a FORTRAN compiler that rejected only
>> far-away customers' programs...
>> 
>> > +     *
>> > +     *   - The QMP monitor needs to encode all data as UTF-8. This needs
>> > +     *     to be updated to use iconv(3) to explicitly convert the current
>> > +     *     locale's charset into utf-8
>> 
>> Really?  What locale-dependent text gets sent to QMP?
>
> The main thing I can see would be filenames.
>
> Though having said it is UTF-8 on looking more closely I think QEMU is
> probably 8-bit clean in its handling, so will just be blindly passing
> whatever filename string it get from libvirt straight on to the kernel
> with no interpretation.

Sounds good to me.

> Libvirt has enabled UTF-8 validation in its JSON library when encoding
> data it sends to QEMU, so any data libvirt is sending will be a valid
> UTF-8 byte sequence at least. Libvirt doesn't axctually do any charset
> conversion though, so if libvirt runs in a non-UTF8 locale it will
> likely trip over this UTF-8 validation.

QMP input must be encoded in UTF-8.  Converting from other encodings to
UTF-8 is the QMP client's problem.

The more interesting direction is the one I inquired about: QMP output.
If locale-dependent text gets sent to QMP, converting it to UTF-8 is
QEMU's problem.

On closer look, anything but JSON string contents is plain ASCII by
construction.  JSON string contents gets assembled in to_json() case
QTYPE_QSTRING.  It expects QString to use UTF-8[*].  You can have any
locale as long as it uses ASCII or UTF-8.

>> > +     *
>> > +     *   - Lots of codes uses is{upper,lower,alnum,...} functions, 
>> > expecting
>> > +     *     C locale sorting behaviour. Most QEMU usage should likely be
>> > +     *     changed to g_ascii_is{upper,lower,alnum...} to match code
>> > +     *     assumptions, without being broken by locale settnigs.
>> > +     *
>> > +     * We do still have two requirements
>> > +     *
>> > +     *   - Ability to correct display translated text according to the
>> > +     *     user's locale
>> > +     *
>> > +     *   - Ability to handle multibyte characters, ideally according to
>> > +     *     user's locale specified character set. This affects ability
>> > +     *     of usb-mtp to correctly convert filenames to UCS16 and curses
>> > +     *     & GTK frontends wide character display.
>> > +     *
>> > +     * The second requirement would need LC_CTYPE to be honoured, but
>> > +     * this conflicts with the 2nd & 3rd problems listed earlier. For
>> > +     * now we make a tradeoff, trying to set an explicit UTF-8 localee
>> > +     *
>> > +     * Note we can't set LC_MESSAGES here, since mingw doesn't define
>> > +     * this constant in locale.h Fortunately we only need it for the
>> > +     * GTK frontend and that uses gi18n.h which pulls in a definition
>> > +     * of LC_MESSAGES.
>> > +     */
>> > +    setlocale(LC_CTYPE, "C.UTF-8");
>> > +
>> >      module_call_init(MODULE_INIT_TRACE);
>> >  
>> >      qemu_init_cpu_list();
>> 
>> We should've stayed out of the GUI business.
>
> This isn't only a GUI problem as above, it affects USB MTP.

I believe setlocale() in QEMU is basically wrong.  Finding all the
places that rely on the current locale when they shouldn't and
converting them to locale-independent alternatives is a huge amount of
work.  Even if we managed to complete it, it wouldn't stay complete.

Instead, find the places that have reason to use the locale, and fix
them to uselocale().



[*] Actually modified UTF-8, but I don't think we rely on the "modified"
part.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]