qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v2] vl: set LC_CTYPE early in main() for all cod


From: Daniel P . Berrangé
Subject: Re: [Qemu-devel] [PATCH v2] vl: set LC_CTYPE early in main() for all code
Date: Tue, 16 Apr 2019 10:03:58 +0100
User-agent: Mutt/1.11.3 (2019-02-01)

On Tue, Apr 16, 2019 at 09:49:09AM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé <address@hidden> writes:
> 
> > Localization is not a feature whose impact is limited to the UI
> > frontends. Other parts of QEMU rely in localization.
> 
> Typically by accident :)
> 
> >                                                      In particular the
> > USB MTP driver needs to be able to convert filenames from the locale
> > specified character set into UTF-16 / UCS-2 encoded wide characters.
> 
> Oookay.  I guess that makes some sense.
> 
> > setlocale() is only set from two of the UI frontends though, and worse,
> > there is inconsistent behaviour with GTK setting LC_CTYPE to C.UTF-8,
> > while ncurses honours whatever is set in the user's environment.
> >
> > This causes the USP MTP driver to behave differently depending on which
> > UI frontend is activated.
> >
> > Furthermore, the curses settings are dangerous because LC_CTYPE will affect
> > the is{upper,lower,alnum} functions which much QEMU code assumes to have
> > C locale sorting behaviour. This also breaks QMP if the env requests a
> > non-UTF-8 locale, since QMP is defined to use UTF-8 encoding for JSON.
> 
> Can you elaborate on how QMP breaks with a non-UTF-8 locale?  Ah, you do
> in the comment you add to vl.c.
> 
> > This problematic curses code was introduced in:
> >
> >   commit 2f8b7cd587558944532f587abb5203ce54badba9
> >   Author: Samuel Thibault <address@hidden>
> >   Date:   Mon Mar 11 14:51:27 2019 +0100
> >
> >     curses: add option to specify VGA font encoding
> >
> > This patch moves the GTK frontend setlocale() handling into the main()
> > method. This ensures QMP and other QEMU code has a predictable C.UTF-8.
> >
> > Eventually QEMU should set LC_ALL, honouring the full user environment,
> > but this needs various cleanups in QEMU code first.
> 
> Eventually, we'll all be dead.
> 
> >                                                     Hardcoding LC_CTYPE
> > to C.UTF-8 is a partial regression vs the above curses commit, since it
> > will break the curses wide character handling for non-UTF-8 locales but
> > this is unavoidable until QEMU is cleaned up to cope with non-UTF-8
> > locales fully.
> >
> > Setting of LC_MESSAGES is left in the GTK code since only the GTK
> > frontend is using translation of strings. This lets us avoid the
> > platform portability problem where LC_MESSAGES is not provided by
> > locale.h on MinGW. GTK pulls it in indirectly from libintl.h via
> > gi18n.h header, but we don't want to pull that into the global
> > QEMU namespace.
> >
> > Signed-off-by: Daniel P. Berrangé <address@hidden>
> > ---
> >
> > Changed in v2:
> >
> >  - Leave LC_MESSAGES setting in gtk code to avoid platform
> >    portability problems.
> >
> >  ui/curses.c |  2 --
> >  ui/gtk.c    | 32 +++++++++++---------------------
> >  vl.c        | 40 ++++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 51 insertions(+), 23 deletions(-)
> >
> > diff --git a/ui/curses.c b/ui/curses.c
> > index cc6d6da684..403cd57913 100644
> > --- a/ui/curses.c
> > +++ b/ui/curses.c
> > @@ -27,7 +27,6 @@
> >  #include <sys/ioctl.h>
> >  #include <termios.h>
> >  #endif
> > -#include <locale.h>
> >  #include <wchar.h>
> >  #include <langinfo.h>
> >  #include <iconv.h>
> > @@ -716,7 +715,6 @@ static void curses_display_init(DisplayState *ds, 
> > DisplayOptions *opts)
> >      }
> >  #endif
> >  
> > -    setlocale(LC_CTYPE, "");
> >      if (opts->u.curses.charset) {
> >          font_charset = opts->u.curses.charset;
> >      }
> > diff --git a/ui/gtk.c b/ui/gtk.c
> > index e96e15435a..e6a41c79ab 100644
> > --- a/ui/gtk.c
> > +++ b/ui/gtk.c
> > @@ -2208,12 +2208,12 @@ static void gtk_display_init(DisplayState *ds, 
> > DisplayOptions *opts)
> >  
> >      s->free_scale = FALSE;
> >  
> > -    /* Mostly LC_MESSAGES only. See early_gtk_display_init() for details. 
> > For
> > -     * LC_CTYPE, we need to make sure that non-ASCII characters are 
> > considered
> > -     * printable, but without changing any of the character classes to make
> > -     * sure that we don't accidentally break implicit assumptions.  */
> > +    /*
> > +     * See comment in main() for why it has only setup LC_CTYPE,
> > +     * as opposed to LC_ALL. Given that we need to enable
> > +     * LC_MESSAGES too for menu translations.
> > +     */
> >      setlocale(LC_MESSAGES, "");
> > -    setlocale(LC_CTYPE, "C.UTF-8");
> >      bindtextdomain("qemu", CONFIG_QEMU_LOCALEDIR);
> >      textdomain("qemu");
> >  
> > @@ -2262,22 +2262,12 @@ static void gtk_display_init(DisplayState *ds, 
> > DisplayOptions *opts)
> >  
> >  static void early_gtk_display_init(DisplayOptions *opts)
> >  {
> > -    /* The QEMU code relies on the assumption that it's always run in
> > -     * the C locale. Therefore it is not prepared to deal with
> > -     * operations that produce different results depending on the
> > -     * locale, such as printf's formatting of decimal numbers, and
> > -     * possibly others.
> > -     *
> > -     * Since GTK+ calls setlocale() by default -importing the locale
> > -     * settings from the environment- we must prevent it from doing so
> > -     * using gtk_disable_setlocale().
> > -     *
> > -     * QEMU's GTK+ UI, however, _does_ have translations for some of
> > -     * the menu items. As a trade-off between a functionally correct
> > -     * QEMU and a fully internationalized UI we support importing
> > -     * LC_MESSAGES from the environment (see the setlocale() call
> > -     * earlier in this file). This allows us to display translated
> > -     * messages leaving everything else untouched.
> > +    /*
> > +     * GTK calls setlocale() by default, importing the locale
> > +     * settings from the environment. QEMU's main() method will
> > +     * have set LC_MESSAGES and LC_CTYPE to allow GTK to display
> > +     * translated messages, including use of wide characters. We
> > +     * must not allow GTK to change other aspects of the locale.
> >       */
> >      gtk_disable_setlocale();
> >      gtkinit = gtk_init_check(NULL, NULL);
> > diff --git a/vl.c b/vl.c
> > index c696ad2a13..87a55cddb3 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -49,6 +49,7 @@ int main(int argc, char **argv)
> >  #define main qemu_main
> >  #endif /* CONFIG_COCOA */
> >  
> > +#include <locale.h>
> >  
> >  #include "qemu/error-report.h"
> >  #include "qemu/sockets.h"
> > @@ -3022,6 +3023,45 @@ int main(int argc, char **argv, char **envp)
> >      char *dir, **dirs;
> >      BlockdevOptionsQueue bdo_queue = QSIMPLEQ_HEAD_INITIALIZER(bdo_queue);
> >  
> > +    /*
> > +     * Ideally we would set LC_ALL, but QEMU currently isn't able to cope
> > +     * with arbitrary localization settings. In particular there are two
> > +     * known problems
> > +     *
> > +     *   - The QMP monitor needs to use the C locale rules for numeric
> > +     *     formatting. This would need a double/int -> string formatter
> > +     *     that is locale independant.
> 
> Aha.
> 
> Reminds me of the story of a FORTRAN compiler that rejected only
> far-away customers' programs...
> 
> > +     *
> > +     *   - The QMP monitor needs to encode all data as UTF-8. This needs
> > +     *     to be updated to use iconv(3) to explicitly convert the current
> > +     *     locale's charset into utf-8
> 
> Really?  What locale-dependent text gets sent to QMP?

The main thing I can see would be filenames.

Though having said it is UTF-8 on looking more closely I think QEMU is
probably 8-bit clean in its handling, so will just be blindly passing
whatever filename string it get from libvirt straight on to the kernel
with no interpretation.

Libvirt has enabled UTF-8 validation in its JSON library when encoding
data it sends to QEMU, so any data libvirt is sending will be a valid
UTF-8 byte sequence at least. Libvirt doesn't axctually do any charset
conversion though, so if libvirt runs in a non-UTF8 locale it will
likely trip over this UTF-8 validation.



> 
> > +     *
> > +     *   - Lots of codes uses is{upper,lower,alnum,...} functions, 
> > expecting
> > +     *     C locale sorting behaviour. Most QEMU usage should likely be
> > +     *     changed to g_ascii_is{upper,lower,alnum...} to match code
> > +     *     assumptions, without being broken by locale settnigs.
> > +     *
> > +     * We do still have two requirements
> > +     *
> > +     *   - Ability to correct display translated text according to the
> > +     *     user's locale
> > +     *
> > +     *   - Ability to handle multibyte characters, ideally according to
> > +     *     user's locale specified character set. This affects ability
> > +     *     of usb-mtp to correctly convert filenames to UCS16 and curses
> > +     *     & GTK frontends wide character display.
> > +     *
> > +     * The second requirement would need LC_CTYPE to be honoured, but
> > +     * this conflicts with the 2nd & 3rd problems listed earlier. For
> > +     * now we make a tradeoff, trying to set an explicit UTF-8 localee
> > +     *
> > +     * Note we can't set LC_MESSAGES here, since mingw doesn't define
> > +     * this constant in locale.h Fortunately we only need it for the
> > +     * GTK frontend and that uses gi18n.h which pulls in a definition
> > +     * of LC_MESSAGES.
> > +     */
> > +    setlocale(LC_CTYPE, "C.UTF-8");
> > +
> >      module_call_init(MODULE_INIT_TRACE);
> >  
> >      qemu_init_cpu_list();
> 
> We should've stayed out of the GUI business.

This isn't only a GUI problem as above, it affects USB MTP.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



reply via email to

[Prev in Thread] Current Thread [Next in Thread]