lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re[2]: [lmi] patch: compilation fix for wx 2.9


From: Vadim Zeitlin
Subject: Re[2]: [lmi] patch: compilation fix for wx 2.9
Date: Sat, 15 May 2010 14:24:35 +0200

On Sat, 15 May 2010 11:49:07 +0000 Greg Chicares <address@hidden> wrote:

GC> See the patch below[0], which is extended to two additional files.
GC> I think this is what you intend, but I don't think it does what
GC> you expect.

 I think it does what I _need_, that is to allow compilation with MSVC. And
I don't believe it introduces any bad side effects while doing this.

GC> Is this change a prerequisite for other GUI enhancements that depend
GC> on updating wx? Even with gcc?

 No. This is a work-around for MSVC-specific standard library extension.
Normally std::ofstream only accepts char* file names (as does
std::wofstream) but MSVC extends both of them to accept wide character file
names as well by providing a non-standard ctor overload. Before we start
bashing Microsoft though, notice that this is a reasonable thing to do
under Windows where the file names are encoded using UTF-16 at OS level.
Unix systems can get away with using only char* file names because UTF-8 is
represented as char* in C (whether this is a good thing is another story
entirely and IMO it isn't unless all strings in your program are UTF-8).
But without this extension C++ programs under Windows would be completely
unable to open files with Unicode characters in them.

 Anyhow, this extension unfortunately conflicts with wxWidgets 2.9 "smart"
wxString class which is implicitly convertible to both char* and wchar_t*.
As std::ofstream ctor exists in both overloads, there is an ambiguity when
calling it with wxString or wxCStrData (which is a similarly "smart" class
returned by wxString::c_str()).

GC> Or is it just an msvc workaround that we could conditionalize?

 We could but I don't think we should. It does no harm for the other
compilers and platforms:

1. With MinGW under Windows the result of the call to c_str() is already
   converted to char* because it doesn't have (yet?) this MS-specific
   std::ofstream ctor overload. So it does exactly the same thing as
   mb_str().

2. Under non-Windows systems with any compiler this again already does the
   same thing as mb_str() does and, moreover, there it's the right thing
   to do (assuming UTF-8 is used for the file names which is the case for
   all current systems I know of).

Moreover, mb_str() at least clearly shows that we do not support Unicode
here. Using c_str() here still doesn't but it's not obvious when reading
the code.

GC> As shown below, we seem to have multiple problems with multibyte
GC> characters, so deliberately introducing them via mb_str() where
GC> not strictly necessary doesn't seem best.

 These problems are present in the code currently and should be exactly the
same without the patch.

GC> > The potential data loss I mentioned in my initial message is still
GC> > possible, of course, but we'd probably need to use wxWidgets classes to
GC> > avoid it and it would be a far less trivial change. It still seems wrong 
to
GC> > me that not only LMI currently will fail to open a file with non-ASCII
GC> > characters in name but it won't even give any proper error message for it.
GC> > Maybe we should at least correct the latter?
GC> 
GC> With the patch, I still don't see a helpful diagnostic.

 No, it doesn't implement the above suggestion. It only fixes compilation
for MSVC without changing anything at all in the run-time behaviour.

 The right thing to do would be to use wide character names for the files
under Windows. As std::ofstream doesn't support this (putting aside the
Microsoft extensions), it would mean switching to wx classes (which do
support this) from standard ones.

 Without this the only thing we can do is to give an error message if a
non-ASCII file name is used. This is not ideal, to say the least, but
would still be better than the current situation. Should I make a patch
implementing this?

GC> Furthermore, even if we got past those difficulties:
GC> 
GC>  - MinGW gcc, IIRC, doesn't support wide strings, at least not for
GC>    gcc-3.x (which is what we still use for production). Wait...
GC>    you've deliberately chosen a NTMBS and not a NTWCS, so it's okay:
GC>    it's still a narrow string. However...

 Even 4.5 still doesn't seem to have any extensions to support creating
std::fstream from a Unicode file name. So to make it really work we have no
choice but to use wx code.

GC>  - For printing, we use apache 'fop', which seems to restrict us
GC>    to a subset of the ASCII filenames that msw would accept. See the
GC>    documentation for orthodox_filename() in lmi's 'path_utility.cpp',
GC>    or click here:
GC>      
http://www.tt-solutions.com/vz/lmi/docs/path__utility_8cpp.html#a6a72c37f6e4f2b09366aa92a55d1e23a

 A very convenient link, thanks :-)

 Anyhow, I don't know about fop but I'm surprised it doesn't support
Unicode.

GC>  - lmi's problem domain really is quite USA-specific because life
GC>    insurance is governed by national regulation. I doubt it would
GC>    be useful even in Canada. No end user has ever asked us to
GC>    support non-ASCII filenames.

 I still think it's not impossible that e.g. a native Spanish-speaker might
be using LMI and want to use some non-ASCII name for the file. Of course,
it's not a vital feature of the program and I understand that implementing
it might not be a priority (which is very unusual as normally users get
pretty upset if the program doesn't allow them to use file names in their
own language -- but LMI is indeed exceptional, as you say above). But it
should at least give an understandable error message.

GC> Can we make the diagnostics say something more than
GC>   Unable to save ''.
GC> ? If not, then is there a way to constrain filenames to strings
GC> that we can at least print in diagnostics?

 AFAICS the only way for a Unicode string to find it into LMI now is via
the file selection dialog. Unfortunately LMI doesn't use this dialog itself
but gets the file names from the docview framework. So this problem could
be solved in two ways:

1. Straightforward but error-prone: find all places in the program where we
   use the file names and add checks for their ASCII-ness there. This is as
   simple as it gets but the problem could easily creep in if/when new code
   operating on file names is added.

2. Elegant but more difficult: allow virtualizing all calls to the file
   dialog and override them at LMI level to ensure that only ASCII file
   names are accepted. This is nice because it will solve the problem once
   and forever and will do it without having to intersperse the main code
   with the ugly checks (which also means that this check could be easily
   removed all at once LMI ever does decide to support Unicode filenames).
   It also has nice side benefits for testing (if the call to modal dialog
   is virtualized it becomes easy to invoke functions which use it during
   automatic tests as they can just return a predefined answer from the
   overridden version without showing the dialog interactively at all) and,
   of course, I'd be interested in doing this according to the usual
   principle of "Why spend 15 minutes on doing this manually if you can
   spend 5 months on automating it". But this will clearly take more time.

 What do you think?
VZ

reply via email to

[Prev in Thread] Current Thread [Next in Thread]