Re: SVN 1704 completely broke libedif
From: Chris Moller
Subject: Re: SVN 1704 completely broke libedif
Date: Wed, 7 Jun 2023 11:27:04 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0
Thanks for the explanation--very much appreciated.
I more or less stumbled onto 1. and am using that in every
relevant circumstance, but I'll revise to 4. (I can't generally
use 5. because some of the strings use what I assume are non-ASCII
characters like "←".)
Thanks again,
Chris
On 6/7/23 10:24, Dr. Jürgen Sauermann wrote:
Hi Chris,
wrapping arbitrary (= UTF8-encoded) strings into UTF8_string first is
the proper way to go. Consider the differences between:
1.  UCS_string yyy(UTF8_string(xxx));     // almost proper, but ambiguous (most vexing parse error)
2.  UCS_string yyy(xxx);                  // now private: so never use it
3a. UTF8_string utf(xxx);                 // really proper
3b. UCS_string yyy(utf);
4.  UCS_string yyy((UTF8_string(xxx)));   // also proper (this is 1. without the most vexing parse error)
5.  UCS_ASCII_string yyy(xxx)
If xxx is entirely ASCII then all of the above are equivalent.
Otherwise the difference is that 1. properly decodes UTF8-encoded
strings while the old 2. (which is now disabled by private:) did not,
and the compiler has no way to detect an incorrect usage of 2.
Even worse, C++ would sometimes do 2. automatically (and incorrectly)
and without notice. Probably some of the recent Tokenization Errors
reported on bug-apl were caused by this.
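To make the ASCII versus UTF8 difference concrete, here is a minimal sketch. The two helper functions are hypothetical and are NOT taken from the GNU APL sources: widen_bytes() stands in for the old behaviour of 2. (one code point per byte), and decode_utf8() stands in for what a UTF8-aware conversion does. The decoder assumes well-formed input and does no validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not GNU APL code): treat every byte as one code point.
// This is only correct if every byte is ASCII (< 0x80).
std::vector<uint32_t> widen_bytes(const std::string& s) {
    std::vector<uint32_t> out;
    for (unsigned char c : s) out.push_back(c);
    return out;
}

// Hypothetical helper (not GNU APL code): minimal UTF-8 decoder.
// Assumes well-formed input; it does not reject overlong or invalid sequences.
std::vector<uint32_t> decode_utf8(const std::string& s) {
    std::vector<uint32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        int extra = 0;          // number of continuation bytes that follow
        uint32_t cp = lead;     // ASCII case: the byte is the code point
        if      (lead >= 0xF0) { extra = 3; cp = lead & 0x07; }
        else if (lead >= 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if (lead >= 0xC0) { extra = 1; cp = lead & 0x1F; }
        ++i;
        for (int k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}
```

For a pure ASCII string such as "abc" both helpers return the same three code points. For "←" (the bytes E2 86 90) the decoder returns the single code point U+2190, while the byte-wise version returns three unrelated values, which is the kind of input a tokenizer would later choke on.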
Although 1. was throwing an assertion when used incorrectly, some
people wrapped a try {} catch {} around it, which caused the error
to slip through unnoticed (at least up to the tokenizer).
A somewhat unfortunate decision in the C++11 ff. standards was to
resolve yyy in 1. (which is ambiguous at a closer look) into a
declaration of function yyy() and not (as gcc still does) into two
constructor calls: UTF8_string(xxx), followed by UCS_string() with the
first. This problem can apparently be avoided by using 4. instead of 1.
(note the extra pair of () which is NOT redundant).
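The parse difference between 1. and 4. can be reproduced in isolation. The sketch below uses hypothetical stand-in classes that only mimic the constructor shapes discussed here; they are NOT the real GNU APL classes. As far as I know, the disambiguation rule behind the most vexing parse is older than C++11, and a conforming compiler reads form 1. as a function declaration; the std::is_function check makes that visible. Recent compilers may also print a vexing-parse warning for that line.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-ins (not the GNU APL classes); only the constructor
// shapes matter for this demonstration.
struct UTF8_string {
    std::string bytes;
    explicit UTF8_string(const char* s) : bytes(s) {}
};

struct UCS_string {
    std::size_t source_bytes;
    explicit UCS_string(const UTF8_string& u) : source_bytes(u.bytes.size()) {}
};

// Form 1: the line is parsed as the declaration of a function yyy that takes
// a parameter named xxx of type UTF8_string and returns UCS_string.
bool form1_is_function_declaration(const char* xxx) {
    (void)xxx;  // the outer xxx is never read by the declaration below
    UCS_string yyy(UTF8_string(xxx));
    return std::is_function<decltype(yyy)>::value;
}

// Form 4: the extra parentheses force an expression, so yyy is an object.
std::size_t form4_constructs_object(const char* xxx) {
    UCS_string yyy((UTF8_string(xxx)));
    static_assert(!std::is_function<decltype(yyy)>::value, "yyy is an object");
    return yyy.source_bytes;
}

// Forms 3a/3b: a named temporary avoids the ambiguity altogether.
std::size_t form3_constructs_object(const char* xxx) {
    UTF8_string utf(xxx);
    UCS_string yyy(utf);
    return yyy.source_bytes;
}
```

With form 1. any later use of yyy as an object (for example reading a member) fails to compile, which is usually how the problem gets noticed. Forms 3a/3b and 4. both yield a real UCS_string.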
Finally, 5. is a safe replacement for 2. (the comment in the .hh file
is still valid, so xxx MUST be ASCII), which should hopefully avoid the
automatic use of 2. by the compiler. It is also easier to use with grep
in order to spot the (still possible) incorrect usage of 5.
Hope this helps,
Jürgen
On 6/6/23 22:13, Chris Moller wrote:
Yeah, I saw your comment in one of the .hh files. What I did was wrap
all the edif ASCII strings in UTF8_string() calls. That works, but if
it's circumventing what you're trying to do, let me know and I'll
think of something else.
Even after a lot of years, I'm still not sure of the differences
between UTF, UCS, Unicode, etc, etc.
--cm
On 6/6/23 15:56, Dr. Jürgen Sauermann wrote:
Hi,
sorry for that. The reason for making it private is to entirely
prevent its usage.
The former implementation of it only worked for ASCII strings. There
was a note about that in the header file, but I have seen quite a few
incorrect usages of it (read: with UTF8-encoded strings) which then
caused other, difficult to find, errors later on.
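The effect of moving a constructor into the private: section can be checked at compile time. This is a minimal sketch with hypothetical stand-in classes (NOT the actual GNU APL header): once the const char * constructor is private, outside code can neither call it explicitly nor trigger it through an implicit conversion, while construction from UTF8_string keeps working.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical stand-ins (not the GNU APL classes).
struct UTF8_string {
    std::string bytes;
    explicit UTF8_string(const char* s) : bytes(s) {}
};

class UCS_string {
public:
    explicit UCS_string(const UTF8_string& u) : n(u.bytes.size()) {}
    std::size_t size() const { return n; }
private:
    UCS_string(const char*);  // declared private and never defined
    std::size_t n;
};

// True if code outside the class cannot build a UCS_string from const char *,
// neither by an explicit constructor call nor by an implicit conversion.
bool char_ptr_construction_is_blocked() {
    return !std::is_constructible<UCS_string, const char*>::value
        && !std::is_convertible<const char*, UCS_string>::value;
}

// Construction through UTF8_string remains available.
std::size_t size_via_utf8(const char* xxx) {
    UTF8_string utf(xxx);
    UCS_string ucs(utf);
    return ucs.size();
}
```

Existing callers that relied on the old constructor therefore stop compiling instead of silently producing wrong strings, which matches the compile errors reported for edif.cc.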
Best Regards,
Jürgen
On 6/6/23 17:31, Chris Moller wrote:
Hi, Xtian,
Just pushed a fix for edif if you want to give it a try.
Works for me on SVN 1706 and yesterday's SVN 1708.
--cm
On 6/5/23 03:33, Christian Robert wrote:
SVN 1704 completely broke libedif
Juergen made UCS_string (const char *) a private member of the class,
so a lot of compile errors in edif.cc ...
Not sure if this can be fixed. I reverted to SVN 1702 meanwhile.
There is no way I'll revert to the "DEL Editor" !
Xtian.