[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
bug#24206: 25.1; Curly quotes generate invalid strings, leading to a seg
From: |
Paul Eggert |
Subject: |
bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault |
Date: |
Thu, 18 Aug 2016 11:33:36 -0700 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0 |
Eli Zaretskii wrote:
The code that got removed was the easy and intuitive part: it dealt
with processing single-byte strings one byte at a time. The
hard-to-read part of the code is still with us. We have less 'if'
conditionals, but that's hardly the main complication in the original
code.
Sure, but removing unnecessary easy stuff lets the reader see the hard stuff
more clearly.
You are missing my point: the code on master now processes a string,
that could be either unibyte or multibyte, using only multibyte
methods. With the flag in place, each kind of string would have used
the method that's natural with it. The way things are now, one has to
think hard about what the code does to convince oneself it's valid.
The way things were before it was even harder, because one had to worry not only
about processing Emacs-encoded text, one also had to worry about processing
unibyte text containing non-ASCII bytes. The code is simpler now, because it
needs only to process Emacs-encoded text.
The old code might have flown despite its problems, if all the input data were
consistent (i.e., either all unibyte, or all with Emacs-encoded text). But
inputs need not be consistent, so the old approach simply did not work.
As I take it, your principal objection to the new code is not to its internals:
it's that substitute-command-keys can now return a multibyte string even when
all the input data is unibyte. I don't think that's a big deal, but if this is
the primary reason for our lengthy conversation, I can move things forward by
changing the code so that it instead returns a unibyte string when all the input
data are unibyte. Would that suffice?
I don't see why it is tricky, we do that in Emacs in other places.
Really? A call to STRING_CHAR_AND_LENGTH followed by a length test followed by a
call to memcpy for length > 1 and a special case inline copy for length == 1?
When copying multibyte data? Where else does Emacs do that?
What exactly confuses you in that snippet?
Nothing confuses me in that snippet. I know what the snippet does, now that I've
read and understood it. It is a longwinded and unnecessarily tricky way of doing
something simple.
The call to
STRING_CHAR_AND_LENGTH itself? we have that in umpteen other places.
The single-byte optimization of not calling memcpy? That's standard
practice in C.
I'm not talking about each individual line of code in that snippet. I am talking
about the entire construction. A reader must look at all 14 lines to deduce what
it does, deduce that it's unnecessarily complicated, and deduce that the
unnecessary complication is not a sign that something unusual is going on. No
place else in Emacs has this construction.
If you need an example for using
STRING_CHAR_AND_LENGTH while copying text, you can find it in
copy_text, for example.
copy_text does something quite different. When copying multibyte text, it does a
single memcpy for the entire string. copy_text does not call memcpy for each
multibyte character, and nothing in copy_text is particularly close to the
snippet in question.
you replaced it with a
fall-through, which is harder to follow and easier to introduce subtle
bugs.
True, but the fall-through is clearly marked in comments. The new way is a bit
more efficent (smaller code space, more likely to fit in cache); the old way
duplicated character-copying code, a practice that also introduces subtle bugs
-- in addition to temporarily mystifying the reader because the old code
duplicates were not identical (itself a sign of cruft). With all that in mind,
it was reasonable to switch from the old way to the new, despite the
disadvantage you mention.
Alan wanted something that he could put into his .emacs that would cause
(message PERCENTLESS) to output the string PERCENTLESS as-is, assuming
PERCENTLESS lacks %. This was the point of his original bug report; his
original
example involved ` and ' but he wanted the same behavior for ‘ and ’, a
point
that became clear during the discussion of Bug#23425.
Then why not for '..' as well? How is that different from ‘..’?
It's not different. Alan wanted the same behavior for '..', and he got that too.
But the behavior is not the same:
I was referring to Alan's desire to treat all quotes the same (i.e., to not
substitute for any of them), which is now supported by setting
text-quoting-style to grave.
(let ((text-quoting-style 'curve))
(substitute-command-keys "'foo'"))
=> ’foo’
but
(let ((text-quoting-style 'grave))
(substitute-command-keys "‘foo’"))
=> ‘foo’
I would have expected the first example to yield 'foo'
No, substitute-command-keys works on each grave accent and apostrophe
separately, without looking at the others. As I recall it's worked that way and
has been documented that way, in both master and emacs-25, ever since the
feature was installed. One could posit a "smarter" form of substitution, which
leaves 'foo' alone but which translates `foo'. Although we considered that
possibility during design, we rejected it because it is more complicated and has
more problems and quirks that are a pain to document and would surprise users in
other ways.
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, (continued)
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, John Wiegley, 2016/08/16
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Dmitry Gutov, 2016/08/16
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Eli Zaretskii, 2016/08/16
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Eli Zaretskii, 2016/08/16
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Paul Eggert, 2016/08/16
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Eli Zaretskii, 2016/08/17
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Paul Eggert, 2016/08/17
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Eli Zaretskii, 2016/08/17
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Paul Eggert, 2016/08/17
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Eli Zaretskii, 2016/08/18
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault,
Paul Eggert <=
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Eli Zaretskii, 2016/08/18
- bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Dmitry Gutov, 2016/08/17
bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Dmitry Gutov, 2016/08/14
bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, Eli Zaretskii, 2016/08/14