bug#24206: 25.1; Curly quotes generate invalid strings, leading to a seg

bug-gnu-emacs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

bug#24206: 25.1; Curly quotes generate invalid strings, leading to a seg

From:	Paul Eggert
Subject:	bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Date:	Thu, 18 Aug 2016 11:33:36 -0700
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.2.0

Eli Zaretskii wrote:

The code that got removed was the easy and intuitive part: it dealt
with processing single-byte strings one byte at a time.  The
hard-to-read part of the code is still with us.  We have less 'if'
conditionals, but that's hardly the main complication in the original
code.

Sure, but removing unnecessary easy stuff lets the reader see the hard stuffmore clearly.

You are missing my point: the code on master now processes a string,
that could be either unibyte or multibyte, using only multibyte
methods.  With the flag in place, each kind of string would have used
the method that's natural with it.  The way things are now, one has to
think hard about what the code does to convince oneself it's valid.

The way things were before it was even harder, because one had to worry not onlyabout processing Emacs-encoded text, one also had to worry about processingunibyte text containing non-ASCII bytes. The code is simpler now, because itneeds only to process Emacs-encoded text.

The old code might have flown despite its problems, if all the input data wereconsistent (i.e., either all unibyte, or all with Emacs-encoded text). Butinputs need not be consistent, so the old approach simply did not work.

As I take it, your principal objection to the new code is not to its internals:it's that substitute-command-keys can now return a multibyte string even whenall the input data is unibyte. I don't think that's a big deal, but if this isthe primary reason for our lengthy conversation, I can move things forward bychanging the code so that it instead returns a unibyte string when all the inputdata are unibyte. Would that suffice?

    I don't see why it is tricky, we do that in Emacs in other places.

Really? A call to STRING_CHAR_AND_LENGTH followed by a length test followed by a 
call to memcpy for length > 1 and a special case inline copy for length == 1? 
When copying multibyte data? Where else does Emacs do that?


What exactly confuses you in that snippet?

Nothing confuses me in that snippet. I know what the snippet does, now that I'veread and understood it. It is a longwinded and unnecessarily tricky way of doingsomething simple.

The call to
STRING_CHAR_AND_LENGTH itself? we have that in umpteen other places.
The single-byte optimization of not calling memcpy?  That's standard
practice in C.

I'm not talking about each individual line of code in that snippet. I am talkingabout the entire construction. A reader must look at all 14 lines to deduce whatit does, deduce that it's unnecessarily complicated, and deduce that theunnecessary complication is not a sign that something unusual is going on. Noplace else in Emacs has this construction.

If you need an example for using
STRING_CHAR_AND_LENGTH while copying text, you can find it in
copy_text, for example.

copy_text does something quite different. When copying multibyte text, it does asingle memcpy for the entire string. copy_text does not call memcpy for eachmultibyte character, and nothing in copy_text is particularly close to thesnippet in question.

you replaced it with a
fall-through, which is harder to follow and easier to introduce subtle
bugs.

True, but the fall-through is clearly marked in comments. The new way is a bitmore efficent (smaller code space, more likely to fit in cache); the old wayduplicated character-copying code, a practice that also introduces subtle bugs-- in addition to temporarily mystifying the reader because the old codeduplicates were not identical (itself a sign of cruft). With all that in mind,it was reasonable to switch from the old way to the new, despite thedisadvantage you mention.

        Alan wanted something that he could put into his .emacs that would cause
        (message PERCENTLESS) to output the string PERCENTLESS as-is, assuming
        PERCENTLESS lacks %. This was the point of his original bug report; his 
original
        example involved ` and ' but he wanted the same behavior for ‘ and ’, a 
point
        that became clear during the discussion of Bug#23425.

    Then why not for '..' as well?  How is that different from ‘..’?

It's not different. Alan wanted the same behavior for '..', and he got that too.


But the behavior is not the same:

I was referring to Alan's desire to treat all quotes the same (i.e., to notsubstitute for any of them), which is now supported by settingtext-quoting-style to grave.

  (let ((text-quoting-style 'curve))
    (substitute-command-keys "'foo'"))
      => ’foo’

but

  (let ((text-quoting-style 'grave))
    (substitute-command-keys "‘foo’"))
      => ‘foo’

I would have expected the first example to yield 'foo'

No, substitute-command-keys works on each grave accent and apostropheseparately, without looking at the others. As I recall it's worked that way andhas been documented that way, in both master and emacs-25, ever since thefeature was installed. One could posit a "smarter" form of substitution, whichleaves 'foo' alone but which translates `foo'. Although we considered thatpossibility during design, we rejected it because it is more complicated and hasmore problems and quirks that are a pain to document and would surprise users inother ways.

[Prev in Thread]

Current Thread

[Next in Thread]

bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault, (continued)

Prev by Date: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Next by Date: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Previous by thread: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Next by thread: bug#24206: 25.1; Curly quotes generate invalid strings, leading to a segfault
Index(es):
- Date
- Thread