bug-gnu-emacs

bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B


From: Adam Tack
Subject: bug#13399: 24.3.50; Word-wrap can't wrap at zero-width space U-200B
Date: Wed, 13 Dec 2017 04:00:56 +0000

Sorry for not working further on this; I didn't have time.  I will
get back to finishing it soon.

> Hmm... not sure why you arrived at this conclusion.  E.g., what's
> wrong with the implementation at the bottom of this message?

This was very similar to my first try.  Unfortunately, it doesn't work
correctly in whitespace-mode, even with just normal spaces, regressing
on Bug#11341.

(with-current-buffer (get-buffer-create "*bar*")
  (dotimes (i 1000)
    (insert "1234 ")) ; Space
  (setq word-wrap t)
  (whitespace-mode)
  (display-buffer "*bar*"))

The spaces are displayed as `·', so it->c returns 183, none of the
further tests are checked, and IT_DISPLAYING_WHITESPACE returns false.
(In the currently used implementation, if it->c is not one of ' ' or '\t',
then the later tests are all checked.)
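
The ordering problem can be reproduced in a small, self-contained C
model.  This is only a sketch and none of it is Emacs code: `toy_it',
`is_wrap_char', `display_first' and `buffer_first' are hypothetical
names, and the character-table lookup is replaced by a plain comparison
against space and tab.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the display iterator: the character stored
   in the buffer versus the character actually shown on screen.  */
struct toy_it
{
  int buffer_char;   /* what the buffer contains */
  int display_char;  /* what is displayed, e.g. 183 for a space
                        under whitespace-mode */
};

/* Stand-in for the word-wrap table lookup (assumption: only space
   and tab allow wrapping).  */
static bool
is_wrap_char (int c)
{
  return c == ' ' || c == '\t';
}

/* Tests only the displayed character and stops there.  */
static bool
display_first (const struct toy_it *it)
{
  return is_wrap_char (it->display_char);
}

/* Tests the underlying buffer character instead.  */
static bool
buffer_first (const struct toy_it *it)
{
  return is_wrap_char (it->buffer_char);
}
```

For a space shown as 183, `display_first' reports no wrap opportunity
while `buffer_first' does, which is the behaviour described above.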

I thought about changing the order of the tests to something like the
following (ignoring the special case of ' ' and '\t', here, for
brevity):

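static inline bool
IT_DISPLAYING_WHITESPACE (struct it *it) {
  int c;
  /* Prefer the character stored in the buffer, if we are inside it.  */
  if (IT_BYTEPOS (*it) < ZV_BYTE)
    c = FETCH_CHAR (IT_BYTEPOS (*it));
  /* Otherwise use the character being displayed.  */
  else if (it->what == IT_CHARACTER)
    c = it->c;
  /* Otherwise take the character from a Lisp string...  */
  else if (STRINGP (it->string))
    c = STRING_CHAR (SDATA (it->string) + IT_STRING_BYTEPOS (*it));
  /* ...or from a C string.  */
  else if (it->s)
    c = STRING_CHAR (it->s + IT_BYTEPOS (*it));
  /* Fall-through: no character available.  */
  else
    return false;

  return !NILP (CHAR_TABLE_REF (Vword_wrap_chars, c));
}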

which in the case of whitespace-mode does TRT, but I worried that
there might be situations where wrapping on the display character
is correct.  The crux (as I had previously, but very unclearly,
written) is that under "normal" circumstances, both
`(it->what == IT_CHARACTER)' and `(IT_BYTEPOS (*it) < ZV_BYTE)'
are true.

Additionally, I wasn't sure whether there should be a fall-through:
on the one hand, it prevents Emacs from crashing if (weirdly) all the
previous tests return false, but on the other, it might preclude some magic
compiler optimisation.

Chaining ORs side-stepped both issues, so I settled on keeping it, though
it might have been the wrong decision.
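
As a rough illustration of why chaining ORs avoids both questions, here
is a toy model.  It is a sketch under stated assumptions, not the code
in xdisp.c: `toy_sources', `wrap_char_p' and `chained_or' are
hypothetical names, and the table lookup is again reduced to space and
tab.  Every available source of a character is consulted, so the order
of the tests does not matter, and no explicit fall-through branch is
needed because the expression is simply false when nothing matches.

```c
#include <stdbool.h>

/* Hypothetical model: each possible source of the current character
   has an availability flag and a character.  */
struct toy_sources
{
  bool has_display; int display_c;  /* character being displayed */
  bool has_buffer;  int buffer_c;   /* character in the buffer */
  bool has_string;  int string_c;   /* character from a string */
};

/* Stand-in for the word-wrap table lookup (assumption: only space
   and tab allow wrapping).  */
static bool
wrap_char_p (int c)
{
  return c == ' ' || c == '\t';
}

/* True if any available source yields a wrap character.  */
static bool
chained_or (const struct toy_sources *s)
{
  return ((s->has_display && wrap_char_p (s->display_c))
          || (s->has_buffer && wrap_char_p (s->buffer_c))
          || (s->has_string && wrap_char_p (s->string_c)));
}
```

The cost is that a match on the displayed character also counts, which
is exactly the case I was unsure about above.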

> > ii) vim's breakat characters (default " ^I!@*-+;:,./?"), since
> > presumably they had given it some thought,

> Maybe.  I'm not sure in what modes this would be TRT.

It should almost certainly not be the default in any mode, but it
might, perhaps, be a useful, pre-defined option for some users.  (For
instance, when wrapping long URLs or paths in comments:

|;;                                                     |
|https://very.long.url/that-will-not-fit-on-a-single-lin|
|e-anyway-but-could-at-least-start-on-the-same-line-as-t|
|he-comment-sign-and-break-at-slightly-more-logical-plac|
|es                                                     |

looks (IMO at least!) less aesthetically pleasing than:

|;; https://very.long.url/that-will-not-fit-on-a-single-|
|line-anyway-but-could-at-least-start-on-the-same-line- |
|as-the-comment-sign-and-break-at-slightly-more-logical-|
|places                                                 |

where `|' is the margin.

The same sometimes holds for excessively long variable names.  I
definitely wouldn't impose this preference on others, but I assume
that some might share it.)  Using vim's choice helps avoid
bike-shedding.
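
To make the comparison concrete, here is a minimal sketch of how the
choice of break characters changes where the first line ends.  It is
not how Emacs or vim implements wrapping: `find_break' is a
hypothetical helper, it assumes one column per byte (ASCII only), and
it assumes the break is taken just after a break character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return how many characters of S go on the first line of WIDTH
   columns when a break is allowed only after a character listed in
   BREAKAT.  If no such character occurs within WIDTH columns, fall
   back to a hard break at WIDTH.  */
static size_t
find_break (const char *s, size_t width, const char *breakat)
{
  assert (s != NULL && breakat != NULL);
  size_t len = strlen (s);

  /* The whole string fits: no break needed.  */
  if (len <= width)
    return len;

  /* Scan backwards from the last column for a break character.  */
  for (size_t i = width; i > 0; i--)
    if (strchr (breakat, s[i - 1]) != NULL)
      return i;

  return width;
}
```

With the URL comment from the example above and a width of 55,
passing " \t!@*-+;:,./?" (vim's default, with ^I written as \t) gives a
first line of 55 characters ending in "single-", whereas passing only
" \t" gives a first line of 3 characters, i.e. just ";; ".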

> We already import several UCD files, see admin/unidata, where you will
> also find copyright.html from the Unicode Consortium.

Great! That's convenient.

> test/manual is okay.

Thanks!

> This should probably go into simple.el.

I'll move it there.


Thanks for the help!




