emacs-bug-tracker

bug#45660: closed (28.0.50; Changed word/whitespace syntax)


From: GNU bug Tracking System
Subject: bug#45660: closed (28.0.50; Changed word/whitespace syntax)
Date: Fri, 08 Jan 2021 12:07:02 +0000

Your message dated Fri, 08 Jan 2021 14:06:11 +0200
with message-id <83czyfk4zw.fsf@gnu.org>
and subject line Re: bug#45660: 28.0.50; Changed word/whitespace syntax
has caused the debbugs.gnu.org bug report #45660,
regarding 28.0.50; Changed word/whitespace syntax
to be marked as done.

(If you believe you have received this mail in error, please contact
help-debbugs@gnu.org.)


-- 
45660: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=45660
GNU Bug Tracking System
Contact help-debbugs@gnu.org with problems
--- Begin Message ---
Subject: 28.0.50; Changed word/whitespace syntax
Date: Mon, 04 Jan 2021 19:25:23 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/28.0.50 (x86_64-pc-linux-gnu)

Some unidentified recent change during the last week broke the
definition of word syntax and whitespace syntax.  I noticed the
change of behavior in markchars-mode that now disregards the character
"NARROW NO-BREAK SPACE" as the word separator between thousands, i.e.:

In Emacs 27:
(and (string-match "\\<\\w+\\>" "4 096") (match-end 0))
1

In Emacs 28:
(and (string-match "\\<\\w+\\>" "4 096") (match-end 0))
5

Note there is the character "NARROW NO-BREAK SPACE" between "4" and "096".

Please close this bug report if this change was intentional:
if it provides more correct definitions,
then other code could be adapted to the change.



--- End Message ---
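
The report above depends on an invisible character, so a variant that spells U+202F as an escape may be easier to copy. The following is only a sketch: the function name `bug45660-probe` is hypothetical and not part of the thread, and the values it returns depend on the Emacs build being used.

```elisp
;; Hypothetical helper, not from the original report.
(defun bug45660-probe ()
  "Return (MATCH-END SYNTAX-CLASS GENERAL-CATEGORY) for U+202F.
MATCH-END is where the first word match in \"4<U+202F>096\" ends,
SYNTAX-CLASS is the syntax designator of U+202F as a string, and
GENERAL-CATEGORY is its Unicode general category as a symbol."
  (let ((s "4\u202F096"))               ; \u takes exactly four hex digits
    (list
     ;; The report saw 1 in Emacs 27 and 5 in the affected Emacs 28 build.
     (and (string-match "\\<\\w+\\>" s) (match-end 0))
     ;; "w" means word, " " whitespace, "." punctuation.
     (string (char-syntax #x202F))
     ;; Looked up from the Unicode data shipped with Emacs.
     (get-char-code-property #x202F 'general-category))))
```

Evaluating `(bug45660-probe)` in the build under test shows at a glance whether U+202F is treated as a word constituent there.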
--- Begin Message ---
Subject: Re: bug#45660: 28.0.50; Changed word/whitespace syntax
Date: Fri, 08 Jan 2021 14:06:11 +0200

> From: Juri Linkov <juri@linkov.net>
> Cc: 45660@debbugs.gnu.org
> Date: Tue, 05 Jan 2021 20:20:44 +0200
> 
> > Previously, many characters, including u+202F, had the punctuation
> > ('.') syntax.  I modified that to be closer to the Unicode
> > Character Database (UCD), and u+202F is not a punctuation character
> > according to the UCD.  It has the Zs general category, which means
> > "space separator", the same as SPC, NBSP, EN SPACE, and others.
> 
> So according to the Unicode standard it should have whitespace syntax?
> 
> And indeed, I see no reason for similar characters to have different syntax:
> 
>   name: NO-BREAK SPACE
>   general-category: Zs (Separator, Space)
>   syntax:     which means: whitespace
> 
>   name: NARROW NO-BREAK SPACE
>   general-category: Zs (Separator, Space)
>   syntax: w   which means: word
> 
> > Removing u+202F and other similar characters from the "punctuation"
> > group had the side effect of leaving it at the default 'w' syntax.
> >
> > Should we make all Zs characters have the ' ' (whitespace) syntax?
> > That should be easy, but we should try being consistent in this
> > regard.
> 
> Should the word characters separated by NO-BREAK SPACE be treated as one word?
> If there is no reason to treat space characters as part of words, then all
> characters with the Zs general category could have the same whitespace syntax.

No further comments, so I've now made the change on master whereby all
characters with Zs general category are given the whitespace syntax.

I'm therefore closing this bug; please reopen if there are any leftovers
or undesired effects.


--- End Message ---
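
To look for leftovers of the kind mentioned in the closing message, one possible check is to list every Zs character together with its syntax class in the standard syntax table. This is a minimal sketch, not the change that was committed: the function name `zs-syntax-report` is hypothetical, and the scan is limited to the Basic Multilingual Plane on the assumption that all Zs characters currently live there.

```elisp
;; Hypothetical helper, not from the thread.
(defun zs-syntax-report ()
  "Return an alist of (CHAR . SYNTAX-CLASS) for BMP characters in category Zs.
SYNTAX-CLASS is the syntax designator, as a string, taken from the
standard syntax table."
  (let (report)
    (with-syntax-table (standard-syntax-table)
      ;; Assumption: scanning U+0000..U+FFFF covers every Zs character.
      (dotimes (c #x10000)
        (when (eq (get-char-code-property c 'general-category) 'Zs)
          (push (cons c (string (char-syntax c))) report))))
    (nreverse report)))
```

In a build that includes the change described above, every entry would be expected to carry " " as its syntax class; any entry with "w" or "." would be a candidate for reopening the bug.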
