emacs-bug-tracker



From: GNU bug Tracking System
Subject: [debbugs-tracker] bug#27526: closed (25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator)
Date: Tue, 18 Jul 2017 15:56:02 +0000

Your message dated Tue, 18 Jul 2017 18:55:10 +0300
with message-id <address@hidden>
and subject line Re: bug#27526: 25.1; Nonconformance to Unicode 
bidirectionality algorithm due to paragraph separator
has caused the debbugs.gnu.org bug report #27526,
regarding 25.1; Nonconformance to Unicode bidirectionality algorithm due to 
paragraph separator
to be marked as done.

(If you believe you have received this mail in error, please contact
address@hidden)


-- 
27526: http://debbugs.gnu.org/cgi/bugreport.cgi?bug=27526
GNU Bug Tracking System
Contact address@hidden with problems
--- Begin Message ---
Subject: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Thu, 29 Jun 2017 12:16:00 +0300

According to the Emacs manual (section 37.26 Bidirectional Display)

>  Emacs provides a “Full Bidirectionality” class implementation of the
>  UBA, consistent with the requirements of the Unicode Standard v8.0.

And again (section 22.19 Bidirectional Editing)

> Emacs implements the Unicode Bidirectional Algorithm described in the Unicode 
> Standard Annex #9, for reordering of bidirectional text for display.

However, these statements are false. Emacs does not implement the Unicode
Bidirectional Algorithm correctly, and therefore does not even provide
'Implicit bidirectionality', which is the minimal level of conformance
listed in section 4.2 'Explicit Formatting Characters' of the Unicode
8.0.0 Bidirectional Algorithm specification
(www.unicode.org/reports/tr9/tr9-33.html), let alone 'Full bidirectionality'.

The reason has to do with the way the Emacs bidi implementation
recognizes separate paragraphs, which is inconsistent with the Unicode
specifications.

The Unicode Bidirectional Algorithm specifies (section 3, 'Basic
Display Algorithm'):

> The algorithm reorders text only within a paragraph; characters in one
> paragraph have no effect on characters in a different
> paragraph. Paragraphs are divided by the Paragraph Separator or
> appropriate Newline Function (for guidelines on the handling of CR,
> LF, and CRLF, see Section 4.4, Directionality, and Section 5.8,
> Newline Guidelines of [Unicode]).

However, Emacs, by its own admission (section 22.19 Bidirectional
Editing), takes the following approach:

> Paragraph boundaries are empty lines, i.e., lines consisting entirely of 
> whitespace characters.

I'll repeat: according to Unicode, a paragraph ends with a paragraph
separator. What constitutes a paragraph separator is specified precisely
in section 5.8 'Newline Guidelines' of The Unicode Standard version
8.0.0. For instance, on a Mac OS X system, it is `LF` (line feed,
U+000A). The formatting effects of the bidi algorithm must not
cross the paragraph separator boundary.
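
The difference between the two paragraph definitions can be sketched in a few lines of Python. This is only an illustration, not Emacs' actual code: the helper names (`base_direction`, `split_on_newlines`, `split_on_blank_lines`) and the sample text are made up for this example, and the direction check is a simplified version of rules P2 and P3 of UAX #9 that ignores isolates and explicit formatting characters.

```python
import re
import unicodedata


def base_direction(paragraph):
    """Simplified P2/P3: the first strong character (L, R or AL) decides.

    Returns "L" when no strong character is found.
    """
    for ch in paragraph:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "L"
        if bidi_class in ("R", "AL"):
            return "R"
    return "L"


def split_on_newlines(text):
    # Paragraphs as described in UAX #9 for an LF-based system:
    # every line feed ends a paragraph.
    return text.split("\n")


def split_on_blank_lines(text):
    # Paragraphs as described in the Emacs manual: only lines that
    # consist entirely of whitespace separate paragraphs.
    return re.split(r"\n(?:[ \t]*\n)+", text)


# A Hebrew line directly followed by an English line (hypothetical sample).
sample = "שלום עולם\nHello world?"

unicode_directions = [base_direction(p) for p in split_on_newlines(sample)]
blank_line_directions = [base_direction(p) for p in split_on_blank_lines(sample)]

print(unicode_directions)
print(blank_line_directions)
```

With newline splitting, the Hebrew and English lines get their own base directions. With blank-line splitting, both lines form a single paragraph whose direction is taken from the Hebrew text.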

And yet in Emacs the formatting extends beyond the paragraph separator,
and this is the case on all operating systems. Consider, for instance,
the following example.

ILLUSTRATION: An English paragraph directly following a Hebrew paragraph
is formatted like Hebrew text.
http://imgur.com/3eyrUfA

The first, Hebrew paragraph is formatted correctly. The second,
English paragraph, however, is formatted wrongly, as though it were a
Hebrew paragraph: it is right justified, the question mark appears on the
left, and so does the cursor. Once an empty paragraph is inserted between
the two paragraphs, the English paragraph is formatted correctly.

ILLUSTRATION: When paragraphs are separated by an empty paragraph, they
are formatted correctly.
http://imgur.com/ZsHGkwf

This is not just a theoretical question of conformance to standards;
this problem has practical consequences.

Consider, for instance, a LaTeX document for typesetting Hebrew
text. Normally, in order to eliminate the usual leading indentation of
the first line of a paragraph, a `\noindent` command is placed at the
beginning of the paragraph. However, because the Unicode bidi algorithm
determines the directionality of a paragraph based on its first strongly
directional character, which here is the Latin `n` of `\noindent`, the
Hebrew text is formatted like English text. This is not a problem; it is
to be expected.
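
As a rough check of why the `\noindent` line comes out left-to-right, the bidi classes of its leading characters can be inspected with Python's `unicodedata` module. The helper `first_strong` and the sample line are hypothetical, and the scan is a simplification of rule P2 of UAX #9 (no isolate handling).

```python
import unicodedata


def first_strong(text):
    """Return (index, character, bidi class) of the first L, R or AL character."""
    for index, ch in enumerate(text):
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("L", "R", "AL"):
            return index, ch, bidi_class
    return None


# A LaTeX line that starts with \noindent followed by Hebrew text.
line = "\\noindent שלום עולם"

# Bidi classes of the first three characters: the backslash is neutral,
# the Latin letters after it are strong left-to-right.
leading_classes = [(ch, unicodedata.bidirectional(ch)) for ch in line[:3]]
print(leading_classes)

print(first_strong(line))
print(first_strong("שלום עולם"))
```

The backslash is a neutral character, so the scan skips it and stops at the `n`, which is why the whole paragraph is laid out left-to-right even though most of its text is Hebrew.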

ILLUSTRATION: A LaTeX document for typesetting a Hebrew paragraph with
no indentation of the first line.
http://imgur.com/xYUkZKr

One way to resolve this is to explicitly change the directionality of the
paragraph. That is not currently possible due to a separate Emacs bug,
but even if it were possible, it would affect the placement of the
backslash at the beginning of the `\noindent` command, which would no
longer look like a LaTeX command.

ILLUSTRATION: Explicitly changing the directionality of the
paragraph.
http://imgur.com/sPcVReA

(Note: This is a screenshot of Microsoft Word, since, due to a bug,
Emacs doesn't currently allow changing the automatically determined
directionality of a paragraph.)

So the best way to resolve this problem would be to place the `\noindent`
command in a separate paragraph. Unfortunately, here Emacs' faulty
implementation of the Unicode bidi algorithm rears its ugly
head. Since Emacs doesn't recognize the paragraph separator for what it
is, it will format the Hebrew text wrongly, as though it were English text.

ILLUSTRATION: Putting the `\noindent` in a separate paragraph results in
the Hebrew text being formatted like English text.
http://imgur.com/44ds6rK

Placing an empty paragraph between the `\noindent` command and the
Hebrew text will resolve the formatting problem inside the Emacs editor, but
now the `\noindent` command, which only affects the current LaTeX
paragraph (LaTeX paragraphs are ended by an empty line), no longer
eliminates the indentation of the first line of the Hebrew paragraph in
the typeset file.



In GNU Emacs 25.1.1 (x86_64-apple-darwin13.4.0, NS appkit-1265.21
Version 10.9.5 (Build 13F1911))
 of 2016-09-21 built on builder10-9.porkrind.org
Windowing system distributor 'Apple', version 10.3.1504
Configured using:
 'configure --with-ns '--enable-locallisppath=/Library/Application
 Support/Emacs/${version}/site-lisp:/Library/Application
 Support/Emacs/site-lisp' --with-modules'

Configured features:
NOTIFY ACL GNUTLS LIBXML2 ZLIB TOOLKIT_SCROLL_BARS NS MODULES

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Fundamental

Minor modes in effect:
  ivy-mode: t
  shell-dirtrack-mode: t
  projectile-mode: t
  helm-descbinds-mode: t
  async-bytecomp-package-mode: t
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  buffer-read-only: t
  column-number-mode: t
  line-number-mode: t
  transient-mark-mode: t

Recent messages:
ad-handle-definition: ‘ibuffer’ got redefined
Turn on helm-projectile key bindings
For information about GNU Emacs and the GNU system, type C-h C-a.

Load-path shadows:
/Users/itaiberli/.emacs.d/elpa/seq-2.20/seq hides
/Applications/Emacs.app/Contents/Resources/lisp/emacs-lisp/seq

Features:
(shadow sort mail-extr emacsbug message rfc822 mml mml-sec epg mm-decode
mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils mailheader
sendmail rfc2047 rfc2045 ietf-drums mail-utils colir color counsel
jka-compr esh-util etags xref project swiper reftex reftex-vars
two-column ivy delsel ivy-overlay helm-projectile helm-files rx
image-dired tramp tramp-compat tramp-loaddefs trampver shell pcomplete
format-spec dired-x dired-aux ffap helm-tags helm-bookmark helm-adaptive
helm-info bookmark pp helm-external helm-net browse-url xml url
url-proxy url-privacy url-expand url-methods url-history url-cookie
url-domsuf url-util url-parse auth-source gnus-util mm-util help-fns
mail-prsvr password-cache url-vars mailcap helm-buffers helm-grep
helm-regexp helm-utils helm-locate helm-help helm-types projectile grep
compile comint ansi-color ring ibuf-ext ibuffer thingatpt helm-descbinds
helm easy-mmode helm-source cl-seq eieio-compat eieio eieio-core
helm-multi-match helm-lib dired helm-config helm-easymenu cl-macs
async-bytecomp async advice edmacro kmacro finder-inf tex-site info
package epg-config seq byte-opt gv bytecomp byte-compile cl-extra
help-mode easymenu cconv cl-loaddefs pcase cl-lib time-date mule-util
tooltip eldoc electric uniquify ediff-hook vc-hooks lisp-float-type
mwheel ns-win ucs-normalize term/common-win tool-bar dnd fontset image
regexp-opt fringe tabulated-list newcomment elisp-mode lisp-mode
prog-mode register page menu-bar rfn-eshadow timer select scroll-bar
mouse jit-lock font-lock syntax facemenu font-core frame cl-generic cham
georgian utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao
korean japanese eucjp-ms cp51932 hebrew greek romanian slovak czech
european ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote kqueue cocoa ns multi-tty
make-network-process emacs)

Memory information:
((conses 16 312045 13704)
 (symbols 48 30403 0)
 (miscs 40 88 192)
 (strings 32 51754 11765)
 (string-bytes 1 1669992)
 (vectors 16 50218)
 (vector-slots 8 844617 7052)
 (floats 8 564 218)
 (intervals 56 242 111)
 (buffers 976 18))



--- End Message ---
--- Begin Message ---
Subject: Re: bug#27526: 25.1; Nonconformance to Unicode bidirectionality algorithm due to paragraph separator
Date: Tue, 18 Jul 2017 18:55:10 +0300

> Resent-Sender: address@hidden
> From: Itai Berli <address@hidden>
> Date: Tue, 18 Jul 2017 18:22:12 +0300
> 
> How can you tell how many people would like to use it, or indeed if anyone 
> uses it at all?

By reading most of the Emacs-related traffic out there.

> At any rate, thanks for this fix. It is extremely helpful, and even provides 
> a workaround -- to a degree! -- for the
> line-wrapping problem, as long as one is writing a document in a markup 
> language like TeX/LaTeX or XML
> where line breaks are treated the same as spaces.

Thanks, so I'm closing this bug report.

> However, the line-wrapping bug is still a major
> annoyance, at best, and until it is fixed, Emacs cannot claim to be Unicode 
> compliant.

I disagree, as I already said many times.  In any case, that's a
separate bug report.

> I saw that Mr. Stallman
> chimed in on the line-wrapping bug, does this mean that there's hope that it 
> will get fixed in the foreseeable
> future?

Richard chimed in on a tangent; it wasn't about the wrapping of bidi
text when paragraph direction is the opposite one.


--- End Message ---
