bug#20140: 24.4; M17n shaper output rejected
From: K. Handa
Subject: bug#20140: 24.4; M17n shaper output rejected
Date: Wed, 25 Mar 2015 23:25:54 +0900
Hi, thank you for the detailed explanation.
In article <20150321175818.1b125eba@JRWUBU2>, Richard Wordingham
<richard.wordingham@ntlworld.com> writes:
> What I ought to want is SIL's split cursor scheme, which indicated the
> next ('point') and previous characters, even in bidirectional text.
> Unfortunately, that's not compatible with m17n, which seems to assume
> that cursor position will be a single number. The Emacs functions
> forward-char-intrusive and backward-char-intrusive provided a pleasant,
> more intuitive, alternative, and I am sad to hear they are gone.
> Perhaps I'll have to start using toggle-auto-composition.
Those Emacs functions are just my idea for improving Emacs
for CTL users, and have never been included in the official
Emacs version. I checked the code and found two problems:
(1) When the command sets disable-point-adjustment to t,
command_loop_1 should force updating the display if point is
within a grapheme cluster. So we need this patch:
diff --git a/src/keyboard.c b/src/keyboard.c
index bf65df1..13125c1 100644
--- a/src/keyboard.c
+++ b/src/keyboard.c
@@ -1636,6 +1636,16 @@ command_loop_1 (void)
adjust_point_for_property (last_point_position,
MODIFF != prev_modiff);
}
+ else if (current_buffer == prev_buffer
+ && last_point_position != PT)
+ {
+ if (PT > BEGV && PT < ZV
+ && (composition_adjust_point (last_point_position, PT) != PT))
+ /* Now point is within a grapheme cluster. We must update
+ the display so that this cluster is decomposed on the
+ screen and the cursor is correctly placed at point. */
+ windows_or_buffers_changed = 22;
+ }
/* Install chars successfully executed in kbd macro. */
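The condition added by this patch can be illustrated with a small standalone model. This is only a sketch: `struct cluster`, `toy_adjust_point` and `needs_forced_redisplay` are hypothetical names invented for the illustration, and `toy_adjust_point` is a simplified stand-in for composition_adjust_point, whose real behavior may differ in detail. The buffer check (current_buffer == prev_buffer) is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a composed grapheme cluster covering the
   character positions [beg, end).  */
struct cluster { long beg, end; };

/* Toy analogue of composition_adjust_point (an assumption, not the
   Emacs code): if new_pt falls strictly inside a cluster, snap it to
   the cluster edge in the direction of motion; otherwise return
   new_pt unchanged.  */
long
toy_adjust_point (const struct cluster *c, size_t n,
                  long last_pt, long new_pt)
{
  for (size_t i = 0; i < n; i++)
    if (c[i].beg < new_pt && new_pt < c[i].end)
      return new_pt < last_pt ? c[i].beg : c[i].end;
  return new_pt;
}

/* Mirrors the shape of the test added by the patch: point has moved,
   it lies strictly inside the accessible region (begv, zv), and the
   adjustment would have moved it, i.e. point is inside a cluster.  */
bool
needs_forced_redisplay (const struct cluster *c, size_t n,
                        long begv, long zv, long last_pt, long pt)
{
  return last_pt != pt
         && pt > begv && pt < zv
         && toy_adjust_point (c, n, last_pt, pt) != pt;
}
```

With clusters [1, 4) and [4, 6), moving point from 1 to 2 leaves it inside the first cluster, so the model asks for a redisplay; moving from 1 to 4 lands on a cluster boundary and does not.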
(2) We should break a grapheme cluster at point. So we need
this patch.
diff --git a/src/xdisp.c b/src/xdisp.c
index a17f5a9..0c56395 100644
--- a/src/xdisp.c
+++ b/src/xdisp.c
@@ -3408,6 +3408,9 @@ compute_stop_pos (struct it *it)
pos = next_overlay_change (charpos);
if (pos < it->stop_charpos)
it->stop_charpos = pos;
+ /* If point is in front of the current stop pos, stop there. */
+ if (charpos < PT && PT < it->stop_charpos)
+ it->stop_charpos = PT;
/* Set up variables for computing the stop position from text
property changes. */
@@ -8166,7 +8169,12 @@ next_element_from_buffer (struct it *it)
&& IT_CHARPOS (*it) >= it->redisplay_end_trigger_charpos)
run_redisplay_end_trigger_hook (it);
- stop = it->bidi_it.scan_dir < 0 ? -1 : it->end_charpos;
+ /* Set stop position considering the bidi direction and point. */
+ if (it->bidi_it.scan_dir < 0)
+ stop = (PT < IT_CHARPOS (*it)) ? PT : -1;
+ else
+ stop = ((IT_CHARPOS (*it) < PT && PT < it->end_charpos)
+ ? PT : it->end_charpos);
if (CHAR_COMPOSED_P (it, IT_CHARPOS (*it), IT_BYTEPOS (*it),
stop)
&& next_element_from_composition (it))
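The stop-position logic in the two hunks above can be read as two small pure functions. The sketch below restates it outside of Emacs so that it can be tried in isolation; the function names `clamp_stop_to_point` and `composition_stop` are hypothetical, and plain `long` values stand in for the iterator fields and PT.

```c
/* Analogue of the compute_stop_pos hunk: if point lies strictly
   between the current position and the stop position, stop at point
   instead.  */
long
clamp_stop_to_point (long charpos, long pt, long stop_charpos)
{
  return (charpos < pt && pt < stop_charpos) ? pt : stop_charpos;
}

/* Analogue of the next_element_from_buffer hunk: choose the limit
   passed to the composition check.  When scanning backward
   (scan_dir < 0) the limit is point if point is behind the current
   position, otherwise -1 (the value the original code always used
   for backward scans).  When scanning forward the limit is point if
   it lies strictly between the current position and end_charpos,
   otherwise end_charpos.  */
long
composition_stop (int scan_dir, long charpos, long pt, long end_charpos)
{
  if (scan_dir < 0)
    return pt < charpos ? pt : -1;
  return (charpos < pt && pt < end_charpos) ? pt : end_charpos;
}
```

For example, scanning forward from position 10 with point at 12 and end_charpos at 20 gives a limit of 12, so a composition starting at 10 cannot extend across point.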
Could you try these patches and test the usability of
forward-char-intrusive and backward-char-intrusive?
> > Please try to move the cursor over this Devanagari text "हिंदी" on
> > Emacs, gedit, and, for instance, Firefox. They all treat
> > that text as 2 grapheme clusters "हिं" and "दी". The first
> > one begins with the character sequence U+939 U+93F, and
> > U+93F (vowel I) is displayed before U+939 (base consonant).
> Note that those clusters are only 3 and 2 characters long. Retyping
> them is tolerable. Now consider the Sanskrit Devanagari text स्त्री,
> which contains two consonant-combining viramas. Emacs moves across it
> in 1 step, but Claws e-mail (GTK-based, I believe) and LibreOffice
> (HarfBuzz-based, at least for Linux) both take 3 steps to move across
> it. Claws and LibreOffice use different algorithms to position the
> cursor. That of LibreOffice seems more reasonable, but that of
> Claws works better! The reason is that Unicode did not declare virama
> as forming grapheme clusters.
Ah, hmmm, that's a problem with DEVA-OTF.flt and DEV2-OTF.flt of
the m17n library. I'll try to fix them.
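The difference between 1 step and 3 steps for स्त्री can be reproduced with a deliberately simplified cluster counter. This is a sketch under stated assumptions, not the algorithm of Emacs, m17n, GTK or HarfBuzz: `is_dependent_sign` covers only a small hand-picked range of Devanagari signs, and the `virama_joins` flag is a hypothetical switch between "a consonant after a virama stays in the same cluster" and "a virama does not extend the cluster".

```c
#include <stddef.h>
#include <stdint.h>

#define VIRAMA 0x094D  /* DEVANAGARI SIGN VIRAMA */

/* Simplified assumption: treat U+0900..U+0903 and U+093E..U+094D as
   signs that attach to the preceding base.  Real implementations use
   the Unicode property data instead of fixed ranges.  */
static int
is_dependent_sign (uint32_t cp)
{
  return (cp >= 0x0900 && cp <= 0x0903)
         || (cp >= 0x093E && cp <= 0x094D);
}

/* Count the clusters (cursor stops) in TEXT.  A code point continues
   the current cluster if it is a dependent sign, or, when
   VIRAMA_JOINS is nonzero, if it directly follows a virama.  */
size_t
count_cursor_stops (const uint32_t *text, size_t len, int virama_joins)
{
  size_t clusters = 0;
  for (size_t i = 0; i < len; i++)
    {
      int continues = i > 0
                      && (is_dependent_sign (text[i])
                          || (virama_joins && text[i - 1] == VIRAMA));
      if (!continues)
        clusters++;
    }
  return clusters;
}
```

Under these assumptions, हिंदी (U+0939 U+093F U+0902 U+0926 U+0940) yields 2 clusters either way, while स्त्री (U+0938 U+094D U+0924 U+094D U+0930 U+0940) yields 1 cluster when the virama joins and 3 when it does not.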
> It seems to have solved all of them. When I reported the bug, I was
> having problems with my font because libotf was silently ignoring half
> the lookups in my font.
Could you please send me (not on this list) an appropriate
bug/problem report if libotf should be fixed?
> I thought I might have problems with U+1A58 TAI THAM SIGN MAI KANG LAI,
> which in Lao visually groups (usually) with the following base
> consonant and in Tai Khuen groups with the preceding base consonant. My
> clustering in Emacs follows the Tai Khuen scheme. (I compose two
> orthographic clusters together in Emacs, but declare two grapheme
> clusters in the FLT processing.) However, my font follows a major
> Northern Thai dictionary and places it on the following base consonant
> if there is nothing above it, but otherwise places it on the preceding
> base consonant. However, my implementation is too dirty to cause
> problems - the second cluster is not reported as deriving from the
> mai kang lai character.
> I wonder, though, what will happen if I manage to implement the
> Universal Shaping Engine's (USE) rphf feature. The author of a Lao-style
> Tai Tham font wanted this feature in HarfBuzz. The desired effect seems
> easy to achieve in m17n-flt, but placing it under font control is more
> difficult. I'm studying MLM2-OTF.flt to see how to do it.
I've just started to study the Universal Shaping Engine. It
seems that we can implement it by a proper FLT file.
> > > However, it then makes editing of the 'clusters' more
> > > difficult. Note that there are examples above with 5
> > > characters in a cluster, and this is by no means the
> > > limit.
> >
> > But, it seems that the current behavior is accepted, at
> > least, by Indic people.
> Who do you mean by 'Indic people'?
I just mean that I have not heard any complaints about that
"too long cluster problem" of Emacs. No one is using Emacs
for Indic scripts?
> New Tai Lue is an interesting case. Microsoft delayed support for this
> simple Indic script for so long that most apparently Unicode-encoded
> New Tai Lue text was actually encoded in visual order. With Unicode
> 8.0, New Tai Lue is changing from phonetic order to visual order, and
> it will no longer need any clusters at all!
Wow, I didn't know that.
> Emacs 23.3 (which is what is in long-term support Ubuntu
> 12.04) offers no support for New Tai Lue, so I am not sure
> that there is yet a New Tai Lue view on composition in
> Emacs.
We may be able to provide support for new scripts in ELPA.
---
K. Handa
handa@gnu.org