The HarfBuzz rendering of Arabic is the correct one in this screenshot.
Thanks. So here's the status so far:
Rendering of Namaste as seen in C-h h (M-x view-hello-file):
|          | HarfBuzz | m17n    |
|----------+----------+---------|
| Hindi    | correct  | correct |
| Gujarati | wrong    | correct |
| Arabic   | correct  | wrong   |
For debugging such rendering differences, the actual font Emacs uses for
a given part of the text needs to be known.
The string being rendered is "નમસ્તે".
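To make the invisible parts of that string explicit, here is a small Python sketch (the `describe` helper is illustrative, not part of Emacs) that lists each character with the same decimal, octal and UTF-8 forms that C-u C-x = prints. It shows that "નમસ્તે" is six code points: the virama U+0ACD sits between સ and ત, which is why the HarfBuzz-build positions below jump from 1612 (સ) to 1614 (ત).

```python
import unicodedata

# "નમસ્તે" spelled with escapes so the virama (U+0ACD) is visible in the source.
NAMASTE = "\u0aa8\u0aae\u0ab8\u0acd\u0aa4\u0ac7"


def describe(text):
    """Return (codepoint, octal, UTF-8 bytes, name) for each character,
    roughly mirroring the fields shown by C-u C-x = (describe-char)."""
    rows = []
    for ch in text:
        cp = ord(ch)
        utf8 = " ".join(f"#x{b:02X}" for b in ch.encode("utf-8"))
        rows.append((cp, f"#o{cp:o}", utf8, unicodedata.name(ch)))
    return rows


for cp, octal, utf8, name in describe(NAMASTE):
    print(f"U+{cp:04X} {cp:5d} {octal:7s} {utf8:15s} {name}")
```

The first row (2728, #o5250, #xE0 #xAA #xA8) matches the "codepoint" and "buffer code" lines of the first dump below.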
By placing the cursor on each of those characters and doing C-u C-x = (on the m17n build), I get:
(1) ન
position: 1610 of 3509 (46%), column: 32
character: ન (displayed as ન) (codepoint 2728, #o5250, #xaa8)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3968
script: gujarati
syntax: w which means: word
category: .:Base, L:Left-to-right (strong)
to input: type "C-x 8 RET aa8" or "C-x 8 RET GUJARATI LETTER NA"
buffer code: #xE0 #xAA #xA8
file code: #xE0 #xAA #xA8 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x234)
Character code properties: customize what to show
name: GUJARATI LETTER NA
general-category: Lo (Letter, Other)
decomposition: (2728) ('ન')
There are text properties here:
charset mule-unicode-0100-24ff
(2) મ
position: 1611 of 3509 (46%), column: 33
character: મ (displayed as મ) (codepoint 2734, #o5256, #xaae)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x396E
script: gujarati
syntax: w which means: word
category: .:Base, L:Left-to-right (strong)
to input: type "C-x 8 RET aae" or "C-x 8 RET GUJARATI LETTER MA"
buffer code: #xE0 #xAA #xAE
file code: #xE0 #xAA #xAE (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x239)
Character code properties: customize what to show
name: GUJARATI LETTER MA
general-category: Lo (Letter, Other)
decomposition: (2734) ('મ')
There are text properties here:
charset mule-unicode-0100-24ff
(3) સ્તે
position: 1612 of 3509 (46%), column: 34
character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3978
script: gujarati
syntax: w which means: word
category: .:Base, L:Left-to-right (strong)
to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER SA"
buffer code: #xE0 #xAA #xB8
file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
display: composed to form "સ્તે" (see below)
Composed with the following character(s) "્તે" using this font:
xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
by these glyphs:
[0 3 0 645 8 0 11 11 0 [0 0 8]]
[0 3 2724 560 11 1 11 11 1 nil]
[0 3 2759 589 0 -9 -2 16 -11 [-1 0 0]]
Character code properties: customize what to show
name: GUJARATI LETTER SA
general-category: Lo (Letter, Other)
decomposition: (2744) ('સ')
There are text properties here:
charset mule-unicode-0100-24ff
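The three glyph vectors above are easier to read once their fields are named. The sketch below assumes the layout documented for `composition-get-gstring`, i.e. [FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT ADJUSTMENT] with ADJUSTMENT either nil or [X-OFF Y-OFF WADJUST]; the `LGlyph` type and `parse_lglyph` function are hypothetical helpers for illustration only.

```python
import re
from typing import NamedTuple, Optional, Tuple


class LGlyph(NamedTuple):
    """One glyph of a composition, assuming the field order documented
    for `composition-get-gstring`."""
    from_idx: int
    to_idx: int
    char: int        # character code C
    code: int        # glyph code in the font
    width: int
    lbearing: int
    rbearing: int
    ascent: int
    descent: int
    adjustment: Optional[Tuple[int, ...]]  # (X-OFF, Y-OFF, WADJUST) or None


# Nine integers, then either nil or a nested [x y w] vector, inside [...].
_LGLYPH_RE = re.compile(r"\s*\[([-\d\s]+?)\s+(nil|\[[-\d\s]*\])\s*\]\s*")


def parse_lglyph(text: str) -> LGlyph:
    """Parse a printed glyph vector such as "[0 3 2724 560 11 1 11 11 1 nil]"."""
    m = _LGLYPH_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a glyph vector: {text!r}")
    fields = [int(x) for x in m.group(1).split()]
    if len(fields) != 9:
        raise ValueError(f"expected 9 numeric fields, got {len(fields)}")
    adj_text = m.group(2)
    adjustment = None if adj_text == "nil" else tuple(
        int(x) for x in adj_text[1:-1].split())
    return LGlyph(*fields, adjustment)


# The three glyphs reported for the composed "સ્તે" on the m17n build.
M17N_GLYPHS = [
    "[0 3 0 645 8 0 11 11 0 [0 0 8]]",
    "[0 3 2724 560 11 1 11 11 1 nil]",
    "[0 3 2759 589 0 -9 -2 16 -11 [-1 0 0]]",
]

for line in M17N_GLYPHS:
    g = parse_lglyph(line)
    print(f"char {g.char:5d}  glyph {g.code:4d} (#x{g.code:x})  "
          f"width {g.width:2d}  adjustment {g.adjustment}")
```

Read this way, the second glyph is ત (2724) drawn with glyph 560 (#x230), and the third is the vowel sign ે (U+0AC7, 2759) drawn with glyph 589 at zero width. The first glyph's character field is 0, so the output alone does not say which character(s) glyph 645 stands for.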
=====
On the HarfBuzz build, the "સ્તે" part is different: I can place the cursor separately on સ્ and તે. Doing C-u C-x = on each, I get:
(3.1) સ્
position: 1612 of 3509 (46%), column: 34
character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3978
script: gujarati
syntax: w which means: word
category: .:Base, L:Left-to-right (strong)
to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER SA"
buffer code: #xE0 #xAA #xB8
file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x241)
Character code properties: customize what to show
name: GUJARATI LETTER SA
general-category: Lo (Letter, Other)
decomposition: (2744) ('સ')
There are text properties here:
charset mule-unicode-0100-24ff
(3.2) તે
position: 1614 of 3509 (46%), column: 35
character: ત (displayed as ત) (codepoint 2724, #o5244, #xaa4)
charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3964
script: gujarati
syntax: w which means: word
category: .:Base, L:Left-to-right (strong)
to input: type "C-x 8 RET aa4" or "C-x 8 RET GUJARATI LETTER TA"
buffer code: #xE0 #xAA #xA4
file code: #xE0 #xAA #xA4 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x230)
Character code properties: customize what to show
name: GUJARATI LETTER TA
general-category: Lo (Letter, Other)
decomposition: (2724) ('ત')
There are text properties here:
charset mule-unicode-0100-24ff
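Since C-u C-x = prints glyph codes in hex on one build and in decimal inside the composition vectors on the other, a quick conversion helps line them up. This sketch (the `parse_emacs_hex` helper is hypothetical) checks which of the HarfBuzz-build glyph codes also appear in the m17n composition. Note that the HarfBuzz-build output only reports one glyph per base character, so the virama and ે glyphs are not covered.

```python
# Glyph codes copied from the C-u C-x = output above.
HARFBUZZ = {"સ (U+0AB8)": "#x241", "ત (U+0AA4)": "#x230"}
M17N_CLUSTER = [645, 560, 589]  # glyph codes of the composed સ્તે (decimal)


def parse_emacs_hex(code: str) -> int:
    """Convert an Emacs-style hex literal such as "#x230" to an int."""
    if not code.startswith("#x"):
        raise ValueError(f"not an Emacs hex literal: {code!r}")
    return int(code[2:], 16)


for label, code in HARFBUZZ.items():
    gid = parse_emacs_hex(code)
    where = "also used" if gid in M17N_CLUSTER else "not used"
    print(f"{label}: {code} = {gid}, {where} in the m17n composition")
```

So ત gets the same glyph (560) on both builds, while સ is drawn with glyph 577 on the HarfBuzz build, which does not appear in the m17n composition; the discrepancy therefore appears to lie in how સ, the virama and ે are shaped rather than in ત.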
Once that is known, the text and the font can be checked against vanilla
HarfBuzz (e.g. using the hb-view command-line tool): if it gives the same
rendering, then it is either a HarfBuzz or a font issue; if not, then it
is a bug in the HarfBuzz integration code in Emacs.
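A minimal sketch of that check in Python: it runs `hb-shape` (the text-output companion of `hb-view` that ships with HarfBuzz) on the same string and font and extracts the numeric glyph IDs, so they can be compared with the glyph codes Emacs reports. The font path is a hypothetical placeholder, and the parsing assumes hb-shape's default bracketed output format; for a visual comparison, `hb-view` with the same font and text can render an image instead.

```python
import os
import re
import shutil
import subprocess

# Hypothetical path: replace with the Mukta Vaani font file Emacs reported using.
FONT_FILE = "MuktaVaani-Regular.ttf"
# "નમસ્તે" with escapes, so the virama (U+0ACD) is visible in the source.
TEXT = "\u0aa8\u0aae\u0ab8\u0acd\u0aa4\u0ac7"


def hb_shape_command(font_file: str, text: str) -> list:
    # --no-glyph-names makes hb-shape print numeric glyph IDs, which can be
    # compared directly with the glyph codes shown by C-u C-x = in Emacs.
    return ["hb-shape", "--no-glyph-names", font_file, text]


def parse_glyph_ids(output: str) -> list:
    """Return (glyph_id, cluster) pairs from hb-shape output of the form
    [gid=cluster+advance|gid=cluster@xoff,yoff+advance|...]."""
    return [(int(g), int(c)) for g, c in re.findall(r"(\d+)=(\d+)", output)]


def main() -> None:
    if shutil.which("hb-shape") is None or not os.path.exists(FONT_FILE):
        print("need hb-shape on PATH and FONT_FILE pointing at the font")
        return
    result = subprocess.run(hb_shape_command(FONT_FILE, TEXT),
                            capture_output=True, text=True, check=True)
    print(result.stdout.strip())
    for gid, cluster in parse_glyph_ids(result.stdout):
        print(f"glyph {gid} (#x{gid:x}) in cluster {cluster}")


if __name__ == "__main__":
    main()
```

Printing each ID in hex as well makes it easy to match against values such as #x241 and #x230 above.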