bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)

On Thu, Dec 13, 2018 at 3:31 PM Khaled Hosny <dr.khaled.hosny@gmail.com> wrote:

The HarfBuzz rendering of Arabic is the correct one in this screenshot.

Thanks. So here's the status so far:

Rendering of Namaste as seen in C-h h (M-x view-hello-file):

|          | harfbuzz | m17n    |
|----------+----------+---------|
| Hindi    | correct  | correct |
| Gujarati | wrong    | correct |
| Arabic   | correct  | wrong   |

For debugging such rendering differences, the actual font used by
Emacs for a given part of the text needs to be known,

I am using the Mukta Vaani font for Gujarati. It is a free font and can be downloaded from https://ektype.in/mukta-vaani.html.

The string being rendered is "નમસ્તે".
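To cross-check the `describe-char` output below, here is a short Python sketch that lists each code point of the string in the same decimal/octal/hex and UTF-8 byte notation Emacs uses. The string is written with escapes so the code points are unambiguous; the function name `describe` is just for illustration.

```python
import unicodedata

# "નમસ્તે" written with escapes: NA, MA, SA, VIRAMA, TA, VOWEL SIGN E
TEXT = "\u0aa8\u0aae\u0ab8\u0acd\u0aa4\u0ac7"


def describe(text):
    """Return (char, decimal, octal, hex, UTF-8 bytes, Unicode name) per code point."""
    rows = []
    for ch in text:
        cp = ord(ch)
        rows.append((
            ch,
            cp,
            f"#o{cp:o}",
            f"#x{cp:x}",
            " ".join(f"#x{b:02X}" for b in ch.encode("utf-8")),
            unicodedata.name(ch),
        ))
    return rows


if __name__ == "__main__":
    for row in describe(TEXT):
        print(*row, sep="  ")
```

It shows that the string is six code points: NA, MA, and the four-code-point cluster SA + VIRAMA + TA + VOWEL SIGN E (સ્તે), which is the part the two builds render differently.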

By placing the cursor on each of those characters and doing C-u C-x = (on the m17n build), I get:

(1) ન

             position: 1610 of 3509 (46%), column: 32
            character: ન (displayed as ન) (codepoint 2728, #o5250, #xaa8)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3968
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aa8" or "C-x 8 RET GUJARATI LETTER NA"
          buffer code: #xE0 #xAA #xA8
            file code: #xE0 #xAA #xA8 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x234)

Character code properties: customize what to show
name: GUJARATI LETTER NA
general-category: Lo (Letter, Other)
decomposition: (2728) ('ન')

There are text properties here:
charset              mule-unicode-0100-24ff

(2) મ

             position: 1611 of 3509 (46%), column: 33
            character: મ (displayed as મ) (codepoint 2734, #o5256, #xaae)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x396E
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aae" or "C-x 8 RET GUJARATI LETTER MA"
          buffer code: #xE0 #xAA #xAE
            file code: #xE0 #xAA #xAE (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x239)

Character code properties: customize what to show
name: GUJARATI LETTER MA
general-category: Lo (Letter, Other)
decomposition: (2734) ('મ')

There are text properties here:
charset              mule-unicode-0100-24ff

(3) સ્તે

             position: 1612 of 3509 (46%), column: 34
            character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3978
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER SA"
          buffer code: #xE0 #xAA #xB8
            file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
              display: composed to form "સ્તે" (see below)

Composed with the following character(s) "્તે" using this font:
xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1
by these glyphs:
[0 3 0 645 8 0 11 11 0 [0 0 8]]
[0 3 2724 560 11 1 11 11 1 nil]
[0 3 2759 589 0 -9 -2 16 -11 [-1 0 0]]

Character code properties: customize what to show
name: GUJARATI LETTER SA
general-category: Lo (Letter, Other)
decomposition: (2744) ('સ')

There are text properties here:
charset              mule-unicode-0100-24ff
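The three glyph vectors in (3) are easier to read with their fields named. The sketch below assumes the field order [FROM-IDX TO-IDX C CODE WIDTH LBEARING RBEARING ASCENT DESCENT ADJUSTMENT], which is our reading of the `composition-get-gstring` docstring; the helper names `decode` and `summarize` are hypothetical.

```python
# Assumed field order of one glyph vector in the "by these glyphs:" output.
FIELDS = ("from", "to", "char", "code", "width",
          "lbearing", "rbearing", "ascent", "descent", "adjustment")

# The three glyphs the m17n build reported for the cluster સ્તે (copied verbatim).
M17N_GLYPHS = [
    [0, 3, 0, 645, 8, 0, 11, 11, 0, [0, 0, 8]],
    [0, 3, 2724, 560, 11, 1, 11, 11, 1, None],
    [0, 3, 2759, 589, 0, -9, -2, 16, -11, [-1, 0, 0]],
]


def decode(glyph):
    """Map a glyph vector onto the (assumed) named fields."""
    return dict(zip(FIELDS, glyph))


def summarize(glyphs):
    """One line per glyph: source character, glyph code in hex, advance width."""
    lines = []
    for g in map(decode, glyphs):
        char = f"U+{g['char']:04X}" if g["char"] else "(no char)"
        lines.append(f"{char}  glyph #x{g['code']:x}  width {g['width']}")
    return lines


if __name__ == "__main__":
    print("\n".join(summarize(M17N_GLYPHS)))
```

Under that reading, the second glyph is TA with glyph code #x230, the same code the harfbuzz build reports for ત in (3.2), and the third is the vowel sign E (U+0AC7) with zero advance width.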

=====

On the harfbuzz build, the "સ્તે" part is different: I can place the cursor separately on સ્ and તે. Doing C-u C-x = on each gives:

(3.1) સ્

             position: 1612 of 3509 (46%), column: 34
            character: સ (displayed as સ) (codepoint 2744, #o5270, #xab8)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3978
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET ab8" or "C-x 8 RET GUJARATI LETTER SA"
          buffer code: #xE0 #xAA #xB8
            file code: #xE0 #xAA #xB8 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x241)

Character code properties: customize what to show
name: GUJARATI LETTER SA
general-category: Lo (Letter, Other)
decomposition: (2744) ('સ')

There are text properties here:
charset              mule-unicode-0100-24ff

(3.2) તે

             position: 1614 of 3509 (46%), column: 35
            character: ત (displayed as ત) (codepoint 2724, #o5244, #xaa4)
              charset: mule-unicode-0100-24ff (Unicode characters of the range U+0100..U+24FF.)
code point in charset: 0x3964
               script: gujarati
               syntax: w     which means: word
             category: .:Base, L:Left-to-right (strong)
             to input: type "C-x 8 RET aa4" or "C-x 8 RET GUJARATI LETTER TA"
          buffer code: #xE0 #xAA #xA4
            file code: #xE0 #xAA #xA4 (encoded by coding system utf-8-unix)
              display: by this font (glyph code)
    xft:-unknown-Mukta Vaani-normal-normal-normal-*-18-*-*-*-*-0-iso10646-1 (#x230)

Character code properties: customize what to show
name: GUJARATI LETTER TA
general-category: Lo (Letter, Other)
decomposition: (2724) ('ત')

There are text properties here:
charset              mule-unicode-0100-24ff
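Putting the glyph codes from both builds side by side makes the difference concrete. This is a small sketch with the numbers copied from the outputs above. For the harfbuzz build, it only covers the two positions that were inspected (SA and TA). The virama and vowel-sign positions were not described, so nothing is assumed about them.

```python
# m17n build: one composition covering all of સ્તે.
m17n_cluster = [645, 560, 589]
# harfbuzz build: no composition reported; glyph codes for the inspected
# positions only (SA at 1612, TA at 1614).
harfbuzz_inspected = {"\u0ab8": 0x241, "\u0aa4": 0x230}  # સ, ત

hb_codes = set(harfbuzz_inspected.values())
shared = sorted(set(m17n_cluster) & hb_codes)
only_m17n = sorted(set(m17n_cluster) - hb_codes)
only_harfbuzz = sorted(hb_codes - set(m17n_cluster))

if __name__ == "__main__":
    print("shared:       ", [hex(c) for c in shared])
    print("m17n only:    ", [hex(c) for c in only_m17n])
    print("harfbuzz only:", [hex(c) for c in only_harfbuzz])
```

Only TA's glyph (#x230) is common to both. The m17n composition starts with glyph #x285 where the harfbuzz build shows SA as #x241. Whether #x285 is a conjunct or half form of SA is a guess that shaping the same text with vanilla HarfBuzz would settle.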

then the text and
the font can be checked against vanilla HarfBuzz (e.g. using the hb-view
command-line tool). If it gives the same rendering, then it is either a
HarfBuzz or a font issue; if not, then it is a bug in the HarfBuzz
integration code in Emacs.
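A sketch of that check in Python: it builds hb-view and hb-shape command lines for the cluster and runs them only if the tools are installed. The font path `MuktaVaani-Regular.ttf` is hypothetical (wherever the downloaded font was saved). `--no-glyph-names` makes hb-shape print glyph indices instead of names. We assume these indices are comparable with the glyph codes Emacs reports for an xft font.

```python
import shutil
import subprocess

# Hypothetical location of the downloaded Mukta Vaani font file.
FONT = "MuktaVaani-Regular.ttf"
CLUSTER = "\u0ab8\u0acd\u0aa4\u0ac7"  # સ્તે: SA, VIRAMA, TA, VOWEL SIGN E


def hb_view_cmd(font, text, out="cluster.png"):
    """Command to render TEXT with FONT into a PNG file using vanilla HarfBuzz."""
    return ["hb-view", "--output-format=png", f"--output-file={out}", font, text]


def hb_shape_cmd(font, text):
    """Command to print the shaped glyph sequence as glyph indices."""
    return ["hb-shape", "--no-glyph-names", font, text]


def run(cmd):
    """Run CMD and return its stdout, or None if the tool is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    print(run(hb_shape_cmd(FONT, CLUSTER)))
    run(hb_view_cmd(FONT, CLUSTER))
```

Comparing the printed indices with the glyph codes in the C-u C-x = outputs above shows which of the two cases applies.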

From:	Kaushal Modi
Subject:	bug#33729: 27.0.50; Partial glyphs not rendered for Gujarati with Harfbuzz enabled (renders fine using m17n)
Date:	Thu, 13 Dec 2018 15:43:50 -0500