[Freebangfont-devel] Ankur's bangla unicode statement (was: Update on Na
From: Deepayan Sarkar
Subject: [Freebangfont-devel] Ankur's bangla unicode statement (was: Update on National ...)
Date: Wed, 29 Oct 2003 12:36:26 -0600
User-agent: KMail/1.5.3
Since there has been no adverse reaction, let me post a (proposed) final
version of the draft before I get swamped by assignments again. I doubt this
will have much effect, but Indranil, please forward this (after any suggested
amendments and a bit of polishing up) to whoever it should be forwarded to.
(Note: I have dropped one issue based on some discussion on the address@hidden
list, namely the one regarding the use of the dotted circle.)
<draft>
Problems (or the lack thereof) with the current Unicode model for Bengali
=========================================================================
We (the Ankur Group) feel that the current Unicode encoding model for Bengali
is reasonably good, with no major changes needed (except perhaps some related
to khanda ta encoding, see issue 1 below). However, the encoding of some
features of the language is non-trivial, and the official recommendations of
the Unicode Consortium are in a few cases outdated or non-existent. We urge
the Unicode Consortium to update their webpages / documentation so that
implementers as well as users can be pointed to an 'authentic' source of
information.
Issue 1: khanda ta
------------------
The current recommendation to encode khanda ta is 'ta + hasanta + ZWJ', which
is clearly insufficient. For example,
cons1 + ta + hasanta + ZWJ + cons2 + ikaar
will be rendered wrongly (the reordered ikaar would go before and not after
the khanda ta). A possible fix (already suggested by Paul Nelson) to the
problem is to encode khanda ta as 'ta + hasanta + ZWJ + ZWNJ'. However, we
believe (at least I do; comments, anyone?) that a much more elegant and
natural solution is the one outlined by Andy White in a recent post to the
address@hidden list, and described in
http://www.exnet.btinternet.co.uk/uniprop/KhandaTaEncode.pdf
I realise that this would need all fonts to be modified slightly (though not
OpenType rendering implementations), but given that the current recommended
encoding needs a change anyway, I think the gain in simplicity will be worth
the little extra trouble.
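To make the sequences above concrete, here is a minimal Python sketch that
spells out the code points for the current khanda ta recommendation and for
Paul Nelson's suggested fix. KA is only an arbitrary stand-in for cons1 and
cons2, chosen for illustration, and the sketch says nothing about how any
particular renderer displays the result.

```python
# Code points for the characters named in the text.
TA = "\u09A4"        # BENGALI LETTER TA
HASANTA = "\u09CD"   # BENGALI SIGN VIRAMA (hasanta)
IKAAR = "\u09BF"     # BENGALI VOWEL SIGN I
ZWJ = "\u200D"       # ZERO WIDTH JOINER
ZWNJ = "\u200C"      # ZERO WIDTH NON-JOINER
KA = "\u0995"        # BENGALI LETTER KA (assumed placeholder for cons1/cons2)

KHANDA_TA_CURRENT = TA + HASANTA + ZWJ         # current recommendation
KHANDA_TA_NELSON = TA + HASANTA + ZWJ + ZWNJ   # Paul Nelson's suggested fix


def code_points(text):
    """Return the code points of text as 'U+XXXX' strings, in logical order."""
    return [f"U+{ord(ch):04X}" for ch in text]


# cons1 + khanda ta + cons2 + ikaar, in both encodings
problem_case = KA + KHANDA_TA_CURRENT + KA + IKAAR
nelson_case = KA + KHANDA_TA_NELSON + KA + IKAAR

print(code_points(problem_case))
print(code_points(nelson_case))
```

The only difference between the two strings is the extra ZWNJ after the ZWJ,
which is meant to separate the khanda ta from the following consonant.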
Issue 2: ra + yaphala
---------------------
The proposal by Paul Nelson for handling ra + yaphala / ya-reph also seems to be
quite reasonable to us. This is available at
http://www.unicode.org/review/pr-9.pdf
Issue 3: ra + rikaar
--------------------
During the discussion, we came across another unusual glyph. In the Bengali
word Nairrhit (meaning south west), there's a glyph that looks like
Vocalic R (098B) + reph
and one might wonder whether this should be encoded as
ra + hasanta + Vocalic R [ 09B0 09CD 098B ]
or
ra + Vowel sign Vocalic R (09C3) [ 09B0 09C3 ]
The consensus seems to be that the second solution is more natural and
consistent with the Unicode model; we would like the Unicode Consortium to
explicitly mention and endorse this solution. (Note that this wouldn't need
any change anywhere, except to add a below-base substitution for ra + rikaar
in fonts.)
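For reference, the two candidate encodings can be written out as a short
Python sketch using the code points quoted above. The names option_1,
option_2 and describe are ours, for illustration only; the character names
come from Python's standard unicodedata module.

```python
import unicodedata

RA = "\u09B0"                    # BENGALI LETTER RA
HASANTA = "\u09CD"               # BENGALI SIGN VIRAMA (hasanta)
VOCALIC_R = "\u098B"             # BENGALI LETTER VOCALIC R
VOWEL_SIGN_VOCALIC_R = "\u09C3"  # BENGALI VOWEL SIGN VOCALIC R (rikaar)

option_1 = RA + HASANTA + VOCALIC_R   # [ 09B0 09CD 098B ]
option_2 = RA + VOWEL_SIGN_VOCALIC_R  # [ 09B0 09C3 ], the preferred form


def describe(text):
    """List each character as 'U+XXXX NAME', in logical order."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for label, seq in (("option 1", option_1), ("option 2", option_2)):
    print(label, describe(seq))
```

The second form is one character shorter and treats the glyph as an ordinary
consonant + vowel sign combination, which is why it fits the existing model.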
Issue 4: a + yaphala
--------------------
This could be mentioned for the sake of completeness. The current
recommendation seems adequate.
At this moment, we are not concerned about any issues other than those
outlined above and, in our opinion, a consensus on these would resolve all
current Bengali ambiguities (although if anyone thinks otherwise, we would
definitely like to hear their reasons).
</draft>