From: Deepayan Sarkar
Subject: [Freebangfont-devel] Re: [Ankur-core] RE: Update on National Unicode Workshop & Bengali Proposal on Indic m/l
Date: Sat, 11 Oct 2003 13:32:25 -0500
User-agent: KMail/1.5.3
On Wednesday 08 October 2003 02:33, Sayamindu Dasgupta wrote:
> On Wed, 2003-10-08 at 02:32, Deepayan Sarkar wrote:
> > [replying just to ankur-core]
> >
> > I think there's nothing for us to do really, since we are happy with the
> > way things currently are. It's difficult to say that we oppose such and
> > such proposals even before they are made. So let's just wait and watch
> > for now.
>
> Ditto from me.
> Just as an addition, I would like the Nairrhit issue to be publicly
> documented somewhere.
Now that you mention it, there seems to have been no reaction to Sayamindu's
prior post to the address@hidden list. Maybe we should submit that more
formally, and hope to get better results this time.
(Incidentally, does anyone know if address@hidden mails are archived
anywhere?)
So here's an initial draft, modified and updated from sdg's earlier post.
Comments welcome. (We should run this by Paul before finalizing it.)
----------------------------------------
<draft>
Problems (or the lack thereof) with the current Unicode model for Bengali
=========================================================================
We (the Ankur Group) feel that the current Unicode encoding model for Bengali
is reasonably good and needs no major changes (except perhaps in the encoding
of khanda ta; see issue 1 below). However, encoding some features of the
language is non-trivial, and the official recommendations of the Unicode
Consortium are in a few cases outdated or non-existent. We urge the Unicode
Consortium to update its web pages and documentation so that implementers as
well as users can be pointed to an authoritative source of information.
Issue 1: khanda ta
------------------
The current recommendation to encode khanda ta is 'ta + hasanta + ZWJ', which
is clearly insufficient. For example,
cons1 + ta + hasanta + ZWJ + cons2 + ikaar
is rendered incorrectly: the reordered ikaar is placed before the khanda ta
instead of after it. One possible fix, already suggested by Paul Nelson, is to
encode khanda ta as 'ta + hasanta + ZWJ + ZWNJ'. However, we believe (at least
I do; comments, anyone?) that a much more elegant and natural solution is the
one outlined by Andy White in a recent post to the address@hidden list and
described in
http://www.exnet.btinternet.co.uk/uniprop/KhandaTaEncode.pdf
I realise that this would require all fonts to be modified slightly (though
not OpenType rendering implementations), but given that the current
recommended encoding needs to change anyway, I think the gain in simplicity
will be worth the small extra effort.
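To illustrate the reordering problem, here is a minimal Python sketch of a toy shaper. It is an assumption-laden model, not the algorithm used by Uniscribe or any real OpenType engine: it groups text into clusters with a simplified regular expression, assumes that a ZWNJ after 'hasanta [+ ZWJ]' ends a cluster, and moves a pre-base vowel sign such as ikaar to the front of its cluster. KA and MA are arbitrary stand-ins for cons1 and cons2.

```python
import re

# Toy model (hypothetical; NOT the algorithm of Uniscribe or any real engine)
# of how a shaper groups Bengali text into clusters and then moves a pre-base
# vowel sign such as ikaar to the front of its cluster.
TA, HASANTA, IKAAR = "\u09A4", "\u09CD", "\u09BF"
ZWJ, ZWNJ = "\u200D", "\u200C"
KA, MA = "\u0995", "\u09AE"  # arbitrary stand-ins for cons1 and cons2

CONS = "[\u0995-\u09B9]"   # consonants KA..HA (simplified range)
VSIGN = "[\u09BE-\u09CC]"  # dependent vowel signs AA..AU
# Cluster: consonant (hasanta [ZWJ] consonant)* [hasanta [ZWJ] [ZWNJ]] vowel-sign*
# Assumption: a consonant may not follow ZWNJ inside a cluster, so ZWNJ ends it.
CLUSTER = re.compile(
    f"{CONS}(?:{HASANTA}{ZWJ}?{CONS})*(?:{HASANTA}{ZWJ}?{ZWNJ}?)?{VSIGN}*|.", re.S
)
PRE_BASE = {"\u09BF", "\u09C7", "\u09C8"}  # i-kaar, e-kaar, oi-kaar


def visual_order(text: str) -> str:
    """Move a trailing pre-base vowel sign to the start of its cluster."""
    out = []
    for cluster in CLUSTER.findall(text):
        if len(cluster) > 1 and cluster[-1] in PRE_BASE:
            cluster = cluster[-1] + cluster[:-1]
        out.append(cluster)
    return "".join(out)


def codepoints(s: str) -> str:
    return " ".join(f"{ord(c):04X}" for c in s)


# cons1 + ta + hasanta + ZWJ + cons2 + ikaar
print(codepoints(visual_order(KA + TA + HASANTA + ZWJ + MA + IKAAR)))
# 0995 09BF 09A4 09CD 200D 09AE   (ikaar lands before the khanda ta)

# cons1 + ta + hasanta + ZWJ + ZWNJ + cons2 + ikaar
print(codepoints(visual_order(KA + TA + HASANTA + ZWJ + ZWNJ + MA + IKAAR)))
# 0995 09A4 09CD 200D 200C 09BF 09AE   (ikaar lands after the khanda ta)
```

Under these assumptions, 'ta + hasanta + ZWJ' lets ta and cons2 fall into a single cluster, so ikaar jumps in front of the khanda ta; the extra ZWNJ splits the cluster and ikaar stays with cons2.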
Issue 2: ra + yaphala
---------------------
The proposal by Paul Nelson for handling ra + yaphala / ya-reph also seems
quite reasonable to us.
[What was the rule again? Sayamindu?]
Issue 3: ra + rikaar
--------------------
During the discussion, we came across another unusual glyph. In the Bengali
word Nairrhit (meaning south-west), there is a glyph that looks like
Vocalic R (098B) + reph
and one might wonder whether this should be encoded as
ra + hasanta + Vocalic R [ 09B0 09CD 098B ]
or
ra + Vowel sign Vocalic R (09C3) [ 09B0 09C3 ]
The consensus seems to be that the second solution is more natural and
consistent with the Unicode model, and we would like the Unicode Consortium to
mention and endorse this solution explicitly. (Note that this would not
require any change anywhere, except adding a below-base substitution for
ra + rikaar in fonts.)
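For reference, a short Python sketch using the standard unicodedata module lists the characters in each candidate encoding (the variable names are only for illustration). It also checks that Unicode normalization does not map one sequence onto the other, which is one practical reason for the Consortium to endorse a single form: text typed one way will not match a search typed the other way.

```python
import unicodedata

# The two candidate encodings for the Nairrhit glyph discussed above.
OPTION_1 = "\u09B0\u09CD\u098B"  # ra + hasanta + Vocalic R
OPTION_2 = "\u09B0\u09C3"        # ra + Vowel sign Vocalic R (the form preferred here)

for label, seq in (("option 1", OPTION_1), ("option 2", OPTION_2)):
    print(label)
    for ch in seq:
        # code point, general category (Lo = letter, Mn = combining mark), name
        print(f"  U+{ord(ch):04X} {unicodedata.category(ch)} {unicodedata.name(ch)}")

# Neither sequence normalizes to the other, so they stay distinct strings.
print(unicodedata.normalize("NFC", OPTION_1) == unicodedata.normalize("NFC", OPTION_2))

# Expected output:
# option 1
#   U+09B0 Lo BENGALI LETTER RA
#   U+09CD Mn BENGALI SIGN VIRAMA
#   U+098B Lo BENGALI LETTER VOCALIC R
# option 2
#   U+09B0 Lo BENGALI LETTER RA
#   U+09C3 Mn BENGALI VOWEL SIGN VOCALIC R
# False
```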
Issue 4: use of dotted circle
-----------------------------
Microsoft's Uniscribe, and possibly, through its influence, some other
implementations insert the 'dotted circle' character (U+25CC) before vowel
signs whenever they judge them to be inappropriately placed. I'm not sure
whether the Unicode Consortium has ever recommended this (some people think
not; I don't have the time to search for it, so useful pointers would be very
welcome), but Paul Nelson recently said in a post to address@hidden that he
would need an explicit statement against it from the Unicode Consortium
before he would withdraw this behaviour.
Here's a real example where this creates a problem:
http://www.stat.wisc.edu/~deepayan/Bengali/ucp.jpg
A relevant quote from Andy:
[http://mail.nongnu.org/archive/html/freebangfont-devel/2003-09/msg00025.html]
> One point I must respond to is this. Vowel signs do not need to be rendered
> on a dotted circle. I think it is the fact that Microsoft's implementation
> did this first that a general misconception has occurred.
>
> In fact, Microsoft's USP provides the facility of entering stand alone
> vowel signs by entering a sequence such as SPACE ZWJ IKAAR etc. Unicode
> does not endorse this scheme because they say that entering just IKAAR on
> its own should do the same thing.
[This is related to the larger question of whether a rendering engine should
make statements about the validity of a sequence of characters at all. It
seems to me that this should be the task of a spell-checker or some similar
tool.]
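To make the contrast in the note above concrete, here is a minimal Python sketch of the two approaches. The rule it uses (a vowel sign is 'misplaced' unless it directly follows a consonant) is a deliberately simplified assumption, not Uniscribe's actual logic, and both function names are hypothetical. render_with_dotted_circles changes the text the way the renderer behaviour described above does; find_orphan_vowel_signs only reports positions and leaves the text alone, as a spell-checker might.

```python
# Hypothetical sketch contrasting two ways of handling a "misplaced" vowel sign.
# Assumed rule (simplified, NOT Uniscribe's real logic): a Bengali vowel sign
# is misplaced unless it directly follows a Bengali consonant.
DOTTED_CIRCLE = "\u25CC"  # U+25CC DOTTED CIRCLE


def is_consonant(ch: str) -> bool:
    # Bengali KA..HA (U+0995..U+09B9); unassigned gaps are ignored in this sketch
    return "\u0995" <= ch <= "\u09B9"


def is_vowel_sign(ch: str) -> bool:
    # Bengali dependent vowel signs AA..AU (U+09BE..U+09CC)
    return "\u09BE" <= ch <= "\u09CC"


def find_orphan_vowel_signs(text: str) -> list[int]:
    """Spell-checker style: report indices of misplaced vowel signs, change nothing."""
    return [
        i for i, ch in enumerate(text)
        if is_vowel_sign(ch) and (i == 0 or not is_consonant(text[i - 1]))
    ]


def render_with_dotted_circles(text: str) -> str:
    """Renderer style: silently insert a dotted circle before each misplaced sign."""
    orphans = set(find_orphan_vowel_signs(text))
    return "".join(
        DOTTED_CIRCLE + ch if i in orphans else ch for i, ch in enumerate(text)
    )


KA, IKAAR = "\u0995", "\u09BF"
sample = KA + IKAAR + " " + IKAAR  # a normal syllable, then a lone ikaar
print(find_orphan_vowel_signs(sample))  # [3]
print(render_with_dotted_circles(sample) == KA + IKAAR + " " + DOTTED_CIRCLE + IKAAR)  # True
```

Keeping the check separate from rendering means the stored text is never altered behind the user's back; whether and how to warn is left to the application.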
Issue 5: a + yaphala
--------------------
Could be mentioned for the sake of completeness. The current recommendation
seems adequate.
Apart from those outlined above, we are not currently concerned about any
other issues, and in our opinion a consensus on these would resolve all
current ambiguities in Bengali encoding (although if anyone thinks otherwise,
we would definitely like to hear their reasons).
</draft>