From: Deepayan Sarkar
Subject: [Freebangfont-devel] Re: [Ankur-core] RE: Update on National Unicode Workshop & Bengali Proposal on Indic m/l
Date: Sat, 11 Oct 2003 13:32:25 -0500
User-agent: KMail/1.5.3

On Wednesday 08 October 2003 02:33, Sayamindu Dasgupta wrote:
> On Wed, 2003-10-08 at 02:32, Deepayan Sarkar wrote:
> > [replying just to ankur-core]
> >
> > I think there's nothing for us to do really, since we are happy with the
> > way things currently are. It's difficult to say that we oppose such and
> > such proposals even before they are made. So let's just wait and watch
> > for now.
>
> Ditto from me.
> Just as an addition, I would like the Nairrhit issue to be publicly
> documented somewhere.


Now that you mention it, there seems to have been no reaction to Sayamindu's 
prior post to the address@hidden list. Maybe we should submit that more 
formally, and hope to get better results this time. 

(Incidentally, does anyone know if address@hidden mails are archived 
anywhere ?)

So here's an initial draft (modified and updated from sdg's earlier post); 
comments welcome. (We should run this by Paul before finalizing it.):



           ----------------------------------------

<draft>

Problems (or the lack thereof) with the current Unicode model for Bengali
=========================================================================

We (the Ankur Group) feel that the current Unicode encoding model for Bengali 
is reasonably good, with no major changes needed (except perhaps some related 
to khanda ta encoding, see issue 1 below). However, the encoding of some 
features of the language is non-trivial, and the official recommendations of 
the Unicode Consortium are in a few cases outdated or non-existent. We urge 
the Unicode Consortium to update its webpages / documentation so that 
implementers as well as users can be pointed to an authoritative source of 
information.


Issue 1: khanda ta
------------------

The current recommendation is to encode khanda ta as 'ta + hasanta + ZWJ', 
which is clearly insufficient. For example,

cons1 + ta + hasanta + ZWJ + cons2 + ikaar

will be rendered wrongly (the reordered ikaar would go before and not after 
the khanda ta). A possible fix to the problem (already suggested by Paul 
Nelson) is to encode khanda ta as 'ta + hasanta + ZWJ + ZWNJ'. However, we 
believe (at least I do. Comments anyone ?) that a much more elegant and 
natural solution is the one outlined by Andy White in a recent post to the 
address@hidden list, and described in

http://www.exnet.btinternet.co.uk/uniprop/KhandaTaEncode.pdf

I realise that this would need all fonts to be modified slightly (though not 
OpenType rendering implementations), but given that the current recommended 
encoding needs a change anyway, I think the gain in simplicity will be worth 
the little extra trouble.
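
For reference, here is a small Python sketch that spells out the two 
sequences discussed above as code points. The names `current`, `suggested` 
and `describe` are made up for this illustration, and KA is only an 
arbitrary stand-in for cons1 and cons2. It shows what is stored, not how 
any renderer shapes it, and it does not attempt to reproduce Andy White's 
proposal.

```python
import unicodedata

TA = "\u09A4"       # BENGALI LETTER TA
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA (hasanta)
IKAAR = "\u09BF"    # BENGALI VOWEL SIGN I
ZWJ = "\u200D"      # ZERO WIDTH JOINER
ZWNJ = "\u200C"     # ZERO WIDTH NON-JOINER
KA = "\u0995"       # arbitrary stand-in for cons1 / cons2 (assumption)

# Currently recommended khanda ta sequence.
current = TA + HASANTA + ZWJ
# Variant with a trailing ZWNJ, as suggested by Paul Nelson.
suggested = TA + HASANTA + ZWJ + ZWNJ
# The problematic context: cons1 + ta + hasanta + ZWJ + cons2 + ikaar.
problem = KA + current + KA + IKAAR


def describe(seq):
    """Return one 'U+XXXX NAME' string per character in seq."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in seq]


for label, seq in (("current", current),
                   ("with ZWNJ", suggested),
                   ("problem case", problem)):
    print(label)
    for line in describe(seq):
        print("   ", line)
```
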


Issue 2: ra + yaphala
---------------------

The proposal by Paul Nelson for handling ra + yaphala / ya-reph also seems to 
be quite reasonable to us. 

[What was the rule again ? Sayamindu ?]


Issue 3: ra + rikaar
--------------------

During the discussion, we came across another unusual glyph. In the Bengali 
word Nairrhit (meaning south-west), there's a glyph that looks like 

Vocalic R (098B) + reph

and one might wonder whether this should be encoded as 

ra + hasanta + Vocalic R  [ 09B0 09CD 098B ]

or 

ra + Vowel sign Vocalic R (09C3) [ 09B0 09C3 ] 

The consensus seems to be that the second solution is more natural and 
consistent with the Unicode model. We would like the Unicode Consortium to 
explicitly mention and endorse this solution. (Note that this wouldn't need 
any change anywhere, except to add a below-base substitution for ra+rikaar in 
fonts.)
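
To make the difference concrete, here is a short Python sketch of the two 
candidate encodings. The names `option_1`, `option_2` and `codepoints` are 
only illustrative. As far as I can tell, Unicode normalization does not 
unify the two, so they stay distinct strings for searching and collation, 
which is one more reason to have a single endorsed form.

```python
import unicodedata

RA = "\u09B0"                # BENGALI LETTER RA
HASANTA = "\u09CD"           # BENGALI SIGN VIRAMA (hasanta)
LETTER_VOCALIC_R = "\u098B"  # BENGALI LETTER VOCALIC R
SIGN_VOCALIC_R = "\u09C3"    # BENGALI VOWEL SIGN VOCALIC R (rikaar)

# First candidate: ra + hasanta + Vocalic R
option_1 = RA + HASANTA + LETTER_VOCALIC_R
# Second candidate (the one we prefer): ra + Vowel sign Vocalic R
option_2 = RA + SIGN_VOCALIC_R


def codepoints(text):
    """Format a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


print(codepoints(option_1))  # 09B0 09CD 098B
print(codepoints(option_2))  # 09B0 09C3

# Compare the two after canonical composition (NFC).
same_after_nfc = (unicodedata.normalize("NFC", option_1)
                  == unicodedata.normalize("NFC", option_2))
print("equal after NFC:", same_after_nfc)
```
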




Issue 4: use of dotted circle
-----------------------------

Microsoft's Uniscribe, and possibly through its influence some other 
implementations, inserts the 'dotted circle' character before vowel signs 
whenever it thinks that they are inappropriately placed. I'm not sure whether 
this has ever been recommended by the Unicode Consortium (some people think 
not; I don't have the time to search for it, and useful pointers would be 
very welcome), but Paul Nelson recently said in a post to address@hidden that 
he would need an explicit statement from the Unicode Consortium against it if 
he is to withdraw this behaviour.

Here's a real example where this creates a problem:

http://www.stat.wisc.edu/~deepayan/Bengali/ucp.jpg

A relevant quote from Andy:

[http://mail.nongnu.org/archive/html/freebangfont-devel/2003-09/msg00025.html]

> One point I must respond to is this. Vowel signs do not need to be rendered
> on a dotted circle. I think it is the fact that Microsoft's implementation
> did this first that a general misconception has occurred.
>
> In fact, Microsoft's USP provides the facility of entering stand alone
> vowel signs by entering a sequence such as SPACE ZWJ IKAAR etc. Unicode
> does not endorse this scheme because they say that entering just IKAAR on
> its own should do the same thing.

[This is related to the larger issue of whether the rendering engine should 
make statements about the validity of a set of characters. Seems to me that 
it should be the task of the spell-checker or some similar tool.]
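
As a rough illustration of what such a separate tool might look like, here 
is a minimal Python sketch that reports Bengali vowel signs not immediately 
preceded by a Bengali consonant. The function names and the rule itself are 
assumptions made up for this example; the rule is deliberately crude (it 
ignores nukta and the consonants outside U+0995..U+09B9) and is not 
something the Unicode standard specifies.

```python
import unicodedata


def is_vowel_sign(ch):
    """True for characters named 'BENGALI VOWEL SIGN ...'."""
    return unicodedata.name(ch, "").startswith("BENGALI VOWEL SIGN")


def is_consonant(ch):
    """Crude test: an assigned letter in the range U+0995..U+09B9."""
    return 0x0995 <= ord(ch) <= 0x09B9 and unicodedata.category(ch) == "Lo"


def stray_vowel_signs(text):
    """Return the indices of vowel signs that do not follow a consonant.

    This only reports positions; it leaves the text untouched, so a
    renderer remains free to display the characters as they were typed.
    """
    hits = []
    for i, ch in enumerate(text):
        if not is_vowel_sign(ch):
            continue
        prev = text[i - 1] if i > 0 else ""
        if not (prev and is_consonant(prev)):
            hits.append(i)
    return hits


IKAAR = "\u09BF"  # BENGALI VOWEL SIGN I
KA = "\u0995"     # BENGALI LETTER KA

print(stray_vowel_signs(IKAAR))        # [0]
print(stray_vowel_signs(KA + IKAAR))   # []
print(stray_vowel_signs(" " + IKAAR))  # [1]
```
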



Issue 5: a + yaphala
--------------------

Could be mentioned for the sake of completeness. The current recommendation 
seems adequate.


Other than those outlined above, we have no other concerns at this moment 
and, in our opinion, a consensus on these would resolve all current Bengali 
ambiguities (although if anyone thinks otherwise, we would definitely like to 
hear their reasons).

</draft>




