bug-gnu-libiconv

Re: [bug-gnu-libiconv] [PATCH] armscii8 bugfix


From: Gayane Sargssian
Subject: Re: [bug-gnu-libiconv] [PATCH] armscii8 bugfix
Date: Mon, 12 Jul 2010 00:51:11 +0500
User-agent: Freenet.am WebMail

Hello, thank you for the analysis. I am sorry for re-posting multiple times:
the mail archive claims to update every 2 hours, so when I couldn't find my
mail after more than 20 hours, I decided to re-post it from another mail
account.

So, those were my conversions:

0xA1    0xFFFD  ->      0x00A9
0xA2    0x0587  ->      0x00A7
0xA8    0x2014  ->      0x0587

0x0587  0xA2    ->      0xA8
0x2014  0xA8    ->      0x00
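
To make the byte-to-Unicode half of the list easier to compare, here is a
minimal sketch in Python. It assumes the three columns of the first group read
as "ARMSCII-8 byte, current Unicode mapping, proposed Unicode mapping". The
names CURRENT, PROPOSED, decode_byte and changed_bytes are made up for this
illustration; they are not libiconv identifiers, and this is not the patch
itself.

```python
# Hypothetical illustration of the three byte-to-Unicode rows listed above.
# Assumption: columns are (ARMSCII-8 byte, current mapping, proposed mapping).
CURRENT = {0xA1: 0xFFFD, 0xA2: 0x0587, 0xA8: 0x2014}
PROPOSED = {0xA1: 0x00A9, 0xA2: 0x00A7, 0xA8: 0x0587}


def decode_byte(byte, table):
    """Return the Unicode character that `table` assigns to one byte."""
    return chr(table[byte])


def changed_bytes():
    """Return the bytes whose mapping differs between the two tables."""
    return sorted(b for b in CURRENT if CURRENT[b] != PROPOSED[b])


if __name__ == "__main__":
    for b in changed_bytes():
        print(f"0x{b:02X}: U+{CURRENT[b]:04X} -> U+{PROPOSED[b]:04X}")
```

With these tables, byte 0xA8 decodes to the em-dash under CURRENT and to the
ligature U+0587 under PROPOSED, which is the difference discussed below.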

>What are your references?

My main reference is the True Type font "Arial Armenian" [1].
It was a very popular font in Armenia in pre-unicode era in MS Windows for
ARMSCII-8 encoding. Maybe, as many proprietary things, the font is not fully
standard-compliant. I can't be sure.

>The mapping of 0xA1 should, according to [1], be the ARMENIAN ETERNITY SIGN.
>But this sign is not in Unicode. It does not seem appropriate to be to use
>U+00A9 COPYRIGHT SIGN or (like done in some encodings [2]) the
>U+2741 EIGHT PETALLED OUTLINED BLACK FLORETTE for it. So, I'd better
>leave it as is.

Hmm, in the font mentioned above, 0xA1 is used as the copyright sign.
There is some discussion here [2] about that sign, and it is not mapped to
anything in AST 34.002 draft [3]. Anyhow, I agree not to map it to the
copyright sign (and not because I prefer copyleft ;)

>About the mapping of 0xA2, [1] says: "The code value A2 was used for
>encoding the Armenian ligature ew (used as a symbol), but was later
>replaced by the section sign punctuation. Some Armenian fonts display this
>ligature at the position of the ASCII ampersand symbol..."

Yes, actually I've replaced it with the "section sign punctuation" (0x00A7)
in my patch, as that is what the "Arial Armenian" font uses.

>About the mapping of 0xA8, [1] says that it maps to em-dash (U+2014) but
>then says U+2015 (which is HORIZONTAL BAR). In any case, I don't see a
>reason to map it to U+0587 ARMENIAN SMALL LIGATURE ECH YIWN.

The thing is, if A2 was the "ARMENIAN SMALL LIGATURE ECH YIWN" and was later
replaced by the "section sign punctuation", then what happened to the
"ARMENIAN SMALL LIGATURE ECH YIWN"? In the mentioned font, it is mapped to
0xA8. So, maybe that one was also replaced?

However, maybe the issue is that the popular fonts in Armenia are not
standard-compliant, and now we, who want to convert the old documents to
Unicode, get some errors, especially with the widely used ARMENIAN SMALL
LIGATURE ECH YIWN, which becomes an em-dash after conversion.

We need to find the source of the Wikipedia statement that 0xA2 was
replaced with the section sign.

[1] http://www.armsite.com/software/fonts/arialarmenian.zip
[2] http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML025/1232.html
[3] http://tools.ietf.org/html/draft-melikyan-armenian-charsets-00

[AST 34.002 ?] http://users.freenet.am/~vm/AST/002-ArmSCII-8-Encoding.PDF






