freetype-devel

Re: [ft-devel] How to map a char to glyph with pcf


From: Werner LEMBERG
Subject: Re: [ft-devel] How to map a char to glyph with pcf
Date: Thu, 22 Mar 2018 07:13:26 +0100 (CET)

>>   CHARSET_REGISTRY: `KSC5601.1987'
>>   CHARSET_ENCODING: `0'
>>
>> in the debug messages.  The characters in the font are indeed encoded
>> as plain KSC5601.
> 
> Is this very different from KS C 5601-1992, aka FT_ENCODING_JOHAB
> <https://www.freetype.org/freetype2/docs/reference/ft2-base_interface.html#FT_ENCODING_JOHAB>?

Yes, see below.

> I also see KS C 5601-1987 referenced as WANSUNG
> <https://en.wikipedia.org/wiki/Unified_Hangul_Code>. I am confused.

Regarding the *character sets*, the 1987 and 1992 editions are the
same.  However, the 1992 edition comes with an additional annex that
describes an alternative encoding.

The standard Korean encoding was WANSUNG (a Korean word meaning
`pre-composed'), also known as EUC-KR.  It contains two character
sets, covering ASCII, Hanja (Korean CJK), and 2,350 pre-composed
Hangul characters:

  ASCII (or KS C 5636-1993)   0x21-0x7E (one-byte codes)

  KS C 5601-1992              0xA1-0xFE × 0xA1-0xFE (two-byte codes)

An alternative encoding (defined in the 1992 annex) was JOHAB (a
Korean word meaning `combine'), allowing access to all 11,172 possible
Jamo (i.e., Korean syllable elements) combinations.  Note that most
combinations have only nonsense readings; compare this to all two- and
three-letter combinations of the English alphabet, where only a small
fraction are real words.

  ASCII (or KS C 5636-1993)   0x21-0x7E (one-byte codes)

  pre-composed Hangul         0x84-0xD3 × 0x41-0x7E (two-byte codes)
                              0x84-0xD3 × 0x81-0xFE

  symbols and Hanja           0xD8-0xDE × 0x31-0x7E
                              0xD8-0xDE × 0x91-0xFE
                              0xE0-0xF9 × 0x31-0x7E
                              0xE0-0xF9 × 0x91-0xFE

Note that JOHAB was virtually unused.

BTW, here lies the main difference between Unicode v1 and later
versions: The Korean standards body enforced the encoding of *all* Jamo
combinations; to do that, it was necessary to change the Hangul
character codes, making v1 incompatible with later versions.

Today, it would be unthinkable to add more than 11,000 pre-composed
combinations to Unicode if a set of fewer than 70 components would
suffice.


    Werner
