
Re: [Freetype] How do I tell if a font supports a given Unicode set of


From: Antoine Leca
Subject: Re: [Freetype] How do I tell if a font supports a given Unicode set of characters?
Date: Tue, 03 Jul 2001 15:09:19 +0200

Paul Pedriana wrote:
> 
> Given a font, I want to know if it will be able to draw
> (e.g.) Cyrillic characters with it. [...]
> Is there a mechanism within Freetype to help out telling
> what character sets a font supports? Or do I just pick
> some representative characters and call FT_Get_Char_Index
> and check the return value?

The latter.
Specifically with Cyrillic, there is a known hack (which traces
back to Windows 3.1), for TrueType fonts only, that makes some
fonts that otherwise look like "normal" Latin1 ones be
detected as Cyrillic ones. My advice is to drop entirely
any support for this kind of hack, which is not used very much
these days, and to stick with "standard" Unicode-encoded
fonts.
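
For what it is worth, the "pick some representative characters"
approach could be sketched roughly as follows. This is only an
illustration: the sample code points, the names count_covered,
glyph_lookup_fn, ft_lookup and latin_only_lookup, and the
USE_FREETYPE switch are my own assumptions, not part of FreeType.
The only FreeType call is FT_Get_Char_Index, which returns 0 when
the active charmap has no glyph for the code; the sketch assumes
the face has a Unicode charmap selected.

```c
#include <stddef.h>

#ifdef USE_FREETYPE              /* hypothetical build switch */
#include <ft2build.h>
#include FT_FREETYPE_H
#endif

/* Returns a nonzero glyph index if the font maps this code point, else 0. */
typedef unsigned long (*glyph_lookup_fn)(void *font, unsigned long codepoint);

/* Count how many of the sample code points the font maps to a glyph. */
static size_t count_covered(glyph_lookup_fn lookup, void *font,
                            const unsigned long *samples, size_t n)
{
    size_t covered = 0;
    for (size_t i = 0; i < n; i++)
        if (lookup(font, samples[i]) != 0)
            covered++;
    return covered;
}

/* Arbitrary sample of basic Cyrillic letters (an assumption, tune to taste). */
static const unsigned long cyrillic_samples[] = {
    0x0410, 0x0411, 0x0416, 0x042F, 0x0430, 0x0436, 0x044F
};

#ifdef USE_FREETYPE
/* Adapter around the real FreeType call; font is an FT_Face. */
static unsigned long ft_lookup(void *font, unsigned long codepoint)
{
    return FT_Get_Char_Index((FT_Face)font, (FT_ULong)codepoint);
}
#endif

/* Stand-in lookup for trying the logic without a font:
   pretends only code points below U+0100 have glyphs. */
static unsigned long latin_only_lookup(void *font, unsigned long codepoint)
{
    (void)font;
    return codepoint < 0x100 ? codepoint + 1 : 0;
}
```

With a real face you would pass ft_lookup and the FT_Face, then
decide yourself how many of the samples must be present before
calling the font "Cyrillic-capable".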

 
> Unrelated question: How do you tell from a C program what
> the code page is of the input?

You should use some kind of heuristic (that is, give a
probabilistic weight to every possibility, and pick the
most probable).
If you are interested in this area, please take a look inside
Mozilla: it has a mechanism to do that, at least for Japanese
and probably for other East Asian encodings too (it is written in
C++, and is not famous for being the most beautiful program ever
written, so you are warned).
Another possibility, also related to CJK, probably richer but
one that I do not know well enough, is to consult the reference on
this topic, i.e. Ken Lunde's CJKV Information Processing.
<URL:http://www.oreilly.com/people/authors/lunde/>
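
To make the "weight every possibility, pick the most probable"
idea concrete, here is a deliberately tiny sketch. It is not how
Mozilla does it: the three candidates, the weights, and every
name in it (guess_encoding, candidates, the score_* functions)
are assumptions chosen for illustration, and the UTF-8 check
only looks at lead/continuation byte structure. A real detector
would use byte-frequency statistics for each candidate.

```c
#include <stddef.h>

typedef double (*scorer_fn)(const unsigned char *buf, size_t len);

struct candidate { const char *name; scorer_fn score; };

static int all_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (buf[i] >= 0x80) return 0;
    return 1;
}

/* Structural UTF-8 check: valid lead bytes followed by the right
   number of continuation bytes (does not reject every ill-formed
   sequence, e.g. some overlong forms). */
static int utf8_structure_ok(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t extra;
        if (c < 0x80) extra = 0;
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c >= 0xE0 && c <= 0xEF) extra = 2;
        else if (c >= 0xF0 && c <= 0xF4) extra = 3;
        else return 0;
        if (extra > len - i - 1) return 0;      /* truncated sequence */
        for (size_t k = 1; k <= extra; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return 0;
        i += extra + 1;
    }
    return 1;
}

/* The weights below are arbitrary illustrative values. */
static double score_ascii(const unsigned char *buf, size_t len)
{
    return all_ascii(buf, len) ? 0.9 : 0.0;
}

static double score_utf8(const unsigned char *buf, size_t len)
{
    if (!utf8_structure_ok(buf, len)) return 0.0;
    return all_ascii(buf, len) ? 0.5 : 0.8;
}

static double score_legacy(const unsigned char *buf, size_t len)
{
    (void)buf; (void)len;
    return 0.2;             /* any byte string "decodes": weak fallback */
}

static const struct candidate candidates[] = {
    { "US-ASCII",     score_ascii  },
    { "UTF-8",        score_utf8   },
    { "legacy 8-bit", score_legacy },
};

/* Elect the candidate with the highest weight. */
static const char *guess_encoding(const unsigned char *buf, size_t len)
{
    const char *best = NULL;
    double best_score = -1.0;
    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        double s = candidates[i].score(buf, len);
        if (s > best_score) { best_score = s; best = candidates[i].name; }
    }
    return best;
}
```

Adding a candidate means adding one scoring function and one
table entry; the election loop stays the same.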


> If I write a program and start
> receiving characters (multi-byte in particular), I want to know
> what code page they are so I can convert them to Unicode
> and get on with processing. Microsoft's VC++ has a function
> called _getmbcp (in mbctype.h) that does this, but I don't
> think this exists on other platforms.

This gives you only the codepage with which the current system
operates. Unfortunately, you have no way to know, for a given
file (or stream), whether it is really encoded according to this
very codepage, or whether it is something else entirely.

The only special case here is IBM systems (using EBCDIC), where
each file is normally stored with an external attribute which is
precisely the EBCDIC codepage of the information inside the file.

Other than that, as David pointed out, you can ask for the
current default locale settings, i.e. the string returned by
  old = strdup(setlocale(LC_CTYPE, NULL));  /* remember the current value */
  def = strdup(setlocale(LC_CTYPE, ""));    /* switch to the environment's default */
  setlocale(LC_CTYPE, old);                 /* restore the previous value */
  free(old);
  return def;
Often, on Unix, the returned string will look like
  lg_CN[.codeset]
where lg is a two-letter language code (allow space for more than
two letters), CN is a two-letter country code, and codeset is an
optional field that describes the encoding used, i.e. the same
kind of information _getmbcp() returns.
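
If you want to pull the codeset out of such a string, something
along these lines might do. It is a sketch only: struct
locale_parts, copy_span and split_locale are names I made up,
the buffer sizes are arbitrary, and the handling of a trailing
"@modifier" is an assumption based on the usual POSIX-style
naming (names such as "C" or "POSIX" simply end up with empty
country and codeset fields).

```c
#include <stddef.h>
#include <string.h>

struct locale_parts {
    char language[16];   /* "lg", with room for more than two letters */
    char territory[16];  /* "CN", empty if absent */
    char codeset[32];    /* empty if absent */
};

/* Copy at most dstsize-1 bytes of src[0..n) and NUL-terminate. */
static void copy_span(char *dst, size_t dstsize, const char *src, size_t n)
{
    if (n >= dstsize)
        n = dstsize - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Split "lg_CN[.codeset][@modifier]" into its parts. */
static void split_locale(const char *name, struct locale_parts *out)
{
    size_t lang_len = strcspn(name, "_.@");
    const char *p = name + lang_len;

    copy_span(out->language, sizeof out->language, name, lang_len);
    out->territory[0] = '\0';
    out->codeset[0] = '\0';

    if (*p == '_') {                       /* country code follows */
        size_t n = strcspn(p + 1, ".@");
        copy_span(out->territory, sizeof out->territory, p + 1, n);
        p += 1 + n;
    }
    if (*p == '.') {                       /* codeset follows */
        size_t n = strcspn(p + 1, "@");
        copy_span(out->codeset, sizeof out->codeset, p + 1, n);
    }
}
```

For example, "fr_FR.ISO-8859-1" gives language "fr", country
"FR" and codeset "ISO-8859-1". An empty codeset tells you
nothing, so you still need a fallback in that case.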


Hope it helps,
Antoine


