freetype-devel

Re: [ft-devel] Re: Preliminary support to CMap files on CID-keyed fonts.


From: Werner LEMBERG
Subject: Re: [ft-devel] Re: Preliminary support to CMap files on CID-keyed fonts.
Date: Sun, 17 Dec 2006 08:55:23 +0100 (CET)

It seems that I've forgotten to answer this mail.  So here it is.
Because of a typo in the Cc list this mail never reached the
freetype-devel list, so I'm quoting it in full.

> I summarize a few technical issues to be discussed before designing
> the behaviour of FT_Attach_File() for CMaps.
>
> My understanding is that FT_Attach_File()/FT_Attach_Stream() for a
> CMap works like the PostScript operator "composefont".  There are
> several levels of support: genuine CMaps, ToUnicode CMaps, and
> rearranged-font CMaps.

Hmm, I was rather thinking about the most trivial case only: read a
CID-keyed font, then read a CMap separately to create an input
character code -> glyph index mapping.  AFAIK, there is no format
difference between genuine and ToUnicode CMaps; however, for FreeType,
ToUnicode CMaps are useless.

> However, Hidetoshi wrote:
>
> http://lists.gnu.org/archive/html/freetype-devel/2003-08/msg00002.html
> >4. We cannot read any ToUnicode mapping files or any rearranged
> >   font files.  And I will not support these files because there
> >   are some difficulties.
>
> So, ToUnicode and rearranged-font CMaps can be left for the future
> (no support is OK at present).

Exactly.

> 1. about genuine CMap
> ---------------------
>
> a) minimum level
>
> FT_Attach_File()/FT_Attach_Stream() handle a single CMap at a time
> and don't resolve "usecmap" dependencies.  Information about the
> attached CMaps is stored in the face object as a linked list.
> "usecmap" can be emulated by attaching several CMaps in turn (e.g.,
> FT_Attach_File(EUC-H), then FT_Attach_File(EUC-V)).  When
> FT_Attach_File() receives a CMap referring to another CMap that has
> not been attached yet, it returns an error with the name of the CMap
> that should be attached.  Finding a CMap file/stream for a given name
> is not FT2's task.

This sounds sensible.
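A minimal sketch of the bookkeeping behind option 1-a, for
illustration only.  AttachedCMaps, CMapNode, and attach_cmap() are
hypothetical names, not FreeType API; parsing the CMap itself is
omitted, and the name referenced by "usecmap" is assumed to have been
extracted already.  Attaching a CMap whose "usecmap" target is not in
the list fails and reports the missing name, as described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one attached CMap; a real implementation
   would also hold the parsed code-space and CID ranges. */
typedef struct CMapNode {
  char             name[64];
  struct CMapNode* next;
} CMapNode;

/* Hypothetical per-face list of attached CMaps (option 1-a). */
typedef struct {
  CMapNode* head;
} AttachedCMaps;

static int is_attached(const AttachedCMaps* list, const char* name) {
  for (const CMapNode* n = list->head; n != NULL; n = n->next)
    if (strcmp(n->name, name) == 0)
      return 1;
  return 0;
}

/* Attach one CMap.  `usecmap` is the name referenced by its "usecmap"
   operator, or NULL if it has none.  Returns 0 on success, 1 if the
   referenced CMap is not attached yet (its name is copied into
   `missing`), and -1 on allocation failure. */
static int attach_cmap(AttachedCMaps* list, const char* name,
                       const char* usecmap,
                       char* missing, size_t missing_size) {
  if (usecmap != NULL && !is_attached(list, usecmap)) {
    if (missing_size > 0) {
      strncpy(missing, usecmap, missing_size - 1);
      missing[missing_size - 1] = '\0';
    }
    return 1;
  }
  CMapNode* node = calloc(1, sizeof *node);   /* zeroed: name ends in NUL */
  if (node == NULL)
    return -1;
  strncpy(node->name, name, sizeof node->name - 1);
  node->next = list->head;
  list->head = node;
  return 0;
}

static void free_cmaps(AttachedCMaps* list) {
  while (list->head != NULL) {
    CMapNode* next = list->head->next;
    free(list->head);
    list->head = next;
  }
}
```

With this sketch, attaching EUC-V first fails and reports "EUC-H";
attaching EUC-H and then EUC-V succeeds, which is the "multiple
attaching" emulation of "usecmap" mentioned in the quote.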

> b) with fallback by implicit lookup
>
> When FT_Attach_File() receives a CMap that refers to another CMap via
> "usecmap", FT_Attach_File() looks up the required CMap in the same
> directory as the given CMap.  If the required CMap is not found in
> the scanned directory (or is found but invalid), FT_Attach_File()
> returns an error with the name of the CMap that should be attached.
>
> FT_Attach_Stream() does not look up the required CMap implicitly; it
> only checks whether the required CMap is already attached to the
> given FT_Face object.

Hmm.  I prefer a).
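For comparison, the implicit lookup of option 1-b boils down to
building a path in the directory of the CMap being attached.  This is
a hypothetical helper, assuming '/' as the only directory separator;
it is not part of FreeType.  A stream has no such path, which is why
FT_Attach_Stream() could only check the already attached CMaps.

```c
#include <stdio.h>
#include <string.h>

/* Build the path where option 1-b would look for the CMap `name`:
   the directory part of `cmap_path` followed by `name`.  Returns 0 on
   success, -1 if `out` is too small. */
static int sibling_cmap_path(const char* cmap_path, const char* name,
                             char* out, size_t out_size) {
  const char* slash   = strrchr(cmap_path, '/');
  int         dir_len = slash ? (int)(slash - cmap_path + 1) : 0;
  int         n       = snprintf(out, out_size, "%.*s%s",
                                 dir_len, cmap_path, name);
  return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, attaching a (hypothetical) "/usr/share/cmap/EUC-V" would
make the function try "/usr/share/cmap/EUC-H"; if that file is missing
or invalid, the error with the CMap name from option 1-a is returned.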

> c) with DB of available CMap
>
> A list of available CMap pathnames/streams/objects is stored in the
> FT_Face object (or in FT_Library).  Both FT_Attach_File() and
> FT_Attach_Stream() resolve dependencies using this list.
>
> # I think: b > a >> c
> #
> # 1-b might be what users expect, but the behaviour of 1-a is
> # easier to understand.

This is something for a poll...
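A sketch of how option 1-c could resolve a "usecmap" chain from a
registry of available CMaps.  CMapEntry and resolve_usecmap_chain()
are hypothetical; the registry only records each CMap's name and its
"usecmap" target.  Note that the registry must be complete before
resolution starts, which is exactly the prescan problem for PDF
mentioned below.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: an available CMap and the name it
   references via "usecmap" (NULL if none). */
typedef struct {
  const char* name;
  const char* usecmap;
} CMapEntry;

static const CMapEntry* find_entry(const CMapEntry* db, size_t count,
                                   const char* name) {
  for (size_t i = 0; i < count; i++)
    if (strcmp(db[i].name, name) == 0)
      return &db[i];
  return NULL;
}

/* Fill `order` with the CMaps to load for `name`, base CMap first.
   Returns the number of entries, -1 if a CMap is not in the registry,
   or -2 if the chain is longer than `max` (this also catches
   "usecmap" cycles). */
static int resolve_usecmap_chain(const CMapEntry* db, size_t count,
                                 const char* name,
                                 const char** order, int max) {
  int depth = 0;
  while (name != NULL) {
    const CMapEntry* e = find_entry(db, count, name);
    if (e == NULL)
      return -1;
    if (depth == max)
      return -2;
    order[depth++] = e->name;
    name = e->usecmap;
  }
  /* Reverse so that the base CMap comes first. */
  for (int i = 0, j = depth - 1; i < j; i++, j--) {
    const char* tmp = order[i];
    order[i] = order[j];
    order[j] = tmp;
  }
  return depth;
}
```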

> # 1-c is difficult to use with incremental parsing of PDF.
> # If a PDF includes embedded CMap objects, 1-c requires
> # prescanning the PDF for embedded CMaps.
>
>
> 2. about rearranged font
> ------------------------
> A rearranged font is a composite font that combines multiple faces
> by mapping a specific face to specified codepoints (e.g., Courier for
> US-ASCII codepoints and Ryumin-Light--EUC-H for 16-bit EUC-JP
> codepoints).  I think there's no suitable existing API for such a
> composition procedure.

Yep.

> a) No real support; only indicate that it is a rearranged font.
>
> None of the existing FT2 API functions handle rearranged fonts.  When
> a rearranged font file is passed to FT_New_Face() etc., they return
> an error indicating "this is a rearranged font".
>
> b) Add a new function to resolve the component font.
>
> Add a new rearranged-font-specific function that just resolves a
> character code into a component font and a string for that component
> font.  The recursive resolution, loading of the component font, and
> retrieval of the glyph data should be controlled by the application,
> not by FT2.
>
> # I think: b > a.

I have no opinion here since I've never seen that in real use.  It's
probably only relevant for CJK countries.  Patches welcome :-)
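To make option 2-b concrete, here is a sketch of such a resolving
function, using the Courier / Ryumin-Light--EUC-H example from the
quote.  The Component and Resolved types and resolve_rearranged() are
hypothetical, and describing each component by a byte length plus one
allowed byte range is a simplifying assumption; real rearranged fonts
are defined in PostScript.  As proposed, the function only returns
the component font and its byte string; loading that font is left to
the application.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component description: codes of `nbytes` bytes whose
   every byte lies in [lo, hi] are handled by `font`. */
typedef struct {
  const char* font;
  uint8_t     lo, hi;
  int         nbytes;
} Component;

/* Hypothetical result: the component font and the bytes to pass to it. */
typedef struct {
  const char* font;
  uint8_t     bytes[4];
  int         nbytes;
} Resolved;

/* Resolve `code` (1 to 4 bytes, big-endian) to a component.
   Returns 0 on success, -1 if no component covers the code. */
static int resolve_rearranged(const Component* comps, size_t count,
                              uint32_t code, Resolved* out) {
  for (size_t i = 0; i < count; i++) {
    const Component* c = &comps[i];
    if (c->nbytes < 1 || c->nbytes > 4)
      continue;
    if (c->nbytes < 4 && (code >> (8 * c->nbytes)) != 0)
      continue;                        /* code too long for this component */
    int ok = 1;
    for (int k = 0; k < c->nbytes && ok; k++) {
      uint8_t b = (uint8_t)(code >> (8 * (c->nbytes - 1 - k)));
      ok = (b >= c->lo && b <= c->hi);
    }
    if (!ok)
      continue;
    out->font   = c->font;
    out->nbytes = c->nbytes;
    for (int k = 0; k < c->nbytes; k++)
      out->bytes[k] = (uint8_t)(code >> (8 * (c->nbytes - 1 - k)));
    return 0;
  }
  return -1;
}
```

With components { "Courier", 0x20..0x7E, 1 byte } and
{ "Ryumin-Light--EUC-H", 0xA1..0xFE, 2 bytes }, code 0x41 resolves to
Courier with the single byte 0x41, while 0xA4A2 resolves to the EUC-H
font with the bytes 0xA4 0xA2.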

> 3. about ToUnicode CMap
> -----------------------
>
> In a PDF document with an embedded TrueType font, the encoding is
> usually Identity-H, as in CJK-TrueType-Font--Identity-H.  The
> character code is the glyph ID of the subsetted TrueType font, which
> is not a standard character encoding at all.  Probably many people
> want to use the embedded font as a font object with a charmap in a
> standard character code such as UCS-2 or UTF-16 (although I'm not
> sure whether such in-depth work is the role of FT2).

Good question.  Since handling of PDF is beyond the scope of FreeType
we should ask people who have written programs which manipulate or
display PDF documents.
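To illustrate why such text carries no character semantics: with
Identity-H, each 2-byte big-endian code maps to the CID of the same
value, and, assuming the CIDFont uses /CIDToGIDMap /Identity, that CID
is directly the glyph index in the subsetted TrueType font.  The
helper name below is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a string shown with Identity-H into glyph indices (assuming
   an identity CID-to-GID mapping).  Returns the number of glyph
   indices written, or -1 if the length is odd or `out` is too small. */
static long identity_h_to_gids(const uint8_t* s, size_t len,
                               uint16_t* out, size_t out_cap) {
  if (len % 2 != 0 || len / 2 > out_cap)
    return -1;
  for (size_t i = 0; i < len / 2; i++)
    out[i] = (uint16_t)((s[2 * i] << 8) | s[2 * i + 1]);
  return (long)(len / 2);
}
```

The result is just a sequence of glyph indices; getting back to
Unicode requires the ToUnicode CMap discussed next.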

> The ToUnicode CMap was introduced for this purpose: it converts a
> glyph ID to a UTF-16 string (rather than a single UTF-16 codepoint).
>
> There are two points we must be careful about.
>
> P1: The UTF-16 string/codepoint assigned to a glyph is not
>     guaranteed to be unique.  A single UTF-16 codepoint may be
>     assigned to multiple glyph IDs.
>
>     a) FT_Attach_File()/FT_Attach_Stream() can detect a ToUnicode
>        mapping file but do not accept it.  FT2 provides a new
>        function to read a ToUnicode mapping file/stream and leaves
>        the handling of the ToUnicode mapping to each application.

Sounds good.
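For option P1-a, the reading side could start with something like the
following sketch, which parses a single "bfchar" entry such as
"<0003> <0041>" into the source code and the destination UTF-16 code
units.  All names are hypothetical; "bfrange" entries and the
surrounding PostScript syntax are deliberately not handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_value(int c) {
  if (c >= '0' && c <= '9')
    return c - '0';
  c = tolower(c);
  if (c >= 'a' && c <= 'f')
    return c - 'a' + 10;
  return -1;
}

/* Read a hex string like "<00660066>" at *p (leading spaces skipped)
   into `buf` (at most `cap` bytes).  Advances *p past the '>' and
   returns the byte count, or -1 on a syntax error. */
static int read_hex_string(const char** p, uint8_t* buf, int cap) {
  const char* s = *p;
  int n = 0, hi = -1;
  while (isspace((unsigned char)*s))
    s++;
  if (*s != '<')
    return -1;
  for (s++; *s != '>'; s++) {
    if (isspace((unsigned char)*s))
      continue;
    int v = hex_value((unsigned char)*s);
    if (v < 0)
      return -1;                /* also catches an unterminated string */
    if (hi < 0) {
      hi = v;
      continue;
    }
    if (n == cap)
      return -1;
    buf[n++] = (uint8_t)(hi << 4 | v);
    hi = -1;
  }
  if (hi >= 0)
    return -1;                  /* odd number of hex digits */
  *p = s + 1;
  return n;
}

/* Parse one bfchar entry.  Returns the number of UTF-16 units stored
   in `units`, or -1 on error. */
static int parse_bfchar(const char* line, uint32_t* code,
                        uint16_t* units, int max_units) {
  uint8_t     src[4], dst[64];
  const char* p  = line;
  int         ns = read_hex_string(&p, src, 4);
  int         nd = read_hex_string(&p, dst, (int)sizeof dst);
  if (ns <= 0 || nd <= 0 || nd % 2 != 0 || nd / 2 > max_units)
    return -1;
  *code = 0;
  for (int i = 0; i < ns; i++)
    *code = *code << 8 | src[i];
  for (int i = 0; i < nd / 2; i++)
    units[i] = (uint16_t)(dst[2 * i] << 8 | dst[2 * i + 1]);
  return nd / 2;
}
```

An entry like "<0005> <00660066>" yields two UTF-16 units (the string
"ff"), which is exactly the kind of entry that P2 below is about.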

>     b) Among the glyph IDs mapped to a specified codepoint, the one
>        with the lowest glyph ID in the ToUnicode mapping is used.
>        When FT_Attach_File()/FT_Attach_Stream() receive a ToUnicode
>        mapping file, they return an error (or issue a warning).
>
>     c) Provide a new function, something like FT_Load_Char_Variant(),
>        with which we can specify the entry.
>
> # I think: a > b >> c

I agree.
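A sketch of what option P1-b would do, which also shows its drawback.
ToUnicodeEntry and lowest_gid_for() are hypothetical; the entries are
assumed to be single-codepoint ToUnicode mappings already parsed into
(glyph ID, codepoint) pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* One single-codepoint ToUnicode entry: glyph ID -> Unicode codepoint. */
typedef struct {
  uint16_t gid;
  uint32_t codepoint;
} ToUnicodeEntry;

/* Map `codepoint` back to a glyph ID by picking the lowest glyph ID
   among all entries for it.  Returns 1 and sets *gid on a match, 0
   otherwise.  A linear scan keeps the sketch short. */
static int lowest_gid_for(const ToUnicodeEntry* map, size_t count,
                          uint32_t codepoint, uint16_t* gid) {
  int found = 0;
  for (size_t i = 0; i < count; i++) {
    if (map[i].codepoint != codepoint)
      continue;
    if (!found || map[i].gid < *gid)
      *gid = map[i].gid;
    found = 1;
  }
  return found;
}
```

If glyphs 12 and 7 both map to U+0041, only glyph 7 is reachable
through such a charmap; glyph 12 silently disappears.  That loss is
the oversimplification warned about below.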

> # I'm afraid that b may be mistaken for a pragmatic solution, but it
> # would mislead developers into oversimplified implementations.
>
> P2: A UTF-16 string entry cannot be treated as a codepoint in a
>     multibyte encoding, because the codepoints in the string may
>     also appear in other entries.
>
>     a) FT_Attach_File()/FT_Attach_Stream() accept only ToUnicode
>        mappings to single UTF-16 codepoints.
>
>     b) Only single-codepoint UTF-16 entries are used.  Entries
>        mapping to UTF-16 strings are ignored with a warning.
>
>     c) Provide a new function, something like FT_Load_String().
>
> # I think: b > a >> c

OK.
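The filter needed for option P2-b can be sketched as follows.  The
function name is hypothetical, and treating a well-formed surrogate
pair as one codepoint (it encodes a single Unicode character) is an
assumption about what "UTF-16 codepoint entries" should include.

```c
#include <stdint.h>

/* Return the Unicode codepoint if the UTF-16 units form exactly one
   codepoint (a lone BMP unit or one valid surrogate pair); return -1
   for string entries such as "ff" and for malformed UTF-16.  Under
   P2-b, a -1 entry would be skipped with a warning. */
static int32_t single_codepoint(const uint16_t* units, int n) {
  if (n == 1 && (units[0] < 0xD800 || units[0] > 0xDFFF))
    return units[0];
  if (n == 2 && units[0] >= 0xD800 && units[0] <= 0xDBFF &&
      units[1] >= 0xDC00 && units[1] <= 0xDFFF)
    return 0x10000 + (((int32_t)units[0] - 0xD800) << 10) +
           ((int32_t)units[1] - 0xDC00);
  return -1;
}
```

For example, <0041> gives U+0041, <D840DC0B> gives U+2000B, and
<00660066> is rejected as a string entry.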

> # Most of the FT2 API takes a 32-bit codepoint, not a string.
> # A function like FT_Load_String() looks quite exceptional and too
> # specific for this purpose.


    Werner



