Re: [ft-devel] [FYI] MingLiU identifier


From: mpsuzuki
Subject: Re: [ft-devel] [FYI] MingLiU identifier
Date: Mon, 22 Nov 2010 13:48:01 +0900

Hi David and Werner,

I've checked the sfnt table checksums of the fonts
shipped with Microsoft Windows XP SP2, Microsoft Windows 7,
Apple Mac OS X 10.3/4/5/6 and Debian GNU/Linux (squeeze),
and of the fonts on the Dynalab 150 TrueType font pack CD
and Ricoh's font pack CD. The checksums of the tricky
fonts I have are as follows:

MingLiU
cvt  length=0x000002e4 checkSum=0x05bcf058
fpgm length=0x000087c4 checkSum=0x28233bf1
prep length=0x000001e1 checkSum=0xa344a1ea (1995)
prep length=0x000001e1 checkSum=0xa344a1eb (1996)

DFKaiShu
cvt  length=0x00000350 checkSum=0x11e5ead4
fpgm length=0x00009063 checkSum=0x5a30ca3b
prep length=0x0000007e checkSum=0x13a42602

HuaTian{Kai|Song}Ti
cvt  length=0x00000008 checkSum=0xfffbfffc
prep length=0x00000008 checkSum=0x70020112
fpgm length=0x0000bea2 checkSum=0x9c9e48b8 (HuaTianKaiTi)
fpgm length=0x00017c39 checkSum=0x0a5a0483 (HuaTianSongTi)

Among the fonts I've checked, none has a cvt/fpgm/prep
table whose length and checksum collide with those of the
tricky fonts above, except for the very short "prep"
tables in HuaTian{Song|Kai}Ti (only 8 bytes). These
collide with Founder's fonts: many fonts by Founder
have an 8-byte "prep" table.
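
For reference, the checksum stored in the sfnt table
directory is a plain sum of the table's big-endian 32-bit
words, with the table padded by zero bytes to a multiple of
4 and overflow wrapping modulo 2^32. A minimal sketch is
below; the function name is hypothetical and this is not
code from FreeType or from my patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum the table as big-endian 32-bit words.  A trailing     */
/* partial word is treated as if padded with zero bytes.     */
/* Unsigned overflow wraps modulo 2^32, as the format needs. */
uint32_t
sfnt_table_checksum( const unsigned char*  data,
                     size_t                length )
{
  uint32_t  sum = 0;
  size_t    i;


  assert( data != NULL || length == 0 );

  for ( i = 0; i < length; i += 4 )
  {
    uint32_t  word = 0;
    size_t    k;


    for ( k = 0; k < 4; k++ )
    {
      word <<= 8;
      if ( i + k < length )
        word |= data[i + k];
    }

    sum += word;
  }

  return sum;
}
```

Since it is only a sum, reordering the 32-bit words of a
table does not change the value, which is one reason to
compare the length too.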

--

However, as you were concerned, the sfnt table length and
checksum are not a good identifier. Needless to say, some
fonts in the same family (e.g. DejaVu Serif Regular/Bold/
Italic/BoldItalic) share the same cvt/fpgm/prep, but less
intuitive collisions are found as well.

I attached the lists of collisions found on Microsoft
Windows 7. An "fpgm" table of 1310 bytes with checksum
0xcb26a26f is found in 22 fonts: Bodoni, Bradley Hand,
Gill Sans, Rockwell, ... It is hard to regard them
as one group.

If we use a combination of tables, the collisions are
reduced, but not eliminated: the combination of 3 tables
(cvt/fpgm/prep) still has up to 5 collisions. Using more
tables (e.g. cvt/fpgm/prep/hhea) improves this further,
but the only TrueType tables that are difficult to modify
when subsetting for the Type42 format are cvt/fpgm/prep,
at most.
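
To make the idea of a multi-table identifier concrete, here
is a minimal sketch that compares a set of (tag, length,
checksum) entries against the table directory of a font and
reports no match, a partial match or an exact match. All
type and function names are hypothetical; they are not part
of FreeType.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One sfnt table directory entry reduced to what we compare. */
typedef struct  TableSig_
{
  char      tag[5];      /* 4-character tag plus NUL, e.g. "cvt " */
  uint32_t  length;
  uint32_t  checksum;

} TableSig;

typedef enum  MatchLevel_
{
  MATCH_NONE,      /* no entry of the rule is found in the font  */
  MATCH_PARTIAL,   /* some, but not all, entries are found       */
  MATCH_EXACT      /* every entry of the rule is found           */

} MatchLevel;

int
sig_equal( const TableSig*  a,
           const TableSig*  b )
{
  return strcmp( a->tag, b->tag ) == 0 &&
         a->length   == b->length      &&
         a->checksum == b->checksum;
}

MatchLevel
match_signature( const TableSig*  rule,
                 size_t           n_rule,
                 const TableSig*  font,
                 size_t           n_font )
{
  size_t  hits = 0;
  size_t  i, j;


  assert( rule != NULL || n_rule == 0 );
  assert( font != NULL || n_font == 0 );

  /* count the rule entries that appear in the font */
  for ( i = 0; i < n_rule; i++ )
    for ( j = 0; j < n_font; j++ )
      if ( sig_equal( &rule[i], &font[j] ) )
      {
        hits++;
        break;
      }

  if ( n_rule > 0 && hits == n_rule )
    return MATCH_EXACT;

  return hits > 0 ? MATCH_PARTIAL : MATCH_NONE;
}
```

With the MingLiU values listed above, a font carrying the
1996 "prep" table would be reported as a partial match
against a rule built from the 1995 tables.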

--

My previous patch is designed to identify a tricky
font by 1 table, and it won't misidentify
MingLiU/DFKaiShu/HuaTian. But if I support a user-defined
blacklist of tricky fonts, identification by a
combination of multiple tables would be expected.

Werner, could you comment on the introduction of a new
API that lets FT2 clients control the blacklist of
tricky fonts? The API I drafted is as follows:

  typedef enum  FT_Tricky_Level_
  {
    FT_TRICKY_NO,       /* match nothing in blacklist */
    FT_TRICKY_EXACT,    /* exact match in blacklist */
    FT_TRICKY_PARTIAL   /* partial match in blacklist */ 
  } FT_Tricky_Level;

  typedef enum  FT_Face_Tricky_CheckType_
  {
    FT_TRICKY_FACE,        /* check all rules */
    FT_TRICKY_FAMILYNAME,  /* check family name only */
    FT_TRICKY_SFNT_TLS     /* check TrueType tag/len/checkSum only */
  } FT_Tricky_CheckType;

  /*
   * rule would be:
   *   "FAMILYNAME=MingLiU"
   *   "SFNT_TAG_LEN_SUM_NONAME=fpgm,0x000087c4,0x28233bf1"
   */

  FT_EXPORT_DEF( FT_Error )
  FT_Library_TrickyFontList_Add( FT_Library  library,
                                 FT_String*  rule );

  FT_EXPORT_DEF( FT_Error )
  FT_Library_TrickyFontList_Remove( FT_Library  library,
                                    FT_String*  rule );

  FT_EXPORT_DEF( FT_Error )
  FT_Library_CheckTricky( FT_Library           library,
                          FT_Tricky_CheckType  type,
                          FT_Pointer           data,
                          FT_Tricky_Level*     level );
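
To illustrate the two rule strings shown in the comment
above, here is a minimal sketch of how such a rule could be
parsed. The Rule structure and parse_rule() are hypothetical
helpers for illustration only; they are not the parser in my
patch and not a FreeType API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum  RuleKind_
{
  RULE_INVALID,
  RULE_FAMILYNAME,   /* "FAMILYNAME=<name>"                        */
  RULE_SFNT          /* "SFNT_TAG_LEN_SUM_NONAME=<tag>,<len>,<sum>" */

} RuleKind;

typedef struct  Rule_
{
  RuleKind       kind;
  char           family[64];
  char           tag[5];
  unsigned long  length;
  unsigned long  checksum;

} Rule;

/* Parse one rule string into `out'.  On failure RULE_INVALID  */
/* is returned and the other fields of `out' are unspecified.  */
RuleKind
parse_rule( const char*  text,
            Rule*        out )
{
  static const char  fam[]  = "FAMILYNAME=";
  static const char  sfnt[] = "SFNT_TAG_LEN_SUM_NONAME=";


  assert( text != NULL && out != NULL );
  memset( out, 0, sizeof ( *out ) );   /* kind = RULE_INVALID */

  if ( strncmp( text, fam, sizeof ( fam ) - 1 ) == 0 )
  {
    const char*  name = text + sizeof ( fam ) - 1;


    if ( *name == '\0' || strlen( name ) >= sizeof ( out->family ) )
      return RULE_INVALID;

    strcpy( out->family, name );
    out->kind = RULE_FAMILYNAME;
  }
  else if ( strncmp( text, sfnt, sizeof ( sfnt ) - 1 ) == 0 )
  {
    const char*  rest     = text + sizeof ( sfnt ) - 1;
    int          consumed = 0;


    /* tag of up to 4 characters, then two hexadecimal numbers; */
    /* %lx accepts an optional "0x" prefix                      */
    if ( sscanf( rest, "%4[^,],%lx,%lx%n",
                 out->tag, &out->length, &out->checksum,
                 &consumed ) != 3     ||
         rest[consumed] != '\0'       ||
         strlen( out->tag ) != 4      )
      return RULE_INVALID;

    out->kind = RULE_SFNT;
  }

  return out->kind;
}
```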

My current working source increases the library size by
roughly 9-13k bytes for GNU/Linux on i386 (see the file
sizes below). If I remove the code that lets FT2 clients
manipulate the blacklist, the increase would be smaller.
Of course, the feature must be configurable so that it
can be enabled or disabled.

-rw-r--r-- 1 sssa sssa 2792896 2010-11-19 16:09 root-original/lib/libfreetype.a
-rw-r--r-- 1 sssa sssa 2805862 2010-11-19 16:11 root-mpsuzuki/lib/libfreetype.a

-rwxr-xr-x 1 sssa sssa 1910587 2010-11-19 16:09 root-original/lib/libfreetype.so.6.6.1
-rwxr-xr-x 1 sssa sssa 1919853 2010-11-19 16:11 root-mpsuzuki/lib/libfreetype.so.6.6.1

Another solution would be... tt_check_trickyness() is
currently defined as an internal function, but it could be
changed into a callback function that an FT2 client can
register with FT_Library.
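
A rough sketch of the callback idea follows. Everything in
it is hypothetical (the Library structure, the callback
signature and the function names); it only shows the shape
of the registration, not FreeType's actual internals.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback: return non-zero if the font is tricky. */
typedef int  (*TrickyCheckFunc)( const char*  family_name,
                                 void*        user_data );

/* Stand-in for the library object; NOT the real FT_Library. */
typedef struct  Library_
{
  TrickyCheckFunc  check;
  void*            user_data;

} Library;

/* Built-in check, standing in for the hardwired blacklist. */
int
default_check( const char*  family_name,
               void*        user_data )
{
  (void)user_data;

  return family_name != NULL && strcmp( family_name, "MingLiU" ) == 0;
}

/* Example client callback: `user_data' is the single family */
/* name that the client wants to be treated as tricky.       */
int
client_check( const char*  family_name,
              void*        user_data )
{
  return family_name != NULL && user_data != NULL        &&
         strcmp( family_name, (const char*)user_data ) == 0;
}

void
library_init( Library*  lib )
{
  assert( lib != NULL );

  lib->check     = default_check;
  lib->user_data = NULL;
}

/* Register a client callback; passing NULL restores the default. */
void
library_set_tricky_check( Library*         lib,
                          TrickyCheckFunc  func,
                          void*            user_data )
{
  assert( lib != NULL );

  lib->check     = func ? func : default_check;
  lib->user_data = func ? user_data : NULL;
}

int
library_is_tricky( const Library*  lib,
                   const char*     family_name )
{
  assert( lib != NULL && lib->check != NULL );

  return lib->check( family_name, lib->user_data );
}
```

With this shape the blacklist handling moves to the client,
so the library itself needs no rule parser.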


If further discussion is needed, I will remove the code
that lets FT2 clients manipulate the blacklist, and commit
only the MingLiU identification by a hardwired blacklist
of sfnt table tag/length/checksum.

Regards,
mpsuzuki

On Wed, 17 Nov 2010 22:41:10 +0900
address@hidden wrote:

>On Wed, 17 Nov 2010 07:55:17 -0500
>David Bevan <address@hidden> wrote:
>>> In general scope, I think, you raised a concern that
>>> the checksum in TTF header is too simple (it's a sum
>>> of 32-bit values of the table) to guarantee the identity.
>>> It's reasonable.
>>
>>My concern is that the (small) tables may actually be
>>the same in a variety of fonts.
>
>Ah, I see. Your concern is not the conflict of the
>checksum (or other hash value), but the conflict of
>the table itself. To identify more correctly, other
>tables (that are preserved in the subsetting) should
>be compared... Am I understanding correctly?
>
>>> If I check the fonts bundled with Microsoft Windows
>>> and Mac OS (Classic & OS X) and distributed in GNU/Linux
>>> distributions, and I find no conflict, is it a sufficient
>>> guarantee?
>>
>>That seems reasonable.
>
>OK, I will try.
>
>>> If not, I cannot access wider coverage
>>> of the fonts, so, the possible solution would be...
>>> 
>>> 1) identify by family name comparison is used too,
>>>    and add a fallback by sfnt table checksum.
>>> 
>>> 2) in addition to the tag-name of the table and
>>>    the checksum, the length should be checked.
>>>
>>> 3) if it's still insufficient... should we use
>>>    our own hash value instead of the checksum
>>>    in sfnt header.
>>
>>Only 1) would address my concern.
>
>1) is already implemented in my proof of concept patch.
>
>http://lists.nongnu.org/archive/html/freetype-devel/2010-08/msg00019.html
>
>Regards,
>mpsuzuki

Attachment: collision-1-sfnt-sum-win7.txt.rz
Description: Binary data

Attachment: collision-3-sfnt-sum-win7.txt.rz
Description: Binary data

Attachment: collision-4-sfnt-sum-win7.txt.rz
Description: Binary data

Attachment: mps20101119a.diff.rz
Description: Binary data
