freetype-devel

[Devel] Advice on Font Subsetting


From: Salman Khilji
Subject: [Devel] Advice on Font Subsetting
Date: Wed, 24 Mar 2004 18:49:48 -0800
User-agent: KMail/1.5.1

Okay.  So I have equipped myself with knowledge on how to subset Type1 fonts 
for embedding in PDF.  For this, I consulted the dvipdfmx project.

Unfortunately, I don't want to use the code from dvipdfmx because it 1) depends
on kpathsea, and 2) relies on the existence of a corresponding tfm file: some
metrics are read from the tfm file instead of the afm file.  This would make
my project dependent on an installed TeX distribution, which is what I want to
avoid.  Moreover, FreeType is in my opinion much easier to read than dvipdfmx.

So I want to use basically the same logic to create a subsetted Type 1 font
using FreeType.  I have familiarized myself with how FT parses Type 1 fonts
and would like some advice, please.

1)  First of all, we need an array that stores the indices of the used
 glyphs. We can use something like:

FT_Byte *used_glyphs = calloc( face->num_glyphs, sizeof ( FT_Byte ) );

Then we call FT_Get_Char_Index() on each character code of the text to get the
corresponding glyph index, and mark that index in the used_glyphs array.  Any
glyph whose entry in used_glyphs is still 0 can be thrown out.
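
A minimal sketch of the marking step.  To keep it standalone it goes through
a hypothetical lookup callback; with FreeType that callback would simply wrap
FT_Get_Char_Index( face, charcode ), and num_glyphs would be
face->num_glyphs.  The names glyph_lookup_fn, mark_used_glyphs and
demo_lookup are made up for illustration.  Glyph 0 (.notdef) is always kept,
since that is the index returned for unmapped character codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback mapping a character code to a glyph index.
   With FreeType it would wrap FT_Get_Char_Index( face, charcode ). */
typedef unsigned int (*glyph_lookup_fn)( void *ctx, unsigned long charcode );

/* Stand-in lookup for demonstration only: 'A'..'Z' -> 1..26, else 0. */
static unsigned int
demo_lookup( void *ctx, unsigned long charcode )
{
    (void)ctx;
    if ( charcode >= 'A' && charcode <= 'Z' )
        return (unsigned int)( charcode - 'A' + 1 );
    return 0;
}

/* Marks every glyph referenced by `text` in `used` (one byte per glyph).
   Returns the number of marked glyphs, .notdef included. */
static size_t
mark_used_glyphs( glyph_lookup_fn lookup, void *ctx,
                  const unsigned long *text, size_t text_len,
                  unsigned char *used, size_t num_glyphs )
{
    size_t i, count = 0;

    assert( lookup && used && num_glyphs > 0 );

    used[0] = 1;                      /* always keep .notdef */

    for ( i = 0; i < text_len; i++ )
    {
        unsigned int gid = lookup( ctx, text[i] );

        if ( gid < num_glyphs )       /* ignore out-of-range indices */
            used[gid] = 1;
    }

    for ( i = 0; i < num_glyphs; i++ )
        count += used[i] ? 1 : 0;

    return count;
}
```

The array can be filled incrementally, one call per text run, as long as it
starts out zeroed.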


2)  Then you basically start reading the pfb file and copying it into another
buffer as is.  dvipdfmx uses a memory-based buffer for this purpose.  Should I
use a file-based buffer instead?  pfb files can potentially be large, so I am
concerned about wasting memory here.  On the other hand, a file-based buffer
requires creating a temp file, which the client has to read back.  Which one
should I use?
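
For comparison, here is a rough sketch of what the memory-based variant could
look like: a buffer that doubles its capacity on demand.  The type subset_buf
and its functions are hypothetical names, not taken from FreeType or
dvipdfmx, and the initial capacity of 4096 bytes is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable output buffer; zero-initialize before first use. */
typedef struct
{
    unsigned char *data;
    size_t         len;   /* bytes currently stored */
    size_t         cap;   /* bytes allocated        */
} subset_buf;

/* Appends n bytes from src.  Returns 0 on success, -1 on failure
   (allocation error or size overflow); the buffer stays valid then. */
static int
subset_buf_append( subset_buf *b, const void *src, size_t n )
{
    assert( b && ( src || n == 0 ) );

    if ( n > b->cap - b->len )
    {
        size_t         new_cap = b->cap ? b->cap : 4096;
        unsigned char *p;

        while ( new_cap - b->len < n )
        {
            if ( new_cap > (size_t)-1 / 2 )
                return -1;            /* doubling would overflow */
            new_cap *= 2;
        }

        p = realloc( b->data, new_cap );
        if ( !p )
            return -1;

        b->data = p;
        b->cap  = new_cap;
    }

    if ( n )
        memcpy( b->data + b->len, src, n );
    b->len += n;
    return 0;
}

/* Releases the storage and resets the buffer to its empty state. */
static void
subset_buf_free( subset_buf *b )
{
    free( b->data );
    b->data = NULL;
    b->len  = 0;
    b->cap  = 0;
}
```

Because only used glyphs are copied, the output should not grow much beyond
the size of the input font, which bounds the memory cost of this approach.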


3)  The parser needs modification.  We have functions like
T1_Skip_PS_Token() and T1_Skip_Spaces().  These functions advance the cursor
from its current location.  Functions that read an integer token seem to
consume the current token and advance the cursor as well.  I would have to
modify them so that, instead of discarding the bytes they pass over, they
store those bytes in the buffer from Step 2).

Right now I am thinking that I might need mods like this:

cur_before = parser->root.cursor;
T1_Skip_PS_Token( parser );
cur_after  = parser->root.cursor;
/* copy the bytes the parser just consumed */
Copy_Into_Buffer( buffer, cur_before, cur_after - cur_before );

However, I don't feel like scattering all this subsetting-related logic
throughout the parser.  Any suggestions from the FreeType gurus?  Maybe we
need to create a specialized parser for subsetting (say T1_ParserEmbed).  It
could have its own parsing functions that store the input in the buffer and
do whatever else is needed.
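
One way to keep the copying logic out of the individual parsing functions is
a single wrapper that runs any skip function and copies whatever it consumed.
The sketch below uses a deliberately simplified stand-in parser (tokens are
just whitespace-delimited, which is not how real PostScript tokenizing
works); mini_parser, out_buf and skip_and_copy are hypothetical names, not
FreeType API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a parser state: a cursor and an end pointer. */
typedef struct
{
    const unsigned char *cursor;
    const unsigned char *limit;
} mini_parser;

/* Fixed-capacity output area supplied by the caller. */
typedef struct
{
    unsigned char *data;
    size_t         len;
    size_t         cap;
} out_buf;

typedef void (*skip_fn)( mini_parser *p );

static int
is_space( unsigned char c )
{
    return c == ' ' || c == '\t' || c == '\r' ||
           c == '\n' || c == '\f' || c == '\0';
}

static void
mini_skip_spaces( mini_parser *p )
{
    while ( p->cursor < p->limit && is_space( *p->cursor ) )
        p->cursor++;
}

/* Skips leading whitespace plus one whitespace-delimited token. */
static void
mini_skip_token( mini_parser *p )
{
    mini_skip_spaces( p );
    while ( p->cursor < p->limit && !is_space( *p->cursor ) )
        p->cursor++;
}

/* Runs `skip` and appends the bytes it consumed to `out`.
   Returns 0 on success, -1 if `out` is too small (the cursor has
   still advanced in that case). */
static int
skip_and_copy( mini_parser *p, skip_fn skip, out_buf *out )
{
    const unsigned char *before = p->cursor;
    size_t               n;

    assert( p && skip && out && out->data );

    skip( p );
    n = (size_t)( p->cursor - before );

    if ( n > out->cap - out->len )
        return -1;

    memcpy( out->data + out->len, before, n );
    out->len += n;
    return 0;
}
```

Content that should be dropped (an unused glyph, say) is then skipped by
calling the skip function directly, without the wrapper.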



4)  If I encounter an /Encoding entry that happens to be an array, then I
must throw out all the entries for glyphs that are not being used.  I will
need access to the used_glyphs array for this purpose.  Where should we store
the used_glyphs array?  Should I modify the FT_FaceRec_ struct?
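
One alternative to touching FT_FaceRec_ would be a separate subsetting
context that is passed to the functions that need it (the face's public
`generic' field could also carry such a pointer).  The sketch below assumes
such a context and writes a subset /Encoding array in the usual Type 1 form;
subset_ctx, enc_entry and write_subset_encoding are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One slot of the original encoding vector. */
typedef struct
{
    const char  *name;    /* glyph name, or NULL if the code is unused */
    unsigned int glyph;   /* glyph index belonging to that name        */
} enc_entry;

/* Hypothetical subsetting state kept outside of the face object. */
typedef struct
{
    const unsigned char *used_glyphs;   /* one flag per glyph index */
    size_t               num_glyphs;
} subset_ctx;

/* Writes the subset encoding as text into out (capacity cap, including
   the terminating NUL).  Returns the length written, or -1 if it does
   not fit. */
static int
write_subset_encoding( const subset_ctx *ctx, const enc_entry enc[256],
                       char *out, size_t cap )
{
    size_t len;
    int    n, code;

    assert( ctx && ctx->used_glyphs && enc && out );

    /* Default every slot to .notdef, then override the used ones. */
    n = snprintf( out, cap, "/Encoding 256 array\n"
                  "0 1 255 {1 index exch /.notdef put} for\n" );
    if ( n < 0 || (size_t)n >= cap )
        return -1;
    len = (size_t)n;

    for ( code = 0; code < 256; code++ )
    {
        const enc_entry *e = &enc[code];

        if ( !e->name || e->glyph >= ctx->num_glyphs ||
             !ctx->used_glyphs[e->glyph] )
            continue;                 /* drop entries of unused glyphs */

        n = snprintf( out + len, cap - len,
                      "dup %d /%s put\n", code, e->name );
        if ( n < 0 || (size_t)n >= cap - len )
            return -1;
        len += (size_t)n;
    }

    n = snprintf( out + len, cap - len, "readonly def\n" );
    if ( n < 0 || (size_t)n >= cap - len )
        return -1;
    len += (size_t)n;

    return (int)len;
}
```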


5)  The private dictionary needs to be decrypted.  FT currently does this.
After decryption, we throw away the glyphs that we don't need.  Everything
else in the private dictionary is copied verbatim.  We then need to re-encrypt
the whole thing.  dvipdfmx does this.  I will need to add a t1_encrypt
function like the one in the dvipdfmx project.  This is basically just a few
lines of code.
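
For reference, a sketch of the eexec cipher as described in Adobe's Type 1
font format specification (initial key 55665, multiplier 52845, increment
22719).  This is not the dvipdfmx code; the function names are made up for
this example.  Note that the encrypted section is expected to start with four
extra bytes before the actual data, which the caller has to prepend.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define T1_EEXEC_KEY  55665u   /* initial key for the eexec section */
#define T1_CRYPT_C1   52845u
#define T1_CRYPT_C2   22719u

/* Encrypts len bytes from in to out; in and out may be the same buffer. */
static void
t1_crypt_encrypt( uint16_t key, const unsigned char *in,
                  unsigned char *out, size_t len )
{
    uint16_t r = key;
    size_t   i;

    assert( ( in && out ) || len == 0 );

    for ( i = 0; i < len; i++ )
    {
        unsigned char c = (unsigned char)( in[i] ^ ( r >> 8 ) );

        /* the key stream depends on the cipher byte just produced */
        r = (uint16_t)( ( c + r ) * T1_CRYPT_C1 + T1_CRYPT_C2 );
        out[i] = c;
    }
}

/* Inverse of t1_crypt_encrypt; also works in place. */
static void
t1_crypt_decrypt( uint16_t key, const unsigned char *in,
                  unsigned char *out, size_t len )
{
    uint16_t r = key;
    size_t   i;

    assert( ( in && out ) || len == 0 );

    for ( i = 0; i < len; i++ )
    {
        unsigned char c = in[i];

        out[i] = (unsigned char)( c ^ ( r >> 8 ) );
        r = (uint16_t)( ( c + r ) * T1_CRYPT_C1 + T1_CRYPT_C2 );
    }
}
```

The charstrings inside the private dictionary use the same cipher with a
different initial key, so if the retained charstrings are copied as raw bytes
they should not need to be re-encrypted individually.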


6)  After the private dictionary, we copy everything verbatim to the buffer.
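
One detail to keep in mind if the result is written back out as a pfb file:
the encrypted portion shrinks, so the segment headers cannot be copied
verbatim and have to be regenerated with the new lengths.  A sketch, assuming
the usual pfb layout (marker byte 0x80, a type byte, then a 32-bit
little-endian length for ASCII and binary segments); the function names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum
{
    PFB_MARKER = 0x80,
    PFB_ASCII  = 1,    /* cleartext segment       */
    PFB_BINARY = 2,    /* eexec-encrypted segment */
    PFB_EOF    = 3     /* end marker, no length   */
};

/* Writes a 6-byte segment header and returns the number of bytes. */
static size_t
pfb_write_header( unsigned char out[6], int type, uint32_t length )
{
    assert( out && ( type == PFB_ASCII || type == PFB_BINARY ) );

    out[0] = PFB_MARKER;
    out[1] = (unsigned char)type;
    out[2] = (unsigned char)(   length         & 0xFF );
    out[3] = (unsigned char)( ( length >> 8  ) & 0xFF );
    out[4] = (unsigned char)( ( length >> 16 ) & 0xFF );
    out[5] = (unsigned char)( ( length >> 24 ) & 0xFF );
    return 6;
}

/* Parses a segment header from `in` (avail bytes readable).
   Returns 0 on success, -1 if the header is malformed or truncated. */
static int
pfb_read_header( const unsigned char *in, size_t avail,
                 int *type, uint32_t *length )
{
    assert( in && type && length );

    if ( avail < 2 || in[0] != PFB_MARKER )
        return -1;

    *type = in[1];
    if ( *type == PFB_EOF )
    {
        *length = 0;
        return 0;
    }

    if ( ( *type != PFB_ASCII && *type != PFB_BINARY ) || avail < 6 )
        return -1;

    *length = (uint32_t)in[2]           |
              ( (uint32_t)in[3] << 8  ) |
              ( (uint32_t)in[4] << 16 ) |
              ( (uint32_t)in[5] << 24 );
    return 0;
}
```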



The hardest part is the parser, of course.  Another approach would be to
create a new parser that is a specialization of the current one.  The new
parser's methods, like skip_spaces, could be modified to copy into the new
buffer rather than simply skipping the input.  I could definitely use some
suggestions from the gurus here.


7)  Would it make sense for me to contribute the code back and try to get it
into the official FreeType distribution?  Is there any interest?


Salman



