groff
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Merging the new gropdf


From: G. Branden Robinson
Subject: Re: Merging the new gropdf
Date: Tue, 7 Nov 2023 13:48:51 -0600

Hi Deri,

At 2023-11-07T17:33:06+0000, Deri wrote:
> On Monday, 6 November 2023 17:47:13 GMT G. Branden Robinson wrote:
> > Slowly.  I landed two small changes this weekend, but they're not
> > things most folks are looking for.
> 
> Thanks for this.

Hoping to do more soon!

> > * The sheer magnitude of changes to gropdf.pl itself, and my own
> >   ignorance of details of PDF that the new functionality is
> >   implementing.
> 
> There are two main areas of change. The first is rectifying my design
> mistake in the original gropdf. It used the "t" command from groff as
> the primary command as a series of input characters which would be
> converted to postscript glyphs, all other text commands (for example
> "c") were converted back to their input character and treated as a
> single character "t" command. I was focussed on the groff font rather
> than the postscript font.
> 
> While thinking about font subsetting it became clear it made more
> sense to convert all input to postscript glyph names immediately, and
> use them as the "common currency" rather than focus on words. This
> particularly makes sense when dealing with non-latin input which has
> been processed with preconv. It is also makes it much more natural
> when dealing with font subsetting. Previously this was not necessary
> because the whole font was embedded by gropdf.
> 
> The second major change is the addition of a type 1 font parser and
> code to generate a font which only contains the glyphs required by the
> document being processed. This is the area which needs the most
> testing. I have tested with dozens of fonts that this parser is robust
> enough, but there are thousands of fonts out there. It seems to be
> happy with fonts produced by fontforge, which is promising.

This is super helpful!  I plan to quote a lot of this in the commit
message when I land the changes to gropdf.pl (except for the bit you
want me to hold off on).

So that I understand you, gropdf-ng basically emits _only_ "c" and "C"
commands to render text, and never "t" anymore?  What happens if track
kerning is required (the "u" command)?

> > * An uneasiness I feel about some of the solutions you adopted
> >   insofar as they have effect outside of gropdf.pl itself.  For
> >   instance:
> > 
> >   1.  Changing the format of font description files to add yet
> >       another field, mapping character names to Unicode code points.
> >       In the rest of groff, this is not necessary because we have
> >       glyphuni.cpp.
> > 
> >      
> > https://git.savannah.gnu.org/cgit/groff.git/tree/src/libs/libgroff/glyphuni
> > .cpp
> > 
> >       I'd like to honor the DRY principle here.  What's a good way
> >       to achieve that?
> 
> I'm sure you have noticed that glyphuni.cpp has 433 mappings from
> groff character names to unicodes, and afmtodit has 4089 mappings
> between postscript glyph names and unicodes (the mappings are also
> algorithmically generated rather than manually hard coded as in the
> case of glyphuni.cpp).

I certainly had not counted them.

> The mapping from postscript glyph to unicode is more appropriate for
> gropdf, the mapping from groff character names is meant for the input
> side of groff, where "glyph_name_to_unicode" is used solely by
> input.cpp. The use of the word glyph in the subroutine name is a bit
> confusing since it gives the false impression it is concerned with
> postscript glyphs.

Fair point.  I've had my own issues with groff's internal names for
things...repeatedly.

> I could add the mapping tables from afmtodit to the gropdf code and
> drop the new column from the groff fonts created by afmtodit, but my
> understanding of DRY principles is to avoid such duplication, have I
> got it wrong?
> 
> When I looked at the code for the rest of groff, adding an extra
> column had no effect on code which processed the groff font files.

I don't have a fundamental objection; my concern at this point is
understanding what you did well enough to document it.  If I succeed,
then I usually feel I grasp what's going on well enough to have an
opinion about the design.  It's hard for anyone but the implementor to
go the other direction.

> >   2.  I don't know the provenance of a new font you have proposed
> >       for shipping with groff, StandardSymSL.pfb.  We need to make
> >       sure it is freely licensed.  If it is mechanically generated
> >       from a PostScript Type 1 font that we can expect to find on
> >       the system, maybe we should perform that procedure during the
> >       build.  (On the other hand, I'm not sure I love the idea of
> >       adding a build-dependency on fontforge or similar.)
> 
> This is the font which you said:-
> 
> "Looks great! It's not led astray by the superscripts or anything."
> 
> In this message to the list:-
> 
> https://lists.gnu.org/archive/html/groff/2023-06/msg00114.html
> 
> I described the provenance of the font. I don't think it is a good
> idea to generate the font and introduce a dependency on fontforge. I
> would put it in a similar category to the Euro font we currently
> distribute, i.e. a font to make everything work smoothly.

Thank you--that conversation had aged out of my cache.  I did remember
being happy with improved equation typography.

> >   3.  The new `stringhex` request you've proposed for troff.  As
> >       noted elsewhere, I'd prefer to solve this a different way.
> > 
> >       https://savannah.gnu.org/bugs/index.php?63074
> > 
> >       ...but I haven't implemented my idea yet, so I don't object to
> >       `stringhex` as a temporary measure.
> 
> I'm happy to drop stringhex for a better solution, if it handles the
> problem in this bug and the problems in these bugs:-
> 
> https://savannah.gnu.org/bugs/?62264
> https://savannah.gnu.org/bugs/?64576

I think that none of these issues are tightly coupled with the others in
the design sense, but in practical application, one might well employ a
string iterator to decide how to rewrite a string that one subsequently
sends to the device with `.output` or `\X`.

I got fairly far along with my `for` request; I need to pick that back
up.

> > * There were _tons_ of seemingly unrelated whitespace changes to
> > gropdf.pl, which frustrates code review.  (This has happened before;
> > I don't remember when, but Dave might.)  I went through the file and
> > attempted to impose a consistent style on it, but I'm not sure how
> > you'll feel about it.  More importantly, it would be helpful to get
> > your text editor to do better here.
> 
> I have attached an html version of how gropdf looks on my system. Tabs
> are spaced at 8 character intervals, indents are 4 characters. 

Okay.  Unfortunately that didn't help me much.

The commit a320b1357a807988c720bf482f17901ee5a24baf replaced tabs with
spaces with very high reliability, as if you'd reconfigured your text
editor.  There are significant amounts of new code, and those used
spaces for indentation as well, except in a few places.

I'll paste including the (UTF-8-encoded) Vim "listchars" that I use to
try to keep myself from making whitespace errors.

+        if ($options & SUBSET)
+        {
+            $lenIV=$1 if $head=~m'/lenIV\s+(\d+)';
+            my $l=length($body);
+            my 
$b=($gotinline)?decrypt_exec_C($body,$l):decrypt_exec_P(\$body,$l);
+»‥‥‥‥‥‥    $body=substr($body,$lenIV);
+»‥‥‥‥‥‥    $body=~m/begin([\r\n]+)/;
+»‥‥‥‥‥‥    $term=$1;
+»‥‥‥‥‥‥    if (defined($term))
+»‥‥‥‥‥‥    {
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥(@bl)=split("$term",$body);
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥map_subrs(\@bl);
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥Subset(\@bl,$glyphs);
+            }
+            else
+            {
+                Warn("Unable to parse font '$fnt->{internalname}' for 
subsetting")
+            }
+        }

+        my $miss=-1;
+        my $CharSet=join('',@{$fnt->{CHARSET}->[$j]});
+»‥‥‥‥‥‥push(@{$chars->[$j]},'u0020') if $j==0 and 
$fnt->{NAM}->{u0020}->[PSNAME];

+    if (exists($fnt->{fontfile}))
+    {
+»‥‥‥‥‥‥if ($options & SUBSET and !($options & NOFILE))
+»‥‥‥‥‥‥{
+»‥‥‥‥‥‥    if (defined($term))
+»‥‥‥‥‥‥    {
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥$body=encrypt(\@bl);
+»‥‥‥‥‥‥    }
+»‥‥‥‥‥‥}
+
+»‥‥‥‥‥‥if (defined($fobj))
+»‥‥‥‥‥‥{
+»‥‥‥‥‥‥    $obj[$fobj]->{STREAM}=$head.$body.$tail;
+»‥‥‥‥‥‥    $obj[$fobj]->{DATA}->{Length1}=length($head);
+»‥‥‥‥‥‥    $obj[$fobj]->{DATA}->{Length2}=length($body);
+»‥‥‥‥‥‥    $obj[$fobj]->{DATA}->{Length3}=length($tail);
+        }
+
+        foreach my $o (@fontdesc)
+        {

+    if (!exists($obj[$o]->{XREF}))
+    {
+»‥‥‥‥‥‥if ($PDFver==5 and !exists($obj[$o]->{STREAM}) and 
ref($obj[$o]->{DATA}) eq 'HASH')
+»‥‥‥‥‥‥{
+»‥‥‥‥‥‥    # This can be put into an ObjStm
+»‥‥‥‥‥‥    my $maj=int(++$objidx/128);
+»‥‥‥‥‥‥    my $min=$objidx % 128;
+
+»‥‥‥‥‥‥    if ($maj > $omaj)
+»‥‥‥‥‥‥    {
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥$omaj=$maj;
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥BuildObj(++$tobjct,
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥{
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥    'Type' => '/ObjStm',
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥}
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥);
+
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥$obji[$maj]=[$tobjct,0,'',''];
+»‥‥‥‥‥‥»‥‥‥‥‥‥‥$obj[$tobjct]->{DATA}->{Extends}=($tobjct-1)." 0 R" if $maj > 0;
+»‥‥‥‥‥‥    }
+
+»‥‥‥‥‥‥    $obj[$o]->{INDIRECT}=[$tobjct,$min];
+»‥‥‥‥‥‥    $obji[$maj]->[1]++;
+»‥‥‥‥‥‥    $obji[$maj]->[2].=' ' if $obji[$maj]->[2];
+»‥‥‥‥‥‥    $obji[$maj]->[2].="$o ".length($obji[$maj]->[3]);
+»‥‥‥‥‥‥    PutObj($o,\$obji[$maj]->[3]);
+»‥‥‥‥‥‥}
+»‥‥‥‥‥‥else
+»‥‥‥‥‥‥{
+»‥‥‥‥‥‥    PutObj($o);
+»‥‥‥‥‥‥}
+    }

+»‥‥‥‥‥‥»‥‥‥‥‥‥‥$fnt{NAM}->{$r[0]}=$fnt{NAM}->{$lastnm};

+        # Real font needs subsetting
+»‥‥‥‥‥‥$fnt{fontfile}=$download{$fontkey};
+#         my ($head,$body,$tail)=GetType1($download{$fontkey});
+#         $head=~s/\/Encoding .*?readonly def\b/\/Encoding StandardEncoding 
def/s;
+#         $fontlst{$fontno}->{HEAD}=$head;
+#         $fontlst{$fontno}->{BODY}=$body;
+#         $fontlst{$fontno}->{TAIL}=$tail;
+        #         $fno=++$objct;
+        #       EmbedFont($fontno,\%fnt);

You could fix this up in your branch--but you'll likely have to fix
merge conflicts in subsequent commits--and then force-push.  Or I can
apply the tab-restoring indentation fix-up commit I attached as part of
my previous mail.

What would you like to do?

> > Also your most recent commit to your branch says that it's starting
> > work on a new thing.  Should I omit that from my merge?
> > 
> > https://git.savannah.gnu.org/cgit/groff.git/commit/?h=deri-gropdf-ng&id=a2b5
> > 541142a1571e9f9f5a8321c1e21c721469aa
> 
> Yes, please drop this. It is my next project, text decorations. Peter
> asked for underlining a long time ago. Mom postscript has a nifty
> piece of postscript code (courtesy of Tadziu I think) which underlines
> text. In PDFs underline is one of the text decorations, so I'm hoping
> to expose an API for text decoration as the next project.

Understood, will do.

> > I'm attaching a "git log -p --format=fuller" of my staging branch so
> > you can see where I am.
> > 
> > I look forward to hearing your thoughts on next steps.
> 
> Keep going. :-) 

I have another proposal: use *roff-compatible register format syntax
when assigning page number identifiers to bookmarks.  If I understand
your code correctly, no impedance matching with a PDF feature is
necessary; the page number labels are arbitrary.

I refer to this:

+            elsif (lc($xprm[1]) eq 'pagenumbering')
+            {
+                # 2=type of [D=decimal,R=Roman,r=roman,A=Alpha 
(uppercase),a=alpha (lowercase)

I think it would be more comfortable for experienced *roff users to use
_its_ convention for number formats.  See CSTR #54 §8, or:

https://www.gnu.org/software/groff/manual/groff.html.node/Assigning-Register-Formats.html

Popping the stack...

As weird as it may sound, I think fixing the code style confusion may
make the rest easier to understand.  Let me know what tack you want to
take.

Regards,
Branden

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]