chicken-users
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Chicken-users] compiling jbogenturfa'i .scm => .c takes 2.5 hours


From: Alan Post
Subject: [Chicken-users] compiling jbogenturfa'i .scm => .c takes 2.5 hours
Date: Fri, 26 Nov 2010 09:07:01 -0700

Yesterday I tried to create the infrastructure of my package
jbogenturfa'i.  This package uses my also under development package
genturfa'i to compile Lojban's PEG grammar into a parser.

The process goes like this:

  $ cat gerna.scm \
        rafske_gumgau.scm \
        rafske.scm \
    | time genturfahi /dev/fd/0 > jbogerna.scm
  3m29.78s real ...

So it takes me 3.5 minutes to compile my grammar file, which is 78k
in size and contains ~880 non-terminal definitions.  The resulting
output file is 368k, which is certainly quite large.  I'd rather
this be faster, but I can certainly work with it.

I then try to compile this file as part of the jbogenturfa'i library. 
The .c file generated for my library is 60.6MB!  It takes 2.5 hours
to generate, at which point my build fails:

  cc1: out of memory allocating 262144 bytes after a total of 0 bytes

I suspect an OpenBSD ulimit is responsible for this out of memory
error, as by default OpenBSD does run with restrictive ulimits.
I haven't yet tried running the code with no ulimit to see if I can
actually compile the code.  Taking 2-3 hours to compile isn't
something I can work with, so I need to find out why my compile is
generating such large files and what I can do about it.

Let me describe the compiler using a toy example, so you can see the
basic framework I'm trying to compile.  In Chicken Gazette #5, the
Tony Garnock-Jones' packrat parser was introduced, and it contained
roughly this grammar, though I've optimized it for features present
in the PEG specification that are not present in the packrat egg:

  expr   <- mulexp #\+ mulexp
          / mulexp
  mulexp <- simple #\* simple
          / simple
  simple <- digits
          / #\( expr #\)
  digits <- [[:digit:]]+

You can see that expr uses mulexp, which uses simple, which uses
digits and expr.  The point here is that there are both backward and
forward references in the grammar.  expr indirectly uses simple,
which directly uses expr.  One non-terminal may refer to another
non-terminal yet to be defined, and rules may be mutually recursive.

genturfa'i generates the following parser for this grammar:

  (define gerna
    (letrec ((expr (nunjavni-morji
                     (nunjavni-jonai
                       (nunjavni-je
                         (nunjavni-naselci mulexp)
                         (nunjavni-lerfu #\+)
                         (nunjavni-naselci mulexp))
                       (nunjavni-naselci mulexp))))
             (mulexp
               (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-je
                     (nunjavni-naselci simple)
                     (nunjavni-lerfu #\*)
                     (nunjavni-naselci simple))
                   (nunjavni-naselci simple))))
             (simple
               (nunjavni-morji
                 (nunjavni-jonai
                   (nunjavni-naselci digits)
                   (nunjavni-je
                     (nunjavni-lerfu #\()
                     (nunjavni-naselci expr)
                     (nunjavni-lerfu #\))))))
             (digits (nunjavni-morji (nunjavni-re "[[:digit:]]+"))))
      expr))

Notice that all of my non-terminals are defined as records in
letrec: expr, mulexp, simple, and digits.  When I need to refer to
another non-terminal, I use a macro, nunjavni-naselci.  This macro
is essentially:

  (nunjanvi-naselci foo) => (lambda x (apply foo x))

I'm capturing the definition of foo in a closure, so I can refer to it
even if it hasn't been defined in the letrec yet.  

This is all working just fine on grammars the size of this example
here.  It is taking far too long on Lojban's substantially more complex
grammar.

Looking at the above code, can anyone guess why the C file I'm
generating is so large?  Does an ~880 element letrec cause problems?
Can you suggest a modification to my compiler which will compile in a
reasonable amount of time?

I've checked in the latest code for this package:

  http://bugs.call-cc.org/browser/release/4/jbogenturfahi/trunk

Note that this is going to try to run and generate a salmonella
report.  If adding 2-3 hours to the build time for the salmonella
report is a  problem, I'll add a syntax error to my .setup file until
I figure out how to reduce the build time.  I am curious as to whether
another machine is able to compile this egg.

I can also provide the .c file generated by the failed compile, if anyone
would like to look at it.  Because it is 61MB, I'll have to upload it to
my webhost for download, so please reply here or privately if you would
like to see it and I'll provide the URL.

The .c file does end with: /* end of file */

So it is fully generating.

-Alan
-- 
.i ko djuno fi le do sevzi



reply via email to

[Prev in Thread] Current Thread [Next in Thread]