emacs-devel

compiled lisp file format (Re: Skipping unexec via a big .elc file)


From: Ken Raeburn
Subject: compiled lisp file format (Re: Skipping unexec via a big .elc file)
Date: Sun, 21 May 2017 04:44:01 -0400

I haven’t had much time to further the work on the big-elc approach recently, 
but there is one idea I want to toss out there for possibly improving the load 
time further: Changing the .elc file format to a binary one.  I’m not talking 
about a memory image like Daniel is working on.  I mean a file representing a 
sequence of S-expressions, but optimized for loading speed rather than for 
human readability.

The Guile project has taken this idea pretty far; they’re generating ELF object 
files with a few special sections for Guile objects, using the standard DWARF 
sections for debug information, etc.  While it has a certain appeal (making C 
modules and Lisp files look much more similar, maybe being able to link Lisp 
and C together into one executable image, letting GDB understand some of your 
data), switching to a machine-specific format would be a pretty drastic change, 
when we can currently share the files across machines.

I haven’t got a complete, concrete proposal, but I see at least a couple 
general approaches possible:

1) Follow the model of flat object file formats: Some file sections have data 
of various types (string content, symbol names, integer or floating constants); 
others (the equivalent of standard object file “relocation” data) would provide 
info on how to allocate and fill in the container objects (pairs, vectors, etc) 
desired, with references to the symbols or strings or other container objects.

2) Continue to use the current recursive processing, but with a binary format.  
Some (byte? word?) value indicates “this is string data”, it’s followed by a 
byte count and that many bytes of string content (always using the Emacs 
internal encoding, so we don’t have to translate when reading).  Another value 
indicates an integer constant.  Another value indicates a vector, and is 
followed by a length and then that many other values, which are each processed 
recursively before we get back to the object following the vector.  The length 
of each object’s initializer depends on its type and, for container types, on 
the values it contains.
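
To make approach 2 a bit more concrete, here is a minimal sketch in Python 
(chosen only for brevity; a real reader would live in C).  The one-byte tag 
values, the little-endian fixed-width lengths, and the names encode/decode are 
all assumptions made up for illustration, not an existing or proposed Emacs 
format.  Only integers, byte strings, and vectors are covered.

```python
import struct

# Hypothetical one-byte type tags; nothing here is an actual Emacs format.
TAG_INT, TAG_STR, TAG_VEC = 0x01, 0x02, 0x03


def encode(obj):
    """Serialize ints, bytes, and lists into the tagged binary form."""
    if isinstance(obj, int):
        return bytes([TAG_INT]) + struct.pack("<q", obj)
    if isinstance(obj, bytes):
        # Tag, byte count, then the raw bytes with no escaping at all.
        return bytes([TAG_STR]) + struct.pack("<I", len(obj)) + obj
    if isinstance(obj, list):
        # Tag, element count, then each element encoded recursively.
        body = b"".join(encode(x) for x in obj)
        return bytes([TAG_VEC]) + struct.pack("<I", len(obj)) + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def decode(buf, pos=0):
    """Return (object, next_position), recursing into vectors."""
    tag = buf[pos]
    pos += 1
    if tag == TAG_INT:
        (value,) = struct.unpack_from("<q", buf, pos)
        return value, pos + 8
    if tag == TAG_STR:
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        # One slice copies the whole string; no per-character loop.
        return bytes(buf[pos:pos + n]), pos + n
    if tag == TAG_VEC:
        (n,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        items = []
        for _ in range(n):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unknown tag {tag:#x} at offset {pos - 1}")
```

For example, decode(encode([1, b"abc", [2, b""]])) gives back the same nested 
structure together with the offset just past it, which is where the reader 
would pick up the next top-level object.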

Either way, getting away from the expensive one-character-at-a-time processing, 
multibyte coding, escape processing, etc., and pushing around groups of bytes 
whenever possible should save us time.
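
For comparison, approach 1 (flat data sections plus relocation-like records) 
could be pictured roughly as follows.  This is only a sketch under invented 
assumptions: the section names, the (section, index) reference tuples, and the 
build function are hypothetical, the sections are shown as in-memory Python 
lists rather than file bytes, and every container is modeled as a plain list.

```python
# Hypothetical layout: flat data "sections" plus build records that play
# the role of relocation entries. None of this is an existing Emacs format.
sections = {
    "strings": [b"hello", b"world"],
    "symbols": ["setq", "foo"],
    "ints": [42],
}

# One record per container to build. Each record is a list of
# (section, index) references; the pseudo-section "objects" refers to
# another container by its position in this record list.
records = [
    [("strings", 0), ("ints", 0)],
    [("symbols", 0), ("symbols", 1), ("objects", 0)],
]


def build(sections, records):
    """Allocate all containers first, then fill them in from the records."""
    # Pass 1: allocate every container, so that references may point
    # forward or even form cycles.
    objects = [[] for _ in records]
    # Pass 2: resolve each reference and store it in its container.
    for target, refs in zip(objects, records):
        for section, index in refs:
            source = objects if section == "objects" else sections[section]
            target.append(source[index])
    return objects
```

Splitting allocation from filling is the design point here: because every 
container exists before any reference is resolved, shared and circular 
structure needs no special syntax of its own.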

This would be useable not just for the dumped.elc file, but for other compiled 
Lisp files as well, whether in the distribution or from ELPA or the user’s own 
code.

I did throw together a half-baked attempt to try some of this out.  I added a 
new “#” construct for unibyte strings, putting the byte count into the file so 
that the string data could be copied with fread() instead of a READCHAR loop.  
I also added a new version of the “#n#” syntax that uses a fixed number of 
READCHAR calls and avoids the decimal arithmetic.  So, the file can no longer 
be processed as Lisp, and it still requires some text parsing, though not 
nearly as much as before; some of the worst of both worlds.  But the load time 
for dumped.elc did drop by another 12% in my tests (start in batch mode, print 
a message and exit, from 0.227s down to 0.2s or less per run, still loading a 
couple of standard-elc-format files during startup).
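
The difference between the two string-reading styles can be illustrated with a 
small Python sketch.  It is not the code from the experiment above: the 
four-byte little-endian length header and both function names are assumptions 
for illustration, and the escaped reader handles only backslash-quoted 
characters, far less than the real Lisp reader does.

```python
import io


def read_escaped_string(stream):
    """Conventional path: one byte at a time, handling backslash escapes.

    Assumes the opening quote has already been consumed.
    """
    out = bytearray()
    while True:
        ch = stream.read(1)
        if not ch:
            raise EOFError("unterminated string")
        if ch == b'"':
            return bytes(out)
        if ch == b"\\":
            # Simplified: the next byte is taken literally.
            ch = stream.read(1)
            if not ch:
                raise EOFError("unterminated escape")
        out += ch


def read_counted_string(stream):
    """Counted path: fixed-width length header, then one bulk read."""
    header = stream.read(4)
    if len(header) != 4:
        raise EOFError("truncated length header")
    n = int.from_bytes(header, "little")
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated string data")
    return data
```

The counted reader never inspects the payload bytes, so quotes and backslashes 
inside the string cost nothing extra, whereas the escaped reader has to look at 
every byte to find the terminator.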

I’m curious if people think this might be an approach worth pursuing.  Or if 
the Lisp-based elc format is seen as advantageous in ways I’m not seeing….

Ken

