emacs-devel

Re: maximum buffer size exceeded


From: Stefan Monnier
Subject: Re: maximum buffer size exceeded
Date: Wed, 05 Sep 2007 11:00:43 -0400
User-agent: Gnus/5.11 (Gnus v5.11) Emacs/23.0.50 (gnu/linux)

>> I think the current view in Emacs development is that 64-bit platforms
>> solve this problem so easily that its solution for 32-bit machines is
>> much less important than working on other Emacs features.

> Actually, I think a small trick could increase the buffer size to 1 GB
> on 32 bit machines at the cost of a little(?) wasted memory.

> [Note: Assuming USE_LSB_TAG is defined]

> Currently, we have the lowest 3 bits reserved for the Lisp Type,
> meaning that the largest positive Emacs integer is 2^28-1 (256MB).

> Now, consider if we reserve 4 bits for the Lisp Type, but
> in such a way the Lisp_Int == 0, while the other Lisp types
> are odd numbers 1,3,5,7,...

> In this setup, an integer can be recognized by looking at the lowest
> bit alone (== 0), while the other Lisp types are recognized using the
> current methods (looking at all 4 type bits).

> The only drawback I can see is that Lisp_Objects have to be allocated
> on 16 byte boundaries rather than the current 8 byte boundary, so a
> little space may be wasted (and maybe not...).

> I haven't tried this, but given that Lisp_Objects are usually accessed
> via suitable macros, it looks quite doable.

Increasing the alignment from 8 to 16 bytes may be a non-trivial problem:
1 - cons cells use 8 bytes right now, so you'd waste a lot of space for them.
2 - same for floats.
3 - in many places, we rely on malloc to align objects on multiples of 8, so
    we'd have to use some other approach.

Numbers 1 and 2 can be solved by giving two tags each to cons cells and
floats, so they only need alignment on multiples of 8 (the fourth tag bit is
then simply part of the address).
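
A minimal sketch of the two-tag idea for cons cells under a 4-bit scheme.
The names (lisp_word, CONS_TAG3, tag_cons, is_cons, untag_cons) and the
choice of tags 1 and 9 are assumptions made for illustration, not what
Emacs actually uses:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-bit LSB tagging; all names here are illustrative. */
typedef uintptr_t lisp_word;

enum {
  CONS_TAG3 = 1   /* low three bits shared by both cons tags, 0001 and 1001 */
};

/* Tag a cons pointer that is only 8-byte aligned.  Bit 3 of the result is
   just bit 3 of the address, so the full 4-bit tag comes out as 1 or 9. */
static lisp_word tag_cons(const void *p)
{
  uintptr_t a = (uintptr_t) p;
  assert((a & 7) == 0);          /* 8-byte alignment is all we require */
  return a | CONS_TAG3;
}

/* A word is a cons if its low three bits match, whatever bit 3 says. */
static int is_cons(lisp_word w)
{
  return (w & 7) == CONS_TAG3;
}

/* Recover the address by clearing only the three real tag bits. */
static void *untag_cons(lisp_word w)
{
  return (void *) (w & ~(uintptr_t) 7);
}
```

The same layout would apply to floats with a different 3-bit pattern; only
types with a single tag would need full 16-byte alignment.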

Number 3 is more work.  But this work may be the same as what is needed to
allow us to use USE_LSB_TAG everywhere (even on machines where malloc and
static variables do not guarantee multiple-of-8 alignment).

We currently have 7 different types (of the 8 possible tags, we only use 7).

My own local Emacs build uses the trick you suggest, but within the current
3 tag bits: I gave 2 tags to integers, which lets them grow up to 2^29
(i.e. max buffer size = 512MB).  That's a very simple change.
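
A minimal sketch of how two integer tags out of eight could work on a 32-bit
word.  It assumes integers own tags 0 (000) and 4 (100), so only the low two
bits identify an integer; the names are hypothetical and this is a model, not
the actual Emacs macros:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit tagged word with 3 tag bits, where
   integers own the two tags 0 and 4.  Names are illustrative. */
typedef uint32_t lisp_word;

enum { INT_SHIFT = 2 };   /* only the low two bits (00) mark an integer */

#define FIXNUM_MAX (((int32_t) 1 << 29) - 1)   /* 30 value bits, signed */
#define FIXNUM_MIN (-((int32_t) 1 << 29))

static int is_fixnum(lisp_word w)
{
  return (w & 3) == 0;
}

static lisp_word make_fixnum(int32_t n)
{
  assert(n >= FIXNUM_MIN && n <= FIXNUM_MAX);
  return (lisp_word) ((uint32_t) n << INT_SHIFT);
}

static int32_t fixnum_value(lisp_word w)
{
  uint32_t u = w >> INT_SHIFT;   /* 30-bit field, zero-extended */
  /* Sign-extend bit 29 without relying on signed right shift. */
  return (int32_t) (u ^ 0x20000000u) - 0x20000000;
}
```

With a single integer tag the shift would be 3 and the maximum 2^28-1, which
is where the 256MB limit comes from.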

What you suggest would be to use 4 bits, i.e. 16 possible tags:
- 8 tags for integers (i.e. 8 tags left for the 6 other types)
- 2 tags for cons cells (6 tags left for the 5 other types)
- 2 tags for floats (4 tags left for the 4 other types)
- one tag each for the remaining 4 types (arrays, symbols, strings, misc).
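
The tag budget above can be checked with a small sketch.  The type names and
the particular assignment of odd tags to types are arbitrary assumptions;
only the counts come from the list:

```c
#include <assert.h>

/* Hypothetical type names; the mapping of odd tags is one arbitrary choice. */
enum lisp_type {
  T_INT, T_CONS, T_FLOAT, T_ARRAY, T_SYMBOL, T_STRING, T_MISC, T_COUNT
};

/* Classify one of the 16 possible 4-bit tags. */
static enum lisp_type type_of_tag(unsigned tag)
{
  assert(tag < 16);
  if ((tag & 1) == 0)
    return T_INT;                 /* the 8 even tags */
  switch (tag & 7) {
    case 1: return T_CONS;        /* tags 1 and 9 */
    case 3: return T_FLOAT;       /* tags 3 and 11 */
    default: break;
  }
  switch (tag) {
    case 5:  return T_ARRAY;
    case 7:  return T_SYMBOL;
    case 13: return T_STRING;
    default: return T_MISC;       /* tag 15 */
  }
}

/* Count how many of the 16 tags map to a given type. */
static int tags_for_type(enum lisp_type t)
{
  int count = 0;
  for (unsigned tag = 0; tag < 16; tag++)
    if (type_of_tag(tag) == t)
      count++;
  return count;
}
```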

One other problem: currently `misc' objects need 5 32-bit words, which
USE_LSB_TAG forces us to round up to 6 32-bit words, and symbols use 6 32-bit
words.  So rounding up to a multiple of 16 bytes would round them both up to
8 32-bit words.

The two subtypes of misc which use up 5 words are markers and overlays.
So with your rounding up, an overlay would use up 3*8=24 words (3 because
there's the overlay object plus the two associated marker objects) instead
of 15 (without USE_LSB_TAG) or 18 (with USE_LSB_TAG as it is now).
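
The arithmetic can be reproduced with a short sketch.  The helper names are
hypothetical; the 5-word size and the three objects per overlay are the
figures given above:

```c
#include <assert.h>

/* Round a size in 32-bit words up to a multiple of align_words. */
static unsigned round_up_words(unsigned words, unsigned align_words)
{
  assert(align_words > 0);
  return (words + align_words - 1) / align_words * align_words;
}

/* An overlay costs one overlay object plus two markers, each a 5-word
   `misc' object, and each rounded up separately. */
static unsigned overlay_words(unsigned align_words)
{
  enum { MISC_WORDS = 5, OBJECTS_PER_OVERLAY = 3 };
  return OBJECTS_PER_OVERLAY * round_up_words(MISC_WORDS, align_words);
}
```

Here overlay_words(1) models no alignment constraint, overlay_words(2) models
8-byte alignment and overlay_words(4) models 16-byte alignment.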

I had plans to try and squeeze `misc' objects down to 4 words (and hence
overlays down to 12 words), but this is a non-trivial change.  One possible
approach is to replace the linked lists of overlays and markers by arrays
(managed just like buffer text: with a gap).
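
A minimal sketch of an array managed with a gap, here holding plain `long'
values that stand in for marker positions.  The fixed capacity, the struct
layout and all function names are assumptions made to keep the example
short; this is not how Emacs stores markers or overlays:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { GAP_CAPACITY = 16 };   /* fixed size keeps the sketch short */

struct gap_array {
  long   slot[GAP_CAPACITY];
  size_t gap_start;           /* index of the first unused slot */
  size_t gap_end;             /* index one past the last unused slot */
};

static void gap_init(struct gap_array *ga)
{
  ga->gap_start = 0;
  ga->gap_end = GAP_CAPACITY;
}

static size_t gap_length(const struct gap_array *ga)
{
  return GAP_CAPACITY - (ga->gap_end - ga->gap_start);
}

/* Move the gap so that it starts at logical index i. */
static void gap_move(struct gap_array *ga, size_t i)
{
  assert(i <= gap_length(ga));
  if (i < ga->gap_start) {
    size_t n = ga->gap_start - i;
    memmove(&ga->slot[ga->gap_end - n], &ga->slot[i], n * sizeof ga->slot[0]);
    ga->gap_start -= n;
    ga->gap_end -= n;
  } else if (i > ga->gap_start) {
    size_t n = i - ga->gap_start;
    memmove(&ga->slot[ga->gap_start], &ga->slot[ga->gap_end],
            n * sizeof ga->slot[0]);
    ga->gap_start += n;
    ga->gap_end += n;
  }
}

/* Insert value at logical index i; return 0 if the array is full. */
static int gap_insert(struct gap_array *ga, size_t i, long value)
{
  if (ga->gap_start == ga->gap_end)
    return 0;
  gap_move(ga, i);
  ga->slot[ga->gap_start++] = value;
  return 1;
}

/* Read the element at logical index i, skipping over the gap. */
static long gap_get(const struct gap_array *ga, size_t i)
{
  assert(i < gap_length(ga));
  return ga->slot[i < ga->gap_start ? i : i + (ga->gap_end - ga->gap_start)];
}
```

Repeated insertions near the same index only move the gap once, which is the
property that makes the technique attractive for buffer text.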

Another option is to remove the `symbol' and `string' tags and make symbols
and strings subtypes of `misc'.  Then we could keep 3 tag bits and give 4 of
the 8 tags to integers.  This would simplify the alloc.c code but would also
waste more memory (6 words for string objects) and slow down SYMBOLP and
STRINGP slightly.

Still, the fundamental problem remains the same: files larger than 256MB
are most likely not generated manually.  So they may very likely grow to
more than 4GB tomorrow.  Bumping the limit to 512MB or 1GB (or even 4GB for
that matter) is only going to help in some fraction of the cases.

I think a better approach to handle this problem is to create a special
package to visit arbitrarily large files which would work by loading only
parts of the file at a time and do manual "swapping".  This would not work
as smoothly, but then again manipulating 256MB files in Emacs is currently
not that smooth either.
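
A minimal sketch of the basic building block such a package would need:
reading one window of a file at a given offset.  The function name is
hypothetical, and using standard C `fseek' with a `long' offset is a
simplifying assumption; on a 32-bit system, files beyond 2GB would need a
wider offset type such as POSIX `off_t':

```c
#include <assert.h>
#include <stdio.h>

/* Read up to len bytes of the file at path, starting at byte offset.
   Return the number of bytes read, or -1 on error.  The `long' offset is
   an assumption that limits this sketch on 32-bit systems. */
static long read_window(const char *path, long offset, char *buf, size_t len)
{
  assert(path != NULL && buf != NULL);
  FILE *f = fopen(path, "rb");
  if (f == NULL)
    return -1;
  if (fseek(f, offset, SEEK_SET) != 0) {
    fclose(f);
    return -1;
  }
  size_t n = fread(buf, 1, len, f);
  int failed = ferror(f);
  fclose(f);
  return failed ? -1 : (long) n;
}
```

The "swapping" part (writing a modified window back and loading the next
one) is left out; it is where most of the real work would be.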


        Stefan


PS: You can supposedly open >4GB files in Emacs on 64-bit systems, but
looking at the C code, it's clear that you'll bump into bugs where we cast
EMACS_INT values to and from `int' (which on many 64-bit systems is only
32 bits wide).  I tend to fix those bugs when I bump into them, but they're
everywhere and I've fixed only a tiny fraction of them.



