Re: [Openexr-devel] EXR texture memory overhead

From: Larry Gritz
Date: Fri, 16 Sep 2016 10:53:42 -0700

Aha, thanks, Karl.

So it's allocating an internal framebuffer, retained for as long as the file is kept open, that spans a whole *row* of tiles. That's probably fine when you have a small number of files open at once and they are mostly reading whole images (in which case the app is probably asking for a whole row of tiles, or more, at a time).

But it's not optimal for an access pattern like TextureSystem's, where the typical request is ONE tile, and the next tile it wants may not even be adjacent.

Back-of-envelope: let's say our textures are RGBA, half data, 4k x 4k, with 64x64 tiles. So a row of tiles is 2MB of this internal framebuffer overhead, per open texture file (which, as we've said, could be hundreds or thousands at once, so could easily be multiple GB of unaccounted overhead).
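A quick check of that arithmetic, as a sketch: it assumes 2 bytes per half channel and counts only pixel bytes in the scratch buffer (no headers, offset tables, or allocator overhead). The count of 1000 open files is an illustrative value, not a measurement.

```python
# Back-of-envelope scratch-buffer sizes for a tiled RGBA half texture.
# Counts pixel bytes only; ignores headers, offset tables, allocator overhead.

HALF_BYTES = 2  # bytes per 16-bit half channel

def tile_row_bytes(width, tile_h, channels, bytes_per_channel=HALF_BYTES):
    """Bytes for a scratch buffer spanning one full row of tiles."""
    return width * tile_h * channels * bytes_per_channel

def tile_bytes(tile_w, tile_h, channels, bytes_per_channel=HALF_BYTES):
    """Bytes for a scratch buffer holding a single tile."""
    return tile_w * tile_h * channels * bytes_per_channel

row = tile_row_bytes(4096, 64, 4)  # 4k wide, 64-pixel-tall tiles, RGBA
one = tile_bytes(64, 64, 4)

open_files = 1000  # illustrative count only
print(f"tile row: {row / 2**20:.1f} MiB, single tile: {one / 2**10:.1f} KiB")
print(f"{open_files} files: {open_files * row / 2**30:.2f} GiB (row-sized) vs "
      f"{open_files * one / 2**20:.2f} MiB (tile-sized)")
```

The row-sized buffer is 64 times the single-tile one, because a 4096-pixel row holds 4096/64 = 64 tiles.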

Perhaps, in light of this, IlmImf could offer a way to say that a particular file will tend to be read one tile at a time, with tiles fetched independently, so that the internal framebuffer scratch space could be just a single tile's worth instead of a whole "tile row"?
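To make the proposal concrete, here is a sketch of how such a hint might size the per-file scratch buffer. `AccessHint` and `scratch_bytes` are hypothetical names invented for illustration; they are not part of IlmImf.

```python
from enum import Enum

class AccessHint(Enum):
    """Hypothetical per-file access hint (models the proposal, not a real API)."""
    SEQUENTIAL = "sequential"      # whole images or full rows of tiles
    RANDOM_TILES = "random_tiles"  # independent single-tile reads (texture cache)

def scratch_bytes(hint, width, tile_w, tile_h, channels, bytes_per_channel=2):
    """Size a per-file decode scratch buffer according to the access hint."""
    per_pixel = channels * bytes_per_channel
    if hint is AccessHint.RANDOM_TILES:
        return tile_w * tile_h * per_pixel  # one tile's worth
    # Default: one full row of tiles; round up for a partial last tile.
    tiles_across = -(-width // tile_w)      # ceiling division
    return tiles_across * tile_w * tile_h * per_pixel
```

For the 4k RGBA half example this yields 2 MiB with `SEQUENTIAL` and 32 KiB with `RANDOM_TILES`.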

Wait, I'm not quite sure how threads play into this. Is this allocated framebuffer part of the ImageInput itself? Do threads lock to use it? Or is it per thread, per file?

On Sep 16, 2016, at 10:31 AM, Karl Rasche <address@hidden> wrote:

1. The amount of memory that libIlmImf holds *per open file* as overhead or internal buffers or whatever (I haven't tracked down exactly what it is) is much larger than what libtiff holds as overhead per open file.

I *think* this is related to what you're seeing, at least in part:

ImfInputFile.cpp, line 678

2. libIlmImf seems to have a substantial amount of memory overhead *per thread*, and that can really add up if you have a large thread pool. In contrast, libtiff doesn't have a thread pool (for better or for worse), so there isn't a per-thread component to its memory overhead.

Some of that probably stems from the framebuffer model: you don't decode directly into the user-provided buffer, but into a temporary buffer, which is then copied into the user-provided buffer and reformatted as requested.
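The decode-then-copy step can be sketched as a toy model. The function name, the `slices` dictionary layout, and element (rather than byte) strides are assumptions, loosely modeled on IlmImf's per-channel `Slice` (base pointer, x/y stride, fill value); the real library also converts pixel types during this copy, which the sketch omits.

```python
# Toy model of "decode into scratch, then copy into the user framebuffer".
# Not IlmImf code: strides are in elements, and no type conversion is done.

def copy_to_user_framebuffer(scratch, width, height, src_channels, slices):
    """Copy interleaved pixels from `scratch` into per-channel user buffers.

    scratch -- flat list, pixel after pixel, channels in `src_channels` order
    slices  -- {channel: (dest, x_stride, y_stride, fill)}; a channel the
               file lacks is filled with `fill`
    """
    nc = len(src_channels)
    for name, (dest, x_stride, y_stride, fill) in slices.items():
        c = src_channels.index(name) if name in src_channels else None
        for y in range(height):
            for x in range(width):
                value = fill if c is None else scratch[(y * width + x) * nc + c]
                dest[y * y_stride + x * x_stride] = value

# A 2x2 RGBA tile already decoded into scratch; the caller wants a planar R
# channel plus a Z channel the file does not contain.
scratch = list(range(16))
r_plane = [None] * 4
z_plane = [None] * 4
copy_to_user_framebuffer(scratch, 2, 2, ["R", "G", "B", "A"],
                         {"R": (r_plane, 1, 2, 0.0), "Z": (z_plane, 1, 2, 1.0)})
```

Every decoded pixel is touched twice, once by the decoder and once by this copy, and the scratch buffer has to exist alongside the caller's buffer.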

That avoids things like tons of extra decodes of a scanline strip if you are walking it scanline by scanline. But in the context of texturing, where you're always reading a full tile into cache, it's just overhead. There might be a sneaky way to flush that backing data, like assigning a null framebuffer or something.
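A toy model of why retaining the decoded strip helps scanline-by-scanline readers. It assumes each compressed block covers a fixed number of scanlines (OpenEXR's ZIP compression, for example, packs 16 scanlines per block); `StripReader` and `count_decodes` are hypothetical helpers that only count decodes.

```python
class StripReader:
    """Counts block decodes when reading one scanline at a time (toy model)."""

    def __init__(self, lines_per_block, keep_decoded):
        self.lines_per_block = lines_per_block
        self.keep_decoded = keep_decoded  # retain the last decoded block?
        self.cached_block = None
        self.decodes = 0

    def read_scanline(self, y):
        block = y // self.lines_per_block
        if not (self.keep_decoded and block == self.cached_block):
            self.decodes += 1  # stand-in for decompressing the whole block
            self.cached_block = block

def count_decodes(height, lines_per_block, keep_decoded):
    reader = StripReader(lines_per_block, keep_decoded)
    for y in range(height):
        reader.read_scanline(y)
    return reader.decodes

print(count_decodes(64, 16, keep_decoded=True))   # 4
print(count_decodes(64, 16, keep_decoded=False))  # 64
```

A texture cache that always requests a whole tile decodes each tile once either way, so in that setting the retained buffer buys nothing and is pure memory overhead.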

--
Larry Gritz
address@hidden