bug-gnu-emacs
bug#21509: 25.0.50; X11 error: BadPixmap when creating first emacsclient


From: Dima Kogan
Subject: bug#21509: 25.0.50; X11 error: BadPixmap when creating first emacsclient frame; and memory leak
Date: Sat, 14 Oct 2017 20:48:33 -0700
User-agent: mu4e 0.9.19; emacs 25.2.2

Hi. Sorry it took me this long to get back to you. I'll try to reply in
a more timely way if you have more requests. Notes inline.


martin rudalics <rudalics@gmx.at> writes:

>  >> IIUC your numbers of Lucid with scrollbars now coincide with the numbers
>  >> of Lucid without scrollbars before the "fix".
>  >
>  > No, that's not right.
>  >
>  > Lucid with scrollbars post-fix is the blue line: memory usage is stable
>  > as frames are created/destroyed: the leak is ~0
>  >
>  > Lucid without scrollbars pre-fix is the green line. Memory usage is
>  > climbing. We aren't leaking as badly as the other cases, but we ARE
>  > leaking.
>  >
>  > So the fix resolved the large leakages in the other cases and also the
>  > small leakages that weren't scroll-bar-related.
>
> I can't follow you.  before.no.scroll.log has
>
> 16112 19608 S ?        00:00:00 emacs
> ...
> 16112 29700 S ?        00:00:12 emacs
>
> while after.yes.scroll.log has
>
> 17508 19652 S ?        00:00:00 emacs
> ...
> 17508 29160 S ?        00:00:13 emacs
>
> which strike me as very similar (especially given the figures of the two
> other logs).  What am I missing?

I wasn't consistent in how long I ran each trial, so comparing only the
final memory usage values, as you're doing above, compares different
runtimes. The plot attached last time shows all of the raw data, and the
different slopes of the two trials are clearly visible there: one is
leaking and the other is not.
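
To put a number on each slope instead of reading it off the plot, a
least-squares fit over a log can be sketched as below. This is only an
illustration: leak_rate is a hypothetical helper name, and it assumes
column 2 of each log line is the resident memory in kB, sampled once per
frame.

```shell
# Hypothetical helper: least-squares slope of column 2 (memory, kB)
# against the sample index NR. Prints kB per sample.
leak_rate() {
    awk '{ n++; sx += NR; sy += $2; sxx += NR*NR; sxy += NR*$2 }
         END {
             # Guard against a division by zero on too-short input
             if (n < 2) { print "need at least 2 samples" > "/dev/stderr"; exit 1 }
             printf "%.1f\n", (n*sxy - sx*sy) / (n*sxx - sx*sx)
         }' "$@"
}
```

Running it as `leak_rate before.no.scroll.log` and
`leak_rate after.yes.scroll.log` would give two per-frame rates that do
not depend on how long each trial happened to run.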



>  >> OTOH the numbers for GTK largely coincide with those of Lucid with
>  >> scrollbars before the "fix". So X itself seems much more dominant than
>  >> any toolkit particularities.
>  >
>  > I don't think this is right either. Lucid with scrollbars pre-fix is the
>  > purple line. We leak memory at a high, constant rate. GTK memory usage
>  > (yellow) is noisy and fragmented (I bet we're invoking malloc/free much
>  > more often). The baseline memory consumption is higher, but past that,
>  > the leak rate isn't nearly as bad as the purple. The higher
>  > fragmentation means that the internals of malloc() matter too: I invoked
>  > malloc_trim() just after t=450s, and we see the memory usage dropped
>  > sharply as a result.
>
> I can't tell that since it's not in your figures.  But again, the raw
> data from before.log and after.yes.scroll.gtk.log seem similar too.

Again, the plot is much more descriptive. The GTK case uses a lot more
memory upfront and then leaks slowly. Conversely, the Lucid
no-fix-yes-scrollbars case uses much less upfront, but then leaks much
faster. In the arbitrary-length trials from the previous email, they end
up at roughly the same place, but that's a coincidence.



>  >> This does not explain any difference between the GTK (before the "fix")
>  >> and Lucid (after the "fix") behaviors.  What happens with GTK when you
>  >> allow it to delete the terminal by allowing terminal->reference_count
>  >> drop to zero?  If it does not crash immediately, is the memory leak more
>  >> heavy than it is now?
>  >
>  > I haven't run that experiment, but I could do that. Probably won't get
>  > to it for a week, though.
>
> Please try.  If you succeed running it, the resulting difference between
> the "before" GTK and Lucid runs should IMHO show the noise introduced by
> malloc/free more clearly.  And obviously running the before/after
> experiments for Motif would be interesting too: The Motif build crashed
> in the menu bar section while the Lucid build crashed in the scroll bar
> section.
>
>  >> Also, could you try whether changing  gc-cons-threshold  in either
>  >> direction has any impact on the occurrence of the toolkit bug or the
>  >> growth of the memory leak?  Once I thought that this could affect the
>  >> frequency of the error but didn't get any conclusive results.

I ran a few more trials to get the data you asked for. All of these:

- Are at the same baseline revision as before: 979797b9eca0ab. This is
  after your Lucid "fix"

- Are 4 minutes long

- Invoke malloc_trim() at the end, so the drop in resident memory use
  at the end of each trial is visible

The different trials are

- GTK stock: has the reference-counting logic to prevent
  terminal->reference_count reaching 0

- GTK without the refcount logic: has NO reference-counting logic to
  prevent terminal->reference_count reaching 0

- Lucid stock

- Lucid with no refcount logic and stock 800k gc-cons-threshold

- Lucid with no refcount logic and 80k gc-cons-threshold

- Lucid with no refcount logic and 8000k gc-cons-threshold

Raw data and accompanying plot attached. Plot made thusly:

cat \
    <(< gtk.stock.log                       awk '{print NR,"gtk.stock.log",                       $2}') \
    <(< gtk.no.refcount.logic.log           awk '{print NR,"gtk.no.refcount.logic.log",           $2}') \
    <(< lucid.no.refcount.logic.gc80k.log   awk '{print NR,"lucid.no.refcount.logic.gc80k.log",   $2}') \
    <(< lucid.no.refcount.logic.gc800k.log  awk '{print NR,"lucid.no.refcount.logic.gc800k.log",  $2}') \
    <(< lucid.no.refcount.logic.gc8000k.log awk '{print NR,"lucid.no.refcount.logic.gc8000k.log", $2}') \
    <(< lucid.yes.refcount.logic.log        awk '{print NR,"lucid.yes.refcount.logic.log",        $2}') \
    | feedgnuplot --lines --dataid --domain --autolegend \
                  --xlabel 'frame index (2 seconds per frame)' \
                  --ylabel 'Memory consumed (kB)'


As expected, turning off the refcounting logic for the GTK terminal
makes it crash long before the trial ends. And it leaks lots of memory
in the meantime: ~ 430kB/frame

For whatever reason, the stock GTK trial is much more consistent this
time. It leaks at ~ 15kB/frame, although malloc_trim() gives back 1300kB
at the end, so the 15kB/frame is an overestimate, as far as emacs is
concerned at least.
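
As a rough bound on that overestimate, the arithmetic can be sketched as
below. It rests on two assumptions that are not measured here: the
4-minute trial at 2 seconds per frame gives about 120 frames, and all of
the 1300kB returned by malloc_trim() accumulated during the trial.
net_leak_rate is a hypothetical helper name.

```shell
# Hypothetical helper: subtract the memory returned by malloc_trim(),
# spread evenly over the trial, from the measured slope.
#   $1 = measured slope (kB/frame)
#   $2 = memory returned by malloc_trim() (kB)
#   $3 = number of frames in the trial
net_leak_rate() {
    awk -v raw="$1" -v trimmed="$2" -v frames="$3" \
        'BEGIN { printf "%.1f\n", raw - trimmed / frames }'
}

# 4 minutes at 2 s/frame is assumed to be 120 frames
net_leak_rate 15 1300 120
```

Under those assumptions the rate attributable to emacs itself would sit
somewhere between this lower figure and the measured 15kB/frame.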

As before, adding the refcounting logic to the lucid terminal drops the
leak rate to 0.

Tweaking gc-cons-threshold does have an effect, but it's not clear what.
The default is 800kB. Looks like the 80k and 800k settings produce
similar leaks at 54kB/frame.

An 8000kB setting produces a sharp climb of about 8000kB above the
baseline, and then a slower leak of ~44kB/frame, although this isn't as
linear as the others.

Does any of this speak to you?



Attachment: lucid.no.refcount.logic.gc80k.log
Description: Binary data

Attachment: lucid.no.refcount.logic.gc800k.log
Description: Binary data

Attachment: lucid.no.refcount.logic.gc8000k.log
Description: Binary data

Attachment: lucid.yes.refcount.logic.log
Description: Binary data

Attachment: gtk.stock.log
Description: Binary data

Attachment: gtk.no.refcount.logic.log
Description: Binary data

Attachment: emacs21509_2017-10-14.pdf
Description: Adobe PDF document

