
bug#74547: 31.0.50; igc: assertion failed in buffer.c


From: Geza Herman
Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c
Date: Wed, 4 Dec 2024 20:11:18 +0100
User-agent: Mozilla Thunderbird


On 12/1/24 22:15, Pip Cet wrote:
> "Geza Herman" <geza.herman@gmail.com> writes:
>> On 12/1/24 16:48, Pip Cet wrote:
>>> Gerd Möllmann <gerd.moellmann@gmail.com> writes:

>> Back then, the future of the new GC was an open question, so Gerd said
>> (https://lists.gnu.org/archive/html/emacs-devel/2024-03/msg00544.html)
>> "Please don't take my GC efforts into consideration. That may succeed
>> or not. But this is also a matter of good design, using the stack,
>> (which BTW pdumper does, too), vs. bad design." That's why we went with
>> the fastest implementation, one that doesn't use Lisp vectors for
>> storage. But we suspected that this JSON parser design would likely
>> cause a problem with the new GC. So even if it turns out that the
>> current problem was not caused by the parser, I still think something
>> should be done about this JSON parser design to eliminate the
>> potential problem. The Lisp vector based approach was reverted because
>> it added extra pressure on the GC. For large JSON messages that
>> doesn't matter too much, but when the JSON is small, the extra GC time
>> made the parser measurably slower. However, as far as I remember, that
>> version didn't have the small internal storage optimization yet. If we
>> convert back to the vector based approach, the extra GC pressure will
>> be smaller than with the original vector based approach (without the
>> internal storage), as for smaller sizes the vector won't actually be
>> used.
>>
>> Géza
> Thank you for the summary, that makes sense. Is there a standard corpus
> of JSON documents that you use to benchmark the code? That would be
> very helpful, I think, since, as Eli correctly points out, JSON parsing
> performance is critical.

I'm not aware of such a corpus. When I developed the new JSON parser, the performance difference was so large that it was obvious the new parser was faster. But I did run benchmarks on JSONs that were generated by LSP communication (maybe I can share this corpus, if there is interest, but I would need to anonymize it first), and I also ran a benchmark on all the JSONs I found on my computer.

But this time the performance difference is expected to be smaller: using Lisp vectors shouldn't have a very large effect on performance. I'd check the performance with small JSONs, but ones large enough that the (non-internal) object_workspace actually gets used (make sure to run a lot of iterations, so that the amortized GC time is included in the result). For larger JSONs we shouldn't see a difference, as all the other allocations (which store the actual result of the parsing) should hide the additional cost of the Lisp vector allocation. At least, this is my theory.
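(To make the "small internal storage optimization" and the "non-internal" workspace concrete: the idea is roughly the pattern below, with hypothetical names; the real code in src/json.c differs in detail. Values first go into a fixed inline array, and only JSONs with more values than fit inline touch the separately allocated storage, which is the part that could become a Lisp vector.)

/* Sketch of a workspace with small internal storage; hypothetical
   names, not the actual json.c code.  Small JSONs never leave the
   inline array, so they never pay for a separate allocation.  */
#include <stdlib.h>
#include <string.h>

typedef void *value;                /* stand-in for Emacs's Lisp_Object */

enum { WS_INLINE_CAPACITY = 64 };   /* hypothetical size */

struct workspace
{
  value inline_slots[WS_INLINE_CAPACITY];
  value *slots;                     /* inline_slots, or heap once grown */
  size_t count, capacity;
};

static void
ws_init (struct workspace *ws)
{
  ws->slots = ws->inline_slots;
  ws->count = 0;
  ws->capacity = WS_INLINE_CAPACITY;
}

static void
ws_push (struct workspace *ws, value v)
{
  if (ws->count == ws->capacity)
    {
      /* Grow only when the inline storage overflows.  If this block
         were a Lisp vector instead of malloc'ed memory, the GC could
         trace it; that is the conversion being discussed.  (Error
         handling omitted for brevity.)  */
      size_t cap = 2 * ws->capacity;
      value *p = malloc (cap * sizeof *p);
      memcpy (p, ws->slots, ws->count * sizeof *p);
      if (ws->slots != ws->inline_slots)
        free (ws->slots);
      ws->slots = p;
      ws->capacity = cap;
    }
  ws->slots[ws->count++] = v;
}

So for the benchmark, the interesting small JSONs are the ones with more values than the inline capacity; anything below it never allocates extra storage at all.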


> My gut feeling is that we should get rid of the object_workspace
> entirely, instead modifying the general Lisp code to avoid performance
> issues (and sacrifice some memory in the process, on most systems).
The object_workspace is grown only once during a single parse. Once it has grown to the needed size, the only extra cost when parsing a value is copying the data from the object_workspace to its final place. A truncate-based solution does the same copy, but it also needs to grow the hash table/array for each value, so it executes more allocations and copies than the current solution. So I'd prefer to keep the object_workspace. If the only solution is to convert it to a Lisp vector, then I think we should do that. But again, this is just my theory: if we try the truncate-based solution and it turns out not to be significantly slower, then it can be a good solution as well.
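(Reusing the hypothetical struct workspace from the sketch above: the parser remembers ws->count when an array opens, and the "copy to its final place" when the array closes is a single exact-size allocation plus one memcpy. Again only a sketch, not the actual json.c code:)

/* Called when a JSON array closes: copy the values collected since
   START into their final, exact-size home (in Emacs this would be a
   Lisp vector; plain malloc stands in here), then pop them off the
   workspace so nesting works.  This is the single extra copy
   mentioned above.  */
static value *
ws_finish_array (struct workspace *ws, size_t start, size_t *len)
{
  *len = ws->count - start;
  value *result = malloc (*len * sizeof *result);
  memcpy (result, ws->slots + start, *len * sizeof *result);
  ws->count = start;              /* pop this array's values */
  return result;
}

A truncate-based solution would instead append each value directly into a growable result object and shrink it to size at the end: that is the same final copy, plus a grow and copy every time the result object overflows along the way, which is the extra cost described above.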
