From: Geza Herman
Subject: bug#74547: 31.0.50; igc: assertion failed in buffer.c
Date: Wed, 4 Dec 2024 20:11:18 +0100
User-agent: Mozilla Thunderbird
On 12/1/24 22:15, Pip Cet wrote:
"Geza Herman" <geza.herman@gmail.com> writes:On 12/1/24 16:48, Pip Cet wrote: Gerd M¶llmann <gerd.moellmann@gmail.com> writes: Back then, the future of the new GC was a question, so Gerd said (https://lists.gnu.org/archive/html/emacs-devel/2024-03/msg00544.html) that "Please don't take my GC efforts into consideration. That may succeed or not. But this is also a matter of good design, using the stack, (which BTW pdumper does, too), vs. bad design." That's why we went with the fastest implementation that doesn't use lisp vectors for storage. But we suspected that this JSON parser design will likely cause a problem with the new GC. So I think even if it turned out that the current problem was not caused by the parser, I still think that there should be something done about this JSON parser design to eliminate this potential problem. The lisp vector based approach was reverted because it added an extra pressure to the GC. For large JSON messages, it doesn't matter too much, but when the JSON is small, the extra GC time made the parser measurably slower. But, as far as I remember, that version hadn't have the small internal storage optimization yet. If we convert back to the vector based approach, the extra GC pressure will be smaller (compared to the original vector based approach without the internal storage), as for smaller sizes the vector won't be actually used. G©zaThank you for the summary, that makes sense. Is there a standard corpus of JSON documents that you use to benchmark the code? That would be very helpful, I think, since Eli correctly points out JSON parsing performance is critical.
I'm not aware of such a corpus. When I developed the new JSON parser, the performance difference was so large that it was obvious the new parser was faster. But I did benchmark it on JSONs generated by LSP communication (maybe I can share those, if there is interest, but I need to anonymize them first), and I also did a benchmark on all the JSONs I found on my computer.
But this time, the performance difference is expected to be smaller: using Lisp vectors shouldn't have a very large effect on performance. I'd check the performance with small JSONs, but ones large enough that the (non-internal) object_workspace actually gets used (make sure to run a lot of iterations, so the amortized GC time is included in the result). For larger JSONs, we shouldn't see a difference, as all the other allocations (which store the actual result of the parsing) should hide the additional Lisp vector allocation cost. At least, that is my theory.
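For illustration, a minimal benchmark along these lines might look like the untested sketch below (the sample document and iteration count are arbitrary; benchmark-run returns the elapsed time, the number of GC runs, and the GC time, so the amortized GC cost shows up in the result):

  (require 'benchmark)
  ;; The sample document here is arbitrary; per the suggestion above, it
  ;; should be large enough that the non-internal object_workspace
  ;; actually gets used.
  (let ((small-json "{\"jsonrpc\":\"2.0\",\"id\":1,\"result\":[1,2,3,4,5,6,7,8]}"))
    ;; benchmark-run returns (ELAPSED GC-RUNS GC-TIME); running many
    ;; iterations folds the amortized GC cost into the measurement.
    (benchmark-run 100000
      (json-parse-string small-json)))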
> My gut feeling is that we should get rid of the object_workspace entirely,
> instead modifying the general Lisp code to avoid performance issues (and
> sacrifice some memory in the process, on most systems).

object_workspace is only grown once during the lifetime of a parse. Once it has grown to the needed size, the only extra cost when parsing a value is copying the data from the object_workspace to its final place. The truncate-based solution does the same copy, but it also needs to grow the hash table/array for each value, so it executes more allocations and copies than the current solution. So I'd prefer to keep object_workspace. If the only solution is to convert it to a Lisp vector, then I think we should do that. But again, this is just my theory. If we try the truncate-based solution, and it turns out not to be significantly slower, then it can be a good solution as well.
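Just to make the allocation/copy argument concrete, here is a purely illustrative Elisp sketch of the two growth patterns (the real code is C in src/json.c; the function names and the doubling factor are made up): the workspace variant grows a scratch vector only occasionally and does a single exact-size copy at the end, while the other variant reallocates and recopies the destination for every value.

  ;; Workspace-style collection: amortized growth, one final copy.
  (defun my-collect-with-workspace (values)
    (let ((ws (make-vector 4 nil))
          (n 0))
      (dolist (v values)
        (when (= n (length ws))
          ;; Grow the scratch vector by doubling (amortized O(1) per value).
          (let ((bigger (make-vector (* 2 (length ws)) nil)))
            (dotimes (i n) (aset bigger i (aref ws i)))
            (setq ws bigger)))
        (aset ws n v)
        (setq n (1+ n)))
      ;; Single copy into a right-sized result.
      (let ((result (make-vector n nil)))
        (dotimes (i n) (aset result i (aref ws i)))
        result)))

  ;; Per-value growth for contrast: each element reallocates the
  ;; destination and copies everything collected so far.
  (defun my-collect-by-regrowing (values)
    (let ((result (make-vector 0 nil)))
      (dolist (v values)
        (let ((bigger (make-vector (1+ (length result)) nil)))
          (dotimes (i (length result)) (aset bigger i (aref result i)))
          (aset bigger (length result) v)
          (setq result bigger)))
      result))

Both return the same [1 2 3] for (my-collect-with-workspace '(1 2 3)), but the second one performs an allocation and a full copy for every element, which is the extra cost discussed above.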