emacs-devel

Re: I created a faster JSON parser


From: Eli Zaretskii
Subject: Re: I created a faster JSON parser
Date: Fri, 08 Mar 2024 21:57:28 +0200

> From: Herman, Géza <geza.herman@gmail.com>
> Cc: geza.herman@gmail.com, emacs-devel@gnu.org
> Date: Fri, 08 Mar 2024 19:34:28 +0100
> 
> > If you want to use a 32-bit type, use 'int' or 'unsigned int'.
> I want to use a datatype which is likely to have the same size as 
> a CPU register. On 32-bit machines, it should be 32-bit, on 64-bit 
> architectures it should be 64-bit.

Is there a reason you want it to be a 64-bit type on a 64-bit
machine?  If the only concern is efficiency, then you can use 'int'
without fear.  But if a 64-bit machine will need the range of values
beyond INT_MAX (does it?), then I suggest using ptrdiff_t.

> This is not a strong requirement, but it makes the parser faster.

Are you sure?  AFAIK, 32-bit arithmetic on a 64-bit machine is as
fast as 64-bit arithmetic.  So if efficiency is the only
consideration, using 'int' is OK.

> During number parsing, this part of the code calculates the value of
> the number. If it overflows, or it ends up being a float, then the
> workspace buffer is re-parsed to find the value. But if it is an
> integer without overflow, the parser can just use the value without
> any further processing (a certain kind of LSP message is basically a
> large array of integers, so it makes sense to optimize this case -
> this optimization makes parsing such messages 1.5-2x faster).

If the value needs to fit in a fixnum, use EMACS_INT, which is a type
that already takes the 32-bit vs 64-bit nature of the machine into
consideration.
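
To make the fast path described in the quoted paragraph concrete, here
is a minimal sketch of accumulating an integer with overflow detection.
It is not the code from the patch: the names demo_int and
parse_int_fast are hypothetical, and intptr_t only stands in for
EMACS_INT so that the example compiles outside of Emacs.  A real
implementation would also have to check the result against the fixnum
range, which is narrower than the full range of the type.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EMACS_INT, so the sketch builds outside
   of Emacs.  */
typedef intptr_t demo_int;

/* Parse an optional '-' followed by decimal digits from S[0..LEN).
   Return true and store the value in *OUT if the text is a pure
   integer that fits in demo_int.  Return false on overflow or on any
   non-digit character (for example '.' or 'e'), which tells the
   caller to fall back to the slow path that re-parses the buffer.
   This sketch does not enforce JSON's rule against leading zeros.  */
static bool
parse_int_fast (const char *s, size_t len, demo_int *out)
{
  size_t i = 0;
  bool negative = false;

  if (i < len && s[i] == '-')
    {
      negative = true;
      i++;
    }
  if (i == len)
    return false;		/* No digits at all.  */

  /* Accumulate as a negative number, so that the most negative value
     is representable without a special case.  */
  demo_int value = 0;
  for (; i < len; i++)
    {
      if (s[i] < '0' || s[i] > '9')
	return false;		/* Not a plain integer.  */
      int digit = s[i] - '0';
      /* Would value * 10 - digit drop below INTPTR_MIN?  */
      if (value < (INTPTR_MIN + digit) / 10)
	return false;
      value = value * 10 - digit;
    }

  if (!negative)
    {
      if (value == INTPTR_MIN)
	return false;		/* Cannot be negated.  */
      value = -value;
    }
  *out = value;
  return true;
}
```

A caller would try parse_int_fast first and re-parse the workspace
buffer only when it returns false, so an array of small integers never
touches the slow path.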

> > We need to decide what to do with characters outside of the
> > Unicode range.  The encoding of those characters is different,
> > btw, it doesn't follow the UTF-8 scheme.
> >
> > In any case, the error in those cases cannot be just "json parse
> > error", it must be something more self-explanatory.
> I am open to suggestions, as even after reading "Text 
> Representations" in the manual, I have no idea what to do here.

We should begin by deciding whether we want to support characters
outside of the Unicode range.  The error message depends on that
decision.

> I didn't find any code which handles this case in the jansson parser
> either.

The jansson code required encoding/decoding strings to make sure we
submit to jansson text that is always valid UTF-8.
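
As an illustration of what "always valid UTF-8" means here, below is a
minimal sketch of a strict validator.  It is not code from Emacs or
from jansson, and the name utf8_is_strictly_valid is hypothetical.  It
rejects overlong forms, surrogates, and anything above U+10FFFF, so
sequences encoding characters outside of the Unicode range would fail
this check, and the caller could then report a more specific error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if BUF[0..LEN) is strictly valid UTF-8: no overlong
   forms, no surrogates, no code points above U+10FFFF.  */
static bool
utf8_is_strictly_valid (const unsigned char *buf, size_t len)
{
  size_t i = 0;
  while (i < len)
    {
      unsigned char b = buf[i];
      uint32_t cp;		/* Decoded code point.  */
      uint32_t min;		/* Smallest value allowed for this length.  */
      size_t extra;		/* Number of continuation bytes.  */

      if (b < 0x80)
	{
	  i++;			/* ASCII.  */
	  continue;
	}
      else if ((b & 0xE0) == 0xC0)
	{ cp = b & 0x1F; extra = 1; min = 0x80; }
      else if ((b & 0xF0) == 0xE0)
	{ cp = b & 0x0F; extra = 2; min = 0x800; }
      else if ((b & 0xF8) == 0xF0)
	{ cp = b & 0x07; extra = 3; min = 0x10000; }
      else
	return false;		/* Stray continuation or 5+ byte lead.  */

      if (len - i - 1 < extra)
	return false;		/* Truncated sequence.  */

      for (size_t k = 1; k <= extra; k++)
	{
	  unsigned char c = buf[i + k];
	  if ((c & 0xC0) != 0x80)
	    return false;	/* Not a continuation byte.  */
	  cp = (cp << 6) | (c & 0x3F);
	}

      if (cp < min)
	return false;		/* Overlong encoding.  */
      if (cp > 0x10FFFF)
	return false;		/* Outside of the Unicode range.  */
      if (cp >= 0xD800 && cp <= 0xDFFF)
	return false;		/* Surrogate.  */

      i += extra + 1;
    }
  return true;
}
```

Whether a parser should reject such input or accept it is exactly the
decision discussed above; the sketch only shows where the distinction
can be detected.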


