emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: JSON/YAML/TOML/etc. parsing performance


From: Eli Zaretskii
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Tue, 03 Oct 2017 19:26:53 +0300

> From: Philipp Stephani <address@hidden>
> Date: Tue, 03 Oct 2017 15:52:03 +0000
> Cc: address@hidden
> 
> > > The callbacks (especially insert and before-/after-change-hook) can exit
> > > nonlocally, but these nonlocal exits may not escape the Jansson callback.
> > > Therefore all nonlocal exits must be caught here.
> >
> > Why can't you use record_unwind_protect, as we normally do in these
> > situations?
> >
> 
> How would that help? record_unwind_protect can't stop nonlocal exits.

Why do you need to stop them?  Maybe that's the part I didn't
understand: I thought you needed that to avoid leaking resources.
This is what record_unwind_protect is for.

> > > Maybe, but it's probably not worth it because I don't think we have many
> > > architectures where PTRDIFF_MAX > SIZE_MAX.
> >
> > Then why do we punish all the platforms with this runtime check?
> >
> 
> If you think this cannot happen we can turn it into a runtime or
> compile-time assertion.

That's fine with me.  (You meant eassert, right?  Because it is not a
compile-time assertion.)

> > > Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C
> > > standard guarantees PTRDIFF_MAX <= SIZE_MAX.
> >
> > I wasn't talking about PTRDIFF_MAX, I was talking about 'size', which
> > is the number of bytes in a Lisp string.  Since that Lisp string is a
> > valid Lisp object, how can its size be greater than SIZE_MAX?  I don't
> > think there's a way of creating such a Lisp string in Emacs, because
> > functions that allocate memory for strings will prevent that.
> 
> Then I think we should at least add an assertion to document this.

OK.

> > Yes, but what problems can cause these return value to be invalid?
> > AFAICT, only out-of-memory conditions, and that can be checked only
> > once, there's no need to check every single allocation, because once
> > an allocation fails, all the rest will too.
> 
> But if the first succeeds, the second can still fail, so we do need to
> check all of them.

Only the last of them.

> > > > I don't think we need this test here, as 'insert' already has the
> > > > equivalent test in one of its subroutines.
> > >
> > > It can't, because it takes the byte length as ptrdiff_t. We need to check
> > > before whether the size is actually in the valid range of ptrdiff_t.
> >
> > I'm sorry, but I don't see why we should support such exotic
> > situations only for this one feature.  In all other cases we use
> > either ptrdiff_t type or EMACS_INT type, and these issues disappear
> > then.  Trying to support the SIZE_MAX > PTRDIFF_MAX situation causes
> > the code to be much more complicated, harder to maintain, and more
> > expensive at run time than it should be.
> 
> We can't avoid these checks. The API returns size_t, so we can only assume
> that the numbers are in the valid range of size_t, which is larger than the
> ones for positive ptrdiff_t's. There's no way around this.

We do this everywhere else.

We can discuss whether the low-level subroutines which allocate memory
for these objects should be modified to accept a size_t arguments
instead of ptrdiff_t or EMACS_INT, and then make these tests in those
subroutines.  But doing these tests on the level of your code, and
only in those few functions, makes absolutely no sense to me.  Either
these problems are important for Emacs's stability, or they aren't.
Considering them important only in a small number of places means we
are confused about our own code.

> > I'm not even sure there are
> > such platforms out there that Emacs supports,
> 
> All platforms that I know of have SIZE_MAX > PTRDIFF_MAX.

Ha-ha, very funny.  I meant size_t is wider than ptrdiff_t.

> > but if there are, we
> > already have a gazillion problems like that all over our code.
> 
> Just because other parts of the codebase are buggy doesn't mean we need to
> introduce more bugs in new code.

If we consider them bugs, we should fix them, instead of adding
wrappers around the most basic functions in our arsenal.

> > I object to having such code just for this reason, sorry.
> 
> We can't avoid it.

On the level of this functionality, we certainly can and should.

> > > It can't, because json_int_t can be larger than EMACS_INT.
> >
> > OK, but then I think we should consider returning a float value, or a
> > cons of 2 integers.  If these situations are frequent enough, users
> > will thank us, and if they are very infrequent, they will never see
> > such values, and we gain code simplicity and less non-local exits.
> 
> Returning a float (using make_natnum_or_float) might work, but in the end
> I've decided against it because it could silently drop precision. I think
> that's worse than signaling an error.

It cannot be worse, because we already do this elsewhere.  And this
situation is infrequent enough to be unimportant.  Signaling an error,
OTOH, disrupts what could be a valid program, so I cannot agree that
it's better.  The corresponding Lisp implementation certainly doesn't
signal an error in this case, right?

> > > And once again, we need to check at least whether the size fits into
> > > ptrdiff_t.
> >
> > No, we don't, as we don't in other similar cases.
> 
> I don't understand why you think these checks aren't necessary. Converting
> between integral types when the number is out of range for the destination
> type results in an implementation-defined result, i.e. it's unportable.

I'm saying that this code is the wrong place for doing these checks.
We can discuss whether these checks are needed in general, and if we
agree they are, we should change all the related allocation
subroutines to do that there.

> > > > Likewise here: Fmake_vector makes sure the size is not larger than
> > > > allowed.
> > >
> > > Same as above: It can't.
> >
> > It can and it does.
> 
> No, it can't. make_natnum takes a ptrdiff_t argument, and when passing a
> value that's out of range for ptrdiff_t, it will receive an incorrect,
> implementation-defined value.

Even if that can happen (and I'm not sure it can) that is already
happening in all the other cases where we call make_natnum and
Fmake_vector.

> > > > Likewise here: make_natnum does the equivalent test.
> > >
> > > It doesn't and can't.
> >
> > Yes, it does:
> >
> >   INLINE Lisp_Object
> >   make_natnum (EMACS_INT n)
> >   {
> >     eassert (0 <= n && n <= MOST_POSITIVE_FIXNUM);  <<<<<<<<<<<<<<<
> >     EMACS_INT int0 = Lisp_Int0;
> >
> 
> We're not talking about the same thing. What if make_natnum is called with
> a value that doesn't fit in EMACS_INT?

How can that happen?  If that's because you pass it a wrongly typed
value, we should fix that type.

> Also an assertion is incorrect here because the overflowing value comes
> from user data.

Once again, we do that all the time, and this code is not the right
place for making such checks, IMO.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]