emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: JSON/YAML/TOML/etc. parsing performance


From: Philipp Stephani
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Tue, 03 Oct 2017 12:26:32 +0000



Eli Zaretskii <address@hidden> schrieb am So., 1. Okt. 2017 um 20:06 Uhr:
> From: Philipp Stephani <address@hidden>
> Date: Sat, 30 Sep 2017 22:02:55 +0000
> Cc: address@hidden
>
> Subject: [PATCH] Implement native JSON support using Jansson

Thanks, a few more comments/questions.

> +#if __has_attribute (warn_unused_result)
> +# define ATTRIBUTE_WARN_UNUSED_RESULT __attribute__ ((__warn_unused_result__))
> +#else
> +# define ATTRIBUTE_WARN_UNUSED_RESULT
> +#endif

Hmm... why do we need this attribute?  You use it with 2 static
functions, so this sounds like a left-over from the development stage?

It's not strictly needed (and if you don't like it, I can remove it), but it helps catching memory leaks.
 

> +static Lisp_Object
> +internal_catch_all_1 (Lisp_Object (*function) (void *), void *argument)

Can you tell why you needed this (and the similar internal_catch_all)?
Is that only because the callbacks could signal an error, or is there
another reason?  If the former, I'd prefer to simplify the code and
its maintenance by treating the error condition in a less drastic
manner, and avoiding the call to xsignal.

The callbacks (especially insert and before-/after-change-hook) can exit nonlocally, but these nonlocal exits may not escape the Jansson callback. Therefore all nonlocal exits must be caught here.
 

> +static _GL_ARG_NONNULL ((2)) Lisp_Object
> +lisp_to_json_toplevel_1 (Lisp_Object lisp, json_t **json)
> +{
> +  if (VECTORP (lisp))
> +    {
> +      ptrdiff_t size = ASIZE (lisp);
> +      eassert (size >= 0);
> +      if (size > SIZE_MAX)
> +        xsignal1 (Qoverflow_error, build_string ("vector is too long"));

I think this error text is too vague.  Can we come up with something
that describes the problem more accurately?

Maybe, but it's probably not worth it because I don't think we have many architectures where PTRDIFF_MAX > SIZE_MAX. 
 

And btw, how can size be greater than SIZE_MAX in this case?  This is
a valid Lisp object, isn't it?  (There are more such tests in the
patch, e.g. in lisp_to_json, and I think they, too, are redundant.)

Depends on the range of ptrdiff_t and size_t. IIUC nothing in the C standard guarantees PTRDIFF_MAX <= SIZE_MAX. If we want to guarantee that, we should probably add at least a static assertion.
 

> +      *json = json_check (json_array ());
> +      ptrdiff_t count = SPECPDL_INDEX ();
> +      record_unwind_protect_ptr (json_release_object, json);
> +      for (ptrdiff_t i = 0; i < size; ++i)
> +        {
> +          int status
> +            = json_array_append_new (*json, lisp_to_json (AREF (lisp, i)));
> +          if (status == -1)
> +            json_out_of_memory ();
> +          eassert (status == 0);
> +        }
> +      eassert (json_array_size (*json) == size);
> +      clear_unwind_protect (count);
> +      return unbind_to (count, Qnil);

This, too, sounds more complex than it should: you record
unwind-protect just so lisp_to_json's subroutines could signal an
error due to insufficient memory, right?  Why can't we have the
out-of-memory check only inside this loop, which you already do, and
avoid the checks on lower levels (which undoubtedly cost us extra
cycles)?  What do those extra checks in json_check buy us? the errors
they signal are no more informative than the one in the loop, AFAICT.

I don't understand what you mean. We need to check the return values of all functions if we want to to use them later.
 

> +static Lisp_Object
> +json_insert (void *data)
> +{
> +  const struct json_buffer_and_size *buffer_and_size = data;
> +  if (buffer_and_size->size > PTRDIFF_MAX)
> +    xsignal1 (Qoverflow_error, build_string ("buffer too large"));
> +  insert (buffer_and_size->buffer, buffer_and_size->size);

I don't think we need this test here, as 'insert' already has the
equivalent test in one of its subroutines.

It can't, because it takes the byte length as ptrdiff_t. We need to check before whether the size is actually in the valid range of ptrdiff_t.
 

> +    case JSON_INTEGER:
> +      {
> +        json_int_t value = json_integer_value (json);
> +        if (FIXNUM_OVERFLOW_P (value))
> +          xsignal1 (Qoverflow_error,
> +                    build_string ("JSON integer is too large"));
> +        return make_number (value);

This overflow test is also redundant, as make_number already does it.

It can't, because json_int_t can be larger than EMACS_INT. Also, make_number doesn't contain any checks.
 

> +    case JSON_STRING:
> +      {
> +        size_t size = json_string_length (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error, build_string ("JSON string is too long"));
> +        return json_make_string (json_string_value (json), size);

Once again, the overflow test is redundant, as make_specified_string
(called by json_make_string) already includes an equivalent test.

And once again, we need to check at least whether the size fits into ptrdiff_t.
 

> +    case JSON_ARRAY:
> +      {
> +        if (++lisp_eval_depth > max_lisp_eval_depth)
> +          xsignal0 (Qjson_object_too_deep);
> +        size_t size = json_array_size (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error, build_string ("JSON array is too long"));
> +        Lisp_Object result = Fmake_vector (make_natnum (size), Qunbound);

Likewise here: Fmake_vector makes sure the size is not larger than
allowed.

Same as above: It can't.
 

> +    case JSON_OBJECT:
> +      {
> +        if (++lisp_eval_depth > max_lisp_eval_depth)
> +          xsignal0 (Qjson_object_too_deep);
> +        size_t size = json_object_size (json);
> +        if (FIXNUM_OVERFLOW_P (size))
> +          xsignal1 (Qoverflow_error,
> +                    build_string ("JSON object has too many elements"));
> +        Lisp_Object result = CALLN (Fmake_hash_table, QCtest, Qequal,
> +                                    QCsize, make_natnum (size));

Likewise here: make_natnum does the equivalent test.

It doesn't and can't.
 

> +    /* Adjust point by how much we just read.  Do this here because
> +       tokener->char_offset becomes incorrect below.  */
> +    bool overflow = INT_ADD_WRAPV (point, error.position, &point);
> +    eassert (!overflow);
> +    eassert (point <= ZV_BYTE);
> +    SET_PT_BOTH (BYTE_TO_CHAR (point), point);

It's better to use SET_PT here, I think.

That's not possible because we don't have the character offset. (And I think using SET_PT (BYTE_TO_CHAR (point)) would just require needlessly recalculating point.)
 

> +  define_error (Qjson_out_of_memory, "no free memory for creating JSON object",

I'd prefer "not enough memory for creating JSON object".


Done. 

reply via email to

[Prev in Thread] Current Thread [Next in Thread]