emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: JSON/YAML/TOML/etc. parsing performance


From: Eli Zaretskii
Subject: Re: JSON/YAML/TOML/etc. parsing performance
Date: Tue, 19 Sep 2017 22:09:06 +0300

> From: Philipp Stephani <address@hidden>
> Date: Tue, 19 Sep 2017 08:18:14 +0000
> 
> Here's a newer version of the patch. The only significant difference is that 
> now the Lisp values for JSON null
> and false are :null and :false, respectively. Using a dedicated symbol for 
> :null reduces the mental overhead of
> the triple meaning of nil (null, false, empty list), and is more 
> future-proof, should we ever want to support lists. 

Thanks, a few comments below.

> +static _Noreturn void
> +json_parse_error (const json_error_t *error)
> +{
> +  xsignal (Qjson_parse_error,
> +           list5 (build_string (error->text), build_string (error->source),
> +                  make_natnum (error->line), make_natnum (error->column),
> +                  make_natnum (error->position)));
> +}

I think error->source could include non-ASCII characters, in which
case you need to use make_specified_string with its last argument
non-zero, not build_string, which has its own ideas about when to
produce a multibyte string.

> +static  _GL_ARG_NONNULL ((2)) Lisp_Object
> +lisp_to_json_1 (Lisp_Object lisp, json_t **json)
> +{
> +  if (VECTORP (lisp))
> +    {
> +      ptrdiff_t size = ASIZE (lisp);
> +      eassert (size >= 0);
> +      if (size > SIZE_MAX)
> +        xsignal1 (Qoverflow_error, build_pure_c_string ("vector is too 
> long"));

I don't think you can allocate pure storage at run time, only at dump
time.  (There are more of this elsewhere in the patch.)

> +  /* LISP now must be a vector or hashtable.  */
> +  if (++lisp_eval_depth > max_lisp_eval_depth)
> +    xsignal0 (Qjson_object_too_deep);

This error could mislead: the problem could be in the nesting of
surrounding Lisp being too deep, and the JSON part could be just fine.

> +  Lisp_Object string
> +    = make_string (buffer_and_size->buffer, buffer_and_size->size);

This is arbitrary text, so I'm not sure make_string is appropriate.
Could the text be a byte stream, i.e. not human-readable text?  If so,
do we want to create a unibyte string or a multibyte string here?

> +  insert_from_string (string, 0, 0, SCHARS (string), SBYTES (string), false);

Hmmm... if you want to insert the text into the buffer, you need to
make sure it has the right representation.  What kind of text is this?
It probably should be decoded.

In any case, going through a string sounds gross.  You should insert
the text directly into the gap, like we do in a couple of places
already.  See insert_from_gap and its users, and maybe also
decode_coding_gap.

> +DEFUN ("json-parse-string", Fjson_parse_string, Sjson_parse_string, 1, 1, 
> NULL,
> +       doc: /* Parse the JSON STRING into a Lisp object.
> +This is essentially the reverse operation of `json-serialize', which
> +see.  The returned object will be a vector or hashtable.  Its elements
> +will be `:null', `:false', t, numbers, strings, or further vectors and
> +hashtables.  If there are duplicate keys in an object, all but the
> +last one are ignored.  If STRING doesn't contain a valid JSON object,
> +an error of type `json-parse-error' is signaled.  */)
> +  (Lisp_Object string)
> +{
> +  ptrdiff_t count = SPECPDL_INDEX ();
> +  check_string_without_embedded_nulls (string);
> +
> +  json_error_t error;
> +  json_t *object = json_loads (SSDATA (string), 0, &error);

Doesn't json_loads require the string to be encoded in some particular
encoding?  If so, passing it our internal representation might not be
TRT.

> +  /* First, parse from point to the gap or the end of the accessible
> +     portion, whatever is closer.  */
> +  ptrdiff_t point = d->point;
> +  ptrdiff_t end;
> +  {
> +    bool overflow = INT_ADD_WRAPV (BUFFER_CEILING_OF (point), 1, &end);
> +    eassert (!overflow);
> +  }
> +  size_t count;
> +  {
> +    bool overflow = INT_SUBTRACT_WRAPV (end, point, &count);
> +    eassert (!overflow);
> +  }

Why did you need these blocks in braces?

> +(provide 'json-tests)
> +;;; json-tests.el ends here

IMO, it would be good to test also non-ASCII text in JSON objects.

Finally, this needs documentation: NEWS and the ELisp manual.

Thanks again for working on this.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]